Scikit Package#

from diveplane.scikit import ...

The Python API for the Diveplane Scikit Client.

class diveplane.scikit.DiveplaneClassifier#

Bases: DiveplaneEstimator

A DiveplaneEstimator for classification analysis.

Parameters:

features (dict of str: dict, default None) –

The features that will predict the target(s). Will be generated automatically if not specified.

Example:

{
    "feature_name": {
        "parameter1" : "value1",
        "parameter2" : "value2"
    },
    "length": { "type" : "continuous", "decimal_places": 1 },
    "width": { "type" : "continuous", "significant_digits": 4 },
    "degrees": { "type" : "continuous", "cycle_length": 360 },
    "class": { "type" : "nominal" }
}
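As a rough illustration of what a cyclic feature definition implies, the sketch below reuses the "degrees" entry from the example and computes a wrap-around difference. The helper cyclic_difference is hypothetical and not part of the Diveplane API, and how Diveplane itself uses cycle_length internally is an assumption here.

```python
# Feature definitions in the same shape as the example above.
features = {
    "length": {"type": "continuous", "decimal_places": 1},
    "degrees": {"type": "continuous", "cycle_length": 360},
    "class": {"type": "nominal"},
}


def cyclic_difference(a, b, cycle_length):
    """Hypothetical helper: shortest distance between a and b on a cycle."""
    diff = abs(a - b) % cycle_length
    return min(diff, cycle_length - diff)


cycle = features["degrees"]["cycle_length"]
# 350 and 10 degrees are 20 apart on a 360-degree cycle, not 340.
print(cyclic_difference(350, 10, cycle))
```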

targets (dict of str: dict, default None) –

The target(s) to be predicted. Will be generated automatically if not specified.

Example:

{
    "target_name": {
        "parameter1" : "value1",
        "parameter2" : "value2"
    },
    "klass": { "type" : "nominal" }
}

client (AbstractDiveplaneClient, default None) – A subclass of AbstractDiveplaneClient used to interface with Diveplane.
verbose (boolean, default False) – A flag for verbose output.
debug (boolean, default False) – A flag for debug output.
ttl (int, in milliseconds) – The maximum time a server should maintain a connection open for a trainee when processing requests.
client_params (dict, default None) – The parameters with which to instantiate the client.
trainee_params (dict, default None) – The parameters with which to instantiate the trainee. Intended for use by DiveplaneEstimator.get_params.
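Because ttl is given in milliseconds, it can help to derive the value from a readable duration. The sketch below uses only the standard library; to_milliseconds is a hypothetical helper, and 43200000 is the default that appears in the __init__ signature.

```python
from datetime import timedelta

# The default ttl from the __init__ signature, in milliseconds.
DEFAULT_TTL_MS = 43200000


def to_milliseconds(duration):
    """Convert a timedelta to whole milliseconds."""
    return duration // timedelta(milliseconds=1)


# The default corresponds to 12 hours.
print(timedelta(milliseconds=DEFAULT_TTL_MS))
# A 30-minute ttl expressed in milliseconds.
ttl_30_min = to_milliseconds(timedelta(minutes=30))
print(ttl_30_min)
```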

__init__(client=None, features=None, targets=None, verbose=False, debug=False, ttl=43200000, client_params=None, trainee_params=None)#

Initialize DiveplaneClassifier.

Parameters:

client (AbstractDiveplaneClient | None) –
features (Dict | None) –
targets (Dict | None) –
verbose (bool) –
debug (bool) –
ttl (int) –
client_params (Dict | None) –
trainee_params (Dict | None) –

fit(X, y, analyze=True)#

Fit a model with Diveplane.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)) – Data
y (numpy.ndarray, shape (n_samples,)) – Target. Will be cast to X’s dtype if necessary
analyze (bool, default=True) – (Optional) Whether the trainee should be analyzed after fitting. Set to False if you plan to call analyze() yourself after fit() to specify parameters.

Returns:

self

Return type:

DiveplaneEstimator

load(trainee_id)#

Load the trainee and repopulate the classes_ variable.

This is based on the available classes in the loaded trainee.

Parameters: trainee_id (str) – The ID of the trainee.

partial_fit(X, y)#

Add data to an existing Diveplane model.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)) – Data
y (numpy.ndarray, shape (n_samples,)) – Target. Will be cast to X’s dtype if necessary

predict_proba(X)#

Probability estimates.

The returned estimates for all classes are ordered by the label of classes.

For a multi-class problem, if multi_class is set to “multinomial”, the softmax function is used to find the predicted probability of each class. Otherwise, a one-vs-rest approach is used, i.e., the probability of each class is calculated assuming it to be positive using the logistic function, and these values are normalized across all the classes.

NOTE: Only works with single target models at this time.

Parameters: X (numpy.ndarray, shape (n_samples, n_features)) – Data
Returns: The probabilities of the classes for the given prediction.
Return type: numpy.ndarray, shape (n_samples, n_classes)
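The two normalization schemes mentioned in the description can be sketched in plain Python. This only illustrates the math (softmax versus normalized one-vs-rest logistic scores). The function names and the raw scores are made up for the example, and it does not show how Diveplane computes its probabilities.

```python
import math


def softmax(scores):
    """Multinomial case: exponentiate and normalize the raw class scores."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_vs_rest(scores):
    """One-vs-rest case: logistic per class, then normalize across classes."""
    positives = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(positives)
    return [p / total for p in positives]


raw_scores = [2.0, 0.5, -1.0]  # hypothetical per-class scores for one sample
print(softmax(raw_scores))
print(one_vs_rest(raw_scores))
```

Both functions return one probability per class, in the same order as the input scores, and each result sums to 1.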

set_fit_request(*, analyze='$UNCHANGED$')#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:

analyze (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for analyze parameter in fit.
self (DiveplaneClassifier) –

Returns:

self – The updated object.

Return type:

object
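The four request values can be summarized with a toy resolver. This is a simplified model of the behaviour listed above, not scikit-learn's actual routing code. resolve_request and its arguments are hypothetical names, and the exception type is chosen only for illustration.

```python
def resolve_request(request, name, provided):
    """Toy model: decide what a meta-estimator forwards to fit.

    request  -- True, False, None, or a str alias
    name     -- the fit parameter name, e.g. "analyze"
    provided -- metadata the user passed to the meta-estimator
    """
    if request is True:
        # Forward the value if present; silently skip it otherwise.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Never forwarded.
        return {}
    if request is None:
        # Not requested: providing it is an error.
        if name in provided:
            raise ValueError(f"{name!r} was provided but not requested")
        return {}
    # A str alias: look the value up under the alias, pass it on as `name`.
    return {name: provided[request]} if request in provided else {}


print(resolve_request(True, "analyze", {"analyze": False}))
print(resolve_request("do_analyze", "analyze", {"do_analyze": True}))
```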

class diveplane.scikit.DiveplaneEstimator#

Bases: BaseEstimator

This class is intended for use within scikit-learn only.

This Estimator follows scikit-learn’s conventions. For access to a wider range of Diveplane capabilities, please use the client specified in the diveplane.client module.

Parameters:

features (dict of str: dict, default None) –

The features that will predict the target(s). Will be generated automatically if not specified.

Example:

{
    "feature_name": {
        "parameter1" : "value1",
        "parameter2" : "value2"
    },
    "length": { "type" : "continuous", "decimal_places": 1 },
    "width": { "type" : "continuous", "significant_digits": 4 },
    "degrees": { "type" : "continuous", "cycle_length": 360 },
    "class": { "type" : "nominal" }
}

targets (dict of str: dict, default None) –

The target(s) to be predicted. Will be generated automatically if not specified.

Example:

{
    "target_name": {
        "parameter1" : "value1",
        "parameter2" : "value2"
    },
    "klass": { "type" : "nominal" }
}

client (AbstractDiveplaneClient, default None) – A subclass of AbstractDiveplaneClient used to interface with Diveplane.
method (str) – One of ‘classification’ or ‘regression’.
verbose (boolean, default False) – A flag for verbose output.
debug (boolean, default False) – A flag for debug output.
ttl (int, in milliseconds) – The maximum time a server should maintain a connection open for a trainee when processing requests.
client_params (dict, default None) – The parameters with which to instantiate the client.
trainee_params (dict, default None) – The parameters with which to instantiate the trainee.

Examples

>>> import pandas as pd
>>> from diveplane.scikit import DiveplaneClassifier
>>> from sklearn.model_selection import train_test_split
>>> # Read in the data.
>>> df = pd.read_csv('iris.csv')
>>>
>>> # Split the dataset into the feature (X) and targets (y) and convert
>>> # the string targets into integer hashes.
>>> X = df.drop('class', axis=1).values.astype(float)
>>> y = df['class'].apply(hash).values.astype(int)
>>>
>>> # Split the dataset into an 80/20 train/test set.
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=1)
>>>
>>> # Create a classifier.
>>> dp = DiveplaneClassifier()
>>>
>>> # Fit the training data.
>>> dp.fit(X_train, y_train)
>>>
>>> # Test against the reserved test data.
>>> score = dp.score(X_test, y_test)
>>>
>>> # Print the resulting accuracy.
>>> print(score)
0.9666666666666667
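Note that Python's built-in hash() for strings is randomized per process unless PYTHONHASHSEED is fixed, so the integer targets produced in the example above can differ between runs. A deterministic alternative is to map each distinct label to its index in sorted order. The sketch below uses plain Python. encode_labels is a hypothetical helper, and the label strings are placeholders.

```python
def encode_labels(labels):
    """Map each distinct label to a stable integer code (sorted order)."""
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


labels = ["setosa", "virginica", "setosa", "versicolor"]
codes, mapping = encode_labels(labels)
print(codes)    # [0, 2, 0, 1]
print(mapping)  # {'setosa': 0, 'versicolor': 1, 'virginica': 2}
```

Keeping the mapping around also makes it possible to translate predicted codes back into the original labels.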

__init__(client=None, features=None, targets=None, method=None, verbose=False, debug=False, ttl=43200000, trainee_params=None, client_params=None)#

Initialize DiveplaneEstimator.

Parameters:

client (AbstractDiveplaneClient | None) –
features (Dict | None) –
targets (Dict | None) –
method (str | None) –
verbose (bool) –
debug (bool) –
ttl (int) –
trainee_params (Dict | None) –
client_params (Dict | None) –

analyze(seed=None, **kwargs)#

Analyze a trainee.

Parameters:

seed (int, optional) – A random seed.
**kwargs – Refer to the docstring of the diveplane.client.analyze method for a complete reference of all parameters.

delete()#

Delete this trainee from the Diveplane cloud service.

describe_prediction(X, details=None)#

Describe a prediction in detail.

Parameters:

X (numpy.ndarray) – Feature values.
details (dict, default None) –
(Optional) If details are specified, the response will contain the requested explanation data along with the reaction. Below are the valid keys and data types for the different audit details. Omitted keys, values set to None, or False values for Booleans will not be included in the audit data returned.
- boundary_cases: bool, optional
  If True, outputs an automatically determined (when ‘num_boundary_cases’ is not specified) relevant number of boundary cases. Uses both context and action features of the reacted case to determine the counterfactual boundary based on action features, which maximize the dissimilarity of action features while maximizing the similarity of context features. If action features aren’t specified, uses familiarity conviction to determine the boundary instead.
- boundary_cases_familiarity_convictions: bool, optional
  If True, outputs familiarity conviction of addition for each of the boundary cases.
- case_contributions: bool, optional
  If True, outputs each influential case’s differences between the predicted action feature value and the predicted action feature value if each individual case were not included. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.
- case_feature_residuals: bool, optional
  If True, outputs feature residuals for all (context and action) features for just the specified case. Uses leave-one-out for each feature, while using the others to predict the left out feature with their corresponding values from this case. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.
- case_mda: bool, optional
  If True, outputs each influential case’s mean decrease in accuracy of predicting the action feature in the local model area, as if each individual case were included versus not included. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.
- categorical_action_probabilities: bool, optional
  If True, outputs probabilities for each class for the action. Applicable only to categorical action features.
- distance_contribution: bool, optional
  If True, outputs the distance contribution (expected total surprisal contribution) for the reacted case. Uses both context and action feature values.
- distance_ratio: bool, optional
  If True, outputs the ratio of distance (relative surprisal) between this reacted case and its nearest case to the minimum distance (relative surprisal) between the two closest cases in the local area. All distances are computed using only the specified context features.
- feature_contributions: bool, optional
  If True, outputs each context feature’s differences between the predicted action feature value and the predicted action feature value if each context were not in the model for all context features in the local model area. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.
- feature_mda: bool, optional
  If True, outputs each context feature’s mean decrease in accuracy of predicting the action feature given the context. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.
- feature_mda_ex_post: bool, optional
  If True, outputs each context feature’s mean decrease in accuracy of predicting the action feature as an explanation given that the specified prediction was already made as specified by the action value. Uses both context and action features of the reacted case to determine that area. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.
- feature_residuals: bool, optional
  If True, outputs feature residuals for all (context and action) features locally around the prediction. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.
- global_case_feature_residual_convictions: bool, optional
  If True, outputs this case’s feature residual convictions for the global model. Computed as: global model feature residual divided by case feature residual. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.
- hypothetical_values: dict, optional
  A dictionary of feature name to feature value. If specified, shows how a prediction could change in a what-if scenario where the influential cases’ context feature values are replaced with the specified values. Iterates over all influential cases, predicting the action features for each one using the updated hypothetical values. Outputs the predicted arithmetic over the influential cases for each action feature.
- influential_cases: bool, optional
  If True, outputs the most influential cases and their influence weights based on the surprisal of each case relative to the context being predicted among the cases. Uses only the context features of the reacted case.
- influential_cases_familiarity_convictions: bool, optional
  If True, outputs familiarity conviction of addition for each of the influential cases.
- influential_cases_raw_weights: bool, optional
  If True, outputs the surprisal for each of the influential cases.
- local_case_feature_residual_convictions: bool, optional
  If True, outputs this case’s feature residual convictions for the region around the prediction. Uses only the context features of the reacted case to determine that region. Computed as: region feature residual divided by case feature residual. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.
- most_similar_cases: bool, optional
  If True, outputs an automatically determined (when ‘num_most_similar_cases’ is not specified) relevant number of similar cases, which will first include the influential cases. Uses only the context features of the reacted case.
- num_boundary_cases: int, optional
  Outputs this manually specified number of boundary cases.
- num_most_similar_cases: int, optional
  Outputs this manually specified number of most similar cases, which will first include the influential cases.
- num_most_similar_case_indices: int, optional
  Outputs this specified number of most similar case indices when ‘distance_ratio’ is also set to True.
- observational_errors: bool, optional
  If True, outputs observational errors for all features as defined in feature attributes.
- outlying_feature_values: bool, optional
  If True, outputs the reacted case’s context feature values that are outside the min or max of the corresponding feature values of all the cases in the local model area. Uses only the context features of the reacted case to determine that area.
- prediction_residual_conviction: bool, optional
  If True, outputs residual conviction for the reacted case’s action features by computing the prediction residual for the action features in the local model area. Uses both context and action features to determine that area. This is defined as the expected (global) model residual divided by computed local residual.
- prediction_similarity_conviction: bool, optional
  If True, outputs similarity conviction for the reacted case. Uses both context and action feature values as the case values for all computations. This is defined as expected (global) distance contribution divided by reacted case distance contribution.
- robust_computation: bool, optional
  Defaults to False, which uses leave-one-out for features (or cases, as needed) for all relevant computations. If True, uses uniform sampling from the power set of all combinations of features (or cases, as needed) instead.
```
>>> details = {'num_most_similar_cases': 5,
...            'feature_residuals': True}
```

Returns:

Format of:

{
    'action': list of dicts of action_features -> action_values,
    'explanation': dict with requested audit data
}

Return type:

dict
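Because omitted keys, None values, and False flags all produce no audit data, a details dict can be pruned before it is passed in. The helper below is a hypothetical convenience, not part of the Diveplane API. It only restates that rule in code.

```python
def effective_details(details):
    """Drop entries that would not add anything to the returned audit data."""
    return {
        key: value
        for key, value in details.items()
        if value is not None and value is not False
    }


details = {
    "num_most_similar_cases": 5,
    "feature_residuals": True,
    "influential_cases": False,   # disabled flag, same as leaving it out
    "hypothetical_values": None,  # unset, same as leaving it out
}
print(effective_details(details))
# {'num_most_similar_cases': 5, 'feature_residuals': True}
```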

feature_add(feature=None, value=None)#

Add a feature to a trainee.

Parameters:

feature (str, optional) – The name of the feature. Will be generated automatically if not specified.
value (int or float or str, optional) – The value to populate the feature with.

feature_remove(feature=None)#

Remove a feature from a trainee.

Parameters: feature (str, default None) – Optional. The name of the feature to remove. Does nothing, without raising an error, if the feature is not found.

fit(X, y, analyze=True)#

Fit a model with Diveplane.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)) – Data
y (numpy.ndarray, shape (n_samples,)) – Target. Will be cast to X’s dtype if necessary
analyze (bool, default=True) – Whether to analyze the trainee after fitting. Set to False if you plan to call analyze() yourself after fit() to specify parameters.

Returns:

self

Return type:

DiveplaneEstimator

get_case_conviction(X, features=None)#

Return case conviction.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)) – Data
features (str or list of str) – A feature name or list of feature names for which to calculate convictions.

Returns:

The conviction of the cases. Ex: [1.0, 3.2, 0.4]

Return type:

list

get_feature_conviction(features=None)#

Gets the conviction of the features in a model.

Parameters: features (str or list of str) – Features to return conviction values for.
Returns: A map of feature convictions and contributions.
Return type: dict

get_params(deep=True)#

Get parameters for this estimator.

This code is taken from the source of sklearn.base.BaseEstimator and lightly modified to avoid calling the get_params method of self.trainee.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any

load(trainee_id)#

Load a model from the server.

Parameters: trainee_id (str) – The ID of the trainee (can be obtained from this class).

partial_fit(X, y)#

Add data to an existing Diveplane model.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)) – Data
y (numpy.ndarray, shape (n_samples,)) – Target. Will be cast to X’s dtype if necessary

partial_unfit(precision, num_cases, criteria=None)#

Remove a training case from a trainee.

The training case will be completely purged from the model and the model will behave as if it had never been trained with this training case.

Parameters:

precision (str) – The precision to use when removing the case. Options are ‘exact’ or ‘similar’.
num_cases (int) – The number of cases to remove; at least 1 case must be removed.
criteria (dict, default None) – A condition map selecting the cases to remove; a case must meet all of the provided conditions. Keys are feature names, and each value is one of: null (the case must have the feature), a single value (must match exactly), or an array of two values (a range that the feature value must fall within).
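The three kinds of condition in a criteria map can be illustrated with a small matcher. This is a sketch of the rules as described, not Diveplane's implementation. The matches helper is hypothetical, null is represented by Python's None, and treating the range bounds as inclusive is an assumption.

```python
def matches(case, criteria):
    """Return True if `case` satisfies every condition in `criteria`."""
    for feature, condition in criteria.items():
        if feature not in case:
            return False
        value = case[feature]
        if condition is None:
            continue  # null: the feature only has to be present
        if isinstance(condition, (list, tuple)) and len(condition) == 2:
            low, high = condition
            if not (low <= value <= high):  # assumed inclusive range
                return False
        elif value != condition:
            return False  # a single value must match exactly
    return True


case = {"length": 4.2, "class": "a"}
print(matches(case, {"length": [4.0, 5.0], "class": "a"}))  # True
print(matches(case, {"width": None}))  # False
```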

predict(X)#

Make predictions using Diveplane.

Parameters:: X (numpy.ndarray, shape (n_samples, n_features)) – Data
Returns:: The predicted values based on the feature values provided.
Return type:: numpy.ndarray, shape (n_samples,)

react_into_features(features=None, *, distance_contribution=False, familiarity_conviction_addition=False, familiarity_conviction_removal=False, p_value_of_addition=False, p_value_of_removal=False, use_case_weights=False, weight_feature=None)#

Calculate conviction and other data and store them into features.

Parameters:

features (list of str) – A list of the feature names to use when calculating conviction.
distance_contribution (bool or str, default False) – The name of the feature to store distance contribution. If set to True the values will be stored to the feature ‘distance_contribution’.
familiarity_conviction_addition (bool or str, default False) – The name of the feature to store conviction of addition values. If set to True the values will be stored to the feature ‘familiarity_conviction_addition’.
familiarity_conviction_removal (bool or str, default False) – The name of the feature to store conviction of removal values. If set to True the values will be stored to the feature ‘familiarity_conviction_removal’.
p_value_of_addition (bool or str, default False) – The name of the feature to store p value of addition values. If set to True the values will be stored to the feature ‘p_value_of_addition’.
p_value_of_removal (bool or str, default False) – The name of the feature to store p value of removal values. If set to True the values will be stored to the feature ‘p_value_of_removal’.
use_case_weights (bool, default False) – When True, will scale influence weights by each case’s weight_feature weight.
weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.
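Each of the "bool or str" options above follows the same pattern: False stores nothing, True stores into a feature named after the option, and a string stores into the feature with that name. A minimal sketch of that rule, using a hypothetical target_feature helper that is not part of the Diveplane API:

```python
def target_feature(option, default_name):
    """Return the feature name to store into, or None if disabled."""
    if option is True:
        return default_name
    if isinstance(option, str):
        return option
    return None  # False (the default): nothing is stored


print(target_feature(True, "distance_contribution"))   # distance_contribution
print(target_feature("dc", "distance_contribution"))   # dc
print(target_feature(False, "distance_contribution"))  # None
```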

release_resources()#

Release trainee resources created by this estimator.

If this estimator’s trainee is named (self._trainee_name is not None), an effort is made to persist the trainee to disk and release its resources. If the data persistence policy forbids this, that call will return an error, in which case delete_trainee() is called instead.

NOTE: Errors are handled immediately because this is the instance’s destructor. There is no further recourse at this point.

Return type: None

save()#

Persist the trainee.

By default model resources are released after a short period of time. This method saves the model persistently to allow releasing trainee resources while keeping the model available for use later.

If this trainee has not already been named, then this method will set a randomly generated one.

Raises:

DiveplaneNotUniqueError – If unable to set the trainee name within RENAME_RETRIES retries.
Exception – If unable to persist the trainee.

Return type:

None
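The naming behaviour described above (generate a random name, retry if it is not unique, give up after a bounded number of retries) can be sketched as follows. Everything here is an assumption for illustration: the value of RENAME_RETRIES, the NotUniqueError stand-in for DiveplaneNotUniqueError, the try_set_name callable, and the choice of one initial attempt plus the retries.

```python
import uuid

RENAME_RETRIES = 5  # assumed value; the real constant is not shown here


class NotUniqueError(Exception):
    """Stand-in for DiveplaneNotUniqueError."""


def set_random_name(try_set_name, retries=RENAME_RETRIES):
    """Try random names until one is accepted or the retries run out.

    try_set_name -- hypothetical callable that raises NotUniqueError
                    when the name is already taken.
    """
    for _ in range(retries + 1):  # one initial attempt plus the retries
        name = str(uuid.uuid4())
        try:
            try_set_name(name)
            return name
        except NotUniqueError:
            continue
    raise NotUniqueError(f"no unique name after {retries} retries")
```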

score(X, y)#

Score Diveplane.

For classifiers, accuracy is calculated. For regressors, R^2 is calculated.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)) – Test samples.
y (numpy.ndarray, shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

Returns:

The accuracy (for classifiers) or R^2 (for regressors).

Return type:

float
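For reference, the two metrics named in the description, accuracy and R^2, can be written out in plain Python. This is a sketch of the standard definitions, not the code that score() runs. The function names are made up for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))         # 0.75
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```

Note that r_squared as written divides by zero when every true value is identical, so a constant target would need separate handling.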

set_fit_request(*, analyze='$UNCHANGED$')#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:

analyze (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for analyze parameter in fit.
self (DiveplaneEstimator) –

Returns:

self – The updated object.

Return type:

object

property trainee_id: str | None#

Return the trainee’s ID, if possible.

property trainee_name: str | None#

Return the trainee name (getter).

class diveplane.scikit.DiveplaneRegressor#

Bases: DiveplaneEstimator

A DiveplaneEstimator for regression analysis.

Parameters:

features (dict of str: dict, default None) –

The features that will predict the target(s). Will be generated automatically if not specified.

Example:

{
    "feature_name": {
        "parameter1" : "value1",
        "parameter2" : "value2"
    },
    "length": { "type" : "continuous", "decimal_places": 1 },
    "width": { "type" : "continuous", "significant_digits": 4 },
    "degrees": { "type" : "continuous", "cycle_length": 360 },
    "class": { "type" : "nominal" }
}

targets (dict of str: dict, default None) –

The target(s) to be predicted. Will be generated automatically if not specified.

Example:

{
    "target_name": {
        "parameter1" : "value1",
        "parameter2" : "value2"
    },
    "klass": { "type" : "nominal" }
}

client (AbstractDiveplaneClient, default None) – A subclass of AbstractDiveplaneClient used to interface with Diveplane.
verbose (boolean, default False) – A flag for verbose output.
debug (boolean, default False) – A flag for debug output.
ttl (int, in milliseconds) – The maximum time a server should maintain a connection open for a trainee when processing requests.
client_params (dict, default None) – The parameters with which to instantiate the client.
trainee_params (dict, default None) – The parameters with which to instantiate the trainee. Intended for use by DiveplaneEstimator.get_params.

__init__(client=None, features=None, targets=None, verbose=False, debug=False, ttl=43200000, client_params=None, trainee_params=None)#

Initialize a DiveplaneRegressor.

Parameters:

client (AbstractDiveplaneClient | None) –
features (Dict | None) –
targets (Dict | None) –
verbose (bool) –
debug (bool) –
ttl (int) –
client_params (Dict | None) –
trainee_params (Dict | None) –

set_fit_request(*, analyze='$UNCHANGED$')#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters:

analyze (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for analyze parameter in fit.
self (DiveplaneRegressor) –

Returns:

self – The updated object.

Return type:

object