Reactor Package#

from diveplane import reactor

The Python API for the Diveplane Reactor Client.

class diveplane.reactor.Session#

Bases: Session

A Diveplane Session.

Parameters:
  • name (str, optional) – The name of the session.

  • metadata (dict, optional) – Any key-value pair to store custom metadata for the session.

  • client (AbstractDiveplaneClient, optional) – The Diveplane client instance to use.

__init__(name=None, *, id=None, metadata=None, client=None)#

Implement the constructor.

Parameters:
  • name (str | None) –

  • id (str | None) –

  • metadata (dict | None) –

  • client (AbstractDiveplaneClient | None) –

Return type:

None

property client: AbstractDiveplaneClient#

The client instance used by the session.

Returns:

The client instance.

Return type:

AbstractDiveplaneClient

property created_date: datetime | None#

The timestamp of when the session was originally created.

Returns:

The creation timestamp.

Return type:

datetime

classmethod from_dict(session_dict)#

Create Session from dict.

Parameters:

session_dict (Dict) – The session parameters.

Returns:

The session instance.

Return type:

Session

classmethod from_openapi(session, *, client=None)#

Create Session from base class.

Parameters:
  • session (BaseSession) – The base session instance.

  • client (AbstractDiveplaneClient, optional) – The Diveplane client instance to use.

Returns:

The session instance.

Return type:

Session

property id: str#

The unique identifier of the session.

Returns:

The session ID.

Return type:

str

property metadata: Dict[str, Any] | None#

The session metadata.

Warning

This returns a deep copy of the metadata. To update the metadata of the session, use the method set_metadata().

Returns:

The metadata of the session.

Return type:

dict

property modified_date: datetime | None#

The timestamp of when the session was last modified.

Returns:

The modified timestamp.

Return type:

datetime

property name: str#

The name of the session.

Returns:

The session name.

Return type:

str

set_metadata(metadata)#

Update the session metadata.

Parameters:

metadata (dict or None) – Any key-value pair to store as custom metadata for the session. Providing None will remove the current metadata.

Return type:

None
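Because the metadata property returns a deep copy, editing the returned dict does not change the session. The following is a minimal sketch of a read-modify-write helper. The name merge_session_metadata is hypothetical and not part of the library; session is assumed to be any object exposing the metadata property and the set_metadata() method documented above.

```python
def merge_session_metadata(session, changes):
    """Merge `changes` into the session metadata and store the result.

    Hypothetical helper: `session` is assumed to expose the documented
    `metadata` property and `set_metadata()` method.
    """
    # The property returns a copy (or None), so start from that copy.
    merged = dict(session.metadata or {})
    merged.update(changes)
    # Only set_metadata() writes the merged dict back to the session.
    session.set_metadata(merged)
    return merged
```

Passing None to set_metadata() instead would remove the current metadata, as described in the parameter entry above.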

property user: AccountIdentity | None#

The user account that the session belongs to.

Returns:

The user account information.

Return type:

AccountIdentity

class diveplane.reactor.Trainee#

Bases: Trainee

A Diveplane Trainee.

A Trainee is most closely related to what would normally be called a ‘model’ in machine learning. It contains feature information, training cases, session data, parameters, and other metadata. Because a Trainee is somewhat more abstract than a model, the two terms are not used interchangeably.

Parameters:
  • name (str, optional) – The name of the trainee.

  • features (dict of {str: dict}) – The feature attributes of the trainee, where each key is a feature name and each value is a sub dictionary of feature attributes.

  • default_action_features (list of str, optional) – The default action feature names of the trainee.

  • default_context_features (list of str, optional) – The default context feature names of the trainee.

  • library_type (str, optional) –

    The library type of the Trainee. Valid options include:

    • “st”: use single-threaded library.

    • “mt”: use multi-threaded library.

  • max_wait_time (int or float, default 30) – The number of seconds to wait for a trainee to be created and become available before aborting gracefully. Set to 0 (or None) to wait as long as the system-configured maximum for sufficient resources to become available, which is typically 20 minutes.

  • persistence (str, default "allow") – The requested persistence state of the trainee. Allowed values include “allow”, “always”, and “never”.

  • project (str or Project, optional) – The instance or id of the project to use for the trainee.

  • metadata (dict, optional) – Any key-value pair to store as custom metadata for the trainee.

  • resources (TraineeResources or dict, optional) – Customize the resources provisioned for the Trainee instance.

  • client (AbstractDiveplaneClient, optional) – The Diveplane client instance to use.

  • overwrite_existing (bool, default False) – Overwrite an existing trainee with the same name, if one exists.

__init__(name=None, features=None, *, default_action_features=None, default_context_features=None, id=None, library_type=None, max_wait_time=None, metadata=None, persistence='allow', project=None, resources=None, client=None, overwrite_existing=False)#

Implement the constructor.

Parameters:
  • name (str | None) –

  • features (Dict[str, Dict] | None) –

  • default_action_features (Iterable[str] | None) –

  • default_context_features (Iterable[str] | None) –

  • id (str | None) –

  • library_type (str | None) –

  • max_wait_time (int | float | None) –

  • metadata (Dict[str, Any] | None) –

  • persistence (str | None) –

  • project (str | Project | None) –

  • resources (TraineeResources | Dict[str, Any] | None) –

  • client (AbstractDiveplaneClient | None) –

  • overwrite_existing (bool | None) –

Return type:

None
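As an illustration of the constructor parameters above, the sketch below assembles keyword arguments for Trainee and checks the string options before use. The helper build_trainee_kwargs is hypothetical. The "type" attribute key and its values are assumptions based on the feature types named elsewhere in this reference, and the sketch treats the listed persistence and library_type values as the complete set, which the text does not guarantee.

```python
# Values listed in the parameter descriptions above (assumed exhaustive here).
ALLOWED_PERSISTENCE = {"allow", "always", "never"}
ALLOWED_LIBRARY_TYPES = {"st", "mt"}


def build_trainee_kwargs(name, features, *, persistence="allow",
                         library_type=None):
    """Return keyword arguments for Trainee(...) after basic checks.

    Hypothetical helper, not part of the library.
    """
    if persistence not in ALLOWED_PERSISTENCE:
        raise ValueError(f"unsupported persistence: {persistence!r}")
    if library_type is not None and library_type not in ALLOWED_LIBRARY_TYPES:
        raise ValueError(f"unsupported library_type: {library_type!r}")
    for feature, attributes in features.items():
        # Each feature name maps to a sub dictionary of attributes.
        if not isinstance(attributes, dict):
            raise TypeError(f"attributes for {feature!r} must be a dict")
    kwargs = {"name": name, "features": features, "persistence": persistence}
    if library_type is not None:
        kwargs["library_type"] = library_type
    return kwargs


# Assumed attribute layout: feature name -> {"type": ...}.
features = {"length": {"type": "continuous"}, "color": {"type": "nominal"}}
kwargs = build_trainee_kwargs("demo", features, library_type="st")
# With the client installed and configured, this could be passed on as
# Trainee(**kwargs).
```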

acquire_resources(*, max_wait_time=None)#

Acquire resources for a trainee in the Diveplane service.

Parameters:

max_wait_time (int or float, default 60) – The number of seconds to wait for trainee resources to be acquired before aborting gracefully. Set to 0 (or None) to wait as long as the system-configured maximum for sufficient resources to become available, which is typically 20 minutes.

Return type:

None

property active_session: Session#

The active session.

Returns:

The session instance.

Return type:

Session

add_feature(feature, feature_value=None, *, condition=None, condition_session=None, feature_attributes=None, overwrite=False)#

Add a feature to the model.

Parameters:
  • feature (str) – The name of the feature.

  • feature_attributes (dict, optional) – The dict of feature specific attributes for this feature. If unspecified and conditions are not specified, will assume feature type as ‘continuous’.

  • feature_value (int or float or str, optional) – The value to populate the feature with. By default, populates the new feature with None.

  • condition (dict, optional) –

    A condition map where feature values will only be added when certain criteria are met.

    If None, the feature will be added to all cases in the model and feature metadata will be updated to include it. If specified as an empty dict, the feature will still be added to all cases in the model but the feature metadata will not be updated.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

    Tip

    For instance, to add the feature_value only when the length and width features are both equal to 10:

    condition = {"length": 10, "width": 10}
    

  • condition_session (str or BaseSession, optional) – If specified, ignores the condition and operates on cases for the specified session id or BaseSession instance.

  • overwrite (bool, default False) – If True, the feature will be over-written if it exists.

Return type:

None
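The condition map rules described above can be read as a per-feature matching test. The sketch below is a plain Python illustration of that reading, not the library's implementation. The names value_matches and case_matches are hypothetical, a case is modeled as a dict of feature values, and a rule of None is left out because its matching behavior is not spelled out here.

```python
from numbers import Number


def _is_number(value):
    # bool is a Number subclass in Python; exclude it from range checks.
    return isinstance(value, Number) and not isinstance(value, bool)


def value_matches(rule, value):
    """Check one case value against one condition-map rule."""
    if rule is None:
        raise NotImplementedError("None rules are not modeled in this sketch")
    if isinstance(rule, (list, tuple)):
        if len(rule) == 2 and all(_is_number(r) for r in rule):
            # Two numeric values: an inclusive range.
            low, high = rule
            return _is_number(value) and low <= value <= high
        # Otherwise: match any of the listed values exactly.
        return value in rule
    # A single value must match exactly.
    return value == rule


def case_matches(condition, case):
    """Return True when the case satisfies every rule in the condition."""
    return all(
        feature in case and value_matches(rule, case[feature])
        for feature, rule in condition.items()
    )


condition = {"length": 10, "width": 10}
print(case_matches(condition, {"length": 10, "width": 10, "height": 3}))
```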

analyze(context_features=None, action_features=None, *, bypass_calculate_feature_residuals=None, bypass_calculate_feature_weights=None, bypass_hyperparameter_analysis=None, dt_values=None, inverse_residuals_as_weights=None, k_folds=None, k_values=None, num_analysis_samples=None, num_samples=None, analysis_sub_model_size=None, analyze_level=None, p_values=None, targeted_model=None, use_case_weights=None, use_deviations=None, weight_feature=None, **kwargs)#

Analyzes the trainee.

Parameters:
  • context_features (list of str, optional) – The context features to analyze for.

  • action_features (list of str, optional) – The action features to analyze for.

  • bypass_calculate_feature_residuals (bool, default False) – When True, bypasses calculation of feature residuals.

  • bypass_calculate_feature_weights (bool, default False) – When True, bypasses calculation of feature weights.

  • bypass_hyperparameter_analysis (bool, default False) – When True, bypasses hyperparameter analysis.

  • dt_values (list of float, optional) – The dt value hyperparameters to analyze with.

  • inverse_residuals_as_weights (bool, default False) – When True, will compute and use inverse of residuals as feature weights.

  • k_folds (int, optional) – The number of cross validation folds to do. A value of 1 does hold-one-out instead of k-fold.

  • k_values (list of int, optional) – The k value hyperparameters to analyze with.

  • num_analysis_samples (int, optional) – Specifies the number of observations to be considered for analysis.

  • num_samples (int, optional) – Number of samples used in calculating feature residuals.

  • analysis_sub_model_size (int, optional) – Number of samples to use for analysis. The rest will be randomly held-out and not included in calculations.

  • analyze_level (int, optional) –

    If specified, will analyze for the following flows:

    1. Predictions/accuracy (hyperparameters)

    2. Data synth (cache: global residuals)

    3. Standard explanations

    4. Full analysis

  • p_values (list of float, optional) – The p value hyperparameters to analyze with.

  • targeted_model (str or None) –

    Type of hyperparameter targeting. Valid options include:

    • single_targeted: Analyze hyperparameters for the specified action_features.

    • omni_targeted: Analyze hyperparameters for each context feature as an action feature, ignores action_features parameter.

    • targetless: Analyze hyperparameters for all context features as possible action features, ignores action_features parameter.

  • use_case_weights (bool, default False) – (Optional) When True will scale influence weights by each case’s weight_feature weight.

  • use_deviations (bool, default False) – When True, uses deviations for LK metric in queries.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • kwargs – Additional experimental analyze parameters.

Return type:

None
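A minimal sketch of a targeted analysis call, using only parameters listed above. The wrapper analyze_for_target is hypothetical, and trainee is assumed to be any object with the analyze() method documented here.

```python
def analyze_for_target(trainee, context_features, action_feature, *,
                       k_folds=None):
    """Analyze hyperparameters for a single action feature.

    Hypothetical wrapper around the documented analyze() method.
    """
    params = {
        "context_features": list(context_features),
        "action_features": [action_feature],
        # single_targeted analyzes for the specified action_features.
        "targeted_model": "single_targeted",
    }
    if k_folds is not None:
        # Only forward k_folds when the caller chose a value.
        params["k_folds"] = k_folds
    trainee.analyze(**params)
    return params
```

With omni_targeted or targetless, the action_features argument would be ignored, so a wrapper for those modes could drop it.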

append_to_series_store(series, contexts, *, context_features=None)#

Append the specified contexts to a series store.

For use with train series.

Parameters:
  • series (str) – The name of the series store to append to.

  • contexts (list of list of object or pandas.DataFrame) – The list of context values to append to the series.

  • context_features (list of str, optional) – The list of feature names for contexts.

Return type:

None

auto_analyze()#

Auto-analyze the trainee.

Re-use all parameters from the previous analyze call, assuming that the user has called ‘analyze’ before. If not, it will default to a robust and versatile analysis.

Return type:

None

auto_optimize()#

Auto-optimize the trainee model.

Re-uses all parameters from the previous optimize or set_auto_optimize_params call. If optimize or set_auto_optimize_params has not been previously called, auto_optimize will default to a robust and versatile optimization.

Deprecated since version 6.0.0: Use Trainee.auto_analyze() instead.

property client: AbstractDiveplaneClient | DiveplanePandasClientMixin#

The client instance used by the trainee.

Returns:

The client instance.

Return type:

AbstractDiveplaneClient

copy(name=None, *, library_type=None, project=None, resources=None)#

Copy the trainee to another trainee.

Parameters:
  • name (str, optional) – The name of the new trainee.

  • library_type (str, optional) –

    The library type of the Trainee. If not specified, the new trainee will inherit the value from the original. Valid options include:

    • “st”: use single-threaded library.

    • “mt”: use multi-threaded library.

  • project (str or Project, optional) – The instance or id of the project to use for the new trainee.

  • resources (TraineeResources or dict, optional) – Customize the resources provisioned for the Trainee instance. If not specified, the new trainee will inherit the value from the original.

Returns:

The new trainee copy.

Return type:

Trainee

property default_action_features: List[str] | None#

The default action features of the trainee.

Warning

This returns a deep copy of the default action features. To update them, use the method set_default_features().

Returns:

The default action feature names for the trainee.

Return type:

list of str or None

property default_context_features: List[str] | None#

The default context features of the trainee.

Warning

This returns a deep copy of the default context features. To update them, use the method set_default_features().

Returns:

The default context feature names for the trainee.

Return type:

list of str or None

delete()#

Delete the trainee.

Return type:

None

delete_session(session)#

Delete a session from the trainee.

Parameters:

session (str or BaseSession) – The id or instance of the session to remove from the model.

Return type:

None

edit_cases(feature_values, *, case_indices=None, condition=None, condition_session=None, features=None, num_cases=None, precision=None)#

Edit feature values for the specified cases.

Parameters:
  • feature_values (list of object or pandas.DataFrame) – The feature values to edit the case(s) with. If specified as a list, the order corresponds with the order of the features parameter. If specified as a DataFrame, only the first row will be used.

  • case_indices (Iterable of Sequence[str, int], optional) – An Iterable of Sequences containing the session id and index, where index is the original 0-based index of the case as it was trained into the session. This explicitly specifies the cases to edit. When specified, condition and condition_session are ignored.

  • condition (dict or None, optional) –

    A condition map to select which cases to edit. Ignored when case_indices are specified.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

  • condition_session (str or BaseSession, optional) – If specified, ignores the condition and operates on all cases for the specified session id or BaseSession instance.

  • features (list of str, optional) – The names of the features to edit. Required when feature_values is not specified as a DataFrame.

  • num_cases (int, default None) – The maximum number of cases to edit. If not specified, the limit will be k cases if precision is “similar”, or no limit if precision is “exact”.

  • precision (str, default None) – The precision to use when selecting the cases to edit. Options are ‘exact’ or ‘similar’. If not specified, “exact” will be used.

Returns:

The number of cases modified.

Return type:

int
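The sketch below shows one way to address cases explicitly through case_indices, pairing a session id with 0-based training indices. The helper edit_trained_cases is hypothetical; trainee is assumed to expose the edit_cases() method documented above.

```python
def edit_trained_cases(trainee, session_id, indices, features, values):
    """Edit the given features on specific cases of one session.

    Hypothetical helper; returns whatever edit_cases() returns, which
    is documented as the number of cases modified.
    """
    if len(features) != len(values):
        # A list of feature_values follows the order of `features`.
        raise ValueError("features and values must have the same length")
    # Each entry is (session id, original 0-based training index).
    case_indices = [(session_id, index) for index in indices]
    return trainee.edit_cases(
        list(values),
        case_indices=case_indices,
        features=list(features),
    )
```

Since case_indices is given, any condition or condition_session would be ignored, so the helper does not accept them.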

evaluate(features_to_code_map, *, aggregation_code=None)#

Evaluates custom code on feature values of all cases in the trainee.

Parameters:
  • features_to_code_map (dict[str, str]) –

    A dictionary with feature name keys and custom Amalgam code string values.

    The custom code can use "#feature_name 0" to reference the value of that feature for each case.

  • aggregation_code (str, optional) –

    A string of custom Amalgam code that can access the list of values derived from the custom code in features_to_code_map.

    The custom code can use "#feature_name 0" to reference the list of values derived from using the custom code in features_to_code_map.

Returns:

A dictionary with keys: ‘evaluated’ and ‘aggregated’.

‘evaluated’ is a dictionary with feature name keys and lists of values derived from the features_to_code_map custom code.

‘aggregated’ is None if no aggregation_code is given; otherwise it holds the output of the custom aggregation_code.

Return type:

dict
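A short sketch of reading the returned dictionary, based only on the two keys described above. The helper summarize_evaluation and the sample values are hypothetical.

```python
def summarize_evaluation(result):
    """Return per-feature value counts and the aggregated output.

    Hypothetical helper for the dict returned by evaluate().
    """
    evaluated = result["evaluated"]    # feature name -> list of values
    aggregated = result["aggregated"]  # None when no aggregation_code given
    counts = {feature: len(values) for feature, values in evaluated.items()}
    return counts, aggregated


# Hand-written sample shaped like the documented return value.
sample = {"evaluated": {"length": [20, 24, 30]}, "aggregated": None}
counts, aggregated = summarize_evaluation(sample)
```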

property features: Dict[str, Dict]#

The trainee feature attributes.

Warning

This returns a deep copy of the feature attributes. To update features attributes of the trainee, use the method set_feature_attributes().

Returns:

The feature attributes of the trainee.

Return type:

dict of {str: dict}

classmethod from_dict(trainee_dict)#

Create Trainee from dict.

Parameters:

trainee_dict (Dict) – The Trainee parameters.

Returns:

The trainee instance.

Return type:

Trainee

classmethod from_openapi(trainee, *, client=None)#

Create Trainee from base class.

Parameters:
  • trainee (BaseTrainee) – The base trainee instance.

  • client (AbstractDiveplaneClient, optional) – The Diveplane client instance to use.

Returns:

The trainee instance.

Return type:

Trainee

get_cases(*, case_indices=None, features=None, indicate_imputed=False, session=None, condition=None, num_cases=None, precision=None)#

Get the trainee’s cases.

Parameters:
  • case_indices (Iterable of Sequence[str, int], optional) –

    List of tuples, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified, returns only these cases and ignores the session parameter.

    Note

    If case_indices are provided, condition (and precision) are ignored.

  • features (list of str, optional) –

    A list of feature names to return values for in lieu of all default features.

    Built-in features that are available for retrieval:

    .session - The session id the case was trained under.
    .session_training_index - 0-based original index of the case, ordered by training during the session; is never changed.

  • indicate_imputed (bool, default False) – If True, an additional value will be appended to the cases indicating if the case was imputed.

  • session (str or BaseSession, optional) –

    The id or instance of the session to retrieve training indices for from the model.

    Note

    If a session is not provided, the order of the cases is not guaranteed to be the same as the order they were trained into the model.

  • condition (dict, optional) –

    The condition map to select the cases to retrieve that meet all the provided conditions.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

    Note

    This option will be ignored if case_indices is supplied.

    Tip

    Example 1 - Retrieve all values belonging to feature_name:

    condition = {"feature_name": None}
    

    Example 2 - Retrieve cases that have the value 10:

    condition = {"feature_name": 10}
    

    Example 3 - Retrieve cases that have a value in range [10, 20]:

    condition = {"feature_name": [10, 20]}
    

    Example 4 - Retrieve cases that match one of [‘a’, ‘c’, ‘e’]:

    condition = {"feature_name": ['a', 'c', 'e']}
    

    Example 5 - Retrieve cases using session name and index:

    condition = {'.session': 'your_session_name',
                 '.session_training_index': 1}
    

  • num_cases (int, default None) – The maximum number of cases to retrieve. If not specified, the limit will be k cases if precision is “similar”, or no limit if precision is “exact”.

  • precision (str, default None) – The precision to use when retrieving the cases via condition. Options are ‘exact’ or ‘similar’. If not specified, “exact” will be used.

Returns:

The trainee’s cases.

Return type:

pandas.DataFrame
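A sketch of a range query that also returns the built-in session features, so each row can be traced back to its session and training index. The helper get_cases_in_range is hypothetical; trainee is assumed to expose the get_cases() method documented above.

```python
def get_cases_in_range(trainee, feature, low, high, *, num_cases=None):
    """Retrieve cases whose `feature` lies in the inclusive range.

    Hypothetical wrapper around the documented get_cases() method.
    """
    return trainee.get_cases(
        # Built-in features identify where each case was trained.
        features=[feature, ".session", ".session_training_index"],
        # Two numeric values select an inclusive range.
        condition={feature: [low, high]},
        precision="exact",
        num_cases=num_cases,
    )
```

The call leaves case_indices unset on purpose, because supplying it would cause condition and precision to be ignored.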

get_distances(features=None, *, action_feature=None, case_indices=None, feature_values=None, use_case_weights=False, weight_feature=None)#

Computes distances matrix for specified cases.

Returns a dict with computed distances between all cases specified in case_indices or from all cases in local model as defined by feature_values.

Parameters:
  • features (list of str, optional) – List of feature names to use when computing distances. If unspecified uses all features.

  • action_feature (str, optional) – The action feature. If specified, uses targeted hyperparameters used to predict this action_feature, otherwise uses targetless hyperparameters.

  • case_indices (Iterable of Sequence[str, int], optional) – List of tuples, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified, returns distances for all of these cases. Ignored if feature_values is provided. If neither feature_values nor case_indices is specified, uses full dataset.

  • feature_values (list of object or pandas.DataFrame, optional) – If specified, returns distances of the local model relative to these values, ignores case_indices parameter. If provided a DataFrame, only the first row will be used.

  • use_case_weights (bool, default False) – If set to True, will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Returns:

A dict containing a matrix of computed distances and the list of corresponding case indices in the following format:

{
    'case_indices': [ session-indices ],
    'distances': DataFrame( distances )
}

Return type:

dict

get_extreme_cases(*, features=None, num, sort_feature)#

Get the trainee’s extreme cases.

Parameters:
  • features (list of str, optional) – The features to include in the case data.

  • num (int) – The number of cases to get.

  • sort_feature (str) – The name of the feature by which extreme cases are sorted.

Returns:

The trainee’s extreme cases.

Return type:

pandas.DataFrame

get_feature_contributions(action_feature, *, robust=None, weight_feature=None)#

Get cached feature contributions.

All keyword arguments are optional. When they are not specified, cached contributions are auto-selected for output. When they are specified, the method attempts to output the cached contributions that best match the requested parameters, or None if no cached match is found.

Deprecated since version 1.0.0: Use Trainee.get_prediction_stats() instead.

Parameters:
  • action_feature (str) – Will attempt to return contributions that were computed for this specified action_feature.

  • robust (bool, optional) – When specified, will attempt to return contributions that were computed with the specified robust or non-robust type.

  • weight_feature (str, optional) – When specified, will attempt to return contributions that were computed using this weight_feature.

Returns:

The contribution values for context features.

Return type:

pandas.DataFrame

get_feature_conviction(*, action_features=None, familiarity_conviction_addition=True, familiarity_conviction_removal=False, features=None, use_case_weights=False, weight_feature=None)#

Get familiarity conviction for features in the model.

Parameters:
  • action_features (list of str, optional) – The feature names to be treated as action features during conviction calculation in order to determine the conviction of each feature against the set of action_features. If not specified, conviction is computed for each feature against the rest of the features as a whole.

  • familiarity_conviction_addition (bool, default True) – (Optional) Calculate and output familiarity conviction of adding the specified cases.

  • familiarity_conviction_removal (bool, default False) – (Optional) Calculate and output familiarity conviction of removing the specified cases.

  • features (list of str, optional) – The feature names to calculate convictions for. At least 2 features are required to get familiarity conviction. If not specified all features will be used.

  • use_case_weights (bool, default False) – When True, will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Returns:

A DataFrame containing the familiarity conviction rows to feature columns.

Return type:

pandas.DataFrame

get_feature_mda(action_feature, *, permutation=None, robust=None, weight_feature=None)#

Get cached feature Mean Decrease In Accuracy (MDA).

All keyword arguments are optional. When they are not specified, cached MDA is auto-selected for output. When they are specified, the method attempts to output the cached MDA that best matches the requested parameters, or None if no cached match is found.

Deprecated since version 1.0.0: Use Trainee.get_prediction_stats() instead.

Parameters:
  • action_feature (str) – Will attempt to return MDA that was computed for this specified action_feature.

  • permutation (bool, optional) – When False, will attempt to return MDA that was computed by dropping each feature. When True will attempt to return MDA that was computed with permutations by scrambling each feature.

  • robust (bool, optional) – When specified, will attempt to return MDA that was computed with the specified robust or non-robust type.

  • weight_feature (str, optional) – When specified, will attempt to return MDA that was computed using this weight_feature.

Returns:

The mean decrease in accuracy values for context features.

Return type:

pandas.DataFrame
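Since this method is deprecated in favor of get_prediction_stats(), a migration could look like the sketch below. The wrapper get_mda_stats is hypothetical. It relies on the mda and mda_permutation entries of the stats parameter documented for get_prediction_stats(); whether the returned layout matches the older method exactly is not stated here and should be checked.

```python
def get_mda_stats(trainee, action_feature, *, permutation=False,
                  robust=None, weight_feature=None):
    """Request MDA through get_prediction_stats().

    Hypothetical replacement for calls to get_feature_mda().
    """
    # Permutation MDA scrambles feature values instead of dropping them.
    stat = "mda_permutation" if permutation else "mda"
    return trainee.get_prediction_stats(
        action_feature=action_feature,
        robust=robust,
        stats=[stat],
        weight_feature=weight_feature,
    )
```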

get_feature_residuals(*, action_feature=None, robust=None, robust_hyperparameters=None, weight_feature=None)#

Get cached feature residuals.

All keyword arguments are optional. When they are not specified, cached residuals are auto-selected for output. When they are specified, the method attempts to output the cached residuals that best match the requested parameters, or None if no cached match is found.

Deprecated since version 1.0.0: Use Trainee.get_prediction_stats() instead.

Parameters:
  • action_feature (str, optional) – When specified, will attempt to return residuals that were computed for this specified action_feature. Note: “.targetless” is the action feature used during targetless analysis.

  • robust (bool, optional) – When specified, will attempt to return residuals that were computed with the specified robust or non-robust type.

  • robust_hyperparameters (bool, optional) – When specified, will attempt to return residuals that were computed using hyperparameters with the specified robust or non-robust type.

  • weight_feature (str, optional) – When specified, will attempt to return residuals that were computed using this weight_feature.

Returns:

The feature residuals.

Return type:

pandas.DataFrame

get_marginal_stats(*, weight_feature=None)#

Get marginal stats for all features.

Parameters:

weight_feature (str, optional) – When specified, will attempt to return stats that were computed using this weight_feature.

Returns:

A DataFrame of feature name columns to stat value rows. Indexed by the stat type.

Return type:

pandas.DataFrame

get_num_training_cases()#

Return the number of trained cases for the trainee.

Returns:

The number of trained cases.

Return type:

int

get_pairwise_distances(features=None, *, action_feature=None, from_case_indices=None, from_values=None, to_case_indices=None, to_values=None, use_case_weights=False, weight_feature=None)#

Computes pairwise distances between specified cases.

Returns a list of computed distances between each respective pair of cases specified in either from_values or from_case_indices to to_values or to_case_indices. If only one case is specified in any of the lists, all respective distances are computed to/from that one case.

Note

  • One of from_values or from_case_indices must be specified, not both.

  • One of to_values or to_case_indices must be specified, not both.

Parameters:
  • features (list of str, optional) – List of feature names to use when computing pairwise distances. If unspecified uses all features.

  • action_feature (str, optional) – The action feature. If specified, uses targeted hyperparameters used to predict this action_feature, otherwise uses targetless hyperparameters.

  • from_case_indices (Iterable of Sequence[str, int], optional) – An Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified must be either length of 1 or match length of to_values or to_case_indices.

  • from_values (list of list of object or pandas.DataFrame, optional) – A 2d-list of case values. If specified must be either length of 1 or match length of to_values or to_case_indices.

  • to_case_indices (Iterable of Sequence[str, int], optional) – An Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If specified must be either length of 1 or match length of from_values or from_case_indices.

  • to_values (list of list of object or pandas.DataFrame, optional) – A 2d-list of case values. If specified must be either length of 1 or match length of from_values or from_case_indices.

  • use_case_weights (bool, default False) – If set to True, will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Returns:

A list of computed pairwise distances between each corresponding pair of cases in from_case_indices and to_case_indices.

Return type:

list
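The argument rules above (exactly one source on each side, and lengths that are either equal or 1) can be checked before making the call. The validator below is a hypothetical sketch, not library code. It assumes each argument supports len(), and the returned count is an inference from the broadcasting rule described above.

```python
def check_pairwise_args(*, from_values=None, from_case_indices=None,
                        to_values=None, to_case_indices=None):
    """Validate pairwise-distance inputs and return the expected count."""
    # Exactly one of the two "from" arguments must be given.
    if (from_values is None) == (from_case_indices is None):
        raise ValueError("specify one of from_values or from_case_indices")
    # Exactly one of the two "to" arguments must be given.
    if (to_values is None) == (to_case_indices is None):
        raise ValueError("specify one of to_values or to_case_indices")
    from_items = from_values if from_values is not None else from_case_indices
    to_items = to_values if to_values is not None else to_case_indices
    n_from, n_to = len(from_items), len(to_items)
    # Lengths must match unless one side holds a single case.
    if n_from != n_to and 1 not in (n_from, n_to):
        raise ValueError("lengths must match or one side must have length 1")
    return max(n_from, n_to)


expected = check_pairwise_args(
    from_case_indices=[("session-a", 0)],
    to_values=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
)
```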

get_params()#

Get the workflow attributes used by the trainee.

Returns:

A dict including the trainee’s hyperparameter_map, analyze_threshold, analyze_growth_factor and auto_analyze_limit_size.

Return type:

dict

get_prediction_stats(*, action_feature=None, condition=None, num_cases=None, num_robust_influence_samples_per_case=None, precision=None, robust=None, robust_hyperparameters=None, stats=None, weight_feature=None)#

Get feature prediction stats.

Gets cached stats when condition is None. If condition is not None, then uses the condition to select cases and computes prediction stats for that set of cases.

All keyword arguments are optional. When they are not specified, all cached stats are auto-selected for output. When they are specified, the method attempts to output the cached stats that best match the requested parameters, or None if no cached match is found.
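The two modes described above, cached stats without a condition and computed stats with one, can be wrapped as in the sketch below. The helper fetch_prediction_stats is hypothetical. It forwards precision and num_cases only together with a condition, following the parameter descriptions for this method, which state that both are used only when condition is not None.

```python
def fetch_prediction_stats(trainee, action_feature, *, stats=None,
                           condition=None, precision="exact",
                           num_cases=None):
    """Fetch cached stats, or compute them for a conditioned subset.

    Hypothetical wrapper around get_prediction_stats().
    """
    params = {"action_feature": action_feature, "stats": stats}
    if condition is not None:
        # These three only take effect when a condition selects cases.
        params["condition"] = condition
        params["precision"] = precision
        params["num_cases"] = num_cases
    return trainee.get_prediction_stats(**params)
```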

Parameters:
  • action_feature (str, optional) –

    When specified, will attempt to return stats that were computed for this specified action_feature. Note: “.targetless” is the action feature used during targetless analysis.

    Note

    If get_prediction_stats is being used with time series analysis, the action feature for which the prediction statistics information is desired must be specified.

  • condition (dict or None, optional) –

    A condition map to select which cases to compute prediction stats for.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

  • num_cases (int, default None) – The maximum number of cases to use to calculate prediction stats. If not specified, the limit will be k cases if precision is “similar”, or 1000 cases if precision is “exact”. Only used if condition is not None.

  • num_robust_influence_samples_per_case (int, optional) – Specifies the number of robust samples to use for each case for robust contribution computations. Defaults to 300 + 2 * (number of features).

  • precision (str, default None) – The precision to use when selecting cases with the condition. Options are ‘exact’ or ‘similar’. If not specified “exact” will be used. Only used if condition is not None.

  • robust (bool, optional) – When specified, will attempt to return stats that were computed with the specified robust or non-robust type.

  • robust_hyperparameters (bool, optional) – When specified, will attempt to return stats that were computed using hyperparameters with the specified robust or non-robust type.

  • stats (list of str, optional) –

    List of stats to output. When unspecified, returns all. Allowed values:

    • accuracy : The number of correct predictions divided by the total number of predictions.

    • confusion_matrix : A map of actual feature value to a map of predicted feature value to counts.

    • contribution : Feature contributions to predicted value when each feature is dropped from the model, applies to all features.

    • mae : Mean absolute error. For continuous features, this is calculated as the mean of absolute values of the difference between the actual and predicted values. For nominal features, this is 1 - the average categorical action probability of each case’s correct classes. Categorical action probabilities are the probabilities for each class for the action feature.

    • mda : Mean decrease in accuracy when each feature is dropped from the model, applies to all features.

    • mda_permutation : Mean decrease in accuracy that used scrambling of feature values instead of dropping each feature, applies to all features.

    • precision : Precision (positive predictive) value for nominal features only.

    • r2 : The r-squared coefficient of determination, for continuous features only.

    • recall : Recall (sensitivity) value for nominal features only.

    • rmse : Root mean squared error, for continuous features only.

    • spearman_coeff : Spearman’s rank correlation coefficient, for continuous features only.

  • weight_feature (str, optional) – When specified, will attempt to return stats that were computed using this weight_feature.

Returns:

A DataFrame of feature name columns to stat value rows. Indexed by the stat type.

Return type:

DataFrame
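The condition value types listed for the condition parameter can be illustrated with a small standalone matcher. This is only a sketch of the documented semantics, not the Reactor's implementation: `matches_condition` and the sample feature names are hypothetical, and treating None as "the feature value is missing" is an assumption.

```python
def _is_numeric(value):
    """True for ints and floats, but not bools."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def matches_condition(case, condition):
    """Check a case (feature name -> value) against a condition map."""
    for feature, rule in condition.items():
        value = case.get(feature)
        if rule is None:
            # Assumption: None selects cases where the feature is missing.
            if value is not None:
                return False
        elif isinstance(rule, (list, tuple)):
            if len(rule) == 2 and all(_is_numeric(r) for r in rule):
                # Two numbers: an inclusive range.
                if not _is_numeric(value) or not rule[0] <= value <= rule[1]:
                    return False
            elif value not in rule:
                # List of strings: must match one listed value exactly.
                return False
        elif value != rule:
            # Single value: must match exactly.
            return False
    return True


# Hypothetical features and cases, for illustration only.
condition = {"age": [30, 40], "color": ["red", "blue"], "smoker": "no"}
cases = [
    {"age": 30, "color": "red", "smoker": "no"},
    {"age": 41, "color": "red", "smoker": "no"},
    {"age": 35, "color": "green", "smoker": "no"},
]
selected = [c for c in cases if matches_condition(c, condition)]
```

The same `condition` dictionary could then be passed as `trainee.get_prediction_stats(condition=condition, precision="exact")`, where `trainee` stands for an existing Trainee instance.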

get_session_indices(session)#

Get all session indices for a specified session.

Parameters:

session (str or BaseSession) – The id or instance of the session to retrieve indices for from the model.

Returns:

An index of the session indices for the requested session.

Return type:

pandas.Index

get_session_training_indices(session)#

Get all session training indices for a specified session.

Parameters:

session (str or BaseSession) – The id or instance of the session to retrieve training indices for from the model.

Returns:

An index of the session training indices for the requested session.

Return type:

pandas.Index

get_sessions()#

Get all session ids of the trainee.

Returns:

A list of dicts with keys “id” and “name” for each session in the model.

Return type:

list of dict
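Because each entry carries both “id” and “name”, the result can be turned into a lookup by name before calling methods that take a session id. A minimal sketch: the `sessions` list below is made-up sample data in the documented shape, and `ids_by_name` is a hypothetical helper.

```python
from collections import defaultdict


def ids_by_name(sessions):
    """Group session ids by session name (names are not assumed unique)."""
    lookup = defaultdict(list)
    for session in sessions:
        lookup[session["name"]].append(session["id"])
    return dict(lookup)


# Sample data in the documented shape; real values come from get_sessions().
sessions = [
    {"id": "a1", "name": "training"},
    {"id": "b2", "name": "validation"},
    {"id": "c3", "name": "training"},
]
lookup = ids_by_name(sessions)
```

Each id in `lookup["training"]` could then be passed to get_session_indices() on the same trainee.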

get_substitute_feature_values(*, clear_on_get=True)#

Get a substitution map for use in extended nominal generation.

Parameters:

clear_on_get (bool, default True) – Clears the substitution values map in the trainee upon retrieving them. This is done if it is desired to prevent the substitution map from being persisted. If set to False, the model will not be cleared which preserves substitution mappings if the model is saved; representing a potential privacy leak should the substitution map be made public.

Returns:

A dictionary of feature name to a dictionary of feature value to substitute feature value.

Return type:

dict of {str: dict}
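The returned structure is a two-level dictionary. The sketch below shows how such a map could be applied to a single case. The map contents and the `substitute_case` helper are invented for illustration and do not reflect how the trainee applies substitutions internally.

```python
def substitute_case(case, substitution_map):
    """Replace values that have a substitute; leave all others unchanged."""
    result = {}
    for feature, value in case.items():
        feature_map = substitution_map.get(feature, {})
        result[feature] = feature_map.get(value, value)
    return result


# feature name -> {feature value -> substitute feature value}
substitution_map = {"city": {"Raleigh": "city_a", "Durham": "city_b"}}
case = {"city": "Durham", "age": 42}
substituted = substitute_case(case, substitution_map)
```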

property id: str#

The unique identifier of the trainee.

Returns:

The trainee’s ID.

Return type:

str

impute(*, batch_size=1, features=None, features_to_impute=None)#

Impute (fill) the missing values for the specified features_to_impute.

If no ‘features’ are specified, will use all features in the trainee for imputation. If no ‘features_to_impute’ are specified, will impute all features specified by ‘features’.

Parameters:
  • batch_size (int, default 1) –

    Indicates how many rows to fill before recomputing conviction.

    The default value of 1 should give the best accuracy but may be slower. Larger values should increase speed but may decrease the accuracy of results.

  • features (list of str, optional) – A list of feature names to use for imputation. If not specified, all features will be used.

  • features_to_impute (list of str, optional) – A list of feature names to impute. If not specified, features will be used.

Return type:

None
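The defaulting rules for features and features_to_impute can be written out explicitly. This is a sketch only: `resolve_impute_features` and the feature names are hypothetical, and the real resolution happens inside the client.

```python
def resolve_impute_features(all_features, features=None, features_to_impute=None):
    """Apply the documented defaults for `features` and `features_to_impute`."""
    if features is None:
        # No features given: use every feature in the trainee.
        features = list(all_features)
    if features_to_impute is None:
        # Nothing singled out: impute everything listed in `features`.
        features_to_impute = list(features)
    return features, features_to_impute


# Hypothetical feature names, for illustration only.
all_features = ["age", "height", "weight"]
defaults = resolve_impute_features(all_features)
targeted = resolve_impute_features(all_features, features_to_impute=["weight"])
```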

information()#

Get detailed information about the trainee.

Returns:

The trainee’s detailed information, including the trainee version and configuration parameters.

Return type:

TraineeInformation

property metadata: Dict[str, Any] | None#

The trainee metadata.

Warning

This returns a deep copy of the metadata. To update the metadata of the trainee, use the method set_metadata().

Returns:

The metadata of the trainee.

Return type:

dict

metrics()#

Get metric information of the trainee.

Returns:

The trainee metric information, including CPU and memory usage.

Return type:

Metrics

property name: str | None#

The name of the trainee.

Returns:

The name.

Return type:

str or None

optimize(*args, **kwargs)#

Optimizes a trainee.

Deprecated since version 6.0.0: Use Trainee.analyze() instead.

Parameters:
  • context_features (iterable of str, optional) – The context features to optimize for.

  • action_features (iterable of str, optional) – The action features to optimize for.

  • k_folds (int) – optional, (default 6) number of cross validation folds to do

  • bypass_hyperparameter_optimization (bool) – optional, bypasses hyperparameter optimization

  • bypass_calculate_feature_residuals (bool) – optional, bypasses feature residual calculation

  • bypass_calculate_feature_weights (bool) – optional, bypasses calculation of feature weights

  • use_deviations (bool) – optional, uses deviations for LK metric in queries

  • num_samples (int) – used in calculating feature residuals

  • k_values (list of int) – optional list used in hyperparameter search

  • p_values (list of float) – optional list used in hyperparameter search

  • dwe_values (list of float) – optional list used in hyperparameter search

  • optimize_level (int) –

    optional value, if specified, will optimize for the following flows:

    1. predictions/accuracy (hyperparameters)

    2. data synth (cache: global residuals)

    3. standard explanations

    4. full analysis

  • targeted_model ({"omni_targeted", "single_targeted", "targetless"}) –

    optional, valid values as follows:

    “single_targeted” = optimize hyperparameters for the specified action_features

    “omni_targeted” = optimize hyperparameters for each context feature as an action feature, ignores action_features parameter

    “targetless” = optimize hyperparameters for all context features as possible action features, ignores action_features parameter

  • num_optimization_samples (int, optional) – If the dataset is too large, optimize on a (randomly sampled) subset of the data. The num_optimization_samples specifies the number of observations to be considered for optimization.

  • optimization_sub_model_size (int or None, optional) – Number of samples to use for optimization. The rest will be randomly held-out and not included in calculations.

  • inverse_residuals_as_weights (bool, default is False) – When True will compute and use inverse of residuals as feature weights

  • use_case_weights (bool, default False) – When True will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

  • kwargs – Additional experimental optimize parameters.

persist()#

Persist the trainee.

Return type:

None

property persistence: str#

The persistence state of the trainee.

Returns:

The trainee’s persistence value.

Return type:

str

predict(contexts=None, *, action_features=None, allow_nulls=False, case_indices=None, context_features=None, derived_action_features=None, derived_context_features=None, leave_case_out=None, suppress_warning=False, use_case_weights=False, weight_feature=None)#

Wrapper around react().

Performs a discriminative react to predict the action feature values based on the given contexts. Returns only the predicted action values.

See also

react()

Parameters:
  • contexts (list of list of object or pandas.DataFrame, optional) – (Optional) The context values to react to. If context values are not specified, then case_indices must be specified.

  • action_features (list of str, optional) – (Optional) Feature names to treat as action features during react. If no action_features is specified, the Trainee default_action_features is used.

  • allow_nulls (bool, default False, optional) – See parameter allow_nulls in react().

  • case_indices (Iterable of Sequence[str, int], default None, optional) – (Optional) Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If this case does not exist, discriminative react outputs null.

  • context_features (list of str, optional) – (Optional) Feature names to treat as context features during react. If no context_features is specified, then the Trainee’s default_context_features are used. If the Trainee has no default_context_features, then context_features will be all of the features excluding the action_features.

  • derived_action_features (list of str, optional) – See parameter derived_action_features in react().

  • derived_context_features (list of str, optional) – See parameter derived_context_features in react().

  • leave_case_out (bool, default False) – See parameter leave_case_out in react().

  • suppress_warning (bool, default False) – See parameter suppress_warning in react().

  • use_case_weights (bool, default False) – See parameter use_case_weights in react().

  • weight_feature (str, optional) – See parameter weight_feature in react().

Returns:

DataFrame consisting of the discriminative predicted results.

Return type:

pandas.DataFrame
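When predicting from previously trained cases rather than explicit contexts, case_indices pairs a session id with the 0-based index of the case within that session. A minimal sketch of building that structure follows. The session id is a placeholder, `build_case_indices` is a hypothetical helper, and rejecting negative indices is an assumption based on the indices being 0-based.

```python
def build_case_indices(session_id, indices):
    """Pair one session id with each 0-based training index."""
    pairs = []
    for index in indices:
        if index < 0:
            # Assumption: negative indices are never valid.
            raise ValueError("case indices are 0-based and cannot be negative")
        pairs.append((session_id, index))
    return pairs


# "session-id-placeholder" stands in for a real session id.
case_indices = build_case_indices("session-id-placeholder", range(3))
```

The result could be passed as `trainee.predict(case_indices=case_indices, action_features=["target"])`, where `trainee` is an existing Trainee and "target" is a hypothetical feature name.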

property project: Project | None#

The trainee’s project.

Returns:

The trainee’s project.

Return type:

Project or None

property project_id: str | None#

The unique identifier of the trainee’s project.

Returns:

The trainee’s project ID.

Return type:

str or None

react(contexts=None, *, action_features=None, actions=None, allow_nulls=False, case_indices=None, context_features=None, derived_action_features=None, derived_context_features=None, desired_conviction=None, details=None, feature_bounds_map=None, generate_new_cases='no', input_is_substituted=False, into_series_store=None, leave_case_out=None, new_case_threshold='min', num_cases_to_generate=1, ordered_by_specified_features=False, preserve_feature_values=None, progress_callback=None, substitute_output=True, suppress_warning=False, use_case_weights=False, use_regional_model_residuals=True, weight_feature=None)#

React to the trainee.

If desired_conviction is specified, executes a generative react, producing action_values for the specified action_features conditioned on the optionally provided contexts.

If desired_conviction is not specified, executes a discriminative react. Provided a list of contexts, the trainee reacts to the model and produces predictions for the specified actions.
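The choice between the two modes depends only on whether desired_conviction is given. The helper below is a hypothetical illustration of that rule, together with the valid range documented for desired_conviction. It is not part of the Reactor API.

```python
import math


def react_mode(desired_conviction=None):
    """Name the kind of react implied by `desired_conviction`."""
    if desired_conviction is None:
        return "discriminative"
    # Documented valid range is (0, infinity]: zero and negatives are excluded.
    if math.isnan(desired_conviction) or desired_conviction <= 0:
        raise ValueError("desired_conviction must be in the range (0, infinity]")
    return "generative"
```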

Parameters:
  • contexts (list of list of object or pandas.DataFrame, optional) – The context values to react to.

  • action_features (list of str, optional) – Feature names to treat as action features during react.

  • actions (list of list of object or pandas.DataFrame, optional) – One or more action values to use for action features. If specified, will only return the specified explanation details for the given actions. (Discriminative reacts only)

  • allow_nulls (bool, default False) – (Optional) When true will allow return of null values if there are nulls in the local model for the action features, applicable only to discriminative reacts.

  • case_indices (Iterable of Sequence[str, int], defaults to None) – (Optional) Iterable of Sequences, of session id and index, where index is the original 0-based index of the case as it was trained into the session. If this case does not exist, discriminative react outputs null, generative react ignores it.

  • context_features (list of str, optional) – Feature names to treat as context features during react.

  • derived_action_features (list of str, optional) –

    Features whose values should be computed after reaction from the resulting case prior to output, in the specified order. Must be a subset of action_features.

    Note

    Both of these derived feature lists rely on the features’ “derived_feature_code” attribute to compute the values. If the “derived_feature_code” attribute is undefined or references non-0 feature indices, the derived value will be null.

  • derived_context_features (list of str, optional) – Features whose values should be computed from the provided context in the specified order.

  • desired_conviction (float, optional) – If specified will execute a generative react. If not specified will execute a discriminative react. Conviction is the ratio of expected surprisal to generated surprisal for each feature generated, valid values are in the range of (0,infinity].

  • details (dict of {str: object}) –

    If details are specified, the response will contain the requested explanation data along with the reaction. Below are the valid keys and data types for the different audit details. Omitted keys, values set to None, or False values for Booleans will not be included in the audit data returned.

    • boundary_cases : bool, optional

      If True, outputs an automatically determined (when ‘num_boundary_cases’ is not specified) relevant number of boundary cases. Uses both context and action features of the reacted case to determine the counterfactual boundary based on action features, which maximize the dissimilarity of action features while maximizing the similarity of context features. If action features aren’t specified, uses familiarity conviction to determine the boundary instead.

    • boundary_cases_familiarity_convictions : bool, optional

      If True, outputs familiarity conviction of addition for each of the boundary cases.

    • case_contributions : bool, optional

      If True, outputs each influential case’s differences between the predicted action feature value and the predicted action feature value if each individual case were not included. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.

    • case_feature_residuals : bool, optional

      If True, outputs feature residuals for all (context and action) features for just the specified case. Uses leave-one-out for each feature, while using the others to predict the left out feature with their corresponding values from this case. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.

    • case_mda : bool, optional

      If True, outputs each influential case’s mean decrease in accuracy of predicting the action feature in the local model area, as if each individual case were included versus not included. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.

    • categorical_action_probabilities : bool, optional

      If True, outputs probabilities for each class for the action. Applicable only to categorical action features.

    • distance_contribution : bool, optional

      If True, outputs the distance contribution (expected total surprisal contribution) for the reacted case. Uses both context and action feature values.

    • distance_ratio : bool, optional

      If True, outputs the ratio of distance (relative surprisal) between this reacted case and its nearest case to the minimum distance (relative surprisal) in between the closest two cases in the local area. All distances are computed using only the specified context features.

    • feature_contributions : bool, optional

      If True, outputs each context feature’s differences between the predicted action feature value and the predicted action feature value if each context were not in the model for all context features in the local model area. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.

    • feature_mda : bool, optional

      If True, outputs each context feature’s mean decrease in accuracy of predicting the action feature given the context. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.

    • feature_mda_ex_post : bool, optional

      If True, outputs each context feature’s mean decrease in accuracy of predicting the action feature as an explanation given that the specified prediction was already made as specified by the action value. Uses both context and action features of the reacted case to determine that area. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.

    • feature_residuals : bool, optional

      If True, outputs feature residuals for all (context and action) features locally around the prediction. Uses only the context features of the reacted case to determine that area. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.

    • global_case_feature_residual_convictions : bool, optional

      If True, outputs this case’s feature residual convictions for the global model. Computed as: global model feature residual divided by case feature residual. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.

    • hypothetical_values : dict, optional

      A dictionary of feature name to feature value. If specified, shows how a prediction could change in a what-if scenario where the influential cases’ context feature values are replaced with the specified values. Iterates over all influential cases, predicting the action features for each one using the updated hypothetical values. Outputs the predicted arithmetic over the influential cases for each action feature.

    • influential_cases : bool, optional

      If True, outputs the most influential cases and their influence weights based on the surprisal of each case relative to the context being predicted among the cases. Uses only the context features of the reacted case.

    • influential_cases_familiarity_convictions : bool, optional

      If True, outputs familiarity conviction of addition for each of the influential cases.

    • influential_cases_raw_weights : bool, optional

      If True, outputs the surprisal for each of the influential cases.

    • local_case_feature_residual_convictions : bool, optional

      If True, outputs this case’s feature residual convictions for the region around the prediction. Uses only the context features of the reacted case to determine that region. Computed as: region feature residual divided by case feature residual. Relies on ‘robust_computation’ parameter to determine whether to do standard or robust computation.

    • most_similar_cases : bool, optional

      If True, outputs an automatically determined (when ‘num_most_similar_cases’ is not specified) relevant number of similar cases, which will first include the influential cases. Uses only the context features of the reacted case.

    • num_boundary_cases : int, optional

      Outputs this manually specified number of boundary cases.

    • num_most_similar_cases : int, optional

      Outputs this manually specified number of most similar cases, which will first include the influential cases.

    • num_most_similar_case_indices : int, optional

      Outputs this specified number of most similar case indices when ‘distance_ratio’ is also set to True.

    • num_robust_influence_samples_per_case : int, optional

      Specifies the number of robust samples to use for each case. Applicable only for computing robust feature contributions or robust case feature contributions. Defaults to 2000. Higher values will take longer but provide more stable results.

    • observational_errors : bool, optional

      If True, outputs observational errors for all features as defined in feature attributes.

    • outlying_feature_values : bool, optional

      If True, outputs the reacted case’s context feature values that are outside the min or max of the corresponding feature values of all the cases in the local model area. Uses only the context features of the reacted case to determine that area.

    • prediction_residual_conviction : bool, optional

      If True, outputs residual conviction for the reacted case’s action features by computing the prediction residual for the action features in the local model area. Uses both context and action features to determine that area. This is defined as the expected (global) model residual divided by computed local residual.

    • prediction_similarity_conviction : bool, optional

      If True, outputs similarity conviction for the reacted case. Uses both context and action feature values as the case values for all computations. This is defined as expected (global) distance contribution divided by reacted case distance contribution.

    • robust_computation : bool, optional

      Default is False, uses leave-one-out for features (or cases, as needed) for all relevant computations. If True, uses uniform sampling from the power set of all combinations of features (or cases, as needed) instead.

  • feature_bounds_map (dict of {str: dict of {str: object}}, optional) –

    A mapping of feature names to the bounds for the feature values to be generated in. For continuous features this should be a numeric value, for datetimes this should be a datetime string or a numeric epoch value. Min bounds should be equal to or smaller than max bounds, except when setting the bounds around the cycle length of a cyclic feature. (e.g., to allow 0 +/- 60 degrees, set min=300 and max=60).

    Example:

    {
        "feature_a": {"min": 0},
        "feature_b" : {"min": 1, "max": 5},
        "feature_c": {"max": 1}
    }
    

  • generate_new_cases (str, default "no") –

    This parameter takes in a string that may be one of the following:

    • attempt: Geminai attempts to generate new cases; if it is not possible to generate a new case, it might generate cases in “no” mode (see the “no” option below).

    • always: Geminai always generates new cases; if it is not possible to generate a new case, it returns None.

    • no: Geminai generates data based on the desired_conviction specified and the generated data is not guaranteed to be a new case (that is, a case not found in original dataset.)

  • input_is_substituted (bool, default False) – When True, assumes provided categorical (nominal or ordinal) feature values have already been substituted.

  • into_series_store (str, optional) – The name of a series store. If specified, will store an internal record of all react contexts for this session and series to be used later with train series.

  • leave_case_out (bool, default False) – When True and specified along with case_indices, each individual react will respectively ignore the corresponding case specified by case_indices by leaving it out.

  • new_case_threshold (str, default None) –

    (Optional) Distance to determine the privacy cutoff. If None, will default to “min”.

    Possible values:

    • min: minimum distance in the original local space.

    • max: maximum distance in the original local space.

    • most_similar: distance between the nearest neighbor to the nearest neighbor in the original space.

  • num_cases_to_generate (int, default 1) – The number of cases to generate.

  • ordered_by_specified_features (bool, default False) – When True, the order of generated feature values will match the order of specified features.

  • preserve_feature_values (list of str, optional) – Features that will preserve their values from the case specified by case_indices, appending and overwriting the specified contexts as necessary. For generative reacts, if case_indices isn’t specified will preserve feature values of a random case.

  • progress_callback (callable or None, optional) – (Optional) A callback method that will be called before each batched call to react and at the end of reacting. The method is given a ProgressTimer containing metrics on the progress and timing of the react operation, and the batch result.

  • substitute_output (bool, default True) – When False, will not substitute categorical feature values. Only applicable if a substitution value map has been set.

  • suppress_warning (bool, default False) – When True, warnings will not be displayed.

  • use_case_weights (bool, default False) – When True, will scale influence weights by each case’s weight_feature weight.

  • use_regional_model_residuals (bool, default True) – When False, uses model feature residuals. When True, recalculates regional model residuals.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Returns:

The action values and explanations.

Return type:

dict of {action: pandas.DataFrame, explanation: dict}
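Two of the structured arguments to react() can be sketched in plain Python: pruning a details dictionary the way the text says omitted, None, and False entries are treated, and checking a value against one feature_bounds_map entry, including the wrap-around case for cyclic features. Both helpers are hypothetical illustrations rather than the Reactor's implementation. Treating bounds as inclusive is an assumption, and the heading feature is an invented example.

```python
def requested_details(details):
    """Keep only entries that would produce output (not None, not False)."""
    return {k: v for k, v in details.items() if v is not None and v is not False}


def within_bounds(value, bounds):
    """Check a value against one feature's {"min": ..., "max": ...} bounds.

    Assumption: both bounds are inclusive.
    """
    low = bounds.get("min")
    high = bounds.get("max")
    if low is not None and high is not None and low > high:
        # Cyclic wrap-around, e.g. min=300 and max=60 allows 0 +/- 60 degrees.
        return value >= low or value <= high
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


details = {
    "influential_cases": True,
    "feature_residuals": False,
    "num_boundary_cases": 3,
    "hypothetical_values": None,
}
active = requested_details(details)

# Hypothetical cyclic feature measured in degrees.
heading_bounds = {"min": 300, "max": 60}
```

Here `active` keeps only `influential_cases` and `num_boundary_cases`, which mirrors the documented rule that None and False entries are left out of the audit data.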

react_group(*, distance_contributions=False, familiarity_conviction_addition=True, familiarity_conviction_removal=False, features=None, new_cases=None, kl_divergence_addition=False, kl_divergence_removal=False, p_value_of_addition=False, p_value_of_removal=False, trainees_to_compare=None, use_case_weights=False, weight_feature=None)#

Computes specified data for a set of cases.

Return the list of familiarity convictions (and optionally, distance contributions or p values) for each set.

Parameters:
  • distance_contributions (bool, default False) – (Optional) Calculate and output distance contribution ratios in the output dict for each case.

  • familiarity_conviction_addition (bool, default True) – (Optional) Calculate and output familiarity conviction of adding the specified cases.

  • familiarity_conviction_removal (bool, default False) – (Optional) Calculate and output familiarity conviction of removing the specified cases.

  • features (list of str or None, optional) – A list of feature names to consider while calculating convictions.

  • kl_divergence_addition (bool, default False) – (Optional) Calculate and output KL divergence of adding the specified cases.

  • kl_divergence_removal (bool, default False) – (Optional) Calculate and output KL divergence of removing the specified cases.

  • new_cases (list of list of list of object or list of pandas.DataFrame, optional) –

    Specify a set using a list of cases to compute the conviction of groups of cases as shown in the following example.

    Example:

    new_cases = [
        [[1, 2, 3], [4, 5, 6], [7, 8, 9]], # Group 1
        [[1, 2, 3]], # Group 2
    ]
    

  • p_value_of_addition (bool, default False) – (Optional) If true will output p value of addition.

  • p_value_of_removal (bool, default False) – (Optional) If true will output p value of removal.

  • trainees_to_compare (list of (str or Trainee), optional) – (Optional) If specified ignores the ‘new_cases’ parameter and uses cases from the specified trainee(s) instead. Values should be either the trainee object or its ID (trainee name is not supported).

  • use_case_weights (bool, default False) – When True, will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Returns:

The conviction of grouped cases.

Return type:

pandas.DataFrame
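Each group in new_cases is a list of cases, and each case is a list of feature values. A small sanity check before calling react_group() might look like the following. `check_groups` is a hypothetical helper, and requiring every case to have one value per entry in features is an assumption.

```python
def check_groups(new_cases, features):
    """Return the number of cases per group, validating case lengths."""
    sizes = []
    for position, group in enumerate(new_cases):
        for case in group:
            if len(case) != len(features):
                raise ValueError(
                    f"group {position}: expected {len(features)} values, "
                    f"got {len(case)}"
                )
        sizes.append(len(group))
    return sizes


# Hypothetical feature names; the groups match the example above.
features = ["f1", "f2", "f3"]
new_cases = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # Group 1
    [[1, 2, 3]],  # Group 2
]
group_sizes = check_groups(new_cases, features)
```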

react_into_features(*, distance_contribution=False, familiarity_conviction_addition=False, familiarity_conviction_removal=False, features=None, p_value_of_addition=False, p_value_of_removal=False, use_case_weights=False, weight_feature=None)#

Calculate conviction and other data and store them into features.

Parameters:
  • distance_contribution (bool or str, default False) – The name of the feature to store distance contribution. If set to True the values will be stored to the feature ‘distance_contribution’.

  • familiarity_conviction_addition (bool or str, default False) – (Optional) The name of the feature to store conviction of addition values. If set to True the values will be stored to the feature ‘familiarity_conviction_addition’.

  • familiarity_conviction_removal (bool or str, default False) – (Optional) The name of the feature to store conviction of removal values. If set to True the values will be stored to the feature ‘familiarity_conviction_removal’.

  • features (list of str, optional) – A list of features to calculate convictions.

  • p_value_of_addition (bool or str, default False) – (Optional) The name of the feature to store p value of addition values. If set to True the values will be stored to the feature ‘p_value_of_addition’.

  • p_value_of_removal (bool or str, default False) – (Optional) The name of the feature to store p value of removal values. If set to True the values will be stored to the feature ‘p_value_of_removal’.

  • use_case_weights (bool, default False) – When True, will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – Name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Return type:

None
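Several parameters of react_into_features() accept either a bool or a str. The resolution rule can be sketched as follows. `target_feature_name` is a hypothetical helper, not a Reactor function, and reading False as "nothing is stored" is an assumption.

```python
def target_feature_name(parameter, value):
    """Resolve a bool-or-str argument to the feature name that stores it."""
    if value is True:
        # True stores values in a feature named after the parameter.
        return parameter
    if isinstance(value, str) and value:
        # A string names the destination feature explicitly.
        return value
    # Assumption: False (the default) means nothing is stored.
    return None
```

For example, `target_feature_name("distance_contribution", True)` resolves to the feature name `"distance_contribution"`, while passing `"dc"` would resolve to `"dc"`.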

react_into_trainee(*, action_feature=None, context_features=None, contributions=None, contributions_robust=None, hyperparameter_param_path=None, mda=None, mda_permutation=None, mda_robust=None, mda_robust_permutation=None, num_robust_influence_samples=None, num_robust_residual_samples=None, num_robust_influence_samples_per_case=None, num_samples=None, residuals=None, residuals_robust=None, sample_model_fraction=None, sub_model_size=None, use_case_weights=False, weight_feature=None)#

Compute and cache specified feature interpretations.

Parameters:
  • action_feature (str, optional) – Name of target feature whose hyperparameters to use for computations. Default is whatever the model was analyzed for, or the mda_action_features for MDA, or “.targetless” if analyzed for targetless.

  • context_features (list of str, optional) – List of features names to use as contexts for computations. Default is all trained non-unique features if unspecified.

  • contributions (bool, optional) – For each context_feature, use the full set of all other context_features to compute the mean absolute delta between prediction of action_feature with and without the context_feature in the model. False removes cached values.

  • contributions_robust (bool, optional) – For each context_feature, use the robust (power set/permutation) set of all other context_features to compute the mean absolute delta between prediction of action_feature with and without the context_feature in the model. False removes cached values.

  • hyperparameter_param_path (list of str, optional) – Full path for hyperparameters to use for computation. If specified for any residual computations, takes precedence over action_feature parameter. Can be set to a ‘paramPath’ value from the results of ‘get_params()’ for a specific set of hyperparameters.

  • mda (bool, optional) – When True will compute Mean Decrease in Accuracy (MDA) for each context feature at predicting mda_action_features. Drop each feature and use the full set of remaining context features for each prediction. False removes cached values.

  • mda_permutation (bool, optional) – Compute MDA by scrambling each feature and using the full set of remaining context features for each prediction. False removes cached values.

  • mda_robust (bool, optional) – Compute MDA by dropping each feature and using the robust (power set/permutations) set of remaining context features for each prediction. False removes cached values.

  • mda_robust_permutation (bool, optional) – Compute MDA by scrambling each feature and using the robust (power set/permutations) set of remaining context features for each prediction. False removes cached values.

  • num_robust_influence_samples (int, optional) – Total sample size of model to use (using sampling with replacement) for robust contribution computation. Defaults to 300.

  • num_robust_residual_samples (int, optional) – Total sample size of model to use (using sampling with replacement) for robust mda and residual computation. Defaults to 1000 * (1 + log(number of features)). Note: robust mda will be updated to use num_robust_influence_samples in a future release.

  • num_robust_influence_samples_per_case (int, optional) – Specifies the number of robust samples to use for each case for robust contribution computations. Defaults to 300 + 2 * (number of features).

  • num_samples (int, optional) – Total sample size of model to use (using sampling with replacement) for all non-robust computation. Defaults to 1000. If specified, overrides sample_model_fraction.

  • residuals (bool, optional) – For each context_feature, use the full set of all other context_features to predict the feature. False removes cached values.

  • residuals_robust (bool, optional) – For each context_feature, use the robust (power set/permutations) set of all other context_features to predict the feature. False removes cached values.

  • sample_model_fraction (float, optional) – A value between 0.0 and 1.0, the fraction of the model to use in sampling (using sampling without replacement). Applicable only to non-robust computation. Ignored if num_samples is specified. Higher values provide better accuracy at the cost of compute time.

  • sub_model_size (int, optional) – Subset of model to use for calculations. Applicable only to models > 1000 cases.

  • use_case_weights (bool, default False) – If set to True will scale influence weights by each case’s weight_feature weight.

  • weight_feature (str, optional) – The name of feature whose values to use as case weights. When left unspecified uses the internally managed case weight.

Return type:

None
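
The precedence between num_samples and sample_model_fraction described above can be sketched in plain Python. This is only an illustration of the documented rule, not the library's implementation: the helper name is hypothetical, and rounding the fractional size to the nearest integer is an assumption.

```python
def resolve_non_robust_sample_size(model_size, num_samples=None,
                                   sample_model_fraction=None):
    """Hypothetical helper: pick the sample size for non-robust computations.

    Documented rules: num_samples overrides sample_model_fraction, and the
    default is 1000 when neither is given.
    """
    if num_samples is not None:
        # An explicit num_samples always wins.
        return num_samples
    if sample_model_fraction is not None:
        if not 0.0 <= sample_model_fraction <= 1.0:
            raise ValueError("sample_model_fraction must be between 0.0 and 1.0")
        # Assumption: the fractional size is rounded to the nearest integer.
        return round(model_size * sample_model_fraction)
    return 1000


# The fraction is ignored because num_samples is specified.
size = resolve_non_robust_sample_size(5000, num_samples=200,
                                      sample_model_fraction=0.5)
```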

react_series(contexts=None, *, action_features=None, actions=None, case_indices=None, context_features=None, derived_action_features=None, derived_context_features=None, desired_conviction=None, details=None, feature_bounds_map=None, final_time_steps=None, generate_new_cases='no', series_index='.series', init_time_steps=None, initial_features=None, initial_values=None, input_is_substituted=False, leave_case_out=None, max_series_lengths=None, new_case_threshold='min', num_series_to_generate=1, ordered_by_specified_features=False, output_new_series_ids=True, preserve_feature_values=None, progress_callback=None, series_context_features=None, series_context_values=None, series_id_tracking='fixed', series_stop_maps=None, substitute_output=True, suppress_warning=False, use_case_weights=False, use_regional_model_residuals=True, weight_feature=None)#

React to the trainee in a series until a stop condition is met.

Aggregates rows of data corresponding to the specified context, action, derived_context and derived_action features, utilizing previous rows to derive values as necessary. Outputs a dict of “action_features” and corresponding “series”, where “series” is the completed ‘matrix’ for the corresponding action_features and derived_action_features.

Parameters:
  • contexts (list of list of object or pandas.DataFrame, optional) – The context values to react to.

  • action_features (list of str, optional) – See parameter action_features in react().

  • actions (list of list of object or pandas.DataFrame, optional) – See parameter actions in react().

  • case_indices (Iterable of Sequence[str, int]) – See parameter case_indices in react().

  • context_features (list of str, optional) – See parameter context_features in react().

  • derived_action_features (list of str, optional) – See parameter derived_action_features in react().

  • derived_context_features (list of str, optional) – See parameter derived_context_features in react().

  • desired_conviction (float, optional) – See parameter desired_conviction in react().

  • details (dict of {str: object}) – See parameter details in react().

  • feature_bounds_map (dict of {str: dict of {str: object}}, optional) – See parameter feature_bounds_map in react().

  • final_time_steps (list of object, optional) – The time steps at which to end synthesis. Time-series only. Must provide either one for all series, or exactly one per series.

  • generate_new_cases (str, default "no") – See parameter generate_new_cases in react().

  • series_index (str, default ".series") – When set to a string, will include the series index as a column in the returned DataFrame using the column name given. If set to None, no column will be added.

  • init_time_steps (list of object, optional) – The time steps at which to begin synthesis. Time-series only. Must provide either one for all series, or exactly one per series.

  • initial_features (list of str, optional) – Features to condition just the first case in a series, overwrites context_features and derived_context_features for that first case. All specified initial features must be in one of: context_features, action_features, derived_context_features or derived_action_features. If provided a value that isn’t in one of those lists, it will be ignored.

  • initial_values (list of list of object or pandas.DataFrame, optional) – Values corresponding to the initial_features, used to condition just the first case in each series. Must provide either exactly one value to use for all series, or one per series.

  • input_is_substituted (bool, default False) – See parameter input_is_substituted in react().

  • leave_case_out (bool, default False) – See parameter leave_case_out in react().

  • max_series_lengths (list of int, optional) – Maximum size a series is allowed to be. A value of 0 or less means no limit. Must provide either exactly one to use for all series, or one per series. Default is 3 * model_size.

  • new_case_threshold (str or None, optional) – See parameter new_case_threshold in react().

  • num_series_to_generate (int, default 1) – The number of series to generate.

  • ordered_by_specified_features (bool, default False) – See parameter ordered_by_specified_features in react().

  • output_new_series_ids (bool, default True) – If True, series ids are replaced with unique values on output. If False, will maintain or replace ids with existing trained values, but also allows output of series with duplicate existing ids.

  • preserve_feature_values (list of str, optional) – See parameter preserve_feature_values in react().

  • progress_callback (callable or None, optional) – A callback method that will be called before each batched call to react series and at the end of reacting. The method is given a ProgressTimer containing metrics on the progress and timing of the react series operation, and the batch result.

  • series_context_features (list of str, default None) – List of context features corresponding to series_context_values. If specified, must not overlap with any initial_features or context_features.

  • series_context_values (list of list of list of object or list of pandas.DataFrame, default None) – 3d-list of context values, one for each feature for each row for each series. If specified, batch_size and max_series_lengths are ignored.

  • series_id_tracking ({"fixed", "dynamic", "no"}, default "fixed") –

    Controls how closely generated series should follow existing series (plural).

    • If “fixed”, tracks the particular relevant series ID.

    • If “dynamic”, tracks the particular relevant series ID, but is allowed to change the series ID that it tracks based on its current context.

    • If “no”, does not track any particular series ID.

  • series_stop_maps (list of dict of {str: dict}, optional) –

    Map of series stop conditions. Must provide either exactly one to use for all series, or one per series.

    Tip

    Stop series when value exceeds max or is smaller than min:

    {"feature_name":  {"min" : 1, "max": 2}}
    

    Stop series when feature value matches any of the values listed:

    {"feature_name":  {"values": ["val1", "val2"]}}
    

  • substitute_output (bool, default True) – See parameter substitute_output in react().

  • suppress_warning (bool, default False) – See parameter suppress_warning in react().

  • use_case_weights (bool, default False) – See parameter use_case_weights in react().

  • use_regional_model_residuals (bool, default True) – See parameter use_regional_model_residuals in react().

  • weight_feature (str, optional) – See parameter weight_feature in react().

Returns:

The series action values.

Return type:

pandas.DataFrame
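
To make the series_stop_maps format concrete, here is a minimal sketch of how one stop map could be evaluated against a single generated row. It only mirrors the two documented forms ("min"/"max" and "values"). The function name and the row-as-dict representation are assumptions for illustration, and the trainee's own evaluation may differ.

```python
def should_stop(row, stop_map):
    """Hypothetical check: does any feature in `row` meet a stop condition?

    `row` maps feature names to values; `stop_map` follows the documented
    series_stop_maps format for one series.
    """
    for feature, rule in stop_map.items():
        if feature not in row:
            continue
        value = row[feature]
        # Stop when the value matches any of the listed values.
        if "values" in rule and value in rule["values"]:
            return True
        # Stop when the value is smaller than min or exceeds max.
        if "min" in rule and value < rule["min"]:
            return True
        if "max" in rule and value > rule["max"]:
            return True
    return False


stop_map = {"feature_name": {"min": 1, "max": 2}}
stopped = should_stop({"feature_name": 3}, stop_map)
```

In this reading the bounds themselves do not stop the series: a value equal to "min" or "max" is still in range.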

release_resources()#

Release a trainee’s resources from the Diveplane service.

Return type:

None

remove_cases(num_cases, *, condition=None, condition_session=None, distribute_weight_feature=None, precision=None, preserve_session_data=False)#

Remove training cases from the trainee.

The training cases will be completely purged from the model and the model will behave as if it had never been trained with them.

Parameters:
  • num_cases (int) – The number of cases to remove; minimum 1 case must be removed.

  • condition (dict, optional) –

    The condition map to select the cases to remove that meet all the provided conditions.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

    Tip

    Example 1 - Remove all values belonging to feature_name:

    condition = {"feature_name": None}
    

    Example 2 - Remove cases that have the value 10:

    condition = {"feature_name": 10}
    

    Example 3 - Remove cases that have a value in range [10, 20]:

    condition = {"feature_name": [10, 20]}
    

    Example 4 - Remove cases that match one of [‘a’, ‘c’, ‘e’]:

    condition = {"feature_name": ['a', 'c', 'e']}
    

    Example 5 - Remove cases using session name and index:

    condition = {".session": "your_session_name",
                 ".session_training_index": 1}
    

  • condition_session (str or BaseSession, optional) – If specified, ignores the condition and operates on cases for the specified session id or BaseSession instance.

  • distribute_weight_feature (str, default None) – When specified, will distribute the removed cases’ weights from this feature into their neighbors.

  • precision (str, default None) – The precision to use when removing the cases. Options are ‘exact’ or ‘similar’. If not specified, ‘exact’ will be used.

  • preserve_session_data (bool, default False) – When True, will remove cases without cleaning up session data.

Returns:

The number of cases removed.

Return type:

int
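
The condition map rules above can be illustrated with a small, self-contained matcher. This is a sketch of the documented semantics only (exact value, inclusive numeric range, any-of string list). The function name is hypothetical, cases are represented as plain dicts for convenience, and None rules are deliberately not modeled because the text does not spell out how they are applied.

```python
from numbers import Number


def _is_numeric_range(rule):
    """True if `rule` is a pair of numbers (bools are not treated as numbers)."""
    return len(rule) == 2 and all(
        isinstance(r, Number) and not isinstance(r, bool) for r in rule
    )


def matches_condition(case, condition):
    """Hypothetical matcher: does `case` meet all of the provided conditions?"""
    for feature, rule in condition.items():
        if rule is None:
            raise NotImplementedError("None rules are not modeled in this sketch")
        value = case.get(feature)
        if isinstance(rule, (list, tuple)):
            if _is_numeric_range(rule):
                # Two numeric values specify an inclusive range.
                if not isinstance(value, Number) or not rule[0] <= value <= rule[1]:
                    return False
            elif value not in rule:
                # A list of strings must contain the value exactly.
                return False
        elif value != rule:
            # A single value must match exactly.
            return False
    return True


in_range = matches_condition({"length": 5}, {"length": [1, 5]})
```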

remove_feature(feature, *, condition=None, condition_session=None)#

Remove a feature from the trainee.

Parameters:
  • feature (str) – The name of the feature to remove.

  • condition (dict, default None) –

    A condition map where features will only be removed when certain criteria are met.

    If None, the feature will be removed from all cases in the model and feature metadata will be updated to exclude it. If specified as an empty dict, the feature will still be removed from all cases in the model but the feature metadata will not be updated.

    Note

    The dictionary keys are the feature name and values are one of:

    • None

    • A value, must match exactly.

    • An array of two numeric values, specifying an inclusive range. Only applicable to continuous and numeric ordinal features.

    • An array of string values, must match any of these values exactly. Only applicable to nominal and string ordinal features.

    Tip

    For instance to remove the length feature only when the value is between 1 and 5:

    condition = {"length": [1, 5]}
    

  • condition_session (str or BaseSession, optional) – If specified, ignores the condition and operates on cases for the specified session id or BaseSession instance.

Return type:

None

remove_series_store(series=None)#

Clear stored series from trainee.

Parameters:

series (str, optional) – Series id to clear. If not provided, clears the entire series store for the trainee.

Return type:

None

set_auto_analyze_params(auto_analyze_enabled=False, analyze_threshold=None, *, auto_analyze_limit_size=None, analyze_growth_factor=None, **kwargs)#

Set parameters for auto analysis.

Auto-analysis is disabled if this is called without specifying an analyze_threshold.

Parameters:
  • auto_analyze_enabled (bool, default False) – When True, the train() method will trigger an analyze when it’s time for the model to be analyzed again.

  • analyze_threshold (int, optional) – The threshold for the number of cases at which the model should be re-analyzed.

  • auto_analyze_limit_size (int, optional) – The size of the model at which to stop doing auto-analysis. Value of 0 means no limit.

  • analyze_growth_factor (float, optional) – The factor by which to increase the analysis threshold every time the model grows to the current threshold size.

  • kwargs (dict, optional) – See parameters in analyze().

Return type:

None
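
As an illustration of how analyze_threshold, analyze_growth_factor and auto_analyze_limit_size interact, the sketch below lists the successive thresholds at which an analysis would be triggered. It assumes the threshold is multiplied by the growth factor each time it is reached and that auto-analysis stops once the threshold exceeds the limit size. Both are interpretations of the parameter descriptions, and the helper and its max_steps guard are hypothetical.

```python
def analyze_thresholds(analyze_threshold, analyze_growth_factor,
                       auto_analyze_limit_size=0, max_steps=10):
    """Hypothetical helper: successive model sizes that trigger an analysis."""
    thresholds = []
    threshold = float(analyze_threshold)
    # max_steps bounds the loop, since a limit of 0 means "no limit".
    while len(thresholds) < max_steps:
        if auto_analyze_limit_size and threshold > auto_analyze_limit_size:
            break
        thresholds.append(threshold)
        # Assumption: growth is multiplicative.
        threshold *= analyze_growth_factor
    return thresholds


schedule = analyze_thresholds(100, 2.0, auto_analyze_limit_size=1000)
```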

set_auto_optimize_params(*args, **kwargs)#

Set trainee parameters for auto optimization.

Deprecated since version 6.0.0: Use Trainee.set_auto_analyze_params() instead.

Parameters:
  • auto_optimize_enabled (bool, default False) – When True, the train() method will trigger an optimize when it’s time for the model to be optimized again.

  • optimize_threshold (int, optional) – The threshold for the number of cases at which the model should be re-optimized.

  • auto_optimize_limit_size (int, optional) – The size of the model at which to stop doing auto-optimization. Value of 0 means no limit.

  • optimize_growth_factor (float, optional) – The factor by which to increase the optimize threshold every time the model grows to the current threshold size.

  • kwargs (dict, optional) – Parameters specific for optimize() may be passed in via kwargs, and will be cached and used during future auto-optimizations.

set_default_features(*, action_features, context_features)#

Update the trainee default features.

Parameters:
  • action_features (list of str or None) – The default action feature names.

  • context_features (list of str or None) – The default context feature names.

Return type:

None

set_feature_attributes(feature_attributes)#

Update the trainee feature attributes.

Parameters:

feature_attributes (dict of {str: dict}) – The feature attributes of the trainee. Where feature name is the key and a sub dictionary of feature attributes is the value.

Return type:

None

set_metadata(metadata)#

Update the trainee metadata.

Parameters:

metadata (dict or None) – Any key-value pair to store as custom metadata for the trainee. Providing None will remove the current metadata.

Return type:

None

set_params(params)#

Set the workflow attributes for the trainee.

Parameters:

params (dict) –

A dictionary in the following format containing the hyperparameter information, which is required, and other parameters which are all optional.

Example:

{
    "hyperparameter_map": {
        ".targetless": {
            "robust": {
                ".none": {
                    "dt": -1, "p": .1, "k": 8
                }
            }
        }
    },
    "auto_analyze_enabled": False,
    "analyze_threshold": 100,
    "analyze_growth_factor": 7.389,
    "auto_analyze_limit_size": 100000
}

Return type:

None
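
Because the hyperparameter information is required while the remaining keys are optional, it can help to assemble the dictionary through a small helper before passing it to set_params(). The following is a sketch only: build_params is a hypothetical name and is not part of the API, and the values are copied from the example above.

```python
def build_params(hyperparameter_map, **optional):
    """Hypothetical helper: assemble a params dict for set_params()."""
    if not hyperparameter_map:
        # The hyperparameter information is the only required entry.
        raise ValueError("hyperparameter_map is required")
    return {"hyperparameter_map": hyperparameter_map, **optional}


params = build_params(
    {".targetless": {"robust": {".none": {"dt": -1, "p": 0.1, "k": 8}}}},
    auto_analyze_enabled=False,
    analyze_threshold=100,
)
```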

set_random_seed(seed)#

Set the random seed for the trainee.

Parameters:

seed (int or float or str) – The random seed.

Return type:

None

set_substitute_feature_values(substitution_value_map)#

Set a substitution map for use in extended nominal generation.

Parameters:

substitution_value_map (dict) –

A dictionary of feature name to a dictionary of feature value to substitute feature value.

If this dict is None, all substitutions will be disabled and cleared. If any feature in the substitution_value_map has features mapping to None or {}, substitution values will immediately be generated.

Return type:

None
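
The shape of substitution_value_map (feature name to a dictionary of feature value to substitute value) is easiest to see when applied to a case. The sketch below performs that lookup locally. It illustrates the data structure only: the function name is hypothetical, and the trainee performs substitution internally.

```python
def substitute_case(case, features, substitution_value_map):
    """Hypothetical helper: apply a substitution map to one case.

    `case` is a list of values aligned with the names in `features`.
    """
    if not substitution_value_map:
        # No map: values pass through unchanged.
        return list(case)
    substituted = []
    for feature, value in zip(features, case):
        feature_map = substitution_value_map.get(feature) or {}
        # Values without a mapping are kept as-is.
        substituted.append(feature_map.get(value, value))
    return substituted


value_map = {"color": {"red": "alpha", "blue": "beta"}}
result = substitute_case(["red", 3], ["color", "size"], value_map)
```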

train(cases, *, ablatement_params=None, accumulate_weight_feature=None, batch_size=None, derived_features=None, features=None, input_is_substituted=False, progress_callback=None, series=None, train_weights_only=False, validate=True)#

Train one or more cases into the trainee (model).

Parameters:
  • cases (list of list of object or pandas.DataFrame) – One or more cases to train into the model.

  • ablatement_params (dict [str, list of obj], optional) –

    A dict of feature name to threshold type. Valid thresholds include:

    • [‘exact’]: Don’t train if prediction matches exactly

    • [‘tolerance’, MIN, MAX]: Don’t train if prediction >= (case value - MIN) & prediction <= (case value + MAX)

    • [‘relative’, PERCENT]: Don’t train if abs(prediction - case value) / prediction <= PERCENT

    • [‘residual’]: Don’t train if abs(prediction - case value) <= feature residual

  • accumulate_weight_feature (str, default None) – Name of feature into which to accumulate neighbors’ influences as weight for ablated cases. If unspecified, will not accumulate weights.

  • batch_size (int or None, optional) – Define the number of cases to train at once. If left unspecified, the batch size will be determined automatically.

  • derived_features (list of str, optional) – List of feature names for which values should be derived in the specified order. If this list is not provided, features with the ‘auto_derive_on_train’ feature attribute set to True will be auto-derived. If provided an empty list, no features are derived. Any derived_features that are already in the ‘features’ list will not be derived since their values are being explicitly provided.

  • features (list of str, optional) –

    A list of feature names. This parameter should be provided in the following scenarios:

    1. When cases are not in the format of a DataFrame, or the DataFrame does not define named columns.

    2. You want to train only a subset of columns defined in your cases DataFrame.

    3. You want to re-order the columns that are trained.

  • input_is_substituted (bool, default False) – If True assumes provided nominal feature values have already been substituted.

  • progress_callback (callable or None, optional) – A callback method that will be called before each batched call to train and at the end of training. The method is given a ProgressTimer containing metrics on the progress and timing of the train operation.

  • series (str, optional) – The name of the series to pull features and case values from internal series storage. If specified, trains on all cases that are stored in the internal series store for the specified series. The trained feature set is the combined features from storage and the passed in features. If cases is of length one, the value(s) of this case are appended to all cases in the series. If cases is the same length as the series, the value of each case in cases is applied in order to each of the cases in the series.

  • train_weights_only (bool, default False) – When True, and accumulate_weight_feature is provided, will accumulate all of the cases’ neighbor weights instead of training the cases into the model.

  • validate (bool, default True) – Whether to validate the data against the provided feature attributes. Issues warnings if there are any discrepancies between the data and the features dictionary.

Return type:

None
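
The ablatement_params thresholds listed above translate directly into comparisons between a prediction and the case value. The sketch below evaluates each threshold type as written in the parameter description. The function name is hypothetical, and returning False for a zero prediction in the ‘relative’ case is an assumption, since the documented formula is undefined there.

```python
def would_ablate(threshold, prediction, case_value, feature_residual=None):
    """Hypothetical check: True means the case would not be trained."""
    kind = threshold[0]
    if kind == "exact":
        return prediction == case_value
    if kind == "tolerance":
        _, min_tol, max_tol = threshold
        return case_value - min_tol <= prediction <= case_value + max_tol
    if kind == "relative":
        percent = threshold[1]
        if prediction == 0:
            # Assumption: avoid dividing by zero by never ablating here.
            return False
        # Follows the documented formula literally (divides by prediction).
        return abs(prediction - case_value) / prediction <= percent
    if kind == "residual":
        if feature_residual is None:
            raise ValueError("feature_residual is required for 'residual'")
        return abs(prediction - case_value) <= feature_residual
    raise ValueError(f"unknown threshold type: {kind!r}")


skip = would_ablate(["tolerance", 1, 2], prediction=11.5, case_value=10)
```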

unload()#

Unload the trainee.

Deprecated since version 1.0.0: Use Trainee.release_resources() instead.

Return type:

None

update()#

Update the remote trainee with local state.

Return type:

None

diveplane.reactor.delete_trainee(name_or_id, *, client=None)#

Delete an existing trainee.

Parameters:
  • name_or_id (str) – The name or id of the trainee.

  • client (AbstractDiveplaneClient, optional) – The Diveplane client instance to use.

Return type:

None

diveplane.reactor.get_active_session(*, client=None)#

Get the active session.

Parameters:

client (AbstractDiveplaneClient, optional) – The Diveplane client instance to use.

Returns:

The session instance.

Return type:

Session

diveplane.reactor.get_client()#

Get the active Diveplane client instance.

Returns:

The active client.

Return type:

DiveplanePandasClient

diveplane.reactor.get_session(session_id, *, client=None)#

Get an existing Session.

Parameters:
  • session_id (str) – The id of the session.

  • client (AbstractDiveplaneClient, optional) – The Diveplane client instance to use.

Returns:

The session instance.

Return type:

Session

diveplane.reactor.get_trainee(name_or_id, *, client=None)#

Get an existing trainee.

Parameters:
  • name_or_id (str) – The name or id of the trainee.

  • client (AbstractDiveplaneClient, optional) – The Diveplane client instance to use.

Returns:

The trainee instance.

Return type:

Trainee

diveplane.reactor.list_sessions(search_terms=None, *, client=None, project=None)#

Get listing of Sessions.

Parameters:
  • search_terms (str) – Terms to filter results by.

  • client (AbstractDiveplaneClient, optional) – The Diveplane client instance to use.

  • project (str or Project, optional) – The instance or id of a project to filter by. Ignored if client does not support projects.

Returns:

The list of session instances.

Return type:

list of Session

diveplane.reactor.list_trainees(search_terms=None, *, client=None, project=None)#

Get listing of available trainees.

This method only returns a simplified informational listing of available trainees, not full reactor Trainee instances. To get a Trainee instance that can be used with the reactor API, call get_trainee().

Parameters:
  • search_terms (str) – Terms to filter results by.

  • client (AbstractDiveplaneClient, optional) – The Diveplane client instance to use.

  • project (str or Project, optional) – The instance or id of a project to filter by.

Returns:

The list of available trainees.

Return type:

list of TraineeIdentity

diveplane.reactor.use_client(client)#

Set the active Diveplane client instance to use for the API.

Parameters:

client (AbstractDiveplaneClient) – The client instance.

Return type:

None

Raises:

ValueError – When the client is not an instance of AbstractDiveplaneClient.