# Recipe 6: Instance-Based Learning
## Overview 

Recipes 1-5 provided an overview of the many Diveplane <span style="color:orange">REACTOR™</span> capabilities. Several of those recipes describe <span style="color:orange">REACTOR™</span> as an instance-based learning system. Many of the methods shown are available only in <span style="color:orange">REACTOR™</span> because of its combination of instance-based learning and information theory. This notebook provides a deeper demonstration of one Diveplane <span style="color:orange">REACTOR™</span> capability that arises from instance-based learning.

Since most modern methods use model-based learning, many of the standard practices and commonly accepted paradigms in machine learning revolve around model-based approaches. One of the most common is a train-test split for model validation. Since Diveplane <span style="color:orange">REACTOR™</span> performs an accuracy validation calculation that surpasses the scope of many train-test splits, we recommend using Diveplane <span style="color:orange">REACTOR™</span>'s Feature Residuals instead of a train-test split. Because many of our methods and recommendations, including this one, may seem counterintuitive compared to standard machine learning practices, it is important for us to provide empirical validation.

Thus, this recipe is designed to provide context and evidence for Diveplane's recommendation that train-test splits are unnecessary for Trainee evaluation when `Feature Residuals` are used. This recommendation is meant to save the user time, remove complexity, and alleviate possible data constraints. Users are always welcome to continue using train-test validation splits, but our hope is to provide ample evidence that a train-test split does not provide any benefit when using <span style="color:orange">REACTOR™</span>.

## Recipe Goals: 
This recipe conducts an experiment to provide evidence for Diveplane <span style="color:orange">REACTOR™</span>'s recommendation to use Feature Residuals instead of train-test splits. This is done not only to justify the recommendation, but also to provide further detail on instance-based learning by showing an example of how Diveplane <span style="color:orange">REACTOR™</span> leverages its unique attributes.

In [1]:
import pandas as pd
from pmlb import fetch_data
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

from diveplane import reactor
from diveplane.utilities import infer_feature_attributes

# Instance-Based Learning

Instance-based learning is a type of learning algorithm that compares new instances (or data) with instances seen during training, which have been stored in memory. One of our favorite phrases is "the model is the data" when referring to <span style="color:orange">REACTOR™</span>'s use of instance-based learning. Ultimately, instance-based learning allows <span style="color:orange">REACTOR™</span> to provide a system that is truly representative of the data, instead of a generalization in the form of a model.
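
To make "the model is the data" concrete, here is a minimal sketch of a generic instance-based learner. It is a plain k-nearest-neighbors toy, not <span style="color:orange">REACTOR™</span>'s algorithm, and the `InstanceStore` class and its method names are hypothetical, chosen only for illustration.

```python
import math


class InstanceStore:
    """Toy instance-based learner: the stored cases are the model."""

    def __init__(self, k=3):
        self.k = k
        self.cases = []  # list of (context_vector, action_value) pairs

    def train(self, context, action):
        # "Training" only stores the case; no parameters are fitted.
        self.cases.append((tuple(context), action))

    def predict(self, context):
        # Average the action values of the k closest stored cases.
        nearest = sorted(
            self.cases, key=lambda case: math.dist(case[0], context)
        )[: self.k]
        return sum(action for _, action in nearest) / len(nearest)


store = InstanceStore(k=2)
for context, action in [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0)]:
    store.train(context, action)

before = store.predict((2.9,))  # uses the cases at 3.0 and 2.0
store.train((2.8,), 40.0)       # the new case is usable immediately
after = store.predict((2.9,))   # now uses the cases at 2.8 and 3.0
print(before, after)
```

The prediction changes as soon as a case is added, without any separate fitting step, which is the property the rest of this recipe relies on.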

In modern machine learning, model-based learning methods dominate the industry, largely because they have generally offered better generalization and performance. Traditionally, some model-based methods also offer good explainability; for example, the coefficients of a regression model provide information on each feature. As machine learning methods advance, the traditional easy-to-compute, closed-form solutions have often given way to powerful and complex algorithms such as neural networks. These methods can offer impressive performance, but often at an even greater cost to interpretability.

Due to Diveplane <span style="color:orange">REACTOR™</span>'s advanced querying system and novel use of information theory, <span style="color:orange">REACTOR™</span> has overcome many of the challenges associated with instance-based learning, allowing it to keep the inherent advantages of instance-based learning without sacrificing performance. This recipe is not designed to go deep into the underlying math. For more information, please refer to Diveplane's whitepaper <em>Natively Interpretable Machine Learning and Artificial Intelligence: Preliminary Results and Future Directions</em> (Hazard et al., 2019)
`https://arxiv.org/abs/1901.00246`.


# Residuals

Why are instance-based models so flexible? Because the data is stored in the Trainee itself, cases can be edited and metrics recalculated dynamically.

> Note: Residuals and Error are often used interchangeably. Even within Diveplane <span style="color:orange">REACTOR™</span>, there are multiple forms of Residuals. To clarify, Feature Residuals can be considered the exact same metric as the Mean Absolute Error (MAE). We will use the term Feature Residuals when referring to Diveplane <span style="color:orange">REACTOR™</span> calculations and Error or MAE when referring to calculations from non-<span style="color:orange">REACTOR™</span> models. However, from a mathematical standpoint, Feature Residuals, Error, and MAE are equivalent, which will be demonstrated.
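
For reference, the two quantities compared in this recipe can be written out as follows. The notation is ours and is only a sketch of the description in this recipe: $y_i$ is the actual value of the Action Feature for case $i$, $\hat{y}_i$ is a prediction from a model that never saw the test cases, and $\hat{y}_i^{(-i)}$ is the prediction made for case $i$ while that case is held out.

$$
\mathrm{MAE} = \frac{1}{n_{\text{test}}} \sum_{i \in \text{test}} \left| y_i - \hat{y}_i \right|
\qquad
\text{Feature Residual} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i^{(-i)} \right|
$$

Both are mean absolute errors over cases the predictor could not see at prediction time; they differ only in which cases are held out and when.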

In traditional machine learning workflows, a train-test split is used to validate the model and measure its performance. If a test dataset is not used, testing on the training data will often inflate the validation results because the model has seen the data before. Train-test splits are also often done in large chunks, with a 20% test set being a common heuristic. Unless a user runs enough train-test split iterations to ensure that every data point has been tested, there is also the danger of undercoverage in terms of validation. The difficulty with taking more chunks is that a model needs to be retrained every time, so every model is different. When all of the test set results are aggregated, how do you aggregate the models? This is often impossible to do, and usually the results are an indicator of how good the underlying method (e.g., regression, decision tree, neural network) and its corresponding set of hyperparameters are at creating a model for the data, but not necessarily of how good a specific model is.
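
The retraining problem can be sketched with a toy k-fold loop. This is an illustration under stated assumptions, not part of the recipe's experiment: `fit_line` and `k_fold_mae` are hypothetical helpers, the model is a one-feature least-squares line, and the data points are made up.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept (a stand-in model)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_mae(points, k):
    """Refit the model once per fold and score it on the held-out chunk."""
    fold_models, abs_errors = [], []
    for fold in range(k):
        test = points[fold::k]  # every k-th point is held out
        train = [p for i, p in enumerate(points) if i % k != fold]
        slope, intercept = fit_line(train)  # a new model for every fold
        fold_models.append((slope, intercept))
        abs_errors += [abs(y - (slope * x + intercept)) for x, y in test]
    return fold_models, sum(abs_errors) / len(abs_errors)


data = [(0, 1.0), (1, 2.9), (2, 5.2), (3, 7.1), (4, 8.8), (5, 11.3)]
models, mae = k_fold_mae(data, k=3)
for slope, intercept in models:
    print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"aggregated MAE={mae:.3f}")
```

Each fold produces its own slope and intercept, so the aggregated MAE describes the fitting procedure rather than any single one of the fitted models.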

How does <span style="color:orange">REACTOR™</span> provide a solution to these issues? Instance-based learning allows Diveplane to hold out every Case one at a time and predict a feature for that Case. After the prediction, the Case is returned and instantly becomes part of the Trainee again. By repeating this for every Case, <span style="color:orange">REACTOR™</span> provides a comprehensive `Feature Residual` calculation that encompasses every single Case. Since the held-out Case becomes part of the Trainee again, there is no need to retrain the Trainee. The `Feature Residual` represents one of the most comprehensive forms of validation, leave-one-out, which is often prohibitively expensive for other models. The results also directly reflect the performance of the current Trainee, and thus the model, so it can be used immediately with exact knowledge of its current state of performance.
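
The hold-out-and-return procedure described above can be sketched in a few lines for a generic instance-based predictor. This is a simplified nearest-neighbor illustration, not the <span style="color:orange">REACTOR™</span> implementation; `knn_predict` and `leave_one_out_mae` are hypothetical names and the cases are made up.

```python
def knn_predict(cases, context, k=2):
    """Average the targets of the k stored cases closest to `context`."""
    nearest = sorted(cases, key=lambda case: abs(case[0] - context))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def leave_one_out_mae(cases, k=2):
    """Hold out each case in turn, predict it from the rest, then put it back."""
    total = 0.0
    for i, (context, target) in enumerate(cases):
        remaining = cases[:i] + cases[i + 1:]  # case i is temporarily held out
        total += abs(target - knn_predict(remaining, context, k))
    return total / len(cases)  # nothing was refitted at any point


cases = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]
residual = leave_one_out_mae(cases)
print(residual)
```

Because prediction only reads the stored cases, holding one out costs a list slice rather than a retraining run, and every case contributes to the final number.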

# Section 1. Setup

We will continue to use the `Adult` dataset. In previous recipes, we explicitly used `target` as our Action Feature. For this experiment, we will instead iterate through all of the features, using a different feature as the Action Feature in each iteration until every feature has been used.

In [2]:
df = fetch_data('adult', local_cache_dir="data/adult")

# Subsample the data to ensure the example runs quickly
df = df.sample(500)

features = infer_feature_attributes(df)

# Section 2. Experiment

We will demonstrate that <span style="color:orange">REACTOR™</span>'s Global Feature Residuals are equivalent to the prediction MAE.

For every iteration:  

> 1. We will create, train, and analyze a Trainee
> 2. We will then extract the Global Feature Residuals. Under Diveplane's recommended workflow, this would be the only metric needed for validation

To test, within every iteration, for every feature:

> 3. We split the test set so that the selected feature is the Action Feature and every other feature is a Context Feature
> 4. We compute the MAE either through the prediction results or the `categorical_action_probabilities` for nominal Action Features

Finally:    
> 5. We compare the Global Feature Residuals with the MAE for every feature

In the following code, we will use more in-line comments as we cannot break up this section of code.

In [3]:
# Metrics
result_columns = ['Global Feature Residuals', 'Prediction MAE']

# Create the results holder
all_df_results = dict()
for feature in list(df):
    all_df_results[feature] = pd.DataFrame(columns=result_columns) 

# Main experiment loop   
for run in range(0,9):
    print("Begin Run: ", run+1)
    
    """
    We start by using a train-test split. The goal of this recipe is to demonstrate
    that this step is no longer needed, however our experiment will use it for 
    comparison.
    """


    train_df, test_df = train_test_split(df, test_size=0.2)  # pass random_state=<int> for a reproducible split

    # Create the Trainee
    trainee = reactor.Trainee(features=features)

    # Train the Trainee on the training split, then analyze it
    trainee.train(train_df)
    trainee.analyze()
    
    """
    These Residuals are the full residuals that the Trainee calculates when calling
    `react_into_trainee`, and they are what Diveplane recommends as a replacement for the
    manual Mean Absolute Error calculations from a train-test split.
    """
    # Compute and return full residuals
    trainee.react_into_trainee(residuals=True)
    residuals = trainee.get_prediction_stats(robust=False, stats=['mae'])
    
    """
    To provide a robust comparison, we loop through every feature. In each loop, one feature is
    selected as the Action Feature and the remaining features are the Context Features that predict
    this Action Feature, exactly like the standard workflow for predicting a target feature.
    """
    for feature in list(df):

        action_features = [feature]

        # Split the data into context features (X) and action feature (y)
        X_test = test_df.drop(feature, axis=1)
        y_test = test_df[action_features[0]]

        context_features = X_test.columns.tolist()  

        # Designate whether the feature is continuous or nominal for metric selection
        feature_is_nominal = features[feature]['type'] == 'nominal'

        residual = residuals[feature][0]
        """
        Since we are also predicting nominal features, categorical_action_probabilities are used to
        calculate the MAE instead of exact predictions as it provides a more accurate representation
        of the model prediction. A more detailed explanation is available later in this section.
        """
        details={'categorical_action_probabilities': feature_is_nominal, 'feature_residuals': True}

        results = trainee.react(
            X_test, 
            context_features=context_features, 
            action_features=action_features,
            details=details
        )
        
        # Determine metric
        if feature_is_nominal:
            """
            For nominal features, we can use the `categorical_action_probabilities` to get a better
            estimate of the MAE. `categorical_action_probabilities` tells us the probability the Trainee
            assigns to each possible outcome. For example, if we are predicting a nominal feature where the
            possible outcomes are the nominal integers 1 and 2, then each of those outcomes will be
            associated with a prediction probability. If nominal 1 has a probability of 0.55,
            then the Trainee believes that the case has a 55% chance of being a 1. Since the Trainee has to
            predict a value for the case, the final predicted value is the possible value with the highest
            probability.

            This gives us a more accurate measurement of the Mean Absolute Error of the Trainee. If we only use the
            actual prediction in our previous example, it might give us a skewed estimate, especially for smaller
            sample sizes. If our entire test set was 2 samples with identical probabilities like the example and both
            of them were predicted wrong (predicted 1, actual 2), then using the raw prediction would give us a total
            absolute error of 2 (an MAE of 1.0), while using `categorical_action_probabilities` gives a total of 1.1
            (an MAE of 0.55). In the opposite case, the raw prediction would give the Trainee too much credit.

            Since the prediction was essentially a coin flip, using `categorical_action_probabilities` ensures that
            the accuracy reflects that model uncertainty.
            """
            enumerated_explanation = enumerate(results['explanation']['categorical_action_probabilities'])
            
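            # Sum, over the test cases, the probability mass not assigned to the actual value
            total = 0
            for i, prob in enumerated_explanation:
                actual = y_test.iloc[i]
                # A value the Trainee assigns no probability to counts as a full error of 1
                error = 1 - prob[feature][str(actual)] if str(actual) in prob[feature] else 1
                total += error
            # Despite its name, `accuracy` holds the probability-based MAE for this feature
            accuracy = total / len(y_test)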

        else:
            accuracy = mean_absolute_error(y_test, results['action'][feature])
        
        # Package results into a DataFrame
        result_df = all_df_results[feature]
        result_df = pd.concat([pd.DataFrame([[residual, accuracy]], columns=result_columns), result_df], ignore_index=True)    
        all_df_results[feature] = result_df
    
    trainee.release_resources()

Begin Run:  1
Begin Run:  2
Begin Run:  3
Begin Run:  4
Begin Run:  5
Begin Run:  6
Begin Run:  7
Begin Run:  8
Begin Run:  9
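
The nominal-feature error used in the loop above can be checked on a toy example. The sketch below is standalone and does not call <span style="color:orange">REACTOR™</span>; `probability_error` and `hard_label_error` are hypothetical helpers, and the probability dictionaries only mimic the per-case shape the loop reads (class label as a string mapped to a probability).

```python
def probability_error(actual_labels, predicted_probabilities):
    """Mean of 1 - P(actual class); an unseen class counts as a full error of 1."""
    errors = [
        1.0 - probabilities.get(str(actual), 0.0)
        for actual, probabilities in zip(actual_labels, predicted_probabilities)
    ]
    return sum(errors) / len(errors)


def hard_label_error(actual_labels, predicted_probabilities):
    """Mean 0/1 error when only the most probable class is kept."""
    misses = [
        str(actual) != max(probabilities, key=probabilities.get)
        for actual, probabilities in zip(actual_labels, predicted_probabilities)
    ]
    return sum(misses) / len(misses)


# Two test cases, each a near coin flip whose most probable class is wrong.
actual = [2, 2]
probabilities = [{"1": 0.55, "2": 0.45}, {"1": 0.55, "2": 0.45}]

soft = probability_error(actual, probabilities)
hard = hard_label_error(actual, probabilities)
print(f"probability-based error={soft:.2f}, hard-label error={hard:.2f}")
```

With only two near coin-flip cases, the hard-label error reports a complete failure while the probability-based error stays close to one half, which is why the experiment uses probabilities for nominal features.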


# Section 3. Results

We aggregate all the iterations by the mean of both metrics for each feature.

In [4]:
final_output = pd.DataFrame()
for feature in list(df):
    final_output = pd.concat(
            [pd.DataFrame([[feature] + list(all_df_results[feature].mean().values)], columns=['feature']+result_columns), final_output], 
            ignore_index=True
    ) 

final_output

| | feature | Global Feature Residuals | Prediction MAE |
|---|---|---|---|
| 0 | target | 0.239342 | 0.244415 |
| 1 | native-country | 3.552339 | 3.966667 |
| 2 | hours-per-week | 9.104764 | 9.07 |
| 3 | capital-loss | 96.96281 | 102.44 |
| 4 | capital-gain | 1622.754975 | 1843.413333 |
| 5 | sex | 0.314256 | 0.302985 |
| 6 | race | 0.211296 | 0.233154 |
| 7 | relationship | 0.40958 | 0.40423 |
| 8 | occupation | 0.809258 | 0.82105 |
| 9 | marital-status | 0.338857 | 0.332746 |
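
Because the features are on very different scales, one way to read the table is to look at the gap relative to the Prediction MAE. The snippet below is a small arithmetic check on the values printed above; the relative-gap measure is our choice for illustration and is not a <span style="color:orange">REACTOR™</span> metric.

```python
# (feature, Global Feature Residuals, Prediction MAE), copied from the table above
rows = [
    ("target", 0.239342, 0.244415),
    ("native-country", 3.552339, 3.966667),
    ("hours-per-week", 9.104764, 9.07),
    ("capital-loss", 96.96281, 102.44),
    ("capital-gain", 1622.754975, 1843.413333),
    ("sex", 0.314256, 0.302985),
    ("race", 0.211296, 0.233154),
    ("relationship", 0.40958, 0.40423),
    ("occupation", 0.809258, 0.82105),
    ("marital-status", 0.338857, 0.332746),
]

# Absolute difference between the two metrics as a fraction of the Prediction MAE
relative_gaps = {
    feature: abs(residual - mae) / mae for feature, residual, mae in rows
}
largest = max(relative_gaps, key=relative_gaps.get)
smallest = min(relative_gaps, key=relative_gaps.get)

for feature, gap in sorted(relative_gaps.items(), key=lambda item: item[1]):
    print(f"{feature:15s} {gap:6.1%}")
```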


## 3a. Results

We can see that the Global Feature Residuals and the Prediction MAE are close for every feature; the largest relative gaps, on the order of 10%, appear for `capital-gain`, `native-country`, and `race`. In this experiment, the Prediction MAE represents the MAE from using a train-test split. These metrics converge even more as the number of runs increases.

Using just the Global Feature Residuals has several efficiency benefits for the user. In addition to saving time and effort, it allows the user to train the Trainee on all of the available data. The smaller the dataset, the more this matters, as holding out 20% can represent a significant chunk of a small dataset.

Conceptually, Global Feature Residuals are a better representation of the Trainee's true performance. In machine learning, testing each individual case against a Trainee/model, known as Leave-One-Out validation, is often considered the most accurate form of validation. However, it is very time consuming, so data is generally tested in chunks, hence the use of the train-test split. Since Diveplane <span style="color:orange">REACTOR™</span> already performs this Leave-One-Out validation, users can take advantage of the method without having to do it manually.

Instance-based learning allows <span style="color:orange">REACTOR™</span> to do this effectively. For model-based learning methods performing Leave-One-Out, the model needs to be retrained for every single case. In <span style="color:orange">REACTOR™</span>, predicting a case for feature residuals is equivalent to removing the case, as demonstrated in `4-audit_edit`, and then predicting it using the Trainee with that case removed, without needing to re-train or re-analyze.


# Section 4. Conclusion and Next Steps


This recipe was meant both to convince the user that the Global Feature Residuals are an equivalent substitute for the Mean Absolute Error, thus removing the need for a train-test split, and to provide further context on the capabilities of instance-based learning. Many of Diveplane <span style="color:orange">REACTOR™</span>'s unique capabilities may seem conceptually different since most practitioners are generally taught model-based techniques. By obtaining a deeper understanding of instance-based learning and Diveplane <span style="color:orange">REACTOR™</span>, we can further develop use cases and capabilities that have not been previously explored and push the boundaries of what Diveplane <span style="color:orange">REACTOR™</span> can do.

Thank you for using <span style="color:orange">REACTOR™</span>. We hope these Recipes have been useful to you. We here at Diveplane are beyond excited at the possibilities that you will come up with! The next step is to start exploring!
