Scikit Example
An example demonstrating how to use Diveplane in “traditional” ML ways.
The diveplane Python package extends the scikit-learn Estimator through DiveplaneEstimator, which provides users with a Python interface that follows the conventions of sklearn estimators. To use Diveplane’s functionality beyond the estimator interface, use diveplane.reactor.Trainee.
This is a simple example of how to use diveplane.scikit.DiveplaneClassifier, which extends diveplane.scikit.DiveplaneEstimator, to fit data and make predictions based on that data.
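The scikit-learn estimator conventions mentioned above come down to three methods: fit(X, y) learns from training data and returns the estimator, predict(X) returns one label per row, and score(X, y) returns the mean accuracy for a classifier. The sketch below illustrates that call pattern with a hypothetical ToyNearestNeighbor class written in plain Python. It is not Diveplane's implementation; the class name and the toy data are assumptions made only for illustration.

```python
import math


class ToyNearestNeighbor:
    """Hypothetical 1-nearest-neighbor classifier with a fit/predict/score interface."""

    def fit(self, X, y):
        # Store the training rows and labels; return self so calls can be chained.
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        self.classes_ = sorted(set(self.y_))
        return self

    def predict(self, X):
        # For each query row, return the label of the closest training row.
        predictions = []
        for row in X:
            distances = [math.dist(row, train_row) for train_row in self.X_]
            nearest = distances.index(min(distances))
            predictions.append(self.y_[nearest])
        return predictions

    def score(self, X, y):
        # Mean accuracy: the fraction of rows whose predicted label matches y.
        predictions = self.predict(X)
        correct = sum(p == t for p, t in zip(predictions, y))
        return correct / len(y)


# Toy data, invented for this sketch.
model = ToyNearestNeighbor().fit([[1, 1], [2, 1], [9, 8], [8, 9]], [0, 0, 1, 1])
toy_predictions = model.predict([[1, 2], [9, 9]])
toy_score = model.score([[1, 2], [9, 9], [2, 2]], [0, 1, 1])
```

Because the example script only calls fit, score, and predict_proba on the classifier, any object exposing this interface can be swapped in for comparison.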
Reading breast cancer data set.
Target values encoded from [0, 1] to [0, 1].
Training on a random selection of 80% of the data.
Scoring against 20% reserve test data:
0.9635036496350365
Getting details for most similar cases from the first prediction:
[{'.distance': 33.970562748477136,
'.session': 'cc890592-a381-4bf9-a7a4-e0c9fa18e0b0',
'.session_training_index': 11,
'0': 6,
'1': 5,
'2': 5,
'3': 8,
'4': 4,
'5': 10,
'6': 3,
'7': 4,
'8': 1,
'y': '1'},
{'.distance': 38.970562748477136,
'.session': 'cc890592-a381-4bf9-a7a4-e0c9fa18e0b0',
'.session_training_index': 303,
'0': 5,
'1': 4,
'2': 6,
'3': 10,
'4': 2,
'5': 10,
'6': 4,
'7': 1,
'8': 1,
'y': '1'},
{'.distance': 39.41224019560975,
'.session': 'cc890592-a381-4bf9-a7a4-e0c9fa18e0b0',
'.session_training_index': 504,
'0': 8,
'1': 7,
'2': 6,
'3': 4,
'4': 4,
'5': 10,
'6': 5,
'7': 1,
'8': 1,
'y': '1'}]
Getting class probabilities and classes for the model:
array([[0. , 1. ],
[1. , 0. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[0. , 1. ],
[0.65901916, 0.34098084],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[0. , 1. ],
[1. , 0. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[0. , 1. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[0.26946653, 0.73053347],
[0. , 1. ],
[0. , 1. ],
[1. , 0. ],
[0. , 1. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[1. , 0. ],
[0. , 1. ],
[1. , 0. ],
[0.2673567 , 0.7326433 ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[1. , 0. ],
[0. , 1. ],
[1. , 0. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[0. , 1. ],
[0. , 1. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[0.84454321, 0.15545679],
[1. , 0. ],
[1. , 0. ],
[0.36784126, 0.63215874],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[0. , 1. ],
[0. , 1. ],
[0. , 1. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[0. , 1. ],
[0.40374078, 0.59625922],
[0.24802117, 0.75197883],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[0. , 1. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[0.60561328, 0.39438672],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[0. , 1. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ],
[1. , 0. ]])
array(['0', '1'], dtype='<U21')
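The probability array has one row per test case and one column per entry of classes_, in the same order. The following is a minimal sketch of how such rows map to labels and to an accuracy value, assuming the usual scikit-learn convention that the predicted label is the class with the highest probability; Diveplane's own predict may resolve ties differently. The three probability rows are copied from the output above, while true_labels and both helper functions are hypothetical and exist only for illustration.

```python
classes = ['0', '1']  # same order as the classes_ array in the output above

# Three rows copied from the predict_proba output above.
probas = [
    [0.0, 1.0],
    [1.0, 0.0],
    [0.65901916, 0.34098084],
]


def labels_from_probabilities(rows, class_names):
    # Pick, for each row, the class whose column holds the largest probability.
    return [class_names[row.index(max(row))] for row in rows]


def accuracy(predicted, expected):
    # Fraction of positions where the predicted label equals the expected one.
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


predicted = labels_from_probabilities(probas, classes)

# Hypothetical ground truth, invented only to exercise accuracy().
true_labels = ['1', '0', '1']
toy_accuracy = accuracy(predicted, true_labels)
```

The array above has 137 rows, so the reported score of 0.9635036496350365 is consistent with 132 correct predictions out of 137.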
import os
import pandas as pd
from pprint import pprint
from diveplane.scikit import DiveplaneClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
# sphinx_gallery_thumbnail_path = '_static/gallery/scikit.png'
# Path to the breast cancer data set; the file is expected in the current working directory.
data_path = os.path.join("breast_cancer.csv")
# Read in the data.
print("Reading breast cancer data set.")
df = pd.read_csv(data_path)
# Split the dataset into the features (X) and the target (y).
X = df.drop('y', axis=1).values.astype(float)
# Encode the target labels as integers.
le = LabelEncoder()
le.fit(df['y'])
y = le.transform(df['y'])
print(f"Target values encoded from {list(le.classes_)} to "
      f"{list(le.transform(le.classes_))}.")
# Split the dataset into an 80/20 train/test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True)
# Create a classifier.
dp = DiveplaneClassifier()
# Fit the training data.
print("Training on a random selection of 80% of the data.")
dp.fit(X_train, y_train)
# Test against the reserved test data.
print("Scoring against 20% reserve test data:")
score = dp.score(X_test, y_test)
# Print the resulting accuracy.
print(score)
# Detailed prediction results
results = dp.describe_prediction(X_test)
print("Getting details for most similar cases from the first prediction:")
pprint(results['explanation']['most_similar_cases'][0])
print("Getting class probabilities and classes for the model:")
probas = dp.predict_proba(X_test)
pprint(probas)
pprint(dp.classes_)
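Each entry of most_similar_cases printed earlier is a plain dictionary: keys beginning with a dot (.distance, .session, .session_training_index) carry metadata, and the remaining keys hold feature and target values. The sketch below uses two cases abbreviated from that output and a hypothetical helper named summarize_cases to show one way of separating the two groups and ordering the cases by distance. It assumes only the dictionary layout visible in the output, not any further Diveplane API.

```python
# Two cases abbreviated from the output above (feature columns '2' to '8' omitted).
cases = [
    {'.distance': 38.970562748477136, '.session_training_index': 303,
     '0': 5, '1': 4, 'y': '1'},
    {'.distance': 33.970562748477136, '.session_training_index': 11,
     '0': 6, '1': 5, 'y': '1'},
]


def summarize_cases(similar_cases):
    """Hypothetical helper: split metadata from values and sort by distance."""
    summary = []
    for case in sorted(similar_cases, key=lambda c: c['.distance']):
        # Keys that start with '.' are metadata; everything else is data.
        values = {k: v for k, v in case.items() if not k.startswith('.')}
        summary.append({
            'training_index': case['.session_training_index'],
            'distance': case['.distance'],
            'values': values,
        })
    return summary


case_summary = summarize_cases(cases)
for item in case_summary:
    print(item['training_index'], round(item['distance'], 2), item['values'])
```

In the example script, the same helper could be applied to results['explanation']['most_similar_cases'][0].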
Total running time of the script: (0 minutes 21.748 seconds)