Tree Models¶
This section documents the decision tree model components of the nextmv-scikit-learn package.
Model¶
model
¶
Defines sklearn.tree models interoperability.
FUNCTION | DESCRIPTION |
---|---|
DecisionTreeRegressor | Creates a scikit-learn DecisionTreeRegressor from the provided options. |
DecisionTreeRegressor
¶
DecisionTreeRegressor(
options: Options,
) -> DecisionTreeRegressor
Creates a sklearn.tree.DecisionTreeRegressor from the provided options.
You can import the DecisionTreeRegressor function directly from tree:
from nextmv_sklearn.tree import DecisionTreeRegressor
This function uses the options to create a scikit-learn DecisionTreeRegressor model with the specified parameters. It extracts parameter values from the Nextmv options object and passes them to the scikit-learn constructor.
PARAMETER | DESCRIPTION |
---|---|
options | Options for the DecisionTreeRegressor; the supported parameters are listed below. TYPE: Options |
The options can contain the following parameters:
- criterion : str, default='squared_error'. The function to measure the quality of a split.
- splitter : str, default='best'. The strategy used to choose the split at each node.
- max_depth : int, optional. The maximum depth of the tree.
- min_samples_split : int, optional. The minimum number of samples required to split an internal node.
- min_samples_leaf : int, optional. The minimum number of samples required to be at a leaf node.
- min_weight_fraction_leaf : float, optional. The minimum weighted fraction of the sum total of weights required to be at a leaf node.
- max_features : int, optional. The number of features to consider when looking for the best split.
- random_state : int, optional. Controls the randomness of the estimator.
- max_leaf_nodes : int, optional. Grow a tree with max_leaf_nodes in best-first fashion.
- min_impurity_decrease : float, optional. A node will be split if this split induces a decrease of the impurity.
- ccp_alpha : float, optional. Complexity parameter used for Minimal Cost-Complexity Pruning.
RETURNS | DESCRIPTION |
---|---|
DecisionTreeRegressor | A sklearn.tree.DecisionTreeRegressor instance. |
Examples:
>>> from nextmv_sklearn.tree import DecisionTreeRegressorOptions
>>> from nextmv_sklearn.tree import DecisionTreeRegressor
>>>
>>> # Create options for the regressor
>>> options = DecisionTreeRegressorOptions().to_nextmv()
>>>
>>> # Set specific parameters if needed
>>> options.set("max_depth", 5)
>>> options.set("min_samples_split", 2)
>>>
>>> # Create the regressor model
>>> regressor = DecisionTreeRegressor(options)
>>>
>>> # Use the regressor with scikit-learn API
>>> X = [[0, 0], [1, 1], [2, 2], [3, 3]]
>>> y = [0, 1, 2, 3]
>>> regressor.fit(X, y)
>>> regressor.predict([[4, 4]])
Source code in nextmv-scikit-learn/nextmv_sklearn/tree/model.py
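The description above says that DecisionTreeRegressor extracts parameter values from the options and passes them to the scikit-learn constructor. The following is a minimal sketch of that idea, not the package's actual implementation. It assumes that option values arrive as a plain dictionary and that unset options are represented by None; the names PARAMETER_NAMES, constructor_kwargs and build_regressor are hypothetical.

```python
from typing import Any

# Hypothetical subset of option names; the real list is
# DECISION_TREE_REGRESSOR_PARAMETERS in nextmv_sklearn.tree.
PARAMETER_NAMES = {"criterion", "splitter", "max_depth", "min_samples_split", "random_state"}


def constructor_kwargs(option_values: dict[str, Any]) -> dict[str, Any]:
    """Keep known parameters that were actually set (value is not None)."""
    return {
        name: value
        for name, value in option_values.items()
        if name in PARAMETER_NAMES and value is not None
    }


def build_regressor(option_values: dict[str, Any]):
    """Pass the filtered values to the scikit-learn constructor."""
    # Imported here so the filtering helper can be used without scikit-learn.
    from sklearn.tree import DecisionTreeRegressor

    return DecisionTreeRegressor(**constructor_kwargs(option_values))


# "duration" stands in for an unrelated option that must not reach scikit-learn.
values = {"criterion": "squared_error", "max_depth": 5, "random_state": None, "duration": "30s"}
kwargs = constructor_kwargs(values)
print(kwargs)  # {'criterion': 'squared_error', 'max_depth': 5}
```

Dropping None values lets scikit-learn apply its own defaults to any parameter that was not set, and filtering by name keeps unrelated options out of the constructor call.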
Options¶
options
¶
Defines sklearn.tree options interoperability.
This module provides functionality for interfacing with scikit-learn's tree-based algorithms within the Nextmv framework. It includes classes for configuring decision tree regressors.
CLASS | DESCRIPTION |
---|---|
DecisionTreeRegressorOptions | Options wrapper for scikit-learn's DecisionTreeRegressor. |
DECISION_TREE_REGRESSOR_PARAMETERS
module-attribute
¶
DECISION_TREE_REGRESSOR_PARAMETERS = [
Option(
name="criterion",
option_type=str,
choices=[
"squared_error",
"friedman_mse",
"absolute_error",
"poisson",
],
description="The function to measure the quality of a split.",
default="squared_error",
),
Option(
name="splitter",
option_type=str,
choices=["best", "random"],
description="The strategy used to choose the split at each node.",
default="best",
),
Option(
name="max_depth",
option_type=int,
description="The maximum depth of the tree.",
),
Option(
name="min_samples_split",
option_type=int,
description="The minimum number of samples required to split an internal node.",
),
Option(
name="min_samples_leaf",
option_type=int,
description="The minimum number of samples required to be at a leaf node.",
),
Option(
name="min_weight_fraction_leaf",
option_type=float,
description="The minimum weighted fraction of the sum total of weights required to be at a leaf node.",
),
Option(
name="max_features",
option_type=int,
description="The number of features to consider when looking for the best split.",
),
Option(
name="random_state",
option_type=int,
description="Controls the randomness of the estimator.",
),
Option(
name="max_leaf_nodes",
option_type=int,
description="Grow a tree with max_leaf_nodes in best-first fashion.",
),
Option(
name="min_impurity_decrease",
option_type=float,
description="A node will be split if this split induces a decrease of the impurity.",
),
Option(
name="ccp_alpha",
option_type=float,
description="Complexity parameter used for Minimal Cost-Complexity Pruning.",
),
]
List of Nextmv Option objects for configuring a DecisionTreeRegressor.
Each option corresponds to a hyperparameter of the scikit-learn DecisionTreeRegressor, providing a consistent interface for setting up decision tree regression models within the Nextmv ecosystem.
You can import DECISION_TREE_REGRESSOR_PARAMETERS directly from tree:
from nextmv_sklearn.tree import DECISION_TREE_REGRESSOR_PARAMETERS
DecisionTreeRegressorOptions
¶
Options for the sklearn.tree.DecisionTreeRegressor.
You can import the DecisionTreeRegressorOptions class directly from tree:
from nextmv_sklearn.tree import DecisionTreeRegressorOptions
A wrapper class for scikit-learn's DecisionTreeRegressor hyperparameters, providing a consistent interface for configuring decision tree regression models within the Nextmv ecosystem.
ATTRIBUTE | DESCRIPTION |
---|---|
params | List of Nextmv Option objects corresponding to DecisionTreeRegressor parameters. TYPE: list |
Examples:
>>> from nextmv_sklearn.tree import DecisionTreeRegressorOptions
>>> options = DecisionTreeRegressorOptions()
>>> nextmv_options = options.to_nextmv()
Initialize a DecisionTreeRegressorOptions instance.
Configures the default parameters for a decision tree regressor.
Source code in nextmv-scikit-learn/nextmv_sklearn/tree/options.py
to_nextmv
¶
Converts the options to a Nextmv options object.
Creates a Nextmv Options instance from the configured decision tree regressor parameters.
RETURNS | DESCRIPTION |
---|---|
Options | A Nextmv options object containing all decision tree regressor parameters. |
Examples:
>>> from nextmv_sklearn.tree import DecisionTreeRegressorOptions
>>> options = DecisionTreeRegressorOptions()
>>> nextmv_options = options.to_nextmv()
>>> # The options can then be supplied as CLI arguments, for example:
>>> # python script.py --criterion squared_error --max_depth 5
Source code in nextmv-scikit-learn/nextmv_sklearn/tree/options.py
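The to_nextmv example above mentions passing options such as --criterion and --max_depth on the command line. As a rough illustration of how an option's name, type, choices and default can map onto command-line flags, here is a sketch that uses only the standard-library argparse module. It is an analogy, not how Nextmv parses options; the SPECS list and build_parser function are hypothetical stand-ins for three entries of DECISION_TREE_REGRESSOR_PARAMETERS.

```python
import argparse

# Hypothetical, simplified stand-ins for three entries of
# DECISION_TREE_REGRESSOR_PARAMETERS: (name, type, choices, default).
SPECS = [
    ("criterion", str, ["squared_error", "friedman_mse", "absolute_error", "poisson"], "squared_error"),
    ("splitter", str, ["best", "random"], "best"),
    ("max_depth", int, None, None),
]


def build_parser() -> argparse.ArgumentParser:
    """Create one --flag per option spec."""
    parser = argparse.ArgumentParser(description="Decision tree regressor options")
    for name, option_type, choices, default in SPECS:
        parser.add_argument(f"--{name}", type=option_type, choices=choices, default=default)
    return parser


# Equivalent to: python script.py --criterion squared_error --max_depth 5
args = build_parser().parse_args(["--criterion", "squared_error", "--max_depth", "5"])
print(args.criterion, args.splitter, args.max_depth)  # squared_error best 5
```

In this sketch, max_depth is converted to an int by the parser, and splitter falls back to its default because it was not given on the command line.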
Solution¶
solution
¶
Defines sklearn.tree solution interoperability.
This module provides classes for working with scikit-learn tree models.
CLASS | DESCRIPTION |
---|---|
DecisionTreeRegressorSolution | Represents a scikit-learn DecisionTreeRegressor model, allowing conversion to and from a serializable format. |
DecisionTreeRegressorSolution
¶
Bases: BaseModel
Decision Tree Regressor scikit-learn model representation.
You can import the DecisionTreeRegressorSolution class directly from tree:
from nextmv_sklearn.tree import DecisionTreeRegressorSolution
This class provides functionality to convert between scikit-learn's DecisionTreeRegressor model and a serializable format. It enables saving and loading trained models through dictionaries or JSON.
PARAMETER | DESCRIPTION |
---|---|
max_features_ | The inferred value of max_features. TYPE: int |
n_features_in_ | Number of features seen during fit. TYPE: int |
feature_names_in_ | Names of features seen during fit. TYPE: ndarray |
n_outputs_ | The number of outputs when fit is performed. TYPE: int |
tree_ | The underlying Tree object. TYPE: Tree |
Examples:
>>> from sklearn.datasets import load_diabetes
>>> from sklearn.tree import DecisionTreeRegressor
>>> from nextmv_sklearn.tree import DecisionTreeRegressorSolution
>>>
>>> # Train a scikit-learn model
>>> X, y = load_diabetes(return_X_y=True)
>>> model = DecisionTreeRegressor().fit(X, y)
>>>
>>> # Convert to solution object
>>> solution = DecisionTreeRegressorSolution.from_model(model)
>>>
>>> # Convert to dictionary for serialization
>>> model_dict = solution.to_dict()
>>>
>>> # Recreate solution from dictionary
>>> restored = DecisionTreeRegressorSolution.from_dict(model_dict["attributes"])
>>>
>>> # Convert back to scikit-learn model
>>> restored_model = restored.to_model()
feature_names_in_
class-attribute
instance-attribute
¶
feature_names_in_: ndarray = None
Names of features seen during fit. Defined only when X has feature names that are all strings.
from_dict
classmethod
¶
from_dict(
data: dict[str, Any],
) -> DecisionTreeRegressorSolution
Creates a DecisionTreeRegressorSolution instance from a dictionary.
PARAMETER | DESCRIPTION |
---|---|
data | Dictionary containing the model attributes. TYPE: dict[str, Any] |
RETURNS | DESCRIPTION |
---|---|
DecisionTreeRegressorSolution | Instance of DecisionTreeRegressorSolution. |
Examples:
>>> solution_dict = {
... "max_features_": 10,
... "n_features_in_": 10,
... "n_outputs_": 1,
... "tree_": "base64encodedtreedata"
... }
>>> solution = DecisionTreeRegressorSolution.from_dict(solution_dict)
Source code in nextmv-scikit-learn/nextmv_sklearn/tree/solution.py
from_model
classmethod
¶
from_model(
model: DecisionTreeRegressor,
) -> DecisionTreeRegressorSolution
Creates a DecisionTreeRegressorSolution instance from a scikit-learn DecisionTreeRegressor model.
PARAMETER | DESCRIPTION |
---|---|
model | scikit-learn DecisionTreeRegressor model. TYPE: DecisionTreeRegressor |
RETURNS | DESCRIPTION |
---|---|
DecisionTreeRegressorSolution | Instance of DecisionTreeRegressorSolution. |
Examples:
>>> from sklearn.datasets import load_diabetes
>>> from sklearn.tree import DecisionTreeRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> model = DecisionTreeRegressor().fit(X, y)
>>> solution = DecisionTreeRegressorSolution.from_model(model)
Source code in nextmv-scikit-learn/nextmv_sklearn/tree/solution.py
max_features_
class-attribute
instance-attribute
¶
The inferred value of max_features.
model_config
class-attribute
instance-attribute
¶
n_features_in_
class-attribute
instance-attribute
¶
Number of features seen during fit.
n_outputs_
class-attribute
instance-attribute
¶
The number of outputs when fit is performed.
to_dict
¶
Convert a data model instance to a dict with associated class info.
RETURNS | DESCRIPTION |
---|---|
dict | Dictionary with class information and model attributes. |
The dictionary has two main keys:
- 'class': contains module and class name information.
- 'attributes': contains the serialized model attributes.
Examples:
>>> from nextmv_sklearn.tree import DecisionTreeRegressorSolution
>>> solution = DecisionTreeRegressorSolution(max_features_=10)
>>> solution_dict = solution.to_dict()
>>> print(solution_dict['class']['name'])
DecisionTreeRegressorSolution
Source code in nextmv-scikit-learn/nextmv_sklearn/tree/solution.py
to_model
¶
Transforms the DecisionTreeRegressorSolution instance into a scikit-learn DecisionTreeRegressor model.
RETURNS | DESCRIPTION |
---|---|
DecisionTreeRegressor | scikit-learn DecisionTreeRegressor model. |
Examples:
>>> from sklearn import tree
>>> from nextmv_sklearn.tree import DecisionTreeRegressorSolution
>>> solution = DecisionTreeRegressorSolution(max_features_=10, n_features_in_=10)
>>> model = solution.to_model()
>>> isinstance(model, tree.DecisionTreeRegressor)
True
Source code in nextmv-scikit-learn/nextmv_sklearn/tree/solution.py
Tree
module-attribute
¶
Tree = Annotated[
Tree,
BeforeValidator(lambda x: x),
PlainSerializer(lambda x: b64encode(dumps(x))),
]
Type annotation for handling scikit-learn Tree objects.
This type is annotated with Pydantic validators and serializers to handle the conversion between scikit-learn Tree objects and base64-encoded strings for JSON serialization.
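The serializer in the Tree annotation combines dumps and b64encode so that a binary object can be carried inside JSON. The sketch below shows that round trip using only the standard library. It assumes dumps refers to pickle.dumps, and it decodes with the matching pickle.loads and base64.b64decode calls; the encode and decode helpers and the tree_like stand-in value are hypothetical and are not part of the package.

```python
import base64
import pickle
from typing import Any


def encode(obj: Any) -> str:
    """Pickle an object and return it as base64 text that fits in JSON."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def decode(text: str) -> Any:
    """Reverse encode(): base64-decode the text, then unpickle the bytes."""
    return pickle.loads(base64.b64decode(text))


# Stand-in for a fitted Tree object; any picklable value behaves the same way.
tree_like = {"node_count": 3, "thresholds": [0.5, -2.0, -2.0]}
encoded = encode(tree_like)
restored = decode(encoded)
print(restored == tree_like)  # True
```

Unpickling can execute arbitrary code, so data encoded this way should only be loaded from sources you trust.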
Statistics¶
statistics
¶
Scikit-learn tree module statistics interoperability for Nextmv.
This module provides functionality to integrate scikit-learn tree-based models with Nextmv statistics tracking.
FUNCTION | DESCRIPTION |
---|---|
DecisionTreeRegressorStatistics | Convert a DecisionTreeRegressor model to Nextmv statistics format. |
DecisionTreeRegressorStatistics
¶
DecisionTreeRegressorStatistics(
model: DecisionTreeRegressor,
X: Iterable,
y: Iterable,
sample_weight: float = None,
run_duration_start: Optional[float] = None,
) -> Statistics
Create a Nextmv statistics object from a scikit-learn DecisionTreeRegressor model.
You can import the DecisionTreeRegressorStatistics function directly from tree:
from nextmv_sklearn.tree import DecisionTreeRegressorStatistics
Converts a trained scikit-learn DecisionTreeRegressor model into Nextmv statistics format. The statistics include model depth, feature importances, number of leaves, and model score. You can add custom metrics to the returned object after this function returns. The optional run_duration_start parameter can be used to track the total runtime of the modeling process.
PARAMETER | DESCRIPTION |
---|---|
model | The trained scikit-learn DecisionTreeRegressor model. TYPE: DecisionTreeRegressor |
X | The input features used for scoring the model. TYPE: Iterable |
y | The target values used for scoring the model. TYPE: Iterable |
sample_weight | The sample weights used for scoring, by default None. TYPE: float DEFAULT: None |
run_duration_start | The timestamp when the model run started, typically from time.time(), by default None. TYPE: Optional[float] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
Statistics | A Nextmv statistics object containing model performance metrics. |
Examples:
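>>> from sklearn.datasets import load_diabetes
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.tree import DecisionTreeRegressor
>>> from nextmv_sklearn.tree import DecisionTreeRegressorStatistics
>>> import time
>>>
>>> # Record start time
>>> start_time = time.time()
>>>
>>> # Prepare training and test data
>>> X, y = load_diabetes(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>>>
>>> # Train model
>>> model = DecisionTreeRegressor(max_depth=5)
>>> model.fit(X_train, y_train)
>>>
>>> # Create statistics
>>> stats = DecisionTreeRegressorStatistics(
... model, X_test, y_test, run_duration_start=start_time
... )
>>>
>>> # Add additional metrics
>>> custom_value = 42.0  # placeholder for a metric you compute yourself
>>> stats.result.custom["my_custom_metric"] = custom_value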
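The statistics are described as including model depth, feature importances, number of leaves and model score. The sketch below shows how those values can be read from a fitted tree with standard scikit-learn estimator methods (get_depth, get_n_leaves, feature_importances_ and score). It is not the package's implementation: the collect_tree_metrics and demo functions and the dictionary keys are hypothetical, and the real function returns a Nextmv Statistics object rather than a dictionary.

```python
import time
from typing import Any, Optional


def collect_tree_metrics(
    model: Any,
    X: Any,
    y: Any,
    sample_weight: Optional[Any] = None,
    run_duration_start: Optional[float] = None,
) -> dict[str, Any]:
    """Gather depth, feature importances, leaf count and score from a fitted tree."""
    # The key names below are illustrative, not the Nextmv statistics schema.
    metrics: dict[str, Any] = {
        "depth": model.get_depth(),
        "feature_importances_": list(model.feature_importances_),
        "n_leaves": model.get_n_leaves(),
        "score": model.score(X, y, sample_weight=sample_weight),
    }
    if run_duration_start is not None:
        # Elapsed wall-clock seconds since the caller recorded the start time.
        metrics["duration"] = time.time() - run_duration_start
    return metrics


def demo() -> dict[str, Any]:
    """Fit a small tree and collect its metrics (requires scikit-learn)."""
    from sklearn.datasets import load_diabetes
    from sklearn.tree import DecisionTreeRegressor

    start = time.time()
    X, y = load_diabetes(return_X_y=True)
    model = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
    return collect_tree_metrics(model, X, y, run_duration_start=start)
```

Calling demo() with scikit-learn installed returns a dictionary with the four metrics plus the elapsed duration in seconds.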