Options

Use options to capture the parameters (i.e., the configuration) for a run. Each <Model>Options class captures the native parameters that the corresponding model needs to be instantiated, and its to_nextmv() method converts them to nextmv options for convenience.

Dummy

Reference

Find the reference for the dummy.options module here.

from nextmv_sklearn import dummy

options = dummy.DummyRegressorOptions().to_nextmv()
options.parse()

$ python main.py --help
usage: main.py [options]

Options for main.py. Use command-line arguments (highest precedence) or environment variables.

options:
  -h, --help            show this help message and exit
  -strategy {mean,median,quantile,constant}, --strategy {mean,median,quantile,constant}
                        [env var: STRATEGY] (type: str): Strategy to use to generate predictions.
  -constant CONSTANT, --constant CONSTANT
                        [env var: CONSTANT] (type: float): The explicit constant as predicted by the "constant" strategy.
  -quantile QUANTILE, --quantile QUANTILE
                        [env var: QUANTILE] (type: float): The quantile to predict using the "quantile" strategy.
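
The help text above states that command-line arguments take the highest precedence, with environment variables as the alternative source. The following is a minimal standard-library sketch of that precedence rule, not the nextmv implementation: the `build_parser` and `resolve` helpers are hypothetical names introduced for illustration, and only two of the options shown above are reproduced.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults fall back to environment variables.

    Hypothetical stand-in for main.py; it only mimics the precedence rule.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Options for main.py (sketch).")
    parser.add_argument(
        "-strategy",
        "--strategy",
        choices=["mean", "median", "quantile", "constant"],
        default=env.get("STRATEGY"),  # used only when the flag is absent
        help="Strategy to use to generate predictions.",
    )
    parser.add_argument(
        "-quantile",
        "--quantile",
        type=float,
        default=float(env["QUANTILE"]) if "QUANTILE" in env else None,
        help='The quantile to predict using the "quantile" strategy.',
    )
    return parser


def resolve(argv, env):
    """Return the resolved option values as a plain dictionary."""
    return vars(build_parser(env).parse_args(argv))


# The command-line value wins over the environment variable.
from_cli = resolve(["--strategy", "median"], {"STRATEGY": "mean"})

# Without a flag, the environment variable supplies the value.
from_env = resolve([], {"STRATEGY": "mean"})
```

In this sketch an option that is set in neither place resolves to `None`; whether the real options behave the same way is not shown on this page.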

Ensemble

Reference

Find the reference for the ensemble.options module here.

GradientBoostingRegressor

from nextmv_sklearn import ensemble

options = ensemble.GradientBoostingRegressorOptions().to_nextmv()
options.parse()

$ python main.py --help
usage: main.py [options]

Options for main.py. Use command-line arguments (highest precedence) or environment variables.

options:
  -h, --help            show this help message and exit
  -loss {squared_error,absolute_error,huber,quantile}, --loss {squared_error,absolute_error,huber,quantile}
                        [env var: LOSS] (type: str): Loss function to be optimized.
  -learning_rate LEARNING_RATE, --learning_rate LEARNING_RATE
                        [env var: LEARNING_RATE] (type: float): Learning rate shrinks the contribution of each tree by learning_rate.
  -n_estimators N_ESTIMATORS, --n_estimators N_ESTIMATORS
                        [env var: N_ESTIMATORS] (type: int): The number of boosting stages to perform.
  -subsample SUBSAMPLE, --subsample SUBSAMPLE
                        [env var: SUBSAMPLE] (type: float): The fraction of samples to be used for fitting the individual base learners.
  -criterion {friedman_mse,squared_error}, --criterion {friedman_mse,squared_error}
                        [env var: CRITERION] (type: str): The function to measure the quality of a split.
  -min_samples_split MIN_SAMPLES_SPLIT, --min_samples_split MIN_SAMPLES_SPLIT
                        [env var: MIN_SAMPLES_SPLIT] (type: int): The minimum number of samples required to split an internal node.
  -min_samples_leaf MIN_SAMPLES_LEAF, --min_samples_leaf MIN_SAMPLES_LEAF
                        [env var: MIN_SAMPLES_LEAF] (type: int): The minimum number of samples required to be at a leaf node.
  -min_weight_fraction_leaf MIN_WEIGHT_FRACTION_LEAF, --min_weight_fraction_leaf MIN_WEIGHT_FRACTION_LEAF
                        [env var: MIN_WEIGHT_FRACTION_LEAF] (type: float): The minimum weighted fraction of the sum total of weights required to be at a leaf node.
  -max_depth MAX_DEPTH, --max_depth MAX_DEPTH
                        [env var: MAX_DEPTH] (type: int): Maximum depth of the individual regression estimators.
  -min_impurity_decrease MIN_IMPURITY_DECREASE, --min_impurity_decrease MIN_IMPURITY_DECREASE
                        [env var: MIN_IMPURITY_DECREASE] (type: float): A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
  -random_state RANDOM_STATE, --random_state RANDOM_STATE
                        [env var: RANDOM_STATE] (type: int): Controls the random seed given to each Tree estimator at each boosting iteration.
  -max_features MAX_FEATURES, --max_features MAX_FEATURES
                        [env var: MAX_FEATURES] (type: int): The number of features to consider when looking for the best split.
  -alpha ALPHA, --alpha ALPHA
                        [env var: ALPHA] (type: float): The alpha-quantile of the huber loss function and the quantile loss function.
  -max_leaf_nodes MAX_LEAF_NODES, --max_leaf_nodes MAX_LEAF_NODES
                        [env var: MAX_LEAF_NODES] (type: int): Grow trees with max_leaf_nodes in best-first fashion.
  -warm_start WARM_START, --warm_start WARM_START
                        [env var: WARM_START] (type: bool): When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise,
                        just erase the previous solution.
  -validation_fraction VALIDATION_FRACTION, --validation_fraction VALIDATION_FRACTION
                        [env var: VALIDATION_FRACTION] (type: float): The proportion of training data to set aside as validation set for early stopping.
  -n_iter_no_change N_ITER_NO_CHANGE, --n_iter_no_change N_ITER_NO_CHANGE
                        [env var: N_ITER_NO_CHANGE] (type: int): n_iter_no_change is used to decide if early stopping will be used to terminate training when validation score is
                        not improving.
  -tol TOL, --tol TOL   [env var: TOL] (type: float): Tolerance for the early stopping.
  -ccp_alpha CCP_ALPHA, --ccp_alpha CCP_ALPHA
                        [env var: CCP_ALPHA] (type: float): Complexity parameter used for Minimal Cost-Complexity Pruning.

RandomForestRegressor

from nextmv_sklearn import ensemble

options = ensemble.RandomForestRegressorOptions().to_nextmv()
options.parse()

$ python main.py --help
usage: main.py [options]

Options for main.py. Use command-line arguments (highest precedence) or environment variables.

options:
  -h, --help            show this help message and exit
  -n_estimators N_ESTIMATORS, --n_estimators N_ESTIMATORS
                        [env var: N_ESTIMATORS] (type: int): The number of trees in the forest.
  -criterion {squared_error,absolute_error,friedman_mse,poisson}, --criterion {squared_error,absolute_error,friedman_mse,poisson}
                        [env var: CRITERION] (type: str): The function to measure the quality of a split.
  -max_depth MAX_DEPTH, --max_depth MAX_DEPTH
                        [env var: MAX_DEPTH] (type: int): The maximum depth of the tree.
  -min_samples_split MIN_SAMPLES_SPLIT, --min_samples_split MIN_SAMPLES_SPLIT
                        [env var: MIN_SAMPLES_SPLIT] (type: int): The minimum number of samples required to split an internal node.
  -min_samples_leaf MIN_SAMPLES_LEAF, --min_samples_leaf MIN_SAMPLES_LEAF
                        [env var: MIN_SAMPLES_LEAF] (type: int): The minimum number of samples required to be at a leaf node.
  -min_weight_fraction_leaf MIN_WEIGHT_FRACTION_LEAF, --min_weight_fraction_leaf MIN_WEIGHT_FRACTION_LEAF
                        [env var: MIN_WEIGHT_FRACTION_LEAF] (type: float): The minimum weighted fraction of the sum total of weights required to be at a leaf node.
  -max_features MAX_FEATURES, --max_features MAX_FEATURES
                        [env var: MAX_FEATURES] (type: int): The number of features to consider when looking for the best split.
  -max_leaf_nodes MAX_LEAF_NODES, --max_leaf_nodes MAX_LEAF_NODES
                        [env var: MAX_LEAF_NODES] (type: int): Grow trees with max_leaf_nodes in best-first fashion.
  -min_impurity_decrease MIN_IMPURITY_DECREASE, --min_impurity_decrease MIN_IMPURITY_DECREASE
                        [env var: MIN_IMPURITY_DECREASE] (type: float): A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
  -bootstrap BOOTSTRAP, --bootstrap BOOTSTRAP
                        [env var: BOOTSTRAP] (type: bool): Whether bootstrap samples are used when building trees.
  -oob_score OOB_SCORE, --oob_score OOB_SCORE
                        [env var: OOB_SCORE] (type: bool): Whether to use out-of-bag samples to estimate the generalization score.
  -n_jobs N_JOBS, --n_jobs N_JOBS
                        [env var: N_JOBS] (type: int): The number of jobs to run in parallel.
  -random_state RANDOM_STATE, --random_state RANDOM_STATE
                        [env var: RANDOM_STATE] (type: int): Controls both the randomness of the bootstrapping of the samples used when building trees and the sampling of the
                        features.
  -verbose VERBOSE, --verbose VERBOSE
                        [env var: VERBOSE] (type: int): Controls the verbosity when fitting and predicting.
  -warm_start WARM_START, --warm_start WARM_START
                        [env var: WARM_START] (type: bool): When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise,
                        just erase the previous solution.
  -ccp_alpha CCP_ALPHA, --ccp_alpha CCP_ALPHA
                        [env var: CCP_ALPHA] (type: float): Complexity parameter used for Minimal Cost-Complexity Pruning.
  -max_samples MAX_SAMPLES, --max_samples MAX_SAMPLES
                        [env var: MAX_SAMPLES] (type: int): If bootstrap is True, the number of samples to draw from X to train each base estimator.
  -monotonic_cst MONOTONIC_CST, --monotonic_cst MONOTONIC_CST
                        [env var: MONOTONIC_CST] (type: int): Indicates the monotonicity constraint to enforce on each feature.

Linear model

Reference

Find the reference for the linear_model.options module here.

from nextmv_sklearn import linear_model

options = linear_model.LinearRegressionOptions().to_nextmv()
options.parse()

$ python main.py --help
usage: main.py [options]

Options for main.py. Use command-line arguments (highest precedence) or environment variables.

options:
  -h, --help            show this help message and exit
  -fit_intercept FIT_INTERCEPT, --fit_intercept FIT_INTERCEPT
                        [env var: FIT_INTERCEPT] (type: bool): Whether to calculate the intercept for this model.
  -copy_X COPY_X, --copy_X COPY_X
                        [env var: COPY_X] (type: bool): If True, X will be copied; else, it may be overwritten.
  -n_jobs N_JOBS, --n_jobs N_JOBS
                        [env var: N_JOBS] (type: int): The number of jobs to use for the computation.
  -positive POSITIVE, --positive POSITIVE
                        [env var: POSITIVE] (type: bool): When set to True, forces the coefficients to be positive.
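
Several options above are of type bool but still expect a value on the command line (for example `--fit_intercept FIT_INTERCEPT`). This page does not document which spellings are accepted, so the sketch below is only an assumption about one reasonable way to convert such a value with the standard library. The `str_to_bool` helper and its accepted spellings are hypothetical. Note that plain `bool("False")` evaluates to `True` in Python, which is why an explicit converter is needed.

```python
import argparse


def str_to_bool(value):
    """Convert a command-line string to a bool (accepted spellings are assumed)."""
    text = str(value).strip().lower()
    if text in {"true", "1", "yes", "y"}:
        return True
    if text in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


# Hypothetical parser that mirrors two of the boolean options listed above.
parser = argparse.ArgumentParser()
parser.add_argument("-fit_intercept", "--fit_intercept", type=str_to_bool)
parser.add_argument("-positive", "--positive", type=str_to_bool)

args = parser.parse_args(["--fit_intercept", "false", "--positive", "True"])
```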

Neural network

Reference

Find the reference for the neural_network.options module here.

from nextmv_sklearn import neural_network

options = neural_network.MLPRegressorOptions().to_nextmv()
options.parse()

$ python main.py --help
usage: main.py [options]

Options for main.py. Use command-line arguments (highest precedence) or environment variables.

options:
  -h, --help            show this help message and exit
  -hidden_layer_sizes HIDDEN_LAYER_SIZES, --hidden_layer_sizes HIDDEN_LAYER_SIZES
                        [env var: HIDDEN_LAYER_SIZES] (type: str): The ith element represents the number of neurons in the ith hidden layer. (e.g. "1,2,3")
  -activation {identity,logistic,tanh,relu}, --activation {identity,logistic,tanh,relu}
                        [env var: ACTIVATION] (type: str): Activation function for the hidden layer.
  -solver {lbfgs,sgd,adam}, --solver {lbfgs,sgd,adam}
                        [env var: SOLVER] (type: str): The solver for weight optimization.
  -alpha ALPHA, --alpha ALPHA
                        [env var: ALPHA] (type: float): Strength of the L2 regularization term.
  -batch_size BATCH_SIZE, --batch_size BATCH_SIZE
                        [env var: BATCH_SIZE] (type: int): Size of minibatches for stochastic optimizers.
  -learning_rate {constant,invscaling,adaptive}, --learning_rate {constant,invscaling,adaptive}
                        [env var: LEARNING_RATE] (type: str): Learning rate schedule for weight updates.
  -learning_rate_init LEARNING_RATE_INIT, --learning_rate_init LEARNING_RATE_INIT
                        [env var: LEARNING_RATE_INIT] (type: float): The initial learning rate used.
  -power_t POWER_T, --power_t POWER_T
                        [env var: POWER_T] (type: float): The exponent for inverse scaling learning rate.
  -max_iter MAX_ITER, --max_iter MAX_ITER
                        [env var: MAX_ITER] (type: int): Maximum number of iterations.
  -shuffle SHUFFLE, --shuffle SHUFFLE
                        [env var: SHUFFLE] (type: bool): Whether to shuffle samples in each iteration.
  -random_state RANDOM_STATE, --random_state RANDOM_STATE
                        [env var: RANDOM_STATE] (type: int): Determines random number generation for weights and bias initialization, train-test split if early stopping is used,
                        and batch sampling when solver='sgd' or 'adam'.
  -tol TOL, --tol TOL   [env var: TOL] (type: float): Tolerance for the optimization.
  -verbose VERBOSE, --verbose VERBOSE
                        [env var: VERBOSE] (type: bool): Whether to print progress messages to stdout.
  -warm_start WARM_START, --warm_start WARM_START
                        [env var: WARM_START] (type: bool): When set to True, reuse the solution of the previous call to fit as initialization.
  -momentum MOMENTUM, --momentum MOMENTUM
                        [env var: MOMENTUM] (type: float): Momentum for gradient descent update.
  -nesterovs_momentum NESTEROVS_MOMENTUM, --nesterovs_momentum NESTEROVS_MOMENTUM
                        [env var: NESTEROVS_MOMENTUM] (type: bool): Whether to use Nesterov's momentum.
  -early_stopping EARLY_STOPPING, --early_stopping EARLY_STOPPING
                        [env var: EARLY_STOPPING] (type: bool): Whether to use early stopping to terminate training when validation score is not improving.
  -validation_fraction VALIDATION_FRACTION, --validation_fraction VALIDATION_FRACTION
                        [env var: VALIDATION_FRACTION] (type: float): The proportion of training data to set aside as validation set for early stopping.
  -beta_1 BETA_1, --beta_1 BETA_1
                        [env var: BETA_1] (type: float): Exponential decay rate for estimates of first moment vector in adam.
  -beta_2 BETA_2, --beta_2 BETA_2
                        [env var: BETA_2] (type: float): Exponential decay rate for estimates of second moment vector in adam.
  -epsilon EPSILON, --epsilon EPSILON
                        [env var: EPSILON] (type: float): Value for numerical stability in adam.
  -n_iter_no_change N_ITER_NO_CHANGE, --n_iter_no_change N_ITER_NO_CHANGE
                        [env var: N_ITER_NO_CHANGE] (type: int): Maximum number of epochs to not meet tol improvement.
  -max_fun MAX_FUN, --max_fun MAX_FUN
                        [env var: MAX_FUN] (type: int): Only used when solver='lbfgs'.
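
The hidden_layer_sizes option is passed as a comma-separated string such as "1,2,3", where the ith element is the number of neurons in the ith hidden layer. How the library converts that string internally is not shown on this page. The sketch below is one way to turn it into a tuple of integers, under that assumption; `parse_hidden_layer_sizes` is a hypothetical helper.

```python
def parse_hidden_layer_sizes(raw):
    """Turn a string like "1,2,3" into a tuple of positive integers."""
    # Split on commas and ignore empty pieces such as a trailing comma.
    sizes = tuple(int(part.strip()) for part in raw.split(",") if part.strip())
    if not sizes:
        raise ValueError("at least one hidden layer size is required")
    if any(size <= 0 for size in sizes):
        raise ValueError("hidden layer sizes must be positive integers")
    return sizes


# Three hidden layers with 1, 2 and 3 neurons respectively.
layers = parse_hidden_layer_sizes("1,2,3")
```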

Tree

Reference

Find the reference for the tree.options module here.

from nextmv_sklearn import tree

options = tree.DecisionTreeRegressorOptions().to_nextmv()
options.parse()

$ python main.py --help
usage: main.py [options]

Options for main.py. Use command-line arguments (highest precedence) or environment variables.

options:
  -h, --help            show this help message and exit
  -criterion {squared_error,friedman_mse,absolute_error,poisson}, --criterion {squared_error,friedman_mse,absolute_error,poisson}
                        [env var: CRITERION] (default: squared_error) (type: str): The function to measure the quality of a split.
  -splitter {best,random}, --splitter {best,random}
                        [env var: SPLITTER] (default: best) (type: str): The strategy used to choose the split at each node.
  -max_depth MAX_DEPTH, --max_depth MAX_DEPTH
                        [env var: MAX_DEPTH] (type: int): The maximum depth of the tree.
  -min_samples_split MIN_SAMPLES_SPLIT, --min_samples_split MIN_SAMPLES_SPLIT
                        [env var: MIN_SAMPLES_SPLIT] (type: int): The minimum number of samples required to split an internal node.
  -min_samples_leaf MIN_SAMPLES_LEAF, --min_samples_leaf MIN_SAMPLES_LEAF
                        [env var: MIN_SAMPLES_LEAF] (type: int): The minimum number of samples required to be at a leaf node.
  -min_weight_fraction_leaf MIN_WEIGHT_FRACTION_LEAF, --min_weight_fraction_leaf MIN_WEIGHT_FRACTION_LEAF
                        [env var: MIN_WEIGHT_FRACTION_LEAF] (type: float): The minimum weighted fraction of the sum total of weights required to be at a leaf node.
  -max_features MAX_FEATURES, --max_features MAX_FEATURES
                        [env var: MAX_FEATURES] (type: int): The number of features to consider when looking for the best split.
  -random_state RANDOM_STATE, --random_state RANDOM_STATE
                        [env var: RANDOM_STATE] (type: int): Controls the randomness of the estimator.
  -max_leaf_nodes MAX_LEAF_NODES, --max_leaf_nodes MAX_LEAF_NODES
                        [env var: MAX_LEAF_NODES] (type: int): Grow a tree with max_leaf_nodes in best-first fashion.
  -min_impurity_decrease MIN_IMPURITY_DECREASE, --min_impurity_decrease MIN_IMPURITY_DECREASE
                        [env var: MIN_IMPURITY_DECREASE] (type: float): A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
  -ccp_alpha CCP_ALPHA, --ccp_alpha CCP_ALPHA
                        [env var: CCP_ALPHA] (type: float): Complexity parameter used for Minimal Cost-Complexity Pruning.

Merge options together

You can merge nextmv.Options together using the merge method.

from nextmv_sklearn import dummy, linear_model

opt1 = linear_model.LinearRegressionOptions().to_nextmv()
opt2 = dummy.DummyRegressorOptions().to_nextmv()

options = opt1.merge(opt2)

$ python main.py --help
usage: main.py [options]

Options for main.py. Use command-line arguments (highest precedence) or environment variables.

options:
  -h, --help            show this help message and exit
  -fit_intercept FIT_INTERCEPT, --fit_intercept FIT_INTERCEPT
                        [env var: FIT_INTERCEPT] (type: bool): Whether to calculate the intercept for this model.
  -copy_X COPY_X, --copy_X COPY_X
                        [env var: COPY_X] (type: bool): If True, X will be copied; else, it may be overwritten.
  -n_jobs N_JOBS, --n_jobs N_JOBS
                        [env var: N_JOBS] (type: int): The number of jobs to use for the computation.
  -positive POSITIVE, --positive POSITIVE
                        [env var: POSITIVE] (type: bool): When set to True, forces the coefficients to be positive.
  -strategy {mean,median,quantile,constant}, --strategy {mean,median,quantile,constant}
                        [env var: STRATEGY] (type: str): Strategy to use to generate predictions.
  -constant CONSTANT, --constant CONSTANT
                        [env var: CONSTANT] (type: float): The explicit constant as predicted by the "constant" strategy.
  -quantile QUANTILE, --quantile QUANTILE
                        [env var: QUANTILE] (type: float): The quantile to predict using the "quantile" strategy.

Notice that the merged options contain both the LinearRegressionOptions and the DummyRegressorOptions, so you can access the options from either set through a single object.
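
As a rough standard-library analogy for the merge behavior, two independent argument parsers can be combined into one with the `parents` argument of `argparse.ArgumentParser`. This is only a sketch to illustrate the idea of accessing both sets from one object. It is not how `nextmv.Options.merge` is implemented, and the two builder functions are hypothetical.

```python
import argparse


def linear_regression_parser():
    """Subset of the LinearRegression options shown above (sketch)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-n_jobs", "--n_jobs", type=int)
    parser.add_argument("-copy_X", "--copy_X")
    return parser


def dummy_regressor_parser():
    """Subset of the DummyRegressor options shown above (sketch)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        "-strategy",
        "--strategy",
        choices=["mean", "median", "quantile", "constant"],
    )
    parser.add_argument("-quantile", "--quantile", type=float)
    return parser


# One parser that exposes the arguments of both parents.
merged = argparse.ArgumentParser(
    parents=[linear_regression_parser(), dummy_regressor_parser()]
)
args = merged.parse_args(["--n_jobs", "2", "--strategy", "median"])

# Values from both sets are available on the same namespace.
n_jobs, strategy = args.n_jobs, args.strategy
```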