automl.model

BaseEstimator Objects

class BaseEstimator()

The abstract class for all learners.

Typical examples:

  • XGBoostEstimator: for regression.
  • XGBoostSklearnEstimator: for classification.
  • LGBMEstimator, RandomForestEstimator, LRL1Classifier, LRL2Classifier: for both regression and classification.

__init__

def __init__(task="binary", **config)

Constructor.

Arguments:

  • task - A string of the task type, one of 'binary', 'multiclass', 'regression', 'rank', 'seq-classification', 'seq-regression', 'token-classification', 'multichoice-classification', 'summarization', 'ts_forecast', 'ts_forecast_classification'.
  • config - A dictionary whose keys are hyperparameter names, including 'n_jobs', the number of parallel threads.

model

@property
def model()

Trained model after fit() is called, or None before fit() is called.

estimator

@property
def estimator()

Get the best trained estimator model.

Returns:

object or None: The trained model obtained after calling the fit() method, representing the best estimator found during the training process. If fit() has not been called yet, it returns None.

Examples:

from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train)
best_estimator = automl.model.estimator
print(best_estimator)  # e.g. RandomForestClassifier()

Notes:

To access the best estimator, use automl.model.estimator.

feature_names_in_

@property
def feature_names_in_()

The feature names of the trained model, resolved in this order:

  • if self.model has attribute feature_names_in_, return it;
  • otherwise, if self.model has attribute feature_name_, return it;
  • otherwise, if self._model has attribute feature_names, return it;
  • otherwise, if self._model has method get_booster, return the feature names;
  • otherwise, return None.
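
The lookup order above can be sketched as a plain fallback chain. This is a minimal illustration, not FLAML's implementation: it assumes the attribute names carry the trailing underscores used by scikit-learn and LightGBM (feature_names_in_, feature_name_), and the helper resolve_feature_names and both stand-in classes are hypothetical.

```python
def resolve_feature_names(model):
    """Return feature names from a fitted model, or None if unavailable."""
    # Try plain attributes in the documented order.
    for attr in ("feature_names_in_", "feature_name_", "feature_names"):
        if hasattr(model, attr):
            return getattr(model, attr)
    # Fall back to a booster object that exposes feature_names.
    if hasattr(model, "get_booster"):
        return model.get_booster().feature_names
    return None


class SklearnLike:  # hypothetical stand-in for a fitted scikit-learn model
    feature_names_in_ = ["age", "income"]


class BoosterLike:  # hypothetical stand-in for a model exposing get_booster()
    class _Booster:
        feature_names = ["f0", "f1"]

    def get_booster(self):
        return self._Booster()


print(resolve_feature_names(SklearnLike()))  # ['age', 'income']
print(resolve_feature_names(BoosterLike()))  # ['f0', 'f1']
print(resolve_feature_names(object()))       # None
```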

feature_importances_

@property
def feature_importances_()

If self.model has attribute feature_importances_, return it. Otherwise, if self.model has attribute coef_, return it. Otherwise, return None.
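
As a rough sketch of that fallback, assuming the scikit-learn attribute names feature_importances_ and coef_ (the helper and the stand-in classes below are hypothetical, not part of FLAML):

```python
def resolve_feature_importances(model):
    """Return importances, else coefficients, else None."""
    if hasattr(model, "feature_importances_"):
        return model.feature_importances_
    if hasattr(model, "coef_"):
        return model.coef_
    return None


class TreeLike:  # hypothetical stand-in for a fitted tree ensemble
    feature_importances_ = [0.7, 0.3]


class LinearLike:  # hypothetical stand-in for a fitted linear model
    coef_ = [1.5, -0.2]


print(resolve_feature_importances(TreeLike()))    # [0.7, 0.3]
print(resolve_feature_importances(LinearLike()))  # [1.5, -0.2]
print(resolve_feature_importances(object()))      # None
```

Note that coefficients can be negative, so callers that rank features may want to compare absolute values when the result comes from a linear model.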

fit

def fit(X_train, y_train, budget=None, free_mem_ratio=0, **kwargs)

Train the model from given training data.

Arguments:

  • X_train - A numpy array or a dataframe of training data in shape n*m.
  • y_train - A numpy array or a series of labels in shape n*1.
  • budget - A float of the time budget in seconds.
  • free_mem_ratio - A float between 0 and 1 for the free memory ratio to keep during training.

Returns:

  • train_time - A float of the training time in seconds.

predict

def predict(X, **kwargs)

Predict label from features.

Arguments:

  • X - A numpy array or a dataframe of featurized instances, shape n*m.

Returns:

A numpy array of shape n*1. Each element is the label for an instance.

predict_proba

def predict_proba(X, **kwargs)

Predict the probability of each class from features.

Only works for classification problems.

Arguments:

  • X - A numpy array of featurized instances, shape n*m.

Returns:

A numpy array of shape n*c, where c is the number of classes. Each element at (i,j) is the probability for instance i to be in class j.
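
To make the n*c layout concrete, the sketch below reads a small probability matrix row by row and picks the most likely class for each instance. The matrix values, the class ordering, and the helper predicted_labels are all assumptions for illustration; plain lists are used so the sketch runs without numpy.

```python
def predicted_labels(proba, classes):
    """Pick, for each row i, the class j with the highest probability."""
    labels = []
    for row in proba:
        best_j = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best_j])
    return labels


# Hypothetical output for n=3 instances and c=2 classes.
proba = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
]
classes = ["neg", "pos"]  # assumed ordering of the class labels

print(predicted_labels(proba, classes))  # ['neg', 'pos', 'neg']
```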

score

def score(X_val: DataFrame, y_val: Series, **kwargs)

Report the evaluation score of a trained estimator.

Arguments:

  • X_val - A pandas dataframe of the validation input data.
  • y_val - A pandas series of the validation label.
  • kwargs - keyword argument of the evaluation function, for example:
    • metric: A string of the metric name, or a function. Supported names include 'accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'f1', 'micro_f1', 'macro_f1', 'log_loss', 'mae', 'mse', 'r2', 'mape'. Default is 'auto'. If metric is given, the score reports the user-specified metric. If metric is not given, accuracy is used for classification and r2 for regression. To pass a customized metric function, see the examples in test/nlp/test_autohf_custom_metric.py and test/automl/test_multiclass.py.

Returns:

The evaluation score on the validation dataset.
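
The default-metric rule described for the metric argument can be sketched in plain Python. This is an illustration under stated assumptions, not FLAML's code: only the 'binary', 'multiclass' and 'regression' tasks are handled, and the (y_true, y_pred) signature for a callable metric is a simplification that does not match FLAML's custom-metric signature.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def sketch_score(y_true, y_pred, task, metric="auto"):
    """Resolve 'auto' to a default metric, then evaluate it."""
    if metric == "auto":
        # Assumption: only these three task names are covered in this sketch.
        if task in ("binary", "multiclass"):
            metric = "accuracy"
        elif task == "regression":
            metric = "r2"
        else:
            raise ValueError(f"no default metric in this sketch for {task!r}")
    if callable(metric):
        return metric(y_true, y_pred)
    return {"accuracy": accuracy, "r2": r2}[metric](y_true, y_pred)


print(sketch_score([1, 0, 1, 1], [1, 0, 0, 1], task="binary"))          # 0.75
print(sketch_score([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], task="regression"))  # 0.5
```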

search_space

@classmethod
def search_space(cls, data_size, task, **params)

[required method] Return the hyperparameter search space.

Arguments:

  • data_size - A tuple of two integers, number of rows and columns.
  • task - A str of the task type, e.g., "binary", "multiclass", "regression".

Returns:

A dictionary of the search space. Each key is the name of a hyperparameter, and value is a dict with its domain (required) and low_cost_init_value, init_value, cat_hp_cost (if applicable). e.g., {'domain': tune.randint(lower=1, upper=10), 'init_value': 1}.
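
A minimal sketch of a search_space implementation that returns a dictionary in this shape. The learner MyEstimator, the hyperparameter names, and the bounds are hypothetical. RandInt is a stand-in for a tuning domain such as tune.randint(lower=..., upper=...), used only so the sketch runs without FLAML installed.

```python
from collections import namedtuple

# Stand-in for a tuning domain; a real learner would use a tune.* domain here.
RandInt = namedtuple("RandInt", ["lower", "upper"])

# Keys named in the description above; only "domain" is required.
ALLOWED_KEYS = {"domain", "init_value", "low_cost_init_value", "cat_hp_cost"}


class MyEstimator:  # hypothetical learner; would subclass BaseEstimator in practice
    @classmethod
    def search_space(cls, data_size, task, **params):
        n_rows, n_cols = data_size
        # Illustrative choice: never allow more leaves than training rows.
        max_leaves = max(5, min(32768, n_rows))
        return {
            "n_estimators": {
                "domain": RandInt(lower=4, upper=1000),
                "init_value": 4,
                "low_cost_init_value": 4,
            },
            "max_leaves": {
                "domain": RandInt(lower=4, upper=max_leaves),
                "init_value": 4,
                "low_cost_init_value": 4,
            },
        }


space = MyEstimator.search_space(data_size=(1000, 20), task="binary")
for name, spec in space.items():
    assert "domain" in spec           # the domain is required
    assert set(spec) <= ALLOWED_KEYS  # every other key is optional
    print(name, spec["domain"])
```

Deriving a bound from data_size is one reason the method receives it: the space can shrink for small datasets instead of being fixed.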

size

@classmethod
def size(cls, config: dict) -> float

[optional method] memory size of the estimator in bytes.

Arguments:

  • config - A dict of the hyperparameter config.

Returns:

A float of the memory size required by the estimator to train the given config.

cost_relative2lgbm

@classmethod
def cost_relative2lgbm(cls) -> float

[optional method] relative cost compared to lightgbm.

init

@classmethod
def init(cls)

[optional method] initialize the class.

config2params

def config2params(config: dict) -> dict

[optional method] Convert a config dict to a params dict.

Arguments:

  • config - A dict of the hyperparameter config.

Returns:

A dict that will be passed to self.estimator_class's constructor.
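
One way such a conversion might look, as a hedged sketch: the tuner searches over a transformed hyperparameter, and config2params maps it back to the name and scale the underlying constructor expects. The function body, the log_max_bin and max_bin names, and the mapping itself are assumptions chosen for illustration.

```python
def config2params(config: dict) -> dict:
    """Translate a tuner config into constructor keyword arguments."""
    params = dict(config)  # copy so the caller's config is left untouched
    if "log_max_bin" in params:
        # Searched on a log2 scale; the constructor wants the actual bin count.
        params["max_bin"] = (1 << params.pop("log_max_bin")) - 1
    return params


config = {"n_estimators": 10, "log_max_bin": 8}
print(config2params(config))  # {'n_estimators': 10, 'max_bin': 255}
print(config)                 # {'n_estimators': 10, 'log_max_bin': 8}
```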

SparkEstimator Objects

class SparkEstimator(BaseEstimator)

The base class for fine-tuning spark models, using pyspark.ml and SynapseML API.

fit

def fit(X_train: psDataFrame, y_train: psSeries = None, budget=None, free_mem_ratio=0, index_col: str = "tmp_index_col", **kwargs)

Train the model from given training data.

Arguments:

  • X_train - A pyspark.pandas DataFrame of training data in shape n*m.
  • y_train - A pyspark.pandas Series in shape n*1. None if X_train is a pyspark.pandas DataFrame that already contains y_train.
  • budget - A float of the time budget in seconds.
  • free_mem_ratio - A float between 0 and 1 for the free memory ratio to keep during training.

Returns:

  • train_time - A float of the training time in seconds.

predict

def predict(X, index_col="tmp_index_col", return_all=False, **kwargs)

Predict label from features.

Arguments:

  • X - A pyspark or pyspark.pandas dataframe of featurized instances, shape n*m.
  • index_col - A str of the index column name. Default to "tmp_index_col".
  • return_all - A bool of whether to return all the prediction results. Default to False.

Returns:

A pyspark.pandas series of shape n*1 if return_all is False. Otherwise, a pyspark.pandas dataframe.

predict_proba

def predict_proba(X, index_col="tmp_index_col", return_all=False, **kwargs)

Predict the probability of each class from features. Only works for classification problems.

Arguments:

  • X - A pyspark or pyspark.pandas dataframe of featurized instances, shape n*m.
  • index_col - A str of the index column name. Default to "tmp_index_col".
  • return_all - A bool of whether to return all the prediction results. Default to False.

Returns:

A pyspark.pandas dataframe of shape n*c, where c is the number of classes. Each element at (i,j) is the probability for instance i to be in class j.

SparkLGBMEstimator Objects

class SparkLGBMEstimator(SparkEstimator)

The class for fine-tuning spark version lightgbm models, using SynapseML API.

SparkRandomForestEstimator Objects

class SparkRandomForestEstimator(SparkEstimator)

The SparkEstimator class for Random Forest.

predict

def predict(X, index_col="tmp_index_col", return_all=False, **kwargs)

Predict label from features.

Arguments:

  • X - A pyspark or pyspark.pandas dataframe of featurized instances, shape n*m.
  • index_col - A str of the index column name. Default to "tmp_index_col".
  • return_all - A bool of whether to return all the prediction results. Default to False.

Returns:

A pyspark.pandas series of shape n*1 if return_all is False. Otherwise, a pyspark.pandas dataframe.

TransformersEstimator Objects

class TransformersEstimator(BaseEstimator)

The class for fine-tuning language models, using huggingface transformers API.

SKLearnEstimator Objects

class SKLearnEstimator(BaseEstimator)

The base class for tuning scikit-learn estimators.

Subclasses can modify the function signature of __init__ to ignore the values in config that are not relevant to the constructor of their underlying estimator. For example, some regressors in scikit-learn don't accept the n_jobs parameter contained in config. For these, one can add n_jobs=None, before **config to make sure config doesn't contain an n_jobs key.
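
The n_jobs technique described above relies on ordinary Python keyword capture: a named parameter placed before **config absorbs the matching key. A minimal sketch follows, with BaseLike as a hypothetical stand-in for SKLearnEstimator and NoNJobsRegressor as a hypothetical subclass.

```python
class BaseLike:  # hypothetical stand-in for SKLearnEstimator
    def __init__(self, task="binary", **config):
        self.task = task
        self.params = dict(config)  # what would reach the underlying estimator


class NoNJobsRegressor(BaseLike):  # hypothetical subclass
    def __init__(self, task="regression", n_jobs=None, **config):
        # n_jobs is bound to the named parameter above, so it never lands
        # in config and is not forwarded to the parent constructor.
        super().__init__(task, **config)


est = NoNJobsRegressor(task="regression", n_jobs=4, alpha=0.1)
print(est.params)  # {'alpha': 0.1}
```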

LGBMEstimator Objects

class LGBMEstimator(BaseEstimator)

The class for tuning LGBM, using sklearn API.

XGBoostEstimator Objects

class XGBoostEstimator(SKLearnEstimator)

The class for tuning XGBoost regressor, not using sklearn API.

XGBoostSklearnEstimator Objects

class XGBoostSklearnEstimator(SKLearnEstimator, LGBMEstimator)

The class for tuning XGBoost with unlimited depth, using sklearn API.

XGBoostLimitDepthEstimator Objects

class XGBoostLimitDepthEstimator(XGBoostSklearnEstimator)

The class for tuning XGBoost with limited depth, using sklearn API.

RandomForestEstimator Objects

class RandomForestEstimator(SKLearnEstimator, LGBMEstimator)

The class for tuning Random Forest.

ExtraTreesEstimator Objects

class ExtraTreesEstimator(RandomForestEstimator)

The class for tuning Extra Trees.

LRL1Classifier Objects

class LRL1Classifier(SKLearnEstimator)

The class for tuning Logistic Regression with L1 regularization.

LRL2Classifier Objects

class LRL2Classifier(SKLearnEstimator)

The class for tuning Logistic Regression with L2 regularization.

CatBoostEstimator Objects

class CatBoostEstimator(BaseEstimator)

The class for tuning CatBoost.

SVCEstimator Objects

class SVCEstimator(SKLearnEstimator)

The class for tuning Linear Support Vector Machine Classifier.

predict_proba

def predict_proba(X, **kwargs)

Predict the probability of each class from features.

Only works for classification problems.

Arguments:

  • X - A numpy array of featurized instances, shape n*m.

Returns:

A numpy array of shape n*c, where c is the number of classes. Each element at (i,j) is the probability for instance i to be in class j.

SparkNaiveBayesEstimator Objects

class SparkNaiveBayesEstimator(SparkEstimator)

The class for tuning Naive Bayes Classifier.

SGDEstimator Objects

class SGDEstimator(SKLearnEstimator)

The class for tuning Stochastic Gradient Descent model.

predict_proba

def predict_proba(X, **kwargs)

Predict the probability of each class from features.

Only works for classification problems.

Arguments:

  • X - A numpy array of featurized instances, shape n*m.

Returns:

A numpy array of shape n*c, where c is the number of classes. Each element at (i,j) is the probability for instance i to be in class j.

ElasticNetEstimator Objects

class ElasticNetEstimator(SKLearnEstimator)

The class for tuning Elastic Net regression model.

LassoLarsEstimator Objects

class LassoLarsEstimator(SKLearnEstimator)

The class for tuning Lasso model fit with Least Angle Regression a.k.a. Lars.

SparkGLREstimator Objects

class SparkGLREstimator(SparkEstimator)

The class for tuning Generalized Linear Regression PySpark model.

SparkLinearRegressionEstimator Objects

class SparkLinearRegressionEstimator(SparkEstimator)

The class for tuning Linear Regression PySpark model.

SparkLinearSVCEstimator Objects

class SparkLinearSVCEstimator(SparkEstimator)

The class for tuning Linear SVC PySpark model.

SparkGBTEstimator Objects

class SparkGBTEstimator(SparkEstimator)

The class for tuning GBT PySpark model.

SparkAFTSurvivalRegressionEstimator Objects

class SparkAFTSurvivalRegressionEstimator(SparkEstimator)

The class for tuning AFTSurvivalRegression PySpark model.