automl.model
BaseEstimator Objects
class BaseEstimator()
The abstract class for all learners.
Typical examples:
- XGBoostEstimator: for regression.
- XGBoostSklearnEstimator: for classification.
- LGBMEstimator, RandomForestEstimator, LRL1Classifier, LRL2Classifier: for both regression and classification.
__init__
def __init__(task="binary", **config)
Constructor.
Arguments:
task
- A string of the task type, one of 'binary', 'multiclass', 'regression', 'rank', 'seq-classification', 'seq-regression', 'token-classification', 'multichoice-classification', 'summarization', 'ts_forecast', 'ts_forecast_classification'.
config
- A dictionary containing the hyperparameter names, 'n_jobs' as keys. n_jobs is the number of parallel threads.
model
@property
def model()
Trained model after fit() is called, or None before fit() is called.
estimator
@property
def estimator()
Get the best trained estimator model.
Returns:
object or None: The trained model obtained after calling the fit() method, representing the best estimator found during the training process. If fit() has not been called yet, it returns None.
Examples:
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train)
best_estimator = automl.model.estimator
print(best_estimator)  # e.g., RandomForestClassifier()
Notes:
To access the best estimator, use automl.model.estimator.
feature_names_in_
@property
def feature_names_in_()
If self._model has attribute feature_names_in_, return it.
Otherwise, if self._model has attribute feature_name_, return it.
Otherwise, if self._model has attribute feature_names, return it.
Otherwise, if self._model has method get_booster, return the feature names.
Otherwise, return None.
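The fallback order above can be sketched as a chain of attribute checks. This is a minimal illustration, not FLAML's implementation: `resolve_feature_names` and the stand-in objects are hypothetical, the attribute names `feature_names_in_` and `feature_name_` are assumed to follow the scikit-learn and LightGBM conventions, and the booster is assumed to expose its names as `feature_names`.

```python
from types import SimpleNamespace


def resolve_feature_names(model):
    """Return feature names from the first matching attribute, else None."""
    # Attribute names are assumptions based on common library conventions.
    for attr in ("feature_names_in_", "feature_name_", "feature_names"):
        if hasattr(model, attr):
            return getattr(model, attr)
    if hasattr(model, "get_booster"):
        # Assumed: the booster object carries a `feature_names` attribute.
        return model.get_booster().feature_names
    return None


# Hypothetical stand-ins for trained models from different libraries.
sklearn_like = SimpleNamespace(feature_names_in_=["age", "income"])
lgbm_like = SimpleNamespace(feature_name_=["f0", "f1"])
booster_like = SimpleNamespace(
    get_booster=lambda: SimpleNamespace(feature_names=["x", "y"])
)
opaque = SimpleNamespace()  # exposes none of the known attributes

names = [
    resolve_feature_names(m)
    for m in (sklearn_like, lgbm_like, booster_like, opaque, None)
]
```

Checking attributes in a fixed order keeps the property usable across learners that expose feature names differently, and returning None covers both an untrained model and a model with no name metadata.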
feature_importances_
@property
def feature_importances_()
If self._model has attribute feature_importances_, return it.
Otherwise, if self._model has attribute coef_, return it.
Otherwise, return None.
fit
def fit(X_train, y_train, budget=None, free_mem_ratio=0, **kwargs)
Train the model from given training data.
Arguments:
X_train
- A numpy array or a dataframe of training data in shape n*m.
y_train
- A numpy array or a series of labels in shape n*1.
budget
- A float of the time budget in seconds.
free_mem_ratio
- A float between 0 and 1 for the free memory ratio to keep during training.
Returns:
train_time
- A float of the training time in seconds.
predict
def predict(X, **kwargs)
Predict label from features.
Arguments:
X
- A numpy array or a dataframe of featurized instances, shape n*m.
Returns:
A numpy array of shape n*1. Each element is the label for an instance.
predict_proba
def predict_proba(X, **kwargs)
Predict the probability of each class from features.
Only works for classification problems.
Arguments:
X
- A numpy array of featurized instances, shape n*m.
Returns:
A numpy array of shape n*c, where c is the number of classes. Each element at (i,j) is the probability for instance i to be in class j.
score
def score(X_val: DataFrame, y_val: Series, **kwargs)
Report the evaluation score of a trained estimator.
Arguments:
X_val
- A pandas dataframe of the validation input data.
y_val
- A pandas series of the validation label.
kwargs
- Keyword arguments of the evaluation function, for example:
  - metric: A string of the metric name or a function, e.g., 'accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'f1', 'micro_f1', 'macro_f1', 'log_loss', 'mae', 'mse', 'r2', 'mape'. Default is 'auto'. If metric is given, the score will report the user-specified metric. If metric is not given, the metric is set to accuracy for classification and r2 for regression. You can also pass a customized metric function; for examples of how to do so, see test/nlp/test_autohf_custom_metric.py and test/automl/test_multiclass.py.
Returns:
The evaluation score on the validation dataset.
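The default-metric rule described above (accuracy for classification, r2 for regression when no metric is given) can be illustrated with a small sketch. It uses only the standard library and does not reproduce FLAML's internal metric code; `default_score`, `accuracy`, `r2`, and the `CLASSIFICATION_TASKS` set are hypothetical names, and the task set is an assumed subset chosen for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Assumed subset of classification task names, for illustration only.
CLASSIFICATION_TASKS = {"binary", "multiclass"}


def default_score(task, y_true, y_pred):
    """Pick accuracy for classification tasks and r2 otherwise."""
    metric = accuracy if task in CLASSIFICATION_TASKS else r2
    return metric(y_true, y_pred)


# Three of four labels match.
clf_score = default_score("binary", [0, 1, 1, 0], [0, 1, 0, 0])
# Perfect predictions leave no residual error.
reg_score = default_score("regression", [1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```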
search_space
@classmethod
def search_space(cls, data_size, task, **params)
[required method] search space.
Arguments:
data_size
- A tuple of two integers, number of rows and columns.
task
- A str of the task type, e.g., "binary", "multiclass", "regression".
Returns:
A dictionary of the search space.
Each key is the name of a hyperparameter, and value is a dict with its domain (required) and low_cost_init_value, init_value, cat_hp_cost (if applicable), e.g., {'domain': tune.randint(lower=1, upper=10), 'init_value': 1}.
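The shape of the returned dictionary can be sketched as follows. To keep the example runnable without FLAML installed, `RandInt` is a hypothetical stand-in for a `tune.randint` domain, and `MyEstimatorSketch`, the hyperparameter names, and their ranges are illustrative assumptions rather than values used by any real learner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RandInt:
    """Hypothetical stand-in for an integer domain such as tune.randint."""
    lower: int
    upper: int


class MyEstimatorSketch:
    @classmethod
    def search_space(cls, data_size, task, **params):
        n_rows, _n_cols = data_size
        # Illustrative choice: cap the number of trees by the row count.
        upper = max(5, min(1000, n_rows))
        return {
            "n_estimators": {
                "domain": RandInt(lower=4, upper=upper),  # required
                "init_value": 4,
                "low_cost_init_value": 4,
            },
            "max_depth": {
                "domain": RandInt(lower=1, upper=10),  # required
                "init_value": 1,
            },
        }


space = MyEstimatorSketch.search_space(data_size=(500, 20), task="binary")
# Collect any hyperparameter whose entry lacks the required "domain" key.
missing_domain = [name for name, spec in space.items() if "domain" not in spec]
```

Making the domain depend on `data_size` is one way a learner can keep the search range proportionate to the dataset; whether a given learner does so is up to its implementation.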
size
@classmethod
def size(cls, config: dict) -> float
[optional method] memory size of the estimator in bytes.
Arguments:
config
- A dict of the hyperparameter config.
Returns:
A float of the memory size required by the estimator to train the given config.
cost_relative2lgbm
@classmethod
def cost_relative2lgbm(cls) -> float
[optional method] relative cost compared to lightgbm.
init
@classmethod
def init(cls)
[optional method] initialize the class.
config2params
def config2params(config: dict) -> dict
[optional method] convert a config dict to a params dict.
Arguments:
config
- A dict of the hyperparameter config.
Returns:
A dict that will be passed to self.estimator_class's constructor.
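A config-to-params translation might look like the sketch below. It assumes a learner whose search space tunes a log-scaled key and whose underlying constructor expects the plain value; `EstimatorSketch` and the `log_max_bin` to `max_bin` mapping are illustrative assumptions, not a description of any specific FLAML class.

```python
class EstimatorSketch:
    """Hypothetical learner that overrides config2params."""

    def config2params(self, config: dict) -> dict:
        # Work on a copy so the caller's config is left untouched.
        params = config.copy()
        # Illustrative translation: the search tunes "log_max_bin", while
        # the underlying constructor is assumed to expect "max_bin".
        if "log_max_bin" in params:
            params["max_bin"] = (1 << params.pop("log_max_bin")) - 1
        return params


config = {"n_estimators": 10, "log_max_bin": 8}
params = EstimatorSketch().config2params(config)
# params now holds max_bin = 2**8 - 1 instead of log_max_bin.
```

Keeping the translation in one method lets the search space use whatever parameterization is convenient for tuning, while the constructor still receives the arguments it understands.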
SparkEstimator Objects
class SparkEstimator(BaseEstimator)
The base class for fine-tuning spark models, using pyspark.ml and SynapseML API.
fit
def fit(X_train: psDataFrame, y_train: psSeries = None, budget=None, free_mem_ratio=0, index_col: str = "tmp_index_col", **kwargs)
Train the model from given training data.
Arguments:
X_train
- A pyspark.pandas DataFrame of training data in shape n*m.
y_train
- A pyspark.pandas Series in shape n*1. None if X_train is a pyspark.pandas DataFrame that already contains y_train.
budget
- A float of the time budget in seconds.
free_mem_ratio
- A float between 0 and 1 for the free memory ratio to keep during training.
Returns:
train_time
- A float of the training time in seconds.
predict
def predict(X, index_col="tmp_index_col", return_all=False, **kwargs)
Predict label from features.
Arguments:
X
- A pyspark or pyspark.pandas dataframe of featurized instances, shape n*m.
index_col
- A str of the index column name. Defaults to "tmp_index_col".
return_all
- A bool of whether to return all the prediction results. Defaults to False.
Returns:
A pyspark.pandas series of shape n*1 if return_all is False. Otherwise, a pyspark.pandas dataframe.
predict_proba
def predict_proba(X, index_col="tmp_index_col", return_all=False, **kwargs)
Predict the probability of each class from features. Only works for classification problems.
Arguments:
X
- A pyspark or pyspark.pandas dataframe of featurized instances, shape n*m.
index_col
- A str of the index column name. Defaults to "tmp_index_col".
return_all
- A bool of whether to return all the prediction results. Defaults to False.
Returns:
A pyspark.pandas dataframe of shape n*c, where c is the number of classes. Each element at (i,j) is the probability for instance i to be in class j.
SparkLGBMEstimator Objects
class SparkLGBMEstimator(SparkEstimator)
The class for fine-tuning spark version lightgbm models, using SynapseML API.
SparkRandomForestEstimator Objects
class SparkRandomForestEstimator(SparkEstimator)
The SparkEstimator class for Random Forest.
predict
def predict(X, index_col="tmp_index_col", return_all=False, **kwargs)
Predict label from features.
Arguments:
X
- A pyspark or pyspark.pandas dataframe of featurized instances, shape n*m.
index_col
- A str of the index column name. Defaults to "tmp_index_col".
return_all
- A bool of whether to return all the prediction results. Defaults to False.
Returns:
A pyspark.pandas series of shape n*1 if return_all is False. Otherwise, a pyspark.pandas dataframe.
TransformersEstimator Objects
class TransformersEstimator(BaseEstimator)
The class for fine-tuning language models, using huggingface transformers API.
SKLearnEstimator Objects
class SKLearnEstimator(BaseEstimator)
The base class for tuning scikit-learn estimators.
Subclasses can modify the function signature of __init__ to ignore the values in config that are not relevant to the constructor of their underlying estimator. For example, some regressors in scikit-learn don't accept the n_jobs parameter contained in config. For these, one can add n_jobs=None, before **config to make sure config doesn't contain an n_jobs key.
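The signature trick described above relies on ordinary Python keyword binding: a named parameter captures its argument before `**config` collects the rest. The sketch below shows this with hypothetical stand-in classes (`BaseSketch`, `NoNJobsRegressorSketch`) so it runs without FLAML; the `alpha` hyperparameter is likewise only an example.

```python
class BaseSketch:
    """Hypothetical stand-in for the base class; it just stores the config."""

    def __init__(self, task="binary", **config):
        self.task = task
        self.params = dict(config)


class NoNJobsRegressorSketch(BaseSketch):
    # n_jobs is bound by name here, so it never ends up inside **config
    # and is not forwarded to the underlying estimator's constructor.
    def __init__(self, task="regression", n_jobs=None, **config):
        super().__init__(task, **config)


with_n_jobs = BaseSketch(task="regression", n_jobs=4, alpha=0.1)
without_n_jobs = NoNJobsRegressorSketch(task="regression", n_jobs=4, alpha=0.1)
```

With the same arguments, `with_n_jobs.params` still carries `n_jobs`, while `without_n_jobs.params` holds only the hyperparameters the underlying estimator accepts.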
LGBMEstimator Objects
class LGBMEstimator(BaseEstimator)
The class for tuning LGBM, using sklearn API.
XGBoostEstimator Objects
class XGBoostEstimator(SKLearnEstimator)
The class for tuning XGBoost regressor, not using sklearn API.
XGBoostSklearnEstimator Objects
class XGBoostSklearnEstimator(SKLearnEstimator, LGBMEstimator)
The class for tuning XGBoost with unlimited depth, using sklearn API.
XGBoostLimitDepthEstimator Objects
class XGBoostLimitDepthEstimator(XGBoostSklearnEstimator)
The class for tuning XGBoost with limited depth, using sklearn API.
RandomForestEstimator Objects
class RandomForestEstimator(SKLearnEstimator, LGBMEstimator)
The class for tuning Random Forest.
ExtraTreesEstimator Objects
class ExtraTreesEstimator(RandomForestEstimator)
The class for tuning Extra Trees.
LRL1Classifier Objects
class LRL1Classifier(SKLearnEstimator)
The class for tuning Logistic Regression with L1 regularization.
LRL2Classifier Objects
class LRL2Classifier(SKLearnEstimator)
The class for tuning Logistic Regression with L2 regularization.
CatBoostEstimator Objects
class CatBoostEstimator(BaseEstimator)
The class for tuning CatBoost.
SVCEstimator Objects
class SVCEstimator(SKLearnEstimator)
The class for tuning Linear Support Vector Machine Classifier.
predict_proba
def predict_proba(X, **kwargs)
Predict the probability of each class from features.
Only works for classification problems.
Arguments:
X
- A numpy array of featurized instances, shape n*m.
Returns:
A numpy array of shape n*c, where c is the number of classes. Each element at (i,j) is the probability for instance i to be in class j.
SparkNaiveBayesEstimator Objects
class SparkNaiveBayesEstimator(SparkEstimator)
The class for tuning Naive Bayes Classifier.
SGDEstimator Objects
class SGDEstimator(SKLearnEstimator)
The class for tuning Stochastic Gradient Descent model.
predict_proba
def predict_proba(X, **kwargs)
Predict the probability of each class from features.
Only works for classification problems.
Arguments:
X
- A numpy array of featurized instances, shape n*m.
Returns:
A numpy array of shape n*c, where c is the number of classes. Each element at (i,j) is the probability for instance i to be in class j.
ElasticNetEstimator Objects
class ElasticNetEstimator(SKLearnEstimator)
The class for tuning Elastic Net regression model.
LassoLarsEstimator Objects
class LassoLarsEstimator(SKLearnEstimator)
The class for tuning Lasso model fit with Least Angle Regression a.k.a. Lars.
SparkGLREstimator Objects
class SparkGLREstimator(SparkEstimator)
The class for tuning Generalized Linear Regression PySpark model.
SparkLinearRegressionEstimator Objects
class SparkLinearRegressionEstimator(SparkEstimator)
The class for tuning Linear Regression PySpark model.
SparkLinearSVCEstimator Objects
class SparkLinearSVCEstimator(SparkEstimator)
The class for tuning Linear SVC PySpark model.
SparkGBTEstimator Objects
class SparkGBTEstimator(SparkEstimator)
The class for tuning GBT PySpark model.
SparkAFTSurvivalRegressionEstimator Objects
class SparkAFTSurvivalRegressionEstimator(SparkEstimator)
The class for tuning AFTSurvivalRegression PySpark model.