Skip to main content

automl.data

load_openml_dataset

def load_openml_dataset(dataset_id, data_dir=None, random_state=0, dataset_format="dataframe")

Load dataset from open ML.

If the file is not cached locally, download it from open ML.

Arguments:

  • dataset_id - An integer of the dataset id in openml.
  • data_dir - A string of the path to store and load the data.
  • random_state - An integer of the random seed for splitting data.
  • dataset_format - A string specifying the format of returned dataset. Default is 'dataframe'. Can choose from ['dataframe', 'array']. If 'dataframe', the returned dataset will be a Pandas DataFrame. If 'array', the returned dataset will be a NumPy array or a SciPy sparse matrix.

Returns:

  • X_train - Training data.
  • X_test - Test data.
  • y_train - A series or array of labels for training data.
  • y_test - A series or array of labels for test data.

load_openml_task

def load_openml_task(task_id, data_dir)

Load task from open ML.

Use the first fold of the task. If the file is not cached locally, download it from open ML.

Arguments:

  • task_id - An integer of the task id in openml.
  • data_dir - A string of the path to store and load the data.

Returns:

  • X_train - A dataframe of training data.
  • X_test - A dataframe of test data.
  • y_train - A series of labels for training data.
  • y_test - A series of labels for test data.

get_output_from_log

def get_output_from_log(filename, time_budget)

Get output from log file.

Arguments:

  • filename - A string of the log file name.
  • time_budget - A float of the time budget in seconds.

Returns:

  • search_time_list - A list of the finished time of each logged iter.
  • best_error_list - A list of the best validation error after each logged iter.
  • error_list - A list of the validation error of each logged iter.
  • config_list - A list of the estimator, sample size and config of each logged iter.
  • logged_metric_list - A list of the logged metric of each logged iter.

concat

def concat(X1, X2)

concatenate two matrices vertically.

DataTransformer Objects

class DataTransformer()

Transform input training data.

fit_transform

def fit_transform(X: Union[DataFrame, np.ndarray], y, task: Union[str, "Task"])

Fit transformer and process the input training data according to the task type.

Arguments:

  • X - A numpy array or a pandas dataframe of training data.
  • y - A numpy array or a pandas series of labels.
  • task - An instance of type Task, or a str such as 'classification', 'regression'.

Returns:

  • X - Processed numpy array or pandas dataframe of training data.
  • y - Processed numpy array or pandas series of labels.

transform

def transform(X: Union[DataFrame, np.array])

Process data using fit transformer.

Arguments:

  • X - A numpy array or a pandas dataframe of training data.

Returns:

  • X - Processed numpy array or pandas dataframe of training data.