sammo.data
DataTables are the primary data structure used in SAMMO. They are essentially a wrapper around a list of inputs and outputs (labels), with some additional functionality.
Module Contents
Classes
API
- class sammo.data.DataTable(inputs: list, outputs: beartype.typing.Union[list, None] = None, constants: beartype.typing.Union[dict, None] = None, seed=42)
Bases: pyglove.JSONConvertible
- property inputs
Access input data.
- property outputs
Access output data.
- property constants: beartype.typing.Union[dict, None]
Access constants.
- to_json(**kwargs)
Convert to a JSON-serializable object.
Note
This only saves the values of the outputs (shallow state), not the raw results.
- persistent_hash()
- classmethod from_json(json_value, **kwargs)
- classmethod from_pandas(df: pandas.DataFrame, output_fields: beartype.typing.Union[list[str], str] = 'output', input_fields: beartype.typing.Union[list[str], str, None] = None, constants: beartype.typing.Union[dict, None] = None, seed=42)
Create a DataTable from a pandas DataFrame.
- Parameters:
df – Pandas DataFrame.
input_fields – Columns from pandas DataFrame that will be used as inputs.
output_fields – Columns that will be used as outputs or targets (e.g., labels).
constants – Constants.
seed – Random seed.
- classmethod from_records(records: list[dict], output_fields: beartype.typing.Union[list[str], str] = 'output', input_fields: beartype.typing.Union[list[str], str, None] = None, **kwargs)
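The role of output_fields and input_fields can be illustrated with a small standard-library sketch. This is not SAMMO's code: `split_fields` is a hypothetical helper, and two behaviors in it are assumptions rather than documented facts, namely that `input_fields=None` selects every non-output field and that a single field name yields bare values instead of dictionaries.

```python
def split_fields(records, output_fields="output", input_fields=None):
    """Split each record (a dict) into an input part and an output part."""
    out_names = [output_fields] if isinstance(output_fields, str) else list(output_fields)
    if input_fields is None:
        # ASSUMPTION: with no explicit input fields, use every non-output field.
        in_names = [key for key in records[0] if key not in out_names]
    elif isinstance(input_fields, str):
        in_names = [input_fields]
    else:
        in_names = list(input_fields)

    def pick(record, names):
        # ASSUMPTION: a single field is unwrapped to its bare value.
        if len(names) == 1:
            return record[names[0]]
        return {name: record[name] for name in names}

    inputs = [pick(record, in_names) for record in records]
    outputs = [pick(record, out_names) for record in records]
    return inputs, outputs


records = [
    {"text": "great movie", "lang": "en", "output": "positive"},
    {"text": "dull plot", "lang": "en", "output": "negative"},
]
inputs, outputs = split_fields(records, input_fields="text")
```

With `input_fields="text"` each input is the plain review string; leaving it unset in this sketch keeps both `text` and `lang` as a dictionary per row.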
- to_records(only_values=True)
Convert to a list of dictionaries.
- Parameters:
only_values – If False, raw result objects will be returned for .outputs.
- to_string(max_rows: int = 10, max_col_width: int = 60, max_cell_length: int = 500)
Convert to a printable string.
- Parameters:
max_rows – Maximum number of rows to include. Defaults to 10.
max_col_width – Maximum width of each column. Defaults to 60.
max_cell_length – Maximum characters in each cell. Defaults to 500.
- sample(k: int, seed: beartype.typing.Union[int, None] = None)
Sample rows without replacement.
- Parameters:
k – Number of rows to sample.
seed – Random seed. If not provided, instance seed is used.
- shuffle(seed: beartype.typing.Union[int, None] = None)
Shuffle rows.
- Parameters:
seed – Random seed. If not provided, instance seed is used.
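The seed fallback described for sample and shuffle can be illustrated with the standard library alone. The sketch below is not SAMMO's implementation: `SeededRows` is a hypothetical stand-in for a table that stores a default seed, and returning a new object instead of modifying rows in place is an assumption.

```python
import random


class SeededRows:
    """Hypothetical stand-in for a table that carries its own default seed."""

    def __init__(self, rows, seed=42):
        self.rows = list(rows)
        self.seed = seed

    def _rng(self, seed):
        # Fall back to the instance seed when no seed is passed.
        return random.Random(self.seed if seed is None else seed)

    def sample(self, k, seed=None):
        # random.Random.sample draws k distinct positions, i.e. without replacement.
        return SeededRows(self._rng(seed).sample(self.rows, k), self.seed)

    def shuffle(self, seed=None):
        # A shuffle is a sample of all rows in random order.
        return self.sample(len(self.rows), seed)


table = SeededRows(range(10), seed=42)
first = table.sample(3)            # uses the instance seed
again = table.sample(3)            # same seed, so the same rows
explicit = table.sample(3, seed=7) # explicit seed overrides the instance seed
shuffled = table.shuffle()
```

Because the default comes from the instance, repeated calls without a seed are reproducible, which is convenient for experiments but means a new seed must be passed to get a different draw.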
- random_split(*sizes: int, seed=None) → tuple
Randomly split the dataset into non-overlapping new datasets of given lengths.
- Parameters:
sizes – Sizes of the splits to produce. The sum of sizes may not exceed the length of the dataset.
seed – Random seed. If not provided, instance seed is used.
- Returns:
Tuple of splits.
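A non-overlapping random split of this kind can be sketched with the standard library. This is an illustration of the idea under stated assumptions, not SAMMO's implementation: the free function `random_split` below is hypothetical, it works on a plain list, and raising `ValueError` for oversized requests is an assumption about error handling.

```python
import random


def random_split(rows, *sizes, seed=42):
    """Split rows into disjoint random parts with the requested sizes."""
    if sum(sizes) > len(rows):
        # ASSUMPTION: oversized requests are rejected with ValueError.
        raise ValueError("sum of sizes exceeds the number of rows")
    # Shuffle the row positions once, then cut consecutive slices,
    # so no row can land in more than one split.
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    splits, start = [], 0
    for size in sizes:
        splits.append([rows[i] for i in order[start:start + size]])
        start += size
    return tuple(splits)


rows = list(range(10))
train, test = random_split(rows, 6, 3, seed=0)
```

Here one of the ten rows is left out of both splits, which is allowed because the sizes only need to sum to at most the dataset length.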
- copy()
- get_minibatch_iterator(minibatch_size)