sammo.data

`sammo.data`

DataTable is the primary data structure in SAMMO. It wraps a list of inputs and an optional list of outputs (labels), and adds methods for sampling, shuffling, splitting, and converting the data.

Module Contents

Classes

DataTable

API

class sammo.data.DataTable(inputs: list, outputs: beartype.typing.Union[list, None] = None, constants: beartype.typing.Union[dict, None] = None, seed=42)

Bases: pyglove.JSONConvertible

property inputs

Access input data.

property outputs

Access output data.

property constants: beartype.typing.Union[dict, None]

Access constants.

to_json(**kwargs)

Convert to a JSON-serializable object.

Note

This only saves the values of the outputs (shallow state), not the raw results.

persistent_hash()

classmethod from_json(json_value, **kwargs)

classmethod from_pandas(df: pandas.DataFrame, output_fields: beartype.typing.Union[list[str], str] = 'output', input_fields: beartype.typing.Union[list[str], str, None] = None, constants: beartype.typing.Union[dict, None] = None, seed=42)

Create a DataTable from a pandas DataFrame.

Parameters:

df – Pandas DataFrame.
input_fields – Columns from pandas DataFrame that will be used as inputs.
output_fields – Columns that will be used as outputs or targets (e.g., labels).
constants – Constants.
seed – Random seed.

classmethod from_records(records: list[dict], output_fields: beartype.typing.Union[list[str], str] = 'output', input_fields: beartype.typing.Union[list[str], str, None] = None, **kwargs)
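from_records has no description here; judging by its signature, it takes a list of dictionaries and the same output_fields and input_fields arguments as from_pandas. The following is a usage sketch only, assuming the sammo package is installed. The field names "text" and "label" and the helper names build_table and show_table are made up for illustration, and the exact shape of the stored inputs is not specified by this page.

```python
# Hypothetical records; the field names "text" and "label" are invented
# for this example and are not part of SAMMO.
RECORDS = [
    {"text": "great movie", "label": "positive"},
    {"text": "boring plot", "label": "negative"},
    {"text": "loved the cast", "label": "positive"},
]


def build_table(records=RECORDS):
    """Build a DataTable from a list of dicts (requires the sammo package)."""
    # Imported lazily so this snippet can be loaded without sammo installed.
    from sammo.data import DataTable

    # Name both field sets explicitly instead of relying on the defaults.
    return DataTable.from_records(
        records, input_fields="text", output_fields="label"
    )


def show_table(table):
    """Print a short preview and return the rows as a list of dicts."""
    print(table.to_string(max_rows=5))
    return table.to_records()
```

With sammo installed, `show_table(build_table())` prints a preview of the table and returns its rows.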

to_records(only_values=True)

Convert to a list of dictionaries.

Parameters:

only_values – If False, raw result objects will be returned for .outputs.

to_string(max_rows: int = 10, max_col_width: int = 60, max_cell_length: int = 500)

Convert to a printable string.

Parameters:

max_rows – Maximum number of rows to include. Defaults to 10.
max_col_width – Maximum width of each column. Defaults to 60.
max_cell_length – Maximum characters in each cell. Defaults to 500.

sample(k: int, seed: beartype.typing.Union[int, None] = None)

Sample rows without replacement.

Parameters:

k – Number of rows to sample.
seed – Random seed. If not provided, instance seed is used.
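The seed fallback described above can be illustrated with the standard library alone. This is a minimal sketch of the documented behavior, not SAMMO's implementation; the function name sample_rows and the instance_seed argument are hypothetical stand-ins for the method and the seed passed to the constructor.

```python
import random


def sample_rows(rows, k, seed=None, instance_seed=42):
    """Pick k distinct rows; fall back to instance_seed when seed is None."""
    # Assumption: an explicit seed takes precedence over the instance seed.
    rng = random.Random(instance_seed if seed is None else seed)
    # random.Random.sample draws without replacement.
    return rng.sample(rows, k)


rows = list(range(10))
first = sample_rows(rows, 3)   # uses the fallback seed
second = sample_rows(rows, 3)  # same seed, so the same rows come back
```

Because the random generator is rebuilt from the seed on every call, repeated calls with the same arguments return the same rows in this sketch.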

shuffle(seed: beartype.typing.Union[int, None] = None)

Shuffle rows.

Parameters:

seed – Random seed. If not provided, instance seed is used.

random_split(*sizes: int, seed=None) → tuple

Randomly split the dataset into non-overlapping new datasets of given lengths.

Parameters:

sizes – Sizes of the splits to be produced. The sum of sizes may not exceed the length of the dataset.
seed – Random seed. If not provided, instance seed is used.

Returns:

Tuple of splits.
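One way to obtain non-overlapping splits of given sizes is to shuffle once and cut consecutive slices. The sketch below shows that idea with the standard library; it is an illustration under stated assumptions, not SAMMO's implementation. The name split_rows, the instance_seed argument, and the choice of ValueError for oversized requests are all hypothetical.

```python
import random


def split_rows(rows, *sizes, seed=None, instance_seed=42):
    """Return a tuple of non-overlapping random splits with the given sizes."""
    # Assumption: asking for more rows than exist is an error.
    if sum(sizes) > len(rows):
        raise ValueError("sum of sizes exceeds the number of rows")
    rng = random.Random(instance_seed if seed is None else seed)
    shuffled = list(rows)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    splits, start = [], 0
    for size in sizes:
        # Consecutive slices of one shuffled list cannot share positions.
        splits.append(shuffled[start:start + size])
        start += size
    return tuple(splits)


train, test = split_rows(list(range(10)), 6, 3, seed=0)
```

Here the sizes sum to 9, so one of the 10 rows ends up in neither split, which is consistent with the rule that the sum of sizes may be smaller than the dataset length.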

copy()

get_minibatch_iterator(minibatch_size)
