sammo.data

DataTables are the primary data structure used in SAMMO. A DataTable wraps a list of inputs and a list of outputs (labels) and adds helpers for sampling, shuffling, splitting, and converting to and from records, pandas DataFrames, and JSON.

Module Contents#

Classes#

DataTable

API#

class sammo.data.DataTable(inputs: list, outputs: list | None = None, constants: dict | None = None, seed=42)#

Bases: pyglove.JSONConvertible

property inputs#

Access input data.

property outputs#

Access output data.

property constants: dict | None#

Access constants.
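As a rough illustration of the constructor and properties above, the sketch below builds a small table from two plain lists. It uses only the documented signature `DataTable(inputs, outputs, constants, seed)` and `to_string(max_rows=...)`; the example rows, the `constants` content, and the helper names `build_table` and `preview` are hypothetical.

```python
# Hypothetical example data: three inputs with one label each.
inputs = ["great movie", "terrible plot", "just okay"]
outputs = ["positive", "negative", "neutral"]


def build_table():
    """Build a DataTable from the lists above (requires SAMMO to be installed)."""
    # Imported lazily so the module can be read and loaded without SAMMO.
    from sammo.data import DataTable

    # The constants dict is made up for illustration.
    return DataTable(inputs, outputs, constants={"task": "sentiment"})


def preview(table, max_rows=2):
    """Return a short printable view of the table via the documented to_string()."""
    return table.to_string(max_rows=max_rows)
```

Calling `preview(build_table())` should return a printable string covering at most two rows. The exact layout of that string is not specified here.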

to_json(**kwargs)#

Convert to a JSON-serializable object.

Note

This only saves the values of the outputs (shallow state), not the raw results.

persistent_hash()#

classmethod from_json(json_value, **kwargs)#

classmethod from_pandas(df: pandas.DataFrame, output_fields: list[str] | str = 'output', input_fields: list[str] | str | None = None, constants: dict | None = None, seed=42)#

Create a DataTable from a pandas DataFrame.

Parameters:
  • df – Pandas DataFrame.

  • input_fields – Columns from pandas DataFrame that will be used as inputs.

  • output_fields – Columns that will be used as outputs or targets (e.g., labels).

  • constants – Constants.

  • seed – Random seed.
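The field arguments above can be pictured with a standalone sketch that partitions plain dictionaries by field name. This is not SAMMO's implementation. The helper `split_fields` is hypothetical, and it assumes that `input_fields=None` means "every field that is not an output field", which the reference above does not state.

```python
def _as_list(fields):
    """Accept either a single field name or a list of names."""
    return [fields] if isinstance(fields, str) else list(fields)


def split_fields(records, output_fields="output", input_fields=None):
    """Partition each record into an input dict and an output dict."""
    out_names = _as_list(output_fields)
    if input_fields is None:
        # Assumption: with no explicit input fields, use all non-output fields.
        in_names = [name for name in records[0] if name not in out_names]
    else:
        in_names = _as_list(input_fields)
    inputs = [{name: rec[name] for name in in_names} for rec in records]
    outputs = [{name: rec[name] for name in out_names} for rec in records]
    return inputs, outputs


# Hypothetical rows, shaped like DataFrame rows or record dictionaries.
records = [
    {"text": "great movie", "lang": "en", "output": "positive"},
    {"text": "terrible plot", "lang": "en", "output": "negative"},
]
inputs, outputs = split_fields(records)
```

Passing `input_fields="text"` would restrict the inputs to that single field, leaving `lang` out of the table.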

classmethod from_records(records: list[dict], output_fields: list[str] | str = 'output', input_fields: list[str] | str | None = None, **kwargs)#

to_records(only_values=True)#

Convert to a list of dictionaries.

Parameters:

only_values – If False, raw result objects will be returned for .outputs.

to_string(max_rows: int = 10, max_col_width: int = 60, max_cell_length: int = 500)#

Convert to a printable string.

Parameters:
  • max_rows – Maximum number of rows to include. Defaults to 10.

  • max_col_width – Maximum width of each column. Defaults to 60.

  • max_cell_length – Maximum characters in each cell. Defaults to 500.

sample(k: int, seed: int | None = None) beartype.typing.Self#

Sample rows without replacement.

Parameters:
  • k – Number of rows to sample.

  • seed – Random seed. If not provided, instance seed is used.
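The seed fallback described above can be sketched with the standard library alone. The function `sample_rows` and its `instance_seed` argument are hypothetical stand-ins for the method and the table's own seed. The sketch shows the semantics (sampling without replacement, reproducible for a given seed) rather than SAMMO's actual code.

```python
import random


def sample_rows(rows, k, seed=None, instance_seed=42):
    """Draw k distinct rows, falling back to the instance seed if none is given."""
    rng = random.Random(instance_seed if seed is None else seed)
    # random.Random.sample draws without replacement, so no row repeats.
    return rng.sample(rows, k)


rows = list(range(10))
first = sample_rows(rows, 3)   # uses the instance seed
second = sample_rows(rows, 3)  # same seed, so the same rows are drawn
custom = sample_rows(rows, 3, seed=7)
```

Because the generator is rebuilt from the seed on every call, repeated calls with the same seed return the same rows.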

shuffle(seed: int | None = None) beartype.typing.Self#

Shuffle rows.

Parameters:

seed – Random seed. If not provided, instance seed is used.

random_split(*sizes: int, seed=None) tuple#

Randomly split the dataset into non-overlapping new datasets of given lengths.

Parameters:
  • sizes – Sizes of splits to be produced. The sum of sizes may not exceed the length of the dataset.

  • seed – Random seed. If not provided, instance seed is used.

Returns:

Tuple of splits.
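A minimal standalone sketch of a non-overlapping random split, using only the standard library, is shown below. It is an illustration of the behavior described for `random_split`, not SAMMO's implementation. The function name `split_rows` is hypothetical, as is the choice to raise `ValueError` when the sizes exceed the dataset length.

```python
import random


def split_rows(rows, *sizes, seed=42):
    """Split rows into disjoint random subsets of the requested sizes."""
    if sum(sizes) > len(rows):
        # Assumption: oversize requests are rejected with ValueError.
        raise ValueError("sum of sizes exceeds the number of rows")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    splits, start = [], 0
    for size in sizes:
        # Consecutive slices of one shuffled index list cannot overlap.
        splits.append([rows[i] for i in indices[start:start + size]])
        start += size
    return tuple(splits)


rows = [f"row-{i}" for i in range(10)]
train, test = split_rows(rows, 6, 3)
```

Here the sizes sum to 9 for a 10-row dataset, so one row is left out of both splits, which the constraint on `sizes` permits.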

copy() beartype.typing.Self#

get_minibatch_iterator(minibatch_size)#