dataset

Classes

ParquetDataset(source[, schema])

An iterable dataset backed by one or more parquet files.

class undertale.models.dataset.ParquetDataset(source: str, schema: Type[Dataset] | None = None)

Bases: IterableDataset

An iterable dataset backed by one or more parquet files.

Loads parquet data sequentially, one file at a time, making it suitable for datasets larger than memory. When used with a multi-worker DataLoader, files are distributed across workers so that each row is yielded by exactly one worker.
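This page does not say how files are assigned to workers. The sketch below assumes a simple round-robin split by worker index. `files_for_worker` is a hypothetical helper and is not part of undertale. Inside a real `IterableDataset.__iter__`, the worker index and worker count would usually come from `torch.utils.data.get_worker_info()`.

```python
from typing import List, Sequence


def files_for_worker(files: Sequence[str], worker_id: int, num_workers: int) -> List[str]:
    """Return the files one DataLoader worker should read.

    Hypothetical round-robin strategy: worker k takes files k, k + n, k + 2n, ...
    In an IterableDataset, worker_id / num_workers would typically be read from
    torch.utils.data.get_worker_info() (which returns None in the main process,
    i.e. the single-worker case).
    """
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    return list(files[worker_id::num_workers])


files = [f"part-{i:03d}.parquet" for i in range(5)]
shards = [files_for_worker(files, w, 2) for w in range(2)]
print(shards)
```

Each file lands in exactly one shard, so every row is yielded once. If there are more workers than files, the extra workers receive an empty list and yield nothing.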

Note

DataLoader shuffling is not supported; rows must be shuffled before the data is loaded.
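One way to shuffle before loading is a one-off preprocessing pass that runs before the parquet files are written. The sketch below assumes the rows fit in memory during that pass. `shuffle_into_shards` is a hypothetical helper. Writing each shard to its own file (for example with `pyarrow.parquet.write_table`) is left out.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_into_shards(rows: Sequence[T], num_shards: int, seed: int = 0) -> List[List[T]]:
    """Shuffle rows once, offline, then split them into num_shards groups.

    Each group would then be written to a separate parquet file, so the
    dataset can later be read sequentially without any DataLoader shuffle.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the order is reproducible
    # Strided split: shard k gets shuffled rows k, k + n, k + 2n, ...
    return [shuffled[k::num_shards] for k in range(num_shards)]


shards = shuffle_into_shards(list(range(10)), num_shards=3, seed=42)
print([len(s) for s in shards])  # [4, 3, 3]
```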

Note

Schema validation checks only the first file, as a representative sample. All files in a directory are assumed to share the same schema, so a mismatched schema in a later file is not caught at construction.
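This page does not specify how `source` is expanded or which file counts as "first". The sketch below assumes a non-recursive scan for `*.parquet` files, sorted by name so that the representative file is deterministic. `resolve_parquet_files` and `representative_file` are hypothetical helpers, not undertale functions.

```python
import tempfile
from pathlib import Path
from typing import List


def resolve_parquet_files(source: str) -> List[Path]:
    """Expand source (a file or a directory) into an ordered list of parquet files.

    Assumption: directories are scanned non-recursively and sorted by name.
    """
    path = Path(source)
    if path.is_file():
        return [path]
    if path.is_dir():
        files = sorted(p for p in path.glob("*.parquet") if p.is_file())
        if not files:
            raise FileNotFoundError(f"no parquet files in {source}")
        return files
    raise FileNotFoundError(source)


def representative_file(source: str) -> Path:
    """The single file that schema validation would inspect (first in sorted order)."""
    return resolve_parquet_files(source)[0]


# Demo: non-parquet files are ignored, and "first" means first by name.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.parquet", "a.parquet", "notes.txt"):
        (Path(tmp) / name).touch()
    chosen = representative_file(tmp).name
print(chosen)  # a.parquet
```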

Parameters:
  • source – Path to a single parquet file or a directory of parquet files.

  • schema – An optional schema class to validate the dataset against on construction.

Raises:

SchemaError – If schema is provided and the dataset does not conform to it.