dataset
Classes
ParquetDataset: An iterable dataset backed by one or more parquet files.
- class undertale.models.dataset.ParquetDataset(source: str, schema: Type[Dataset] | None = None)
Bases: IterableDataset
An iterable dataset backed by one or more parquet files.
Loads parquet data sequentially, one file at a time, making it suitable for datasets larger than memory. When used with a multi-worker DataLoader, files are distributed across workers so that each row is yielded by exactly one worker.
Note
DataLoader shuffle is not supported; shuffling must happen before loading.
Note
Schema validation is performed against the first file only, as a representative check. All files in a directory are assumed to share the same schema.
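The per-worker file distribution described above can be illustrated with a minimal sketch. The function name `shard_files` and the round-robin assignment are assumptions made for illustration; the actual strategy used by ParquetDataset is not specified here. The sketch only shows how disjoint file shards guarantee that each row is yielded by exactly one worker.

```python
from typing import List, Sequence


def shard_files(files: Sequence[str], worker_id: int, num_workers: int) -> List[str]:
    """Return the files assigned to one worker (hypothetical round-robin scheme)."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    # Sort first so every worker sees the same ordering; the strided
    # slices are then disjoint and together cover every file once.
    return sorted(files)[worker_id::num_workers]


# Illustrative file names only.
files = [
    "part-0.parquet",
    "part-1.parquet",
    "part-2.parquet",
    "part-3.parquet",
    "part-4.parquet",
]
shards = [shard_files(files, worker_id, 2) for worker_id in range(2)]
```

Inside a PyTorch worker process, `worker_id` and `num_workers` would typically be read from `torch.utils.data.get_worker_info()`, which returns `None` when loading happens in the main process.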
- Parameters:
source – Path to a single parquet file or a directory of parquet files.
schema – An optional schema class to validate the dataset against on construction.
- Raises:
SchemaError – If schema is provided and the dataset does not conform to it.
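As a rough illustration of how a `source` argument that may be either a single file or a directory could be resolved into a file list, consider the sketch below. The helper `resolve_parquet_files`, the `*.parquet` glob pattern, the non-recursive search, and the sorted ordering are all assumptions; they are not taken from the ParquetDataset implementation.

```python
from pathlib import Path
from typing import List


def resolve_parquet_files(source: str) -> List[Path]:
    """Expand a file-or-directory path into a sorted list of parquet files.

    Hypothetical helper: assumes files in a directory end in ".parquet".
    """
    path = Path(source)
    if path.is_file():
        # A single file is used as-is.
        return [path]
    if path.is_dir():
        # Non-recursive search; sorting keeps the order deterministic.
        files = sorted(p for p in path.glob("*.parquet") if p.is_file())
        if not files:
            raise FileNotFoundError(f"no parquet files found in {path}")
        return files
    raise FileNotFoundError(f"{path} does not exist")
```

A deterministic file order matters here because only the first file is checked against the schema, so the same file should be chosen on every run.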