windows

A collection of Windows binaries built from open-source code.

Data: https://assemblage-dataset.net/.

This is a big dataset, so make sure you have enough disk space before building it (see below). Note also that the parsing code was developed and run on a machine with 512GB of RAM; parsing took about 1.5 hours on that machine.
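If you want to compare your machine with that 512GB development machine before starting, a minimal sketch like the following reads total physical memory through POSIX `os.sysconf` (available on Linux, not on Windows). The function name is hypothetical and not part of `undertale`. Having less than 512GB does not by itself mean parsing will fail; the note above only says that is the machine the code was developed and run on.

```python
import os

# "512GB RAM" from the note above; RAM is conventionally quoted in binary units.
DEV_MACHINE_RAM_BYTES = 512 * 2**30


def physical_memory_bytes() -> int:
    """Total physical RAM via POSIX sysconf (works on Linux; not on Windows)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    ram = physical_memory_bytes()
    print(f"This machine: {ram / 2**30:.0f} GiB of RAM")
    if ram < DEV_MACHINE_RAM_BYTES:
        print("That is less than the 512GB machine the parser was developed on.")
```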

Disk space requirements.

  1. About 20GB for the two zip files downloaded from the URL above: winpe_licensed.zip and winpe_licensed.sqlite.zip. These should be placed in the directory given by the path argument.

  2. These uncompress into about 102GB in the same directory.

  3. Hugging Face also uses a large amount of space in your ~/.cache directory (the exact amount needed is not known). If that directory has plenty of room, no action is needed. Otherwise, set the HF_HOME environment variable to a directory with plenty of space, and Hugging Face will use it instead of ~/.cache.

  4. The default output directory (unless you set the UNDERTALE_DATASETS_DIRECTORY environment variable) is ~/undertale_shared. The dataset generated there, as Arrow files, will be about 102GB, so make sure there is enough space there as well.
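As a rough pre-flight check, the sketch below compares free space against the approximate sizes listed above using only the Python standard library. The helper names (`check_space`, `output_directory`, `hf_home`, `existing_ancestor`) are hypothetical and not part of `undertale`. It assumes the two zip files are already in the directory you pass as the path argument (pass `zips_downloaded=False` otherwise), and it uses `~/.cache/huggingface`, Hugging Face's usual default, when HF_HOME is unset. Because the HF cache requirement is unknown, it only reports free space there.

```python
import os
import shutil
from pathlib import Path

GB = 10**9  # decimal gigabytes; the sizes in the list above are approximate

# Approximate sizes from the list above.
ZIP_BYTES = 20 * GB        # winpe_licensed.zip + winpe_licensed.sqlite.zip
UNZIPPED_BYTES = 102 * GB  # extracted contents, in the same directory as the zips
OUTPUT_BYTES = 102 * GB    # Arrow files written to the output directory


def output_directory() -> Path:
    """~/undertale_shared unless UNDERTALE_DATASETS_DIRECTORY is set."""
    default = str(Path.home() / "undertale_shared")
    return Path(os.environ.get("UNDERTALE_DATASETS_DIRECTORY", default)).expanduser()


def hf_home() -> Path:
    """HF_HOME if set, otherwise Hugging Face's usual default under ~/.cache."""
    default = str(Path.home() / ".cache" / "huggingface")
    return Path(os.environ.get("HF_HOME", default)).expanduser()


def existing_ancestor(path: Path) -> Path:
    """shutil.disk_usage needs an existing path, so walk up until one exists."""
    path = path.expanduser().resolve()
    while not path.exists():
        path = path.parent
    return path


def check_space(path_argument: str, zips_downloaded: bool = True) -> list[str]:
    """Return one warning per filesystem that looks too small (hypothetical helper)."""
    raw_needed = UNZIPPED_BYTES + (0 if zips_downloaded else ZIP_BYTES)
    needs = [(Path(path_argument), raw_needed), (output_directory(), OUTPUT_BYTES)]

    # Directories on the same filesystem share free space, so add up per device.
    per_device: dict[int, tuple[Path, int]] = {}
    for directory, required in needs:
        probe = existing_ancestor(directory)
        device = os.stat(probe).st_dev
        first_probe, total = per_device.get(device, (probe, 0))
        per_device[device] = (first_probe, total + required)

    warnings = []
    for probe, required in per_device.values():
        free = shutil.disk_usage(probe).free
        if free < required:
            warnings.append(
                f"{probe}: {free / GB:.0f} GB free, about {required / GB:.0f} GB needed"
            )
    return warnings


if __name__ == "__main__":
    # Hypothetical location; use the directory you pass as the path argument.
    for warning in check_space("~/assemblage"):
        print(warning)
    hf_free = shutil.disk_usage(existing_ancestor(hf_home())).free
    print(f"HF_HOME ({hf_home()}): {hf_free / GB:.0f} GB free (requirement unknown)")
```

Summing requirements per device matters when the path argument and the output directory live on the same disk: each may individually look fine while together they need roughly 204GB. If you do redirect HF_HOME, it is safest to export it in the shell before starting Python, since the Hugging Face libraries generally read it when they are imported.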

Classes

AssemblageWindows([writer, executor, ...])

AssemblageWindowsReader(*args, **kwargs)

class undertale.datasets.assemblage.windows.AssemblageWindowsReader(*args, **kwargs)

Bases: PipelineStep

class undertale.datasets.assemblage.windows.AssemblageWindows(writer: str = 'parquet', executor: str = 'local', logging_directory: str | None = None)

Bases: Dataset