windows
A collection of Windows binaries built from open-source code.
Data: https://assemblage-dataset.net/.
This is a large dataset, so you will need to arrange for enough disk space before building it. Note also that the parsing code was developed and run on a machine with 512 GB of RAM, where parsing takes about 1.5 hours.
Disk space requirements.
About 20 GB for the two zip files downloaded from the URL above, winpe_licensed.zip and winpe_licensed.sqlite.zip. These must be placed in the directory given by the path argument.
These decompress to about 102 GB, extracted into the same directory.
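Before downloading, it can help to confirm that the directory you will pass as the path argument has room for both the zip files and their extracted contents. The sketch below is a hypothetical helper, not part of undertale; it uses the approximate figures quoted above and assumes "GB" means decimal gigabytes (10**9 bytes).

```python
import shutil
from pathlib import Path

# Approximate sizes quoted on this page, in decimal GB (an assumption: the page
# does not say whether GB means 10**9 or 2**30 bytes).
ZIP_GB = 20         # winpe_licensed.zip + winpe_licensed.sqlite.zip
EXTRACTED_GB = 102  # contents after unzipping into the same directory


def required_gb(zips_present: bool) -> int:
    """Hypothetical helper: free space still needed in the path directory.

    If the zips are already downloaded they already occupy disk space, so only
    the extracted contents remain to be accounted for.
    """
    return EXTRACTED_GB if zips_present else ZIP_GB + EXTRACTED_GB


def has_room_for_raw_data(path: str, zips_present: bool = False) -> bool:
    """Hypothetical helper: True if `path` has enough free space for the raw data."""
    free_gb = shutil.disk_usage(Path(path).expanduser()).free / 10**9
    return free_gb >= required_gb(zips_present)
```

For example, `has_room_for_raw_data("/mnt/big_disk")` (a placeholder path) returns False if the volume cannot hold roughly 122 GB. The check is only a rough guide, since the quoted sizes are approximate.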
Hugging Face also writes large amounts of data to your ~/.cache directory. If that directory is on a volume with plenty of free space (the exact amount needed is unknown), no action is required. Otherwise, set the HF_HOME environment variable to a location with ample space, and Hugging Face will use it instead of ~/.cache.
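One way to redirect the cache from inside a Python driver script is sketched below. The function name `set_hf_home` is hypothetical, not an undertale API. The Hugging Face libraries read HF_HOME when they are first imported, so the variable must be set before importing `datasets` or `huggingface_hub`; exporting HF_HOME in the shell before launching the build has the same effect.

```python
import os
from pathlib import Path


def set_hf_home(cache_dir: str) -> Path:
    """Hypothetical helper: point the Hugging Face cache at `cache_dir`.

    Creates the directory if it is missing and sets HF_HOME for this process
    (and any child processes it starts). Call it before importing `datasets`
    or `huggingface_hub`, because they read HF_HOME at import time.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(path)
    return path
```

Typical use is `set_hf_home("/mnt/big_disk/hf_cache")` (a placeholder path) as the first statement of the script, followed by the Hugging Face imports.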
The default output directory (used unless you set the UNDERTALE_DATASETS_DIRECTORY environment variable) is ~/undertale_shared. The dataset generated there, in the form of Arrow files, takes about 102 GB, so make sure there is enough space.
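The output-directory rule above can be restated in a few lines, which also makes it easy to check free space ahead of a long run. This is a sketch of the documented behavior, not undertale's own code; `output_directory` and `output_has_room` are hypothetical names, and the 102 GB figure is the approximate size quoted above, taken as decimal GB.

```python
import os
import shutil
from pathlib import Path

DATASET_GB = 102  # approximate size of the generated Arrow files (decimal GB assumed)


def output_directory() -> Path:
    """Hypothetical restatement of the documented rule: use
    UNDERTALE_DATASETS_DIRECTORY if it is set, otherwise ~/undertale_shared."""
    configured = os.environ.get("UNDERTALE_DATASETS_DIRECTORY")
    return Path(configured or "~/undertale_shared").expanduser()


def output_has_room() -> bool:
    """Hypothetical helper: True if the output volume has about DATASET_GB free."""
    probe = output_directory()
    # shutil.disk_usage needs an existing path, so walk up to the nearest
    # existing ancestor (the directory may not have been created yet).
    while not probe.exists():
        probe = probe.parent
    return shutil.disk_usage(probe).free / 10**9 >= DATASET_GB
```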
Classes
- class undertale.datasets.assemblage.windows.AssemblageWindowsReader(*args, **kwargs)
  Bases: PipelineStep