item
Tokenization utilities for the ITEM model.
Classes
ITEMPretokenizer | Preprocesses disassembly for the tokenizer.
ITEMTokenizer | Tokenize preprocessed disassembly.
- class undertale.datasets.pipeline.formatters.item.ITEMPretokenizer(*args, **kwargs)
Bases: PipelineStep
Preprocesses disassembly for the tokenizer.
- Input:
Disassembled code with a disassembly field that can be pre-tokenized.
- Output:
Replaces the current disassembly field with preprocessed disassembly.
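The input/output contract above can be pictured with a small, self-contained sketch. It does not use the undertale API: `pretokenize_row` is a hypothetical stand-in, the row is assumed to be a plain dict, and the whitespace normalization is only a placeholder, since the actual preprocessing rules are not described on this page.

```python
import re


def pretokenize_row(row: dict) -> dict:
    """Hypothetical stand-in illustrating the step's per-row contract."""
    out = dict(row)  # copy so the input row is left untouched
    # Placeholder preprocessing (an assumption): collapse runs of whitespace.
    # The real ITEMPretokenizer rules are not documented here.
    out["disassembly"] = re.sub(r"\s+", " ", row["disassembly"]).strip()
    return out


row = {"id": 1, "disassembly": "mov   eax,  ebx\n  ret"}
result = pretokenize_row(row)
```

The point of the sketch is the shape of the data: the output row has the same fields as the input, and only the `disassembly` value is replaced.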
- class undertale.datasets.pipeline.formatters.item.ITEMTokenizer(*args, **kwargs)
Bases: PipelineStep
Tokenize preprocessed disassembly.
- Parameters:
tokenizer – The path to the trained tokenizer.
- Input:
Disassembled code with a disassembly field that has already been pre-tokenized.
- Output:
Adds a tokens field with disassembly tokens.
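As a rough illustration of this contract, the sketch below adds a `tokens` field to a row whose `disassembly` field has already been pre-tokenized. Everything in it is an assumption made for illustration: `tokenize_row` is a hypothetical helper, the row is a plain dict, and a toy in-memory vocabulary with whitespace splitting stands in for the trained tokenizer that the real step loads from the `tokenizer` path.

```python
def tokenize_row(row: dict, vocab: dict, unk_id: int = 0) -> dict:
    """Hypothetical stand-in illustrating the step's per-row contract."""
    out = dict(row)  # copy so the input row is left untouched
    # Toy tokenization (an assumption): split on whitespace and look each
    # piece up in the vocabulary, falling back to the unknown-token id.
    out["tokens"] = [vocab.get(piece, unk_id) for piece in row["disassembly"].split()]
    return out


# Toy vocabulary standing in for a trained tokenizer.
vocab = {"[UNK]": 0, "mov": 1, "eax,": 2, "ebx": 3, "ret": 4}
row = {"disassembly": "mov eax, ebx ret"}
result = tokenize_row(row, vocab)
```

Unlike the pre-tokenization step, the existing `disassembly` field is kept as-is here, and the token ids are added alongside it.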