item

Tokenization utilities for the ITEM model.

Classes

ITEMPretokenizer(*args, **kwargs)

Preprocesses disassembly for the tokenizer.

ITEMTokenizer(*args, **kwargs)

Tokenizes preprocessed disassembly.

class undertale.datasets.pipeline.formatters.item.ITEMPretokenizer(*args, **kwargs)

Bases: PipelineStep

Preprocesses disassembly for the tokenizer.

Input:

Disassembled code with a disassembly field that can be pre-tokenized.

Output:

Replaces the contents of the disassembly field with the preprocessed disassembly.
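The input/output contract above can be illustrated with a minimal, self-contained sketch. It does not use or reproduce the undertale implementation: the function name pretokenize_record, the plain-dict record, and the normalization rules (padding punctuation with spaces and collapsing whitespace) are all assumptions chosen for illustration. The only behavior taken from the description above is that the disassembly field is replaced in the output.

```python
import re


# Hypothetical stand-in for the step's contract; NOT the undertale implementation.
def pretokenize_record(record: dict) -> dict:
    """Return a copy of `record` with its `disassembly` field replaced."""
    text = record["disassembly"]
    lines = []
    for line in text.splitlines():
        # Placeholder normalization (an assumption): pad punctuation with
        # spaces, then collapse runs of spaces and tabs.
        line = re.sub(r"([,\[\]+*])", r" \1 ", line)
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:
            lines.append(line)
    # All other fields pass through unchanged; only `disassembly` is replaced.
    return {**record, "disassembly": "\n".join(lines)}


example = {"id": 1, "disassembly": "mov   rax,[rbx+8]\nret"}
result = pretokenize_record(example)
print(result["disassembly"])
```

Returning a new record rather than mutating the input is a design choice of this sketch, not something the description above specifies.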

class undertale.datasets.pipeline.formatters.item.ITEMTokenizer(*args, **kwargs)

Bases: PipelineStep

Tokenizes preprocessed disassembly.

Parameters:

tokenizer – The path to the trained tokenizer.

Input:

Disassembled code with a disassembly field that has already been pre-tokenized (for example, by ITEMPretokenizer).

Output:

Adds a tokens field with disassembly tokens.
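As a rough illustration of this contract, the sketch below adds a tokens field to a record while leaving the disassembly field in place. It is not the undertale implementation: the function name tokenize_record, the in-memory TOY_VOCAB (standing in for a trained tokenizer loaded from a path), the whitespace splitting, and the choice of integer IDs for the tokens field are all assumptions made for illustration.

```python
# Hypothetical stand-in for the step's contract; NOT the undertale implementation.
# A toy in-memory vocabulary replaces the trained tokenizer loaded from a path.
TOY_VOCAB = {"[UNK]": 0, "mov": 1, "rax": 2, ",": 3, "rbx": 4, "ret": 5}


def tokenize_record(record: dict, vocab: dict) -> dict:
    """Return a copy of `record` with an added `tokens` field."""
    unk = vocab["[UNK]"]
    # Assumption: pre-tokenized disassembly is whitespace-delimited.
    pieces = record["disassembly"].split()
    # Pieces missing from the vocabulary map to the unknown-token ID.
    tokens = [vocab.get(piece, unk) for piece in pieces]
    return {**record, "tokens": tokens}


example = {"disassembly": "mov rax , rbx\nret"}
result = tokenize_record(example, TOY_VOCAB)
print(result["tokens"])
```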