maskedlm

Masked language modeling implementation.

Classes

InstructionTraceTransformerEncoderForMaskedLM(...)

A transformer encoder with a masked language modeling head.

InstructionTraceTransformerEncoderForMaskedLMConfiguration()

Model size configurations with associated parameters.

MaskedLMCollator(mask_token_id, vocab_size)

Collation function for masked language modeling.

MaskedLMHead(hidden_dimensions, vocab_size, eps)

Masked language modeling head.

class undertale.models.maskedlm.MaskedLMCollator(mask_token_id: int, vocab_size: int, probability: float = 0.15)

Bases: object

Collation function for masked language modeling.

Masking follows the BERT convention: of the candidate positions selected at the given probability, 80% are replaced with [MASK], 10% with a random token from the vocabulary, and 10% are left unchanged. Only non-padding positions are eligible for masking.

Parameters:
  • mask_token_id – The token ID of the [MASK] special token.

  • vocab_size – The vocabulary size, used for random token replacement.

  • probability – The fraction of non-padding tokens selected as masking candidates per sequence.
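The 80/10/10 convention described above can be sketched in plain Python (the helper name, the `pad_token_id` parameter, and the `-100` ignore-index label convention are illustrative assumptions, not the collator's exact interface):

```python
import random


def mask_tokens(tokens, mask_token_id, vocab_size, pad_token_id,
                probability=0.15, rng=None):
    """Apply the BERT 80/10/10 masking convention to a list of token IDs.

    Returns (inputs, labels), where labels is -100 at positions excluded
    from the loss (a common ignore-index convention, assumed here).
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [-100] * len(tokens)
    for i, token in enumerate(tokens):
        if token == pad_token_id:
            continue  # only non-padding positions are eligible
        if rng.random() >= probability:
            continue  # position not selected as a masking candidate
        labels[i] = token  # loss is computed at candidate positions
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token_id  # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: leave the original token unchanged
    return inputs, labels
```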

class undertale.models.maskedlm.MaskedLMHead(hidden_dimensions: int, vocab_size: int, eps: float)

Bases: Module

Masked language modeling head.

A linear transform followed by a decode back into the vocabulary: the standard masked language modeling head.

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • vocab_size – The size of the vocabulary.

  • eps – Layer normalization stabilization parameter.
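A sketch of a standard BERT-style MLM head, assuming the usual dense transform, nonlinearity, and layer norm before the vocabulary decoder (the specific layer and activation choices here are assumptions, not this module's exact code):

```python
import torch
from torch import nn


class MaskedLMHeadSketch(nn.Module):
    """Illustrative MLM head: transform -> activation -> norm -> decode."""

    def __init__(self, hidden_dimensions: int, vocab_size: int, eps: float):
        super().__init__()
        self.transform = nn.Linear(hidden_dimensions, hidden_dimensions)
        self.activation = nn.GELU()  # assumed nonlinearity
        self.norm = nn.LayerNorm(hidden_dimensions, eps=eps)
        self.decode = nn.Linear(hidden_dimensions, vocab_size)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        state = self.norm(self.activation(self.transform(state)))
        return self.decode(state)  # (batch, sequence, vocab_size) logits
```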

forward(state: Tensor) → Tensor

Decode to vocabulary tokens.

Parameters:

state – The input state tensor from the hidden state of a transformer.

Returns:

A decoded state tensor in vocabulary token space.

class undertale.models.maskedlm.InstructionTraceTransformerEncoderForMaskedLM(depth: int, hidden_dimensions: int, vocab_size: int, sequence_length: int, heads: int, intermediate_dimensions: int, next_token_id: int, mask_token_id: int, dropout: float, eps: float, lr: float = 0.0001, warmup: float = 0.025)

Bases: LightningModule, Module

A transformer encoder with a masked language modeling head.

Parameters:
  • depth – The number of stacked transformer layers.

  • hidden_dimensions – The size of the hidden state space.

  • vocab_size – The size of the vocabulary.

  • sequence_length – The fixed size of the input vector.

  • heads – The number of attention heads.

  • intermediate_dimensions – The size of the intermediate state space.

  • next_token_id – The ID of the special NEXT token.

  • mask_token_id – The token ID of the [MASK] special token.

  • dropout – Dropout probability.

  • eps – Layer normalization stabilization parameter.

  • lr – Peak learning rate reached after warmup.

  • warmup – Fraction of total training steps used for linear warmup before cosine decay begins.
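The lr and warmup parameters together describe a warmup-then-cosine-decay schedule, which can be sketched as follows (the exact floors, offsets, and step accounting are assumptions; the helper is illustrative only):

```python
import math


def lr_at(step, total_steps, lr=1e-4, warmup=0.025):
    """Linear warmup to the peak lr over the first `warmup` fraction of
    training, then cosine decay toward zero over the remaining steps."""
    warmup_steps = max(1, int(total_steps * warmup))
    if step < warmup_steps:
        return lr * step / warmup_steps  # linear ramp up to the peak
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay
```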

forward(state: Tensor, mask: Tensor | None = None) → Tensor

Encode and decode with the language modeling head.

Parameters:
  • state – The tokenized input state tensor.

  • mask – Optional attention mask.

Returns:

The computed output tensor in output token space.

infer(tokens: Tensor, mask: Tensor | None = None) → Tensor

Fill masked tokens given pre-tokenized input.

Runs a forward pass and replaces each masked position with the highest-probability predicted token, leaving all other positions unchanged.

Parameters:
  • tokens – Pre-tokenized input tensor.

  • mask – Optional attention mask tensor.

Returns:

A 1-D tensor of token IDs with masked positions filled.
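The fill step described above can be sketched as an argmax-and-replace over the model's logits (the helper name and the assumption that the model returns (batch, sequence, vocab) logits are illustrative, not the method's exact implementation):

```python
import torch


def fill_masked(model, tokens, mask_token_id, attention_mask=None):
    """Replace each [MASK] position with the highest-probability predicted
    token; keep every other position unchanged."""
    logits = model(tokens, attention_mask)      # (batch, sequence, vocab)
    predictions = logits.argmax(dim=-1)         # best token ID per position
    return torch.where(tokens == mask_token_id, predictions, tokens)
```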

class undertale.models.maskedlm.InstructionTraceTransformerEncoderForMaskedLMConfiguration

Bases: object

Model size configurations with associated parameters.

To use this class, unpack the desired model size dictionary into the model constructor as keyword arguments.
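For illustration, a size dictionary might look like the following (the `SMALL` name and every value here are hypothetical; the real configuration class defines its own dictionaries):

```python
# Hypothetical size dictionary mapping constructor keyword arguments
# to values; all names match the constructor signature documented above.
SMALL = {
    "depth": 4,
    "hidden_dimensions": 256,
    "vocab_size": 1024,
    "sequence_length": 128,
    "heads": 4,
    "intermediate_dimensions": 1024,
    "next_token_id": 2,
    "mask_token_id": 3,
    "dropout": 0.1,
    "eps": 1e-12,
}

# Unpacked into the constructor as kwargs:
# model = InstructionTraceTransformerEncoderForMaskedLM(**SMALL)
```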