maskedlm

Masked language modeling implementation.

Classes

InstructionTraceTransformerEncoderForMaskedLM(...)

A transformer encoder with a masked language modeling head.

InstructionTraceTransformerEncoderForMaskedLMConfiguration()

Model size configurations with associated parameters.

MaskedLMCollator(mask_token_id, vocab_size)

Collation function for masked language modeling.

MaskedLMHead(hidden_dimensions, vocab_size, eps)

Masked language modeling head.

class undertale.models.maskedlm.MaskedLMCollator(mask_token_id: int, vocab_size: int, probability: float = 0.15)

Bases: object

Collation function for masked language modeling.

Masking follows the BERT convention: of the candidate positions selected at the given probability, 80% are replaced with [MASK], 10% with a random token from the vocabulary, and 10% are left unchanged. Only non-padding positions are eligible for masking.

Parameters:
  • mask_token_id – The token ID of the [MASK] special token.

  • vocab_size – The vocabulary size, used for random token replacement.

  • probability – The fraction of non-padding tokens selected as masking candidates per sequence.
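The 80/10/10 convention described above can be sketched in plain Python (the helper name, the `pad_token_id` parameter, and the `-100` ignore-index label convention are illustrative assumptions, not the collator's exact interface):

```python
import random


def mask_tokens(tokens, mask_token_id, vocab_size, pad_token_id,
                probability=0.15, rng=None):
    """Apply the BERT 80/10/10 masking convention to a list of token IDs.

    Returns (inputs, labels), where labels is -100 at positions excluded
    from the loss (a common ignore-index convention, assumed here).
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [-100] * len(tokens)
    for i, token in enumerate(tokens):
        if token == pad_token_id:
            continue  # only non-padding positions are eligible
        if rng.random() >= probability:
            continue  # position not selected as a masking candidate
        labels[i] = token  # loss is computed at candidate positions
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token_id  # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: leave the original token unchanged
    return inputs, labels
```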

class undertale.models.maskedlm.MaskedLMHead(hidden_dimensions: int, vocab_size: int, eps: float)

Bases: Module

Masked language modeling head.

A linear transform followed by a decode back into the vocabulary: the standard masked language modeling head.

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • vocab_size – The size of the vocabulary.

  • eps – Layer normalization stabilization parameter.
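A sketch of a standard BERT-style MLM head, assuming the usual dense transform, nonlinearity, and layer norm before the vocabulary decoder (the specific layer and activation choices here are assumptions, not this module's exact code):

```python
import torch
from torch import nn


class MaskedLMHeadSketch(nn.Module):
    """Illustrative MLM head: transform -> activation -> norm -> decode."""

    def __init__(self, hidden_dimensions: int, vocab_size: int, eps: float):
        super().__init__()
        self.transform = nn.Linear(hidden_dimensions, hidden_dimensions)
        self.activation = nn.GELU()  # assumed nonlinearity
        self.norm = nn.LayerNorm(hidden_dimensions, eps=eps)
        self.decode = nn.Linear(hidden_dimensions, vocab_size)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        state = self.norm(self.activation(self.transform(state)))
        return self.decode(state)  # (batch, sequence, vocab_size) logits
```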

forward(state: Tensor) → Tensor

Decode to vocabulary tokens.

Parameters:

state – The input state tensor from the hidden state of a transformer.

Returns:

A decoded state tensor in vocabulary token space.

class undertale.models.maskedlm.InstructionTraceTransformerEncoderForMaskedLM(depth: int, hidden_dimensions: int, vocab_size: int, sequence_length: int, heads: int, intermediate_dimensions: int, next_token_id: int, mask_token_id: int, dropout: float, eps: float, lr: float = 0.0001, warmup: float = 0.025)

Bases: LightningModule, Module

A transformer encoder with a masked language modeling head.

Parameters:
  • depth – The number of stacked transformer layers.

  • hidden_dimensions – The size of the hidden state space.

  • vocab_size – The size of the vocabulary.

  • sequence_length – The fixed size of the input vector.

  • heads – The number of attention heads.

  • intermediate_dimensions – The size of the intermediate state space.

  • next_token_id – The ID of the special NEXT token.

  • mask_token_id – The token ID of the [MASK] special token.

  • dropout – Dropout probability.

  • eps – Layer normalization stabilization parameter.

  • lr – Peak learning rate reached after warmup.

  • warmup – Fraction of total training steps used for linear warmup before cosine decay begins.
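The lr and warmup parameters together describe a warmup-then-cosine-decay schedule, which can be sketched as follows (the exact floors, offsets, and step accounting are assumptions; the helper is illustrative only):

```python
import math


def lr_at(step, total_steps, lr=1e-4, warmup=0.025):
    """Linear warmup to the peak lr over the first `warmup` fraction of
    training, then cosine decay toward zero over the remaining steps."""
    warmup_steps = max(1, int(total_steps * warmup))
    if step < warmup_steps:
        return lr * step / warmup_steps  # linear ramp up to the peak
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay
```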

forward(state: Tensor, mask: Tensor | None = None) → Tensor

Encode and decode with the language modeling head.

Parameters:
  • state – The tokenized input state tensor.

  • mask – Optional attention mask.

Returns:

The computed output tensor in output token space.

infer(tokens: Tensor, mask: Tensor | None = None) → Tensor

Fill masked tokens given pre-tokenized input.

Runs a forward pass and replaces each masked position with the highest-probability predicted token, leaving all other positions unchanged.

Parameters:
  • tokens – Pre-tokenized input tensor.

  • mask – Optional attention mask tensor.

Returns:

A 1-D tensor of token IDs with masked positions filled.
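The fill step described above can be sketched as an argmax-and-replace over the model's logits (the helper name and the assumption that the model returns (batch, sequence, vocab) logits are illustrative, not the method's exact implementation):

```python
import torch


def fill_masked(model, tokens, mask_token_id, attention_mask=None):
    """Replace each [MASK] position with the highest-probability predicted
    token; keep every other position unchanged."""
    logits = model(tokens, attention_mask)      # (batch, sequence, vocab)
    predictions = logits.argmax(dim=-1)         # best token ID per position
    return torch.where(tokens == mask_token_id, predictions, tokens)
```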

class undertale.models.maskedlm.InstructionTraceTransformerEncoderForMaskedLMConfiguration

Bases: object

Model size configurations with associated parameters.

To use this class, unpack the desired model size dictionary into the model constructor as keyword arguments.
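For illustration, a size dictionary might look like the following (the `SMALL` name and every value here are hypothetical; the real configuration class defines its own dictionaries):

```python
# Hypothetical size dictionary mapping constructor keyword arguments
# to values; all names match the constructor signature documented above.
SMALL = {
    "depth": 4,
    "hidden_dimensions": 256,
    "vocab_size": 1024,
    "sequence_length": 128,
    "heads": 4,
    "intermediate_dimensions": 1024,
    "next_token_id": 2,
    "mask_token_id": 3,
    "dropout": 0.1,
    "eps": 1e-12,
}

# Unpacked into the constructor as kwargs:
# model = InstructionTraceTransformerEncoderForMaskedLM(**SMALL)
```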