maskedlm¶
Masked language modeling implementation.
Classes

InstructionTraceTransformerEncoderForMaskedLM – A transformer encoder with a masked language modeling head.
InstructionTraceTransformerEncoderForMaskedLMConfiguration – Model size configurations with associated parameters.
MaskedLMCollator – Collation function for masked language modeling.
MaskedLMHead – Masked language modeling head.
- class undertale.models.maskedlm.MaskedLMCollator(mask_token_id: int, vocab_size: int, probability: float = 0.15)¶
Bases: object

Collation function for masked language modeling.

Masking follows the BERT convention: of the candidate positions selected at the given probability, 80% are replaced with [MASK], 10% with a random token from the vocabulary, and 10% are left unchanged. Only non-padding positions are eligible for masking.

- Parameters:
mask_token_id – The token ID of the [MASK] special token.
vocab_size – The vocabulary size, used for random token replacement.
probability – The fraction of non-padding tokens selected as masking candidates per sequence.
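The 80/10/10 scheme described above can be sketched in plain Python. This is a standalone illustration, not the actual MaskedLMCollator implementation: the MASK_ID, PAD_ID, and VOCAB_SIZE constants and the -100 ignore-index convention are assumptions for the example.

```python
import random

MASK_ID = 4       # hypothetical [MASK] token ID
PAD_ID = 0        # hypothetical padding token ID
VOCAB_SIZE = 100  # hypothetical vocabulary size

def mask_tokens(tokens, probability=0.15, rng=random):
    """BERT-style masking: of the candidate positions selected at
    `probability`, 80% become [MASK], 10% a random token, and 10%
    are left unchanged. Padding positions are never candidates."""
    masked = list(tokens)
    labels = [-100] * len(tokens)  # -100: ignored by the loss (common convention)
    for i, tok in enumerate(tokens):
        if tok == PAD_ID or rng.random() >= probability:
            continue
        labels[i] = tok  # predict the original token at this position
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_ID
        elif roll < 0.9:
            masked[i] = rng.randrange(VOCAB_SIZE)
        # else: leave the token unchanged
    return masked, labels
```

With probability=1.0 every non-padding position becomes a candidate, which makes the padding exclusion easy to see.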
- class undertale.models.maskedlm.MaskedLMHead(hidden_dimensions: int, vocab_size: int, eps: float)¶
Bases: Module

Masked language modeling head.

A simple linear transform and decode – the standard masked language modeling head.

- Parameters:
hidden_dimensions – The size of the hidden state space.
vocab_size – The size of the vocabulary.
eps – Layer normalization stabilization parameter.
- forward(state: Tensor) Tensor¶
Decode to vocabulary tokens.
- Parameters:
state – The input state tensor from the hidden state of a transformer.
- Returns:
A decoded state tensor in vocabulary token space.
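The transform-and-decode step can be sketched framework-agnostically with NumPy. This is a hypothetical sketch that assumes the common BERT-style head layout (dense transform, GELU activation, layer normalization, then a projection to vocabulary logits); the real module is a PyTorch Module and its internals may differ.

```python
import numpy as np

def layer_norm(x, eps):
    """Normalize over the last axis with stabilization constant eps."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlm_head(state, W, b, decoder, eps=1e-12):
    """Hypothetical MLM head: dense transform, tanh-approximated GELU,
    layer norm, then decode to vocabulary-sized logits."""
    h = state @ W + b
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (h + 0.044715 * h**3)))
    h = layer_norm(h, eps)
    return h @ decoder  # (sequence, hidden) -> (sequence, vocab)
```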
- class undertale.models.maskedlm.InstructionTraceTransformerEncoderForMaskedLM(depth: int, hidden_dimensions: int, vocab_size: int, sequence_length: int, heads: int, intermediate_dimensions: int, next_token_id: int, mask_token_id: int, dropout: float, eps: float, lr: float = 0.0001, warmup: float = 0.025)¶
Bases: LightningModule, Module

A transformer encoder with a masked language modeling head.

- Parameters:
depth – The number of stacked transformer layers.
hidden_dimensions – The size of the hidden state space.
vocab_size – The size of the vocabulary.
sequence_length – The fixed size of the input vector.
heads – The number of attention heads.
intermediate_dimensions – The size of the intermediate state space.
next_token_id – The ID of the special NEXT token.
mask_token_id – The token ID of the [MASK] special token.
dropout – Dropout probability.
eps – Layer normalization stabilization parameter.
lr – Peak learning rate reached after warmup.
warmup – The fraction of training steps used for linear warmup before cosine decay begins.
- forward(state: Tensor, mask: Tensor | None = None) Tensor¶
Encode and decode with the language modeling head.
- Parameters:
state – The tokenized input state tensor.
mask – Optional attention mask.
- Returns:
The computed output tensor in output token space.
- infer(tokens: Tensor, mask: Tensor | None = None) Tensor¶
Fill masked tokens given pre-tokenized input.
Runs a forward pass and replaces each masked position with the highest-probability predicted token, leaving all other positions unchanged.
- Parameters:
tokens – Pre-tokenized input tensor.
mask – Optional attention mask tensor.
- Returns:
A 1-D tensor of token IDs with masked positions filled.
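The fill-in behavior described for infer can be sketched with NumPy: take the argmax over the vocabulary at each position and substitute it only where the input holds the mask token. The function name and mask_id value are illustrative, not the actual API.

```python
import numpy as np

def fill_masked(tokens, logits, mask_id):
    """Replace each masked position with the highest-scoring predicted
    token, leaving all other positions unchanged (mirrors the behavior
    described for `infer`)."""
    tokens = np.asarray(tokens)
    preds = logits.argmax(axis=-1)              # best token per position
    return np.where(tokens == mask_id, preds, tokens)
```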
- class undertale.models.maskedlm.InstructionTraceTransformerEncoderForMaskedLMConfiguration¶
Bases: object

Model size configurations with associated parameters.

To use this class, pass one of its model size dictionaries to model initialization as keyword arguments.
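The kwargs pattern looks like the following. The dictionary name, the values, and the stand-in Model class are all hypothetical; consult the Configuration class itself for the real presets.

```python
# Hypothetical size preset; the real names and values are defined on
# InstructionTraceTransformerEncoderForMaskedLMConfiguration.
BASE = {
    "depth": 12,
    "hidden_dimensions": 768,
    "heads": 12,
    "intermediate_dimensions": 3072,
}

# Stand-in for the model class, used only to show the unpacking pattern.
class Model:
    def __init__(self, depth, hidden_dimensions, heads, intermediate_dimensions):
        self.depth = depth
        self.hidden_dimensions = hidden_dimensions
        self.heads = heads
        self.intermediate_dimensions = intermediate_dimensions

model = Model(**BASE)  # size preset supplied as keyword arguments
```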