transformer¶
Basic transformer implementation.
Classes

Attention – Single-headed attention module.
FeedForward – A simple feed-forward network module.
MultiHeadAttention – Multi-headed attention.
PositionEmbedding – Standard transformer positional encoding.
TransformerEncoder – A vanilla transformer encoder.
TransformerEncoderLayer – An individual layer of an encoder with attention.
- class undertale.models.transformer.Attention(hidden_dimensions: int, head_dimensions: int)¶
Bases: Module
Single-headed attention module.
- Parameters:
hidden_dimensions – The size of the hidden state space.
head_dimensions – The size of the output space.
- forward(state: Tensor, mask: Tensor | None = None) Tensor¶
Compute attention.
- Parameters:
state – The input state tensor.
mask – Optional attention mask.
- Returns:
A tensor in attended state space.
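The forward pass follows the standard scaled dot-product attention formulation. A minimal NumPy sketch of that computation (for brevity the queries, keys, and values are the state itself; the actual module applies learned linear projections, and the boolean-mask convention here is an assumption):

```python
import numpy as np

def scaled_dot_product_attention(state, mask=None):
    """Scaled dot-product attention over a (sequence, dimensions) state.

    Projections are omitted: queries, keys, and values are all `state`.
    """
    d = state.shape[-1]
    scores = state @ state.T / np.sqrt(d)       # (seq, seq) pairwise similarity
    if mask is not None:
        scores = np.where(mask, scores, -1e9)   # suppress masked-out positions
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ state                      # weighted sum of values

state = np.random.default_rng(0).normal(size=(4, 8))
out = scaled_dot_product_attention(state)
print(out.shape)  # (4, 8)
```

Note the output keeps the input shape: each position becomes a convex combination of all (unmasked) positions.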
- class undertale.models.transformer.MultiHeadAttention(hidden_dimensions: int, heads: int)¶
Bases: Module
Multi-headed attention.
- Parameters:
hidden_dimensions – The size of the hidden state space.
heads – The number of attention heads.
The number of dimensions for each head is computed as:
head_dimensions = hidden_dimensions // heads
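For instance, with the common sizes below (illustrative values, not defaults), the integer division only partitions the hidden space cleanly when hidden_dimensions is evenly divisible by heads:

```python
hidden_dimensions = 512
heads = 8
head_dimensions = hidden_dimensions // heads
print(head_dimensions)  # 64
# No dimensions are dropped by the split in this case.
assert head_dimensions * heads == hidden_dimensions
```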
- forward(state: Tensor, mask: Tensor | None = None) Tensor¶
Compute attention.
- Parameters:
state – The input state tensor.
mask – Optional attention mask.
- Returns:
A tensor in attended state space.
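A common way to realize multiple heads is to view the hidden dimension as heads slices of head_dimensions each and attend within every slice independently. A NumPy sketch under that assumption (learned projections again omitted for brevity):

```python
import numpy as np

def multi_head_attention(state, heads):
    """Split the hidden dimension into heads, attend per head, re-concatenate."""
    seq, hidden = state.shape
    head_dim = hidden // heads
    # (seq, hidden) -> (heads, seq, head_dim)
    split = state.reshape(seq, heads, head_dim).transpose(1, 0, 2)
    scores = split @ split.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax per head
    attended = weights @ split
    # (heads, seq, head_dim) -> (seq, hidden)
    return attended.transpose(1, 0, 2).reshape(seq, hidden)

state = np.random.default_rng(1).normal(size=(4, 8))
out = multi_head_attention(state, heads=2)
print(out.shape)  # (4, 8)
```

Each head sees only its own head_dim-wide slice, which is why the division formula above matters.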
- class undertale.models.transformer.FeedForward(hidden_dimensions: int, intermediate_dimensions: int, dropout: float)¶
Bases: Module
A simple feed-forward network module.
- Parameters:
hidden_dimensions – The size of the hidden state space.
intermediate_dimensions – The size of the intermediate state space.
dropout – Dropout probability.
This computes a non-linear feature transformation over the inputs in
intermediate_dimensions.
- forward(state: Tensor) Tensor¶
Compute the forward pass.
- Parameters:
state – The input state tensor.
- Returns:
Computed state.
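Such position-wise feed-forward blocks conventionally expand the hidden state into the intermediate space, apply a nonlinearity, and project back. A NumPy sketch (ReLU is assumed here; the module's actual activation and dropout placement are not specified above):

```python
import numpy as np

def feed_forward(state, w1, b1, w2, b2):
    """Expand to the intermediate space, apply ReLU, project back to hidden."""
    return np.maximum(state @ w1 + b1, 0.0) @ w2 + b2

rng = np.random.default_rng(2)
hidden, intermediate = 8, 32
state = rng.normal(size=(4, hidden))
w1, b1 = rng.normal(size=(hidden, intermediate)), np.zeros(intermediate)
w2, b2 = rng.normal(size=(intermediate, hidden)), np.zeros(hidden)
out = feed_forward(state, w1, b1, w2, b2)
print(out.shape)  # (4, 8)
```

The same weights are applied independently at every sequence position, which is what makes the transform "position-wise".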
- class undertale.models.transformer.TransformerEncoderLayer(hidden_dimensions: int, heads: int, intermediate_dimensions: int, dropout: float)¶
Bases: Module
An individual layer of an encoder with attention.
- Parameters:
hidden_dimensions – The size of the hidden state space.
heads – The number of attention heads.
intermediate_dimensions – The size of the intermediate state space.
dropout – Dropout probability.
- forward(state: Tensor, mask: Tensor | None = None) Tensor¶
Compute attention followed by the non-linear feed-forward transform.
Includes regularization (layer normalization, dropout) and skip connections.
- Parameters:
state – The input state tensor.
mask – Optional attention mask.
- Returns:
Transformed state.
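The wiring of skip connections and layer normalization can be sketched as follows. This assumes the classic post-norm ordering (normalize after adding the residual); the actual layer may use pre-norm instead, and dropout is omitted:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each position's features to zero mean, unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def encoder_layer(state, attend, feed_forward):
    """Post-norm encoder layer: each sublayer's output is added back to its
    input (skip connection), then layer-normalized."""
    state = layer_norm(state + attend(state))
    state = layer_norm(state + feed_forward(state))
    return state

# Identity sublayers just to exercise the residual/normalization wiring.
state = np.random.default_rng(3).normal(size=(4, 8))
out = encoder_layer(state, attend=lambda s: s, feed_forward=lambda s: s)
print(out.shape)  # (4, 8)
```

The skip connections let gradients bypass each sublayer, which is what makes deep stacks of these layers trainable.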
- class undertale.models.transformer.PositionEmbedding(hidden_dimensions: int, vocab_size: int, sequence_length: int, dropout: float, eps: float)¶
Bases: Module
Standard transformer positional encoding.
Injects positional information by embedding each token's position index and adding it to the token embedding.
Requires a fixed-size input (sequences must be padded or truncated to sequence_length).
- Parameters:
hidden_dimensions – The size of the hidden state space.
vocab_size – The size of the vocabulary.
sequence_length – The fixed size of the input vector.
dropout – Dropout probability.
eps – Layer normalization stabilization parameter.
- forward(state: Tensor) Tensor¶
Inject positional information.
- Parameters:
state – The input state tensor.
- Returns:
Modified state with positional information.
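The embed-and-add scheme can be sketched with two lookup tables, one indexed by token id and one by position (the real module also applies layer normalization and dropout, omitted here; table contents are random placeholders):

```python
import numpy as np

def embed(tokens, token_table, position_table):
    """Sum a token embedding with a position embedding for each index."""
    positions = np.arange(len(tokens))
    return token_table[tokens] + position_table[positions]

rng = np.random.default_rng(4)
vocab_size, sequence_length, hidden = 100, 16, 8
token_table = rng.normal(size=(vocab_size, hidden))
position_table = rng.normal(size=(sequence_length, hidden))
tokens = np.array([5, 2, 9, 5])  # already padded/truncated upstream
out = embed(tokens, token_table, position_table)
print(out.shape)  # (4, 8)
```

Because the position table contributes a distinct vector per index, the two occurrences of token 5 above produce different embeddings, which is exactly the information attention alone cannot recover.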
- class undertale.models.transformer.TransformerEncoder(depth: int, hidden_dimensions: int, vocab_size: int, sequence_length: int, heads: int, intermediate_dimensions: int, dropout: float, eps: float)¶
Bases: Module
A vanilla transformer encoder.
- Parameters:
depth – The number of stacked transformer layers.
hidden_dimensions – The size of the hidden state space.
vocab_size – The size of the vocabulary.
sequence_length – The fixed size of the input vector.
heads – The number of attention heads.
intermediate_dimensions – The size of the intermediate state space.
dropout – Dropout probability.
eps – Layer normalization stabilization parameter.
- forward(state: Tensor, mask: Tensor | None = None) Tensor¶
Encode the given state.
- Parameters:
state – The input state tensor.
mask – Optional attention mask.
- Returns:
Encoded state.
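Putting the pieces together, the encoder pipeline is: embed tokens with positions, then apply depth stacked layers. A heavily simplified end-to-end NumPy sketch (unprojected single-head attention, random placeholder embeddings, no feed-forward sublayer, no mask, no dropout):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def encode(tokens, depth, hidden, vocab_size, seed=0):
    """Sketch of the encoder pipeline: embed with positions, then apply
    `depth` simplified layers of attention + residual + layer norm."""
    rng = np.random.default_rng(seed)
    token_table = rng.normal(size=(vocab_size, hidden))
    position_table = rng.normal(size=(len(tokens), hidden))
    state = token_table[tokens] + position_table[np.arange(len(tokens))]
    for _ in range(depth):
        attended = softmax(state @ state.T / np.sqrt(hidden)) @ state
        state = layer_norm(state + attended)
    return state

out = encode(np.array([3, 1, 4, 1, 5]), depth=2, hidden=8, vocab_size=10)
print(out.shape)  # (5, 8)
```

The output preserves the (sequence_length, hidden_dimensions) shape throughout, so layers can be stacked to any depth.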