transformer

Basic transformer implementation.

Classes

Attention(hidden_dimensions, head_dimensions)

Single-headed attention module.

FeedForward(hidden_dimensions, ...)

A simple feed-forward network module.

MultiHeadAttention(hidden_dimensions, heads)

Multi-headed attention.

PositionEmbedding(hidden_dimensions, ...)

Standard transformer positional encoding.

TransformerEncoder(depth, hidden_dimensions, ...)

A vanilla transformer encoder.

TransformerEncoderLayer(hidden_dimensions, ...)

An individual layer of an encoder with attention.

class undertale.models.transformer.Attention(hidden_dimensions: int, head_dimensions: int)

Bases: Module

Single-headed attention module.

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • head_dimensions – The size of the output space.

forward(state: Tensor, mask: Tensor | None = None) → Tensor

Compute attention.

Parameters:
  • state – The input state tensor.

  • mask – Optional attention mask.

Returns:

A tensor in attended state space.
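The documentation does not show the module's internals, but single-headed attention is conventionally scaled dot-product attention. As a rough illustration (not the actual `Attention` implementation), it can be sketched in plain Python over lists of vectors:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, key, value):
    # query, key, value: seq_len lists of head_dimensions-sized vectors.
    d = len(query[0])
    out = []
    for q in query:
        # Dot-product scores against every key, scaled by sqrt(head_dimensions).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in key]
        weights = softmax(scores)
        # Each output row is a weight-averaged mix of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, value))
                    for i in range(len(value[0]))])
    return out
```

A mask, when provided, would typically set the scores of disallowed positions to a large negative number before the softmax so their weights vanish.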

class undertale.models.transformer.MultiHeadAttention(hidden_dimensions: int, heads: int)

Bases: Module

Multi-headed attention.

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • heads – The number of attention heads.

The number of dimensions for each head is computed as:

head_dimensions = hidden_dimensions // heads

forward(state: Tensor, mask: Tensor | None = None) → Tensor

Compute attention.

Parameters:
  • state – The input state tensor.

  • mask – Optional attention mask.

Returns:

A tensor in attended state space.
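The head_dimensions = hidden_dimensions // heads relationship implies that the hidden state is partitioned into equal per-head slices. A hypothetical helper (not part of the module's API) makes the arithmetic concrete:

```python
def split_heads(vector, heads):
    # Partition a hidden-state vector into `heads` equal slices of
    # head_dimensions = hidden_dimensions // heads elements each.
    head_dimensions, remainder = divmod(len(vector), heads)
    assert remainder == 0, "hidden_dimensions must be divisible by heads"
    return [vector[i * head_dimensions:(i + 1) * head_dimensions]
            for i in range(heads)]
```

For example, hidden_dimensions=768 with heads=12 gives each head 64 dimensions; hidden_dimensions that is not divisible by heads would leave a remainder and is presumably rejected.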

class undertale.models.transformer.FeedForward(hidden_dimensions: int, intermediate_dimensions: int, dropout: float)

Bases: Module

A simple feed-forward network module.

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • intermediate_dimensions – The size of the intermediate state space.

  • dropout – Dropout probability.

This computes a non-linear feature transformation of the inputs through an intermediate space of size intermediate_dimensions.

forward(state: Tensor) → Tensor

Compute forward pass.

Parameters:

state – The input state tensor.

Returns:

Computed state.
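A transformer feed-forward block conventionally expands to the intermediate space, applies a non-linearity, and projects back. As a minimal sketch (the activation choice here is ReLU and dropout is omitted, both assumptions, since the docs do not say):

```python
def feed_forward(state, w_in, w_out):
    # state: hidden_dimensions-sized vector.
    # w_in: one weight column per intermediate unit (each of hidden length).
    # w_out: one weight column per output unit (each of intermediate length).
    hidden = [max(0.0, sum(s * w for s, w in zip(state, col)))  # expand + ReLU
              for col in w_in]
    return [sum(h * w for h, w in zip(hidden, col))             # project back
            for col in w_out]
```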

class undertale.models.transformer.TransformerEncoderLayer(hidden_dimensions: int, heads: int, intermediate_dimensions: int, dropout: float)

Bases: Module

An individual layer of an encoder with attention.

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • heads – The number of attention heads.

  • intermediate_dimensions – The size of the intermediate state space.

  • dropout – Dropout probability.

forward(state: Tensor, mask: Tensor | None = None) → Tensor

Compute attention plus non-linear transform.

Includes regularization (layer normalization, dropout) and skip connections.

Parameters:
  • state – The input state tensor.

  • mask – Optional attention mask.

Returns:

Transformed state.
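The "attention plus non-linear transform" with skip connections and layer normalization follows the standard encoder-layer residual pattern. Whether this module is pre-norm or post-norm is not stated; the sketch below shows the classic post-norm arrangement, with the sublayers passed in as plain functions for illustration:

```python
def encoder_layer(state, attention, feed_forward, layer_norm):
    # Post-norm residual pattern: each sublayer's output is added back to
    # its own input, then normalized.
    state = layer_norm([s + a for s, a in zip(state, attention(state))])
    state = layer_norm([s + f for s, f in zip(state, feed_forward(state))])
    return state
```

Dropout, applied inside the sublayers in most implementations, is omitted here for brevity.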

class undertale.models.transformer.PositionEmbedding(hidden_dimensions: int, vocab_size: int, sequence_length: int, dropout: float, eps: float)

Bases: Module

Standard transformer positional encoding.

Injects positional information by embedding each token's index and adding it to the token embedding.

Requires fixed-size input (sequences are padded or truncated to sequence_length).

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • vocab_size – The size of the vocabulary.

  • sequence_length – The fixed size of the input vector.

  • dropout – Dropout probability.

  • eps – Layer normalization stabilization parameter.

forward(state: Tensor) → Tensor

Inject positional information.

Parameters:

state – The input state tensor.

Returns:

Modified state with positional information.
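Since the docs describe embedding and adding the token index, the core computation is two learned lookup tables summed elementwise. A stripped-down sketch (layer normalization and dropout omitted; the table arguments are illustrative, not the module's attributes):

```python
def position_embedding(token_ids, token_table, position_table):
    # token_table: vocab_size rows of hidden_dimensions-sized embeddings.
    # position_table: sequence_length rows of hidden_dimensions-sized embeddings.
    # Each token's embedding is summed with the embedding of its position.
    return [
        [t + p for t, p in zip(token_table[tok], position_table[i])]
        for i, tok in enumerate(token_ids)
    ]
```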

class undertale.models.transformer.TransformerEncoder(depth: int, hidden_dimensions: int, vocab_size: int, sequence_length: int, heads: int, intermediate_dimensions: int, dropout: float, eps: float)

Bases: Module

A vanilla transformer encoder.

Parameters:
  • depth – The number of stacked transformer layers.

  • hidden_dimensions – The size of the hidden state space.

  • vocab_size – The size of the vocabulary.

  • sequence_length – The fixed size of the input vector.

  • heads – The number of attention heads.

  • intermediate_dimensions – The size of the intermediate state space.

  • dropout – Dropout probability.

  • eps – Layer normalization stabilization parameter.

forward(state: Tensor, mask: Tensor | None = None) → Tensor

Encode the given state.

Parameters:
  • state – The input state tensor.

  • mask – Optional attention mask.

Returns:

Encoded state.
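Structurally, the encoder embeds its input (via the position embedding) and then applies depth-many encoder layers in sequence. The composition can be sketched as:

```python
def transformer_encoder(state, layers):
    # `layers` stands in for depth-many TransformerEncoderLayer instances;
    # each consumes and produces a state of the same shape.
    for layer in layers:
        state = layer(state)
    return state
```

With the documented constructor, a typical instantiation would look like TransformerEncoder(depth=6, hidden_dimensions=768, vocab_size=30000, sequence_length=512, heads=12, intermediate_dimensions=3072, dropout=0.1, eps=1e-12); the specific values here are conventional defaults from the literature, not ones prescribed by this module.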