transformer

Basic transformer implementation.

Classes

Attention(hidden_dimensions, head_dimensions)

Single-headed attention module.

FeedForward(hidden_dimensions, ...)

A simple feed-forward network module.

MultiHeadAttention(hidden_dimensions, heads)

Multi-headed attention.

PositionEmbedding(hidden_dimensions, ...)

Standard transformer positional encoding.

TransformerEncoder(depth, hidden_dimensions, ...)

A vanilla transformer encoder.

TransformerEncoderLayer(hidden_dimensions, ...)

An individual layer of an encoder with attention.

class undertale.models.transformer.Attention(hidden_dimensions: int, head_dimensions: int)

Bases: Module

Single-headed attention module.

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • head_dimensions – The size of the output space.

forward(state: Tensor, mask: Tensor | None = None) → Tensor

Compute attention.

Parameters:
  • state – The input state tensor.

  • mask – Optional attention mask.

Returns:

A tensor in attended state space.
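The documentation does not show the module's internals, but single-headed attention is conventionally scaled dot-product attention. As a rough illustration (not the actual `Attention` implementation), it can be sketched in plain Python over lists of vectors:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, key, value):
    # query, key, value: seq_len lists of head_dimensions-sized vectors.
    d = len(query[0])
    out = []
    for q in query:
        # Dot-product scores against every key, scaled by sqrt(head_dimensions).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in key]
        weights = softmax(scores)
        # Each output row is a weight-averaged mix of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, value))
                    for i in range(len(value[0]))])
    return out
```

A mask, when provided, would typically set the scores of disallowed positions to a large negative number before the softmax so their weights vanish.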

class undertale.models.transformer.MultiHeadAttention(hidden_dimensions: int, heads: int)

Bases: Module

Multi-headed attention.

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • heads – The number of attention heads.

The number of dimensions for each head is computed as:

head_dimensions = hidden_dimensions // heads

forward(state: Tensor, mask: Tensor | None = None) → Tensor

Compute attention.

Parameters:
  • state – The input state tensor.

  • mask – Optional attention mask.

Returns:

A tensor in attended state space.
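The head_dimensions = hidden_dimensions // heads relationship implies that the hidden state is partitioned into equal per-head slices. A hypothetical helper (not part of the module's API) makes the arithmetic concrete:

```python
def split_heads(vector, heads):
    # Partition a hidden-state vector into `heads` equal slices of
    # head_dimensions = hidden_dimensions // heads elements each.
    head_dimensions, remainder = divmod(len(vector), heads)
    assert remainder == 0, "hidden_dimensions must be divisible by heads"
    return [vector[i * head_dimensions:(i + 1) * head_dimensions]
            for i in range(heads)]
```

For example, hidden_dimensions=768 with heads=12 gives each head 64 dimensions; hidden_dimensions that is not divisible by heads would leave a remainder and is presumably rejected.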

class undertale.models.transformer.FeedForward(hidden_dimensions: int, intermediate_dimensions: int, dropout: float)

Bases: Module

A simple feed-forward network module.

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • intermediate_dimensions – The size of the intermediate state space.

  • dropout – Dropout probability.

This computes a non-linear feature transformation of the inputs through an intermediate space of size intermediate_dimensions.

forward(state: Tensor) → Tensor

Compute forward pass.

Parameters:

state – The input state tensor.

Returns:

Computed state.
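A transformer feed-forward block conventionally expands to the intermediate space, applies a non-linearity, and projects back. As a minimal sketch (the activation choice here is ReLU and dropout is omitted, both assumptions, since the docs do not say):

```python
def feed_forward(state, w_in, w_out):
    # state: hidden_dimensions-sized vector.
    # w_in: one weight column per intermediate unit (each of hidden length).
    # w_out: one weight column per output unit (each of intermediate length).
    hidden = [max(0.0, sum(s * w for s, w in zip(state, col)))  # expand + ReLU
              for col in w_in]
    return [sum(h * w for h, w in zip(hidden, col))             # project back
            for col in w_out]
```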

class undertale.models.transformer.TransformerEncoderLayer(hidden_dimensions: int, heads: int, intermediate_dimensions: int, dropout: float)

Bases: Module

An individual layer of an encoder with attention.

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • heads – The number of attention heads.

  • intermediate_dimensions – The size of the intermediate state space.

  • dropout – Dropout probability.

forward(state: Tensor, mask: Tensor | None = None) → Tensor

Compute attention plus non-linear transform.

Includes regularization (layer normalization, dropout) and skip connections.

Parameters:
  • state – The input state tensor.

  • mask – Optional attention mask.

Returns:

Transformed state.
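The "attention plus non-linear transform" with skip connections and layer normalization follows the standard encoder-layer residual pattern. Whether this module is pre-norm or post-norm is not stated; the sketch below shows the classic post-norm arrangement, with the sublayers passed in as plain functions for illustration:

```python
def encoder_layer(state, attention, feed_forward, layer_norm):
    # Post-norm residual pattern: each sublayer's output is added back to
    # its own input, then normalized.
    state = layer_norm([s + a for s, a in zip(state, attention(state))])
    state = layer_norm([s + f for s, f in zip(state, feed_forward(state))])
    return state
```

Dropout, applied inside the sublayers in most implementations, is omitted here for brevity.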

class undertale.models.transformer.PositionEmbedding(hidden_dimensions: int, vocab_size: int, sequence_length: int, dropout: float, eps: float)

Bases: Module

Standard transformer positional encoding.

Injects positional information by embedding each token's index and adding it to the token embedding.

Requires fixed-size input (sequences are padded or truncated to sequence_length).

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • vocab_size – The size of the vocabulary.

  • sequence_length – The fixed size of the input vector.

  • dropout – Dropout probability.

  • eps – Layer normalization stabilization parameter.

forward(state: Tensor) → Tensor

Inject positional information.

Parameters:

state – The input state tensor.

Returns:

Modified state with positional information.
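Since the docs describe embedding and adding the token index, the core computation is two learned lookup tables summed elementwise. A stripped-down sketch (layer normalization and dropout omitted; the table arguments are illustrative, not the module's attributes):

```python
def position_embedding(token_ids, token_table, position_table):
    # token_table: vocab_size rows of hidden_dimensions-sized embeddings.
    # position_table: sequence_length rows of hidden_dimensions-sized embeddings.
    # Each token's embedding is summed with the embedding of its position.
    return [
        [t + p for t, p in zip(token_table[tok], position_table[i])]
        for i, tok in enumerate(token_ids)
    ]
```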

class undertale.models.transformer.TransformerEncoder(depth: int, hidden_dimensions: int, vocab_size: int, sequence_length: int, heads: int, intermediate_dimensions: int, dropout: float, eps: float)

Bases: Module

A vanilla transformer encoder.

Parameters:
  • depth – The number of stacked transformer layers.

  • hidden_dimensions – The size of the hidden state space.

  • vocab_size – The size of the vocabulary.

  • sequence_length – The fixed size of the input vector.

  • heads – The number of attention heads.

  • intermediate_dimensions – The size of the intermediate state space.

  • dropout – Dropout probability.

  • eps – Layer normalization stabilization parameter.

forward(state: Tensor, mask: Tensor | None = None) → Tensor

Encode the given state.

Parameters:
  • state – The input state tensor.

  • mask – Optional attention mask.

Returns:

Encoded state.
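Structurally, the encoder embeds its input (via the position embedding) and then applies depth-many encoder layers in sequence. The composition can be sketched as:

```python
def transformer_encoder(state, layers):
    # `layers` stands in for depth-many TransformerEncoderLayer instances;
    # each consumes and produces a state of the same shape.
    for layer in layers:
        state = layer(state)
    return state
```

With the documented constructor, a typical instantiation would look like TransformerEncoder(depth=6, hidden_dimensions=768, vocab_size=30000, sequence_length=512, heads=12, intermediate_dimensions=3072, dropout=0.1, eps=1e-12); the specific values here are conventional defaults from the literature, not ones prescribed by this module.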