custom

Transformer customizations for binary code.

Classes

InstructionTracePositionEmbedding(...)

Custom instruction and argument positional encoding.

InstructionTraceTransformerEncoder(depth, ...)

A transformer encoder for instruction traces.

class undertale.models.custom.InstructionTracePositionEmbedding(hidden_dimensions: int, vocab_size: int, sequence_length: int, next_token_id: int, dropout: float, eps: float)

Bases: Module

Custom instruction and argument positional encoding.

Injects positional information by embedding and adding two features to each token:

  1. Instruction number (e.g., foo bar baz NEXT blah -> 0 0 0 - 1).

  2. Argument number (e.g., foo bar baz NEXT blah -> 0 1 2 - 0).

Requires a special NEXT token that is used to identify instruction boundaries.

Requires fixed-size input vectors (padded or truncated to the configured sequence length).

Parameters:
  • hidden_dimensions – The size of the hidden state space.

  • vocab_size – The size of the vocabulary.

  • sequence_length – The fixed size of the input vector.

  • next_token_id – The ID of the special NEXT token.

  • dropout – Dropout probability.

  • eps – Layer normalization stabilization parameter.

static compute_instruction_index(state: Tensor, next_token_id: int) → Tensor

Compute the instruction index for each token.

Each token is assigned the index of the instruction it belongs to. The first instruction is index 0, incrementing after each NEXT token.

Parameters:
  • state – The input state tensor.

  • next_token_id – The ID of the special NEXT token.

Returns:

A tensor of instruction indices.
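A minimal sketch of this computation, assuming a vectorized PyTorch implementation (the actual method may differ, including how the NEXT token itself is indexed): the instruction index is the number of NEXT tokens that precede each position, which a cumulative sum over the NEXT mask yields directly.

```python
import torch


def compute_instruction_index(state: torch.Tensor, next_token_id: int) -> torch.Tensor:
    # Mark NEXT tokens, then count how many precede each position:
    # cumsum minus the mask itself excludes the current token from the count.
    mask = (state == next_token_id).long()
    return mask.cumsum(dim=-1) - mask


# foo bar baz NEXT blah (1 standing in for the NEXT token ID)
state = torch.tensor([5, 6, 7, 1, 8])
print(compute_instruction_index(state, next_token_id=1))  # tensor([0, 0, 0, 0, 1])
```

In this sketch the NEXT token inherits the index of the instruction it closes; the documented example marks it with "-", so its exact value is an implementation detail.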

static compute_argument_index(state: Tensor, next_token_id: int) → Tensor

Compute the argument index for each token within its instruction.

Each token is assigned its position within the current instruction, resetting to 0 after each NEXT token.

Parameters:
  • state – The input state tensor.

  • next_token_id – The ID of the special NEXT token.

Returns:

A tensor of argument indices.
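One way to sketch this, again assuming a vectorized PyTorch implementation: track the position of the most recent NEXT token with a running maximum, then take each token's offset from it.

```python
import torch


def compute_argument_index(state: torch.Tensor, next_token_id: int) -> torch.Tensor:
    # Position of the most recent NEXT token at or before each index
    # (-1 if none has occurred yet), computed as a running maximum.
    positions = torch.arange(state.shape[-1], device=state.device)
    next_positions = torch.where(
        state == next_token_id, positions, torch.full_like(positions, -1)
    )
    last_next = next_positions.cummax(dim=-1).values
    # Offset from the token immediately after the most recent NEXT.
    return positions - last_next - 1


# foo bar baz NEXT blah (1 standing in for the NEXT token ID)
state = torch.tensor([5, 6, 7, 1, 8])
print(compute_argument_index(state, next_token_id=1))  # tensor([ 0,  1,  2, -1,  0])
```

Here the NEXT token itself comes out as -1, matching the "-" placeholder in the documented example; a real implementation would need to map that to a valid embedding index.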

forward(state: Tensor) → Tensor

Inject positional information.

Parameters:
  • state – The input state tensor.

Returns:

Modified state with positional information.
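Putting the pieces together, the forward pass plausibly follows the familiar BERT-style pattern: sum the token embedding with learned embeddings of the instruction and argument indices, then apply layer normalization and dropout. The sketch below is illustrative only; the class name and internal details are assumptions, not the actual implementation.

```python
import torch
import torch.nn as nn


class PositionEmbeddingSketch(nn.Module):
    """Illustrative sketch: token + instruction + argument embeddings, summed."""

    def __init__(self, hidden_dimensions: int, vocab_size: int, sequence_length: int,
                 next_token_id: int, dropout: float, eps: float):
        super().__init__()
        self.next_token_id = next_token_id
        self.tokens = nn.Embedding(vocab_size, hidden_dimensions)
        # One learned embedding per possible instruction/argument index; neither
        # index can exceed the fixed sequence length.
        self.instructions = nn.Embedding(sequence_length, hidden_dimensions)
        self.arguments = nn.Embedding(sequence_length, hidden_dimensions)
        self.norm = nn.LayerNorm(hidden_dimensions, eps=eps)
        self.dropout = nn.Dropout(dropout)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Instruction index: count of NEXT tokens strictly before each position.
        mask = (state == self.next_token_id).long()
        instruction = mask.cumsum(dim=-1) - mask
        # Argument index: offset from the most recent NEXT token; clamping the
        # NEXT token's -1 to 0 is a sketch detail to keep indices valid.
        positions = torch.arange(state.shape[-1], device=state.device)
        nexts = torch.where(mask.bool(), positions, torch.full_like(positions, -1))
        argument = (positions - nexts.cummax(dim=-1).values - 1).clamp(min=0)
        embedded = (self.tokens(state)
                    + self.instructions(instruction)
                    + self.arguments(argument))
        return self.dropout(self.norm(embedded))
```

Summing the three embeddings, rather than concatenating them, keeps the hidden dimension fixed and mirrors how standard transformer encoders combine token and position information.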

class undertale.models.custom.InstructionTraceTransformerEncoder(depth: int, hidden_dimensions: int, vocab_size: int, sequence_length: int, heads: int, intermediate_dimensions: int, next_token_id: int, dropout: float, eps: float)

Bases: Module

A transformer encoder for instruction traces.

Parameters:
  • depth – The number of stacked transformer layers.

  • hidden_dimensions – The size of the hidden state space.

  • vocab_size – The size of the vocabulary.

  • sequence_length – The fixed size of the input vector.

  • heads – The number of attention heads.

  • intermediate_dimensions – The size of the intermediate state space.

  • next_token_id – The ID of the special NEXT token.

  • dropout – Dropout probability.

  • eps – Layer normalization stabilization parameter.

forward(state: Tensor, mask: Tensor | None = None) → Tensor

Encode the given state.

Parameters:
  • state – The input state tensor.

  • mask – Optional attention mask.

Returns:

Encoded state.