item

The Instruction Trace Embedding Model (ITEM).

ITEM is a custom BERT-like encoder built for binary code. It takes advantage of information unique to the binary-code domain (structural and dataflow data) to improve performance over a naive transformer implementation.
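For orientation, a minimal instantiation sketch follows. The constructor parameter names come from the TransformerEncoder signature documented below; the specific values are illustrative assumptions, not the project's documented defaults.

    from undertale.models.item import TransformerEncoder

    encoder = TransformerEncoder(
        depth=12,                      # number of encoder layers
        hidden_dimensions=768,         # hidden (embedding) size
        vocab_size=8192,               # tokenizer vocabulary size
        input_size=512,                # maximum input sequence length
        heads=12,                      # attention heads per layer
        intermediate_dimensions=3072,  # feed-forward width
        dropout=0.1,
        eps=1e-12,                     # layer-norm epsilon
    )

    # TransformerEncoder is a torch.nn.Module, so standard PyTorch tooling
    # (device placement, state_dict checkpointing) applies.
    encoder = encoder.to("cpu")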

class undertale.models.item.TransformerEncoder(depth: int, hidden_dimensions: int, vocab_size: int, input_size: int, heads: int, intermediate_dimensions: int, dropout: float, eps: float)

Bases: Module

class undertale.models.item.TransformerEncoderForMaskedLM(depth: int, hidden_dimensions: int, vocab_size: int, input_size: int, heads: int, intermediate_dimensions: int, dropout: float, eps: float, lr: float, warmup: float)

Bases: LightningModule, Module
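Because TransformerEncoderForMaskedLM is a LightningModule, it can in principle be driven with PyTorch Lightning's standard Trainer. The sketch below assumes that workflow: the pytorch_lightning import path, the hyperparameter values, and the train_dataloader (whose batch format must match the model's training step) are all assumptions; the pretrain_maskedlm module listed below is the project's own pretraining script.

    import pytorch_lightning as pl

    from undertale.models.item import TransformerEncoderForMaskedLM

    model = TransformerEncoderForMaskedLM(
        depth=12,
        hidden_dimensions=768,
        vocab_size=8192,
        input_size=512,
        heads=12,
        intermediate_dimensions=3072,
        dropout=0.1,
        eps=1e-12,
        lr=1e-4,     # peak learning rate (assumed interpretation)
        warmup=0.1,  # warmup proportion (assumed interpretation)
    )

    # Hypothetical: a DataLoader yielding masked-token batches in whatever
    # format the model's training step expects.
    train_dataloader = ...

    trainer = pl.Trainer(max_epochs=10)
    trainer.fit(model, train_dataloaders=train_dataloader)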

class undertale.models.item.TransformerEncoderForSequenceSimilarity(*args, **kwargs)

Bases: Module

class undertale.models.item.TransformerEncoderForSequenceClassification(classes: int, depth: int, hidden_dimensions: int, vocab_size: int, input_size: int, heads: int, intermediate_dimensions: int, dropout: float)

Bases: Module

class undertale.models.item.TransformerEncoderForSequenceSummarizationGPT2(*args, **kwargs)

Bases: Module

Modules

evaluate_maskedlm

Evaluate a pretrained model on a Masked Language Modeling (MLM) task.

finetune_embedding

Finetune a pretrained model on a pairwise contrastive task.

finetune_summarization

Finetune a pretrained model on a summarization task.

infer_embedding

Compute the embedding of some code given a finetuned model.

infer_maskedlm

Predict masked tokens given a pretrained model.

infer_similarity

Compute the similarity of two code samples given a finetuned model.

infer_summarization

Generate a summary for a piece of code given a finetuned model.

model

Model implementation.

pretrain_dataflow

Pretrain a model on a Dataflow Prediction (DP) task.

pretrain_maskedlm

Pretrain a model on a Masked Language Modeling (MLM) task.

tokenizer

Tokenizer implementation and training script.