datasets

Dataset building scripts and reusable pipeline utilities.

Modules

apt

A dataset harvested from packages available via the Advanced Package Tool (APT).

assemblage

Assemblage datasets, built from open-source code.

base

Base classes and utilities for datasets.

googlecodejam

The Google Code Jam programming competition archives.

humanevalx

The HumanEval-X multilingual code benchmark.

nixpkgs

A dataset built from packages available via the Nix package manager.

pipeline

Custom pipeline steps for processing data.

schema

Dataset schema definition and enforcement.

scripts

Dataset parsing scripts and utilities.

xlcost

The XLCost text-to-code generation benchmark.