Datasets
--------

Build a Dataset
^^^^^^^^^^^^^^^

To build a dataset, call the dataset module directly:

.. code-block:: bash

    python -m undertale.datasets.{dataset} {input} {output}

For example, to build the HumanEval-X dataset from the raw dataset at
``humanevalx-raw/`` and save it to a directory called ``humanevalx/``, run:

.. code-block:: bash

    # Parse the HumanEval-X dataset.
    python -m undertale.datasets.humanevalx humanevalx-raw/ humanevalx/

    # Parse the HumanEval-X dataset with 8 parallel processes.
    python -m undertale.datasets.humanevalx humanevalx-raw/ humanevalx/ --parallelism 8

Explore a Dataset with a Shell
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To load a dataset into a Python shell, run:

.. code-block:: bash

    python -m undertale.datasets.scripts.shell {path}

    # Load the HumanEval-X dataset into a shell.
    python -m undertale.datasets.scripts.shell humanevalx/

Use a Dataset in a Script
^^^^^^^^^^^^^^^^^^^^^^^^^

To use a dataset that has already been parsed in your own script, load it with
``Dataset.load``:

.. code-block:: python

    from undertale.datasets.base import Dataset

    dataset = Dataset.load(path)

    ...

Here, ``path`` is the path to the saved dataset directory.

VLLM Server Integration
^^^^^^^^^^^^^^^^^^^^^^^

Some datasets use a VLLM server to generate summaries of code. This section
assumes you already have a VLLM server running (see the VLLM documentation for
details on how to set one up). To build a dataset whose pipeline includes a
VLLM step, set the ``VLLM_SERVER_ADDRESS`` environment variable:

.. code-block:: bash

    export VLLM_SERVER_ADDRESS=http://my.vllm.server:8000/v1

If your VLLM server requires an API key, also set the ``VLLM_API_KEY``
environment variable.
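
Before starting a long build, it can help to confirm that these variables are
visible to Python. The following is a minimal sketch, not part of the
``undertale`` package: ``vllm_settings`` is a hypothetical helper, and
stripping a trailing slash from the address is an assumption made here for
tidiness rather than documented behavior.

.. code-block:: python

    import os


    def vllm_settings(environ=None):
        """Return ``(address, api_key)`` read from the environment.

        Hypothetical helper for illustration; undertale may read these
        variables differently.
        """
        environ = os.environ if environ is None else environ

        # The server address is required for any pipeline with a VLLM step.
        address = environ.get("VLLM_SERVER_ADDRESS")
        if not address:
            raise RuntimeError("VLLM_SERVER_ADDRESS is not set")

        # The API key is optional; it is None when the variable is unset.
        api_key = environ.get("VLLM_API_KEY")

        # Assumption: normalize the address by dropping a trailing slash.
        return address.rstrip("/"), api_key

Calling ``vllm_settings()`` with no arguments reads the real process
environment and raises ``RuntimeError`` if the address is missing, so a
misconfigured shell is reported before any dataset processing begins.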