Installation

This guide provides detailed installation instructions and dependency information for the eval-framework.

1. Install uv

Follow the official uv installation instructions at https://docs.astral.sh/uv/getting-started/installation/.
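
For reference, the standalone installer for Linux and macOS is a one-line command (check the uv documentation above if this has changed):

# Install uv via the official standalone installer
curl -LsSf https://astral.sh/uv/install.sh | sh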

2. Install Eval Framework

Clone the repository and install all dependencies, including optional extras:

# Clone the repository
git clone https://github.com/Aleph-Alpha-Research/eval-framework.git
cd eval-framework

# Install all dependencies
uv sync --all-extras

To select specific optional features, you can install them individually. Available extras are:

  • api for Aleph Alpha client inference

  • comet for COMET metric

  • determined for distributed evaluation

  • mistral for Mistral model inference

  • transformers for HuggingFace inference

  • vllm for VLLM inference

  • all installs all extras

You can install them as follows:

   uv sync --extra transformers
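
Multiple extras can be combined by repeating the flag, for example:

   uv sync --extra transformers --extra vllm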

Or, you can install optional dependency groups, such as flash-attn, with the --group flag:

# Install flash-attention optional extra (requires compilation)
uv sync --all-extras --group flash-attn
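
Because flash-attn is compiled from source, the build needs a CUDA toolchain compatible with your PyTorch build. A quick sanity check (assuming you are on a CUDA machine and torch is already installed in the environment, e.g. from a previous sync):

# Verify that a CUDA compiler is available and see which CUDA version PyTorch was built with
nvcc --version
uv run python -c "import torch; print(torch.version.cuda)"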

3. Test Your Installation

uv run eval_framework \
    --models src/eval_framework/llm/models.py \
    --llm-name Smollm135MInstruct \
    --task-name "MMLU" \
    --task-subjects "abstract_algebra" "anatomy" \
    --output-dir ./eval_results \
    --num-fewshot 5 \
    --num-samples 10
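
Once the run completes, results are written to the directory passed via --output-dir; the exact file layout depends on the framework version:

# List the generated result files
ls -R ./eval_results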

Environment Configuration

Environment Variables

Create a .env file in the project root:

# API Keys (if using external models)
HF_TOKEN="your_huggingface_token"        # For private HuggingFace models
OPENAI_API_KEY="your_openai_key"         # For OpenAI models as judges
AA_TOKEN="your_aleph_alpha_token"        # For Aleph Alpha API

# Optional: Inference endpoints
AA_INFERENCE_ENDPOINT="your_inference_url"

# Debug mode
DEBUG=false
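
Assuming the framework loads the .env file itself, nothing more is needed. If you also want these variables in your current shell (for tools that only read real environment variables), you can export them:

# Export every variable defined in .env into the current shell
set -a
source .env
set +a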

Docker Installation

Available Dockerfiles

  • Dockerfile: Main evaluation framework with CUDA support

  • Dockerfile_codebench: Specialized for BigCodeBench coding tasks

  • Dockerfile_Determined: For Determined.ai cluster deployments

Build from Source

Main Evaluation Framework

# Build main image (uses Dockerfile)
docker build -t eval-framework .

# Run with GPU support
docker run -it --gpus all -v $(pwd):/workspace eval-framework
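
If you rely on the API keys from the .env file described above, they can be passed into the container with docker's --env-file flag (note that docker does not strip quotes from values in the env file):

# Run with GPU support and environment variables from .env
docker run -it --gpus all --env-file .env -v $(pwd):/workspace eval-framework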

Specialized Builds

# BigCodeBench coding tasks
docker build -f Dockerfile_codebench -t eval-framework-codebench .

# Determined.ai cluster deployment
docker build -f Dockerfile_Determined -t eval-framework-determined .

PyPI Installation

It is also possible to install eval-framework from PyPI using pip:

pip install eval-framework

# or with optional extras
pip install eval-framework[transformers]

However, we recommend using uv's resolver to avoid dependency version conflicts:

uv pip install eval-framework

# or with optional extras
uv pip install eval-framework[transformers]
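
In shells such as zsh, the square brackets in either command must be quoted so they are not interpreted as a glob pattern:

uv pip install "eval-framework[transformers]"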

Installing the package puts the eval_framework command on your PATH, so you can run an evaluation without going through uv run:

eval_framework \
    --models src/eval_framework/llm/models.py \
    --llm-name Smollm135MInstruct \
    --task-name "MMLU" \
    --task-subjects "abstract_algebra" "anatomy" \
    --output-dir ./eval_results \
    --num-fewshot 5 \
    --num-samples 10