Installation

This guide provides detailed installation instructions and dependency information for the eval-framework.

1. Install uv

Follow the official uv installation instructions at https://docs.astral.sh/uv/getting-started/installation/.
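
For reference, the standalone installer for Linux and macOS is a one-line command (check the uv documentation above if this has changed):

# Install uv via the official standalone installer
curl -LsSf https://astral.sh/uv/install.sh | sh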

2. Install Eval Framework

Clone the repository and install all dependencies, including optional extras:

# Clone the repository
git clone https://github.com/Aleph-Alpha-Research/eval-framework.git
cd eval-framework

# Install all dependencies
uv sync --all-extras

To select specific optional features, you can install them individually. Available extras are:

  • api for Aleph Alpha client inference

  • comet for COMET metric

  • determined for distributed evaluation

  • mistral for Mistral model inference

  • transformers for HuggingFace inference

  • vllm for VLLM inference

  • all installs all extras

You can install them as follows:

   uv sync --extra transformers
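
Multiple extras can be combined by repeating the flag, for example:

   uv sync --extra transformers --extra vllm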

Or, you can install optional dependency groups, such as flash-attn, with the --group flag:

# Install flash-attention optional extra (requires compilation)
uv sync --all-extras --group flash-attn
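
Because flash-attn is compiled from source, the build needs a CUDA toolchain compatible with your PyTorch build. A quick sanity check (assuming you are on a CUDA machine and torch is already installed in the environment, e.g. from a previous sync):

# Verify that a CUDA compiler is available and see which CUDA version PyTorch was built with
nvcc --version
uv run python -c "import torch; print(torch.version.cuda)"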

3. Test Your Installation

uv run eval_framework \
    --models src/eval_framework/llm/models.py \
    --llm-name Smollm135MInstruct \
    --task-name "MMLU" \
    --task-subjects "abstract_algebra" "anatomy" \
    --output-dir ./eval_results \
    --num-fewshot 5 \
    --num-samples 10
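
Once the run completes, results are written to the directory passed via --output-dir; the exact file layout depends on the framework version:

# List the generated result files
ls -R ./eval_results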

Environment Configuration

Environment Variables

Create a .env file in the project root:

# API Keys (if using external models)
HF_TOKEN="your_huggingface_token"        # For private HuggingFace models
OPENAI_API_KEY="your_openai_key"         # For OpenAI models as judges
AA_TOKEN="your_aleph_alpha_token"        # For Aleph Alpha API

# Optional: Inference endpoints
AA_INFERENCE_ENDPOINT="your_inference_url"

# Debug mode
DEBUG=false
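
Assuming the framework loads the .env file itself, nothing more is needed. If you also want these variables in your current shell (for tools that only read real environment variables), you can export them:

# Export every variable defined in .env into the current shell
set -a
source .env
set +a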

Docker Installation

Available Dockerfiles

  • Dockerfile: Main evaluation framework with CUDA support

  • Dockerfile_codebench: Specialized for BigCodeBench coding tasks

  • Dockerfile_Determined: For Determined.ai cluster deployments

Build from Source

Main Evaluation Framework

# Build main image (uses Dockerfile)
docker build -t eval-framework .

# Run with GPU support
docker run -it --gpus all -v $(pwd):/workspace eval-framework
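
If you rely on the API keys from the .env file described above, they can be passed into the container with docker's --env-file flag (note that docker does not strip quotes from values in the env file):

# Run with GPU support and environment variables from .env
docker run -it --gpus all --env-file .env -v $(pwd):/workspace eval-framework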

Specialized Builds

# BigCodeBench coding tasks
docker build -f Dockerfile_codebench -t eval-framework-codebench .

# Determined.ai cluster deployment
docker build -f Dockerfile_Determined -t eval-framework-determined .

PyPI Installation

It is also possible to install eval-framework from PyPI using pip:

pip install eval-framework

# or with optional extras
pip install eval-framework[transformers]

However, we recommend using uv's resolver to avoid dependency version conflicts:

uv pip install eval-framework

# or with optional extras
uv pip install eval-framework[transformers]
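
In shells such as zsh, the square brackets in either command must be quoted so they are not interpreted as a glob pattern:

uv pip install "eval-framework[transformers]"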

Installing the package puts the eval_framework command on your PATH, so you can run an evaluation without going through uv run:

eval_framework \
    --models src/eval_framework/llm/models.py \
    --llm-name Smollm135MInstruct \
    --task-name "MMLU" \
    --task-subjects "abstract_algebra" "anatomy" \
    --output-dir ./eval_results \
    --num-fewshot 5 \
    --num-samples 10