Installation
This guide provides detailed installation instructions and dependency information for the eval-framework.
1. Install uv
Follow the official uv installation instructions at https://docs.astral.sh/uv/getting-started/installation/.
2. Install Eval Framework
Clone the repository and install all dependencies, including optional extras:
```shell
# Clone the repository
git clone https://github.com/Aleph-Alpha-Research/eval-framework
cd eval-framework
# Install all dependencies, including every optional extra
uv sync --all-extras
```
To select specific optional features, you can install them individually. Available extras are:
- api: for Aleph Alpha client inference
- comet: for the COMET metric
- determined: for distributed evaluation
- mistral: for Mistral model inference
- transformers: for HuggingFace inference
- vllm: for vLLM inference
- all: installs all extras
You can install them as follows:
```shell
uv sync --extra transformers
```
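The --extra flag can be repeated to enable several extras in a single uv sync. The sketch below wraps that in a small shell function; sync_extras and its DRY_RUN switch are hypothetical conveniences, not part of eval-framework.

```shell
# Hypothetical helper (not part of eval-framework): run `uv sync` with one
# --extra flag per argument. Set DRY_RUN=1 to print the command instead.
# The body runs in a subshell ( ... ), so its variables do not leak.
sync_extras() (
  n=$#
  i=0
  # Rotate the arguments: take the first name, append "--extra <name>".
  while [ "$i" -lt "$n" ]; do
    extra=$1
    shift
    set -- "$@" --extra "$extra"
    i=$((i + 1))
  done
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo uv sync "$@"
  else
    uv sync "$@"
  fi
)

# Example: HuggingFace inference plus the COMET metric (prints the command)
DRY_RUN=1 sync_extras transformers comet
```

Drop DRY_RUN=1 to actually run the sync. Building the argument list with set -- keeps each extra a separate word in bash, dash and zsh alike.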
You can also install dependency groups such as flash-attn:
```shell
# Install the flash-attn dependency group (requires compilation)
uv sync --all-extras --group flash-attn
```
3. Test Your Installation
Run a short evaluation on two MMLU subjects with a small model to check that everything works:
```shell
uv run eval_framework \
    --models src/eval_framework/llm/models.py \
    --llm-name Smollm135MInstruct \
    --task-name "MMLU" \
    --task-subjects "abstract_algebra" "anatomy" \
    --output-dir ./eval_results \
    --num-fewshot 5 \
    --num-samples 10
```
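After the run finishes, a quick sanity check is to confirm that something was written to the directory passed as --output-dir. check_results is a hypothetical helper; it only assumes the run writes at least one file there and makes no assumption about eval-framework's file names or layout.

```shell
# Hypothetical helper: succeed only if the output directory exists and
# contains at least one regular file (searched recursively).
# The body runs in a subshell ( ... ), so `exit` ends only the function.
check_results() (
  dir=${1:-./eval_results}
  if [ ! -d "$dir" ]; then
    echo "check_results: no such directory: $dir" >&2
    exit 1
  fi
  if [ -z "$(find "$dir" -type f | head -n 1)" ]; then
    echo "check_results: no files under $dir" >&2
    exit 1
  fi
  echo "check_results: found output under $dir"
)

# Usage, after the evaluation command above:
# check_results ./eval_results
```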
Environment Configuration
Environment Variables
Create a .env file in the project root:
```
# API Keys (if using external models)
HF_TOKEN="your_huggingface_token"     # For private HuggingFace models
OPENAI_API_KEY="your_openai_key"      # For OpenAI models as judges
AA_TOKEN="your_aleph_alpha_token"     # For Aleph Alpha API

# Optional: Inference endpoints
AA_INFERENCE_ENDPOINT="your_inference_url"

# Debug mode
DEBUG=false
```
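If you also want these variables exported in your own shell session (for example, for other tools that read HF_TOKEN), one option is to source the file with auto-export turned on. This is a sketch assuming the simple KEY="value" lines shown above; load_env is a hypothetical helper name.

```shell
# Hypothetical helper: export every variable assigned in a .env file into
# the current shell. Trailing "# ..." text is an ordinary shell comment.
load_env() {
  _env_file=${1:-.env}
  case $_env_file in
    */*) ;;                        # already contains a path
    *) _env_file=./$_env_file ;;   # a bare name after "." may be searched on PATH
  esac
  if [ ! -f "$_env_file" ]; then
    echo "load_env: $_env_file not found" >&2
    unset _env_file
    return 1
  fi
  set -a             # auto-export every variable assigned from here on
  . "$_env_file"     # read the KEY="value" lines as shell assignments
  set +a             # turn auto-export back off
  unset _env_file
}

# Usage, from the project root:
# load_env && echo "HF_TOKEN is ${HF_TOKEN:+set}"
```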
Docker Installation
Available Dockerfiles
| Dockerfile | Purpose |
|---|---|
| Dockerfile | Main evaluation framework with CUDA support |
| Dockerfile_codebench | Specialized for BigCodeBench coding tasks |
| Dockerfile_Determined | For Determined.ai cluster deployments |
Build from Source
Main Evaluation Framework
```shell
# Build the main image (uses Dockerfile)
docker build -t eval-framework .
# Run with GPU support, mounting the current directory at /workspace
docker run -it --gpus all -v "$(pwd)":/workspace eval-framework
```
Specialized Builds
```shell
# BigCodeBench coding tasks
docker build -f Dockerfile_codebench -t eval-framework-codebench .
# Determined.ai cluster deployment
docker build -f Dockerfile_Determined -t eval-framework-determined .
```
PyPI Installation
You can also install eval-framework from PyPI with pip:
```shell
pip install eval-framework
# or with optional extras (quoted so shells such as zsh do not expand the brackets)
pip install "eval-framework[transformers]"
```
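However, we recommend installing with uv, whose resolver avoids many dependency version conflicts. Note that uv pip install expects a virtual environment, which you can create with uv venv: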
```shell
uv pip install eval-framework
# or with optional extras
uv pip install "eval-framework[transformers]"
```
Once the package is installed this way (and its environment is activated), you can run an evaluation directly, without uv run:
```shell
eval_framework \
    --models src/eval_framework/llm/models.py \
    --llm-name Smollm135MInstruct \
    --task-name "MMLU" \
    --task-subjects "abstract_algebra" "anatomy" \
    --output-dir ./eval_results \
    --num-fewshot 5 \
    --num-samples 10
```
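If you switch between a repository checkout (uv run) and a pip-installed package, a small wrapper can pick the right invocation. run_eval is a hypothetical helper: it calls eval_framework when that command is on PATH and falls back to uv run eval_framework otherwise, passing all arguments through unchanged.

```shell
# Hypothetical wrapper (not part of eval-framework): prefer the installed
# `eval_framework` entry point, else fall back to `uv run eval_framework`.
run_eval() {
  if command -v eval_framework >/dev/null 2>&1; then
    eval_framework "$@"
  else
    uv run eval_framework "$@"
  fi
}

# Usage, with the same flags as above:
# run_eval --models src/eval_framework/llm/models.py \
#   --llm-name Smollm135MInstruct --task-name "MMLU" \
#   --task-subjects "abstract_algebra" "anatomy" \
#   --output-dir ./eval_results --num-fewshot 5 --num-samples 10
```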