Utils in eval-framework
Plot results
A basic utility for plotting evaluation results from a set of JSON files is provided as an example in utils/plot-results.py.
This script reads JSON files containing evaluation results, filters them based on the specified criteria, and generates a
plot using Matplotlib. It can be adapted to your plotting needs and produces a basic plot when run with:
# loop over all tasks and models under a given parent folder
uv run python utils/plot-results.py --folder PARENT_RESULTS_FOLDER
More CLI arguments are available and can be listed with uv run python utils/plot-results.py --help.
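To illustrate the read, filter, and plot flow described above, here is a minimal sketch. It is not the code of utils/plot-results.py: the function names load_results and plot_results, the "accuracy" metric, and the JSON schema ({"task": ..., "model": ..., "metrics": {...}}) are assumptions made for illustration, so check your own result files before reusing it.

```python
import json
from pathlib import Path


def load_results(parent_folder, metric="accuracy"):
    """Collect (task, model, value) rows from every *.json file under parent_folder.

    Assumed (hypothetical) schema per file:
    {"task": str, "model": str, "metrics": {metric_name: number}}
    """
    rows = []
    for path in sorted(Path(parent_folder).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Filter step: skip files that do not match the assumed schema.
        if not isinstance(data, dict):
            continue
        metrics = data.get("metrics")
        if not isinstance(metrics, dict) or metric not in metrics:
            continue
        rows.append(
            (data.get("task", "unknown"), data.get("model", "unknown"), float(metrics[metric]))
        )
    return rows


def plot_results(rows, metric="accuracy", out_file="results.png"):
    """Draw one bar per (task, model) pair and save the figure to out_file."""
    # Imported here so that load_results works without Matplotlib installed.
    import matplotlib

    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    labels = [f"{task}\n{model}" for task, model, _ in rows]
    values = [value for _, _, value in rows]
    positions = list(range(len(rows)))

    fig, ax = plt.subplots(figsize=(max(4, len(rows)), 4))
    ax.bar(positions, values)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.set_ylabel(metric)
    fig.tight_layout()
    fig.savefig(out_file)
    plt.close(fig)
```

Under these assumptions, plot_results(load_results("PARENT_RESULTS_FOLDER")) would write results.png with one bar per task and model pair.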
Inspect JSON results
The detailed results and completions for each sample are saved as a JSONL file. To help inspect this file, a basic utility script is provided that prints its content, splits each line into a readable layout, and colorizes the output.
For example:
uv run python utils/inspect-jsonl.py output.jsonl --highlight prompt,completion --strip messages,eval_kwargs,raw_completion
Use uv run python utils/inspect-jsonl.py --help to get all CLI arguments.
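The idea behind highlighting and stripping fields can be sketched as follows. This is an illustrative approximation, not the implementation of utils/inspect-jsonl.py: the function names format_record and inspect_jsonl, the choice of yellow as highlight color, and the "key: value" layout are all assumptions.

```python
import json

# ANSI escape codes used to colorize highlighted fields (assumed color choice).
YELLOW = "\033[33m"
RESET = "\033[0m"


def format_record(record, highlight=(), strip=()):
    """Return one 'key: value' line per field.

    Keys listed in `strip` are omitted; keys listed in `highlight` are colored.
    """
    lines = []
    for key, value in record.items():
        if key in strip:
            continue
        # Strings are shown as-is; other values are rendered as compact JSON.
        text = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
        line = f"{key}: {text}"
        if key in highlight:
            line = f"{YELLOW}{line}{RESET}"
        lines.append(line)
    return "\n".join(lines)


def inspect_jsonl(path, highlight=(), strip=()):
    """Print every non-empty line of a JSONL file as a readable block."""
    with open(path, encoding="utf-8") as f:
        for number, raw in enumerate(f, start=1):
            if not raw.strip():
                continue
            print(f"--- record {number} ---")
            print(format_record(json.loads(raw), highlight, strip))
```

For instance, inspect_jsonl("output.jsonl", highlight=("prompt", "completion"), strip=("messages",)) would print each sample with the prompt and completion fields colored and the messages field left out.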
Document benchmark tasks
The utils/generate-task-docs.py script can be used to update or add further detail to the automated task descriptions. This script is discussed in
docs/installation.md.