eval_framework.context package

Submodules

eval_framework.context.determined module

class eval_framework.context.determined.DeterminedContext(**kwargs)

Bases: EvalContext

Parameters: kwargs (Any)

get_trial_id()
Return type: int | None

should_preempt()
Return type: bool
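
A usage sketch for DeterminedContext follows. It is a minimal sketch, assuming the keyword arguments are forwarded to the EvalContext base class; the model name, file path, and task name below are placeholders, not values defined by this package.

    from pathlib import Path

    from eval_framework.context.determined import DeterminedContext

    # Minimal sketch; assumes **kwargs are forwarded to EvalContext.
    # All values are placeholders.
    with DeterminedContext(
        llm_name="MyLLM",               # placeholder model name
        models_path=Path("models.py"),  # placeholder path to a models file
        task_name="my_task",            # placeholder; must name a registered task
    ) as ctx:
        trial_id = ctx.get_trial_id()   # int | None: the Determined trial id, if any
        if ctx.should_preempt():        # bool: the scheduler requested preemption
            pass                        # checkpoint and exit early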

class eval_framework.context.determined.Hyperparameters(**data)

Bases: BaseModel

Parameters:
  • llm_name (str)

  • output_dir (Path)

  • hf_upload_dir (str | None)

  • hf_upload_repo (str | None)

  • wandb_project (str | None)

  • wandb_entity (str | None)

  • wandb_run_id (str | None)

  • wandb_upload_results (bool | None)

  • description (str | None)

  • task_args (TaskArgs)

  • llm_args (dict[str, Any] | None)

  • extra_task_modules (list[str] | None)

  • delete_output_dir_after_upload (bool | None)

delete_output_dir_after_upload: bool | None
description: str | None
extra_task_modules: list[str] | None
hf_upload_dir: str | None
hf_upload_repo: str | None
llm_args: dict[str, Any] | None
llm_name: str
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

output_dir: Path
task_args: TaskArgs
wandb_entity: str | None
wandb_project: str | None
wandb_run_id: str | None
wandb_upload_results: bool | None
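
A construction sketch follows. It assumes the "| None" fields default to None so that only the required fields need to be supplied; the signature above does not confirm those defaults. TaskArgs is the model documented next, and extra='forbid' means unknown keyword arguments raise a validation error. All values are placeholders.

    from pathlib import Path

    from eval_framework.context.determined import Hyperparameters, TaskArgs

    # Minimal sketch; assumes optional ("| None") fields default to None.
    # All values are placeholders.
    hparams = Hyperparameters(
        llm_name="MyLLM",                     # placeholder model name
        output_dir=Path("/tmp/eval_output"),  # placeholder output directory
        task_args=TaskArgs(
            task_name="my_task",              # placeholder; must name a registered task
            num_fewshot=0,
            judge_model_args={},              # passed explicitly; annotation is not "| None"
        ),
    )
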
class eval_framework.context.determined.TaskArgs(**data)

Bases: BaseModel

Parameters:
  • task_name (Annotated[str, AfterValidator(func=eval_framework.tasks.registry.validate_task_name)])

  • num_fewshot (int)

  • num_samples (int | None)

  • max_tokens (int | None)

  • batch_size (int | None)

  • judge_model_name (str | None)

  • judge_model_args (dict[str, Any])

  • task_subjects (list[str] | None)

  • hf_revision (str | None)

  • perturbation_config (PerturbationConfig | None)

  • repeats (int | None)

batch_size: int | None
hf_revision: str | None
judge_model_args: dict[str, Any]
judge_model_name: str | None
max_tokens: int | None
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

num_fewshot: int
num_samples: int | None
perturbation_config: PerturbationConfig | None
repeats: int | None
task_name: Annotated[str, AfterValidator(func=validate_task_name)]
task_subjects: list[str] | None
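
A construction sketch for TaskArgs, with the same caveat: defaults for the "| None" fields are assumed, not confirmed, and judge_model_args is passed explicitly because its annotation is not optional. task_name must satisfy validate_task_name, i.e. name a task in the registry; all values are placeholders.

    from eval_framework.context.determined import TaskArgs

    # Minimal sketch; all values are placeholders.
    args = TaskArgs(
        task_name="my_task",  # checked by validate_task_name
        num_fewshot=5,        # number of few-shot examples in the prompt
        num_samples=100,      # assumed optional: cap on the number of evaluated samples
        judge_model_args={},  # extra kwargs for the judge model
    )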

eval_framework.context.eval module

class eval_framework.context.eval.EvalContext(llm_name, models_path, num_samples=None, max_tokens=None, num_fewshot=None, task_name=None, task_subjects=None, hf_revision=None, output_dir=None, wandb_project=None, wandb_entity=None, wandb_run_id=None, wandb_upload_results=None, hf_upload_dir=None, hf_upload_repo=None, llm_args=None, judge_models_path=None, judge_model_name=None, judge_model_args=None, batch_size=None, description=None, perturbation_type=None, perturbation_probability=None, perturbation_seed=None, randomize_judge_order=False, delete_output_dir_after_upload=None, repeats=None)

Bases: AbstractContextManager

Parameters:
  • llm_name (str)

  • models_path (Path)

  • num_samples (int | None)

  • max_tokens (int | None)

  • num_fewshot (int | None)

  • task_name (str | None)

  • task_subjects (list[str] | None)

  • hf_revision (str | None)

  • output_dir (Path | None)

  • wandb_project (str | None)

  • wandb_entity (str | None)

  • wandb_run_id (str | None)

  • wandb_upload_results (bool | None)

  • hf_upload_dir (str | None)

  • hf_upload_repo (str | None)

  • llm_args (dict[str, Any] | None)

  • judge_models_path (Path | None)

  • judge_model_name (str | None)

  • judge_model_args (dict[str, Any] | None)

  • batch_size (int | None)

  • description (str | None)

  • perturbation_type (str | None)

  • perturbation_probability (float | None)

  • perturbation_seed (int | None)

  • randomize_judge_order (bool)

  • delete_output_dir_after_upload (bool | None)

  • repeats (int | None)

get_trial_id()
Return type: int | None

should_preempt()
Return type: bool
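
Because EvalContext derives from AbstractContextManager, it is meant to be used in a with block. The sketch below shows the protocol with placeholder values; in practice one would typically instantiate a concrete subclass such as LocalContext or DeterminedContext.

    from pathlib import Path

    from eval_framework.context.eval import EvalContext

    # Minimal sketch of the context-manager protocol; values are placeholders.
    with EvalContext(
        llm_name="MyLLM",
        models_path=Path("models.py"),
        task_name="my_task",
        num_fewshot=5,
        num_samples=10,
    ) as ctx:
        print(ctx.get_trial_id())    # int | None
        print(ctx.should_preempt())  # bool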

eval_framework.context.eval.import_models(models_file)
Parameters: models_file (PathLike | str)
Return type: dict[str, type[BaseLLM]]
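
A call sketch: import_models loads a models file and returns a mapping from model name to BaseLLM subclass. The path and dictionary key below are placeholders that depend entirely on the file's contents.

    from eval_framework.context.eval import import_models

    # Minimal sketch; "models.py" is a placeholder path to a file defining
    # BaseLLM subclasses.
    models = import_models("models.py")  # dict[str, type[BaseLLM]]
    llm_cls = models["MyLLM"]            # hypothetical key; depends on the file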

eval_framework.context.local module

class eval_framework.context.local.LocalContext(llm_name, models_path, num_samples=None, max_tokens=None, num_fewshot=None, task_name=None, task_subjects=None, hf_revision=None, output_dir=None, wandb_project=None, wandb_entity=None, wandb_run_id=None, wandb_upload_results=None, hf_upload_dir=None, hf_upload_repo=None, llm_args=None, judge_models_path=None, judge_model_name=None, judge_model_args=None, batch_size=None, description=None, perturbation_type=None, perturbation_probability=None, perturbation_seed=None, randomize_judge_order=False, delete_output_dir_after_upload=None, repeats=None)

Bases: EvalContext

Parameters:
  • llm_name (str)

  • models_path (Path)

  • num_samples (int | None)

  • max_tokens (int | None)

  • num_fewshot (int | None)

  • task_name (str | None)

  • task_subjects (list[str] | None)

  • hf_revision (str | None)

  • output_dir (Path | None)

  • wandb_project (str | None)

  • wandb_entity (str | None)

  • wandb_run_id (str | None)

  • wandb_upload_results (bool | None)

  • hf_upload_dir (str | None)

  • hf_upload_repo (str | None)

  • llm_args (dict[str, Any] | None)

  • judge_models_path (Path | None)

  • judge_model_name (str | None)

  • judge_model_args (dict[str, Any] | None)

  • batch_size (int | None)

  • description (str | None)

  • perturbation_type (str | None)

  • perturbation_probability (float | None)

  • perturbation_seed (int | None)

  • randomize_judge_order (bool)

  • delete_output_dir_after_upload (bool | None)

  • repeats (int | None)
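
LocalContext shares EvalContext's signature, so usage mirrors the base class; the sketch below adds an output directory for a local run. All values are placeholders.

    from pathlib import Path

    from eval_framework.context.local import LocalContext

    # Minimal sketch for a local run; all values are placeholders.
    with LocalContext(
        llm_name="MyLLM",
        models_path=Path("models.py"),
        task_name="my_task",
        num_samples=10,
        output_dir=Path("/tmp/eval_output"),
    ) as ctx:
        if ctx.should_preempt():  # presumably always False for local runs
            pass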

Module contents