eval_framework.context package
Submodules
eval_framework.context.determined module
- class eval_framework.context.determined.DeterminedContext(**kwargs)[source]
Bases: EvalContext
Parameters:
kwargs (Any)
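A minimal usage sketch: since DeterminedContext subclasses EvalContext (an AbstractContextManager, documented below), we assume here that it forwards its keyword arguments to the EvalContext constructor and is entered with a with statement. The model name and path are placeholders, not values from this reference:

```python
from pathlib import Path

from eval_framework.context.determined import DeterminedContext

# Sketch only: we assume DeterminedContext accepts the same keyword
# arguments as EvalContext. "my-model" and "models/" are placeholders.
with DeterminedContext(llm_name="my-model", models_path=Path("models/")) as ctx:
    ...  # run the evaluation inside the Determined-managed context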
- class eval_framework.context.determined.Hyperparameters(**data)[source]
Bases: BaseModel
Parameters:
llm_name (str)
output_dir (Path)
hf_upload_dir (str | None)
hf_upload_repo (str | None)
wandb_project (str | None)
wandb_entity (str | None)
wandb_run_id (str | None)
wandb_upload_results (bool | None)
description (str | None)
task_args (TaskArgs)
llm_args (dict[str, Any] | None)
extra_task_modules (list[str] | None)
delete_output_dir_after_upload (bool | None)
- delete_output_dir_after_upload: bool | None
- description: str | None
- extra_task_modules: list[str] | None
- hf_upload_dir: str | None
- hf_upload_repo: str | None
- llm_args: dict[str, Any] | None
- llm_name: str
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.
- output_dir: Path
- wandb_entity: str | None
- wandb_project: str | None
- wandb_run_id: str | None
- wandb_upload_results: bool | None
- class eval_framework.context.determined.TaskArgs(**data)[source]
Bases: BaseModel
Parameters:
task_name (Annotated[str, AfterValidator(func=validate_task_name)])
num_fewshot (int)
num_samples (int | None)
max_tokens (int | None)
batch_size (int | None)
judge_model_name (str | None)
judge_model_args (dict[str, Any])
task_subjects (list[str] | None)
hf_revision (str | None)
perturbation_config (PerturbationConfig | None)
repeats (int | None)
- batch_size: int | None
- hf_revision: str | None
- judge_model_args: dict[str, Any]
- judge_model_name: str | None
- max_tokens: int | None
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.
- num_fewshot: int
- num_samples: int | None
- perturbation_config: PerturbationConfig | None
- repeats: int | None
- task_name: Annotated[str, AfterValidator(func=validate_task_name)]
- task_subjects: list[str] | None
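As a hedged illustration of how these two models nest, the construction below assumes that fields typed `| None` default to None (defaults are not shown in this reference). "ARC" and "my-model" are placeholder names; the task name must pass validate_task_name against the real task registry:

```python
from pathlib import Path

from eval_framework.context.determined import Hyperparameters, TaskArgs

# Sketch only: "ARC" is a placeholder and must name a registered task.
task_args = TaskArgs(
    task_name="ARC",
    num_fewshot=5,
    num_samples=100,
    judge_model_args={},  # required dict field; empty when no judge is used
)

hparams = Hyperparameters(
    llm_name="my-model",           # placeholder model identifier
    output_dir=Path("results/"),
    task_args=task_args,
)
```

Note that both models set model_config = {'extra': 'forbid'}, so any misspelled keyword raises a ValidationError at construction time rather than being silently ignored.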
eval_framework.context.eval module
- class eval_framework.context.eval.EvalContext(llm_name, models_path, num_samples=None, max_tokens=None, num_fewshot=None, task_name=None, task_subjects=None, hf_revision=None, output_dir=None, wandb_project=None, wandb_entity=None, wandb_run_id=None, wandb_upload_results=None, hf_upload_dir=None, hf_upload_repo=None, llm_args=None, judge_models_path=None, judge_model_name=None, judge_model_args=None, batch_size=None, description=None, perturbation_type=None, perturbation_probability=None, perturbation_seed=None, randomize_judge_order=False, delete_output_dir_after_upload=None, repeats=None)[source]
Bases: AbstractContextManager
Parameters:
llm_name (str)
models_path (Path)
num_samples (int | None)
max_tokens (int | None)
num_fewshot (int | None)
task_name (str | None)
task_subjects (list[str] | None)
hf_revision (str | None)
output_dir (Path | None)
wandb_project (str | None)
wandb_entity (str | None)
wandb_run_id (str | None)
wandb_upload_results (bool | None)
hf_upload_dir (str | None)
hf_upload_repo (str | None)
llm_args (dict[str, Any] | None)
judge_models_path (Path | None)
judge_model_name (str | None)
judge_model_args (dict[str, Any] | None)
batch_size (int | None)
description (str | None)
perturbation_type (str | None)
perturbation_probability (float | None)
perturbation_seed (int | None)
randomize_judge_order (bool)
delete_output_dir_after_upload (bool | None)
repeats (int | None)
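Since EvalContext derives from AbstractContextManager, it is presumably entered with a with statement. A minimal sketch, in which every value is a placeholder and the object yielded by the context is not specified by this reference:

```python
from pathlib import Path

from eval_framework.context.eval import EvalContext

# Sketch only: llm_name and models_path are the only required
# parameters; everything else falls back to its documented default.
with EvalContext(
    llm_name="my-model",
    models_path=Path("models/"),
    task_name="ARC",        # placeholder; must name a registered task
    num_fewshot=5,
    num_samples=100,
) as ctx:
    ...  # run the evaluation while the context is active
```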
eval_framework.context.local module
- class eval_framework.context.local.LocalContext(llm_name, models_path, num_samples=None, max_tokens=None, num_fewshot=None, task_name=None, task_subjects=None, hf_revision=None, output_dir=None, wandb_project=None, wandb_entity=None, wandb_run_id=None, wandb_upload_results=None, hf_upload_dir=None, hf_upload_repo=None, llm_args=None, judge_models_path=None, judge_model_name=None, judge_model_args=None, batch_size=None, description=None, perturbation_type=None, perturbation_probability=None, perturbation_seed=None, randomize_judge_order=False, delete_output_dir_after_upload=None, repeats=None)[source]
Bases: EvalContext
Parameters:
llm_name (str)
models_path (Path)
num_samples (int | None)
max_tokens (int | None)
num_fewshot (int | None)
task_name (str | None)
task_subjects (list[str] | None)
hf_revision (str | None)
output_dir (Path | None)
wandb_project (str | None)
wandb_entity (str | None)
wandb_run_id (str | None)
wandb_upload_results (bool | None)
hf_upload_dir (str | None)
hf_upload_repo (str | None)
llm_args (dict[str, Any] | None)
judge_models_path (Path | None)
judge_model_name (str | None)
judge_model_args (dict[str, Any] | None)
batch_size (int | None)
description (str | None)
perturbation_type (str | None)
perturbation_probability (float | None)
perturbation_seed (int | None)
randomize_judge_order (bool)
delete_output_dir_after_upload (bool | None)
repeats (int | None)
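LocalContext exposes the same signature as its EvalContext base, so a local run might be set up as follows. All values are placeholders, and what the context yields is not specified by this reference:

```python
from pathlib import Path

from eval_framework.context.local import LocalContext

# Sketch only: LocalContext accepts the same parameters as EvalContext;
# the values below are placeholders for a local evaluation run.
with LocalContext(
    llm_name="my-model",
    models_path=Path("models/"),
    task_name="ARC",        # placeholder; must name a registered task
    num_fewshot=0,
    batch_size=8,
) as ctx:
    ...  # evaluate locally within the configured context
```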