eval_framework.context package

Submodules

eval_framework.context.determined module

class eval_framework.context.determined.DeterminedContext(**kwargs)

Bases: EvalContext

Parameters: kwargs (Any)

get_trial_id()
Return type: int | None

should_preempt()
Return type: bool
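
A usage sketch for DeterminedContext follows. It is a minimal sketch, assuming the keyword arguments are forwarded to the EvalContext base class; the model name, file path, and task name below are placeholders, not values defined by this package.

    from pathlib import Path

    from eval_framework.context.determined import DeterminedContext

    # Minimal sketch; assumes **kwargs are forwarded to EvalContext.
    # All values are placeholders.
    with DeterminedContext(
        llm_name="MyLLM",               # placeholder model name
        models_path=Path("models.py"),  # placeholder path to a models file
        task_name="my_task",            # placeholder; must name a registered task
    ) as ctx:
        trial_id = ctx.get_trial_id()   # int | None: the Determined trial id, if any
        if ctx.should_preempt():        # bool: the scheduler requested preemption
            pass                        # checkpoint and exit early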

class eval_framework.context.determined.Hyperparameters(**data)

Bases: BaseModel

Parameters:
  • llm_name (str)

  • output_dir (Path)

  • hf_upload_dir (str | None)

  • hf_upload_repo (str | None)

  • wandb_project (str | None)

  • wandb_entity (str | None)

  • wandb_run_id (str | None)

  • wandb_upload_results (bool | None)

  • description (str | None)

  • task_args (TaskArgs)

  • llm_args (dict[str, Any] | None)

  • extra_task_modules (list[str] | None)

  • delete_output_dir_after_upload (bool | None)

delete_output_dir_after_upload: bool | None
description: str | None
extra_task_modules: list[str] | None
hf_upload_dir: str | None
hf_upload_repo: str | None
llm_args: dict[str, Any] | None
llm_name: str
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

output_dir: Path
task_args: TaskArgs
wandb_entity: str | None
wandb_project: str | None
wandb_run_id: str | None
wandb_upload_results: bool | None
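
A construction sketch follows. It assumes the "| None" fields default to None so that only the required fields need to be supplied; the signature above does not confirm those defaults. TaskArgs is the model documented next, and extra='forbid' means unknown keyword arguments raise a validation error. All values are placeholders.

    from pathlib import Path

    from eval_framework.context.determined import Hyperparameters, TaskArgs

    # Minimal sketch; assumes optional ("| None") fields default to None.
    # All values are placeholders.
    hparams = Hyperparameters(
        llm_name="MyLLM",                     # placeholder model name
        output_dir=Path("/tmp/eval_output"),  # placeholder output directory
        task_args=TaskArgs(
            task_name="my_task",              # placeholder; must name a registered task
            num_fewshot=0,
            judge_model_args={},              # passed explicitly; annotation is not "| None"
        ),
    )
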
class eval_framework.context.determined.TaskArgs(**data)

Bases: BaseModel

Parameters:
  • task_name (Annotated[str, AfterValidator(func=eval_framework.tasks.registry.validate_task_name)])

  • num_fewshot (int)

  • num_samples (int | None)

  • max_tokens (int | None)

  • batch_size (int | None)

  • judge_model_name (str | None)

  • judge_model_args (dict[str, Any])

  • task_subjects (list[str] | None)

  • hf_revision (str | None)

  • perturbation_config (PerturbationConfig | None)

  • repeats (int | None)

batch_size: int | None
hf_revision: str | None
judge_model_args: dict[str, Any]
judge_model_name: str | None
max_tokens: int | None
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

num_fewshot: int
num_samples: int | None
perturbation_config: PerturbationConfig | None
repeats: int | None
task_name: Annotated[str, AfterValidator(func=validate_task_name)]
task_subjects: list[str] | None
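
A construction sketch for TaskArgs, with the same caveat: defaults for the "| None" fields are assumed, not confirmed, and judge_model_args is passed explicitly because its annotation is not optional. task_name must satisfy validate_task_name, i.e. name a task in the registry; all values are placeholders.

    from eval_framework.context.determined import TaskArgs

    # Minimal sketch; all values are placeholders.
    args = TaskArgs(
        task_name="my_task",  # checked by validate_task_name
        num_fewshot=5,        # number of few-shot examples in the prompt
        num_samples=100,      # assumed optional: cap on the number of evaluated samples
        judge_model_args={},  # extra kwargs for the judge model
    )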

eval_framework.context.eval module

class eval_framework.context.eval.EvalContext(llm_name, models_path, num_samples=None, max_tokens=None, num_fewshot=None, task_name=None, task_subjects=None, hf_revision=None, output_dir=None, wandb_project=None, wandb_entity=None, wandb_run_id=None, wandb_upload_results=None, hf_upload_dir=None, hf_upload_repo=None, llm_args=None, judge_models_path=None, judge_model_name=None, judge_model_args=None, batch_size=None, description=None, perturbation_type=None, perturbation_probability=None, perturbation_seed=None, randomize_judge_order=False, delete_output_dir_after_upload=None, repeats=None)

Bases: AbstractContextManager

Parameters:
  • llm_name (str)

  • models_path (Path)

  • num_samples (int | None)

  • max_tokens (int | None)

  • num_fewshot (int | None)

  • task_name (str | None)

  • task_subjects (list[str] | None)

  • hf_revision (str | None)

  • output_dir (Path | None)

  • wandb_project (str | None)

  • wandb_entity (str | None)

  • wandb_run_id (str | None)

  • wandb_upload_results (bool | None)

  • hf_upload_dir (str | None)

  • hf_upload_repo (str | None)

  • llm_args (dict[str, Any] | None)

  • judge_models_path (Path | None)

  • judge_model_name (str | None)

  • judge_model_args (dict[str, Any] | None)

  • batch_size (int | None)

  • description (str | None)

  • perturbation_type (str | None)

  • perturbation_probability (float | None)

  • perturbation_seed (int | None)

  • randomize_judge_order (bool)

  • delete_output_dir_after_upload (bool | None)

  • repeats (int | None)

get_trial_id()
Return type: int | None

should_preempt()
Return type: bool
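
Because EvalContext derives from AbstractContextManager, it is meant to be used in a with block. The sketch below shows the protocol with placeholder values; in practice one would typically instantiate a concrete subclass such as LocalContext or DeterminedContext.

    from pathlib import Path

    from eval_framework.context.eval import EvalContext

    # Minimal sketch of the context-manager protocol; values are placeholders.
    with EvalContext(
        llm_name="MyLLM",
        models_path=Path("models.py"),
        task_name="my_task",
        num_fewshot=5,
        num_samples=10,
    ) as ctx:
        print(ctx.get_trial_id())    # int | None
        print(ctx.should_preempt())  # bool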

eval_framework.context.eval.import_models(models_file)
Parameters: models_file (PathLike | str)
Return type: dict[str, type[BaseLLM]]
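
A call sketch: import_models loads a models file and returns a mapping from model name to BaseLLM subclass. The path and dictionary key below are placeholders that depend entirely on the file's contents.

    from eval_framework.context.eval import import_models

    # Minimal sketch; "models.py" is a placeholder path to a file defining
    # BaseLLM subclasses.
    models = import_models("models.py")  # dict[str, type[BaseLLM]]
    llm_cls = models["MyLLM"]            # hypothetical key; depends on the file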

eval_framework.context.local module

class eval_framework.context.local.LocalContext(llm_name, models_path, num_samples=None, max_tokens=None, num_fewshot=None, task_name=None, task_subjects=None, hf_revision=None, output_dir=None, wandb_project=None, wandb_entity=None, wandb_run_id=None, wandb_upload_results=None, hf_upload_dir=None, hf_upload_repo=None, llm_args=None, judge_models_path=None, judge_model_name=None, judge_model_args=None, batch_size=None, description=None, perturbation_type=None, perturbation_probability=None, perturbation_seed=None, randomize_judge_order=False, delete_output_dir_after_upload=None, repeats=None)

Bases: EvalContext

Parameters:
  • llm_name (str)

  • models_path (Path)

  • num_samples (int | None)

  • max_tokens (int | None)

  • num_fewshot (int | None)

  • task_name (str | None)

  • task_subjects (list[str] | None)

  • hf_revision (str | None)

  • output_dir (Path | None)

  • wandb_project (str | None)

  • wandb_entity (str | None)

  • wandb_run_id (str | None)

  • wandb_upload_results (bool | None)

  • hf_upload_dir (str | None)

  • hf_upload_repo (str | None)

  • llm_args (dict[str, Any] | None)

  • judge_models_path (Path | None)

  • judge_model_name (str | None)

  • judge_model_args (dict[str, Any] | None)

  • batch_size (int | None)

  • description (str | None)

  • perturbation_type (str | None)

  • perturbation_probability (float | None)

  • perturbation_seed (int | None)

  • randomize_judge_order (bool)

  • delete_output_dir_after_upload (bool | None)

  • repeats (int | None)
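
LocalContext shares EvalContext's signature, so usage mirrors the base class; the sketch below adds an output directory for a local run. All values are placeholders.

    from pathlib import Path

    from eval_framework.context.local import LocalContext

    # Minimal sketch for a local run; all values are placeholders.
    with LocalContext(
        llm_name="MyLLM",
        models_path=Path("models.py"),
        task_name="my_task",
        num_samples=10,
        output_dir=Path("/tmp/eval_output"),
    ) as ctx:
        if ctx.should_preempt():  # presumably always False for local runs
            pass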

Module contents