eval_framework package

Subpackages

Submodules

eval_framework.base_config module

class eval_framework.base_config.BaseConfig(**data)[source]

Bases: BaseModel

as_dict()[source]
Return type:

dict[str, Any]

classmethod from_yaml(yml_filename)[source]
Return type:

BaseConfig

Parameters:

yml_filename (str | Path)

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'frozen': True, 'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.

save(out_file)[source]
Return type:

None

Parameters:

out_file (Path)
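
A minimal usage sketch, assuming a hypothetical MyEvalConfig subclass and a local config.yml; BaseConfig itself is a frozen pydantic model that forbids extra fields:

from pathlib import Path

from eval_framework.base_config import BaseConfig

class MyEvalConfig(BaseConfig):  # hypothetical subclass for illustration
    model_name: str
    num_fewshot: int = 0

# Unknown keys in the YAML raise a validation error ('extra': 'forbid').
config = MyEvalConfig.from_yaml("config.yml")

# Inspect as a plain dictionary and persist a copy to disk.
print(config.as_dict())
config.save(Path("config_copy.yml"))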

eval_framework.evaluation_generator module

class eval_framework.evaluation_generator.EvaluationGenerator(config, result_processor)[source]

Bases: object

Parameters:

  • config

  • result_processor

run_eval()[source]

Runs evaluation using saved completions.

Return type:

list[Result]
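
A hedged sketch of driving run_eval; the config and result_processor arguments are assumed to be an EvalConfig and a result-processing helper constructed elsewhere (their exact types are not documented on this page):

from eval_framework.evaluation_generator import EvaluationGenerator

def evaluate_saved_completions(config, result_processor):
    # Re-reads the completions saved by an earlier response-generation run
    # and computes metrics for them, returning list[Result].
    generator = EvaluationGenerator(config, result_processor)
    return generator.run_eval()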

eval_framework.exceptions module

exception eval_framework.exceptions.LogicError[source]

Bases: Exception
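
A small illustrative sketch of catching the framework's LogicError:

from eval_framework.exceptions import LogicError

try:
    raise LogicError("evaluation reached an inconsistent state")  # illustrative only
except LogicError as err:
    print(f"Logic error: {err}")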

eval_framework.logger module

eval_framework.main module

eval_framework.main.main(llm, config, should_preempt_callable=None, trial_id=None, *args, resource_cleanup=False, verbosity=1)[source]

Runs the entire evaluation process: response generation and evaluation.

Return type:

list[Result]

Parameters:
  • llm (BaseLLM)

  • config (EvalConfig)

  • should_preempt_callable (Callable[[], bool] | None)

  • trial_id (int | None)

  • args (Any)

  • resource_cleanup (bool)

  • verbosity (int)
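
A sketch of running the full pipeline through main; llm and config are assumed to be an already-constructed BaseLLM and EvalConfig, and the preemption callback shown is a trivial placeholder:

from eval_framework.main import main

def run_full_evaluation(llm, config):
    # Generate responses and evaluate them in one call; returns list[Result].
    return main(
        llm,
        config,
        should_preempt_callable=lambda: False,  # never request preemption
        trial_id=0,
        resource_cleanup=True,
        verbosity=2,
    )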

eval_framework.response_generator module

class eval_framework.response_generator.ResponseGenerator(llm, config, result_processor)[source]

Bases: object

Parameters:

  • llm

  • config

  • result_processor

generate(should_preempt_callable)[source]

Generates responses and saves them along with metadata.

Parameters:

should_preempt_callable (Callable[[], bool]) – function used to check whether the run should be preempted

Returns:

a tuple (responses, preempted): the list of generated responses and a flag indicating whether the process was preempted

Return type:

tuple[list[Completion | Loglikelihood], bool]
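
A hedged usage sketch in which the preemption callback enforces a wall-clock budget; llm, config, and result_processor are assumed to be constructed elsewhere:

import time

from eval_framework.response_generator import ResponseGenerator

def generate_with_time_budget(llm, config, result_processor, budget_s=3600.0):
    deadline = time.monotonic() + budget_s

    def should_preempt() -> bool:
        # generate() calls this to decide whether to stop early.
        return time.monotonic() > deadline

    generator = ResponseGenerator(llm, config, result_processor)
    responses, preempted = generator.generate(should_preempt)
    if preempted:
        print(f"Preempted after {len(responses)} responses; rerun later to resume.")
    return responses, preempted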

eval_framework.response_generator.map_language_to_value(language)[source]
Return type:

str | dict[str, str] | dict[str, tuple[str, str]] | None

Parameters:

language (Language | dict[str, Language] | dict[str, tuple[Language, Language]] | None)

eval_framework.response_generator.repeat_samples(samples, repeats)[source]

Flatten repeats into a single stream of samples.

After expansion, the original sample indices no longer point to the same samples. The original sample can be recovered via original_index = expanded_index // repeats.

Return type:

Iterable[Sample]

Parameters:
  • samples (Iterable[Sample])

  • repeats (int)
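
A sketch of the index bookkeeping described above; samples is assumed to be an iterable of Sample objects obtained elsewhere:

from eval_framework.response_generator import repeat_samples

def expand_and_attribute(samples, repeats=3):
    # Each original sample occupies `repeats` consecutive positions in the
    # expanded stream, so integer division recovers the original index.
    expanded = list(repeat_samples(samples, repeats))
    for expanded_index, sample in enumerate(expanded):
        original_index = expanded_index // repeats
        # ... process `sample`, attributing its result to `original_index` ...
    return expanded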

eval_framework.run module

eval_framework.run.parse_args()[source]
Return type:

Namespace

eval_framework.run.run()[source]
Return type:

None

eval_framework.run.run_with_kwargs(kwargs)[source]
Return type:

None

Parameters:

kwargs (dict)
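
A minimal sketch of the command-line entry point; the actual CLI options are defined by parse_args and are not reproduced here:

from eval_framework.run import run

if __name__ == "__main__":
    # Parses the command line (see parse_args above) and runs the evaluation;
    # run_with_kwargs exposes the same entry point driven by a plain dict.
    run()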

eval_framework.run_direct module

Module contents