eval_framework package

Submodules

eval_framework.base_config module

class eval_framework.base_config.BaseConfig(**data)[source]

Bases: BaseModel

as_dict()[source]
Return type:

dict[str, Any]

classmethod from_yaml(yml_filename)[source]
Return type:

BaseConfig

Parameters:

yml_filename (str | Path)

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'frozen': True, 'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to ConfigDict.

save(out_file)[source]
Return type:

None

Parameters:

out_file (Path)
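
A minimal usage sketch. The MyEvalConfig subclass, its fields, and the YAML paths are illustrative assumptions; only BaseConfig, from_yaml(), as_dict(), and save() come from this module:

   from pathlib import Path

   from eval_framework.base_config import BaseConfig

   class MyEvalConfig(BaseConfig):  # hypothetical subclass for illustration
       model_name: str
       num_samples: int = 100

   # from_yaml() expects a YAML file whose keys match the model fields;
   # 'extra': 'forbid' makes unknown keys a validation error, and
   # 'frozen': True makes the loaded config immutable after creation.
   config = MyEvalConfig.from_yaml("my_eval.yaml")  # hypothetical path
   print(config.as_dict())                 # plain dict[str, Any] view
   config.save(Path("my_eval_copy.yaml"))  # write the config back to disk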

eval_framework.evaluation_generator module

class eval_framework.evaluation_generator.EvaluationGenerator(config, result_processor)[source]

Bases: object

run_eval()[source]

Runs evaluation using saved completions.

Return type:

list[Result]

eval_framework.exceptions module

exception eval_framework.exceptions.LogicError[source]

Bases: Exception

eval_framework.logger module

eval_framework.main module

eval_framework.main.main(llm, config, should_preempt_callable=None, trial_id=None, *args, resource_cleanup=False, verbosity=1)[source]

Runs the entire evaluation process: response generation and evaluation.

Return type:

list[Result]

Parameters:
  • llm (BaseLLM)

  • config (EvalConfig)

  • should_preempt_callable (Callable[[], bool] | None)

  • trial_id (int | None)

  • args (Any)

  • resource_cleanup (bool)

  • verbosity (int)
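
A hedged invocation sketch; my_llm and my_config stand in for a concrete BaseLLM and EvalConfig constructed elsewhere, and the one-hour budget is illustrative:

   import time

   from eval_framework.main import main

   my_llm = ...     # a concrete BaseLLM implementation, constructed elsewhere
   my_config = ...  # an EvalConfig for the tasks being evaluated

   deadline = time.monotonic() + 3600  # illustrative one-hour budget

   results = main(
       my_llm,
       my_config,
       should_preempt_callable=lambda: time.monotonic() > deadline,
       trial_id=0,
       resource_cleanup=True,
       verbosity=1,
   )
   print(f"collected {len(results)} results")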

eval_framework.response_generator module

class eval_framework.response_generator.ResponseGenerator(llm, config, result_processor)[source]

Bases: object

generate(should_preempt_callable)[source]

Generates responses and saves them along with metadata.

Return type:

tuple[list[Completion | Loglikelihood], bool]

Returns:

a tuple (responses, preempted): the list of generated responses and a flag indicating whether the process was preempted

Parameters:

should_preempt_callable (Callable[[], bool]) – function that checks whether the run should be preempted
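
The preempt hook is a zero-argument callable returning bool. A minimal sketch of a time-budget hook (the budget value and the commented-out generator construction are illustrative):

   import time

   def make_time_budget_preempt(budget_seconds: float):
       """Return a callable that reports True once the budget is exhausted."""
       deadline = time.monotonic() + budget_seconds

       def should_preempt() -> bool:
           return time.monotonic() > deadline

       return should_preempt

   # generator = ResponseGenerator(llm, config, result_processor)  # built elsewhere
   # responses, preempted = generator.generate(make_time_budget_preempt(1800.0))
   # if preempted:
   #     ...  # responses so far were saved; the run can be resumed later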

eval_framework.response_generator.map_language_to_value(language)[source]
Return type:

str | dict[str, str] | dict[str, tuple[str, str]] | None

Parameters:

language (Language | dict[str, Language] | dict[str, tuple[Language, Language]] | None)

eval_framework.response_generator.repeat_samples(samples, repeats)[source]

Flatten repeats into a single stream of samples.

After expansion, original sample indices no longer point to the same samples. The original index can be recovered as original_index = expanded_index // repeats.

Return type:

Iterable[Sample]

Parameters:
  • samples (Iterable[Sample])

  • repeats (int)
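
A worked example of the index arithmetic, assuming repeats are expanded contiguously (which is what the recovery formula implies):

   # With repeats=3, samples [s0, s1] expand to [s0, s0, s0, s1, s1, s1].
   repeats = 3
   for expanded_index in range(6):
       original_index = expanded_index // repeats
       print(expanded_index, "->", original_index)
   # 0 -> 0, 1 -> 0, 2 -> 0, 3 -> 1, 4 -> 1, 5 -> 1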

eval_framework.run module

eval_framework.run.parse_args()[source]
Return type:

Namespace

eval_framework.run.run()[source]
Return type:

None

eval_framework.run.run_with_kwargs(kwargs)[source]
Return type:

None

Parameters:

kwargs (dict)

eval_framework.run_direct module

eval_framework.suite module

class eval_framework.suite.MetricSource(**data)[source]

Bases: BaseModel

A single (child, metric) pair used as an input to a SuiteAggregate. See the examples folder for how these are used.

Parameters:
  • child (str)

  • metric (str)

child: str
metric: str
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to ConfigDict.

class eval_framework.suite.SuiteAggregate(**data)[source]

Bases: BaseModel

Model to aggregate results from a suite of tasks.

Parameters:
  • name (str)

  • sources (list[MetricSource])

  • method (str | Callable[[list[float]], float])

method: str | Callable[[list[float]], float]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to ConfigDict.

name: str
sources: list[MetricSource]
classmethod validate_method(v)[source]
Return type:

str | Callable

Parameters:

v (str | Callable)
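
A construction sketch; the child and metric names are made up, and a plain callable is passed for method because the exact set of accepted strings is not documented here:

   from eval_framework.suite import MetricSource, SuiteAggregate

   # Aggregate the same metric from two child tasks (names are illustrative).
   agg = SuiteAggregate(
       name="avg_accuracy",
       sources=[
           MetricSource(child="task_a", metric="accuracy"),
           MetricSource(child="task_b", metric="accuracy"),
       ],
       method=lambda values: sum(values) / len(values),  # Callable[[list[float]], float]
   )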

class eval_framework.suite.SuiteResult(**data)[source]

Bases: BaseModel

Parameters:
  • name (str)

  • task_results (dict[str, Self])

  • aggregates (dict[str, float | None])

aggregates: dict[str, float | None]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to ConfigDict.

name: str
task_results: dict[str, Self]
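
Because task_results maps names to nested SuiteResult instances, a suite's result forms a tree. An illustrative instance (all names and numbers are made up):

   from eval_framework.suite import SuiteResult

   result = SuiteResult(
       name="my_suite",
       task_results={
           "task_a": SuiteResult(name="task_a", task_results={},
                                 aggregates={"accuracy": 0.81}),
           "task_b": SuiteResult(name="task_b", task_results={},
                                 aggregates={"accuracy": None}),  # no valid sources
       },
       aggregates={"avg_accuracy": 0.81},
   )
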
class eval_framework.suite.TaskSuite(**data)[source]

Bases: BaseModel

Parameters:
  • name (str | None)

  • tasks (Annotated[str | list[str | Self], BeforeValidator(func=~eval_framework.suite.parse_strings_to_task_or_suite, json_schema_input_type=PydanticUndefined)])

  • aggregates (list[SuiteAggregate])

  • temperature (float | None)

  • top_p (float | None)

  • top_k (int | None)

  • extra_llm_args (dict[str, Any])

  • num_samples (int | None)

  • num_fewshot (int | None)

  • max_tokens (int | None)

  • repeats (int | None)

  • batch_size (int | None)

  • task_subjects (list[str] | None)

  • hf_revision (str | None)

aggregates: list[SuiteAggregate]
batch_size: int | None
extra_llm_args: dict[str, Any]
get_hyperparam_overrides()[source]

Return hyperparam fields that were explicitly set in the suite definition.

Return type:

dict[str, Any]

hf_revision: str | None
property is_leaf: bool
classmethod load(path)[source]
Return type:

Self

Parameters:

path (Path | str)

classmethod load_from_py(path)[source]
Return type:

Self

Parameters:

path (Path | str)

classmethod load_from_yaml(path)[source]
Return type:

Self

Parameters:

path (Path)

max_tokens: int | None
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to ConfigDict.

name: str | None
num_fewshot: int | None
num_samples: int | None
repeats: int | None
property task_name: str

The registered task name. Only valid for leaf tasks.

task_subjects: list[str] | None
tasks: Annotated[str | list[str | Self], BeforeValidator(func=parse_strings_to_task_or_suite, json_schema_input_type=PydanticUndefined)]
temperature: float | None
top_k: int | None
top_p: float | None
validate_suite()[source]
Return type:

Self
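
A loading sketch; the path is hypothetical, and load() is assumed to dispatch to load_from_py() or load_from_yaml() based on the file it is given:

   from eval_framework.suite import TaskSuite

   suite = TaskSuite.load("suites/my_suite.yaml")  # hypothetical path; Path | str

   if suite.is_leaf:
       print("single task:", suite.task_name)  # task_name is only valid for leaves
   else:
       print("composite suite:", suite.name)

   # Hyperparameters explicitly set in the suite definition, e.g. temperature:
   print(suite.get_hyperparam_overrides())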

eval_framework.suite.compute_aggregates(aggregates, child_results)[source]

Compute suite-level stats from explicitly named (child, metric) sources.

For each SuiteAggregate, the value from each MetricSource is looked up by child name and exact metric key. Sources whose child is missing or whose metric is None or NaN are silently skipped. If no sources yield a valid value the aggregate is None.

Return type:

dict[str, float | None]
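
A sketch of the documented lookup rules against plain dicts (illustrative data; the real child_results structure is not shown in this signature):

   import math

   # child name -> {metric key: value}
   child_metrics = {
       "task_a": {"accuracy": 0.8},
       "task_b": {"accuracy": float("nan")},  # NaN: silently skipped
       # "task_c" missing entirely: also skipped
   }

   sources = [("task_a", "accuracy"), ("task_b", "accuracy"), ("task_c", "accuracy")]
   values = []
   for child, metric in sources:
       value = child_metrics.get(child, {}).get(metric)
       if value is not None and not math.isnan(value):
           values.append(value)

   # None when no source yields a valid value, otherwise the aggregate:
   aggregate = sum(values) / len(values) if values else None
   print(aggregate)  # 0.8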

eval_framework.suite.parse_strings_to_task_or_suite(v)[source]

Expand bare strings in a list to leaf-suite dicts. Pydantic validates them into TaskSuite.

Return type:

str | list

Parameters:

v (str | list)
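
Illustratively, a bare string in the list becomes a dict that Pydantic then validates into a leaf TaskSuite; the task names are made up, and the exact dict shape produced ({"tasks": name}) is an assumption:

   from eval_framework.suite import parse_strings_to_task_or_suite

   expanded = parse_strings_to_task_or_suite(["mmlu", {"name": "sub", "tasks": ["arc"]}])
   # assumed result: [{"tasks": "mmlu"}, {"name": "sub", "tasks": ["arc"]}]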

eval_framework.suite.resolve_to_evalconfig_kwargs(leaf, resolved_defaults, cli_kwargs)[source]

Build the kwargs dict expected by run_with_kwargs() for a single leaf task.

Merges CLI kwargs as the base, overlays resolved suite defaults, and routes temperature/top_p/extra_llm_args into the llm_args dict.

Return type:

dict

Parameters:
  • leaf (TaskSuite)

  • resolved_defaults (dict[str, Any])

  • cli_kwargs (dict[str, Any])
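
A worked illustration of the documented merge order; all keys and values are made up, only the precedence and the llm_args routing follow the description above:

   cli_kwargs = {"output_dir": "out/", "num_samples": 10}
   resolved_defaults = {"num_samples": 100, "temperature": 0.2}

   # CLI kwargs form the base, suite defaults overlay them, and
   # temperature/top_p/extra_llm_args are routed into llm_args:
   # {
   #     "output_dir": "out/",
   #     "num_samples": 100,   # suite default overlays the CLI base
   #     "llm_args": {"temperature": 0.2},
   # }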

eval_framework.suite.run_suite(suite, cli_kwargs, parent_defaults=None, root_suite_name=None)[source]

Recursively run all tasks in a suite and compute aggregates bottom-up using post-order traversal.

For a leaf suite: runs the single task via _run_single_task and returns the aggregated results directly. For a composite suite: recurses into each child, collects results, then computes this suite’s aggregates.

Return type:

SuiteResult

Parameters:
  • suite (TaskSuite)

  • cli_kwargs (dict[str, Any])

  • parent_defaults (dict[str, Any] | None)

  • root_suite_name (str | None)
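
An end-to-end sketch tying the suite API together (paths and kwargs are illustrative):

   from pathlib import Path

   from eval_framework.suite import TaskSuite, run_suite, save_suite_results

   suite = TaskSuite.load("suites/my_suite.yaml")                # hypothetical path
   result = run_suite(suite, cli_kwargs={"output_dir": "out/"})  # illustrative kwargs

   # result.aggregates is a dict[str, float | None], which is exactly what
   # save_suite_results() persists:
   save_suite_results(Path("out/"), result.aggregates)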

eval_framework.suite.save_suite_results(output_dir, results)[source]
Return type:

None

Parameters:
  • output_dir (Path)

  • results (dict[str, float | None])

Module contents