eval_framework.result_processors package
Submodules
eval_framework.result_processors.base module
- class eval_framework.result_processors.base.Result(**data)
Bases: BaseModel
- Parameters:
id (int)
subject (str)
num_fewshot (int)
llm_name (str)
task_name (str)
metric_class_name (str)
metric_name (str)
key (str | None)
value (float | None)
higher_is_better (bool)
prompt (str)
response (str)
llm_judge_prompt (str | None)
llm_judge_response (str | None)
code_execution_trace (str | None)
error (Error | None)
- code_execution_trace: str | None
- error: Error | None
- higher_is_better: bool
- id: int
- key: str | None
- llm_judge_prompt: str | None
- llm_judge_response: str | None
- llm_name: str
- metric_class_name: str
- metric_name: str
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).
- num_fewshot: int
- prompt: str
- response: str
- subject: str
- task_name: str
- value: float | None
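The field list above can be sketched with a standard-library dataclass. This is only an illustrative stand-in, not the real pydantic model: `ResultSketch` is a hypothetical name, `error` is typed as a plain string instead of `Error`, the defaults on the trailing optional fields are an assumption, and all example values are made up. A dataclass rejects unknown keyword arguments with `TypeError`, which is loosely analogous to the documented `{'extra': 'forbid'}` setting (pydantic itself would raise a validation error instead).

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in mirroring the documented Result fields."""

    id: int
    subject: str
    num_fewshot: int
    llm_name: str
    task_name: str
    metric_class_name: str
    metric_name: str
    key: Optional[str]
    value: Optional[float]
    higher_is_better: bool
    prompt: str
    response: str
    # Defaults below are an assumption of this sketch, not documented behavior.
    llm_judge_prompt: Optional[str] = None
    llm_judge_response: Optional[str] = None
    code_execution_trace: Optional[str] = None
    error: Optional[str] = None  # the real field is typed Error | None


# All values here are invented for illustration.
record = ResultSketch(
    id=0,
    subject="algebra",
    num_fewshot=5,
    llm_name="example-llm",
    task_name="example-task",
    metric_class_name="ExampleMetric",
    metric_name="accuracy",
    key=None,
    value=1.0,
    higher_is_better=True,
    prompt="2 + 2 =",
    response="4",
)


def try_unknown_field() -> str:
    """Attempt to build a record with a field that is not declared."""
    try:
        ResultSketch(**{**asdict(record), "unexpected": 1})
    except TypeError:
        return "rejected"
    return "accepted"
```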
- class eval_framework.result_processors.base.ResultProcessor
Bases: ABC
- abstractmethod load_metrics_results()
Load the aggregated results.
- Return type:
list[Result]
- abstractmethod load_responses()
Load a list of response objects.
- Return type:
list[Completion | Loglikelihood]
- abstractmethod save_aggregated_results(result)
Save the aggregated results.
- Return type:
None
- Parameters:
result (dict[str, float | None])
- abstractmethod save_metadata(metadata)
Save metadata.
- Return type:
None
- Parameters:
metadata (dict)
- abstractmethod save_metrics_result(result)
Save a single metric result (append into a file).
- Return type:
None
- Parameters:
result (Result)
- abstractmethod save_metrics_results(results)
Save the results of the metrics (overwrite a file).
- Return type:
None
- Parameters:
results (list[Result])
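Because every method above is abstract, a concrete processor has to implement all six before it can be instantiated. The sketch below illustrates this with a hypothetical stand-in base class (`ResultProcessorSketch`) that mirrors only the documented method names; it does not import eval_framework, and the in-memory storage and plain-dict payloads are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class ResultProcessorSketch(ABC):
    """Hypothetical stand-in mirroring only the documented method names."""

    @abstractmethod
    def save_metadata(self, metadata): ...

    @abstractmethod
    def save_metrics_result(self, result): ...

    @abstractmethod
    def save_metrics_results(self, results): ...

    @abstractmethod
    def save_aggregated_results(self, result): ...

    @abstractmethod
    def load_metrics_results(self): ...

    @abstractmethod
    def load_responses(self): ...


class InMemoryProcessor(ResultProcessorSketch):
    """Keeps everything in Python lists and dicts instead of files."""

    def __init__(self):
        self.metadata = {}
        self.metrics = []
        self.aggregated = {}
        self.responses = []

    def save_metadata(self, metadata):
        self.metadata = dict(metadata)

    def save_metrics_result(self, result):
        self.metrics.append(result)  # add one record to what is stored

    def save_metrics_results(self, results):
        self.metrics = list(results)  # replace everything stored so far

    def save_aggregated_results(self, result):
        self.aggregated = dict(result)

    def load_metrics_results(self):
        return list(self.metrics)

    def load_responses(self):
        return list(self.responses)


class IncompleteProcessor(ResultProcessorSketch):
    """Implements only one of the six methods, so it stays abstract."""

    def save_metadata(self, metadata):
        pass


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if abc blocks it with TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True


proc = InMemoryProcessor()
proc.save_metrics_result({"metric_name": "accuracy", "value": 1.0})
proc.save_metrics_result({"metric_name": "accuracy", "value": 0.0})
count_after_append = len(proc.load_metrics_results())
proc.save_metrics_results([{"metric_name": "accuracy", "value": 0.5}])
count_after_overwrite = len(proc.load_metrics_results())
```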
- class eval_framework.result_processors.base.ResultsUploader
Bases: ABC
- abstractmethod upload(llm_name, config, output_dir)
Upload relevant parts from output_dir to the desired destination. Returns True if upload was successful, False otherwise.
- Return type:
bool
- Parameters:
llm_name (str)
config (EvalConfig)
output_dir (Path)
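The `upload` contract (return True on success, False otherwise) can be illustrated with a minimal sketch. Everything here is an assumption for demonstration: `ResultsUploaderSketch` is a hypothetical stand-in for the abstract base, `LocalCopyUploader` copies to a local directory instead of a remote service, and `config` is a plain dict placeholder, not an `EvalConfig`.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ResultsUploaderSketch(ABC):
    """Hypothetical stand-in mirroring the documented upload() signature."""

    @abstractmethod
    def upload(self, llm_name: str, config: dict, output_dir: Path) -> bool: ...


class LocalCopyUploader(ResultsUploaderSketch):
    """Copies output_dir into <destination>/<llm_name> on the local disk."""

    def __init__(self, destination: Path):
        self.destination = destination

    def upload(self, llm_name: str, config: dict, output_dir: Path) -> bool:
        try:
            shutil.copytree(
                output_dir, self.destination / llm_name, dirs_exist_ok=True
            )
        except OSError:
            # Report failure through the return value instead of raising.
            return False
        return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    out = root / "results"
    out.mkdir()
    (out / "example.json").write_text("{}", encoding="utf-8")

    uploader = LocalCopyUploader(root / "remote")
    ok = uploader.upload("example-llm", {}, out)
    copied = (root / "remote" / "example-llm" / "example.json").exists()
    # A missing source directory should yield False, not an exception.
    missing = uploader.upload("example-llm", {}, root / "does-not-exist")
```

Signalling failure with a boolean lets a caller attempt several uploaders in turn without wrapping each call in its own exception handler.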
eval_framework.result_processors.hf_uploader module
Module for writing the result folder and its contents to Hugging Face.
- class eval_framework.result_processors.hf_uploader.HFUploader(config)
Bases: ResultsUploader
- Parameters:
config (EvalConfig)
- upload(llm_name, config, output_dir)
Upload relevant parts from output_dir to the desired destination. Returns True if upload was successful, False otherwise.
- Return type:
bool
- Parameters:
llm_name (str)
config (EvalConfig)
output_dir (Path)
eval_framework.result_processors.result_processor module
- class eval_framework.result_processors.result_processor.ResultsFileProcessor(output_dir)
Bases: ResultProcessor
- Parameters:
output_dir (Path)
- load_responses()
Load a list of response objects.
- Return type:
list[Completion | Loglikelihood]
- save_aggregated_results(results)
Save the aggregated results.
- Return type:
None
- Parameters:
results (dict[str, float | None])
- save_metrics_result(result)
Save a single metric result (append into a file).
- Return type:
None
- Parameters:
result (Result)
- save_metrics_results(results)
Save the results of the metrics (overwrite a file).
- Return type:
None
- Parameters:
results (list[Result])
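The difference between appending a single result and overwriting the whole file can be sketched with plain file modes. This is not the library's implementation: the JSON Lines layout, the file name `results.jsonl`, and the helper names below are all assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def append_result(path: Path, result: dict) -> None:
    """Append one record as a JSON line; existing lines are kept."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(result) + "\n")


def overwrite_results(path: Path, results: list) -> None:
    """Rewrite the file so it contains exactly the given records."""
    with path.open("w", encoding="utf-8") as f:
        for record in results:
            f.write(json.dumps(record) + "\n")


def read_results(path: Path) -> list:
    """Read every non-empty JSON line back into a list of dicts."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.jsonl"  # hypothetical file name

    append_result(path, {"id": 0, "value": 1.0})
    append_result(path, {"id": 1, "value": 0.0})
    after_append = read_results(path)

    overwrite_results(path, [{"id": 2, "value": 0.5}])
    after_overwrite = read_results(path)
```

Appending suits incremental writes during a run, while overwriting suits a final pass that rewrites the complete, consolidated list.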
- eval_framework.result_processors.result_processor.generate_output_dir(llm_name, config)
- Return type:
Path
- Parameters:
llm_name (str)
config (EvalConfig)
eval_framework.result_processors.wandb_uploader module
Module for writing the result folder to a W&B artifact.
- class eval_framework.result_processors.wandb_uploader.WandbUploader(config, include_all=True, compress_non_json=True, wandb_registry=None)
Bases: ResultsUploader
- Parameters:
config (EvalConfig)
include_all (bool)
compress_non_json (bool)
wandb_registry (str | None)
- upload(llm_name, config, output_dir)
Upload relevant parts from output_dir to the desired destination. Returns True if upload was successful, False otherwise.
- Return type:
bool
- Parameters:
llm_name (str)
config (EvalConfig)
output_dir (Path)