eval_framework.result_processors package

Submodules

eval_framework.result_processors.base module

class eval_framework.result_processors.base.Result(**data)[source]

Bases: BaseModel

Parameters:
  • id (int)

  • subject (str)

  • num_fewshot (int)

  • llm_name (str)

  • task_name (str)

  • metric_class_name (str)

  • metric_name (str)

  • key (str | None)

  • value (float | None)

  • higher_is_better (bool)

  • prompt (str)

  • response (str)

  • llm_judge_prompt (str | None)

  • llm_judge_response (str | None)

  • code_execution_trace (str | None)

  • error (Error | None)

code_execution_trace: str | None
error: Error | None
higher_is_better: bool
id: int
key: str | None
llm_judge_prompt: str | None
llm_judge_response: str | None
llm_name: str
metric_class_name: str
metric_name: str
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to Pydantic's ConfigDict.

num_fewshot: int
prompt: str
response: str
subject: str
task_name: str
value: float | None
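
A minimal sketch of constructing a Result; all field values here are illustrative, and because model_config sets extra='forbid', only the fields listed above may be passed:

   from eval_framework.result_processors.base import Result

   # Illustrative values; unknown keyword arguments raise a ValidationError
   # because extra fields are forbidden. The optional fields are passed
   # explicitly since their defaults are not documented on this page.
   result = Result(
       id=0,
       subject="high_school_physics",
       num_fewshot=5,
       llm_name="my-llm",
       task_name="MMLU",
       metric_class_name="Accuracy",
       metric_name="accuracy",
       key=None,
       value=1.0,
       higher_is_better=True,
       prompt="Question: ...",
       response="Answer: ...",
       llm_judge_prompt=None,
       llm_judge_response=None,
       code_execution_trace=None,
       error=None,
   )
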
class eval_framework.result_processors.base.ResultProcessor[source]

Bases: ABC

abstractmethod load_metadata()[source]

Load metadata.

Return type:

dict

abstractmethod load_metrics_results()[source]

Load the aggregated results.

Return type:

list[Result]

abstractmethod load_responses()[source]

Load a list of response objects.

Return type:

list[Completion | Loglikelihood]

abstractmethod save_aggregated_results(result)[source]

Save the aggregated results.

Return type:

None

Parameters:

result (dict[str, float | None])

abstractmethod save_metadata(metadata)[source]

Save metadata.

Return type:

None

Parameters:

metadata (dict)

abstractmethod save_metrics_result(result)[source]

Save a single metric result (append to a file).

Return type:

None

Parameters:

result (Result)

abstractmethod save_metrics_results(results)[source]

Save the results of the metrics (overwrite a file).

Return type:

None

Parameters:

results (list[Result])

abstractmethod save_response(response)[source]

Save a single response object (append to a file).

Return type:

None

Parameters:

response (Completion | Loglikelihood)

abstractmethod save_responses(responses)[source]

Save a list of response objects (overwrite a file).

Return type:

None

Parameters:

responses (list[Completion | Loglikelihood])
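
Taken together, the abstract methods pair an overwrite-style plural saver with an append-style singular one for both metrics and responses. A hypothetical in-memory implementation, purely to illustrate the contract (the class name is invented, and since the Completion/Loglikelihood import paths are not shown on this page, responses are typed as Any):

   from typing import Any

   from eval_framework.result_processors.base import Result, ResultProcessor


   class InMemoryResultProcessor(ResultProcessor):
       """Hypothetical processor that keeps everything in process memory."""

       def __init__(self) -> None:
           self._metadata: dict = {}
           self._metrics: list[Result] = []
           self._responses: list[Any] = []  # Completion | Loglikelihood objects
           self._aggregated: dict[str, float | None] = {}

       def load_metadata(self) -> dict:
           return self._metadata

       def load_metrics_results(self) -> list[Result]:
           return self._metrics

       def load_responses(self) -> list[Any]:
           return self._responses

       def save_aggregated_results(self, result: dict[str, float | None]) -> None:
           self._aggregated = result

       def save_metadata(self, metadata: dict) -> None:
           self._metadata = metadata

       def save_metrics_result(self, result: Result) -> None:
           self._metrics.append(result)  # singular save appends

       def save_metrics_results(self, results: list[Result]) -> None:
           self._metrics = list(results)  # plural save overwrites

       def save_response(self, response: Any) -> None:
           self._responses.append(response)

       def save_responses(self, responses: list[Any]) -> None:
           self._responses = list(responses)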

class eval_framework.result_processors.base.ResultsUploader[source]

Bases: ABC

abstractmethod upload(llm_name, config, output_dir)[source]

Upload relevant parts from output_dir to the desired destination. Returns True if the upload was successful, False otherwise.

Return type:

bool

Parameters:
  • llm_name (str)

  • config (EvalConfig)

  • output_dir (Path)
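
A minimal hypothetical subclass that "uploads" by copying the run directory to a local destination; LocalCopyUploader and dest are invented names for this sketch:

   import shutil
   from pathlib import Path

   from eval_framework.result_processors.base import ResultsUploader


   class LocalCopyUploader(ResultsUploader):
       """Hypothetical uploader: copies output_dir to a local destination."""

       def __init__(self, dest: Path):
           self.dest = dest

       def upload(self, llm_name, config, output_dir: Path) -> bool:
           # config is an EvalConfig; it is unused in this sketch.
           try:
               shutil.copytree(output_dir, self.dest / llm_name / output_dir.name)
               return True
           except OSError:
               return False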

eval_framework.result_processors.hf_uploader module

Module for writing the result folder and its contents to HuggingFace

class eval_framework.result_processors.hf_uploader.HFUploader(config)[source]

Bases: ResultsUploader

Parameters:

config (EvalConfig)

upload(llm_name, config, output_dir)[source]

Upload relevant parts from output_dir to the desired destination. Returns True if the upload was successful, False otherwise.

Return type:

bool

Parameters:
  • llm_name (str)

  • config (EvalConfig)

  • output_dir (Path)
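
A usage sketch; config stands in for an EvalConfig built elsewhere (its import path is not shown on this page), and the output path is illustrative:

   from pathlib import Path

   from eval_framework.result_processors.hf_uploader import HFUploader

   # config is an EvalConfig built elsewhere.
   uploader = HFUploader(config)
   ok = uploader.upload("my-llm", config, Path("outputs/my-llm/run-0"))
   if not ok:
       print("HuggingFace upload failed")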

eval_framework.result_processors.result_processor module

class eval_framework.result_processors.result_processor.ResultsFileProcessor(output_dir)[source]

Bases: ResultProcessor

Parameters:

output_dir (Path)

load_metadata()[source]

Load metadata.

Return type:

dict

load_metrics_results()[source]

Load the aggregated results.

Return type:

list[Result]

load_responses()[source]

Load a list of response objects.

Return type:

list[Completion | Loglikelihood]

save_aggregated_results(results)[source]

Save the aggregated results.

Return type:

None

Parameters:

results (dict[str, float | None])

save_metadata(metadata)[source]

Save metadata.

Return type:

None

Parameters:

metadata (dict)

save_metrics_result(result)[source]

Save a single metric result (append to a file).

Return type:

None

Parameters:

result (Result)

save_metrics_results(results)[source]

Save the results of the metrics (overwrite a file).

Return type:

None

Parameters:

results (list[Result])

save_response(response)[source]

Save a single response object (append to a file).

Return type:

None

Parameters:

response (Completion | Loglikelihood)

save_responses(responses)[source]

Save a list of response objects (overwrite a file).

Return type:

None

Parameters:

responses (list[Completion | Loglikelihood])
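
A sketch of the file-based round trip, reusing a Result like the one constructed above; the output path is illustrative (generate_output_dir, documented below, presumably produces such paths):

   from pathlib import Path

   from eval_framework.result_processors.result_processor import ResultsFileProcessor

   processor = ResultsFileProcessor(Path("outputs/my-llm/run-0"))  # illustrative path

   processor.save_metadata({"llm_name": "my-llm"})       # illustrative metadata
   processor.save_metrics_result(result)                 # append one Result
   processor.save_aggregated_results({"accuracy": 1.0})  # summary values may be None

   loaded = processor.load_metrics_results()             # -> list[Result]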

eval_framework.result_processors.result_processor.generate_output_dir(llm_name, config)[source]
Return type:

Path

Parameters:
  • llm_name (str)

  • config (EvalConfig)

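A usage sketch; config stands in for an EvalConfig built elsewhere, and the layout of the returned path is an implementation detail:

   from eval_framework.result_processors.result_processor import (
       ResultsFileProcessor,
       generate_output_dir,
   )

   # config is an EvalConfig built elsewhere; its import path is not shown here.
   output_dir = generate_output_dir("my-llm", config)
   processor = ResultsFileProcessor(output_dir)
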
eval_framework.result_processors.wandb_uploader module

Module for writing the result folder to a W&B artifact

class eval_framework.result_processors.wandb_uploader.WandbUploader(config, include_all=True, compress_non_json=True, wandb_registry=None)[source]

Bases: ResultsUploader

Parameters:
  • config (EvalConfig)

  • include_all (bool)

  • compress_non_json (bool)

  • wandb_registry (str | None)

upload(llm_name, config, output_dir)[source]

Upload relevant parts from output_dir to the desired destination. Returns True if the upload was successful, False otherwise.

Return type:

bool

Parameters:
  • llm_name (str)

  • config (EvalConfig)

  • output_dir (Path)
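
A usage sketch; config stands in for an EvalConfig built elsewhere, the keyword values repeat the defaults from the signature above, and the output path is illustrative:

   from pathlib import Path

   from eval_framework.result_processors.wandb_uploader import WandbUploader

   # config is an EvalConfig built elsewhere.
   uploader = WandbUploader(config, include_all=True, compress_non_json=True)
   ok = uploader.upload("my-llm", config, Path("outputs/my-llm/run-0"))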

eval_framework.result_processors.wandb_uploader.artifact_upload_function(artifact_name, subpath, file_paths)[source]
Return type:

str | None

Parameters:
  • artifact_name (str)

  • subpath (str)

  • file_paths (list[Path])

eval_framework.result_processors.wandb_uploader.register_artifact_upload_function(func)[source]
Return type:

None

Parameters:

func (Callable[[str, str, list[Path]], str | None] | None)
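
A sketch of registering a custom upload callable; the function body here is hypothetical, and only its (str, str, list[Path]) -> str | None signature is dictated by the documented parameter type:

   from pathlib import Path

   from eval_framework.result_processors.wandb_uploader import (
       register_artifact_upload_function,
   )


   def my_upload(artifact_name: str, subpath: str, file_paths: list[Path]) -> str | None:
       # Hypothetical behavior: report what would be uploaded, return an identifier.
       for path in file_paths:
           print(f"would upload {path} to {artifact_name}/{subpath}")
       return artifact_name


   register_artifact_upload_function(my_upload)

   # The parameter type also admits None, presumably to clear the registration.
   register_artifact_upload_function(None)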

Module contents