eval_framework.metrics package
Subpackages
- eval_framework.metrics.completion package
- Submodules
- eval_framework.metrics.completion.accuracy_completion module
- eval_framework.metrics.completion.aidanbench module
- eval_framework.metrics.completion.bleu module
- eval_framework.metrics.completion.chrf module
- eval_framework.metrics.completion.code_assertion module
- eval_framework.metrics.completion.code_execution_pass_at_one module
- eval_framework.metrics.completion.comet module
- eval_framework.metrics.completion.concordance_index module
- eval_framework.metrics.completion.csv_format module
- eval_framework.metrics.completion.cwe_accuracy module
- eval_framework.metrics.completion.exponential_similarity module
- eval_framework.metrics.completion.f1 module
- eval_framework.metrics.completion.format_checker module
- eval_framework.metrics.completion.grid_difference module
- eval_framework.metrics.completion.ifeval module
- eval_framework.metrics.completion.json_format module
- eval_framework.metrics.completion.language_checker module
- eval_framework.metrics.completion.length_control module
- eval_framework.metrics.completion.math_reasoning_completion module
- eval_framework.metrics.completion.niah_accuracy module
- eval_framework.metrics.completion.placeholder_checker module
- eval_framework.metrics.completion.repetition module
- eval_framework.metrics.completion.rouge_1 module
- eval_framework.metrics.completion.rouge_2 module
- eval_framework.metrics.completion.rouge_geometric_mean module
- eval_framework.metrics.completion.rouge_l module
- eval_framework.metrics.completion.struct_eval_metrics module
- eval_framework.metrics.completion.ter module
- eval_framework.metrics.completion.text_counter module
- Module contents
- eval_framework.metrics.efficiency package
- eval_framework.metrics.llm package
- Submodules
- eval_framework.metrics.llm.base module
- eval_framework.metrics.llm.llm_judge_chatbot_style module
- eval_framework.metrics.llm.llm_judge_coherence module
- eval_framework.metrics.llm.llm_judge_completion_accuracy module
- eval_framework.metrics.llm.llm_judge_conciseness module
- eval_framework.metrics.llm.llm_judge_contains_names module
- eval_framework.metrics.llm.llm_judge_format_correctness module
- eval_framework.metrics.llm.llm_judge_instruction module
- eval_framework.metrics.llm.llm_judge_mtbench_pair module
- eval_framework.metrics.llm.llm_judge_mtbench_single module
- eval_framework.metrics.llm.llm_judge_refusal module
- eval_framework.metrics.llm.llm_judge_sql module
- eval_framework.metrics.llm.llm_judge_world_knowledge module
- eval_framework.metrics.llm.utils module
- Module contents
- eval_framework.metrics.loglikelihood package
- Submodules
- eval_framework.metrics.loglikelihood.accuracy_loglikelihood module
- eval_framework.metrics.loglikelihood.base module
- eval_framework.metrics.loglikelihood.confidence_weighted_accuracy module
- eval_framework.metrics.loglikelihood.dcs module
- eval_framework.metrics.loglikelihood.probability_mass module
- eval_framework.metrics.loglikelihood.ternary module
- Module contents
Submodules
eval_framework.metrics.base module
- class eval_framework.metrics.base.BaseMetric
  Bases: ABC, Generic
  - KEYS: list[str] | None = None
  - NAME: str
  - NAMES
  - abstractmethod calculate(response)
    - Parameters: response (Response)
    - Return type: list[MetricResult]
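A concrete metric subclasses BaseMetric, sets NAME, and implements calculate to return a list of MetricResult objects. The sketch below is illustrative, not taken from the framework: WordCountMetric is an invented metric, and it assumes the Response object exposes the generated text as response.completion (check the real Response model for the actual field name). Only BaseMetric, MetricResult, and the calculate signature come from this page.

```python
from eval_framework.metrics.base import BaseMetric, MetricResult


class WordCountMetric(BaseMetric):  # hypothetical example metric
    NAME = "word_count"

    def calculate(self, response) -> list[MetricResult]:
        # Assumption: the completion text lives at `response.completion`.
        n_words = len(response.completion.split())
        return [
            MetricResult(
                metric_name=self.NAME,
                value=float(n_words),
                higher_is_better=False,
                # Optional fields passed explicitly; whether they carry
                # defaults is not shown on this page.
                llm_judge_prompt=None,
                llm_judge_response=None,
                code_execution_trace=None,
                error=None,
            )
        ]
```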
- class eval_framework.metrics.base.MetricResult(**data)
  Bases: BaseModel
  - Parameters:
    - metric_name (str)
    - value (float | None)
    - higher_is_better (bool)
    - llm_judge_prompt (str | None)
    - llm_judge_response (str | None)
    - code_execution_trace (str | None)
    - error (Error | None)
  - code_execution_trace: str | None
  - error: Error | None
  - higher_is_better: bool
  - llm_judge_prompt: str | None
  - llm_judge_response: str | None
  - metric_name: str
  - model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
    Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.
  - value: float | None
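Because model_config sets extra='forbid', constructing a MetricResult with any field not listed above raises a pydantic ValidationError. A minimal construction sketch follows; the values are invented for illustration, and the optional fields are passed explicitly since this page does not show whether they have defaults.

```python
from eval_framework.metrics.base import MetricResult

result = MetricResult(
    metric_name="bleu",          # name of the metric that produced the score
    value=0.42,                  # the score itself; None signals no score
    higher_is_better=True,       # orientation used when aggregating/comparing
    llm_judge_prompt=None,
    llm_judge_response=None,
    code_execution_trace=None,
    error=None,
)

# extra='forbid' rejects unknown fields, e.g. adding foo=1 to the call
# above would raise pydantic.ValidationError.
```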