eval_framework.metrics.loglikelihood package
Submodules
eval_framework.metrics.loglikelihood.accuracy_loglikelihood module
- class eval_framework.metrics.loglikelihood.accuracy_loglikelihood.AccuracyLoglikelihood
  Bases: BaseMetric[Loglikelihood]
  - NAME: str = 'Accuracy Loglikelihood'
  - calculate(response)
    - Parameters: response (Loglikelihood)
    - Return type: list[MetricResult]
- class eval_framework.metrics.loglikelihood.accuracy_loglikelihood.AccuracyNormLoglikelihood
  Bases: BaseMetric[Loglikelihood]
  - NAME: str = 'Accuracy Normalized Loglikelihood'
  - calculate(response)
    - Parameters: response (Loglikelihood)
    - Return type: list[MetricResult]
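The two accuracy metrics above are documented only by name. As a rough illustration, the sketch below assumes the usual multiple-choice convention: the prediction is the choice with the highest loglikelihood, and the normalized variant divides each loglikelihood by its completion length before comparing. The functions `accuracy_loglikelihood` and `accuracy_norm_loglikelihood` and their plain-list inputs are hypothetical stand-ins for the `Loglikelihood` response, not the package's implementation, and the length unit (characters, bytes, or tokens) is an assumption.

```python
def accuracy_loglikelihood(logliks: list[float], gold: int) -> float:
    """Hypothetical: 1.0 if the highest-loglikelihood choice is the gold one, else 0.0."""
    # max() over indices returns the first index on ties
    pred = max(range(len(logliks)), key=lambda i: logliks[i])
    return 1.0 if pred == gold else 0.0


def accuracy_norm_loglikelihood(
    logliks: list[float], lengths: list[int], gold: int
) -> float:
    """Hypothetical: same rule, but each loglikelihood is divided by its length first."""
    # Assumption: `lengths` holds the completion length of each choice (unit unspecified)
    normed = [ll / n for ll, n in zip(logliks, lengths)]
    return accuracy_loglikelihood(normed, gold)


if __name__ == "__main__":
    logliks, lengths = [-4.0, -6.0], [2, 6]
    print(accuracy_loglikelihood(logliks, gold=1))                # 0.0
    print(accuracy_norm_loglikelihood(logliks, lengths, gold=1))  # 1.0
```

Here the raw rule picks choice 0 (-4.0 > -6.0), while the length-normalized rule picks choice 1 (-1.0 > -2.0). This is the usual motivation for normalizing: longer completions accumulate more negative loglikelihood.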
eval_framework.metrics.loglikelihood.base module
- class eval_framework.metrics.loglikelihood.base.BaseLoglikelihoodMetric(*, len_normalised=True)
  Bases: BaseMetric[Loglikelihood]
  Base class for metrics that operate on loglikelihood responses.
  - Parameters: len_normalised (bool)
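`BaseLoglikelihoodMetric` exposes a `len_normalised` flag. A minimal sketch of what such a flag commonly controls, assuming the subclasses turn per-choice loglikelihoods into a probability distribution over the choices with a softmax: when the flag is set, each loglikelihood is first divided by its completion length. `choice_probabilities` is a hypothetical helper, not part of eval_framework.

```python
import math


def choice_probabilities(
    logliks: list[float], lengths: list[int], len_normalised: bool = True
) -> list[float]:
    """Hypothetical helper: softmax over per-choice (optionally length-normalised) loglikelihoods."""
    if len_normalised:
        # Divide each loglikelihood by its completion length (unit is an assumption)
        scores = [ll / n for ll, n in zip(logliks, lengths)]
    else:
        scores = list(logliks)
    # Subtract the max before exponentiating for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Without normalisation choice 0 is more likely; with it, choice 1 is
    print(choice_probabilities([-4.0, -6.0], [2, 6], len_normalised=False))
    print(choice_probabilities([-4.0, -6.0], [2, 6], len_normalised=True))
```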
eval_framework.metrics.loglikelihood.confidence_weighted_accuracy module
- class eval_framework.metrics.loglikelihood.confidence_weighted_accuracy.ConfidenceWeightedAccuracy(*, len_normalised=True)
  Bases: BaseLoglikelihoodMetric
  - Parameters: len_normalised (bool)
  - NAME: str = 'Confidence-weighted Accuracy'
  - calculate(response)
    - Parameters: response (Loglikelihood)
    - Return type: list[MetricResult]
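The name suggests that correct predictions are credited in proportion to the model's confidence. One plausible reading, sketched below under that assumption, scores a sample with the softmax probability of the predicted choice when it matches the gold answer and 0 otherwise; the actual definition used by `ConfidenceWeightedAccuracy` may differ. The function names and list inputs are hypothetical, and length normalization is omitted for brevity.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_weighted_accuracy(logliks: list[float], gold: int) -> float:
    """Hypothetical: probability of the predicted choice if it is correct, else 0.0."""
    probs = softmax(logliks)
    pred = max(range(len(probs)), key=lambda i: probs[i])
    return probs[pred] if pred == gold else 0.0


if __name__ == "__main__":
    logliks = [math.log(0.7), math.log(0.3)]
    print(confidence_weighted_accuracy(logliks, gold=0))  # approximately 0.7
    print(confidence_weighted_accuracy(logliks, gold=1))  # 0.0
```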
eval_framework.metrics.loglikelihood.dcs module
- class eval_framework.metrics.loglikelihood.dcs.DistributionalCorrectnessScore(*, lc=1.0, lw=1.0, len_normalised=True)
  Bases: BaseLoglikelihoodMetric
  Based on Burns (2025), "Measuring Language Model Hallucinations Through Distributional Correctness".
  - Parameters:
    - lc (float)
    - lw (float)
    - len_normalised (bool)
  - NAME: str = 'Distributional Correctness Score'
  - calculate(response)
    - Parameters: response (Loglikelihood)
    - Return type: list[MetricResult]
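As an illustration only, the sketch below assumes a distributional score of the form lc * P(correct) - lw * P(wrong), where probability mass on an abstention choice (for example "I don't know") counts as neither correct nor wrong. Reading `lc` and `lw` as weights on correct and wrong mass is an assumption inferred from the parameter names; Burns (2025) and the module source give the exact definition. `distributional_correctness` and its `abstain` index are hypothetical.

```python
import math
from typing import Optional


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distributional_correctness(
    logliks: list[float],
    gold: int,
    abstain: Optional[int] = None,
    lc: float = 1.0,
    lw: float = 1.0,
) -> float:
    """Hypothetical DCS-style score: lc * P(correct) - lw * P(wrong).

    Assumption: mass on the `abstain` choice (if any) is neutral.
    """
    probs = softmax(logliks)
    p_correct = probs[gold]
    p_abstain = probs[abstain] if abstain is not None else 0.0
    # Everything that is neither the gold answer nor an abstention counts as wrong
    p_wrong = 1.0 - p_correct - p_abstain
    return lc * p_correct - lw * p_wrong


if __name__ == "__main__":
    # Choices: 0 = gold, 1 = wrong, 2 = abstain
    logliks = [math.log(p) for p in (0.6, 0.3, 0.1)]
    print(distributional_correctness(logliks, gold=0, abstain=2))          # approximately 0.3
    print(distributional_correctness(logliks, gold=0, abstain=2, lw=2.0))  # approximately 0.0
```

Unlike a hard accuracy, this kind of score uses the whole distribution, so a model that moves mass from a wrong answer to the abstention option improves even when its top choice does not change.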
eval_framework.metrics.loglikelihood.probability_mass module
- class eval_framework.metrics.loglikelihood.probability_mass.ProbabilityMass
  Bases: BaseMetric[Loglikelihood]
  - NAME: str = 'Probability Mass'
  - calculate(response)
    - Parameters: response (Loglikelihood)
    - Return type: list[MetricResult]
- class eval_framework.metrics.loglikelihood.probability_mass.ProbabilityMassNorm
  Bases: BaseMetric[Loglikelihood]
  - NAME: str = 'Probability Mass Normalized'
  - calculate(response)
    - Parameters: response (Loglikelihood)
    - Return type: list[MetricResult]
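The probability-mass metrics are likewise documented only by name. Assuming they report how much probability the model places on the gold completion, a minimal sketch is exp(loglikelihood) for the raw variant and exp(loglikelihood / length), the per-unit geometric mean, for the normalized variant. Both the quantity and the normalization are assumptions, and the function names are hypothetical.

```python
import math


def probability_mass(loglik: float) -> float:
    """Hypothetical: probability of the gold completion, exp(loglikelihood)."""
    return math.exp(loglik)


def probability_mass_norm(loglik: float, length: int) -> float:
    """Hypothetical: per-unit geometric-mean probability, exp(loglikelihood / length)."""
    return math.exp(loglik / length)


if __name__ == "__main__":
    ll = math.log(0.25)  # e.g. two units with probability 0.5 each
    print(probability_mass(ll))          # approximately 0.25
    print(probability_mass_norm(ll, 2))  # approximately 0.5
```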
eval_framework.metrics.loglikelihood.ternary module
- class eval_framework.metrics.loglikelihood.ternary.TernaryScore(*, lc=1.0, lw=1.0, len_normalised=True)
  Bases: BaseLoglikelihoodMetric
  Based on Kalai et al. (2025), "Why Language Models Hallucinate", arXiv:2509.04664.
  - Parameters:
    - lc (float)
    - lw (float)
    - len_normalised (bool)
  - NAME: str = 'Ternary Score'
  - calculate(response)
    - Parameters: response (Loglikelihood)
    - Return type: list[MetricResult]
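Kalai et al. (2025) argue for grading schemes that reward correct answers, give no credit to abstentions, and penalize wrong answers. A minimal sketch of such a three-valued score follows. It assumes that `lc` is the reward for a correct answer, `lw` is the penalty for a wrong one, and abstention is represented as one of the answer choices. The `abstain` index and the `ternary_score` function are hypothetical, and length normalization is omitted.

```python
def ternary_score(
    logliks: list[float], gold: int, abstain: int, lc: float = 1.0, lw: float = 1.0
) -> float:
    """Hypothetical three-valued score: +lc if correct, 0 if abstaining, -lw if wrong."""
    pred = max(range(len(logliks)), key=lambda i: logliks[i])
    if pred == abstain:
        return 0.0  # abstention earns no credit and no penalty
    return lc if pred == gold else -lw


if __name__ == "__main__":
    # Choice 2 is assumed to be the abstention option ("I don't know")
    print(ternary_score([-1.0, -2.0, -3.0], gold=0, abstain=2))  # 1.0
    print(ternary_score([-1.0, -2.0, -3.0], gold=1, abstain=2))  # -1.0
    print(ternary_score([-3.0, -2.0, -1.0], gold=0, abstain=2))  # 0.0
```

Kalai et al. discuss penalizing mistakes by t/(1-t) points for a confidence threshold t. Under the assumptions of this sketch, that would correspond to setting lw = t/(1-t), for example lw = 3.0 for t = 0.75.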