eval_framework.metrics.loglikelihood package

Submodules

eval_framework.metrics.loglikelihood.accuracy_loglikelihood module

class eval_framework.metrics.loglikelihood.accuracy_loglikelihood.AccuracyLoglikelihood

Bases: BaseMetric[Loglikelihood]

NAME: str = 'Accuracy Loglikelihood'
calculate(response)
Return type: list[MetricResult]
Parameters: response (Loglikelihood)
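
This entry carries no description of the computation. As an illustration only, accuracy over loglikelihoods is commonly computed by selecting the answer choice with the highest loglikelihood and comparing it with the expected answer. The sketch below assumes that reading; the function name `accuracy_from_loglikelihoods` and the plain-dict input are hypothetical stand-ins and do not reflect the actual `Loglikelihood` or `MetricResult` types.

```python
def accuracy_from_loglikelihoods(
    loglikelihoods: dict[str, float], ground_truth: str
) -> float:
    """Return 1.0 if the highest-scoring choice is the ground truth, else 0.0.

    Hypothetical helper: `loglikelihoods` maps each answer choice to the
    summed loglikelihood the model assigned to it.
    """
    best_choice = max(loglikelihoods, key=loglikelihoods.get)
    return 1.0 if best_choice == ground_truth else 0.0


# Illustrative values only, not real model output.
scores = {"Paris": -1.2, "London": -2.5, "Berlin": -3.1}
result = accuracy_from_loglikelihoods(scores, "Paris")
print(result)  # 1.0
```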

class eval_framework.metrics.loglikelihood.accuracy_loglikelihood.AccuracyNormLoglikelihood

Bases: BaseMetric[Loglikelihood]

NAME: str = 'Accuracy Normalized Loglikelihood'
calculate(response)
Return type: list[MetricResult]
Parameters: response (Loglikelihood)
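
The name suggests a length-normalised variant of the accuracy metric. A minimal sketch of one common normalisation follows, assuming each loglikelihood is divided by the character length of its choice before the best choice is selected. The normalisation unit (characters rather than tokens or bytes) and the function name `normalised_accuracy` are assumptions, not taken from this package.

```python
def normalised_accuracy(
    loglikelihoods: dict[str, float], ground_truth: str
) -> float:
    """Pick the best choice after dividing each score by its text length.

    Assumption: normalisation uses the character length of the choice.
    """
    normalised = {
        choice: ll / max(len(choice), 1)  # guard against empty strings
        for choice, ll in loglikelihoods.items()
    }
    best_choice = max(normalised, key=normalised.get)
    return 1.0 if best_choice == ground_truth else 0.0


# Illustrative values: the longer choice has the lower raw score but the
# higher per-character score, so normalisation changes the selection.
scores = {"yes": -3.0, "absolutely not": -7.0}
raw_best = max(scores, key=scores.get)
norm_result = normalised_accuracy(scores, "absolutely not")
```

Without normalisation, longer completions tend to receive lower summed loglikelihoods simply because they contain more tokens, which is the usual motivation for reporting both variants.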

eval_framework.metrics.loglikelihood.base module

class eval_framework.metrics.loglikelihood.base.BaseLoglikelihoodMetric(*, len_normalised=True)

Bases: BaseMetric[Loglikelihood]

Base class for metrics that operate on loglikelihood responses.

Parameters: len_normalised (bool, keyword-only, defaults to True)

eval_framework.metrics.loglikelihood.confidence_weighted_accuracy module

class eval_framework.metrics.loglikelihood.confidence_weighted_accuracy.ConfidenceWeightedAccuracy(*, len_normalised=True)

Bases: BaseLoglikelihoodMetric

Parameters: len_normalised (bool, keyword-only, defaults to True)

NAME: str = 'Confidence-weighted Accuracy'
calculate(response)
Return type: list[MetricResult]
Parameters: response (Loglikelihood)

eval_framework.metrics.loglikelihood.dcs module

class eval_framework.metrics.loglikelihood.dcs.DistributionalCorrectnessScore(*, lc=1.0, lw=1.0, len_normalised=True)

Bases: BaseLoglikelihoodMetric

Based on Burns (2025), "Measuring Language Model Hallucinations Through Distributional Correctness".

Parameters:
  • lc (float, keyword-only, defaults to 1.0)
  • lw (float, keyword-only, defaults to 1.0)
  • len_normalised (bool, keyword-only, defaults to True)

NAME: str = 'Distributional Correctness Score'
calculate(response)
Return type: list[MetricResult]
Parameters: response (Loglikelihood)

eval_framework.metrics.loglikelihood.probability_mass module

class eval_framework.metrics.loglikelihood.probability_mass.ProbabilityMass

Bases: BaseMetric[Loglikelihood]

NAME: str = 'Probability Mass'
calculate(response)
Return type: list[MetricResult]
Parameters: response (Loglikelihood)

class eval_framework.metrics.loglikelihood.probability_mass.ProbabilityMassNorm

Bases: BaseMetric[Loglikelihood]

NAME: str = 'Probability Mass Normalized'
calculate(response)
Return type: list[MetricResult]
Parameters: response (Loglikelihood)
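
Neither probability-mass class documents its formula. One plausible reading, shown here purely as a sketch, is the share of probability assigned to the correct choice after the choice loglikelihoods are renormalised with a softmax. That interpretation, the function name `probability_mass`, and the dict input are all assumptions; a normalised variant might first divide each loglikelihood by its completion length.

```python
import math


def probability_mass(
    loglikelihoods: dict[str, float], ground_truth: str
) -> float:
    """Softmax the choice loglikelihoods and return the mass on the truth.

    Assumption: probabilities are renormalised over the listed choices only.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    max_ll = max(loglikelihoods.values())
    weights = {c: math.exp(ll - max_ll) for c, ll in loglikelihoods.items()}
    total = sum(weights.values())
    return weights[ground_truth] / total


# Illustrative values: raw probabilities 0.6 and 0.2 renormalise to 0.75 and 0.25.
scores = {"A": math.log(0.6), "B": math.log(0.2)}
mass_on_a = probability_mass(scores, "A")
```

Unlike accuracy, this quantity changes smoothly as the model shifts probability between choices, so it can reveal progress that an argmax-based metric hides.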

eval_framework.metrics.loglikelihood.ternary module

class eval_framework.metrics.loglikelihood.ternary.TernaryScore(*, lc=1.0, lw=1.0, len_normalised=True)

Bases: BaseLoglikelihoodMetric

Based on Kalai et al. (2025), "Why language models hallucinate", arXiv:2509.04664.

Parameters:
  • lc (float, keyword-only, defaults to 1.0)
  • lw (float, keyword-only, defaults to 1.0)
  • len_normalised (bool, keyword-only, defaults to True)

NAME: str = 'Ternary Score'
calculate(response)
Return type: list[MetricResult]
Parameters: response (Loglikelihood)
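
The cited paper argues that binary grading rewards guessing and that wrong answers should be penalised while abstentions score zero. The sketch below illustrates a three-way score along those lines. It assumes that `lc` is the reward for a correct answer and `lw` the penalty for a wrong one, and that abstention is represented by a dedicated answer choice; these readings, the function name `ternary_score`, and the `abstain_choice` argument are hypothetical and not confirmed by this reference.

```python
def ternary_score(
    predicted: str,
    ground_truth: str,
    abstain_choice: str = "I don't know",  # hypothetical abstention option
    lc: float = 1.0,  # assumed: reward for a correct answer
    lw: float = 1.0,  # assumed: penalty for a wrong answer
) -> float:
    """Score a prediction as correct (+lc), abstained (0), or wrong (-lw)."""
    if predicted == abstain_choice:
        return 0.0
    if predicted == ground_truth:
        return lc
    return -lw


correct = ternary_score("Paris", "Paris")
abstained = ternary_score("I don't know", "Paris")
wrong = ternary_score("London", "Paris", lw=2.0)
```

Under this scheme a model that guesses at random on questions it cannot answer scores worse in expectation than one that abstains, provided `lw` is large enough relative to `lc` and the number of choices.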

Module contents