eval_framework.metrics.completion package

Submodules

eval_framework.metrics.completion.accuracy_completion module

class eval_framework.metrics.completion.accuracy_completion.AccuracyCompletion[source]

Bases: BaseMetric[Completion]

NAME: str = 'Accuracy Completion'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.aidanbench module

class eval_framework.metrics.completion.aidanbench.AidanBenchMetric[source]

Bases: BaseMetric[Completion]

NAME: str = 'AidanBench'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.bleu module

class eval_framework.metrics.completion.bleu.BLEU[source]

Bases: BaseMetric[Completion]

The Bilingual Evaluation Understudy score, or BLEU for short, is a metric for evaluating a generated sentence against a reference sentence. It counts n-grams in the candidate translation that match n-grams in the reference text, where a 1-gram (unigram) is a single token and a bigram is a word pair. The comparison is made regardless of word order. Source: https://machinelearningmastery.com/calculate-bleu-score-for-text-python/ Paper: https://www.aclweb.org/anthology/P02-1040/

NAME: str = 'BLEU'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)
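
Example (illustrative, not part of this framework): the underlying n-gram matching can be reproduced directly with NLTK. The sentences below are made up.

    # Standard sentence-level BLEU via NLTK; shown only to illustrate the metric itself.
    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

    reference = ["the", "cat", "sat", "on", "the", "mat"]
    candidate = ["the", "cat", "is", "on", "the", "mat"]

    # sentence_bleu takes a list of tokenized references and one tokenized candidate.
    score = sentence_bleu([reference], candidate, smoothing_function=SmoothingFunction().method1)
    print(f"BLEU: {score:.3f}")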

class eval_framework.metrics.completion.bleu.LINEWISE_BLEU[source]

Bases: BaseMetric[Completion]

Maximum Line-level BLEU score.

NAME: str = 'Linewise BLEU'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.bleu.ResponseToOriginalBLEU[source]

Bases: BaseMetric[Completion]

NAME: str = 'Response to Original BLEU'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.chrf module

class eval_framework.metrics.completion.chrf.CHRF[source]

Bases: BaseMetric[Completion]

chrF++ is a tool for automatic evaluation of machine translation output based on character n-gram precision and recall enhanced with word n-grams. Source: https://github.com/m-popovic/chrF Paper: https://www.aclweb.org/anthology/W15-3049.pdf

NAME: str = 'chrF'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)
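
Example (illustrative): chrF++ can be computed with sacrebleu's CHRF metric class; whether this class uses sacrebleu internally is not stated here, so treat the snippet as a standalone sketch.

    # chrF++ via sacrebleu; word_order=2 enables the "++" word n-gram component.
    from sacrebleu.metrics import CHRF

    chrf = CHRF(word_order=2)
    hypotheses = ["the cat sat on the mat"]
    references = [["the cat is sitting on the mat"]]  # one reference stream, parallel to hypotheses
    print(chrf.corpus_score(hypotheses, references))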

class eval_framework.metrics.completion.chrf.LINEWISE_CHRF[source]

Bases: BaseMetric[Completion]

Maximum Line-level chrF++ (Character n-gram F-score) score. Paper: https://aclanthology.org/W15-3049/

NAME: str = 'Linewise chrF'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.code_assertion module

class eval_framework.metrics.completion.code_assertion.CodeCompletionAssertion[source]

Bases: BaseMetric[Completion]

NAME: str = 'Code Completion Accuracy'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.code_execution_pass_at_one module

class eval_framework.metrics.completion.code_execution_pass_at_one.CodeExecutionBaseContext(**data)[source]

Bases: BaseMetricContext

Parameters:
  • run_env (str)

  • code_prompt (str)

  • test_code (str)

  • benchmark_timeout (int)

  • package_downloads (dict[str, str | None])

  • extra_data (Any)

benchmark_timeout: int
code_prompt: str
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

package_downloads: dict[str, str | None]
run_env: str
test_code: str
class eval_framework.metrics.completion.code_execution_pass_at_one.CodeExecutionPassAtOne[source]

Bases: BaseMetric[Completion]

NAME: str = 'code-execution-pass@1'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.code_execution_pass_at_one.CodeExecutionPassAtOneContext(**data)[source]

Bases: CodeExecutionBaseContext

Parameters:
  • run_env (str)

  • code_prompt (str)

  • test_code (str)

  • benchmark_timeout (int)

  • package_downloads (dict[str, str | None])

  • snippet_merge_fn (str)

  • output_parse_fn (str)

  • extra_data (Any)

model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

output_parse_fn: str
snippet_merge_fn: str
class eval_framework.metrics.completion.code_execution_pass_at_one.RealtimeCodeExectionContext(**data)[source]

Bases: CodeExecutionBaseContext

Parameters:
  • run_env (str)

  • code_prompt (str)

  • test_code (str)

  • benchmark_timeout (int)

  • package_downloads (dict[str, str | None])

  • snippet_merge_fn (Callable[[str, str], str])

  • output_parse_fn (Callable[[str], ExecutionResult])

  • extra_data (Any)

classmethod from_context(context)[source]
Return type:

Self

Parameters:

context (CodeExecutionPassAtOneContext)

model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

output_parse_fn: Callable[[str], ExecutionResult]
snippet_merge_fn: Callable[[str, str], str]
eval_framework.metrics.completion.code_execution_pass_at_one.estimate_pass_at_k(n, c, k)[source]

Estimates pass@k for a single problem.

Parameters:
  • n (int): Total number of generated samples.

  • c (int): Number of correct samples.

  • k (int): Number of attempts or samples considered.

Returns: float: The pass@k value.

Return type:

float

Parameters:
  • n (int)

  • c (int)

  • k (int)
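
The standard unbiased pass@k estimator (Chen et al., 2021) is 1 - C(n-c, k) / C(n, k); the sketch below spells that formula out and is presumably equivalent to what this function implements.

    # Unbiased pass@k estimator; a hypothetical re-implementation for illustration only.
    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        # Probability that at least one of k samples drawn from n generations
        # (c of which are correct) passes.
        if n - c < k:
            return 1.0  # every k-subset must contain a correct sample
        return 1.0 - comb(n - c, k) / comb(n, k)

    print(pass_at_k(n=10, c=3, k=1))  # 0.3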

eval_framework.metrics.completion.comet module

class eval_framework.metrics.completion.comet.COMET[source]

Bases: BaseMetric[Completion]

COMET is a neural, multilingual framework for evaluating machine translation quality that leverages cross-lingual pretrained language models to achieve state-of-the-art correlation with human judgments. Note: this requires a Hugging Face token with access to the model: https://huggingface.co/Unbabel/XCOMET-XL Source: https://github.com/Unbabel/COMET Paper: https://arxiv.org/abs/2009.09025

NAME: str = 'COMET'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)
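
Example (a sketch following the upstream COMET README, not this class's internals): scoring a segment directly with the unbabel-comet package. The XCOMET-XL checkpoint is gated, so a Hugging Face token with access is required, as noted above.

    # Direct use of the unbabel-comet package; the inputs are made-up examples.
    from comet import download_model, load_from_checkpoint

    model_path = download_model("Unbabel/XCOMET-XL")
    model = load_from_checkpoint(model_path)

    data = [{
        "src": "Der Hund bellt.",     # source sentence
        "mt": "The dog is barking.",  # machine translation to score
        "ref": "The dog barks.",      # human reference
    }]
    output = model.predict(data, batch_size=8, gpus=0)  # gpus=0 runs on CPU
    print(output.scores)  # one segment-level score per input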

eval_framework.metrics.completion.concordance_index module

class eval_framework.metrics.completion.concordance_index.ConcordanceIndex[source]

Bases: BaseMetric[Completion]

NAME: str = 'ConcordanceIndex'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.concordance_index.calculate_concordance_index(ground_truth, completion)[source]
Return type:

float

Parameters:
  • ground_truth (str)

  • completion (str)

eval_framework.metrics.completion.csv_format module

class eval_framework.metrics.completion.csv_format.CSVFormat[source]

Bases: BaseMetric[Completion]

KEYS: list[str] | None = ['has_csv', 'is_separator_respected', 'is_column_count_respected']
NAME: str = 'CSV Format'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.csv_format.CSVFormatEvaluation(**data)[source]

Bases: BaseModel

Parameters:
  • implicit (bool)

  • has_csv (bool)

  • is_separator_respected (bool)

  • is_column_count_respected (bool)

has_csv: bool
implicit: bool
is_column_count_respected: bool
is_separator_respected: bool
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

eval_framework.metrics.completion.csv_format.evaluate_csv_format(response)[source]
Return type:

CSVFormatEvaluation

Parameters:

response (Completion)

eval_framework.metrics.completion.csv_format.extract_csv_from_text(text, min_rows=2, min_columns=2)[source]
Return type:

tuple[list[str] | None, str | None]

Parameters:
  • text (str)

  • min_rows (int)

  • min_columns (int)

eval_framework.metrics.completion.cwe_accuracy module

class eval_framework.metrics.completion.cwe_accuracy.CWEAccuracy[source]

Bases: BaseMetric[Completion]

Metric for Common Word Extraction tasks

NAME: str = 'CWEAccuracy'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.exponential_similarity module

class eval_framework.metrics.completion.exponential_similarity.ExponentialSimilarity[source]

Bases: BaseMetric[Completion]

NAME: str = 'ExponentialSimilarity'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.exponential_similarity.calculate_exponential_similarity(p_true, p_pred)[source]

Compute the exponential similarity (SpaceDigest version) between the gold percentage and predicted value.

Parameters:
  • p_true (float): The gold/reference percentage.

  • p_pred (float): The predicted scalar.

  • d (float): Base of the exponent. Default is 2.

  • c (float): Coefficient in exponent. Default is 10.

Returns:

float: Similarity score between 0 and 1.

Return type:

float

Parameters:
  • p_true (float)

  • p_pred (float)
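
The exact SpaceDigest formulation is not spelled out here; a common reading of the documented parameters (d as base, c as coefficient in the exponent) is an exponential decay of the absolute error, sketched below as an assumption rather than the actual implementation.

    # Assumed form: sim = d ** (-c * |p_true - p_pred|), with d=2 and c=10 as defaults.
    def exponential_similarity(p_true: float, p_pred: float, d: float = 2.0, c: float = 10.0) -> float:
        return d ** (-c * abs(p_true - p_pred))

    print(exponential_similarity(0.40, 0.40))  # 1.0  (exact match)
    print(exponential_similarity(0.40, 0.50))  # 0.5  (an error of 0.1 halves the score)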

eval_framework.metrics.completion.f1 module

class eval_framework.metrics.completion.f1.F1[source]

Bases: BaseMetric[Completion]

NAME: str = 'F1'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.f1.calculate_f1(ref_tokens, hyp_tokens)[source]

Calculate F1 score between two texts based on token overlap.

Return type:

float

Parameters:
  • ref_tokens (list[Any])

  • hyp_tokens (list[Any])
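
A common token-overlap F1 formulation (as used in SQuAD-style evaluation) is sketched below; it is presumably close to what calculate_f1 computes, though the exact tokenization is not documented here.

    # Token-overlap F1: precision and recall over the multiset intersection of tokens.
    from collections import Counter

    def token_f1(ref_tokens: list[str], hyp_tokens: list[str]) -> float:
        overlap = sum((Counter(ref_tokens) & Counter(hyp_tokens)).values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(hyp_tokens)
        recall = overlap / len(ref_tokens)
        return 2 * precision * recall / (precision + recall)

    print(token_f1(["the", "cat", "sat"], ["the", "cat"]))  # 0.8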

eval_framework.metrics.completion.format_checker module

class eval_framework.metrics.completion.format_checker.CheckJsonFormat[source]

Bases: BaseMetric[Completion]

NAME: str = 'JSON Format'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.format_checker.CheckPostScriptFormat[source]

Bases: BaseMetric[Completion]

This metric is honestly not that great. In the original IFEval implementation it just checks whether the text contains the string “P.S.” (or variants thereof such as “P.P.S.” or “p. s.”); it does not attempt any real parsing.

NAME: str = 'Postscript Format'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)
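
For illustration, a string-level check in the spirit of the behaviour described above might look like the following; this is not the exact pattern used by IFEval or by this metric.

    # Hypothetical postscript detector: looks for "P.S.", "P.P.S.", "p. s." and similar.
    import re

    def looks_like_postscript(text: str) -> bool:
        return re.search(r"\bp\.?\s*(p\.?\s*)?s\s*\.", text, flags=re.IGNORECASE) is not None

    print(looks_like_postscript("Best regards.\nP.S. Bring snacks."))  # True
    print(looks_like_postscript("No postscript here."))                # False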

eval_framework.metrics.completion.grid_difference module

class eval_framework.metrics.completion.grid_difference.GridDifference[source]

Bases: BaseMetric[Completion]

NAME: str = 'grid_difference'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

calculate_score(output_ground_truth_difference_count, input_ground_truth_difference_count)[source]
Return type:

float

Parameters:
  • output_ground_truth_difference_count (int)

  • input_ground_truth_difference_count (int)

count_differences(character_list_1, character_list_2)[source]
Return type:

int

Parameters:
  • character_list_1 (list[str])

  • character_list_2 (list[str])

extract_grid_from_prompt(prompt)[source]
Return type:

str

Parameters:

prompt (str)

eval_framework.metrics.completion.ifeval module

class eval_framework.metrics.completion.ifeval.IFEvalMetric[source]

Bases: BaseMetric[Completion]

NAME: str = 'IFEval'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.ifeval.IFEvalMetricContext(**data)[source]

Bases: BaseMetricContext

Parameters:
  • key (int)

  • instruction_id_list (list[str])

  • prompt (str)

  • additional_kwargs (list[dict[str, Any]])

  • extra_data (Any)

additional_kwargs: list[dict[str, Any]]
instruction_id_list: list[str]
key: int
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

prompt: str

eval_framework.metrics.completion.json_format module

class eval_framework.metrics.completion.json_format.JsonFormat[source]

Bases: BaseMetric[Completion]

NAME: str = 'JSON Format'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.json_format.JsonFormatEvaluation(**data)[source]

Bases: BaseModel

Parameters:
  • is_just_json (bool)

  • is_valid_json (bool)

  • fulfills_schema (bool | None)

  • exact_match (bool | None)

  • json_parsing_error (str | None)

  • schema_validation_error (str | None)

exact_match: bool | None
fulfills_schema: bool | None
is_just_json: bool
is_valid_json: bool
json_parsing_error: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

schema_validation_error: str | None
eval_framework.metrics.completion.json_format.get_json_object(text)[source]

Extract the first valid JSON object or array from text.

This function handles nested brackets properly by using a bracket-counting approach to find complete JSON structures, rather than a regex, which can incorrectly match outer brackets containing non-JSON content.

Return type:

str

Parameters:

text (str)
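
A simplified sketch of the bracket-counting idea described above (it ignores brackets inside string literals, which the real implementation presumably handles):

    import json

    def first_json_object(text: str) -> str:
        # Find the earliest opening brace or bracket.
        starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
        if not starts:
            return ""
        start = min(starts)
        opening = text[start]
        closing = "}" if opening == "{" else "]"
        depth = 0
        for i in range(start, len(text)):
            if text[i] == opening:
                depth += 1
            elif text[i] == closing:
                depth -= 1
                if depth == 0:
                    candidate = text[start : i + 1]
                    try:
                        json.loads(candidate)  # only return spans that actually parse
                        return candidate
                    except json.JSONDecodeError:
                        return ""
        return ""

    print(first_json_object('Sure! {"a": {"b": 1}} hope that helps'))  # {"a": {"b": 1}}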

eval_framework.metrics.completion.json_format.remove_comments(text, comment_indicator='//')[source]
Return type:

str

Parameters:
  • text (str)

  • comment_indicator (str)

eval_framework.metrics.completion.language_checker module

class eval_framework.metrics.completion.language_checker.GermanCompletionChecker[source]

Bases: BaseMetric[Completion]

NAME: str = 'German Completion Check'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.language_checker.LanguageChecker[source]

Bases: BaseMetric[Completion]

NAME: str = 'Language Check'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.language_checker.LanguageConsistencyChecker[source]

Bases: BaseMetric[Completion]

NAME: str = 'Language Consistency'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.language_checker.LanguageRawConsistencyChecker[source]

Bases: BaseMetric[Completion]

NAME: str = 'Language Consistency Raw'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.length_control module

class eval_framework.metrics.completion.length_control.LengthControl(tolerance=0.16666666666666666)[source]

Bases: BaseMetric[Completion]

Parameters:

tolerance (float)

NAME: str = 'length_control'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.length_control.LengthRequirementType(*values)[source]

Bases: Enum

MAX = 'maximum'
MIN = 'minimum'
TARGET = 'target'
class eval_framework.metrics.completion.length_control.LengthRequirementUnit(*values)[source]

Bases: Enum

PARAGRAPHS = 'paragraphs'
SENTENCES = 'sentences'
WORDS = 'words'

eval_framework.metrics.completion.math_reasoning_completion module

class eval_framework.metrics.completion.math_reasoning_completion.MathReasoningCompletion[source]

Bases: BaseMetric[Completion]

NAME: str = 'Math Reasoning Completion (symbolic)'
REMOVED_EXPRESSIONS_FORMAT = ['\\text{s}', '\\text{.}', '\\text{\ns}', '\\text{}^2', '\\text{}^3', '\\text{\n}', '\\text{}', '\\mathrm{th}', '^\\circ', '^{\\circ}', '\\;', ',\\!', '{,}', '"', '\\dots']
REMOVED_EXPRESSIONS_UNITS = ['square', 'ways', 'integers', 'dollars', 'mph', 'inches', 'ft', 'hours', 'km', 'units', '\\ldots', 'sue', 'points', 'feet', 'minutes', 'digits', 'cents', 'degrees', 'cm', 'gm', 'pounds', 'meters', 'meals', 'edges', 'students', 'childrentickets', 'multiples']
SUBSTITUTIONS = [('\\ban\\b(?!\\w)', ''), ('\\ba\\b(?!\\w)', ''), ('\\.\\$', '$'), ('\\\\\\$', ''), ('\\\\ ', ''), ('\\s+', ''), ('\\\\mbox', 'text'), (',\\\\text\\{and\\}', ','), ('\\\\text\\{and\\}', ','), ('\\\\text\\{m\\}', '\\text{}')]
calculate(response)[source]

Calculate the accuracy of the completion.

Performs several verification and simplification steps to ensure that the completion is correct. The completion may be either a LaTeX or plain-string response, which sympy will parse, factor, and simplify.

Parameters:

response (Completion) – Completion object

Return type:

list[MetricResult]

Returns:

list of MetricResult
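
As a minimal illustration of the kind of sympy-based equivalence check described above (not the framework's exact pipeline; parse_latex additionally requires the antlr4-python3-runtime package):

    import sympy
    from sympy.parsing.latex import parse_latex

    gold = parse_latex(r"\frac{1}{2}")
    candidate = parse_latex(r"\frac{2}{4}")
    # Treat two answers as equal if their simplified difference is zero.
    print(sympy.simplify(gold - candidate) == 0)  # True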

check_for_equation(final_answer)[source]

Check if the final answer is an equation and split it into its left-hand side and right-hand side.

Parameters:

final_answer (str) – the expression to evaluate

Return type:

list

Returns:

list of the left-hand side and right-hand side of the equation

normalize_expression(final_answer)[source]

Normalize a LaTeX expression.

NOTE: The logic was changed because the previous substitution replaced characters anywhere in the string, e.g. turning “infty” into “iny” by removing “ft”.

Parameters:

final_answer (str) – raw LaTeX expression

Return type:

str

Returns:

normalized LaTeX expression

eval_framework.metrics.completion.math_reasoning_completion.timeout_handler(signum, frame)[source]
Return type:

None

Parameters:
  • signum (Any)

  • frame (Any)

eval_framework.metrics.completion.niah_accuracy module

class eval_framework.metrics.completion.niah_accuracy.NIAHAccuracy[source]

Bases: BaseMetric[Completion]

Metric for Needle in a Haystack tasks

NAME: str = 'NIAHAccuracy'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.niah_accuracy.clean_text(text)[source]

Clean text by removing spaces and normalizing

Return type:

str

Parameters:

text (str)

eval_framework.metrics.completion.placeholder_checker module

class eval_framework.metrics.completion.placeholder_checker.PlaceholderChecker[source]

Bases: BaseMetric[Completion]

NAME: str = 'Placeholder Check'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.placeholder_checker.PlaceholderCheckerMetricContext(**data)[source]

Bases: BaseMetricContext

Parameters:
  • num_placeholders (int)

  • extra_data (Any)

model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

num_placeholders: int

eval_framework.metrics.completion.repetition module

class eval_framework.metrics.completion.repetition.WordRepetition(window_size=128, min_repetitions=1)[source]

Bases: BaseMetric[Completion]

Word Repetition Metric

This metric checks for repetitions of words in the completion text for a given window size and repetition threshold. The window size defines the consecutive word count to consider a repetition, and min_repetitions specifies the minimum repetition count that triggers the metric. This metric returns 0.0 if no repetitions are found, and 1.0 if a sufficient number of repetitions are found. For example, if the completion contains a two-word sequence that repeats once (such as “hello world hello world”), this metric would trigger with a window size of 2 and min_repetitions set to 1.

Parameters:
  • window_size (int)

  • min_repetitions (int)

HIGHER_IS_BETTER: Final[bool] = False
NAME: str = 'WordRepetition'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)
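
A sketch of a sliding-window repetition check consistent with the description above (not necessarily the exact algorithm used by this class):

    # Count how often each window_size-word window recurs; trigger at min_repetitions.
    from collections import Counter

    def has_repetition(text: str, window_size: int = 2, min_repetitions: int = 1) -> bool:
        words = text.split()
        windows = [tuple(words[i : i + window_size]) for i in range(len(words) - window_size + 1)]
        counts = Counter(windows)
        # A window seen k times has repeated k - 1 times.
        return any(count - 1 >= min_repetitions for count in counts.values())

    print(has_repetition("hello world hello world"))    # True
    print(has_repetition("hello world goodbye world"))  # False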

eval_framework.metrics.completion.rouge_1 module

class eval_framework.metrics.completion.rouge_1.ROUGE_1[source]

Bases: BaseMetric[Completion]

ROUGE-1

NAME: str = 'ROUGE-1'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.rouge_2 module

class eval_framework.metrics.completion.rouge_2.ROUGE_2[source]

Bases: BaseMetric[Completion]

ROUGE-2

NAME: str = 'ROUGE-2'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.rouge_geometric_mean module

class eval_framework.metrics.completion.rouge_geometric_mean.ROUGE_GEOMETRIC_MEAN[source]

Bases: BaseMetric[Completion]

ROUGE Geometric Mean

NAME: str = 'ROUGE-Geometric-Mean'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.rouge_l module

class eval_framework.metrics.completion.rouge_l.ROUGE_L[source]

Bases: BaseMetric[Completion]

ROUGE-L

NAME: str = 'ROUGE-L'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

eval_framework.metrics.completion.struct_eval_metrics module

class eval_framework.metrics.completion.struct_eval_metrics.RenderableStructMetric[source]

Bases: StructMetric

NAME: str = 'RenderableStructMetric'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.struct_eval_metrics.RenderableStructMetricContext(**data)[source]

Bases: BaseMetricContext

Parameters:
  • output_type (str)

  • keywords (list[str])

  • extra_data (Any)

keywords: list[str]
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

output_type: str
class eval_framework.metrics.completion.struct_eval_metrics.StructMetric[source]

Bases: BaseMetric[Completion]

NAME: str = 'StructMetric'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.struct_eval_metrics.StructMetricContext(**data)[source]

Bases: BaseMetricContext

Parameters:
  • output_type (str)

  • paths (list[str])

  • extra_data (Any)

model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

output_type: str
paths: list[str]
eval_framework.metrics.completion.struct_eval_metrics.is_valid_html(html)[source]
Return type:

bool

Parameters:

html (str)

eval_framework.metrics.completion.struct_eval_metrics.path_exists(data, path)[source]

Check if a path exists in a structured data object.

Parameters:
  • data (Any) – The structured data to check

  • path (str) – The path to check (dot notation)

Return type:

bool

Returns:

True if path exists, False otherwise

eval_framework.metrics.completion.struct_eval_metrics.tokenize_path(path)[source]

Tokenize a dot-notation path, handling back-ticks and array indices.

Parameters:

path (str) – The path string (e.g. “users.0.name” or “users[0].name”)

Return type:

list[str]

Returns:

List of path tokens
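
A usage sketch for the two helpers above; the concrete return values shown in the comments are assumptions inferred from the documented path examples, not verified outputs.

    from eval_framework.metrics.completion.struct_eval_metrics import path_exists, tokenize_path

    data = {"users": [{"name": "Ada"}]}
    print(path_exists(data, "users.0.name"))  # expected True: list index 0, then key "name"
    print(path_exists(data, "users.1.name"))  # expected False: index 1 does not exist
    print(tokenize_path("users[0].name"))     # presumably something like ["users", "0", "name"]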

eval_framework.metrics.completion.ter module

class eval_framework.metrics.completion.ter.LINEWISE_TER[source]

Bases: BaseMetric[Completion]

Minimum Line-level TER (Translation Edit Rate) score.

NAME: str = 'Linewise TER'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.ter.TER[source]

Bases: BaseMetric[Completion]

Translation Error Rate is an error metric for machine translation that measures the number of edits required to change a system output into one of the references. Source: http://www.cs.umd.edu/~snover/tercom/ Paper: http://mt-archive.info/AMTA-2006-Snover.pdf

NAME: str = 'TER'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)
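
Example (illustrative): TER can be computed with sacrebleu's TER metric class; whether this class uses sacrebleu internally is not stated here.

    # TER via sacrebleu; lower scores are better (fewer edits needed).
    from sacrebleu.metrics import TER

    ter = TER()
    hypotheses = ["the cat sat on mat"]
    references = [["the cat sat on the mat"]]  # one reference stream, parallel to hypotheses
    print(ter.corpus_score(hypotheses, references))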

eval_framework.metrics.completion.text_counter module

class eval_framework.metrics.completion.text_counter.ParagraphCounter[source]

Bases: BaseMetric[Completion]

NAME: str = 'Paragraph Count'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.text_counter.ParagraphCounterMetricContext(**data)[source]

Bases: BaseMetricContext

Parameters:
  • comparison (str)

  • paragraph_count (int)

  • extra_data (Any)

comparison: str
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

paragraph_count: int
class eval_framework.metrics.completion.text_counter.ResponseToOriginalLengthRatio[source]

Bases: BaseMetric[Completion]

NAME: str = 'Response to Original Length Ratio'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.text_counter.SentenceCounter[source]

Bases: BaseMetric[Completion]

NAME: str = 'Sentence Count'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.text_counter.SentenceCounterMetricContext(**data)[source]

Bases: BaseMetricContext

Parameters:
  • comparison (str)

  • sentence_count (int)

  • extra_data (Any)

comparison: str
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

sentence_count: int
class eval_framework.metrics.completion.text_counter.WordCounter[source]

Bases: BaseMetric[Completion]

NAME: str = 'Word Count'
calculate(response)[source]
Return type:

list[MetricResult]

Parameters:

response (Completion)

class eval_framework.metrics.completion.text_counter.WordCounterMetricContext(**data)[source]

Bases: BaseMetricContext

Parameters:
  • comparison (str)

  • word_count (int)

  • extra_data (Any)

comparison: str
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

word_count: int

Module contents