eval_framework.metrics.completion package¶
Submodules¶
eval_framework.metrics.completion.accuracy_completion module¶
- class eval_framework.metrics.completion.accuracy_completion.AccuracyCompletion[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'Accuracy Completion'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.aidanbench module¶
- class eval_framework.metrics.completion.aidanbench.AidanBenchMetric[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'AidanBench'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.bleu module¶
- class eval_framework.metrics.completion.bleu.BLEU[source]¶
Bases: BaseMetric[Completion]
The Bilingual Evaluation Understudy score, or BLEU for short, is a metric for evaluating a generated sentence against a reference sentence. It counts n-grams in the candidate translation that match n-grams in the reference text, where a 1-gram (unigram) is a single token and a bigram is a word pair. The comparison is made regardless of word order.
Source: https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
Paper: https://www.aclweb.org/anthology/P02-1040/
- NAME: str = 'BLEU'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
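A minimal sketch of a sentence-level BLEU computation using sacrebleu, for illustration only; the backend and settings used by the BLEU class above are an assumption and may differ.

```python
# Illustrative sentence-level BLEU (0-100 scale) using sacrebleu.
import sacrebleu

candidate = "the cat sat on the mat"
references = ["the cat is sitting on the mat"]

score = sacrebleu.sentence_bleu(candidate, references)
print(f"BLEU: {score.score:.2f}")
```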
- class eval_framework.metrics.completion.bleu.LINEWISE_BLEU[source]¶
Bases: BaseMetric[Completion]
Maximum Line-level BLEU score.
- NAME: str = 'Linewise BLEU'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.bleu.ResponseToOriginalBLEU[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'Response to Original BLEU'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.chrf module¶
- class eval_framework.metrics.completion.chrf.CHRF[source]¶
Bases: BaseMetric[Completion]
chrF++ is a tool for automatic evaluation of machine translation output based on character n-gram precision and recall enhanced with word n-grams.
Source: https://github.com/m-popovic/chrF
Paper: https://www.aclweb.org/anthology/W15-3049.pdf
- NAME: str = 'chrF'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
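A minimal sketch of a sentence-level chrF++ computation with sacrebleu, again only illustrative; the CHRF class above may configure the metric differently.

```python
# Illustrative chrF++ score: character n-grams plus word n-grams (word_order=2).
from sacrebleu.metrics import CHRF

candidate = "the cat sat on the mat"
references = ["the cat is sitting on the mat"]

chrf = CHRF(word_order=2)  # word_order=2 gives the "++" variant
score = chrf.sentence_score(candidate, references)
print(f"chrF++: {score.score:.2f}")
```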
- class eval_framework.metrics.completion.chrf.LINEWISE_CHRF[source]¶
Bases: BaseMetric[Completion]
Maximum Line-level chrF++ (Character n-gram F-score) score.
Paper: https://aclanthology.org/W15-3049/
- NAME: str = 'Linewise chrF'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.code_assertion module¶
- class eval_framework.metrics.completion.code_assertion.CodeCompletionAssertion[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'Code Completion Accuracy'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.code_execution_pass_at_one module¶
- class eval_framework.metrics.completion.code_execution_pass_at_one.CodeExecutionBaseContext(**data)[source]¶
Bases: BaseMetricContext
- Parameters:
run_env (str)
code_prompt (str)
test_code (str)
benchmark_timeout (int)
package_downloads (dict[str, str | None])
extra_data (Any)
- benchmark_timeout: int¶
- code_prompt: str¶
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- package_downloads: dict[str, str | None]¶
- run_env: str¶
- test_code: str¶
- class eval_framework.metrics.completion.code_execution_pass_at_one.CodeExecutionPassAtOne[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'code-execution-pass@1'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.code_execution_pass_at_one.CodeExecutionPassAtOneContext(**data)[source]¶
Bases: CodeExecutionBaseContext
- Parameters:
run_env (str)
code_prompt (str)
test_code (str)
benchmark_timeout (int)
package_downloads (dict[str, str | None])
snippet_merge_fn (str)
output_parse_fn (str)
extra_data (Any)
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- output_parse_fn: str¶
- snippet_merge_fn: str¶
- class eval_framework.metrics.completion.code_execution_pass_at_one.RealtimeCodeExectionContext(**data)[source]¶
Bases: CodeExecutionBaseContext
- Parameters:
run_env (str)
code_prompt (str)
test_code (str)
benchmark_timeout (int)
package_downloads (dict[str, str | None])
snippet_merge_fn (Callable[[str, str], str])
output_parse_fn (Callable[[str], ExecutionResult])
extra_data (Any)
- classmethod from_context(context)[source]¶
- Parameters: context (CodeExecutionPassAtOneContext)
- Return type: Self
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- output_parse_fn: Callable[[str], ExecutionResult]¶
- snippet_merge_fn: Callable[[str, str], str]¶
- eval_framework.metrics.completion.code_execution_pass_at_one.estimate_pass_at_k(n, c, k)[source]¶
Estimates pass@k for a single problem.
Parameters:
  - n (int): Total number of generated samples.
  - c (int): Number of correct samples.
  - k (int): Number of attempts or samples considered.
Returns: float: The pass@k value.
- Parameters: n (int), c (int), k (int)
- Return type: float
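For reference, the standard unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021) is 1 - C(n-c, k) / C(n, k). The sketch below shows that formula; estimate_pass_at_k presumably follows it, but this is not verified here.

```python
# Standard unbiased pass@k estimator (assumed to match estimate_pass_at_k).
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k sampled completions (out of n total,
    c of which are correct) passes: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some draw must succeed
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

print(pass_at_k(n=10, c=3, k=1))  # 0.3, i.e. c / n when k == 1
```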
eval_framework.metrics.completion.comet module¶
- class eval_framework.metrics.completion.comet.COMET[source]¶
Bases: BaseMetric[Completion]
COMET is a neural, multilingual framework for evaluating machine translation quality. It leverages cross-lingual pretrained language models to achieve state-of-the-art correlation with human judgments.
Note: this requires a Hugging Face token with access to the model: https://huggingface.co/Unbabel/XCOMET-XL
Source: https://github.com/Unbabel/COMET
Paper: https://arxiv.org/abs/2009.09025
- NAME: str = 'COMET'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
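An illustrative use of the Unbabel COMET library with the XCOMET-XL checkpoint referenced above (a Hugging Face token with access to the model is required). The COMET metric class above may wrap this differently.

```python
# Illustrative segment-level scoring with the unbabel-comet package.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/XCOMET-XL")  # needs gated-model access
model = load_from_checkpoint(model_path)

data = [{
    "src": "Der Hund bellt.",
    "mt": "The dog barks.",
    "ref": "The dog is barking.",
}]
output = model.predict(data, batch_size=8, gpus=1)  # gpus=0 may be used for CPU
print(output.scores)  # one quality score per segment
```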
eval_framework.metrics.completion.concordance_index module¶
- class eval_framework.metrics.completion.concordance_index.ConcordanceIndex[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'ConcordanceIndex'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.csv_format module¶
- class eval_framework.metrics.completion.csv_format.CSVFormat[source]¶
Bases: BaseMetric[Completion]
- KEYS: list[str] | None = ['has_csv', 'is_separator_respected', 'is_column_count_respected']¶
- NAME: str = 'CSV Format'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.csv_format.CSVFormatEvaluation(**data)[source]¶
Bases: BaseModel
- Parameters:
implicit (bool)
has_csv (bool)
is_separator_respected (bool)
is_column_count_respected (bool)
- has_csv: bool¶
- implicit: bool¶
- is_column_count_respected: bool¶
- is_separator_respected: bool¶
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
eval_framework.metrics.completion.cwe_accuracy module¶
- class eval_framework.metrics.completion.cwe_accuracy.CWEAccuracy[source]¶
Bases: BaseMetric[Completion]
Metric for Common Word Extraction tasks
- NAME: str = 'CWEAccuracy'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.exponential_similarity module¶
- class eval_framework.metrics.completion.exponential_similarity.ExponentialSimilarity[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'ExponentialSimilarity'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- eval_framework.metrics.completion.exponential_similarity.calculate_exponential_similarity(p_true, p_pred)[source]¶
Compute the exponential similarity (SpaceDigest version) between the gold percentage and predicted value.
Parameters:
  - p_true (float): The gold/reference percentage.
  - p_pred (float): The predicted scalar.
  - d (float): Base of the exponent. Default is 2.
  - c (float): Coefficient in exponent. Default is 10.
Returns: float: Similarity score between 0 and 1.
- Parameters: p_true (float), p_pred (float)
- Return type: float
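The docstring names a base d and an exponent coefficient c but does not spell out the formula. A plausible form, stated purely as an assumption, is d raised to -c times the absolute gap between gold and predicted values:

```python
# Assumed form of the exponential similarity, inferred only from the parameter
# descriptions above; calculate_exponential_similarity may use a different formula.
def exponential_similarity(p_true: float, p_pred: float, d: float = 2.0, c: float = 10.0) -> float:
    # Identical values give 1.0; larger gaps decay the score toward 0.
    return d ** (-c * abs(p_true - p_pred))

print(exponential_similarity(0.40, 0.40))  # 1.0
print(exponential_similarity(0.40, 0.55))  # ~0.35 with the default d=2, c=10
```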
eval_framework.metrics.completion.f1 module¶
- class eval_framework.metrics.completion.f1.F1[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'F1'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.format_checker module¶
- class eval_framework.metrics.completion.format_checker.CheckJsonFormat[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'JSON Format'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.format_checker.CheckPostScriptFormat[source]¶
Bases: BaseMetric[Completion]
This metric is admittedly crude: as in the original IFEval implementation, it only checks whether the text contains the string "P.S." or "P.P.S." (or variants thereof such as "p. s."); it does not actually parse the postscript.
- NAME: str = 'Postscript Format'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
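A rough sketch of the string-containment check described above; the exact pattern used by CheckPostScriptFormat is not shown here and may differ.

```python
# Hypothetical postscript detection: look for "P.S." / "P.P.S." style markers.
import re

POSTSCRIPT_RE = re.compile(r"\bp\.?\s*(p\.?\s*)?s\.?", re.IGNORECASE)

def has_postscript(text: str) -> bool:
    return POSTSCRIPT_RE.search(text) is not None

print(has_postscript("Thanks for reading. P.S. Bring snacks."))  # True
print(has_postscript("No closing remark here."))                 # False
```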
eval_framework.metrics.completion.grid_difference module¶
- class eval_framework.metrics.completion.grid_difference.GridDifference[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'grid_difference'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- calculate_score(output_ground_truth_difference_count, input_ground_truth_difference_count)[source]¶
- Parameters: output_ground_truth_difference_count (int), input_ground_truth_difference_count (int)
- Return type: float
eval_framework.metrics.completion.ifeval module¶
- class eval_framework.metrics.completion.ifeval.IFEvalMetric[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'IFEval'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.ifeval.IFEvalMetricContext(**data)[source]¶
Bases: BaseMetricContext
- Parameters:
key (int)
instruction_id_list (list[str])
prompt (str)
additional_kwargs (list[dict[str, Any]])
extra_data (Any)
- additional_kwargs: list[dict[str, Any]]¶
- instruction_id_list: list[str]¶
- key: int¶
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- prompt: str¶
eval_framework.metrics.completion.json_format module¶
- class eval_framework.metrics.completion.json_format.JsonFormat[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'JSON Format'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.json_format.JsonFormatEvaluation(**data)[source]¶
Bases: BaseModel
- Parameters:
is_just_json (bool)
is_valid_json (bool)
fulfills_schema (bool | None)
exact_match (bool | None)
json_parsing_error (str | None)
schema_validation_error (str | None)
- exact_match: bool | None¶
- fulfills_schema: bool | None¶
- is_just_json: bool¶
- is_valid_json: bool¶
- json_parsing_error: str | None¶
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- schema_validation_error: str | None¶
- eval_framework.metrics.completion.json_format.get_json_object(text)[source]¶
Extract the first valid JSON object or array from text.
This function handles nested brackets properly by using a bracket counting approach to find complete JSON structures, rather than using regex which can incorrectly match outer brackets containing non-JSON content.
- Parameters: text (str)
- Return type: str
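A simplified sketch of the bracket-counting idea described above; unlike a production version (and possibly unlike get_json_object itself), it does not handle braces that appear inside JSON string literals.

```python
# Simplified bracket-counting extraction of the first valid JSON object/array.
import json

def extract_first_json(text: str) -> str:
    openers, closers = "{[", "}]"
    for start, ch in enumerate(text):
        if ch not in openers:
            continue
        depth = 0
        for end in range(start, len(text)):
            if text[end] in openers:
                depth += 1
            elif text[end] in closers:
                depth -= 1
                if depth == 0:
                    candidate = text[start:end + 1]
                    try:
                        json.loads(candidate)
                        return candidate
                    except json.JSONDecodeError:
                        break  # not valid JSON; try the next opening bracket
    return ""

print(extract_first_json('Answer: {"a": [1, 2, {"b": 3}]} and more text'))
```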
eval_framework.metrics.completion.language_checker module¶
- class eval_framework.metrics.completion.language_checker.GermanCompletionChecker[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'German Completion Check'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.language_checker.LanguageChecker[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'Language Check'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.language_checker.LanguageConsistencyChecker[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'Language Consistency'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.language_checker.LanguageRawConsistencyChecker[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'Language Consistency Raw'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.length_control module¶
- class eval_framework.metrics.completion.length_control.LengthControl(tolerance=0.16666666666666666)[source]¶
Bases: BaseMetric[Completion]
- Parameters: tolerance (float)
- NAME: str = 'length_control'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
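The default tolerance of 1/6 suggests a relative band around a target length. The sketch below is hypothetical: the unit (words vs. characters), the source of the target, and the scoring of the real LengthControl metric are not documented here.

```python
# Hypothetical length check illustrating how a relative tolerance of 1/6
# could be applied; not the documented behavior of LengthControl.
def within_length_tolerance(text: str, target_words: int, tolerance: float = 1 / 6) -> float:
    n_words = len(text.split())
    return 1.0 if abs(n_words - target_words) <= tolerance * target_words else 0.0

print(within_length_tolerance("one two three four five six", target_words=6))  # 1.0
print(within_length_tolerance("one two three", target_words=6))                # 0.0
```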
eval_framework.metrics.completion.math_reasoning_completion module¶
- class eval_framework.metrics.completion.math_reasoning_completion.MathReasoningCompletion[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'Math Reasoning Completion (symbolic)'¶
- REMOVED_EXPRESSIONS_FORMAT = ['\\text{s}', '\\text{.}', '\\text{\ns}', '\\text{}^2', '\\text{}^3', '\\text{\n}', '\\text{}', '\\mathrm{th}', '^\\circ', '^{\\circ}', '\\;', ',\\!', '{,}', '"', '\\dots']¶
- REMOVED_EXPRESSIONS_UNITS = ['square', 'ways', 'integers', 'dollars', 'mph', 'inches', 'ft', 'hours', 'km', 'units', '\\ldots', 'sue', 'points', 'feet', 'minutes', 'digits', 'cents', 'degrees', 'cm', 'gm', 'pounds', 'meters', 'meals', 'edges', 'students', 'childrentickets', 'multiples']¶
- SUBSTITUTIONS = [('\\ban\\b(?!\\w)', ''), ('\\ba\\b(?!\\w)', ''), ('\\.\\$', '$'), ('\\\\\\$', ''), ('\\\\ ', ''), ('\\s+', ''), ('\\\\mbox', 'text'), (',\\\\text\\{and\\}', ','), ('\\\\text\\{and\\}', ','), ('\\\\text\\{m\\}', '\\text{}')]¶
- calculate(response)[source]¶
Calculate the accuracy of the completion. Performs several verification and simplification steps to ensure that the completion is correct; the completion may be either a LaTeX or plain-string response, which sympy will parse, factor, and simplify.
- Parameters: response (Completion) – Completion object
- Return type: list[MetricResult]
- Returns: list of MetricResult
- check_for_equation(final_answer)[source]¶
Check if the final answer is an equation and split it into left-hand side and right-hand side.
- Parameters: final_answer (str) – the expression to evaluate
- Return type: list
- Returns: list containing the left-hand side and right-hand side of the equation
- normalize_expression(final_answer)[source]¶
Function to normalize LaTeX expressions.
NOTE: The substitution logic was changed because it previously replaced characters inside words, e.g. turning "infty" into "iny" by removing "ft".
- Parameters: final_answer (str) – raw LaTeX expression
- Return type: str
- Returns: normalized LaTeX expression
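A minimal sketch of a sympy-based equivalence check in the spirit described above (parse, simplify, compare). The actual MathReasoningCompletion metric applies additional normalization (see SUBSTITUTIONS and the REMOVED_EXPRESSIONS_* lists) and handles LaTeX input, which would require sympy's LaTeX parser.

```python
# Sketch of symbolic answer comparison with sympy; simplified relative to the
# metric above (no LaTeX normalization, plain sympify parsing only).
import sympy

def symbolically_equal(answer: str, reference: str) -> bool:
    try:
        a = sympy.sympify(answer)
        b = sympy.sympify(reference)
        return sympy.simplify(a - b) == 0
    except (sympy.SympifyError, TypeError):
        # Fall back to a plain string comparison if parsing fails.
        return answer.strip() == reference.strip()

print(symbolically_equal("2*(x + 1)", "2*x + 2"))  # True
print(symbolically_equal("1/2", "0.5"))            # True
```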
eval_framework.metrics.completion.niah_accuracy module¶
- class eval_framework.metrics.completion.niah_accuracy.NIAHAccuracy[source]¶
Bases: BaseMetric[Completion]
Metric for Needle in a Haystack tasks
- NAME: str = 'NIAHAccuracy'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.placeholder_checker module¶
- class eval_framework.metrics.completion.placeholder_checker.PlaceholderChecker[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'Placeholder Check'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.placeholder_checker.PlaceholderCheckerMetricContext(**data)[source]¶
Bases: BaseMetricContext
- Parameters:
num_placeholders (int)
extra_data (Any)
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- num_placeholders: int¶
eval_framework.metrics.completion.repetition module¶
- class eval_framework.metrics.completion.repetition.WordRepetition(window_size=128, min_repetitions=1)[source]¶
Bases: BaseMetric[Completion]
Word Repetition Metric
This metric checks for repetitions of words in the completion text for a given window size and repetition threshold. The window size defines the consecutive word count to consider a repetition, and min_repetitions specifies the minimum repetition count that triggers the metric. This metric returns 0.0 if no repetitions are found, and 1.0 if a sufficient number of repetitions are found. For example, if the completion contains a two-word sequence that repeats once (such as “hello world hello world”), this metric would trigger with a window size of 2 and min_repetitions set to 1.
- Parameters:
window_size (int)
min_repetitions (int)
- HIGHER_IS_BETTER: Final[bool] = False¶
- NAME: str = 'WordRepetition'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
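A rough sketch of the windowed repetition check described above: flag the text when a sequence of window_size words immediately repeats at least min_repetitions times. WordRepetition's exact matching rules may differ.

```python
# Hypothetical windowed word-repetition detector (1.0 = repetition found;
# note that HIGHER_IS_BETTER is False for this metric).
def has_word_repetition(text: str, window_size: int, min_repetitions: int) -> float:
    words = text.lower().split()
    for i in range(len(words) - window_size):
        window = words[i:i + window_size]
        repeats = 0
        j = i + window_size
        while words[j:j + window_size] == window:
            repeats += 1
            j += window_size
        if repeats >= min_repetitions:
            return 1.0
    return 0.0

print(has_word_repetition("hello world hello world", window_size=2, min_repetitions=1))  # 1.0
print(has_word_repetition("hello there world", window_size=2, min_repetitions=1))        # 0.0
```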
eval_framework.metrics.completion.rouge_1 module¶
- class eval_framework.metrics.completion.rouge_1.ROUGE_1[source]¶
Bases: BaseMetric[Completion]
ROUGE-1
- NAME: str = 'ROUGE-1'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.rouge_2 module¶
- class eval_framework.metrics.completion.rouge_2.ROUGE_2[source]¶
Bases: BaseMetric[Completion]
ROUGE-2
- NAME: str = 'ROUGE-2'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.rouge_geometric_mean module¶
- class eval_framework.metrics.completion.rouge_geometric_mean.ROUGE_GEOMETRIC_MEAN[source]¶
Bases: BaseMetric[Completion]
ROUGE Geometric Mean
- NAME: str = 'ROUGE-Geometric-Mean'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
eval_framework.metrics.completion.rouge_l module¶
- class eval_framework.metrics.completion.rouge_l.ROUGE_L[source]¶
Bases: BaseMetric[Completion]
ROUGE-L
- NAME: str = 'ROUGE-L'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
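An illustrative ROUGE-1 / ROUGE-2 / ROUGE-L computation using Google's rouge-score package; the ROUGE_* metric classes in the modules above may rely on a different backend or settings.

```python
# Illustrative ROUGE scores with the rouge-score package.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(
    target="the cat sat on the mat",
    prediction="a cat was sitting on the mat",
)
for name, result in scores.items():
    print(name, f"F1={result.fmeasure:.3f}")
```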
eval_framework.metrics.completion.struct_eval_metrics module¶
- class eval_framework.metrics.completion.struct_eval_metrics.RenderableStructMetric[source]¶
Bases: StructMetric
- NAME: str = 'RenderableStructMetric'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.struct_eval_metrics.RenderableStructMetricContext(**data)[source]¶
Bases: BaseMetricContext
- Parameters:
output_type (str)
keywords (list[str])
extra_data (Any)
- keywords: list[str]¶
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- output_type: str¶
- class eval_framework.metrics.completion.struct_eval_metrics.StructMetric[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'StructMetric'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.struct_eval_metrics.StructMetricContext(**data)[source]¶
Bases: BaseMetricContext
- Parameters:
output_type (str)
paths (list[str])
extra_data (Any)
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- output_type: str¶
- paths: list[str]¶
- eval_framework.metrics.completion.struct_eval_metrics.is_valid_html(html)[source]¶
- Parameters: html (str)
- Return type: bool
- eval_framework.metrics.completion.struct_eval_metrics.path_exists(data, path)[source]¶
Check if a path exists in a structured data object.
- Parameters:
  data (Any) – The structured data to check
  path (str) – The path to check (dot notation)
- Return type: bool
- Returns: True if path exists, False otherwise
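A simplified sketch of a dot-notation path check like the one documented above, for illustration only; the actual implementation may also handle lists or other structures.

```python
# Hypothetical dot-notation path lookup over nested dictionaries.
from typing import Any

def path_exists_sketch(data: Any, path: str) -> bool:
    current = data
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return False
    return True

doc = {"a": {"b": {"c": 1}}}
print(path_exists_sketch(doc, "a.b.c"))  # True
print(path_exists_sketch(doc, "a.x"))    # False
```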
eval_framework.metrics.completion.ter module¶
- class eval_framework.metrics.completion.ter.LINEWISE_TER[source]¶
Bases: BaseMetric[Completion]
Minimum Line-level TER (Translation Edit Rate) score.
- NAME: str = 'Linewise TER'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.ter.TER[source]¶
Bases: BaseMetric[Completion]
Translation Error Rate is an error metric for machine translation that measures the number of edits required to change a system output into one of the references.
Source: http://www.cs.umd.edu/~snover/tercom/
Paper: http://mt-archive.info/AMTA-2006-Snover.pdf
- NAME: str = 'TER'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
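An illustrative sentence-level TER computation with sacrebleu; the TER and LINEWISE_TER classes above may configure or aggregate the score differently.

```python
# Illustrative TER score (lower is better: an edit rate, not an accuracy).
import sacrebleu

candidate = "the cat sat on the mat"
references = ["the cat is sitting on the mat"]

score = sacrebleu.sentence_ter(candidate, references)
print(f"TER: {score.score:.2f}")
```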
eval_framework.metrics.completion.text_counter module¶
- class eval_framework.metrics.completion.text_counter.ParagraphCounter[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'Paragraph Count'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.text_counter.ParagraphCounterMetricContext(**data)[source]¶
Bases: BaseMetricContext
- Parameters:
comparison (str)
paragraph_count (int)
extra_data (Any)
- comparison: str¶
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- paragraph_count: int¶
- class eval_framework.metrics.completion.text_counter.ResponseToOriginalLengthRatio[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'Response to Original Length Ratio'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.text_counter.SentenceCounter[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'Sentence Count'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.text_counter.SentenceCounterMetricContext(**data)[source]¶
Bases: BaseMetricContext
- Parameters:
comparison (str)
sentence_count (int)
extra_data (Any)
- comparison: str¶
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- sentence_count: int¶
- class eval_framework.metrics.completion.text_counter.WordCounter[source]¶
Bases: BaseMetric[Completion]
- NAME: str = 'Word Count'¶
- calculate(response)[source]¶
- Parameters: response (Completion)
- Return type: list[MetricResult]
- class eval_framework.metrics.completion.text_counter.WordCounterMetricContext(**data)[source]¶
Bases: BaseMetricContext
- Parameters:
comparison (str)
word_count (int)
extra_data (Any)
- comparison: str¶
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- word_count: int¶