eval_framework.llm package¶
Submodules¶
eval_framework.llm.aleph_alpha module¶
- class eval_framework.llm.aleph_alpha.AlephAlphaAPIModel(formatter=None, checkpoint_name=None, temperature=None, max_retries=100, max_async_concurrent_requests=32, request_timeout_seconds=1805, queue_full_timeout_seconds=1805, bytes_per_token=None, token='dummy', base_url='dummy_endpoint')[source]¶
Bases:
BaseLLM
- Parameters:
formatter (BaseFormatter | None)
checkpoint_name (str | None)
temperature (float | None)
max_retries (int)
max_async_concurrent_requests (int)
request_timeout_seconds (int)
queue_full_timeout_seconds (int)
bytes_per_token (float | None)
token (str)
base_url (str)
- BYTES_PER_TOKEN: float = 4.0¶
- DEFAULT_FORMATTER: Callable[[], BaseFormatter] | None = None¶
- LLM_NAME: str¶
- generate_from_messages(messages, stop_sequences=None, max_tokens=None, temperature=None)[source]¶
stop_sequences and max_tokens are injected by the task if they exist. They should be overwritten or extended with the properties of the model. This includes, but is not limited to, the stop tokens of the evaluated checkpoint (e.g. <|eot_id|> for an instruction-finetuned Llama3.1, <|endoftext|> for a pretrained Llama3.1).
This function is expected to raise errors which are caught and reported when running the eval. Please also make sure to raise an error in case of sequence length issues. We expect to always raise an error if something impedes the expected completion of a task.
Important! The completion is expected to be detokenized and to NOT contain special tokens.
Returns: List[RawCompletion]
- Return type:
list[RawCompletion]
- Parameters:
messages (list[Sequence[Message]])
stop_sequences (list[str] | None)
max_tokens (int | None)
temperature (float | None)
- logprobs(samples)[source]¶
This function is expected to raise errors which are caught and reported when running the eval. Please also make sure to raise an error in case of sequence length issues. We expect to always raise an error if something prevents the expected completion of a task.
- Return type:
list[RawLoglikelihood]
- Parameters:
samples (list[Sample])
- class eval_framework.llm.aleph_alpha.Llama31_8B_Instruct_API(formatter=None, checkpoint_name=None, temperature=None, max_retries=100, max_async_concurrent_requests=32, request_timeout_seconds=1805, queue_full_timeout_seconds=1805, bytes_per_token=None, token='dummy', base_url='dummy_endpoint')[source]¶
Bases:
AlephAlphaAPIModel
- Parameters:
formatter (BaseFormatter | None)
checkpoint_name (str | None)
temperature (float | None)
max_retries (int)
max_async_concurrent_requests (int)
request_timeout_seconds (int)
queue_full_timeout_seconds (int)
bytes_per_token (float | None)
token (str)
base_url (str)
- DEFAULT_FORMATTER¶
alias of Llama3Formatter
- LLM_NAME: str = 'llama-3.1-8b-instruct'¶
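A minimal usage sketch for the API-backed class above. The token and base_url values are placeholders (the signature defaults them to dummy values), and the messages argument is assumed to be a list[Sequence[Message]] prepared elsewhere by the framework.

```python
import os

from eval_framework.llm.aleph_alpha import Llama31_8B_Instruct_API

# Placeholder credentials and endpoint; both default to dummy values in the signature above.
model = Llama31_8B_Instruct_API(
    token=os.environ.get("AA_TOKEN", "dummy"),
    base_url=os.environ.get("AA_BASE_URL", "dummy_endpoint"),
    max_async_concurrent_requests=8,
)


def generate(messages):
    # messages: list[Sequence[Message]], built by the eval framework or by hand.
    # The task-injected stop sequences are extended with the checkpoint's own stop token.
    return model.generate_from_messages(messages, stop_sequences=["<|eot_id|>"], max_tokens=64)
```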
eval_framework.llm.base module¶
- class eval_framework.llm.base.BaseLLM[source]¶
Bases:
ABC
- generate(samples, stop_sequences=None, max_tokens=None, temperature=None)[source]¶
Generates a model response for each sample.
Uses ‘generate_from_samples’ to generate responses if implemented, otherwise falls back to ‘generate_from_messages’.
- Return type:
list[RawCompletion]
- Parameters:
samples (list[Sample])
stop_sequences (list[str] | None)
max_tokens (int | None)
temperature (float | None)
- abstractmethod generate_from_messages(messages, stop_sequences=None, max_tokens=None, temperature=None)[source]¶
stop_sequences and max_tokens are injected by the task if they exist. They should be overwritten or extended with the properties of the model. This includes, but is not limited to, the stop tokens of the evaluated checkpoint (e.g. <|eot_id|> for an instruction-finetuned Llama3.1, <|endoftext|> for a pretrained Llama3.1).
This function is expected to raise errors which are caught and reported when running the eval. Please also make sure to raise an error in case of sequence length issues. We expect to always raise an error if something impedes the expected completion of a task.
Important! The completion is expected to be detokenized and to NOT contain special tokens.
Returns: List[RawCompletion]
- Return type:
list[RawCompletion]
- Parameters:
messages (list[Sequence[Message]])
stop_sequences (list[str] | None)
max_tokens (int | None)
temperature (float | None)
- generate_from_samples(samples, stop_sequences=None, max_tokens=None, temperature=None)[source]¶
stop_sequences and max_tokens are injected by the task if they exist. They should be overwritten or extended with the properties of the model. This includes, but is not limited to, the stop tokens of the evaluated checkpoint (e.g. <|eot_id|> for an instruction-finetuned Llama3.1, <|endoftext|> for a pretrained Llama3.1).
This function is expected to raise errors which are caught and reported when running the eval. Please also make sure to raise an error in case of sequence length issues. We expect to always raise an error if something impedes the expected completion of a task.
Important! The completion is expected to be detokenized and to NOT contain special tokens.
Returns: List[RawCompletion]
- Return type:
list[RawCompletion]
- Parameters:
samples (list[Sample])
stop_sequences (list[str] | None)
max_tokens (int | None)
temperature (float | None)
- abstractmethod logprobs(samples)[source]¶
This function is expected to raise errors which are caught and reported when running the eval. Please also make sure to raise an error in case of sequence length issues. We expect to always raise an error if something prevents the expected completion of a task.
- Return type:
list[RawLoglikelihood]
- Parameters:
samples (list[Sample])
- property name: str¶
This property is used to name the results folder and identify the eval results. Override it in the subclass with, e.g., the checkpoint name or Hugging Face model name.
- post_process_completion(completion, sample)[source]¶
Model-specific post-processing of generated completions.
Override this method to apply model-specific cleanup or transformations (e.g., removing specific artifacts such as reasoning traces, handling special tokens).
- Parameters:
completion (str) – The raw completion string from the model
sample (Sample) – The sample that was used to generate the completion
- Return type:
str
- Returns:
The post-processed completion string
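To make the contract above concrete, here is a hedged sketch of a custom BaseLLM subclass. The actual inference backend is left unimplemented; only the documented expectations (extending stop sequences, raising on problems, cleaning up completions) are illustrated.

```python
from eval_framework.llm.base import BaseLLM


class MyBackendLLM(BaseLLM):
    """Sketch of the BaseLLM contract; the actual inference backend is not shown."""

    @property
    def name(self) -> str:
        # Used to name the results folder and identify the eval results.
        return "my-backend-llm"

    def generate_from_messages(self, messages, stop_sequences=None, max_tokens=None, temperature=None):
        # Extend the task-injected stop sequences with the checkpoint's own stop tokens.
        stops = list(stop_sequences or []) + ["<|eot_id|>"]
        if max_tokens is not None and max_tokens <= 0:
            # Always raise when something prevents the expected completion of a task.
            raise ValueError("max_tokens must be positive")
        # Pass `stops` to your inference backend here and wrap each detokenized
        # completion (without special tokens) in a RawCompletion.
        raise NotImplementedError

    def logprobs(self, samples):
        # Score each sample's possible completions and return list[RawLoglikelihood];
        # raise on sequence-length issues instead of silently truncating.
        raise NotImplementedError

    def post_process_completion(self, completion: str, sample) -> str:
        # Example model-specific cleanup: drop a reasoning trace before a closing marker.
        return completion.split("</think>")[-1].lstrip()
```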
eval_framework.llm.huggingface module¶
- class eval_framework.llm.huggingface.BaseHFLLM(formatter=None, bytes_per_token=None)[source]¶
Bases:
BaseLLM
- Parameters:
formatter (BaseFormatter | None)
bytes_per_token (float | None)
- BYTES_PER_TOKEN: float = 4.0¶
- DEFAULT_FORMATTER: Callable[[], BaseFormatter] | None = None¶
- LLM_NAME: str¶
- SEQ_LENGTH: int | None = None¶
- count_tokens(text, /)[source]¶
Count the number of tokens in a string.
- Return type:
int
- Parameters:
text (str)
- generate_from_messages(messages, stop_sequences=None, max_tokens=None, temperature=None)[source]¶
stop_sequences and max_tokens are injected by the task if they exist. They should be overwritten or extended with the properties of the model. This includes, but is not limited to, the stop tokens of the evaluated checkpoint (e.g. <|eot_id|> for an instruction-finetuned Llama3.1, <|endoftext|> for a pretrained Llama3.1).
This function is expected to raise errors which are caught and reported when running the eval. Please also make sure to raise an error in case of sequence length issues. We expect to always raise an error if something impedes the expected completion of a task.
Important! The completion is expected to be detokenized and to NOT contain special tokens.
Returns: List[RawCompletion]
- Return type:
list[RawCompletion]
- Parameters:
messages (list[Sequence[Message]])
stop_sequences (list[str] | None)
max_tokens (int | None)
temperature (float | None)
- logprobs(samples)[source]¶
This function is expected to raise errors which are caught and reported when running the eval. Please also make sure to raise an error in case of sequence length issues. We expect to always raise an error if something prevents the expected completion of a task.
- Return type:
list[RawLoglikelihood]
- Parameters:
samples (list[Sample])
- property seq_length: int | None¶
- class eval_framework.llm.huggingface.HFLLM(checkpoint_path=None, model_name=None, artifact_name=None, formatter=None, formatter_name=None, formatter_kwargs=None, checkpoint_name=None, bytes_per_token=None, **kwargs)[source]¶
Bases:
BaseHFLLM
A class to create HFLLM instances from various model sources.
- Parameters:
checkpoint_path (str | Path | None)
model_name (str | None)
artifact_name (str | None)
formatter (BaseFormatter | None)
formatter_name (str | None)
formatter_kwargs (dict[str, Any] | None)
checkpoint_name (str | None)
bytes_per_token (float | None)
kwargs (Any)
- property name: str¶
This property is used to name the results folder and identify the eval results. Override it in the subclass with, e.g., the checkpoint name or Hugging Face model name.
- class eval_framework.llm.huggingface.HFLLMRegistryModel(artifact_name, version='latest', formatter='', formatter_identifier='', **kwargs)[source]¶
Bases:
HFLLM
A class to create HFLLM instances from models registered in the Wandb registry. Downloads the model artifacts from Wandb and creates a local HFLLM instance.
- Parameters:
artifact_name (str)
version (str)
formatter (str)
formatter_identifier (str)
kwargs (Any)
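A usage sketch for the registry wrapper above. The artifact name and version are placeholders, and passing the formatter by name follows the string-based formatter argument in the signature; adjust to your registry setup.

```python
from eval_framework.llm.huggingface import HFLLMRegistryModel

# Placeholder artifact name and version; use a model registered in your Wandb registry.
llm = HFLLMRegistryModel(
    "my-team/model-registry/my-model",
    version="v3",
    formatter="Llama3Formatter",
)
```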
- class eval_framework.llm.huggingface.HFLLM_from_name(model_name, formatter='Llama3Formatter', **kwargs)[source]¶
Bases:
HFLLM
A generic class to create HFLLM instances from a given model name.
- Parameters:
model_name (str)
formatter (str)
kwargs (Any)
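A usage sketch for the class above; the checkpoint id is a placeholder and the default Llama3Formatter is kept.

```python
from eval_framework.llm.huggingface import HFLLM_from_name

# Placeholder checkpoint id; any Hugging Face model name can be used here.
llm = HFLLM_from_name("meta-llama/Llama-3.1-8B-Instruct", formatter="Llama3Formatter")

print(llm.name)                    # identifies the results folder for this eval
print(llm.count_tokens("Hello!"))  # token counting is inherited from BaseHFLLM
```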
- class eval_framework.llm.huggingface.Pythia410m(checkpoint_path=None, model_name=None, artifact_name=None, formatter=None, formatter_name=None, formatter_kwargs=None, checkpoint_name=None, bytes_per_token=None, **kwargs)[source]¶
Bases:
HFLLM
- Parameters:
checkpoint_path (str | Path | None)
model_name (str | None)
artifact_name (str | None)
formatter (BaseFormatter | None)
formatter_name (str | None)
formatter_kwargs (dict[str, Any] | None)
checkpoint_name (str | None)
bytes_per_token (float | None)
kwargs (Any)
- DEFAULT_FORMATTER¶
alias of ConcatFormatter
- LLM_NAME: str = 'EleutherAI/pythia-410m'¶
- class eval_framework.llm.huggingface.Qwen3_0_6B(checkpoint_path=None, model_name=None, artifact_name=None, formatter=None, formatter_name=None, formatter_kwargs=None, checkpoint_name=None, bytes_per_token=None, **kwargs)[source]¶
Bases:
HFLLM
- Parameters:
checkpoint_path (str | Path | None)
model_name (str | None)
artifact_name (str | None)
formatter (BaseFormatter | None)
formatter_name (str | None)
formatter_kwargs (dict[str, Any] | None)
checkpoint_name (str | None)
bytes_per_token (float | None)
kwargs (Any)
- DEFAULT_FORMATTER: Callable[[], BaseFormatter] | None = functools.partial(<class 'template_formatting.formatter.HFFormatter'>, 'Qwen/Qwen3-0.6B', chat_template_kwargs={'enable_thinking': True})¶
- Parameters:
chat_template_kwargs (dict[str, Any] | None)
- Return type:
None
- LLM_NAME: str = 'Qwen/Qwen3-0.6B'¶
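The DEFAULT_FORMATTER above shows the functools.partial pattern for wiring an HFFormatter with chat_template_kwargs. As a sketch, the same pattern can be reused in a subclass, e.g. to disable thinking mode (mirroring Qwen3_0_6B_VLLM_No_Thinking in the vllm module):

```python
import functools

from eval_framework.llm.huggingface import Qwen3_0_6B
from template_formatting.formatter import HFFormatter


class Qwen3_0_6B_No_Thinking(Qwen3_0_6B):
    """Same checkpoint, but with the chat template's thinking mode disabled."""

    LLM_NAME = "Qwen/Qwen3-0.6B"
    DEFAULT_FORMATTER = functools.partial(
        HFFormatter,
        "Qwen/Qwen3-0.6B",
        chat_template_kwargs={"enable_thinking": False},
    )
```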
- class eval_framework.llm.huggingface.RepeatedTokenSequenceCriteria(tokenizer, completion_start_index)[source]¶
Bases:
StoppingCriteria
- Parameters:
tokenizer (Tokenizer)
completion_start_index (int)
- class eval_framework.llm.huggingface.SmolLM135M(checkpoint_path=None, model_name=None, artifact_name=None, formatter=None, formatter_name=None, formatter_kwargs=None, checkpoint_name=None, bytes_per_token=None, **kwargs)[source]¶
Bases:
HFLLM
- Parameters:
checkpoint_path (str | Path | None)
model_name (str | None)
artifact_name (str | None)
formatter (BaseFormatter | None)
formatter_name (str | None)
formatter_kwargs (dict[str, Any] | None)
checkpoint_name (str | None)
bytes_per_token (float | None)
kwargs (Any)
- DEFAULT_FORMATTER¶
alias of ConcatFormatter
- LLM_NAME: str = 'HuggingFaceTB/SmolLM-135M'¶
- class eval_framework.llm.huggingface.Smollm135MInstruct(checkpoint_path=None, model_name=None, artifact_name=None, formatter=None, formatter_name=None, formatter_kwargs=None, checkpoint_name=None, bytes_per_token=None, **kwargs)[source]¶
Bases:
HFLLM
- Parameters:
checkpoint_path (str | Path | None)
model_name (str | None)
artifact_name (str | None)
formatter (BaseFormatter | None)
formatter_name (str | None)
formatter_kwargs (dict[str, Any] | None)
checkpoint_name (str | None)
bytes_per_token (float | None)
kwargs (Any)
- DEFAULT_FORMATTER: Callable[[], BaseFormatter] | None = functools.partial(<class 'template_formatting.formatter.HFFormatter'>, 'HuggingFaceTB/SmolLM-135M-Instruct')¶
- Parameters:
chat_template_kwargs (dict[str, Any] | None)
- Return type:
None
- LLM_NAME: str = 'HuggingFaceTB/SmolLM-135M-Instruct'¶
eval_framework.llm.mistral module¶
- class eval_framework.llm.mistral.MistralAdapter(target_mdl)[source]¶
Bases:
VLLMTokenizerAPI[list[Message]]
- Parameters:
target_mdl (str)
- encode_formatted_struct(struct)[source]¶
Encode prompt to token IDs.
- Return type:
- Parameters:
struct (list[Message])
- class eval_framework.llm.mistral.MistralVLLM(checkpoint_path=None, model_name=None, artifact_name=None, formatter=None, formatter_name=None, formatter_kwargs=None, checkpoint_name=None, max_model_len=None, tensor_parallel_size=1, gpu_memory_utilization=0.9, batch_size=1, sampling_params=None, bytes_per_token=None, **kwargs)[source]¶
Bases:
VLLMModel
- Parameters:
checkpoint_path (str | Path | None)
model_name (str | None)
artifact_name (str | None)
formatter (BaseFormatter | None)
formatter_name (str | None)
formatter_kwargs (dict[str, Any] | None)
checkpoint_name (str | None)
max_model_len (int | None)
tensor_parallel_size (int)
gpu_memory_utilization (float)
batch_size (int)
sampling_params (SamplingParams | dict[str, Any] | None)
bytes_per_token (float | None)
kwargs (Any)
- property formatter_output_mode: Literal['string', 'list']¶
Determine the correct output mode for the formatter based on tokenizer type.
- property tokenizer: VLLMTokenizerAPI¶
eval_framework.llm.models module¶
This is just a default model file with some small models for testing.
Please define your own model file externally and pass it to the eval-framework entrypoint to use it.
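As a sketch, such an external model file can follow the subclass pattern used by Pythia410m and SmolLM135M above; the checkpoint id is a placeholder and the ConcatFormatter import path is assumed from the HFFormatter path shown elsewhere on this page.

```python
# my_models.py – passed to the eval-framework entrypoint instead of this default file.
from eval_framework.llm.huggingface import HFLLM
from template_formatting.formatter import ConcatFormatter  # assumed import path


class MyTinyModel(HFLLM):
    # Placeholder checkpoint id; replace with the model you want to evaluate.
    LLM_NAME = "EleutherAI/pythia-160m"
    DEFAULT_FORMATTER = ConcatFormatter
```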
eval_framework.llm.openai module¶
- class eval_framework.llm.openai.DeepseekModel(model_name=None, formatter=None, temperature=None, api_key=None, organization=None, base_url=None, tokenizer_name=None)[source]¶
Bases:
OpenAIModel
General Deepseek model wrapper using the OpenAI-compatible API for the deepseek-chat and deepseek-reasoner models.
Uses the DeepSeek API: https://api-docs.deepseek.com/quick_start/pricing
- Parameters:
model_name (str | None)
formatter (BaseFormatter | None)
temperature (float | None)
api_key (str | None)
organization (str | None)
base_url (str | None)
tokenizer_name (str | None)
- class eval_framework.llm.openai.Deepseek_chat(model_name=None, formatter=None, temperature=None, api_key=None, organization=None, base_url=None, tokenizer_name=None)[source]¶
Bases:
DeepseekModel
- Parameters:
model_name (str | None)
formatter (BaseFormatter | None)
temperature (float | None)
api_key (str | None)
organization (str | None)
base_url (str | None)
tokenizer_name (str | None)
- LLM_NAME: str | None = 'deepseek-chat'¶
- class eval_framework.llm.openai.Deepseek_chat_with_formatter(model_name=None, formatter=None, temperature=None, api_key=None, organization=None, base_url=None, tokenizer_name=None)[source]¶
Bases:
DeepseekModel
- Parameters:
model_name (str | None)
formatter (BaseFormatter | None)
temperature (float | None)
api_key (str | None)
organization (str | None)
base_url (str | None)
tokenizer_name (str | None)
- DEFAULT_FORMATTER: Callable[[], BaseFormatter] | None = functools.partial(<class 'template_formatting.formatter.HFFormatter'>, 'deepseek-ai/DeepSeek-V3.2-Exp')¶
Example formatted prompt: <|begin▁of▁sentence|><|User|>Question: What color is the night sky? <|Assistant|></think>Answer:
- Parameters:
chat_template_kwargs (dict[str, Any] | None)
- Return type:
None
- LLM_NAME: str | None = 'deepseek-chat'¶
- class eval_framework.llm.openai.Deepseek_reasoner(model_name=None, formatter=None, temperature=None, api_key=None, organization=None, base_url=None, tokenizer_name=None)[source]¶
Bases:
DeepseekModel
- Parameters:
model_name (str | None)
formatter (BaseFormatter | None)
temperature (float | None)
api_key (str | None)
organization (str | None)
base_url (str | None)
tokenizer_name (str | None)
- LLM_NAME: str | None = 'deepseek-reasoner'¶
- class eval_framework.llm.openai.OpenAIEmbeddingModel(model_name='text-embedding-3-large', formatter=None, api_key=None, organization=None, base_url=None)[source]¶
Bases:
BaseLLM
- Parameters:
model_name (str)
formatter (BaseFormatter | None)
api_key (str | None)
organization (str | None)
base_url (str | None)
- generate_embeddings(messages)[source]¶
- Return type:
list[list[float]]
- Parameters:
messages (list[Sequence[Message]])
- generate_from_messages(messages, stop_sequences=None, max_tokens=None, temperature=None)[source]¶
stop_sequences and max_tokens are injected by the task if they exist. They should be overwritten or extended with the properties of the model. This includes, but is not limited to, the stop tokens of the evaluated checkpoint (e.g. <|eot_id|> for an instruction-finetuned Llama3.1, <|endoftext|> for a pretrained Llama3.1).
This function is expected to raise errors which are caught and reported when running the eval. Please also make sure to raise an error in case of sequence length issues. We expect to always raise an error if something impedes the expected completion of a task.
Important! The completion is expected to be detokenized and to NOT contain special tokens.
Returns: List[RawCompletion]
- Return type:
list[RawCompletion]
- Parameters:
messages (list[Sequence[Message]])
stop_sequences (list[str] | None)
max_tokens (int | None)
temperature (float | None)
- logprobs(samples)[source]¶
This function is expected to raise errors which are caught and reported when running the eval. Please also make sure to raise an error in case of sequence length issues. We expect to always raise an error if something prevents the expected completion of a task.
- Return type:
list[RawLoglikelihood]
- Parameters:
samples (list[Sample])
- class eval_framework.llm.openai.OpenAIModel(model_name=None, formatter=None, temperature=None, api_key='', organization=None, base_url=None, bytes_per_token=None)[source]¶
Bases:
BaseLLM
LLM wrapper for the OpenAI API providing text/chat completions and log-probability evaluation output.
- Parameters:
model_name (str | None)
formatter (BaseFormatter | None)
temperature (float | None)
api_key (str | None)
organization (str | None)
base_url (str | None)
bytes_per_token (float | None)
- BYTES_PER_TOKEN: float = 4.0¶
- DEFAULT_FORMATTER: Callable[[], BaseFormatter] | None = None¶
- LLM_NAME: str | None = None¶
- generate_from_messages(messages, stop_sequences=None, max_tokens=None, temperature=None)[source]¶
Generate completions for a list of message sequences concurrently.
Uses text completion API when a formatter is configured, otherwise uses chat completion API.
- Parameters:
messages (list[Sequence[Message]]) – Sequence of messages.
stop_sequences (list[str] | None) – Optional list of stop sequences.
max_tokens (int | None) – Optional maximum number of tokens to generate.
temperature (float | None) – Sampling temperature.
- Return type:
list[RawCompletion]
- Returns:
List of RawCompletion objects containing prompts and completions.
- logprobs(samples)[source]¶
Compute total log-probabilities for possible completions given each sample’s prompt.
- Parameters:
samples (list[Sample]) – List of Sample objects, each with prompt messages and possible completions.
- Return type:
list[RawLoglikelihood]
- Returns:
List of RawLoglikelihood objects mapping each prompt and completion to its log-probability.
Note
Uses the OpenAI completions API with echo=True; chat logprobs are not supported.
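A usage sketch for the wrapper above. The model id is copied from the subclasses below, the API key handling is an assumption, and the messages argument is assumed to be a list[Sequence[Message]] built elsewhere.

```python
import os

from eval_framework.llm.openai import OpenAIModel

# LLM_NAME is None on the base class, so the model id is passed explicitly (placeholder).
model = OpenAIModel(
    model_name="gpt-4o-mini-2024-07-18",
    api_key=os.environ.get("OPENAI_API_KEY", ""),
    temperature=0.0,
)


def generate(messages):
    # messages: list[Sequence[Message]]. With no formatter configured,
    # the chat completion API is used under the hood.
    return model.generate_from_messages(messages, max_tokens=32)
```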
- class eval_framework.llm.openai.OpenAI_davinci_002(model_name=None, formatter=None, temperature=None, api_key='', organization=None, base_url=None, bytes_per_token=None)[source]¶
Bases:
OpenAIModel
- Parameters:
model_name (str | None)
formatter (BaseFormatter | None)
temperature (float | None)
api_key (str | None)
organization (str | None)
base_url (str | None)
bytes_per_token (float | None)
- DEFAULT_FORMATTER¶
alias of ConcatFormatter
- LLM_NAME: str | None = 'davinci-002'¶
- class eval_framework.llm.openai.OpenAI_gpt_4o_mini(model_name=None, formatter=None, temperature=None, api_key='', organization=None, base_url=None, bytes_per_token=None)[source]¶
Bases:
OpenAIModel
- Parameters:
model_name (str | None)
formatter (BaseFormatter | None)
temperature (float | None)
api_key (str | None)
organization (str | None)
base_url (str | None)
bytes_per_token (float | None)
- LLM_NAME: str | None = 'gpt-4o-mini-2024-07-18'¶
- class eval_framework.llm.openai.OpenAI_gpt_4o_mini_with_ConcatFormatter(model_name=None, formatter=None, temperature=None, api_key='', organization=None, base_url=None, bytes_per_token=None)[source]¶
Bases:
OpenAIModel
- Parameters:
model_name (str | None)
formatter (BaseFormatter | None)
temperature (float | None)
api_key (str | None)
organization (str | None)
base_url (str | None)
bytes_per_token (float | None)
- DEFAULT_FORMATTER¶
alias of ConcatFormatter
- LLM_NAME: str | None = 'gpt-4o-mini-2024-07-18'¶
eval_framework.llm.vllm module¶
- class eval_framework.llm.vllm.BaseVLLMModel(formatter=None, max_model_len=None, tensor_parallel_size=1, gpu_memory_utilization=0.9, batch_size=1, checkpoint_path=None, checkpoint_name=None, sampling_params=None, bytes_per_token=None, **kwargs)[source]¶
Bases:
BaseLLM
- Parameters:
formatter (BaseFormatter | None)
max_model_len (int | None)
tensor_parallel_size (int)
gpu_memory_utilization (float)
batch_size (int)
checkpoint_path (str | Path | None)
checkpoint_name (str | None)
sampling_params (SamplingParams | dict[str, Any] | None)
bytes_per_token (float | None)
kwargs (Any)
- BYTES_PER_TOKEN: float = 4.0¶
- DEFAULT_FORMATTER: Callable[[], BaseFormatter] | None = None¶
- LLM_NAME: str¶
- SEQ_LENGTH: int | None = None¶
- build_redis_key_from_prompt_objs(prompt_objs, sampling_params)[source]¶
Build a Redis key from a list of prompt objects and sampling parameters. TokenizedContainers are not serializable, so only the tokens and sampling params are used.
- Return type:
Any
- Parameters:
prompt_objs (list[TokenizedContainer])
sampling_params (SamplingParams)
- property formatter_output_mode: Literal['string', 'list']¶
- generate_from_messages(messages, stop_sequences=None, max_tokens=None, temperature=None)[source]¶
stop_sequences and max_tokens are injected by the task if they exist. They should be overwritten or extended with the properties of the model. This includes, but is not limited to, the stop tokens of the evaluated checkpoint (e.g. <|eot_id|> for an instruction-finetuned Llama3.1, <|endoftext|> for a pretrained Llama3.1).
This function is expected to raise errors which are caught and reported when running the eval. Please also make sure to raise an error in case of sequence length issues. We expect to always raise an error if something impedes the expected completion of a task.
Important! The completion is expected to be detokenized and to NOT contain special tokens.
Returns: List[RawCompletion]
- Return type:
list[RawCompletion]
- Parameters:
messages (list[Sequence[Message]])
stop_sequences (list[str] | None)
max_tokens (int | None)
temperature (float | None)
- logprobs(samples)[source]¶
Batched version of logprobs for improved performance.
- Return type:
list[RawLoglikelihood]
- Parameters:
samples (list[Sample])
- property max_seq_length: int¶
Returns the maximum sequence length for this model. Priority order:
1. max_model_len parameter passed to __init__
2. SEQ_LENGTH class attribute
3. Model’s actual max_model_len from config
4. Default fallback of 2048
- property name: str¶
This property is used to name the results folder and identify the eval results. Override it in the subclass with, e.g., the checkpoint name or Hugging Face model name.
- property seq_length: int | None¶
Kept for backward compatibility.
- property tokenizer: VLLMTokenizerAPI¶
- class eval_framework.llm.vllm.HFTokenizerProtocol(*args, **kwargs)[source]¶
Bases:
Protocol
- property chat_template: str | None¶
Chat template for the tokenizer.
- class eval_framework.llm.vllm.Qwen3_0_6B_VLLM(checkpoint_path=None, model_name=None, artifact_name=None, formatter=None, formatter_name=None, formatter_kwargs=None, checkpoint_name=None, max_model_len=None, tensor_parallel_size=1, gpu_memory_utilization=0.9, batch_size=1, sampling_params=None, **kwargs)[source]¶
Bases:
VLLMModel
- Parameters:
checkpoint_path (str | Path | None)
model_name (str | None)
artifact_name (str | None)
formatter (BaseFormatter | None)
formatter_name (str | None)
formatter_kwargs (dict[str, Any] | None)
checkpoint_name (str | None)
max_model_len (int | None)
tensor_parallel_size (int)
gpu_memory_utilization (float)
batch_size (int)
sampling_params (SamplingParams | dict[str, Any] | None)
kwargs (Any)
- DEFAULT_FORMATTER: Callable[[], BaseFormatter] | None = functools.partial(<class 'template_formatting.formatter.HFFormatter'>, 'Qwen/Qwen3-0.6B', chat_template_kwargs={'enable_thinking': True})¶
- Parameters:
chat_template_kwargs (dict[str, Any] | None)
- Return type:
None
- LLM_NAME: str = 'Qwen/Qwen3-0.6B'¶
- class eval_framework.llm.vllm.Qwen3_0_6B_VLLM_No_Thinking(checkpoint_path=None, model_name=None, artifact_name=None, formatter=None, formatter_name=None, formatter_kwargs=None, checkpoint_name=None, max_model_len=None, tensor_parallel_size=1, gpu_memory_utilization=0.9, batch_size=1, sampling_params=None, **kwargs)[source]¶
Bases:
VLLMModel
- Parameters:
checkpoint_path (str | Path | None)
model_name (str | None)
artifact_name (str | None)
formatter (BaseFormatter | None)
formatter_name (str | None)
formatter_kwargs (dict[str, Any] | None)
checkpoint_name (str | None)
max_model_len (int | None)
tensor_parallel_size (int)
gpu_memory_utilization (float)
batch_size (int)
sampling_params (SamplingParams | dict[str, Any] | None)
kwargs (Any)
- DEFAULT_FORMATTER: Callable[[], BaseFormatter] | None = functools.partial(<class 'template_formatting.formatter.HFFormatter'>, 'Qwen/Qwen3-0.6B', chat_template_kwargs={'enable_thinking': False})¶
- Parameters:
chat_template_kwargs (dict[str, Any] | None)
- Return type:
None
- LLM_NAME: str = 'Qwen/Qwen3-0.6B'¶
- class eval_framework.llm.vllm.TokenizedContainer(tokens, text)[source]¶
Bases:
object
Container object to store tokens and the formatted prompt.
- Parameters:
tokens (list[int])
text (str)
- text: str¶
- tokens: list[int]¶
- class eval_framework.llm.vllm.VLLMModel(checkpoint_path=None, model_name=None, artifact_name=None, formatter=None, formatter_name=None, formatter_kwargs=None, checkpoint_name=None, max_model_len=None, tensor_parallel_size=1, gpu_memory_utilization=0.9, batch_size=1, sampling_params=None, **kwargs)[source]¶
Bases:
BaseVLLMModel
A class to create VLLM instances from various model sources.
- Parameters:
checkpoint_path (str | Path | None)
model_name (str | None)
artifact_name (str | None)
formatter (BaseFormatter | None)
formatter_name (str | None)
formatter_kwargs (dict[str, Any] | None)
checkpoint_name (str | None)
max_model_len (int | None)
tensor_parallel_size (int)
gpu_memory_utilization (float)
batch_size (int)
sampling_params (SamplingParams | dict[str, Any] | None)
kwargs (Any)
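A usage sketch for the class above; the checkpoint id is a placeholder and the remaining arguments mirror the documented defaults.

```python
from eval_framework.llm.vllm import VLLMModel

# Placeholder checkpoint id; any Hugging Face model name can be used here.
llm = VLLMModel(
    model_name="Qwen/Qwen3-0.6B",
    max_model_len=4096,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.9,
    batch_size=8,
)

print(llm.name, llm.max_seq_length)  # results-folder name and resolved sequence length
```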
- class eval_framework.llm.vllm.VLLMRegistryModel(artifact_name, version='latest', formatter='', formatter_identifier='', **kwargs)[source]¶
Bases:
VLLMModel
A class to create VLLM instances from models registered in the Wandb registry. Downloads the model artifacts from Wandb and creates a local VLLM instance.
- Parameters:
artifact_name (str)
version (str)
formatter (str)
formatter_identifier (str)
kwargs (Any)
- class eval_framework.llm.vllm.VLLMTokenizer(target_mdl)[source]¶
Bases:
VLLMTokenizerAPI[str]
- Parameters:
target_mdl (str | Path)
- property chat_template: str | None¶
- encode_formatted_struct(struct)[source]¶
Encode prompt to token IDs.
- Return type:
- Parameters:
struct (str)
- class eval_framework.llm.vllm.VLLMTokenizerAPI[source]¶
Bases:
ABC, Generic
Protocol for the tokenizer interface that defines the required methods. Needed for type checking because of the vLLM tokenizer.
- property chat_template: str | None¶
- abstractmethod encode_formatted_struct(struct)[source]¶
Encode prompt to token IDs.
- Return type:
- Parameters:
struct (prompt_type)