A converter that translates Promptfoo test files into eva-run tasks for the EVA-LLM ecosystem.

NOTE! It supports a restricted subset of the Promptfoo format and extends it with its own features (see the examples below).

Quick Start

```sh
npm i @eva-llm/eva-parser
```

```ts
import { parsePromptfoo } from '@eva-llm/eva-parser';

const evaTests = parsePromptfoo(promptfooYamlContent);
```
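Here, `promptfooYamlContent` is the raw YAML text of a Promptfoo-style config. A minimal example, using only the provider and assert types documented below, might look like:

```yaml
providers:
  - openai:gpt-4.1-mini
prompts:
  - What is the capital of France?
test:
  - assert:
      - type: contains
        value: Paris
```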

Supported Promptfoo Items

Providers

```yaml
providers:
  - openai:gpt-5-mini
  - openai:gpt-4.1-mini
```

A provider can also carry an id and a config block:

```yaml
providers:
  - id: openai:gpt-5.2
    config:
      temperature: 0
```

Prompts

```yaml
prompts:
  - Hello, how are you?
  - What is the capital of France?
```

Prompts may contain `{{variable}}` placeholders:

```yaml
prompts:
  - What is the capital of {{country}}
```

Variables

```yaml
test:
  - vars:
      country: France
```
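Variables are substituted into prompt templates. Combining the two sections above, the following config renders the prompt as "What is the capital of France":

```yaml
prompts:
  - What is the capital of {{country}}
test:
  - vars:
      country: France
```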

Asserts

NOTE! All LLM asserts natively support Dark Teaming to measure Epistemic Honesty via Symmetry Deviation, and extend the Promptfoo format with the must_fail field.

b-eval (binary g-eval - eva-llm specific)

```yaml
test:
  - assert:
      - type: b-eval
        value: answer is coherent to question # can be an array as well
        threshold: 0.5 # optional (default is 0.5 in eva-run)
        provider: # optional (default is the test provider)
          - id: openai:gpt-4.1-mini
            config:
              temperature: 0 # optional (default is 0 in eva-run, the factual standard for better judging)
        must_fail: true # optional (default false, eva-run specific) - Dark Teaming field
        answer_only: true # optional (default false, eva-run specific) - analyze only the LLM answer without prompt involvement
```

g-eval

```yaml
test:
  - assert:
      - type: g-eval
        value: answer is coherent to question # can be an array as well
        threshold: 0.5 # optional (default is 0.5 in eva-run)
        provider: # optional (default is the test provider)
          - id: openai:gpt-4.1-mini
            config:
              temperature: 0 # optional (default is 0 in eva-run, the factual standard for better judging)
        must_fail: true # optional (default false, eva-run specific) - Dark Teaming field
        answer_only: true # optional (default false, eva-run specific) - analyze only the LLM answer without prompt involvement
```

llm-rubric

```yaml
test:
  - assert:
      - type: llm-rubric
        value: answer is polite # can be an array as well
        threshold: 0.5 # optional (default is 0.5 in eva-run)
        provider: # optional (default is the test provider)
          - id: openai:gpt-4.1-mini
            config:
              temperature: 0 # optional (default is 0 in eva-run, the factual standard for better judging)
        must_fail: true # optional (default false, eva-run specific) - Dark Teaming field
```

equals

```yaml
test:
  - assert:
      - type: equals
        value: Paris
        case_sensitive: false # optional (default true, eva-run specific)
```

not-equals

```yaml
test:
  - assert:
      - type: not-equals
        value: Chicago
        case_sensitive: false # optional (default true, eva-run specific)
```

contains

```yaml
test:
  - assert:
      - type: contains
        value: Paris
        case_sensitive: false # optional (default true, eva-run specific)
```

not-contains

```yaml
test:
  - assert:
      - type: not-contains
        value: Chicago
        case_sensitive: false # optional (default true, eva-run specific)
```

regex

```yaml
test:
  - assert:
      - type: regex
        value: /paris/i
```
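Putting the pieces together, a single config can combine providers, prompts, variables, and asserts. A complete sketch, reusing only the fields documented above (whether vars and assert may share one test entry follows the usual Promptfoo conventions):

```yaml
providers:
  - openai:gpt-4.1-mini
prompts:
  - What is the capital of {{country}}
test:
  - vars:
      country: France
    assert:
      - type: contains
        value: Paris
        case_sensitive: false
      - type: llm-rubric
        value: answer is polite
```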

License

MIT