A converter that translates Promptfoo test configurations into eva-run tasks for the EVA-LLM ecosystem.
NOTE! It supports a restricted subset of the Promptfoo format and extends it with features of its own (see the examples below).
Quick Start
npm i @eva-llm/eva-parser
import { parsePromptfoo } from '@eva-llm/eva-parser';
const evaTests = parsePromptfoo(promptfooYamlContent);
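For orientation, here is a minimal end-to-end config assembled from the fragments documented in the sections below; the provider, prompt, and assert values are illustrative:

```yaml
providers:
  - openai:gpt-4.1-mini

prompts:
  - What is the capital of {{country}}

test:
  - vars:
      country: France
    assert:
      - type: contains
        value: Paris
```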
Supported Promptfoo Items
Providers
providers:
  - openai:gpt-5-mini
  - openai:gpt-4.1-mini

providers:
  - id: openai:gpt-5.2
    config:
      temperature: 0
Prompts
prompts:
  - Hello, how are you?
  - What is the capital of France?

prompts:
  - What is the capital of {{country}}
Variables
test:
  - vars:
      country: France
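Vars are interpolated into the {{...}} placeholders in prompts. A minimal TypeScript sketch of that substitution, assuming plain {{var}} interpolation (the helper name renderPrompt is illustrative, not part of the package API):

```typescript
// Illustrative sketch of {{var}} substitution as seen in the Promptfoo-style
// examples; eva-run's actual templating may differ.
function renderPrompt(template: string, vars: Record<string, string>): string {
  // Replace every {{name}} placeholder with the matching var, or "" if absent.
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, name) => vars[name] ?? "");
}

console.log(renderPrompt("What is the capital of {{country}}", { country: "France" }));
// → What is the capital of France
```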
Asserts
NOTE! All LLM-based asserts natively support Dark Teaming to measure Epistemic Honesty via Symmetry Deviation, and extend the Promptfoo format with the must_fail field.
b-eval (binary g-eval - eva-llm specific)
test:
  - assert:
      - type: b-eval
        value: answer is coherent to question # can also be an array
        threshold: 0.5 # optional (default is 0.5 in eva-run)
        provider: # optional (default is the test provider)
          - id: openai:gpt-4.1-mini
            config:
              temperature: 0 # optional (eva-run defaults to 0 as the de facto standard for better judging)
        must_fail: true # optional (default false, eva-run specific) - Dark Teaming field
        answer_only: true # optional (default false, eva-run specific) - analyze only the LLM answer, without the prompt
g-eval
test:
  - assert:
      - type: g-eval
        value: answer is coherent to question # can also be an array
        threshold: 0.5 # optional (default is 0.5 in eva-run)
        provider: # optional (default is the test provider)
          - id: openai:gpt-4.1-mini
            config:
              temperature: 0 # optional (eva-run defaults to 0 as the de facto standard for better judging)
        must_fail: true # optional (default false, eva-run specific) - Dark Teaming field
        answer_only: true # optional (default false, eva-run specific) - analyze only the LLM answer, without the prompt
llm-rubric
test:
  - assert:
      - type: llm-rubric
        value: answer is polite # can also be an array
        threshold: 0.5 # optional (default is 0.5 in eva-run)
        provider: # optional (default is the test provider)
          - id: openai:gpt-4.1-mini
            config:
              temperature: 0 # optional (eva-run defaults to 0 as the de facto standard for better judging)
        must_fail: true # optional (default false, eva-run specific) - Dark Teaming field
equals
test:
  - assert:
      - type: equals
        value: Paris
        case_sensitive: false # optional (default true, eva-run specific)
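A tiny TypeScript sketch of what case_sensitive: false plausibly means for string asserts like equals; the helper is hypothetical, not eva-run's actual implementation:

```typescript
// Hypothetical helper illustrating case_sensitive handling for the
// equals assert; eva-run's real comparison logic may differ.
function equalsAssert(output: string, expected: string, caseSensitive = true): boolean {
  return caseSensitive
    ? output === expected
    : output.toLowerCase() === expected.toLowerCase();
}

console.log(equalsAssert("PARIS", "Paris", false)); // true
console.log(equalsAssert("PARIS", "Paris")); // false
```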
not-equals
test:
  - assert:
      - type: not-equals
        value: Chicago
        case_sensitive: false # optional (default true, eva-run specific)
contains
test:
  - assert:
      - type: contains
        value: Paris
        case_sensitive: false # optional (default true, eva-run specific)
not-contains
test:
  - assert:
      - type: not-contains
        value: Chicago
        case_sensitive: false # optional (default true, eva-run specific)
regex
test:
  - assert:
      - type: regex
        value: /paris/i
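The /paris/i value uses JavaScript-style regex literal syntax. A hedged TypeScript sketch of how such a string could be turned into a RegExp (the helper name is illustrative, not part of @eva-llm/eva-parser's public API):

```typescript
// Hypothetical sketch: parse a Promptfoo-style regex value such as "/paris/i"
// into a RegExp. The actual parser may handle this differently.
function parseRegexValue(value: string): RegExp {
  // Values wrapped in slashes may carry trailing flags, e.g. "/paris/i".
  const match = value.match(/^\/(.*)\/([a-z]*)$/s);
  if (match) {
    return new RegExp(match[1], match[2]);
  }
  // A bare string is treated as a pattern with no flags.
  return new RegExp(value);
}

console.log(parseRegexValue("/paris/i").test("Paris")); // true
```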
License
MIT