From AI Experiments to Industrial-Grade Validation
A comprehensive open-source ecosystem for professional AI Quality Assurance.
Created by the author of the G-Eval implementation in Promptfoo, the AI automated testing and security platform.
Don't just hope your AI is safe — prove it with data.
In the era of the EU AI Act and evolving global regulations, the gap between experimental R&D and production-ready verification is widening. To meet strict transparency and safety requirements, enterprises need automated AI QA tooling.
EVA-LLM provides the ecosystem to run massive-scale automated testing, transforming unpredictable model behavior into a Statistical SLA.
The Brain. A unified abstraction for LLM-as-a-Judge: G-Eval, B-Eval, and LLM-Rubric (a sketch of such an interface appears after this list).
Release: AI evaluation for complex agentic scenarios in the industry-standard Jest workflow (see the Jest sketch below).
Release: A manifesto and methodology for measuring LLM Epistemic Honesty via Symmetry Deviation (one illustrative reading appears below).
MVP: The Heart. A high-performance "Fire & Forget" I/O-bound server that scales horizontally to thousands or millions of tests (see the client sketch below).
Release: A bridge for the ecosystem, converting the industry-standard Promptfoo format into internal eva-run tasks (see the mapping sketch below).
Release: A terminal interface for local debugging and seamless CI/CD integration, supporting Promptfoo formats.
MVP: An MCP-compatible guardrails server based on eva-judge for production runtime.
WIP: A tool for Red Teaming, focused on security and adversarial probes.
WIP: A visual dashboard to manage high-volume test runs and analyze historical performance.
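
To make the unified judge abstraction concrete, here is a minimal sketch of what a shared interface could look like. Every name in it (Judge, JudgeVerdict, GEvalJudge) is a hypothetical illustration, not the actual eva-judge API.

```typescript
// Minimal sketch of a unified LLM-as-a-Judge abstraction.
// All names here (Judge, JudgeVerdict, GEvalJudge) are hypothetical
// illustrations, not the real eva-judge API.

interface JudgeVerdict {
  score: number;     // normalized to [0, 1]
  rationale: string; // the judge model's reasoning trace
}

interface Judge {
  evaluate(args: { prompt: string; output: string; criteria: string }): Promise<JudgeVerdict>;
}

// A G-Eval-style judge: score against criteria with chain-of-thought,
// then normalize the raw 1-5 score.
class GEvalJudge implements Judge {
  constructor(private callModel: (prompt: string) => Promise<string>) {}

  async evaluate(args: { prompt: string; output: string; criteria: string }): Promise<JudgeVerdict> {
    const judgePrompt =
      `Evaluate the response against the criteria.\n` +
      `Criteria: ${args.criteria}\nPrompt: ${args.prompt}\nResponse: ${args.output}\n` +
      `Think step by step, then give a score from 1 to 5 on the last line.`;
    const raw = await this.callModel(judgePrompt);
    const match = raw.trim().match(/([1-5])$/);
    const score = match ? (Number(match[1]) - 1) / 4 : 0; // map 1..5 onto 0..1
    return { score, rationale: raw };
  }
}
```

The same interface could back B-Eval or LLM-Rubric by swapping the prompting and score-aggregation strategy, which is the point of unifying them.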
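In the Jest workflow, an agentic scenario reads like any other test. The sketch below is illustrative: runAgent stands in for the system under test, the judge is a stub, and the 0.8 threshold is an arbitrary example of a per-test quality bar.

```typescript
// Hypothetical Jest test for an agentic scenario.
import { describe, it, expect } from "@jest/globals";

// Stub agent for illustration; replace with the real agent under test.
async function runAgent(task: string): Promise<string> {
  return "To get a refund, attach the receipt and ship the item back within 30 days.";
}

// Toy judge stub so the example runs standalone; in practice this would
// be an LLM-backed judge like the sketch above.
const judge = {
  async evaluate(_args: { prompt: string; output: string; criteria: string }) {
    return { score: 1, rationale: "stubbed verdict" };
  },
};

describe("support agent", () => {
  it("answers refund questions per policy", async () => {
    const task = "A customer asks how to return a damaged item for a refund.";
    const answer = await runAgent(task);
    const verdict = await judge.evaluate({
      prompt: task,
      output: answer,
      criteria: "The response follows the refund policy and invents no extra steps.",
    });
    expect(verdict.score).toBeGreaterThanOrEqual(0.8); // per-test quality bar
  });
});
```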
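Symmetry Deviation is defined in the project's manifesto; the sketch below assumes one plausible reading, that an epistemically honest judge should return mirror-image verdicts when the two candidates in a pairwise comparison are swapped, so the metric is the gap from perfect symmetry. Treat it as an illustration of the idea, not the published formula.

```typescript
// Illustrative only: assumes Symmetry Deviation measures how far a
// pairwise judge is from giving mirror-image verdicts when the two
// candidates are swapped. Not the project's published formula.
type PairwiseJudge = (a: string, b: string) => Promise<number>; // P(first candidate wins), in [0, 1]

async function symmetryDeviation(judge: PairwiseJudge, a: string, b: string): Promise<number> {
  const forward = await judge(a, b);  // A shown in first position
  const backward = await judge(b, a); // B shown in first position
  // Perfect symmetry means forward === 1 - backward; report the gap.
  return Math.abs(forward - (1 - backward));
}
```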
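"Fire & Forget" here means submission is decoupled from execution: a client enqueues a batch, gets a run id back immediately, and collects results later, which is what lets an I/O-bound server fan out across thousands of concurrent model calls. The endpoints and payload shapes below are assumptions for illustration only.

```typescript
// Hypothetical client for a Fire & Forget run server. The base URL,
// endpoint paths, and payload shapes are illustrative assumptions.
const BASE = "http://localhost:8080";

// Fire: enqueue a batch of tasks and return immediately with a run id.
async function submitRun(tasks: unknown[]): Promise<string> {
  const res = await fetch(`${BASE}/runs`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ tasks }),
  });
  const { runId } = (await res.json()) as { runId: string };
  return runId;
}

// Forget: the caller stays decoupled from execution and polls for results.
async function fetchResults(runId: string): Promise<unknown> {
  for (;;) {
    const res = await fetch(`${BASE}/runs/${runId}`);
    const body = (await res.json()) as { status: "pending" | "done"; results?: unknown };
    if (body.status === "done") return body.results;
    await new Promise((r) => setTimeout(r, 2000)); // back off between polls
  }
}
```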
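The bridge can be pictured as a schema mapping. Both shapes in the sketch are deliberately simplified assumptions: the input type only echoes the vars/assert structure of a Promptfoo test case, and the output type is a guess at what an internal eva-run task might carry.

```typescript
// Sketch of the bridge idea: map a Promptfoo-style test case onto an
// internal eva-run task. Both shapes are simplified assumptions, not
// the exact schemas of either project.
interface PromptfooishTest {
  vars: Record<string, string>;
  assert: Array<{ type: string; value?: string }>;
}

interface EvaRunTask {
  id: string;
  input: Record<string, string>;
  checks: Array<{ kind: string; expected?: string }>;
}

function toEvaRunTask(test: PromptfooishTest, index: number): EvaRunTask {
  return {
    id: `pf-${index}`,
    input: test.vars,
    checks: test.assert.map((a) => ({ kind: a.type, expected: a.value })),
  };
}
```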