EVA-LLM

From AI Experiments to Industrial-Grade Validation

A comprehensive open-source ecosystem for professional AI Quality Assurance.

Created by the author of the G-Eval implementation in Promptfoo, the AI automated testing and security platform

Don't just hope your AI is safe — prove it with data.

In the era of the EU AI Act and evolving global regulations, the gap between experimental R&D and production-ready verification is widening. To meet strict transparency and safety requirements, enterprises need automated AI QA tooling.

EVA-LLM provides the ecosystem to run automated testing at massive scale, transforming unpredictable model behavior into a Statistical SLA: a measurable, data-backed guarantee of how often the model meets your quality and safety criteria.

Explore GitHub Hub
Release

eva-judge

The Brain. A unified abstraction over LLM-as-a-Judge methods: G-Eval, B-Eval, and LLM-Rubric.
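
For illustration, a minimal sketch of what a unified judge abstraction can look like. The type and method names below are assumptions made for this example, not eva-judge's actual API.

```typescript
// Illustrative sketch only: these names are assumptions, not eva-judge's real API.

// A single verdict shape shared by every judging strategy.
interface JudgeVerdict {
  score: number;      // normalized to the 0..1 range
  reasoning: string;  // the judge model's explanation of the score
}

// One interface, many strategies (G-Eval, B-Eval, LLM-Rubric, ...).
interface Judge {
  evaluate(sample: {
    input: string;     // what was sent to the model under test
    output: string;    // what the model answered
    criteria: string;  // natural-language evaluation criteria
  }): Promise<JudgeVerdict>;
}

// Callers depend only on the Judge interface, so strategies stay interchangeable.
async function assertQuality(judge: Judge, input: string, output: string): Promise<void> {
  const verdict = await judge.evaluate({
    input,
    output,
    criteria: "The answer is factually correct and cites no fabricated sources",
  });
  if (verdict.score < 0.8) {
    throw new Error(`Judge rejected the output: ${verdict.reasoning}`);
  }
}
```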

Release

llm-as-a-jest

AI evaluation for complex agentic scenarios inside the industry-standard Jest workflow.
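
As a hedged example, this is how such an evaluation could sit inside an ordinary Jest test. Here runAgent and judgeOutput are hypothetical helpers standing in for your agent under test and the judging call; they are not llm-as-a-jest's actual exports.

```typescript
// Hypothetical example: runAgent and judgeOutput stand in for your own agent
// and for the judging call; they are not the real llm-as-a-jest API.
import { describe, expect, test } from "@jest/globals";

declare function runAgent(userMessage: string): Promise<string>;
declare function judgeOutput(args: {
  output: string;
  criteria: string;
}): Promise<{ score: number; reasoning: string }>;

describe("refund-support agent", () => {
  test(
    "stays polite and proposes a concrete next step",
    async () => {
      const transcript = await runAgent("I want a refund for my broken headphones.");
      const verdict = await judgeOutput({
        output: transcript,
        criteria:
          "The agent remains polite, acknowledges the problem, and proposes a concrete next step.",
      });
      // Fail the test when the judge's score drops below the agreed threshold.
      expect(verdict.score).toBeGreaterThanOrEqual(0.8);
    },
    60_000 // LLM calls are slow; extend the default Jest timeout.
  );
});
```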

Release

dark-teaming

A manifesto and methodology for measuring LLM Epistemic Honesty via Symmetry Deviation.

MVP

eva-run

The Heart. A high-performance, "fire-and-forget" I/O-bound server that scales horizontally from thousands to millions of tests.
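
As a rough illustration of the fire-and-forget pattern (the URL, endpoints, and payload shapes here are assumptions, not eva-run's actual HTTP API): the client submits a batch, immediately receives a run id, and collects results later instead of holding a connection open per test.

```typescript
// Hypothetical client-side sketch of a fire-and-forget submission.
// The URL, endpoints, and payload shape are assumptions, not eva-run's real API.
const EVA_RUN_URL = "http://localhost:8080"; // assumed local eva-run instance

async function submitBatch(tasks: Array<{ prompt: string; criteria: string }>): Promise<string> {
  // 1. Fire: enqueue the whole batch and return immediately with a run id.
  const res = await fetch(`${EVA_RUN_URL}/runs`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ tasks }),
  });
  const { runId } = (await res.json()) as { runId: string };

  // 2. Forget: no connection is held open per test; results are fetched later.
  return runId;
}

async function fetchResults(runId: string): Promise<unknown> {
  // Collect the aggregated pass/fail statistics once the batch has been processed.
  const res = await fetch(`${EVA_RUN_URL}/runs/${runId}/results`);
  return res.json();
}
```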

Release

eva-parser

A bridge for the ecosystem, converting the industry-standard Promptfoo format into internal eva-run tasks.
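
To make the conversion concrete, here is a sketch of the kind of mapping involved. The Promptfoo-style test shape follows the public promptfooconfig format (vars plus assertions), while the EvaRunTask shape is purely an assumption for this example, not eva-run's actual schema.

```typescript
// Sketch of mapping a Promptfoo-style test case onto an internal task.
// The EvaRunTask shape is an assumption; it is not eva-run's actual schema.

// A simplified subset of a Promptfoo test case (vars + assertions).
interface PromptfooTest {
  vars: Record<string, string>;
  assert: Array<{ type: string; value?: string }>;
}

interface EvaRunTask {
  prompt: string;
  checks: Array<{ kind: string; expectation?: string }>;
}

function toEvaRunTask(promptTemplate: string, test: PromptfooTest): EvaRunTask {
  // Substitute {{var}} placeholders the way Promptfoo prompts reference vars.
  const prompt = promptTemplate.replace(
    /\{\{(\w+)\}\}/g,
    (_match: string, name: string) => test.vars[name] ?? ""
  );
  return {
    prompt,
    checks: test.assert.map((a) => ({ kind: a.type, expectation: a.value })),
  };
}

// Example: one Promptfoo-style test case becomes one eva-run task.
const task = toEvaRunTask("Summarize: {{article}}", {
  vars: { article: "The EU AI Act introduces risk-based obligations..." },
  assert: [{ type: "llm-rubric", value: "Summary is faithful and under 3 sentences" }],
});
console.log(task);
```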

Release

eva-cli

A terminal interface for local debugging and seamless CI/CD integration, supporting Promptfoo formats.

MVP

eva-guard

An MCP-compatible guardrails server built on eva-judge for production runtime.

WIP

eva-audit

A Red Teaming tool focused on security and adversarial probes.

WIP

eva-web

A visual dashboard to manage high-volume test runs and analyze historical performance.