
eva-desk is a lightweight, high-performance orchestration engine for managing eva-run clusters.

It acts as a centralized control plane and test load balancer, using a Redis-based message bus to distribute tasks and tolerate node failures.


Quick Start

git clone https://github.com/eva-llm/eva-desk
cd eva-desk
nvm use
pnpm install
export CLUSTER_REDIS_URL="redis://..."
export DATABASE_URL="postgresql://..."
pnpm run server

Once the control plane is active, eva-run worker nodes can be provisioned and connected.

Infrastructure Note: For production deployments, manage node scaling with orchestration or provisioning tooling such as Kubernetes (K8s), Docker Swarm, or Terraform, so that resilience is handled at the infrastructure level.


Operating Principle

eva-desk implements a fault-tolerant architecture for cluster management, aimed at reliable task delivery for large-scale evaluation workloads.

Key Mechanisms:

  • Heartbeat Monitoring: Each eva-run node maintains a persistent heartbeat, reporting its health status and available capacity to the orchestrator.

  • Adaptive Balancing: Tasks are dynamically distributed across the cluster based on real-time node telemetry and completion rates.

  • Horizontal Scalability: The architecture is designed for horizontal scaling and can orchestrate test runs of millions of scenarios across heterogeneous clusters.

  • Self-Healing Queue: Uses non-blocking logic so that if a node fails, its tasks are transparently re-queued and re-routed to healthy instances.
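
The heartbeat, balancing, and re-queue mechanisms above can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not eva-desk's implementation: the ClusterSketch class, its field names, the timeout rule, and the "most free slots first" policy are all hypothetical, and the real engine coordinates through Redis rather than a local Map.

```typescript
// Hypothetical node record; field names are illustrative, not eva-desk's wire format.
interface NodeState {
  id: string;
  lastHeartbeatMs: number; // time of the most recent heartbeat
  freeSlots: number;       // capacity reported in that heartbeat
  inFlight: string[];      // task ids currently assigned to the node
}

class ClusterSketch {
  private nodes = new Map<string, NodeState>();
  readonly queue: string[] = [];

  // timeoutMs: how long a node may stay silent before it is considered dead (assumed rule).
  constructor(private readonly timeoutMs: number) {}

  enqueue(taskId: string): void {
    this.queue.push(taskId);
  }

  // Record a heartbeat carrying the node's currently available capacity.
  heartbeat(id: string, freeSlots: number, nowMs: number): void {
    const node = this.nodes.get(id) ?? { id, lastHeartbeatMs: nowMs, freeSlots, inFlight: [] };
    node.lastHeartbeatMs = nowMs;
    node.freeSlots = freeSlots;
    this.nodes.set(id, node);
  }

  // Mark a task as finished so it is not re-queued if the node later fails.
  complete(nodeId: string, taskId: string): void {
    const node = this.nodes.get(nodeId);
    if (node) node.inFlight = node.inFlight.filter(t => t !== taskId);
  }

  // Drop silent nodes, then hand queued tasks to the live node with the most free slots.
  dispatch(nowMs: number): void {
    this.reapDeadNodes(nowMs);
    while (this.queue.length > 0) {
      const best = Array.from(this.nodes.values())
        .filter(n => n.freeSlots > 0)
        .sort((a, b) => b.freeSlots - a.freeSlots)[0];
      if (!best) return; // no capacity left; remaining tasks stay queued
      best.inFlight.push(this.queue.shift()!);
      best.freeSlots -= 1;
    }
  }

  // Snapshot of which tasks are assigned to which node.
  assignments(): Record<string, string[]> {
    const out: Record<string, string[]> = {};
    this.nodes.forEach((node, id) => { out[id] = node.inFlight.slice(); });
    return out;
  }

  // Self-healing step: unfinished tasks of a dead node go back to the front of the queue.
  private reapDeadNodes(nowMs: number): void {
    this.nodes.forEach((node, id) => {
      if (nowMs - node.lastHeartbeatMs > this.timeoutMs) {
        this.queue.unshift(...node.inFlight);
        this.nodes.delete(id);
      }
    });
  }
}
```

In this model a node that stops sending heartbeats is removed on the next dispatch call, and its unfinished tasks are picked up by whichever remaining node reports spare capacity.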


API Reference

POST /eval

Interface Compatibility: Fully compliant with the eva-run API.

Enables seamless eva-cli integration for manual execution and debugging.

Note: This endpoint is restricted during active automated test runs to prevent state conflicts.

POST /run

Payload: { "run_id": "uuid_v7" }

Triggers an automated evaluation cycle. The orchestrator retrieves test schemas from the PostgreSQL repository associated with the run_id and initiates the distribution protocol across the active cluster.
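
A client might prepare the POST /run request as in the sketch below. It assumes that "uuid_v7" in the payload means a version 7 UUID in canonical text form and that the server expects a JSON body with a Content-Type header; buildRunRequest and the base URL are hypothetical names used only for illustration.

```typescript
// Canonical text form of a version 7 UUID: version nibble 7, variant nibble 8, 9, a, or b.
const UUID_V7 = /^[0-9a-f]{8}-[0-9a-f]{4}-7[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

interface RunRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: validates the run_id locally and builds the request
// without sending it, so the caller can pass it to any HTTP client.
function buildRunRequest(baseUrl: string, runId: string): RunRequest {
  if (!UUID_V7.test(runId)) {
    throw new Error(`run_id is not a UUIDv7: ${runId}`);
  }
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/run`, // tolerate a trailing slash in baseUrl
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" }, // assumed, not stated in the source
      body: JSON.stringify({ run_id: runId }),
    },
  };
}
```

With a runtime that provides fetch, the result can be sent as fetch(req.url, req.init), where req is the returned object. Validating the identifier before sending avoids a round trip for an obviously malformed run_id.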

GET /current_run

Returns the identifier of the execution context currently being processed by the cluster.

GET /runs_queue

Retrieves the prioritized queue of run_ids awaiting orchestration.

GET /nodes

Provides a real-time registry of active, authorized nodes within the cluster.

GET /health

Standard health check endpoint for monitoring and ingress controllers.
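
A deployment script could poll GET /health before provisioning worker nodes. The sketch below assumes only that a healthy server answers with a successful HTTP status; the response body is not specified here, so it is ignored. waitForHealthy and the Probe type are hypothetical, and the probe is injected so the function does not depend on a particular HTTP client.

```typescript
// A probe performs one health check and reports whether the status was successful.
type Probe = () => Promise<{ ok: boolean }>;

// Polls the probe up to `attempts` times, waiting `delayMs` between tries.
// Resolves to true on the first successful probe, false if every attempt fails.
async function waitForHealthy(probe: Probe, attempts: number, delayMs: number): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      if ((await probe()).ok) return true;
    } catch {
      // A rejected probe (for example, connection refused) counts as "not ready yet".
    }
    if (i < attempts - 1) {
      await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    }
  }
  return false;
}

// Example probe, assuming a runtime with fetch and a server on a hypothetical port:
//   waitForHealthy(() => fetch("http://localhost:3000/health"), 10, 500)
```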