Eval · Confident AI

DeepEval

Pytest-style LLM evaluation framework. Open source.

Freemium · Open core · Hybrid · CLI · API

Open-source (Apache 2.0) framework for evaluating LLM apps the way pytest tests code: you write assertions, and 50+ ready-made metrics spanning LLM-as-judge, RAG, agents, conversation, and safety decide whether they pass. Plugs into LangChain, CrewAI, OpenAI Agents, and more. Confident AI is the paid cloud platform that adds test management, dashboards, and observability on top.

Model support

BYO key / model

Bring any provider/model for the LLM-as-judge metrics.
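
To make the "pytest-style assertion plus bring-your-own judge" idea concrete, here is a minimal sketch in plain Python. It does not use DeepEval's API. The names `ThresholdMetric`, `assert_passes`, and `keyword_overlap_judge` are hypothetical and invented for illustration, and the judge is an offline stub standing in for a call to whichever provider or model you would supply.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: a judge takes (question, answer) and returns a score in [0, 1].
# In a real setup this would wrap a call to the model/provider of your choice.
Judge = Callable[[str, str], float]


@dataclass
class ThresholdMetric:
    """Illustrative stand-in for an LLM-as-judge metric (not DeepEval's class)."""
    name: str
    judge: Judge
    threshold: float = 0.7

    def measure(self, question: str, answer: str) -> float:
        score = self.judge(question, answer)
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"judge returned out-of-range score: {score}")
        return score


def assert_passes(metric: ThresholdMetric, question: str, answer: str) -> None:
    """Fail like an ordinary pytest assertion when the score is below threshold."""
    score = metric.measure(question, answer)
    assert score >= metric.threshold, (
        f"{metric.name}: score {score:.2f} is below threshold {metric.threshold:.2f}"
    )


def keyword_overlap_judge(question: str, answer: str) -> float:
    """Offline stub judge: fraction of question keywords (longer than 3
    characters) that appear in the answer. A real judge would call a model."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    keywords = {w for w in words if len(w) > 3}
    if not keywords:
        return 0.0
    answer_lower = answer.lower()
    hits = sum(1 for w in keywords if w in answer_lower)
    return hits / len(keywords)


def test_refund_answer_is_relevant():
    # pytest collects this function by its test_ prefix.
    metric = ThresholdMetric("relevancy", keyword_overlap_judge, threshold=0.5)
    assert_passes(
        metric,
        "What is the refund policy?",
        "Our refund policy allows returns within 30 days.",
    )
```

Because the metric only depends on the `Judge` callable, swapping the stub for a function that calls a hosted or local model changes nothing else in the test. That is the design point behind "BYO key / model".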

Where it runs

DeepEval

BYO key / model

Braintrust

Promptfoo