Open-source (Apache 2.0) framework for evaluating LLM apps the way Pytest tests code: assertions backed by 50+ ready-made metrics spanning LLM-as-judge, RAG, agents, conversation, and safety. Plugs into LangChain, CrewAI, OpenAI Agents, and more. Confident AI is the paid cloud platform that adds test management, dashboards, and observability on top.
Eval · Confident AI
DeepEval
Pytest-style LLM evaluation framework. Open source.
FREEMIUM · Open core · Hybrid · CLI · API
Model support
BYO key / model
Bring any provider/model for the LLM-as-judge metrics.
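To make the "Pytest-style assertion backed by a pluggable judge" idea concrete, here is a minimal standard-library sketch. It is not DeepEval's API: `ThresholdMetric`, `assert_metric`, and `keyword_overlap_judge` are hypothetical names invented for illustration, and the judge is a simple keyword-overlap stand-in where a real setup would call whichever provider/model you bring.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical judge signature: (question, answer) -> score in [0, 1].
# In a real setup this would wrap a call to the provider/model you bring.
Judge = Callable[[str, str], float]


@dataclass
class ThresholdMetric:
    """Illustrative metric: passes when the judge's score meets a threshold."""
    name: str
    judge: Judge
    threshold: float = 0.5

    def measure(self, question: str, answer: str) -> float:
        score = self.judge(question, answer)
        # Clamp so a misbehaving judge cannot push the score outside [0, 1].
        return max(0.0, min(1.0, score))


def assert_metric(metric: ThresholdMetric, question: str, answer: str) -> None:
    """Raise AssertionError (reported by Pytest as a failure) below threshold."""
    score = metric.measure(question, answer)
    assert score >= metric.threshold, (
        f"{metric.name}: score {score:.2f} is below threshold {metric.threshold:.2f}"
    )


def keyword_overlap_judge(question: str, answer: str) -> float:
    """Stand-in judge (no LLM call): fraction of question words found in the answer."""
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return len(q_words & a_words) / len(q_words) if q_words else 0.0


def test_answer_is_relevant() -> None:
    # Pytest collects functions named test_*; a failed assertion fails the test.
    metric = ThresholdMetric("relevancy", keyword_overlap_judge, threshold=0.5)
    assert_metric(
        metric,
        "refund policy duration",
        "the refund policy duration is 30 days",
    )
```

Because the judge is just a callable, swapping in a different model only means passing a different function. The metric and the assertion stay unchanged, which is the design idea behind a bring-your-own-model setup.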
Where it runs
- CLI
- API
Tags
- #eval
- #open-source
- #llm-as-judge
- #rag
- #ci