# LLM Evaluation
## continuous-eval
continuous-eval is an open-source tool for data-driven evaluation of LLM applications. It takes a modular approach, assessing each pipeline segment with purpose-specific metrics, and supports RAG, code generation, and classification through a range of metric types. It can incorporate user feedback and synthetic datasets for thorough testing, and custom metrics can be defined for more comprehensive evaluations.
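To illustrate the per-segment style of evaluation described above, the sketch below scores a retrieval step and a generation step independently. This is a minimal sketch only; the module paths, class names (`PrecisionRecallF1`, `DeterministicAnswerCorrectness`), and keyword arguments are assumptions based on continuous-eval's documented examples and may differ between versions.

```python
# Minimal sketch: evaluating two pipeline segments independently.
# Class names, module paths, and keyword arguments are assumptions drawn from
# continuous-eval's documented examples and may vary between versions.
from continuous_eval.metrics.retrieval import PrecisionRecallF1
from continuous_eval.metrics.generation.text import DeterministicAnswerCorrectness

# Retrieval segment: compare retrieved chunks against ground-truth chunks.
retrieval_metric = PrecisionRecallF1()
retrieval_scores = retrieval_metric(
    retrieved_context=["Paris is the capital of France."],
    ground_truth_context=["Paris is the capital and largest city of France."],
)

# Generation segment: compare the generated answer against reference answers.
generation_metric = DeterministicAnswerCorrectness()
generation_scores = generation_metric(
    answer="The capital of France is Paris.",
    ground_truth_answers=["Paris"],
)

print(retrieval_scores)   # e.g. precision / recall / F1 for the retriever
print(generation_scores)  # e.g. token-overlap scores for the generator
```

Scoring each segment separately makes it possible to tell whether a bad answer came from weak retrieval or from the generation step itself.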
## deepeval
DeepEval is an open-source tool that evaluates large language models (LLMs) using metrics such as G-Eval and answer relevancy. It runs locally, supports CI/CD workflows, and integrates with platforms like Hugging Face. DeepEval helps identify hyperparameters for optimal LLM performance and eases transitions between systems such as OpenAI and self-hosted Llama 2.
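The sketch below shows how a single test case might be scored with DeepEval's answer relevancy metric. It is a minimal example modeled on the library's documented quickstart; the exact class names, parameters, and threshold value are assumptions that may differ across versions, and LLM-judge metrics typically require an API key (e.g. for OpenAI) to be configured.

```python
# Minimal sketch of scoring one LLM output with an answer relevancy metric.
# Class names and parameters follow DeepEval's documented quickstart and are
# assumptions that may differ across versions.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What if these shoes don't fit?",                     # prompt sent to the LLM
    actual_output="We offer a 30-day full refund at no cost.",  # the LLM's answer
    retrieval_context=[                                         # context from the RAG step
        "All customers are eligible for a 30-day full refund at no extra cost."
    ],
)

# Scores how relevant the answer is to the input; threshold sets pass/fail.
metric = AnswerRelevancyMetric(threshold=0.7)

# Runs the metric over the test case; the same call can be wired into CI/CD.
evaluate(test_cases=[test_case], metrics=[metric])
```

The same pass/fail threshold pattern is what allows these checks to gate a CI/CD pipeline rather than just produce scores.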