PhaseLLM: Empowering AI-Driven Experiences
PhaseLLM is a comprehensive evaluation and workflow framework for large language models (LLMs), developed by Phase AI. This platform is designed to aid developers, product managers, and data scientists in leveraging AI models to create captivating and efficient user experiences.
Get Connected
- Keep up with the latest updates by following PhaseLLM on Twitter.
- Show your support on GitHub by starring the project.
- Explore detailed documentation and tutorials at Read the Docs.
Easy Installation
Installing PhaseLLM is straightforward using pip:
pip install phasellm
For those looking to run LLMs locally, such as with PhaseLLM's DollyWrapper, it is recommended to install the complete package:

pip install phasellm[complete]
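Once installed, a local model can be used through the same wrapper interface as the hosted ones. Below is a minimal sketch, assuming DollyWrapper exposes the same complete_chat call as the wrappers shown in the workflow later in this document; the constructor defaults and message content are illustrative:

    from phasellm.llms import DollyWrapper

    # Assumption: DollyWrapper() loads the Dolly model locally with default settings.
    dw = DollyWrapper()

    messages = [{"role": "system", "content": "You are a helpful travel assistant."},
                {"role": "user", "content": "What should I see in Kraków?"}]
    print(dw.complete_chat(messages, "assistant"))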
Additionally, for practical applications and demos, delve into the demos-and-products folder of the repository, where detailed instructions for each product can be found in its README.md.
What is PhaseLLM?
As the landscape of AI and LLMs grows with models like ChatGPT, PhaseLLM emerges as a pivotal tool. It aids in assessing and refining LLM-driven experiences, whether it's for content, products, or any AI-based interaction. With numerous models to choose from, optimizing the performance and relevance of your AI solutions becomes essential.
Key Functions of PhaseLLM:
- Unified API Interaction: Integrate effortlessly with LLMs from providers such as OpenAI, Cohere, and Anthropic (see the sketch after this list).
- Robust Evaluation Frameworks: Analyze model outputs to identify which models deliver the best user experience.
- Automated Evaluations: Use advanced models like GPT-4 to evaluate simpler ones like GPT-3, optimizing for both cost and speed.
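For illustration, here is a minimal sketch of the unified interface, reusing the wrappers and the complete_chat call that appear in the workflow below; the message content is hypothetical:

    import os
    from dotenv import load_dotenv
    from phasellm.llms import CohereWrapper, ClaudeWrapper

    load_dotenv()

    messages = [{"role": "system", "content": "You are a travel assistant."},
                {"role": "user", "content": "Suggest a weekend trip from Berlin."}]

    # The same message format and complete_chat call work across providers.
    for model in [CohereWrapper(os.getenv("COHERE_API_KEY")),
                  ClaudeWrapper(os.getenv("ANTHROPIC_API_KEY"))]:
        print(model.complete_chat(messages, "assistant"))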
PhaseLLM is an open-source initiative, aiming to expand its features to enhance model understanding and ease of product development. PhaseLLM is eager to collaborate with those working on LLM projects to support this vision.
Feature Spotlight: Travel Chatbot Prompt Evaluation
PhaseLLM simplifies the evaluation of LLMs using other LLMs. Imagine developing a travel chatbot and comparing models like Claude and Cohere with the assistance of GPT-3.5. The framework lets you plug in models and test prompts with minimal code, and it adapts to more complex requirements.
Example Workflow:
- Set Up API Keys:

    import os
    from dotenv import load_dotenv

    # Load provider API keys from a local .env file.
    load_dotenv()
    openai_api_key = os.getenv("OPENAI_API_KEY")
    anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
    cohere_api_key = os.getenv("COHERE_API_KEY")
- Initialize Your Evaluator:

    from phasellm.eval import GPTEvaluator

    # The evaluator uses an OpenAI model to judge competing responses.
    e = GPTEvaluator(openai_api_key)
- Define Experiment Parameters:

    objective = "We're building a chatbot to discuss a user's travel preferences and provide advice."
    travel_chat_starts = [
        "I'm planning to visit Poland in spring.",
        "I'm looking for the cheapest flight to Europe next week.",
        "I am trying to decide between Prague and Paris for a 5-day trip",
        "I want to visit Europe but can't decide if spring, summer, or fall would be better.",
        "I'm unsure I should visit Spain by flying via the UK or via France."
    ]
- Set Up Models:

    from phasellm.llms import CohereWrapper, ClaudeWrapper

    cohere_model = CohereWrapper(cohere_api_key)
    claude_model = ClaudeWrapper(anthropic_api_key)
- Execute the Test:

    print("Running test. 1 = Cohere, and 2 = Claude.")
    for tcs in travel_chat_starts:
        messages = [{"role": "system", "content": objective},
                    {"role": "user", "content": tcs}]

        response_cohere = cohere_model.complete_chat(messages, "assistant")
        response_claude = claude_model.complete_chat(messages, "assistant")

        # Ask the evaluator which response better serves the objective.
        pref = e.choose(objective, tcs, response_cohere, response_claude)
        print(f"{pref}")
This example highlights how PhaseLLM facilitates comprehensive model testing, letting users implement and compare major LLMs in only a few lines of code.
Reach Out
Have questions, ideas, or feedback? Contact us at w (at) phaseai (dot) com. We're eager to connect with anyone interested in exploring LLM opportunities!