Introduction to PandasAI
PandasAI is a Python platform for interacting with data in natural language. It lets non-technical users explore their data by asking questions, and it saves technical users time and effort on routine data handling.
Deploying PandasAI
PandasAI can be deployed in several ways. It can be used inside Jupyter notebooks or Streamlit applications, or exposed as a REST API using a framework such as FastAPI or Flask. For PandasAI Cloud or self-hosted enterprise offerings, the team encourages interested users to contact them directly.
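As a rough illustration of the REST API option, the sketch below shows request-handling logic that a FastAPI or Flask route might delegate to. It is framework-agnostic and uses only the standard library. The function name `handle_chat_request`, the `question` field, and the stand-in `fake_ask` are assumptions made for this example, not part of PandasAI; in a real deployment, `ask` would presumably be something like an agent's `chat` method.

```python
import json
from typing import Callable, Tuple


def handle_chat_request(body: bytes, ask: Callable[[str], object]) -> Tuple[int, dict]:
    """Turn a raw JSON request body into an (HTTP status, JSON-able response) pair.

    Hypothetical helper for illustration only; not part of PandasAI.
    """
    try:
        payload = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return 400, {"error": "body must be valid JSON"}

    question = payload.get("question") if isinstance(payload, dict) else None
    if not isinstance(question, str) or not question.strip():
        return 400, {"error": "'question' must be a non-empty string"}

    # Delegate to whatever answers the question (assumed: something like agent.chat).
    return 200, {"answer": str(ask(question.strip()))}


def fake_ask(question: str) -> str:
    """Stand-in for a real agent so the sketch runs without an API key."""
    return f"(answer to: {question})"


status, response = handle_chat_request(b'{"question": "Top 5 countries by sales?"}', fake_ask)
print(status, response)
```

Keeping validation separate from the web framework means the same function can sit behind either a FastAPI or a Flask route and can be tested without starting a server.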
Getting Started
The PandasAI documentation is available online and covers the full range of features. PandasAI can be used in Jupyter notebooks, in Streamlit applications, or as the client-server platform built from the repository.
Using the Platform
The platform uses a dockerized client-server architecture. Setup requires Docker: clone the repository, then build it with Docker Compose.
git clone https://github.com/sinaptik-ai/pandas-ai/
cd pandas-ai
docker-compose build
Once the build completes, start the platform:
docker-compose up
The client is then accessible at http://localhost:3000.
Using the Library
The PandasAI library can be installed with pip or poetry.
With pip:
pip install pandasai
With poetry:
poetry add pandasai
Demonstrations and Usage
PandasAI can also be tried directly in a web browser, for example on Google Colab.
Engaging with Data
Once set up, PandasAI lets users query data with plain-language questions. For instance:
import os
import pandas as pd
from pandasai import Agent
# Sample DataFrame
sales_by_country = pd.DataFrame({
"country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
"revenue": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]
})
# Setting an API key
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"
agent = Agent(sales_by_country)
agent.chat('Which are the top 5 countries by sales?')
The output is straightforward:
China, United States, Japan, Germany, United Kingdom
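Because the answer comes from a language model, it can be worth cross-checking it against a deterministic computation. The sketch below recomputes the top five from the same figures using only the standard library; the variable names are illustrative.

```python
# Same figures as the sample DataFrame above, as a plain dict.
revenue_by_country = {
    "United States": 5000, "United Kingdom": 3200, "France": 2900,
    "Germany": 4100, "Italy": 2300, "Spain": 2100, "Canada": 2500,
    "Australia": 2600, "Japan": 4500, "China": 7000,
}

# Sort country names by revenue, highest first, and keep the first five.
top5 = sorted(revenue_by_country, key=revenue_by_country.get, reverse=True)[:5]
print(", ".join(top5))
# prints: China, United States, Japan, Germany, United Kingdom
```

With pandas, `sales_by_country.nlargest(5, "revenue")` performs the same ranking directly on the DataFrame.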
Users can also ask more complex questions or request charts.
Multiple DataFrames
PandasAI can also answer queries that span multiple DataFrames, which lets users explore relationships between related datasets.
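# Example with multiple DataFrames
# (assumes PANDASAI_API_KEY is already set, as in the previous example)
import pandas as pd
from pandasai import Agent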
employees_data = {
'EmployeeID': [1, 2, 3, 4, 5],
'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}
salaries_data = {
'EmployeeID': [1, 2, 3, 4, 5],
'Salary': [5000, 6000, 4500, 7000, 5500]
}
employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)
agent = Agent([employees_df, salaries_df])
agent.chat("Who gets paid the most?")
The response:
Olivia gets paid the most.
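Answering this question requires joining the two tables on `EmployeeID`. As a sanity check on the model's answer, the sketch below performs that join by hand using only the standard library; the variable names are illustrative.

```python
# Same data as the two DataFrames above, as plain Python structures.
employees = [
    {"EmployeeID": 1, "Name": "John", "Department": "HR"},
    {"EmployeeID": 2, "Name": "Emma", "Department": "Sales"},
    {"EmployeeID": 3, "Name": "Liam", "Department": "IT"},
    {"EmployeeID": 4, "Name": "Olivia", "Department": "Marketing"},
    {"EmployeeID": 5, "Name": "William", "Department": "Finance"},
]
salaries = [
    {"EmployeeID": 1, "Salary": 5000},
    {"EmployeeID": 2, "Salary": 6000},
    {"EmployeeID": 3, "Salary": 4500},
    {"EmployeeID": 4, "Salary": 7000},
    {"EmployeeID": 5, "Salary": 5500},
]

# Index salaries by EmployeeID, then attach each salary to its employee (an inner join).
salary_by_id = {row["EmployeeID"]: row["Salary"] for row in salaries}
joined = [
    {**emp, "Salary": salary_by_id[emp["EmployeeID"]]}
    for emp in employees
    if emp["EmployeeID"] in salary_by_id
]

# Pick the joined record with the highest salary.
top_earner = max(joined, key=lambda row: row["Salary"])
print(top_earner["Name"], top_earner["Salary"])
# prints: Olivia 7000
```

In pandas, the equivalent join is `employees_df.merge(salaries_df, on="EmployeeID")`.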
Privacy and Security
PandasAI is designed with user privacy in mind. When generating Python code, it randomizes the data samples it works with so that sensitive information stays protected. Users can go further and opt out of sending data to the LLM entirely, so that only column names are shared.
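To make the two privacy levels concrete, here is a purely conceptual sketch of what a tool might share with an LLM in each case. It is not PandasAI's implementation: the function `build_llm_context`, its parameters, and the column-wise shuffling strategy are all assumptions chosen for illustration.

```python
import random


def build_llm_context(rows, columns_only=False, sample_size=3, seed=None):
    """Return the metadata (and optionally a scrambled sample) to share with an LLM.

    Hypothetical helper for illustration only; not part of PandasAI.
    """
    columns = list(rows[0].keys()) if rows else []
    context = {"columns": columns}
    if columns_only or not rows:
        # Strictest mode: share column names and nothing else.
        return context

    rng = random.Random(seed)
    # Shuffle each column independently so that sampled rows usually mix
    # values from different records rather than reproducing a real one.
    shuffled = {
        col: rng.sample([row[col] for row in rows], k=len(rows))
        for col in columns
    }
    n = min(sample_size, len(rows))
    context["sample"] = [{col: shuffled[col][i] for col in columns} for i in range(n)]
    return context


rows = [
    {"Name": "John", "Salary": 5000},
    {"Name": "Emma", "Salary": 6000},
    {"Name": "Liam", "Salary": 4500},
    {"Name": "Olivia", "Salary": 7000},
]
print(build_llm_context(rows, columns_only=True))  # {'columns': ['Name', 'Salary']}
print(build_llm_context(rows, sample_size=2, seed=0))
```

Column-wise shuffling keeps each column's value distribution realistic, which helps code generation, while weakening the link between any name and its salary. It does not hide the individual values themselves, so the column-names-only mode remains the stricter option.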
Licensing and Contributions
PandasAI is distributed under the MIT license, although some directories may carry different licensing terms. The project welcomes contributions and supports users with documentation and a community on Discord.
The project thrives on collaboration, and contributors are acknowledged for their valuable input in enhancing PandasAI's capabilities.
In summary, PandasAI lets both non-technical and technical users work with their data through natural language queries.