
Pandas AI: 5 Examples to Talk with Your Pandas Dataframe

Unleashing the Power of Conversational Data Analysis with Pandas AI

What is Pandas AI?

Step into a world where data analysis is as natural as having a conversation. That's what you get with Pandas AI. It is a cutting-edge Python library that revolutionizes data manipulation by bringing together the robustness of the Pandas library and the power of generative artificial intelligence.

Unlike traditional data manipulation libraries, Pandas AI lets you query your datasets in natural language, which reduces the amount of code data scientists and analysts have to write. Think of it as a conversational interface on top of your favorite Pandas library that streamlines data preparation.

Quick Start: How to Install Pandas AI

Here is a basic setup:

# Install the necessary packages (notebook syntax; drop the "!" in a shell)
!pip install pandas pandasai

# Import the necessary modules
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

# Load the OpenAI LLM
OPENAI_API_KEY = "{Your_API_Key}"
llm = OpenAI(api_token=OPENAI_API_KEY)

# Create a DataFrame from sample data
df = pd.read_csv("demo.csv")

# Print out the dataset (displays automatically in a notebook cell)
df

# Create PandasAI object
pandas_ai = PandasAI(llm)
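The setup above stores the API key directly in the script, which makes it easy to leak when the file is shared. A minimal sketch of one alternative, assuming the key has been exported in an environment variable: the variable name OPENAI_API_KEY and the helper load_api_key are illustrative choices, not part of PandasAI.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment instead of hardcoding it.

    The variable name is an assumption for this sketch; use whatever
    name your own environment defines.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key


# Usage with the setup shown earlier (requires pandasai to be installed):
# llm = OpenAI(api_token=load_api_key())
```

Failing loudly when the variable is missing avoids passing an empty token to the LLM client and getting a less obvious error later.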

Prepare Data with Pandas AI

With Pandas AI, data preparation becomes more conversational and interactive. We can load and inspect data using a conversational approach. For instance, suppose we have a DataFrame named 'df' loaded as demonstrated in the previous section.

# Print the first few entries in a conversational manner
df_head = pandas_ai.run(
    df,
    prompt='''
    Show me the first few entries of the DataFrame
    ''',
    is_conversational_answer=False,
)

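The run method of the PandasAI object sends a natural-language request to the LLM and applies the result to the DataFrame. The prompt holds the request, while the is_conversational_answer parameter controls the format of the response: with False you get the raw result, and with True the answer is rephrased as a conversational sentence.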

Data Cleaning and Data Transformation with Pandas AI

Pandas AI comes equipped with powerful tools to clean and transform your data. For instance, consider filling missing values and removing duplicates:

# Fill missing data and remove duplicates
df_clean = pandas_ai.run(
    df,
    prompt='''
    give me a dataframe that
    1) fills in the missing data
    2) deletes the duplicated rows
    ''',
    is_conversational_answer=False,
)

In the above example, the prompt includes a task list for the AI. The AI interprets these tasks and performs them on the DataFrame, returning a cleaned DataFrame.
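For comparison, the same two tasks can be written by hand in plain pandas. The sketch below is not the code PandasAI generates; it is one reasonable interpretation of the prompt. The sample data and the fill strategy (median for numeric columns, the label "unknown" for everything else) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values and one duplicated row.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cara"],
    "age": [30.0, 25.0, 25.0, np.nan],
    "city": ["Oslo", "Rome", "Rome", None],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values, then drop exact duplicate rows."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Assumption: numeric gaps are filled with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Assumption: non-numeric gaps get a placeholder label.
            out[col] = out[col].fillna("unknown")
    return out.drop_duplicates().reset_index(drop=True)


df_clean = clean(df)
print(df_clean)
```

Writing the steps out makes the ambiguity in the prompt visible: "fills in the missing data" does not say with what, so it is worth checking which strategy the model actually chose.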

Predictive Analytics

PandasAI can also simplify predictive analytics tasks. In this example, we'll use the Titanic dataset. Suppose we want to predict the survival chances of passengers based on the available data. The first step is to prepare a model.

model = pandas_ai(
    df,
    prompt='Create a predictive model for Survival based on Pclass, Sex, Age, and Fare',
    show_code=True,
)

The returned object, model, is a predictive model trained on the specified features. If show_code is set to True, the code that generated the model is printed as well.

The next step is to use this model to make predictions. Suppose we have another dataset df_test that contains information about passengers for whom we want to predict survival. We can do this as follows:

predictions = pandas_ai(
    df_test,
    prompt='Predict Survival using the model',
    model=model,
    show_code=True,
)

The predictions will be stored in the predictions object. Again, if show_code is set to True, the code to perform this prediction will also be printed out.

Please note that the quality of these predictions heavily depends on the quality of your model, which in turn is reliant on the quality and relevancy of your input data. It's always important to make sure your data is clean, relevant, and diverse enough to make accurate predictions.
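To make the two-step train-then-predict workflow concrete, here is a deliberately simple baseline written with pandas alone. It is a stand-in, not the code PandasAI produces: it uses only Pclass and Sex, learns the survival rate of each group, and predicts survival when that rate is at least 0.5. The sample rows and the helper names fit_group_baseline and predict_group_baseline are hypothetical; in practice you would typically reach for a modelling library such as scikit-learn.

```python
import pandas as pd

# Hypothetical Titanic-style training rows.
df = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 3, 2, 2, 1],
    "Sex": ["female", "male", "male", "female",
            "male", "female", "male", "female"],
    "Survived": [1, 0, 0, 1, 0, 1, 0, 1],
})

# Hypothetical passengers whose outcome we want to predict.
df_test = pd.DataFrame({
    "Pclass": [1, 3, 2],
    "Sex": ["female", "male", "male"],
})


def fit_group_baseline(train, features, target):
    """Learn the mean of `target` for each combination of feature values."""
    rates = train.groupby(features)[target].mean().rename("rate").reset_index()
    return {"rates": rates, "overall": train[target].mean(), "features": features}


def predict_group_baseline(model, test):
    """Predict 1 when the learned group rate is at least 0.5, else 0."""
    merged = test.merge(model["rates"], on=model["features"], how="left")
    # Groups never seen in training fall back to the overall rate.
    rate = merged["rate"].fillna(model["overall"])
    return (rate >= 0.5).astype(int)


model = fit_group_baseline(df, ["Pclass", "Sex"], "Survived")
predictions = predict_group_baseline(model, df_test)
print(predictions.tolist())
```

Even a baseline this small shows why data quality matters: every prediction is only as good as the group rates learned from the training rows.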

Anomaly Detection

PandasAI can also assist in detecting anomalies in your data, a crucial aspect of data analysis and machine learning. For instance, if you have a dataset containing network traffic logs and you want to identify unusual patterns, you can use PandasAI for this task.

anomalies = pandas_ai(
    df,
    prompt='Detect anomalies in network traffic',
    show_code=True,
)

The anomalies object will contain rows from the original dataframe that have been identified as anomalies, based on the LLM's understanding of the dataset. The specific code to perform this detection will also be printed out if show_code is set to True.

It's important to remember that the effectiveness of this feature would heavily depend on the LLM's understanding of what constitutes an anomaly in the specific context of your data.
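Because the definition of "anomaly" is left to the model, it helps to have an explicit rule to compare against. The sketch below uses the common interquartile range (IQR) rule in plain pandas; this is a swapped-in technique for illustration, not what PandasAI does internally. The sample log, the column names, and the helper iqr_outliers are hypothetical.

```python
import pandas as pd

# Hypothetical network traffic log; column names are illustrative only.
df = pd.DataFrame({
    "src_ip": [f"10.0.0.{i}" for i in range(1, 11)],
    "bytes_sent": [500, 520, 480, 510, 495, 505, 515, 490, 50_000, 500],
})


def iqr_outliers(frame: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Return rows whose value lies more than k * IQR outside the quartiles."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = (frame[column] < lower) | (frame[column] > upper)
    return frame[mask]


anomalies = iqr_outliers(df, "bytes_sent")
print(anomalies)
```

Like the PandasAI call, this returns the rows of the original DataFrame that were flagged, so the two results can be compared row by row.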

The above examples showcase just a few of the many capabilities of the PandasAI library. This library integrates natural language understanding capabilities of LLMs into Pandas, one of the most popular data analysis libraries in Python, to make data analytics more accessible to non-programmers and simplify the workflow for experienced data analysts. Remember to thoroughly read and understand the terms of service for each LLM provider you use, to ensure compliance with data privacy and usage guidelines.

Expanding the Power of Pandas with PandasAI

As data scientists and analysts, we're always on the lookout for tools that can make our lives easier and our work more efficient. One such tool that's been gaining traction in the data science community is PandasAI, an advanced natural language processing tool that integrates with pandas, the go-to data manipulation library for Python.

PandasAI leverages large language models (LLMs) to allow us to communicate with our data using natural language. It simplifies data manipulation and analysis tasks by reducing the need for extensive coding. It does not replace Pandas, but rather complements it, making it more accessible and powerful.

Now, it's time to dive into a practical example of using PandasAI. We'll be using a supermarket sales dataset for our demonstration. We've selected just three columns from this dataset: 'Gender', 'Product line', and 'Total'. After setting up our environment and reading in our data, we can instantiate a PandasAI object using an OpenAI API key.

With PandasAI set up, it's easy to extract information from our dataframe. For example, to find out what products are in the 'Product line' column, we can just ask the pandas_ai object:

pandas_ai.run(df, prompt="Which products are in Product line")

What if we want to do something more complex, like calculate the total amount spent by each gender? Again, all we need to do is ask:

pandas_ai.run(df, prompt="Calculate the total spent by each gender")

Even though these tasks could be accomplished using traditional pandas methods, PandasAI shines in its ability to handle such queries efficiently with minimal coding.
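For reference, the traditional pandas versions of the two queries can be sketched as follows. The rows are hypothetical and only mimic the shape of the supermarket sales data, with the three columns 'Gender', 'Product line', and 'Total'.

```python
import pandas as pd

# Hypothetical rows in the shape of the supermarket sales dataset.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male", "Female"],
    "Product line": [
        "Health and beauty",
        "Electronic accessories",
        "Sports and travel",
        "Health and beauty",
        "Electronic accessories",
    ],
    "Total": [100.0, 50.0, 25.5, 80.0, 40.0],
})

# "Which products are in Product line": the distinct values of the column.
product_lines = sorted(df["Product line"].unique().tolist())

# "Calculate the total spent by each gender": group, then sum.
total_by_gender = df.groupby("Gender")["Total"].sum()

print(product_lines)
print(total_by_gender)
```

Having the hand-written version next to the prompt is also a quick way to sanity-check the numbers the model returns.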

PandasAI can also generate plots. In our experiment, however, the library struggled to plot an accurate bar chart of the total amount spent by each gender: it calculated the amounts correctly, but the visualization was off. This might be an issue with the dataset or with the library itself, and it may be fixed in future updates.

Even with a few glitches, PandasAI still demonstrates its utility when it comes to pivot tables and data visualization. With a little help from traditional pandas coding, we managed to create a barplot illustrating how much money each gender spent on each product line.
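The pandas side of that workflow can be sketched as a pivot table with one row per product line and one column per gender. The rows below are hypothetical stand-ins for the supermarket sales data, and the exact code used in the original experiment is not shown in this article.

```python
import pandas as pd

# Hypothetical rows in the shape of the supermarket sales dataset.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male", "Female"],
    "Product line": [
        "Health and beauty",
        "Electronic accessories",
        "Sports and travel",
        "Health and beauty",
        "Electronic accessories",
    ],
    "Total": [100.0, 50.0, 25.5, 80.0, 40.0],
})

# Sum of 'Total' for every (product line, gender) pair; combinations
# with no sales are filled with 0 instead of NaN.
pivot = df.pivot_table(
    index="Product line",
    columns="Gender",
    values="Total",
    aggfunc="sum",
    fill_value=0,
)

print(pivot)
```

With matplotlib installed, calling pivot.plot(kind="bar") on this table draws the grouped bar chart, one group of bars per product line.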

Conclusion

In conclusion, while PandasAI may not be perfect, it does bring a fresh approach to data analysis by blending the power of natural language processing and data manipulation. It has potential for those who are looking to reduce their coding load and approach data analysis from a more intuitive perspective.

Remember, for more detailed information about PandasAI and to stay updated with its latest features, be sure to visit the official GitHub repository.

FAQ

And now, let's wrap up with some Frequently Asked Questions.

  1. What is the main purpose of PandasAI?

PandasAI is designed to make data analysis easier and more accessible by allowing users to interact with their dataframes using natural language. This can simplify tasks that would typically require more complex code.

  2. Does PandasAI replace pandas?

No, PandasAI does not replace pandas. It complements it by providing an additional, user-friendly layer of functionality. You can still use traditional pandas methods alongside PandasAI.

  3. What are some limitations of PandasAI?

As of now, PandasAI seems to struggle with some complex data visualizations. However, the library is still being updated, and these issues are likely to be resolved in the future.

