Statistical Analysis in Python with Skimpy

Apr 3, 2024

What is Skimpy?

Skimpy is an open-source Python library that generates a statistical summary of a dataset. It can be used in a Jupyter Notebook as well as from the console.

It gives you a quick way to explore a dataset before deeper analysis. A single function call summarizes a pandas DataFrame and reports, for each column, details such as its data type, missing values, and basic descriptive statistics.

The name “Skimpy” likely reflects the library’s purpose: quick, concise insight into data. “Skimpy” means small or minimal, and “skim” means to examine something quickly, so the name suggests a lightweight, straightforward tool that gives a succinct summary of a dataset without unnecessary complexity.

In this article, we will explore Skimpy and use it to create a statistical summary of a dataset.

Let’s get started…

Installing required libraries

We will start by installing Skimpy with pip, using the command below.

pip install skimpy

Importing required libraries

In this step, we will import the libraries needed to load the data and create the statistical summary.

from skimpy import skim, generate_test_data
import seaborn as sns

Creating Statistical Summary

We will start by creating the statistical summary in a Jupyter notebook. The dataset we will use is the “tips” dataset that ships with seaborn.

Let’s create the Statistical Summary:

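# Load the "tips" example dataset through seaborn
df = sns.load_dataset("tips")
# Print the statistical summary for every column of the DataFrame
skim(df)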

The generated summary covers every column of the dataset. It reports the data types, the categorical columns, the amount of missing data, and more.
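
To make those pieces concrete, the sketch below recomputes a few of them with plain pandas. It is not how Skimpy works internally, only a way to cross-check what the summary reports. To stay runnable without a download, it uses a tiny hand-made DataFrame: the values are invented for illustration, and the column names simply mirror three columns of the tips dataset.

```python
import pandas as pd

# Tiny illustrative frame; the values are made up, not real tips data.
sample = pd.DataFrame(
    {
        "total_bill": [10.0, 20.0, None, 30.0],
        "tip": [1.0, 3.0, 2.0, 4.0],
        "day": pd.Categorical(["Sun", "Sat", "Sun", "Sun"]),
    }
)

# Data type of each column
dtypes = sample.dtypes

# Missing values per column, as a count and as a percentage
missing = sample.isna().sum()
missing_pct = sample.isna().mean() * 100

# Descriptive statistics (mean, std, quartiles) for the numeric columns
numeric_summary = sample.describe()

# Number of rows in each category of the categorical column
day_counts = sample["day"].value_counts()

print(dtypes, missing, missing_pct, numeric_summary, day_counts, sep="\n\n")
```

Swapping sample for the df loaded from seaborn gives numbers you can compare against the Skimpy output.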

Now let us see how to create the same kind of summary from the console. Run the command below, replacing Diabetes.csv with the path to your own CSV file.

skimpy Diabetes.csv
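
If you do not have a CSV file at hand, the sketch below writes a small one that you can point the command at. The file name, column names, and values are all hypothetical and invented for illustration; they are not taken from Diabetes.csv.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical rows, invented purely so there is a file to summarize
rows = [
    {"age": 50, "bmi": 33.6, "outcome": "yes"},
    {"age": 31, "bmi": 26.6, "outcome": "no"},
    {"age": 32, "bmi": 23.3, "outcome": "yes"},
]

# Write the rows to sample.csv inside a fresh temporary directory
csv_path = Path(tempfile.mkdtemp()) / "sample.csv"
with csv_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["age", "bmi", "outcome"])
    writer.writeheader()
    writer.writerows(rows)

# Show the console command to run against the new file
print(f"skimpy {csv_path}")
```

Running the printed command in a terminal should produce the summary for this file.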

As you can see, the statistical summary is easy to create from both the console and a notebook.

Try this with different datasets, create a Statistical Summary, and let me know your comments in the response section.

Key Takeaways

Skimpy is a Python library for generating statistical summaries of datasets.
These summaries include information like data types, categories, and missing data.

Happy Learning !!☺️

👉 If you liked this post, please leave a like ❤️ and Subscribe to DataMantra.