This dataset is found on Kaggle and is based on the Gallup World Poll. It ranks 149 countries by happiness and provides scores for a variety of factors that may influence a country’s happiness score, including ‘Freedom to make life choices’, ‘Life expectancy’, ‘Generosity’ and ‘GDP per capita’.

The data is straightforward to load into Python and contains no missing values, with all necessary columns already in their required data types.

import pandas as pd

report = pd.read_csv('world-happiness-report-2021.csv')
# count missing values in each column (not in the column labels)
print(report.isnull().sum())
# there are no null values in the dataset so we can continue

print(report.dtypes) …
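
One detail worth double-checking when counting nulls is which object the check runs on: `DataFrame.isnull()` inspects the values, whereas `DataFrame.columns.isnull()` only inspects the column labels. The sketch below illustrates the difference on a made-up frame; the name `toy` and its contents are purely illustrative and are not taken from the happiness data.

```python
import pandas as pd

# Hypothetical stand-in table; names and values are illustrative only.
toy = pd.DataFrame({
    "Country name": ["A", "B", "C"],
    "Ladder score": [7.8, None, 5.1],
})

# Counts missing column *labels* (none here, even though a value is missing).
label_nulls = toy.columns.isnull().sum()

# Counts missing *values* per column.
value_nulls = toy.isnull().sum()

print(label_nulls)
print(value_nulls)
```

Checking the values rather than the labels is what confirms that a dataset has no gaps.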

Mushrooms come in a wide variety of shapes, sizes, and colors; some are edible, while others should be kept far away from the dinner table. For those who live in rural areas, classifying mushrooms is crucial to survival, or at least it might have been centuries ago. A skilled horticulturist might not even need a second glance to classify a mushroom. Let’s see if we can build a model that classifies them automatically from a number of variables.

Our data comes to us from a 1987 dataset found on Kaggle, now with over 8,124 records.


I’ve been in an active WhatsApp group chat with a bunch of friends for nearly 5 years, and I was curious about the habits of its participants.

WhatsApp chat data is available to download with or without media. I downloaded an 18 MB file of over 250,000 messages sent over a span of 1,737 days. The export is a text file in which each message is timestamped and labeled with its sender.

Text file from WhatsApp
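
To make the structure of such an export concrete, here is a minimal sketch of splitting one line into timestamp, sender, and message. The line layout, the regular expression, and the names `LINE_PATTERN` and `parse_line` are assumptions for illustration; real exports vary by device and locale, so the pattern would need adjusting to the actual file.

```python
import re
from datetime import datetime

# Assumed line layout (hypothetical; real exports differ by device and locale):
#   "12/31/20, 23:59 - Alice: Happy new year!"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2}) - "
    r"(?P<sender>[^:]+): (?P<message>.*)$"
)


def parse_line(line):
    """Return (timestamp, sender, message), or None if the line does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        # Likely a continuation of a multi-line message or a system notice.
        return None
    timestamp = datetime.strptime(match.group("timestamp"), "%m/%d/%y, %H:%M")
    return timestamp, match.group("sender"), match.group("message")


# Made-up sample lines, not taken from the real chat.
sample_lines = [
    "12/31/20, 23:59 - Alice: Happy new year!",
    "1/1/21, 00:01 - Bob: Same to you: cheers",
    "and this is a second line of Bob's message",
]
parsed = [parse_line(line) for line in sample_lines]
```

Lines that return `None` can be appended to the previous message or dropped, depending on how multi-line messages should be treated.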

Cleaning and Pre-processing the data

Packages used:

import pandas as pd
import re
import numpy as np
from datetime import date
import matplotlib.pyplot as plt
import seaborn as sns

Because all…

The Statistical Committee of the Republic of Armenia kindly makes publicly available the most popular newborn names from 2003 to 2019. I wanted to see if there are any interesting trends or habits in baby naming. We first download the data for both boys and girls into two comma-separated files.


import pandas as pd

# skip the first row and treat '...' as missing, then drop incomplete rows
girl = pd.read_csv('girl.csv', skiprows=1, na_values='...').dropna()
boy = pd.read_csv('boy.csv', skiprows=1, na_values='...').dropna()

The data is in wide format, so we need to convert it to long format and then add a column indicating gender to each long dataframe.

Boy names in wide format

years = [str(year) for year in range(2003, 2020)]
# drop the first row of each dataframe
girl = girl.iloc[1:, :]
boy = boy.iloc[1:, :]
girl = pd.melt(girl…
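
As a self-contained illustration of the wide-to-long reshape described above, the sketch below runs `pd.melt` on a tiny made-up table and then adds a gender column. The column name `name`, the placeholder names, and the counts are all hypothetical; the real files may be laid out differently.

```python
import pandas as pd

# Made-up wide table: one row per name, one column per year (values are counts).
wide = pd.DataFrame({
    "name": ["Name A", "Name B"],
    "2003": [10, 7],
    "2004": [12, 5],
})
years = ["2003", "2004"]

# Wide -> long: each (name, year) pair becomes its own row.
long_df = pd.melt(
    wide,
    id_vars="name",
    value_vars=years,
    var_name="year",
    value_name="count",
)

# Constant label so the boy and girl tables can be stacked later.
long_df["gender"] = "boy"

print(long_df)
```

After the same steps are applied to the other table with its own label, `pd.concat` can stack the two long dataframes into one.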

