This dataset is available on Kaggle and is based on the Gallup World Poll. It ranks 149 countries by happiness and also provides scores for a variety of factors that may influence a country’s happiness score, including ‘Freedom to make life choices’, ‘Life expectancy’, ‘Generosity’ and ‘GDP per capita’.
The data is straightforward to load into Python. It contains no missing values, and all necessary columns are already in the data types we need.
import pandas as pd
report = pd.read_csv('world-happiness-report-2021.csv')
# count missing values in each column
print(report.isnull().sum())
# there are no null values in the dataset so we can continue
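To check both claims at once (no missing values, sensible data types), a small helper along these lines may be handy. This is only a sketch: `quality_summary` is a hypothetical name, and the toy frame is made up so that the block runs without the Kaggle file.

```python
import pandas as pd

# Toy stand-in for the real report; column names and values are invented.
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "score": [7.1, 6.3, 5.2],
    "gdp": [1.2, None, 0.9],
})

def quality_summary(df):
    """Return per-column missing-value counts and dtypes side by side."""
    return pd.DataFrame({
        "missing": df.isnull().sum(),
        "dtype": df.dtypes.astype(str),
    })

summary = quality_summary(toy)
print(summary)

# A single number that should be 0 for a fully clean dataset.
total_missing = int(summary["missing"].sum())
print("total missing values:", total_missing)
```

On the real data you would call `quality_summary(report)` and expect the `missing` column to be all zeros.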
Mushrooms come in a wide variety of shapes, sizes and colors; some are edible, while others should be kept far away from the dinner table. For people living in rural areas, telling them apart can be crucial to survival, or at least it might have been centuries ago. A skilled horticulturist might not even need a second glance to classify a mushroom. Let’s see if we can build a model that classifies them automatically from a number of variables.
Our data comes from a 1987 dataset found on Kaggle, containing 8,124 records.
I’ve been in an active WhatsApp group chat with a bunch of friends for nearly 5 years, and I was curious about the habits of its participants.
WhatsApp chat data can be exported with or without media. I downloaded an 18 MB file of over 250,000 messages sent over a span of 1,737 days. The export is a plain text file in which each message is timestamped and labeled with its sender.
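As a rough illustration of how such an export can be turned into structured records, here is a minimal parsing sketch. The line layout `DD/MM/YYYY, HH:MM - Sender: message` is an assumption (the actual format depends on the phone's locale and app version), and `parse_chat` and the sample lines are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout of one exported line: "DD/MM/YYYY, HH:MM - Sender: message".
# Real exports differ by locale, so adjust the pattern and date format as needed.
PREFIX_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4}), (\d{1,2}:\d{2}) - (.*)$")

def parse_chat(lines):
    """Turn raw export lines into dicts with timestamp, sender and message."""
    messages = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = PREFIX_RE.match(line)
        if match is None:
            # No timestamp prefix: treat as a continuation of the previous message.
            if messages:
                messages[-1]["message"] += "\n" + line
            continue
        day, clock, rest = match.groups()
        sender, sep, text = rest.partition(": ")
        if not sep:
            # Timestamped line without "Sender: " (e.g. a system notice); skip it.
            continue
        stamp = datetime.strptime(f"{day} {clock}", "%d/%m/%Y %H:%M")
        messages.append({"timestamp": stamp, "sender": sender, "message": text})
    return messages

# Made-up sample lines in the assumed format.
sample = [
    "03/01/2020, 21:15 - Alice: Anyone up for dinner?",
    "03/01/2020, 21:16 - Bob: Sure,",
    "see you at eight",
    "03/01/2020, 21:17 - Alice created the group",
    "03/01/2020, 21:18 - Alice: Great",
]
parsed = parse_chat(sample)
for msg in parsed:
    print(msg["timestamp"], msg["sender"], repr(msg["message"]))
```

The resulting list of dicts can be passed straight to `pd.DataFrame(...)` for further analysis.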
Cleaning and Pre-processing the data
import pandas as pd
import numpy as np
from datetime import date
import matplotlib.pyplot as plt
import seaborn as sns
The Statistical Committee of the Republic of Armenia kindly makes publicly available the most popular newborn names from 2003 to 2019. I wanted to see if there are any interesting trends or habits in baby naming. We first download the data for both boys and girls into two comma-separated files.
girl = pd.read_csv('girl.csv', skiprows=1, na_values='...').dropna()
boy = pd.read_csv('boy.csv', skiprows=1, na_values='...').dropna()
The data is in wide format, so we need to convert to long format and then add a column indicating gender in both long dataframes.
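To make the reshaping concrete, here is a small sketch on a made-up two-name table. The column names `name` and `count`, the helper `to_long`, and the values are all assumptions for illustration; the layout of the real files may differ.

```python
import pandas as pd

# Toy wide table: one row per name, one column per year (values are invented).
wide = pd.DataFrame({
    "name": ["Name A", "Name B"],
    "2003": [120, 95],
    "2004": [110, 101],
})

def to_long(df, gender):
    """Melt year columns into rows and tag every row with the given gender."""
    long_df = pd.melt(df, id_vars="name", var_name="year", value_name="count")
    long_df["year"] = long_df["year"].astype(int)
    long_df["gender"] = gender
    return long_df

girls_long = to_long(wide, "girl")
print(girls_long)
```

Each name now appears once per year, so the girl and boy frames can be stacked with `pd.concat` and grouped by `year` or `gender`.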
# year columns 2003 to 2019, as strings
years = [str(year) for year in range(2003, 2020)]
# drop the first remaining row of each frame
girl = girl.iloc[1:, :]
boy = boy.iloc[1:, :]
girl = pd.melt(girl…