
I found a hidden gem in Matplotlib’s library: Packed Bubble Charts in Python | by Anna Gordun Peiro | Jul, 2024

For my chart, I am using an Olympic Historical Dataset from Olympedia.org, which Joseph Cheng shared on Kaggle with a public domain license.

Figure: Screenshot of dataset

It contains event-level to athlete-level Olympic Games results from Athens 1896 to Beijing 2022. After an EDA (Exploratory Data Analysis), I transformed it into a dataset that details the number of female athletes in each sport/event per year. My bubble chart idea is to show which sports have a 50/50 female-to-male athlete ratio and how it has evolved over time.

My plotting data is composed of two different datasets, one for each year: 2020 and 1996. For each dataset I've computed the total sum of athletes that participated in each event (athlete_sum) and how much that sum represents compared to the total number of athletes, male plus female (difference). See a screenshot of the data below:

Figure: Screenshot of plotting dataset

This is my approach to visualise it:

- Size proportion. Using the radius of the bubbles to compare the number of athletes per sport. Bigger bubbles will represent highly competitive events, such as Athletics.
- Multi-variable interpretation. Making use of colours to represent female representation. Light green bubbles will represent events with a 50/50 split, such as Hockey.

Here is my starting point (using the code and approach from above):

Figure: First result

Some easy fixes: increasing the figure size, and changing labels to empty if the size isn't over 250 to avoid having words outside the bubbles.

    fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))
    # Labels edited directly in dataset

Figure: Second result

Well, now at least it's readable. But why is Athletics pink and Boxing blue? Let's add a legend to illustrate the relationship between colours and female representation.

Because it's not your regular bar plot, plt.legend() doesn't do the trick here. Using Matplotlib's AnnotationBbox we can create rectangles (or circles) to show the meaning behind each colour. We can also do the same thing to show a bubble scale.

    import matplotlib.pyplot as plt
    from matplotlib.offsetbox import AnnotationBbox, DrawingArea, TextArea, HPacker
    from matplotlib.patches import Circle, Rectangle

    # This is an example for one section of the legend

    # Define where the annotation (legend) will be
    xy = [50, 128]

    # Create your colored rectangle or circle
    da = DrawingArea(20, 20, 0, 0)
    p = Rectangle((10, 10), 10, 10, color="#fc8d62ff")
    da.add_artist(p)

    # Add text
    text = TextArea("20%", textprops=dict(color="#fc8d62ff", size=14, fontweight="bold"))

    # Combine rectangle and text
    vbox = HPacker(children=[da, text], align="top", pad=0, sep=3)

    # Annotate both in a box (change alpha if you want to see the box)
    ab = AnnotationBbox(
        vbox,
        xy,
        xybox=(1.005, xy[1]),
        xycoords="data",
        boxcoords=("axes fraction", "data"),
        box_alignment=(0.2, 0.5),
        bboxprops=dict(alpha=0),
    )

    # Add to your bubble chart
    ax.add_artist(ab)

I've also added a subtitle and a text description under the chart just by using plt.text().

Figure: Final visualisation

Straightforward and user-friendly interpretations of the graph:

- The majority of bubbles are light green → green means 50% females → the majority of Olympic competitions have an even 50/50 female-to-male split (yay🙌)
- Only one sport (Baseball), in dark green, has no female participation.
- 3 sports have only female participation, but the number of athletes is fairly low.
- The biggest sports in terms of number of athletes (Swimming, Athletics and Gymnastics) are very close to having a 50/50 split.
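The article edits the labels directly in the dataset and does not show that step, so here is a minimal sketch of how the label blanking and colour assignment could be done in plain Python. It is not the author's code. The function names, the `female_share` field, the bin edges and every colour except "#fc8d62ff" are assumptions made for illustration, and the example rows use made-up numbers rather than values from the Olympedia data.

```python
# Cut-off described in the article: a label is kept only if the size is over 250.
MIN_LABEL_SIZE = 250

# Hypothetical colour bins: (upper bound of female share, hex colour).
# Only "#fc8d62ff" appears in the article (next to the "20%" legend entry);
# the other colours and all bin edges are placeholders.
COLOUR_BINS = [
    (0.0, "#1b7837ff"),   # no female athletes (dark green in the article)
    (0.35, "#fc8d62ff"),  # around 20%
    (0.65, "#a6d854ff"),  # around 50/50 (light green in the article)
    (1.0, "#8da0cbff"),   # mostly or only female athletes
]


def colour_for_share(female_share):
    """Return the colour of the first bin whose upper bound covers the share."""
    if not 0.0 <= female_share <= 1.0:
        raise ValueError(f"female_share must be between 0 and 1, got {female_share}")
    for upper_bound, colour in COLOUR_BINS:
        if female_share <= upper_bound:
            return colour


def prepare_bubbles(rows, min_label_size=MIN_LABEL_SIZE):
    """Turn per-sport rows into the parallel lists a bubble chart needs."""
    labels, sizes, colours = [], [], []
    for row in rows:
        size = row["athlete_sum"]
        # Small bubbles get an empty label so no text spills outside them.
        labels.append(row["sport"] if size > min_label_size else "")
        sizes.append(size)
        colours.append(colour_for_share(row["female_share"]))
    return labels, sizes, colours


# Made-up numbers, for illustration only (not taken from the dataset).
example_rows = [
    {"sport": "Athletics", "athlete_sum": 2000, "female_share": 0.5},
    {"sport": "Baseball", "athlete_sum": 140, "female_share": 0.0},
    {"sport": "Boxing", "athlete_sum": 280, "female_share": 0.3},
]

labels, sizes, colours = prepare_bubbles(example_rows)
print(labels)  # ['Athletics', '', 'Boxing']
```

The three lists can then be passed to whatever bubble chart code is in use. Keeping the threshold and the colour bins as named constants makes it easy to try other values without touching the dataset itself.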
