Data Analyst Project for Beginners: Analysis of the 2024 Boston Marathon

Introduction

The Boston Marathon, one of the world’s oldest and most prestigious marathons, offers a unique opportunity to explore the interplay between weather conditions, runner performance, and race dynamics. The 2024 Boston Marathon dataset, available on Kaggle, provides data on weather conditions and split times for marathon participants. This article walks through the process of analyzing this dataset to uncover performance patterns and identify the factors that most influence race times, using Google Sheets, Python, SQL, and Power BI.

Overview of the 2024 Boston Marathon Dataset

The 2024 Boston Marathon dataset encompasses detailed information about individual runner performances and weather conditions during the race, capturing essential parameters such as:

  • Split Times: Times recorded at various checkpoints throughout the race.
  • Finish Times: Final times for each runner.
  • Weather Conditions: Temperature, humidity, wind speed, and other weather variables at different points during the race.
  • Runner Information: Age, gender, and nationality of participants.
  • Race Date and Time: Specific timing details of the race day.

Objectives

The primary objectives of this analysis are:

  • Understanding Performance Dependencies: Investigating how runner performances correlate with factors such as age, gender, nationality, and weather conditions.
  • Exploring Split Time Patterns: Examining how runners’ speeds vary across different segments of the race.
  • Assessing Weather Impacts: Determining how weather conditions influence overall race times and split times.
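
To make the split-time objective concrete, the sketch below converts cumulative checkpoint times into per-segment paces and a first-half versus second-half comparison. The column names, the choice of checkpoints, and all values are made up for illustration; the actual Kaggle file may be laid out differently.

```python
import pandas as pd

# Assumed checkpoint columns and their distances in km (not the dataset's schema).
CHECKPOINTS_KM = {"split_10k": 10.0, "split_half": 21.0975, "split_30k": 30.0, "finish": 42.195}

# Made-up cumulative times in seconds for three hypothetical runners.
splits = pd.DataFrame(
    {
        "split_10k": [1800, 3000, 3600],
        "split_half": [3800, 6400, 7700],
        "split_30k": [5420, 9200, 11200],
        "finish": [7700, 13200, 16400],
    },
    index=["runner_a", "runner_b", "runner_c"],
)


def segment_paces(cumulative: pd.DataFrame, checkpoints_km: dict) -> pd.DataFrame:
    """Return pace in seconds per km for each segment between checkpoints."""
    cols = list(checkpoints_km)
    distances = pd.Series(checkpoints_km)
    # Time spent inside each segment; the first segment starts at time zero.
    seg_time = cumulative[cols].diff(axis=1)
    seg_time[cols[0]] = cumulative[cols[0]]
    # Distance covered inside each segment; the first segment starts at km zero.
    seg_dist = distances.diff()
    seg_dist.iloc[0] = distances.iloc[0]
    return seg_time.div(seg_dist, axis=1)


paces = segment_paces(splits, CHECKPOINTS_KM)

# Second-half time minus first-half time; positive means the runner slowed down.
second_half_slowdown = (splits["finish"] - splits["split_half"]) - splits["split_half"]
```

Working with per-segment pace rather than raw cumulative time makes segments of different lengths directly comparable.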

Hypotheses

  • H1: Age and Performance: Younger runners will have faster finish times and more consistent split times compared to older runners.
  • H2: Gender and Performance: Male runners will have faster finish times on average compared to female runners, but the difference will be minimal among elite runners.
  • H3: Weather Impact: Higher temperatures and humidity levels will negatively impact finish times and split consistency, with significant slowdowns in the latter half of the race.
  • H4: Nationality and Performance: Runners from countries with a history of strong marathon performances (e.g., Kenya, Ethiopia) will have faster finish times compared to runners from other countries.
  • H5: Consistency in Splits: Runners who maintain a consistent pace throughout the race will have better overall performance compared to those with significant variations in their split times.
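
As one possible way to put H5 into numbers, the sketch below measures pacing consistency as the coefficient of variation of each runner's segment paces and correlates it with finish time. The metric is an assumption chosen for illustration, not something the dataset prescribes, and the four runners and their values are invented.

```python
import pandas as pd

# Made-up per-segment paces (seconds per km) for four hypothetical runners.
segment_pace = pd.DataFrame(
    {
        "seg_1": [180, 240, 300, 330],
        "seg_2": [181, 245, 320, 380],
        "seg_3": [182, 255, 350, 450],
        "seg_4": [183, 260, 390, 520],
    },
    index=["r1", "r2", "r3", "r4"],
)
# Made-up finish times in seconds for the same runners.
finish_seconds = pd.Series([7650, 10550, 14400, 17800], index=segment_pace.index)

# Coefficient of variation: standard deviation of a runner's segment paces
# divided by their mean pace. Lower values indicate a more even pace.
pace_cv = segment_pace.std(axis=1) / segment_pace.mean(axis=1)

# Pearson correlation between pacing variability and finish time.
# A positive value would be consistent with H5 but does not show causation.
h5_correlation = pace_cv.corr(finish_seconds)
```

With only a handful of invented rows this illustrates the calculation, not a result. On real data, overall ability is a likely confounder, so comparing runners within similar finish-time bands may be more informative than a single correlation.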

Analytical Process

1. Preliminary Exploration using Google Sheets

The initial step involves importing the 2024 Boston Marathon dataset into Google Sheets for a high-level overview. This phase focuses on:

  • Data Structuring: Understanding the dataset’s structure and dimensions.
  • Basic Statistics: Calculating summary statistics such as mean finish times, the ages with the fastest average finish times, and gender distribution.
  • Identifying Data Quality Issues: Flagging missing values, outliers, and inconsistencies that may require further cleaning.
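
For readers who want to cross-check the spreadsheet figures in code, the same first-pass checks can be sketched in pandas. The column names and rows here are invented for illustration, and the 1.5 × IQR rule is just one common convention for flagging candidate outliers.

```python
import pandas as pd

# Made-up rows; the column names are assumptions, not the Kaggle schema.
runners = pd.DataFrame(
    {
        "age": [28, 35, 35, 52, None],
        "gender": ["F", "M", "F", "M", "M"],
        "finish_seconds": [11400, 10200, None, 14400, 12600],
    }
)

n_rows, n_cols = runners.shape                  # dataset dimensions
mean_finish = runners["finish_seconds"].mean()  # missing values are skipped
gender_share = runners["gender"].value_counts(normalize=True)
missing_per_column = runners.isna().sum()       # missing cells per column

# Flag finish times more than 1.5 * IQR outside the quartiles as candidate outliers.
q1, q3 = runners["finish_seconds"].quantile([0.25, 0.75])
iqr = q3 - q1
finish = runners["finish_seconds"]
is_outlier = (finish < q1 - 1.5 * iqr) | (finish > q3 + 1.5 * iqr)
```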

2. Data Cleaning and Analysis with Python

In Python, the dataset is cleaned and transformed using libraries such as pandas and NumPy, with matplotlib and seaborn for plotting:

  • Cleaning Data: Handling missing values, removing duplicates, and correcting data types so that later calculations are accurate.
  • Feature Engineering: Creating new features like pace per mile and weather impact scores.
  • Exploratory Data Analysis (EDA): Visualizing distributions, trends, and relationships between variables using seaborn and matplotlib to uncover insights.
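
The cleaning and feature-engineering steps above could look roughly like the following minimal sketch. The column names (`bib`, `age`, `finish_time`), the "H:MM:SS" time format, and the sample rows are all assumptions made for illustration. Only pace per mile is derived here; a weather impact score would need a definition that this article does not specify.

```python
import pandas as pd

# Marathon distance: 42.195 km converted to miles.
MARATHON_MILES = 42.195 / 1.609344

# Made-up raw rows with typical problems: a duplicate, a missing time, text-typed fields.
raw = pd.DataFrame(
    {
        "bib": ["101", "102", "102", "103"],
        "age": ["29", "41", "41", "37"],
        "finish_time": ["2:55:10", "3:40:00", "3:40:00", None],
    }
)


def clean_results(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate and unfinished rows, fix dtypes, and add pace per mile."""
    out = df.drop_duplicates(subset="bib").copy()
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Parse "H:MM:SS" strings; missing or unparseable values become NaT.
    out["finish_time"] = pd.to_timedelta(out["finish_time"], errors="coerce")
    out = out.dropna(subset=["finish_time"]).copy()
    out["finish_seconds"] = out["finish_time"].dt.total_seconds()
    out["pace_sec_per_mile"] = out["finish_seconds"] / MARATHON_MILES
    return out.reset_index(drop=True)


clean = clean_results(raw)
```

Dropping rows without a finish time is only one option; depending on the question, keeping them to study non-finishers may be preferable.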

3. Visualization and Reporting with Power BI

For comprehensive visualization and reporting, the cleaned dataset is imported into an SQL database and connected to Power BI:

  • Interactive Dashboards: Creating dynamic dashboards in Power BI to visualize:
    • Distribution of finish times.
    • Performance patterns by age, gender, and nationality.
    • Weather condition impacts on different race segments.
    • Correlations between split times and overall race performance.
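
The article does not name a specific database, so the sketch below uses an in-memory SQLite database as a stand-in to show the load-and-query step. The table name `marathon_results`, the columns, and the rows are hypothetical. For a server database, `DataFrame.to_sql` would typically be given a SQLAlchemy engine instead of a `sqlite3` connection.

```python
import sqlite3

import pandas as pd

# Made-up cleaned rows; table and column names are assumptions.
clean = pd.DataFrame(
    {
        "bib": [101, 102, 103],
        "gender": ["F", "M", "F"],
        "finish_seconds": [10510.0, 13200.0, 12000.0],
    }
)

# In-memory SQLite stands in for whichever SQL database is actually used.
conn = sqlite3.connect(":memory:")
clean.to_sql("marathon_results", conn, index=False, if_exists="replace")

# The kind of aggregate a dashboard visual might be built on.
avg_by_gender = pd.read_sql_query(
    "SELECT gender, AVG(finish_seconds) AS avg_finish_seconds, COUNT(*) AS runners "
    "FROM marathon_results GROUP BY gender ORDER BY gender",
    conn,
)
conn.close()
```

Pre-aggregating in SQL keeps the dashboard queries simple, although Power BI can also compute such summaries itself once it is connected to the table.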

Insights and Applications

The insights derived from this analysis can offer substantial benefits to marathon organizers, coaches, and runners:

  • Optimized Training Programs: Tailoring training regimens based on performance patterns observed across different age groups and genders.
  • Race Day Strategies: Adjusting strategies for race day based on predicted weather conditions and their historical impact on performance.
  • Enhanced Runner Experience: Providing personalized feedback to runners to help them improve their race performance.
  • Improved Event Planning: Informing logistical decisions such as hydration station placements and medical support based on performance and weather data.

Conclusion

Analyzing the 2024 Boston Marathon dataset offers a practical look at marathon dynamics and runner performance. Working through each stage, from initial exploration and cleaning to visualization and interpretation, surfaces actionable insights and shows how data-driven decisions can improve marathon experiences and performance outcomes.

Whether you’re a data enthusiast, coach, or marathon runner, exploring such datasets offers invaluable opportunities to understand and improve the way we approach marathon training and race day strategies, one mile at a time.

Frequently Asked Questions

1. What is the 2024 Boston Marathon dataset, and why is it significant?

The 2024 Boston Marathon dataset contains detailed information about runner performances and weather conditions during the marathon. It includes data points such as split times, finish times, runner demographics, and weather variables. This dataset is significant as it provides insights into how various factors influence marathon performance, helping optimize training and race strategies.

2. What tools and technologies are used for analyzing the 2024 Boston Marathon dataset?

Tools commonly used include:

  • Python: For data cleaning and analysis (using libraries like pandas and NumPy) and visualization (matplotlib, seaborn).
  • SQL: To manage and query data when working with large datasets or relational databases.
  • Power BI or Tableau: For creating interactive visualizations and dashboards to present insights.
  • Google Sheets: For preliminary data exploration and basic analysis.

3. How can insights from analyzing the 2024 Boston Marathon dataset benefit marathon training and race planning?

Insights from the analysis can help:

  • Optimize Training Programs: Improve training regimens based on observed performance patterns.
  • Enhance Race Day Strategies: Adjust strategies based on predicted weather conditions and their historical impact on performance.
  • Improve Runner Experience: Provide personalized feedback to runners to help them improve their race performance.
  • Inform Event Planning: Make logistical decisions such as hydration station placements and medical support based on performance and weather data.