Our Data
-
Sourced
Our data comes from the Formula 1 World Championship dataset on Kaggle (https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2020), which covers the seasons from 1950 to 2023.
The dataset comprises 14 CSV files with 120 columns in total, giving a comprehensive view of Formula 1 history and statistics.
-
Pre-Processing
To ensure the accuracy and reliability of our analyses, our data undergoes various preparation processes. We employ join operations to seamlessly integrate multiple datasets, facilitating holistic analyses across various dimensions.
Utilizing Tableau Prep, we meticulously combine, shape, and clean our data, ensuring consistency and coherence across the dataset.
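For readers who want to see the join step outside Tableau Prep, here is a minimal Python sketch of a left join between race results and drivers. It is an illustration only, not our actual Tableau Prep flow. The column names (driverId, positionOrder, forename, surname) are assumed to match the Kaggle CSV files, the rows are made-up placeholders, and the helper names read_rows and left_join are hypothetical.

```python
import csv
import io

# Tiny in-memory stand-ins for results.csv and drivers.csv.
# The rows are placeholders, not real records from the dataset.
RESULTS_CSV = """resultId,raceId,driverId,positionOrder
1,100,1,1
2,100,2,2
3,100,3,3
"""

DRIVERS_CSV = """driverId,forename,surname
1,Alice,Example
2,Bob,Sample
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(left, right, key):
    """Keep every row of `left` and attach the matching columns of `right`.

    Assumes `key` is unique in `right` (one row per driver).
    Right-side columns are filled with None when there is no match.
    """
    lookup = {row[key]: row for row in right}
    right_cols = [col for col in right[0] if col != key] if right else []
    joined = []
    for row in left:
        match = lookup.get(row[key])
        merged = dict(row)
        for col in right_cols:
            merged[col] = match[col] if match else None
        joined.append(merged)
    return joined


results_with_drivers = left_join(
    read_rows(RESULTS_CSV), read_rows(DRIVERS_CSV), "driverId"
)
for row in results_with_drivers:
    print(row["positionOrder"], row["forename"], row["surname"])
```

A left join keeps results even when a driver record is missing, which makes gaps visible as empty names instead of silently dropping rows.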
Enriching our dataset with calculated variables adds analytical depth and breadth. We categorized podium finishes into 1st, 2nd, and 3rd places, providing insights into driver performance and race outcomes. Additionally, we converted lap and pit stop times into a format better suited to the analysis.
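The two calculated variables described above can be sketched in Python as follows. This is a hedged illustration, not our Tableau Prep calculation. It assumes the finishing position is available as an integer-like value (as in the positionOrder column of the Kaggle results file) and that lap and pit stop times arrive as text such as "1:27.452" or "23.5" and are converted to seconds. The function names and the "No podium" label are hypothetical.

```python
def podium_category(position_order):
    """Map a finishing position to a podium label."""
    labels = {1: "1st", 2: "2nd", 3: "3rd"}
    # Any position outside the top three gets a single catch-all label.
    return labels.get(int(position_order), "No podium")


def time_to_seconds(text):
    """Convert 'ss.mmm', 'm:ss.mmm' or 'h:mm:ss.mmm' text to seconds."""
    seconds = 0.0
    # Each colon-separated part is worth 60 times the part to its right.
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


print(podium_category(1))    # 1st
print(podium_category("7"))  # No podium
print(time_to_seconds("1:27.452"))
print(time_to_seconds("23.5"))
```

Storing times as plain seconds makes them easy to average, sort, and plot, which text values such as "1:27.452" do not allow.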
Our data preparation also included careful work on drivers' first and last names, since there are many “Formula 1 families” whose members share a last name across the sport's history.
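One way to handle shared surnames, sketched here in Python as an assumption about the general approach, is to build a full name for every driver and flag surnames that appear more than once. The forename and surname fields are assumed to match the Kaggle drivers file. The driverId values below are illustrative, not the dataset's real identifiers, and the helper names are hypothetical.

```python
from collections import defaultdict

# Illustrative sample; the driverId values are not the dataset's real ids.
drivers = [
    {"driverId": 1, "forename": "Michael", "surname": "Schumacher"},
    {"driverId": 2, "forename": "Ralf", "surname": "Schumacher"},
    {"driverId": 3, "forename": "Jos", "surname": "Verstappen"},
    {"driverId": 4, "forename": "Max", "surname": "Verstappen"},
    {"driverId": 5, "forename": "Lewis", "surname": "Hamilton"},
]


def full_name(driver):
    """Combine forename and surname into one display name."""
    return f"{driver['forename']} {driver['surname']}"


def shared_surnames(rows):
    """Return surnames used by more than one driver, with their full names."""
    by_surname = defaultdict(list)
    for driver in rows:
        by_surname[driver["surname"]].append(full_name(driver))
    return {s: names for s, names in by_surname.items() if len(names) > 1}


for surname, names in shared_surnames(drivers).items():
    print(surname, "->", ", ".join(names))
```

Labelling charts with the full name, or with the driver identifier, avoids merging two different drivers into one series when only the surname is shown.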