1- Data Analytics and Visualization in American Football (NFL, CFL, NCAAF)
Progression of leaguewide passer rating in the NFL since stats first started being recorded in 1932 until 2020
Raw data is here. All data sourced from Pro Football Reference. You can find out what passer rating is and how to calculate it here.
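For readers who would rather see the calculation than follow a link, here is a minimal Python sketch of the standard NFL passer rating formula. The function and argument names are my own choices, not something taken from the chart's author; the four per-attempt components and the 0 to 2.375 clamp follow the formula as the NFL publishes it.

```python
def passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL passer rating from totals (one passer, one team, or the whole league)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")

    def clamp(value):
        # Every component is limited to the range [0, 2.375].
        return max(0.0, min(value, 2.375))

    a = clamp((completions / attempts - 0.3) * 5)        # completion percentage
    b = clamp((yards / attempts - 3) * 0.25)             # yards per attempt
    c = clamp((touchdowns / attempts) * 20)              # touchdown rate
    d = clamp(2.375 - (interceptions / attempts) * 25)   # interception rate
    return (a + b + c + d) / 6 * 100


# A stat line that maxes out all four components gives the familiar 158.3.
print(round(passer_rating(20, 20, 400, 4, 0), 1))
```

Feeding in league-wide season totals instead of one player's numbers should give the kind of leaguewide figure the chart tracks.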
2- Positions of NFL Hall of Famers
Description: This chart (version 2) shows the distribution of positions played by the 300 player inductees in the Pro Football Hall of Fame.
Some pre-modern era positions were consolidated into their modern positional equivalents. All running positions (fullback, tailback, etc.) were consolidated into the running back position to best illustrate the types of positions played.
3- Top 20 all-time in NFL Sacks per game
Source: Pro Football Reference
Data: My source doesn’t list sacks per game, so I derived that figure in Excel as part of a larger project in which I created a new metric called the Sack Index.
Note on current players: this chart has an abundance of current players because many of them are in their prime. It’s extremely likely that they will all move down the list as they age. For example, we are starting to see J.J. Watt move down the list. In Deacon Jones's last 4 seasons, he averaged under 0.4 sacks per game. Players like Chandler Jones, Myles Garrett, and T.J. Watt will likely not sustain their current rates, but they are beasts right now and deserve the spot they are in.
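The sacks-per-game step is simple enough to sketch in Python instead of Excel. This is only an illustration: the player names and numbers below are made up, the `min_games` cutoff is my own assumption (the author does not say whether one was used), and the Sack Index itself is not reproduced because its definition is not given here.

```python
def sacks_per_game(sacks, games):
    """Career sacks divided by career games played."""
    if games <= 0:
        raise ValueError("games must be positive")
    return sacks / games


def top_by_sacks_per_game(players, n=20, min_games=1):
    """Rank (name, sacks, games) records by sacks per game, highest first.

    min_games is a hypothetical filter to keep tiny samples off the list.
    """
    rated = [
        (name, sacks_per_game(sacks, games))
        for name, sacks, games in players
        if games >= min_games
    ]
    return sorted(rated, key=lambda item: item[1], reverse=True)[:n]


# Hypothetical career totals, for illustration only.
sample = [
    ("Player A", 100.0, 125),
    ("Player B", 60.0, 60),
    ("Player C", 45.5, 91),
]
for name, rate in top_by_sacks_per_game(sample, n=2):
    print(name, round(rate, 2))
```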
4- Frequency of NFL team total wins after starting the season 3-0
Source: Pro Football Reference
11% of 3-0 teams have won the Super Bowl
75% of 3-0 teams have gone to the playoffs
The average win total to end the season is 10.7 (11.4 adjusted to 17-game season) for a 3-0 team.
Here is the frequency of win totals for teams that started the season 3-0.
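The 17-game figure above looks like a straight proportional rescaling of the average. The author does not spell out the method, so treat the sketch below as an assumption: it scales from a 16-game schedule, which happens to reproduce the quoted 11.4. The function name is mine.

```python
def scale_wins(average_wins, from_games=16, to_games=17):
    """Rescale an average win total to a different schedule length.

    Assumes wins scale in proportion to games played, and that the
    original average comes from 16-game seasons.
    """
    return average_wins * to_games / from_games


adjusted = scale_wins(10.7)
print(round(adjusted, 1))  # 11.4
```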
5- Using machine learning methods to group NFL quarterbacks into archetypes
I collected a series of rushing and passing statistics for NFL quarterbacks from 2015-2020 and ran a machine learning technique called clustering, which automatically sorts observations into groups based on shared characteristics using a mathematical "distance metric."
The idea was to use machine learning to identify NFL quarterback archetypes, determining agnostically which quarterbacks were truly "mobile" and which were "pocket passers" that relied more on passing. I used a number of metrics in my actual clustering analysis, but they can be effectively summarized across two dimensions, passing and rushing, which can in turn be roughly summarized by two metrics: passer rating and rushing yards per year. Plotting the quarterbacks along these two metrics and coloring them by cluster shows how cleanly the methodology separated the groups.
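To make the idea of clustering with a distance metric concrete, here is a small self-contained Python sketch. Several things in it are assumptions rather than the author's method: the text does not say which clustering algorithm was used, so plain k-means with Euclidean distance stands in for it; the quarterbacks and their numbers are invented; and only the two summary metrics are used instead of the fuller set of statistics. The columns are standardized first so that rushing yards, which are on a much larger scale than passer rating, do not dominate the distance.

```python
import math


def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows
    ]


def kmeans(points, k, max_iterations=100):
    """Plain k-means with Euclidean distance, seeded with the first k points."""
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


# Hypothetical quarterbacks: (passer rating, rushing yards per year).
quarterbacks = {
    "QB A": (102.0, 120),
    "QB B": (88.0, 650),
    "QB C": (98.5, 90),
    "QB D": (91.0, 720),
    "QB E": (95.0, 150),
    "QB F": (85.0, 580),
}
labels = kmeans(standardize(list(quarterbacks.values())), k=2)
archetypes = dict(zip(quarterbacks, labels))
print(archetypes)
```

With this toy data the low-rushing quarterbacks land in one group and the high-rushing ones in the other. Real data is messier, and seeding with the first k points is a simplification; library implementations usually pick starting centroids more carefully.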
Read this blog article on the process for more information if you're interested, or just check out this blog in general if you found this interesting!
Data: Collected from the ESPN API
6- Map of the most popular Division 1 college football teams in the United States. Each county's color reflects the team with the most Twitter followers.
Map was created using data downloaded through the Twitter API in R.
The data were analyzed in R and visualized in Tableau.
Areas of the map with no clear pattern indicate that the area either had a low amount of data available or that there is no clear favorite.
To view a breakdown of the top 5 in each county, check out this interactive version: web map
5- How long did cover athletes for the "NCAA Football" series play in the NFL after their college careers?
7- All NFL leads greater than 14 points where the team went on to lose, since 1920
Source: Pro Football Reference
Tools: MS Excel
8- Analysis of the correlation of NFL team records from one year to the next. (Teams at the far ends of the standings show regression; weak year-to-year correlation)
Source: Exported season standings for the last 20 years from pro-football-reference
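A year-to-year correlation like this can be computed by pairing each team's record in one season with its record in the next and taking the Pearson correlation of the pairs. The sketch below assumes that approach; the author does not describe the exact calculation, and the function names and the sample win totals are invented for illustration.

```python
import math


def year_over_year_pairs(wins_by_team):
    """Turn {team: [wins per season, in order]} into (this_year, next_year) lists."""
    this_year, next_year = [], []
    for seasons in wins_by_team.values():
        for current, following in zip(seasons, seasons[1:]):
            this_year.append(current)
            next_year.append(following)
    return this_year, next_year


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical win totals over four seasons, for illustration only.
standings = {
    "Team X": [12, 9, 10, 7],
    "Team Y": [4, 7, 6, 9],
    "Team Z": [8, 8, 11, 5],
}
xs, ys = year_over_year_pairs(standings)
print(round(pearson_r(xs, ys), 2))
```

A value near 0 means last year's record says little about this year's; a value near 1 means records carry over almost unchanged.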
9- Progression of NFL scoring over time, by points per team per game
Raw data is here. All data sourced from Pro Football Reference.
Wondering if COVID had anything to do with scoring being so high in 2020-2021?
It affected the league's offseason training schedule, and historically when that schedule has been disrupted it has harmed defenses more than offenses, leading to increased scoring. The league has also been changing the rules lately to encourage higher scoring.
There's also the matter of a lack of fans in the stands, which makes it easier for offensive players to communicate with each other.
10- Combined faces of top 200 NFL players (with positions)
Data used: here
Technologies used: OpenCV, Dlib
A timeline of running backs drafted in the first round of the NFL draft since 2000
11- Analysis of NFL quarterbacks drafted in the top 10 (49% success rate)
Scoring/Ranking system: My own analysis. Like any ranking, stat or rating, my system is arbitrary and some will argue the threshold is too low, others might say it’s too high.
I chose these metrics because raw statistics have less meaning when compared across different eras. The 4 metrics I used (years as a team’s primary starter, total years, Pro Bowls, and All-Pros) are still comparable across eras. One can argue that Pro Bowls are less reliable because they are a popularity contest, but they are only one quarter of the inputs, and I still think they are directionally valid over the long term.
Why did I pick 7.5 as the threshold? Of course with a top pick you want a perennial All-Pro and a HOFer. But if a player is a 7-year starting QB in the league with an All-Pro selection, I think that’s enough to say they were a success. Other examples of qualifying as a success:
10-year career, 7 starting and 3 as a backup
5-year career, all as a starter, with two Pro Bowls and an All-Pro
Source: the attendance data was taken from a Tidy Tuesday post. There are additional links to Pro Football Reference within. The author also got data on the NFL International Series from Wikipedia
Tools: R, mainly tidyverse packages including dplyr and ggplot2, as well as rvest for web scraping. The code for downloading the data, cleaning it and creating the chart is here
The chart shows the regular season average home game attendance for each NFL team from 2000 to 2019. I also included the season win percentage as the background color, and marked seasons where teams went on to the playoffs or won the Super Bowl that year (playoff attendance is not included). I have not counted the attendance for games played as part of the NFL International Series in London and Mexico. Attendance at these games often inflated the average for teams that don't typically draw such large crowds at their actual home games.
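The averaging rule described above (skip International Series games, then average what is left per team and season) can be sketched as follows. The original chart was built in R; this Python version is only an illustration, and the record layout, field names, and numbers are all hypothetical.

```python
from collections import defaultdict


def average_home_attendance(games):
    """Average home attendance per (team, season), skipping international games.

    Each game is a dict with the hypothetical keys
    'team', 'season', 'attendance', and 'international'.
    """
    totals = defaultdict(lambda: [0, 0])  # (team, season) -> [attendance sum, game count]
    for game in games:
        if game["international"]:
            continue  # International Series games are left out of the average
        entry = totals[(game["team"], game["season"])]
        entry[0] += game["attendance"]
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up home games for one team in one season.
sample_games = [
    {"team": "Team X", "season": 2019, "attendance": 60000, "international": False},
    {"team": "Team X", "season": 2019, "attendance": 62000, "international": False},
    {"team": "Team X", "season": 2019, "attendance": 84000, "international": True},
]
print(average_home_attendance(sample_games))
```

In this toy case, counting the international game would pull the average well above what the team draws at its own stadium, which is the distortion the exclusion avoids.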