Danny Malter

Data Science Manager - Accenture
M.S. in Predictive Analytics - DePaul University

Anthony Rizzo Didn’t Only Beat Cancer

library(Lahman)
library(dplyr)
library(data.table)
library(knitr)
library(kableExtra)

# Subset master table to only playerID and player names since that's all we need.
master <- select(Master, playerID, nameFirst, nameLast)

# Find primary positions
fielding <- Lahman::Fielding

PrimPos <- fielding %>% 
  subset(select=c("playerID", "yearID", "teamID", "lgID","G", "POS")) %>% 
  group_by(playerID, yearID, teamID, lgID, POS) %>%
  summarise(G = sum(G))

PrimPos <- PrimPos %>%
  group_by(playerID, yearID) %>%
  slice(which.max(G))

PrimPos <- PrimPos[,c("playerID", "yearID", "POS")]

# group by in case players played on multiple teams in one season
batting <- Batting
batting$teamID <- NULL
batting$stint <- NULL
batting$lgID <- NULL
batting <- batting %>%
  group_by(playerID, yearID) %>%
  summarise_all(sum)  # funs() is deprecated in current dplyr; passing sum directly is equivalent

# Join to Master and Fielding to get the player names in master and position from fielding
batting <- left_join(batting, master, by = "playerID")
batting <- merge(batting, PrimPos, all = TRUE)

batting <- subset(batting, AB > 0 & yearID >= 1970 & POS != 'P')

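# Add some statistics to the batting table.
# Caveat: BB already includes intentional walks (IBB), so PA and OB as defined
# here count IBB twice; the printed table follows these definitions.
batting <- batting %>%
  mutate(BA = 0 + (AB > 0) * round(H/AB, 3),
         TB = H + X2B + 2 * X3B + 3 * HR,
         SA = 0 + (AB > 0) * round(TB/AB, 3),
         PA = AB + BB + IBB + HBP + SH + SF,
         OB = H + BB + IBB + HBP,
         OBP = 0 + (AB > 0) * round(OB/PA, 3) )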

# Add a column for the number of seasons played
batting <- batting %>%
  group_by(playerID) %>%
  mutate(years = seq(n())) %>%
  mutate(seasons = n())

batting <- as.data.frame(batting)

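# Players with a bad first season (at least 50 AB and a batting average below .150)
bad.first <- subset(batting, AB >= 50 & BA < .150 & years == 1 & yearID >= 1970)
bad.first.player <- bad.first$playerID  # select by name rather than by column position
length(bad.first.player) # 34 players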

batting$bad_first <- batting$playerID %in% bad.first.player

bad.df <- subset(batting, bad_first == TRUE)
bad.df <- setDT(bad.df)[order(playerID, years),]
bad.df <- as.data.frame(bad.df)

# Players with a bad first year that only played 1 season
bad.one.season <- subset(bad.df, seasons == 1)
bad.one.season[is.na(bad.one.season)] <- 0
nrow(bad.one.season)

# Players with a bad first year that played at least 2 seasons
bad.mult.seasons <- subset(bad.df, seasons > 1)
bad.mult.seasons[is.na(bad.mult.seasons)] <- 0
bad.mult.seasons <- as.data.frame(bad.mult.seasons)

unique.bad <- bad.mult.seasons %>%
  group_by(playerID) %>%
  summarise(n_distinct(playerID))

# 26 offensive players with a poor first season that played at least 2 seasons
nrow(unique.bad)

# Count the number of players that played multiple seasons with more than 50 AB.
# Note: as written, this filter matches any season with AB > 50, not only season 1.
mult.seasons <- subset(batting, seasons > 1 & AB > 50)
mult.seasons[is.na(mult.seasons)] <- 0

unique.players <- mult.seasons %>%
  group_by(playerID) %>%
  summarise(n_distinct(playerID))

# 3177 offensive players that played at least 2 seasons
nrow(unique.players)

Note:
The data used is from 1970-2015


There’s no doubt about it: Anthony Rizzo is a class act for Major League Baseball. Every time I watch or read something from Chicago media, the Anthony Rizzo Family Foundation has just donated millions of dollars to help fight cancer, he’s visiting children at Lurie Children’s Hospital on Chicago’s North Side, or he’s spending his off days riding a bike on Chicago’s lakefront path just like any other Chicagoan.

Rizzo, a sixth-round pick by the Boston Red Sox in the 2007 MLB Draft, was diagnosed with Hodgkin’s lymphoma just a year later. There is no need to explain how difficult it is to get drafted by an MLB team in the first place, but I can’t imagine how much more difficult making it to the Majors would be after going through six months of chemotherapy. We all know how the story has gone so far: a successful MLB career, a World Series ring and a fan base in Chicago that loves him. What few people realize, however, is that Rizzo started his MLB career terribly and then succeeded in a way that only a handful of other MLB players ever have. In other words, Rizzo beat more than just cancer.

With 128 AB in his rookie season, Rizzo batted just .141. Since 1970, only 34 offensive players (non-pitchers) have been given at least 50 at-bats in their rookie year while batting below .150. Of those 34 players, 8 played only one season in the Majors and 26 played at least two. In total, 3,177 offensive players have played at least two seasons with at least 50 AB in their rookie year. In other words, Rizzo was one of just 26 out of 3,177, which put him in the bottom 1% (about 0.8%) of multi-season players going into his second year.
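
The percentages behind that claim can be checked with a few lines of R. This is only a sketch of the arithmetic: the counts are copied from the paragraph above rather than recomputed from the Lahman data, and the variable names are made up for illustration.

```r
# Counts quoted in the text above (copied, not recomputed from Lahman).
# All variable names here are hypothetical, chosen for illustration.
bad_rookies      <- 34    # rookies since 1970 with >= 50 AB and BA < .150
bad_one_season   <- 8     # of those, played only one season
bad_multi_season <- 26    # of those, played at least two seasons
all_multi_season <- 3177  # all multi-season offensive players counted in the text

# The two subgroups should add up to the 34 poor-start rookies
stopifnot(bad_one_season + bad_multi_season == bad_rookies)

# Share of multi-season players who started as poorly as Rizzo (percent)
pct_of_multi <- round(100 * bad_multi_season / all_multi_season, 2)

# Share of poor-start rookies who went on to play a second season (percent)
pct_returned <- round(100 * bad_multi_season / bad_rookies, 1)

# Rizzo's rookie batting average from 18 hits in 128 AB
rizzo_ba <- round(18 / 128, 3)

print(c(pct_of_multi = pct_of_multi,
        pct_returned = pct_returned,
        rizzo_ba     = rizzo_ba))
```

With these inputs, 26 of 3,177 works out to about 0.82%, which is where the "bottom 1%" figure comes from, and 26 of the 34 poor-start rookies (76.5%) went on to play a second season.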

To sum things up, Rizzo not only overcame cancer; he also overcame long odds against ever succeeding after such a poor rookie year.

rizzo <- subset(batting, playerID == 'rizzoan01')
new.rizzo <- rizzo[c('nameFirst','nameLast','yearID','G','AB','R','H','X2B','X3B','HR','RBI','BB','SO','BA','PA','OBP')]

# Reset the row names before rendering so the table is indexed 1..n
rownames(new.rizzo) <- 1:nrow(new.rizzo)

kable(new.rizzo, row.names = TRUE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "responsive"))
  
  nameFirst nameLast yearID   G  AB  R   H X2B X3B HR RBI BB  SO    BA  PA   OBP
1 Anthony   Rizzo      2011  49 128  9  18   8   1  1   9 21  46 0.141 154 0.286
2 Anthony   Rizzo      2012  87 337 44  96  15   0 15  48 27  62 0.285 369 0.344
3 Anthony   Rizzo      2013 160 606 71 141  40   2 23  80 76 127 0.233 697 0.330
4 Anthony   Rizzo      2014 140 524 89 150  28   1 32  78 73 116 0.286 623 0.393
5 Anthony   Rizzo      2015 160 586 94 163  38   3 31 101 78 105 0.278 710 0.394

