Using Statcast data to estimate minor league home run distance

JFtC writer Jeff Quattrociocchi presents a way to estimate home run distances at the MiLB level!





Note: This post has been modified. Tom Tango, Senior Database Architect of Stats for MLBAM, reminded me that the equation converting HR distance from coordinate units to feet should not have a y-intercept—a ball that travels zero coordinate units must also travel zero feet. It was an oversight on my part that I’m happy to correct.


For a couple of years now, baseball fans have enjoyed publicly available Statcast data for the MLB level. This data allows us to examine the exit velocity, launch angle, estimated distance and countless other aspects of every batted ball. It has also produced “expected” stats, very useful additions to the toolbox of any baseball fanalyst. While this data is collected at the minor league level as well, it is not made publicly available, leaving us with a more limited toolbox when evaluating prospects via statistics.


Fortunately, one piece of Statcast-adjacent MiLB data is publicly available. MLB’s Prospect site includes a search engine for MiLB statistics. For each batted ball, the site reports two numbers, “hc_x” and “hc_y”: the “hit coordinates” of the batted ball. These coordinates appear to mark the point on the field where the ball hit the ground or was caught. Using them, we can estimate (with some accuracy) the distance of home runs hit at the MiLB level.


Below are the hit coordinates of every batted ball by the Toronto Blue Jays in 2018 at the MLB level.



The picture of a baseball diamond becomes even clearer if we multiply each hc_y by -1, flipping the image about the horizontal axis.



The first step in estimating a home run’s distance is to establish the coordinates of home plate. I was unable to find any official coordinates, so I worked on establishing them myself. The most accurate method I came up with was finding a batted ball that landed on (or very, very close to) home plate. Kevin Pillar was the lucky winner. On September 4th, at home against the Rays, Pillar hit a ball that landed just beside home plate. Its hit coordinates were (126.57, 204.1). I found a few other similar examples. In each case, the hc_x was just above or below 126 and the hc_y was just above or below 204, so I opted to use the coordinates (126, 204) for home plate.



The second step in estimating a home run’s distance is to calculate the distance (in as-yet-meaningless coordinate units) between home plate and the hit coordinates of a given home run. This requires using the Pythagorean Theorem (which is a lot more fun to use, unforced, as an adult). The equation for estimated home run distance (in terms of coordinate units) is:

$$\text{Calculated HR Distance (coordinate units)} = \sqrt{(\text{hc\_x} - 126)^2 + (204 - \text{hc\_y})^2}$$
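As a minimal sketch, the equation above translates directly into Python using only the standard library. The home plate constants are the (126, 204) estimate from this post; the function name `coord_distance` is a hypothetical helper for illustration, not part of any MLB tool. `math.hypot(dx, dy)` computes the same square root of summed squares as the equation.

```python
import math

# Estimated home plate location in hit-coordinate space (from the post)
HOME_X = 126.0
HOME_Y = 204.0

def coord_distance(hc_x: float, hc_y: float) -> float:
    """Straight-line distance from home plate to (hc_x, hc_y), in coordinate units."""
    # hypot(dx, dy) == sqrt(dx**2 + dy**2), i.e. the Pythagorean Theorem
    return math.hypot(hc_x - HOME_X, HOME_Y - hc_y)

# Justin Smoak's grand slam off David Robertson: hit coordinates (89.47, 27.86)
print(round(coord_distance(89.47, 27.86), 1))  # 179.9
```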


The image below, capturing Justin Smoak’s grand slam off the Yankees’ David Robertson, helps illustrate the equation above. Smoak’s grand slam had hit coordinates of (89.47, 27.86); entering these into the equation gives a calculated distance of 179.9 coordinate units. In 2018, there were 5,319 over-the-fence MLB home runs for which Statcast had both hit coordinates and hit distance (out of 5,571 over-the-fence homers altogether). Applying the equation to each of them produced calculated distances ranging from 140 to 210 coordinate units.



The third and final step in estimating a home run’s distance is to convert the distances calculated above into feet. To do so, I plotted each home run’s calculated distance against its Statcast distance. The resulting graph suggests a very strong relationship between the two: the calculated distances explain about 87% of the variation in Statcast’s distances.



The trendline equation, fit with no y-intercept, can be used to convert the calculated distances into feet:

$$\text{Calculated HR Distance (feet)} = 2.29 \times \text{Calculated HR Distance (coordinate units)}$$
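Because the trendline has no y-intercept, its slope has a simple closed form: for a least-squares line forced through the origin, the slope is Σxy / Σx². The sketch below computes that slope; the helper name is hypothetical, and the three sample points are made up for illustration (they are not the 2018 Statcast data).

```python
def fit_slope_through_origin(xs, ys):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    sxx = sum(x * x for x in xs)               # sum of x^2
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum of x * y
    return sxy / sxx

# Hypothetical (coordinate units, Statcast feet) pairs, NOT real 2018 data
coord_units = [150.0, 170.0, 190.0]
statcast_ft = [344.0, 389.0, 435.0]
print(round(fit_slope_through_origin(coord_units, statcast_ft), 2))  # 2.29
```

On Python 3.11 or later, `statistics.linear_regression(xs, ys, proportional=True)` should return the same slope with an intercept of zero.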


The resulting distances seem quite similar to the Statcast distances. The calculated distances have a mean of 396 feet, a median of 397 feet, a minimum of 322 feet and a maximum of 481 feet. The Statcast distances have a mean of 397 feet, a median of 398 feet, a minimum of 324 feet and a maximum of 481 feet.


In most cases, a home run’s calculated and Statcast distances differ by only a small amount. For 79.8% of home runs, the difference is less than one foot. The difference is less than five feet for 83.5% of the home runs, less than ten feet for 87.8% of them and less than 25 feet for 95.9% of them.
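A sketch of how agreement shares like these could be tallied, assuming a list of calculated distances paired with Statcast distances for the same home runs. The function name and the five sample distances are hypothetical; the comparison is strict (`<`) to match “less than”.

```python
def share_within(calculated_ft, statcast_ft, tolerance_ft):
    """Fraction of home runs whose calculated and Statcast distances differ by less than tolerance_ft."""
    pairs = list(zip(calculated_ft, statcast_ft))
    if not pairs:
        raise ValueError("need at least one home run")
    close = sum(1 for calc, stat in pairs if abs(calc - stat) < tolerance_ft)
    return close / len(pairs)

# Hypothetical distances in feet, for illustration only
calculated = [412.0, 398.0, 405.0, 371.0, 440.0]
statcast = [412.0, 398.5, 409.0, 390.0, 441.0]
for tol in (1, 5, 10, 25):
    print(f"< {tol} ft: {share_within(calculated, statcast, tol):.0%}")
```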


While these estimates are imperfect, they seem fairly reliable in the vast majority of cases. Smoak’s grand slam is a particularly good case to highlight. Inputting 179.9 coordinate units into the equation above gives us an estimated HR distance of 412 feet, exactly the same as the Statcast estimate.


Combining the two equations above gives us a handy tool for converting the hit coordinates of home runs in the MiLB dataset into estimated distances (in feet):

$$\text{Calculated HR Distance (feet)} = 2.29 \times \sqrt{(\text{hc\_x} - 126)^2 + (204 - \text{hc\_y})^2}$$
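Here is one way the combined equation might look as a single Python function, offered as a sketch rather than a definitive implementation. The constants come straight from the post (home plate at (126, 204), slope of 2.29 from 2018 MLB data); the function name `estimate_hr_distance_ft` is hypothetical.

```python
import math

# Constants from the post: estimated home plate location and the 2018 trendline slope
HOME_X, HOME_Y = 126.0, 204.0
FEET_PER_COORD_UNIT = 2.29

def estimate_hr_distance_ft(hc_x: float, hc_y: float) -> float:
    """Estimated home run distance in feet from a batted ball's hit coordinates."""
    coord_units = math.hypot(hc_x - HOME_X, HOME_Y - hc_y)
    return FEET_PER_COORD_UNIT * coord_units

# Justin Smoak's grand slam, hit coordinates (89.47, 27.86)
print(round(estimate_hr_distance_ft(89.47, 27.86)))  # 412
```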


It’s necessary to highlight some important caveats. The first is a simple reminder of what I’ve said throughout this post: the estimated home run distances come with a margin of error, so be conscious of that when using the equation above. My main purpose for these estimates is to examine the number of 400+ foot home runs hit by various minor leaguers. Without more data on batted balls at the MiLB level, this seems like a good proxy for barrels. Plus, with a high threshold, I can be confident that a home run estimated to travel 400 feet did indeed go quite far: 95% of the MLB home runs with a calculated distance of 400+ feet had a Statcast distance of 400+ feet, while 99.5% had a Statcast distance of 375+ feet.
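The 95% and 99.5% figures are shares taken among home runs whose calculated distance clears 400 feet. A minimal sketch of that check, assuming paired calculated and Statcast distances; the helper name and sample values are hypothetical.

```python
import math

def threshold_agreement(calculated_ft, statcast_ft, est_min=400.0, true_min=400.0):
    """Among home runs with calculated distance >= est_min, the share whose Statcast distance is >= true_min."""
    flagged = [stat for calc, stat in zip(calculated_ft, statcast_ft) if calc >= est_min]
    if not flagged:
        return math.nan  # no home run cleared the estimated threshold
    return sum(1 for stat in flagged if stat >= true_min) / len(flagged)

# Hypothetical distances in feet, for illustration only
calculated = [402.0, 415.0, 399.0, 430.0]
statcast = [396.0, 420.0, 405.0, 428.0]
print(threshold_agreement(calculated, statcast))                # confirmed at 400+
print(threshold_agreement(calculated, statcast, true_min=375))  # confirmed at 375+
```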


The second caveat is that, given the lack of Statcast data on home run distances at the MiLB level, I am unable to test the accuracy of the MiLB estimates (at least in the exact way that I tested the MLB estimates). That said, the calculated distances at the MiLB and MLB levels seem to jibe well. For example, in terms of both mean and median, the MiLB estimates were about ten feet shorter than the MLB estimates, which makes sense given the shorter fences in some MiLB parks.


The MiLB estimates also seem reliable because they describe the world as we know it. Vladimir Guerrero Jr. ranked among the MiLB leaders in 2018 with 11 homers estimated to travel at least 400 feet, representing 2.8% of his plate appearances across the MiLB levels he played at. Ditto for other big power prospects, like Kyle Tucker (3.2%), Dylan Cozens (3.2%), Bobby Dalbec (2.9%) and Eloy Jimenez (2.0%).


Former MLB slugger Chris Carter is also a good example of the reliability of this metric. He spent all of 2018 at the Triple-A level, between the Angels’ and Twins’ systems. Over 312 PA, Carter hit 13 homers that have an estimated distance of 400+ feet. His 4.2% long homer rate was good for third across the minors. Back in 2016, his last full season in the majors, he hit 28 homers that travelled at least 400 feet, 4.3% of his plate appearances that year.
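The long-homer rate used throughout is simply the count of home runs estimated at 400+ feet divided by plate appearances. A small sketch follows; the function name and the sample season are hypothetical, while the Carter figures (13 long homers, 312 PA) come from the paragraph above.

```python
def long_hr_rate(hr_distances_ft, plate_appearances, threshold_ft=400.0):
    """Share of plate appearances that ended in a home run estimated at threshold_ft or longer."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    long_hrs = sum(1 for d in hr_distances_ft if d >= threshold_ft)
    return long_hrs / plate_appearances

# Hypothetical season: three home runs in 100 PA, two of them 400+ feet
print(f"{long_hr_rate([405.0, 390.0, 421.0], 100):.1%}")

# Chris Carter, 2018 Triple-A: 13 long homers over 312 PA
print(f"{13 / 312:.1%}")  # 4.2%
```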


Ideally, the estimated distances of MiLB home runs could be checked against the actual distances found using Statcast. Obviously, though, if we had those Statcast distances, this entire post would be unnecessary. When an educated guess is one’s only option, a leap of faith of some degree is a necessary cost.


The third caveat is that this equation is based on 2018 data. When this exercise is replicated using MLB data from 2015, 2016 and 2017 (separately), the calculated HR distances aren’t as tightly correlated with the Statcast HR distances. The correlation is particularly weak in 2015 (R² of 0.30) and 2016 (R² of 0.41). In 2017, the calculated HR distances explain about 68% of the variation in Statcast’s HR distances (75% if five particularly weird cases, out of 5,855, are excluded). As such, for best results, it seems wise to limit use of the equation above to 2018 MiLB data.


Let’s end by doing exactly that, highlighting minor leaguers who excelled at mashing (what seem likely to be) particularly long dingers in 2018. For context, hitting a 400+ foot homer in 0.5% of one’s plate appearances usually puts a player around the 50th percentile for their level.
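Percentile comparisons like these depend on which batters are counted and which percentile convention is used; the post doesn’t spell out either. The sketch below assumes a 200 PA minimum (the qualifier used for several levels below) and defines percentile as the percent of qualified batters at or below a player’s rate. Both helper names and the four-batter level are hypothetical.

```python
def qualified_rates(batters, min_pa=200):
    """Long-homer rates for batters with at least min_pa PA; batters are (name, long_hrs, pa) tuples."""
    return [hrs / pa for _name, hrs, pa in batters if pa >= min_pa]

def percentile_rank(player_rate, level_rates):
    """Percent of qualified batters whose rate is at or below player_rate (one of several conventions)."""
    if not level_rates:
        raise ValueError("no qualified batters at this level")
    at_or_below = sum(1 for r in level_rates if r <= player_rate)
    return 100.0 * at_or_below / len(level_rates)

# Hypothetical level with four batters; "D" falls short of 200 PA and is excluded
level = [("A", 1, 250), ("B", 0, 300), ("C", 4, 400), ("D", 2, 150)]
rates = qualified_rates(level)
print(round(percentile_rank(1 / 250, rates), 1))
```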


At Triple-A, Jabari Blash led the way with 22 homers estimated to have traveled at least 400 feet (6.4% of his total PA), a mark that also led all of MiLB. Among prospects of note at the level with at least 200 PA, Tyler O’Neill (4.4%), Franmil Reyes (3.6%) and Kyle Tucker (3.2%) each acquitted themselves very well.


At Double-A, Vladimir Guerrero Jr. hit nine 400+ foot homers (3.4%) and was narrowly edged out by Peter O’Brien (3.5%) for tops at the level (min. 200 PA). Fellow uber-prospect Eloy Jimenez wasn’t far behind, producing a long dinger in 2.6% of his PA. Also among the leaders were prospects Austin Hays (2.1%), Peter Alonso (1.8%), Brendan Rodgers (1.7%) and Monte Harrison (1.7%). While he wasn’t among the league leaders, Cavan Biggio (1.1%) hit 400+ foot homers at a well above-average rate.


At High-A, Roberto Ramos and Ibandel Isabel share top honours: Ramos produced the highest rate of big homers (5.5%), while Isabel produced the highest number (22). Jo Adell stood out by hitting a long bomb in 2.3% of his PA as a 19-year-old. Only one other teenager, Cristian Pache (1.0%), cracked one percent at the level. Kevin Smith, a big riser on top prospect lists in 2018, also stood out, producing a 400+ foot homer in 1.6% of his PA.


At Low-A, 19-year-old Seuly Matias was a standout masher, hitting ten 400+ foot homers and leading the level in rate (2.7%). The level’s leader in total was Casey Golden, who cracked 400 feet on 13 occasions (2.5%). The Blue Jays system stands out at this level, with trade deadline acquisitions Chad Spanberger (1.7%) and Demi Orimoloye (1.6%) joining Ryan Noda (1.1%) and Brock Lundquist (1.0%) in the top fifteen percent of batters.


At the Short Season-A level, Sean Reynolds was the runaway leader, with 13 bombs of 400+ feet (4.1%). Behind him was recent Blue Jays draftee Griffin Conine, who hit six (2.6%). Joey Bart, a 2018 first-round pick, also flashed his power with four long homers (2.0%).


At the Advanced Rookie level, 18-year-old Jeremiah Jackson led by hitting long homers in 5% of his PA. Ronny Brito was another young standout, with eight 400+ foot homers this season (3.3%). First-rounder Nolan Gorman hit a bomb in 2.4% of his PA, while former Brave Kevin Maitan did so in 2.1% of his PA. Wander Franco, the 17-year-old wunderkind, also impressed, with five 400+ foot homers (1.8%).


Ultimately, this sort of tool can be applied in a number of ways. One potential use is to find prospects with more power potential than their top-line stats suggest. The recently traded Jeter Downs seems like a good example. At 19, he was a bit young for his level (Low-A) but produced well overall (118 wRC+). He walked (9.9%) and struck out (19.7%) at slightly better-than-average rates and ran an average BABIP (.306). His ISO (.145) was solid too, ranking in the 66th percentile among batters with 200+ PA at the level in 2018. However, he hit an impressive seven home runs with an estimated distance of 400+ feet, accounting for 1.3% of his plate appearances (93rd percentile), suggesting that his power ceiling might be much higher than just above average.


Hopefully, in the very near future, some Statcast data for the MiLB level will be made available to the public. Until then, this approach seems like a useful workaround.






Featured Image Credit: R Widrig - JFtC





