Thursday, June 28, 2012

Batting Order and Simulating Games

Welcome to Measuring Shadows, a blog about thoughts, games, and ethics. We will post about more serious subjects later, but first, a foray into baseball batting orders.


Whether batting orders significantly affect a baseball offense is an often-asked question, and treatments similar to mine can be found all over the web, for instance here and here. To my knowledge, though, mine is slightly more thorough than the others I've found.

My strategy is to take a possible lineup of nine players, look at the frequency with which each produces different kinds of hits, and simulate entire baseball games based on those frequencies, recording the number of runs the offense scores. Averaging runs scored over many (~100,000) simulations gives an estimate of the strength of an offense.
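As a rough illustration of that strategy, here is a minimal Python sketch. It is not the code behind the numbers in this post: the event probabilities are made up, runners simply advance as many bases as the batter does, and there are no steals, errors, or double plays. The names `draw_event`, `advance`, `simulate_game`, and `average_runs` are all hypothetical.

```python
import random

# Hypothetical per-plate-appearance probabilities (NOT real statistics).
# Whatever probability is left over counts as an out.
EXAMPLE_HITTER = {"walk": 0.08, "single": 0.16, "double": 0.05,
                  "triple": 0.005, "homer": 0.03}

BASES_FOR_HIT = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def draw_event(probs, rng):
    """Pick one plate-appearance outcome; leftover probability is an out."""
    r = rng.random()
    cumulative = 0.0
    for event, p in probs.items():
        cumulative += p
        if r < cumulative:
            return event
    return "out"


def advance(bases, event):
    """Return (new_bases, runs_scored); bases = (first, second, third)."""
    first, second, third = bases
    if event == "walk":
        # Only forced runners move on a walk.
        runs = 1 if (first and second and third) else 0
        return (True, second or first, third or (first and second)), runs
    n = BASES_FOR_HIT[event]
    runs = 0
    new = [False, False, False]
    for i, occupied in enumerate(bases):
        if occupied:
            if i + n >= 3:       # runner reaches home
                runs += 1
            else:                # simplifying assumption: advance n bases
                new[i + n] = True
    if n == 4:
        runs += 1                # the batter scores on a home run
    else:
        new[n - 1] = True        # the batter takes his base
    return tuple(new), runs


def simulate_game(lineup, rng, innings=9):
    """Play one offense-only game and return the runs scored."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, (False, False, False)
        while outs < 3:
            event = draw_event(lineup[batter % len(lineup)], rng)
            batter += 1
            if event == "out":
                outs += 1
            else:
                bases, scored = advance(bases, event)
                runs += scored
    return runs


def average_runs(lineup, n_games, seed=0):
    """Average runs per game over n_games simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(n_games)) / n_games


if __name__ == "__main__":
    print(average_runs([EXAMPLE_HITTER] * 9, 10_000))
```

Passing nine different probability tables in batting order, instead of nine copies of one, is all it takes to compare orderings under these assumptions.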

My test case has been this year's Giants lineup: Gregor Blanco, Ryan Theriot, Melky Cabrera, Buster Posey, Angel Pagan, Pablo Sandoval, Brandon Belt, Brandon Crawford, and Barry Zito (chosen as a pitcher with roughly average batting stats). In future posts I'll write about exactly what my method was and about extensions, but to start off with, some results:

1) An average (i.e. random) ordering of those nine players creates about 3.78 runs/game.

2) The Giants' typical batting order creates roughly 3.82 runs/game.

3) The best performing lineups in the simulation create roughly 3.99 runs per game.  In particular, the following lineup performed best in my simulation: Theriot, Sandoval, Blanco, Belt, Cabrera, Posey, Pagan, Crawford, Zito; it created 3.992 runs/game on average (over 200,000 trials).

Due to sample-size issues there might be another lineup that performs slightly better, but I think 3.99 runs/game is a reasonable value for the best the Giants can do. That is 0.17 runs/game more than their typical order, or about 27.5 runs over a 162-game season. Taking the canonical value of 10 extra runs creating one win, the Giants could get roughly 2.75 more wins in the season, or roughly .017 of extra winning percentage, by changing their batting order. This isn't a huge difference, but given that most winning percentages are between .400 and .600, the difference between .550 and .567 is nontrivial.
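The arithmetic behind that estimate is easy to check. The sketch below assumes a 162-game season and the 10-runs-per-win rule of thumb quoted above; the function name `extra_wins` is just for illustration.

```python
GAMES_PER_SEASON = 162   # regular-season length (assumed here)
RUNS_PER_WIN = 10        # rule of thumb quoted in the post


def extra_wins(new_rpg, old_rpg, games=GAMES_PER_SEASON,
               runs_per_win=RUNS_PER_WIN):
    """Convert a runs-per-game improvement into wins per season."""
    extra_runs = (new_rpg - old_rpg) * games
    return extra_runs / runs_per_win


wins = extra_wins(3.99, 3.82)
print(round(wins, 2))                  # 2.75
print(wins / GAMES_PER_SEASON)         # change in winning percentage
```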

This kind of calculation is fun, but perhaps a more important application of these simulations is in valuing players: one can take a lineup composed of nine copies of a given player and see how many runs it scores. How well does this predict how "good" a player is? Which has a stronger correlation with a team's number of runs scored: the (weighted) sum of the runs created per game (using these simulations) of each of its players, or the sum of the WAR of its players? I'll look at these questions in a future post.



  1. Sam,
    Most research I've seen has found very different results. The most notable findings are that 1) a good hitter should hit first (one of the top three, most often the best), 2) an average hitter should hit third (usually the fifth-best hitter), 3) an elite player with power should hit fourth, and 4) the pitcher should hit eighth, with an OBP guy hitting ninth.

    Can you speak to why your results may have been so different from the conventional wisdom in the sabermetric community?

    Josh Arfin

  2. Hey Josh,

    I ran the simulation on all 9! = 362,880 permutations of the lineup, so I got a lot of lineups close to the top. I did it by first doing only a relatively small number of simulation runs (250) on each lineup and then doing a lot more on the ones that performed best, so it's possible I'm missing some. That being said, about half the lineups near the top had the pitcher batting 8th and half had him batting 9th. Theriot, Belt, and Blanco frequently batted first; curiously, Cabrera didn't (I'd expected him to, given that he a) was good and b) had a high batting average, but I guess his relative lack of walks hurt him).

    I'll have to think about it some more to figure out exactly what's going on, but one other thing I'd note is that the Giants don't really have a best player this year; instead they have six roughly equivalent players and three worse ones (Theriot, Crawford, and Zito).

    I'll also post the list of lineups that did best soon; if you don't mind the ugly format, here are the ten best lineups, listed in the form (lineup, avg # runs scored).

    [(['Brandon Belt', 'Brandon Crawford', 'Angel Pagan', 'Melkey Cabrera', 'Gregor Blanco', 'Pablo Sandoval', 'Buster Posey', 'Ryan Theriot', 'Barry Zito'], 3.9178799999999998),
     (['Brandon Belt', 'Gregor Blanco', 'Buster Posey', 'Melkey Cabrera', 'Pablo Sandoval', 'Angel Pagan', 'Brandon Crawford', 'Ryan Theriot', 'Barry Zito'], 3.9418299999999999),
     (['Pablo Sandoval', 'Buster Posey', 'Brandon Belt', 'Angel Pagan', 'Gregor Blanco', 'Brandon Crawford', 'Melkey Cabrera', 'Ryan Theriot', 'Barry Zito'], 3.9466899999999998),
     (['Ryan Theriot', 'Brandon Belt', 'Gregor Blanco', 'Melkey Cabrera', 'Angel Pagan', 'Pablo Sandoval', 'Brandon Crawford', 'Buster Posey', 'Barry Zito'], 3.9525899999999998),
     (['Brandon Belt', 'Pablo Sandoval', 'Melkey Cabrera', 'Angel Pagan', 'Buster Posey', 'Gregor Blanco', 'Ryan Theriot', 'Brandon Crawford', 'Barry Zito'], 3.95295),
     (['Buster Posey', 'Brandon Belt', 'Melkey Cabrera', 'Gregor Blanco', 'Pablo Sandoval', 'Angel Pagan', 'Ryan Theriot', 'Brandon Crawford', 'Barry Zito'], 3.9538099999999998),
     (['Brandon Belt', 'Buster Posey', 'Pablo Sandoval', 'Gregor Blanco', 'Angel Pagan', 'Melkey Cabrera', 'Ryan Theriot', 'Barry Zito', 'Brandon Crawford'], 3.95512),
     (['Gregor Blanco', 'Brandon Belt', 'Buster Posey', 'Melkey Cabrera', 'Pablo Sandoval', 'Angel Pagan', 'Brandon Crawford', 'Barry Zito', 'Ryan Theriot'], 3.95661),
     (['Gregor Blanco', 'Brandon Belt', 'Pablo Sandoval', 'Angel Pagan', 'Melkey Cabrera', 'Brandon Crawford', 'Buster Posey', 'Ryan Theriot', 'Barry Zito'], 3.96306),
     (['Ryan Theriot', 'Pablo Sandoval', 'Gregor Blanco', 'Brandon Belt', 'Melkey Cabrera', 'Buster Posey', 'Angel Pagan', 'Brandon Crawford', 'Barry Zito'], 3.98292)]
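    The two-stage search described in this reply (a quick, noisy screen of every ordering, then longer runs on the leaders) might look roughly like the Python sketch below. This is an illustration under assumptions, not the code that produced the list: `two_stage_search`, `mean_score`, and the `keep` cutoff are hypothetical, `play_game` stands in for whatever function simulates one game for a lineup, and the toy scorer at the bottom exists only so the example runs quickly.

```python
import itertools
import random


def mean_score(lineup, play_game, n_games):
    """Average of play_game(lineup) over n_games calls."""
    return sum(play_game(lineup) for _ in range(n_games)) / n_games


def two_stage_search(players, play_game, screen_games=250, keep=10,
                     final_games=100_000):
    """Screen every ordering cheaply, then re-score only the leaders.

    Returns (lineup, average) pairs sorted from worst to best finalist.
    """
    # Stage 1: a small number of simulated games for every permutation.
    screened = [(mean_score(order, play_game, screen_games), order)
                for order in itertools.permutations(players)]
    screened.sort(key=lambda pair: pair[0], reverse=True)

    # Stage 2: many more games, but only for the top `keep` orderings.
    finalists = [order for _, order in screened[:keep]]
    rescored = [(list(order), mean_score(order, play_game, final_games))
                for order in finalists]
    rescored.sort(key=lambda pair: pair[1])
    return rescored


# Toy stand-in for a game simulator: rewards putting large values early,
# plus random noise. It has nothing to do with real baseball.
_rng = random.Random(0)


def toy_game(order):
    signal = sum(v * w for v, w in zip(order, (4, 3, 2, 1)))
    return signal + _rng.gauss(0, 1)


best = two_stage_search([1, 2, 3, 4], toy_game,
                        screen_games=50, keep=5, final_games=2000)
print(best[-1])  # the top-scoring ordering and its average
```

    The risk mentioned in the reply shows up directly in this structure: a lineup that is unlucky in stage 1 never reaches stage 2, so a larger `screen_games` or `keep` trades running time for a smaller chance of missing the true best ordering.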

  3. (Those were done with 100,000 runs; running the last one for longer gave a truer average of 3.9922599999999999.)

  4. By the way, lining up the hitters from best to worst gives 3.94 runs/game, about 70% of the way from the current lineup's 3.82 to the 3.99 of the best lineup from the simulation.

  5. Bug found in stolen base code, running new version...
