Thursday, July 5, 2012

Baseball Game Simulation details

Earlier I posted some early results from a baseball game simulator I wrote.  Here I'll describe exactly how the simulator works.

As an input the simulator takes a lineup of nine players, along with the following stats for each player: plate appearances, walks+IBB+hit by pitch, singles, doubles, triples, homers, strikeouts, ground into double play rate, stolen base opportunities, stolen bases, caught stealing, number of times reached on error, and extra bases taken percentage; all stats were taken from

The simulator then simulates an entire (offensive) game from that lineup.
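To make the input concrete, here is a rough sketch of what one player's record might look like. The class name, field names, and the `check_lineup` helper are all made up for illustration, and I'm assuming the two percentage stats are stored as fractions between 0 and 1; this is not the simulator's actual code.

```python
from dataclasses import dataclass


@dataclass
class PlayerStats:
    """One batter's input stats (hypothetical field names)."""
    name: str
    plate_appearances: int
    walks: int                    # walks + IBB + hit by pitch, already combined
    singles: int
    doubles: int
    triples: int
    homers: int
    strikeouts: int
    gidp_rate: float              # assumed to be a fraction in [0, 1]
    sb_opportunities: int
    stolen_bases: int
    caught_stealing: int
    reached_on_error: int
    extra_bases_taken_pct: float  # assumed to be a fraction in [0, 1]


def check_lineup(lineup):
    """Return the lineup unchanged if it has nine usable players."""
    if len(lineup) != 9:
        raise ValueError("a lineup needs exactly nine players")
    for player in lineup:
        if player.plate_appearances <= 0:
            raise ValueError(f"{player.name} has no plate appearances")
    return lineup
```

A lineup is then just a list of nine such records, in batting order.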

Each time a batter steps to the plate, the following things are simulated:

1) The play that the batter creates is randomly chosen, with each type of play weighted by how frequently that batter produced it.  The options are single, double, triple, home run, walk, strikeout, ground into double play opportunity, other out.  Reached on error is treated as a single, and intentional walks and hit by pitches are treated as walks.

2) Baserunners (sometimes) try to advance.  Each baserunner other than the batter attempts to take an extra base with probability governed by their extra bases taken percentage, and if they try to advance they are thrown out with 3% probability.  Baserunners try to advance on all base hits and all outs other than strikeouts.

3) If the play was a ground into double play opportunity and there was a runner on first, a double play is executed.  Otherwise it's just a normal groundout.

4) After each plate appearance, if there are runners on base with the next base empty they attempt to steal with probability (sb+cs)/sbo, and are thrown out with the appropriate frequency.  Runners do not try to steal third with zero or two outs, and they do not try to steal home.
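Steps 1 and 4 come down to turning counting stats into probabilities, which can be sketched roughly as below. The dictionary keys and function names are made up for illustration. Two things here are my assumptions rather than anything stated above: that `gidp_rate` is the fraction of non-strikeout outs that are double play opportunities, and that "the appropriate frequency" for being thrown out is cs/(sb+cs).

```python
import random

PLAYS = ["single", "double", "triple", "home_run", "walk",
         "strikeout", "gidp_opportunity", "other_out"]


def play_weights(stats):
    """Return one weight per entry in PLAYS; the weights sum to plate appearances."""
    singles = stats["singles"] + stats["roe"]  # reached on error counts as a single
    walks = stats["walks"]                     # already walks + IBB + hit by pitch
    reached = (singles + stats["doubles"] + stats["triples"]
               + stats["homers"] + walks)
    non_k_outs = stats["pa"] - reached - stats["strikeouts"]
    # ASSUMPTION: gidp_rate is a share of non-strikeout outs.
    gidp_opps = non_k_outs * stats["gidp_rate"]
    return [singles, stats["doubles"], stats["triples"], stats["homers"],
            walks, stats["strikeouts"], gidp_opps, non_k_outs - gidp_opps]


def sample_play(stats, rng=random):
    """Draw one play, weighted by how often the batter produced it."""
    return rng.choices(PLAYS, weights=play_weights(stats))[0]


def steal_attempt_probability(stats):
    """(sb + cs) / sbo, or 0 for a player with no opportunities."""
    if stats["sbo"] == 0:
        return 0.0
    return (stats["sb"] + stats["cs"]) / stats["sbo"]


def caught_stealing_probability(stats):
    """ASSUMPTION: runners are thrown out at their own cs / (sb + cs) rate."""
    attempts = stats["sb"] + stats["cs"]
    return stats["cs"] / attempts if attempts else 0.0
```

Passing a seeded `random.Random` as `rng` makes a run repeatable, which helps when comparing lineups.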

Using this method an inning is simulated, and the runs scored, along with the next batter up, are returned; the simulator then clears all bases and outs, and starts simulating the next inning.
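The inning and game loop described here might look something like the sketch below. It is deliberately stripped down: runners only move as far as the batter does, and there are no extra bases taken, no advancing on outs, and no steals. The `sample_play` callable (which takes a lineup slot and returns a play name) and all other names are hypothetical, not the simulator's real interface.

```python
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}


def advance(bases, n_bases):
    """Move every runner and the batter n_bases; return (new_bases, runs)."""
    runs = 0
    new_bases = [False, False, False]  # first, second, third
    for i, occupied in enumerate(bases):
        if occupied:
            if i + n_bases >= 3:
                runs += 1
            else:
                new_bases[i + n_bases] = True
    if n_bases >= 4:
        runs += 1                      # the batter scores on a home run
    else:
        new_bases[n_bases - 1] = True
    return new_bases, runs


def walk(bases):
    """Put the batter on first, moving only the runners who are forced."""
    new_bases, runs = list(bases), 0
    if new_bases[0]:
        if new_bases[1]:
            if new_bases[2]:
                runs = 1               # bases loaded: runner on third scores
            new_bases[2] = True
        new_bases[1] = True
    new_bases[0] = True
    return new_bases, runs


def simulate_inning(sample_play, batter, lineup_size=9):
    """Play one inning; return (runs scored, next batter up)."""
    bases, outs, runs = [False, False, False], 0, 0
    while outs < 3:
        play = sample_play(batter)
        if play in HIT_BASES:
            bases, scored = advance(bases, HIT_BASES[play])
            runs += scored
        elif play == "walk":
            bases, scored = walk(bases)
            runs += scored
        elif play == "gidp_opportunity" and bases[0]:
            bases[0] = False           # runner on first and batter are both out
            outs += 2
        else:
            outs += 1                  # strikeout, other out, or plain groundout
        batter = (batter + 1) % lineup_size
    return runs, batter


def simulate_game(sample_play, innings=9):
    """Sum runs over the innings; bases and outs reset each inning."""
    total, batter = 0, 0
    for _ in range(innings):
        runs, batter = simulate_inning(sample_play, batter)
        total += runs
    return total
```

The only state carried from one inning to the next is the next batter up, which matches the description above.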

Thoughts on how to improve it?  Right now the frequent bunting by pitchers is only implicitly accounted for: they make a lot of outs but have few strikeouts and ground into few double plays, so their outs advance runners a good deal (but not as much as they should).


  1. I suppose this would be slightly difficult to implement due to the extra data you would have to incorporate (as well as a lower sample size), but do you think there's any merit to tweaking probabilities based on the situation of the game (# outs, RISP, etc.)?

  2. I've thought about that a bit. An argument against is that in baseball, as opposed to e.g. football, players don't have that much control over how they act; it's hard for someone to choose to hit for average or power that much, and even harder to hit "better" when it's more important. On the other hand, it would be interesting to actually see the extent to which that's true empirically.

    In the end I think it's lower down on my list than running simulations on more teams to get more robust data, and on trying to come up with an empirical equivalent of WAR based on something like the number of runs scored by a lineup composed of 9 of a player.