Sunday, July 29, 2012

Basim Live: Like baseball, but without the commercial breaks

Earlier I posted about Basim, a baseball simulator I designed that plays out games based on the stats of the players in each lineup. My long-term project has been to design a stat, eWAR (empirical wins above replacement), that attempts to determine how good a baseball player is offensively by measuring how many runs lineups that include him produce.
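To make the eWAR idea concrete, here is a minimal sketch of the bookkeeping. It assumes a simulator that returns average runs per game for a lineup. The helper names, the swapped-in replacement player, the 162-game season, and the 10-runs-per-win conversion (a common sabermetric rule of thumb) are all assumptions of mine. The post doesn't say how eWAR actually turns runs into wins.

```python
from typing import Callable, Sequence

Lineup = Sequence[str]

def ewar(
    simulate_runs_per_game: Callable[[Lineup], float],  # hypothetical simulator hook
    lineup: Lineup,
    slot: int,
    replacement: str,
    games: int = 162,
    runs_per_win: float = 10.0,  # assumed rule-of-thumb conversion
) -> float:
    """Sketch of an eWAR-style number for the player batting in `slot`.

    Simulates runs per game with the real lineup and with a replacement-level
    player in the same slot, then scales the difference to a season of wins.
    """
    swapped = list(lineup)
    swapped[slot] = replacement
    run_diff = simulate_runs_per_game(lineup) - simulate_runs_per_game(swapped)
    return run_diff * games / runs_per_win
```

Any function that maps a lineup to runs per game can be plugged in. For a quick check, use a toy stand-in that just sums per-player run values.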

Recently, I put up a widget in the top right of the site that runs a Basim simulation live every time you load the page. It randomly chooses two 2011 MLB teams from the same league and plays a game between them, the same way I simulate games to calculate eWAR.
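Picking the matchup is simple. Here is a sketch using Python's `random` module. The post doesn't say whether the league is chosen uniformly or weighted by how many teams it has, so choosing a league uniformly and then two distinct teams from it is an assumption. The team abbreviations are just labels for this example.

```python
import random

# 2011 MLB teams by league (Houston was still in the NL that season).
LEAGUES = {
    "AL": ["BAL", "BOS", "NYY", "TB", "TOR", "CWS", "CLE", "DET", "KC", "MIN",
           "LAA", "OAK", "SEA", "TEX"],
    "NL": ["ATL", "FLA", "NYM", "PHI", "WSN", "CHC", "CIN", "HOU", "MIL", "PIT",
           "STL", "ARI", "COL", "LAD", "SD", "SF"],
}

def pick_matchup(rng: random.Random) -> tuple[str, str, str]:
    """Choose a league uniformly (an assumption), then two distinct teams from it.

    Returns (league, away team, home team).
    """
    league = rng.choice(sorted(LEAGUES))
    away, home = rng.sample(LEAGUES[league], 2)
    return league, away, home
```

Passing in a seeded `random.Random` makes a given "live" game reproducible, which helps when debugging a strange box score.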

I'll continue to work on it--eventually I'd like to set up a simulated baseball season, shown live on the blog--but for now I'm interested in feedback on how to improve it.

Known things that could be improved:

1) Groundouts and flyouts could be treated differently, so that on a groundout that doesn't turn into a double play, the batter replaces the runner on first.

2) The batting orders are, roughly speaking, the lineups each team used most frequently in 2011, but this can have some weird consequences. For example, a player who is generally in the lineup can be left out because they switch spots often enough that no single lineup including them was used very much. So some of the orders I'm using aren't very good. To that end, if you see a lineup that isn't representative of a team, tell me and I can change it. (There are also potential oddities because the lineup lists I'm reading from only use last names; for example, for a while the Pirates were using Donald McCutchen in their lineup instead of Andrew McCutchen.)

3) Right now, runners are thrown out trying to take an extra base on a flat 3% of balls put in play, regardless of who is running; if someone could find better player-by-player data for this, that'd be awesome.
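Items 1 and 3 are both baserunning rules. Here is a minimal sketch of how they might look in code. The base-state tuple, the function names, and the choice to let unforced runners hold are my assumptions, not how Basim actually works. Only the flat 3% rate comes from the post.

```python
import random
from typing import Tuple

Bases = Tuple[bool, bool, bool]  # (runner on 1st, runner on 2nd, runner on 3rd)

def flyout(bases: Bases) -> Tuple[Bases, int]:
    """Batter is out; runners hold (this sketch ignores tag-ups and sac flies)."""
    return bases, 0

def groundout_no_double_play(bases: Bases, outs_before: int) -> Tuple[Bases, int]:
    """Groundout that doesn't become a double play; returns (new bases, runs).

    If first is occupied, the runner from first is forced out at second and
    the batter takes first (a fielder's choice). Runners forced along by the
    batter advance one base; unforced runners hold (an assumption).
    """
    first, second, third = bases
    if not first:
        # Batter is thrown out at first; nobody is forced to move.
        return bases, 0
    # With the bases loaded, the runner from third is forced home, but the run
    # doesn't count if the force out is the third out of the inning.
    runs = 1 if (second and third and outs_before < 2) else 0
    # A runner from second is forced to third; otherwise third stays as it was.
    return (True, False, second or third), runs

def thrown_out_taking_extra_base(rng: random.Random, rate: float = 0.03) -> bool:
    """Flat per-ball-in-play chance of a baserunning out, as the post describes."""
    return rng.random() < rate
```

Swapping the constant `rate` for a per-runner value would be the natural place to plug in player-by-player data, if it turns up.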
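The lineup problem in item 2 is easy to reproduce. This sketch uses hypothetical function names and toy three-man "lineups". It picks the single most-used batting order with `collections.Counter` and also counts starts per player. The start counts expose a regular starter who never appears in the chosen order.

```python
from collections import Counter
from typing import Iterable, Tuple

Lineup = Tuple[str, ...]  # batting order as a tuple of player names

def most_common_lineup(lineups: Iterable[Lineup]) -> Lineup:
    """Return the exact batting order used most often."""
    lineup, _count = Counter(lineups).most_common(1)[0]
    return lineup

def player_starts(lineups: Iterable[Lineup]) -> Counter:
    """Count how many games each player started, regardless of batting spot."""
    return Counter(player for lineup in lineups for player in lineup)

# Toy data: D starts four games, but in a different spot each time.
games = [("A", "B", "C")] * 3 + [
    ("A", "D", "B"), ("D", "A", "B"), ("A", "B", "D"), ("B", "D", "A"),
]
print(most_common_lineup(games))  # ('A', 'B', 'C')
print(player_starts(games)["D"])  # 4
```

The three-game order featuring C wins, even though D started more games than C.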

Anyway, feel free to shoot me any comments you have. And in the meantime, I hope you spend as much time staring at text-based baseball games as I have.

Addendum: if you know a good way to combine pitcher stats with hitter stats to predict the outcome of an at bat, I'm all ears. Right now the approximation I'm thinking of using is to add up the batter's and the pitcher's deviations from the league average, though of course in the end what I really want to do is run a regression....
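Here is a minimal sketch of the additive approximation from the addendum, applied outcome by outcome. Each probability is the league rate plus the batter's deviation plus the pitcher's deviation. The outcome categories are illustrative. Clipping negative results to zero and rescaling are my assumptions. If all three inputs sum to 1, the raw results already sum to 1 (1 + 0 + 0), so rescaling only matters after clipping.

```python
from typing import Dict

Rates = Dict[str, float]  # outcome -> probability per plate appearance

def combine_additive(batter: Rates, pitcher: Rates, league: Rates) -> Rates:
    """League rate plus batter and pitcher deviations from league average.

    Negative results are clipped to zero and the probabilities are rescaled
    to sum to 1 (both assumptions made for this sketch).
    """
    raw = {
        outcome: max(0.0, league[outcome]
                     + (batter[outcome] - league[outcome])
                     + (pitcher[outcome] - league[outcome]))
        for outcome in league
    }
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}
```

For example, a .300-hit-rate batter (league .250) facing a pitcher who allows hits at .220 comes out at .250 + .050 - .030 = .270. The clipping step shows where the additive form strains. Two extreme players can push a rate below zero, which is one reason a fitted model could do better.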


  1. Would there be any way to employ neutralized batting statistics (available on Baseball Reference) in a project like this? I'm not sure if this would make your life easier or harder.

    1. Hey,

      Yup, good point; it would definitely be possible to do--it's just a matter of which giant CSV table to copy and paste. I decided to go with the raw stats, for now at least, because if I'm trying to use team runs scored as a measure of how well my simulator is working, I'm going to want the raw numbers. After that, though, it might be worth switching to neutralized stats for individual-level statistics involving inter-year comparisons.