Earlier I posted about Basim, a baseball simulator I designed that simulates games based on the stats of the players in each lineup. My long-term project has been to design a stat, eWAR (empirical wins above replacement), that attempts to measure how good a baseball player is offensively by seeing how many runs lineups including them produce.
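One way to make "how many runs lineups including them produce" concrete is to simulate the same lineups twice: once with the player and once with a replacement-level hitter in his spot. You then scale the per-game run difference to a full season and convert runs to wins. The sketch below assumes that setup. The name `ewar_estimate`, the 162-game scale, and the 10-runs-per-win conversion (a common sabermetric rule of thumb) are illustrative assumptions, not necessarily how eWAR is actually computed.

```python
from statistics import mean


def ewar_estimate(runs_with_player, runs_with_replacement,
                  games=162, runs_per_win=10.0):
    """Hypothetical eWAR-style estimate from simulated run totals.

    runs_with_player: runs per simulated game for lineups that include the player
    runs_with_replacement: runs per game for the same lineups with a
        replacement-level hitter in his place
    games: season length used to scale the per-game difference
    runs_per_win: assumed runs-to-wins conversion (about 10 is a common rule of thumb)
    """
    run_diff_per_game = mean(runs_with_player) - mean(runs_with_replacement)
    return run_diff_per_game * games / runs_per_win


# Toy numbers, not real simulation output.
print(ewar_estimate([5, 4, 6, 5], [4, 4, 5, 4]))
```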
Recently, I put up a widget in the top-right corner of the site that runs a live Basim simulation every time you load the page. It randomly chooses two 2011 MLB teams from the same league and plays a game between them, the same way I simulate games to calculate eWAR.
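For readers curious about the mechanics, here's a minimal sketch of this kind of stat-driven game simulation. It is not Basim's actual code. It assumes each hitter is reduced to per-plate-appearance probabilities for six outcomes, that outs never move runners, and that on a hit every runner advances exactly as many bases as the batter. The names (`play_game`, `pick_matchup`, the rate dictionaries, the placeholder team names) are hypothetical.

```python
import random

# Hypothetical outcome model: each hitter is a dict of per-plate-appearance
# probabilities for these six outcomes (built from his season stats).
OUTCOMES = ["out", "walk", "single", "double", "triple", "homer"]
BASES_GAINED = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def sample_outcome(rates, rng):
    """Draw one plate-appearance outcome from a hitter's outcome probabilities."""
    return rng.choices(OUTCOMES, weights=[rates[o] for o in OUTCOMES])[0]


def play_half_inning(lineup, batter_idx, rng):
    """Simulate one half-inning; return (runs scored, index of the next batter)."""
    outs, runs = 0, 0
    bases = [False, False, False]  # occupancy of first, second, third
    while outs < 3:
        outcome = sample_outcome(lineup[batter_idx], rng)
        batter_idx = (batter_idx + 1) % len(lineup)
        if outcome == "out":
            outs += 1  # simplification: outs never move runners
        elif outcome == "walk":
            # Runners advance only when forced.
            if bases[0]:
                if bases[1]:
                    if bases[2]:
                        runs += 1
                    bases[2] = True
                bases[1] = True
            bases[0] = True
        else:
            # Simplification: every runner advances as many bases as the batter.
            gained = BASES_GAINED[outcome]
            new_bases = [False, False, False]
            for base, occupied in enumerate(bases):
                if occupied:
                    if base + gained >= 3:
                        runs += 1
                    else:
                        new_bases[base + gained] = True
            if gained == 4:
                runs += 1  # the batter scores on a homer
            else:
                new_bases[gained - 1] = True
            bases = new_bases
    return runs, batter_idx


def play_game(away, home, rng, innings=9):
    """Return (away runs, home runs). Extra innings continue until someone leads.
    Simplification: the home team always bats in the bottom half."""
    score = {"away": 0, "home": 0}
    next_up = {"away": 0, "home": 0}
    inning = 0
    while inning < innings or score["away"] == score["home"]:
        inning += 1
        for side, lineup in (("away", away), ("home", home)):
            runs, next_up[side] = play_half_inning(lineup, next_up[side], rng)
            score[side] += runs
    return score["away"], score["home"]


def pick_matchup(teams_by_league, rng):
    """Pick a league at random, then two different teams from it."""
    league = rng.choice(sorted(teams_by_league))
    away, home = rng.sample(teams_by_league[league], 2)
    return away, home


# Usage with made-up, roughly league-average hitters (not real 2011 data).
rng = random.Random(2011)
avg_hitter = {"out": 0.68, "walk": 0.08, "single": 0.15,
              "double": 0.05, "triple": 0.005, "homer": 0.035}
teams = {"AL": ["Team A", "Team B"], "NL": ["Team C", "Team D"]}
away_team, home_team = pick_matchup(teams, rng)
lineup = [avg_hitter] * 9
print(away_team, home_team, play_game(lineup, lineup, rng))
```

Reducing each hitter to outcome probabilities keeps the simulation loop tiny. Most of the realism then lives in the base-running rules, which is where the known improvements listed below come in.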
I'll continue to work on it (eventually I'd like to set up a simulated baseball season, shown live on the blog), but for now I'm interested in feedback on how to improve it.
Known things that could be improved:
1) Groundouts and flyouts could be treated differently: on a groundout that doesn't turn into a double play, the batter could replace the runner on first (a fielder's choice).
2) The batting orders are, roughly speaking, each team's most frequently used lineup from 2011. This can have some weird consequences, like leaving out someone who was generally in the lineup because he switched spots often enough that no single lineup including him was used very much. As a result, some of the orders I'm using aren't very representative, so if you see a lineup that doesn't reflect a team well, tell me and I can change it. (There are also occasional mix-ups because the lineup lists I'm reading from use only last names; for example, for a while the Pirates were using Donald McCutchen in their lineup instead of Andrew McCutchen.)
3) Right now, runners are thrown out trying to take an extra base on a flat 3% of balls put in play, regardless of who is running. If someone could find better player-by-player data for this, that'd be awesome.
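Item 1 could look something like this hypothetical groundout rule. With a runner on first and fewer than two outs, the play is either a double play or a fielder's choice that leaves the batter on first. The `dp_rate` default is a made-up placeholder. Holding the non-forced runners in place is a simplification; in reality a runner on third often scores on a groundout.

```python
import random


def groundout(bases, outs, rng, dp_rate=0.15):
    """Hypothetical groundout rule.

    bases: [first, second, third] occupancy flags before the play
    outs: outs before the play
    dp_rate: placeholder chance of a double play when one is possible
    Returns (new_bases, outs_recorded, runs_scored).
    """
    if not bases[0] or outs == 2:
        # No force at second (or the inning is over anyway): batter out, runners hold.
        return list(bases), 1, 0
    if rng.random() < dp_rate:
        # Double play: the runner from first and the batter are both retired;
        # other runners hold (simplification).
        return [False, bases[1], bases[2]], 2, 0
    # Fielder's choice: the runner from first is forced out at second and the
    # batter replaces him at first. A runner on second is forced to third,
    # which in turn forces a runner on third home.
    runs = 0
    third = bases[2]
    if bases[1]:
        if bases[2]:
            runs += 1
        third = True
    return [True, False, third], 1, runs


rng = random.Random(0)
print(groundout([True, False, False], outs=0, rng=rng))
```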
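To illustrate the lineup problem in item 2: picking the single most common batting order can leave out a regular who moves around the order. One hypothetical alternative takes the players who started most often and orders them by their average batting slot. The input format (one tuple of names per game) and the function names are assumptions. This version also doesn't check that every defensive position is covered.

```python
from collections import Counter, defaultdict


def most_common_lineup(games):
    """games: list of batting orders (tuples of player names), one per game."""
    return Counter(tuple(g) for g in games).most_common(1)[0][0]


def composite_lineup(games, size=9):
    """Alternative: the `size` players who started most often, ordered by average slot."""
    starts = Counter()
    slot_sum = defaultdict(int)
    for order in games:
        for slot, player in enumerate(order):
            starts[player] += 1
            slot_sum[player] += slot
    regulars = [player for player, _ in starts.most_common(size)]
    return sorted(regulars, key=lambda p: slot_sum[p] / starts[p])


# Toy three-man orders: D starts 3 of 5 games but never in the same order twice,
# so the most common lineup drops him.
games = [
    ("A", "B", "C"),
    ("A", "B", "C"),
    ("D", "A", "B"),
    ("A", "D", "B"),
    ("B", "A", "D"),
]
print(most_common_lineup(games))        # ('A', 'B', 'C')
print(composite_lineup(games, size=3))  # ['A', 'D', 'B']
```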
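If per-runner data for item 3 turns up, a common way to handle the small samples is to shrink each runner's observed rate toward the league rate. The sketch below does this by pretending each runner also had `prior_chances` balls in play at the flat 3% rate. It falls back to that flat rate when a runner has no data. The `extra_base_out_rate` field, the definition of "chances," and `prior_chances=100` are all hypothetical placeholders.

```python
import random

LEAGUE_RATE = 0.03  # the flat 3%-of-balls-in-play rate described above


def shrunk_out_rate(outs, chances, league_rate=LEAGUE_RATE, prior_chances=100):
    """Estimate one runner's rate of being thrown out taking an extra base.

    outs: times he was thrown out trying for an extra base (hypothetical data)
    chances: balls in play on which he was running (hypothetical data)
    Small samples are pulled toward the league rate by adding `prior_chances`
    phantom balls in play at the league rate; 100 is an arbitrary placeholder.
    """
    return (outs + prior_chances * league_rate) / (chances + prior_chances)


def thrown_out(runner, rng):
    """Return True if this runner is thrown out on this ball in play.
    Falls back to the league rate when there is no per-runner data."""
    rate = runner.get("extra_base_out_rate", LEAGUE_RATE)
    return rng.random() < rate


speedy = {"extra_base_out_rate": shrunk_out_rate(outs=1, chances=120)}
unknown = {}
rng = random.Random(3)
print(round(speedy["extra_base_out_rate"], 4), thrown_out(unknown, rng))
```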
Anyway, feel free to shoot me any comments you have. And in the meantime, I hope you spend as much time staring at text-based baseball games as I have.
Addendum: if you know a good way to combine pitcher stats with hitter stats to predict the outcome of an at-bat, I'm all ears. Right now the approximation I'm thinking of using is to add up the batter's and pitcher's deviations from league average, though of course in the end what I really want is a regression....
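For comparison, here is the additive approximation from the addendum next to the odds-ratio method often used by sabermetricians for batter-vs-pitcher matchups (a generalization of Bill James's log5). Both take a single event rate, such as strikeouts per plate appearance, for the batter, the pitcher, and the league. The function names and the strikeout rates are illustrative. Applying either method to every outcome type would also need a final renormalization so the outcome probabilities sum to 1.

```python
def additive_estimate(batter, pitcher, league):
    """League rate plus the batter's and pitcher's deviations from it,
    clamped to [0, 1] because the sum can fall outside that range."""
    p = league + (batter - league) + (pitcher - league)
    return min(max(p, 0.0), 1.0)


def odds_ratio_estimate(batter, pitcher, league):
    """Odds-ratio (log5-style) estimate:
    matchup odds = batter odds * pitcher odds / league odds."""
    def odds(p):
        return p / (1.0 - p)

    matchup_odds = odds(batter) * odds(pitcher) / odds(league)
    return matchup_odds / (1.0 + matchup_odds)


# Made-up strikeout rates: a high-strikeout hitter facing a high-strikeout pitcher.
batter_k, pitcher_k, league_k = 0.30, 0.28, 0.18
print(additive_estimate(batter_k, pitcher_k, league_k))
print(odds_ratio_estimate(batter_k, pitcher_k, league_k))
```

With these inputs the additive version gives about 0.40 and the odds-ratio version about 0.43. The odds-ratio form never needs clamping, and it returns the batter's own rate exactly when the pitcher is league-average.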