Measuring Shadows: baseball


Sunday, December 23, 2012

Less Stupid Use of Pitchers: Pitcher Fatigue

A while ago I wrote a post about one of the most unenlightened areas of baseball strategy: the use of pitchers. I proposed eliminating the distinction between starting pitchers, middle relievers, and closers in favor of a system that just uses a set of pitchers, each pitching different total numbers of innings, but no single pitcher pitching more than a few innings in a game; in other words, a starter would now throw two innings every few games instead of seven innings every five games.

The advantages of this, as I see it, are fourfold.

1) If you're an NL team, you can pinch hit for your pitchers whenever they come up.

2) Pitchers don't have to throw 100 pitches in a game.

3) Batters never get to see the same pitcher twice in a game, and so can't get used to their pitches.

4) You can get the pitcher-batter match-ups you want all the time, instead of being stuck with your same pitcher the first three times through the lineup.

In the first post I estimated the size of effect (1): pinch hitting for your pitcher every time would let you score about 0.2 more runs per game, translating into about 3.2 wins per season (the difference between a .500 team and a .520 team).

Now I'm going to look at effects (2) and (3).

Checking in on Tim Lincecum

A month ago I wrote an article looking at changes in San Francisco Giants' starting pitcher Tim Lincecum's pitch command and velocity this season. In particular, I found that the velocity on his fastball and slider had decreased by about a mile per hour, and that the average distance of his pitches from the edge of the strikezone had increased--assuming that, generally, pitches near the edge of the strikezone are better than those in the middle or nowhere near it.

Since the all-star break, though, Lincecum's ERA, at least, has been a respectable 3.66. Have his command and speed improved as well?

It turns out that his speed is the same as earlier this year, with a fastball averaging around 90.4 mph, and about a mile per hour slower than last year. So, no improvement on that front.

His average distance from the edge of the strikezone, on the other hand, has gotten a bit better--it's been at .940*, as compared to .923 last year and .961 for the first half of 2012**.

So, long story short, there are some signs that his pitching might be picking up but nothing conclusive.

By the way, I'll be announcing results from the second contest and introducing the third one in the next day or two.

_________________________________________________________________________________
*, **: The value for the second half of this year differs from the value from last year, and from the value for the first half of this year, only at the p = .30 level--so the jury's out on this one.

Monday, August 13, 2012

Checking in on the Giants' lineup

Earlier I wrote a few posts on what the SF Giants' optimal starting lineup should be using Basim, a baseball simulator I wrote. A lot has changed since then, though--Posey has become much better, Blanco and Pagan have cooled off, and Hunter Pence and Marco Scutaro have joined the club. So, what should the Giants' lineup look like now? What lineup do I hope they start tonight?

First off, here was my guess at their best lineup (once again assuming Zito is the pitcher):

1. Buster Posey
2. Brandon Belt
3. Melky Cabrera
4. Pablo Sandoval
5. Hunter Pence
6. Marco Scutaro
7. Angel Pagan
8. Brandon Crawford
9. Barry Zito

Running this lineup through the simulator*, it scored an average of 4.03 runs per game.

I then found that a random lineup (i.e. random ordering of the nine players) scored about 3.89 runs per game. The lineups that the simulator liked the best generally had Posey, Belt, Cabrera, or Pence batting leadoff, which is unsurprising--each has either a high OBP or a high ground into double play rate that would be very painful in the heart of the order. The single lineup that the simulator liked best** was the following:

1. Buster Posey
2. Angel Pagan
3. Hunter Pence
4. Brandon Belt
5. Melky Cabrera
6. Marco Scutaro
7. Pablo Sandoval
8. Brandon Crawford
9. Barry Zito

It scored an average of about 4.035 runs per game***.

I then looked at two lineups that were close to what I predict the Giants will run; they differ only in whether the Giants play Theriot or Crawford. The lineup Pagan, Scutaro, Cabrera, Posey, Sandoval, Pence, Belt, Theriot, Zito scored an average of 3.96 runs per game, while the same lineup but with Crawford batting for Theriot scored 4.00 runs per game on average. So, it seems like about half the difference between my lineup and the one with Theriot just comes from the fact that Crawford is better than Theriot.

Anyway, here's to hoping the Giants will do something smart.

_________________________________________________________________________________
*: For Pence and Scutaro I used their pre-Giants numbers.

**: This should be taken with a grain of salt--to actually find the best would take days of simulation; treat this as a lineup that is pretty close to the best.

***: FWIW, the ten best starting lineups, according to the simulator (with the same caveat as **), in the form (lineup: average runs scored by lineup per game), listed from tenth best to best:
Buster Posey, Melky Cabrera, Brandon Belt, Marco Scutaro, Hunter Pence, Pablo Sandoval, Angel Pagan, Brandon Crawford, Barry Zito: 4.01524
Brandon Belt, Buster Posey, Pablo Sandoval, Melky Cabrera, Hunter Pence, Angel Pagan, Brandon Crawford, Barry Zito, Marco Scutaro: 4.0168675
Melky Cabrera, Buster Posey, Brandon Belt, Marco Scutaro, Pablo Sandoval, Brandon Crawford, Angel Pagan, Hunter Pence, Barry Zito: 4.017915
Brandon Belt, Buster Posey, Pablo Sandoval, Melky Cabrera, Hunter Pence, Brandon Crawford, Angel Pagan, Marco Scutaro, Barry Zito: 4.0218325
Melky Cabrera, Brandon Belt, Marco Scutaro, Pablo Sandoval, Angel Pagan, Hunter Pence, Buster Posey, Brandon Crawford, Barry Zito: 4.02199
Hunter Pence, Marco Scutaro, Buster Posey, Pablo Sandoval, Brandon Belt, Melky Cabrera, Angel Pagan, Brandon Crawford, Barry Zito: 4.0253225
Brandon Belt, Hunter Pence, Pablo Sandoval, Angel Pagan, Melky Cabrera, Brandon Crawford, Marco Scutaro, Buster Posey, Barry Zito: 4.0315525
Hunter Pence, Angel Pagan, Brandon Belt, Melky Cabrera, Buster Posey, Pablo Sandoval, Marco Scutaro, Brandon Crawford, Barry Zito: 4.0322525
Marco Scutaro, Pablo Sandoval, Brandon Belt, Melky Cabrera, Buster Posey, Hunter Pence, Brandon Crawford, Angel Pagan, Barry Zito: 4.0353875
Buster Posey, Angel Pagan, Hunter Pence, Brandon Belt, Melky Cabrera, Marco Scutaro, Pablo Sandoval, Brandon Crawford, Barry Zito: 4.0365225

Thursday, August 9, 2012

Traditionball: the most unenlightened area of baseball strategy

About ten years ago, baseball started to undergo a statistical revolution: youth became valued, OPS was born, and walks finally became valued. Fast forward a decade and OPS is now a mainstream stat, multiple sites are constructing competing ways to summarize the total value of a player, and even in baseball clubhouses sabermetrics are the new cool kid on the block.

But there are still a few areas of baseball strategy stuck in the dark ages of gut instincts and wild speculation, and chief among them is use of pitchers.

Right now in baseball there are three types of pitchers: starters, relievers, and closers. Starters come in to pitch the start of the game, stay in for at least five innings, and are eventually taken out. They pitch every five days. Closers come in in the ninth inning with a lead of between one and three runs. They never pitch more than an inning, and never come in otherwise. Middle relievers pitch in between starters and closers.

These roles bear an uncanny resemblance to two of the stupidest pitching statistics, wins and saves.

This system is, of course, not close to optimal. Frequent pitching changes at the beginning of the game would allow a manager to get better matchups, keep pitchers fresher by stopping them from having to throw too many pitches in one day, and allow pitchers to throw however many pitches is best for them--not a bimodal distribution with centers at fifteen and one hundred.

It would also give an NL team another advantage--they could always pinch hit for their pitchers (or at least as long as it wasn't a two out, none on situation).

I'll look at the first effects in a later post, but for now, how much would always pinch hitting help?

Well, first I found the number of runs scored by an average NL lineup from 2011 using Basim; it was 3.799.

Then, I substituted the average substitute player for the league in for the ninth spot in the lineup; Basim then simulated it and found an average of 4.006 runs per game.

That's roughly a 3.2 win difference right there--the difference between a .500 team and a .520 team.
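For readers who want to check the arithmetic, here is a minimal sketch of the runs-to-wins conversion. It assumes the rule-of-thumb value of 10 runs per win and a 162-game season; the function name `extra_wins` is made up for illustration. Using the unrounded 0.207 runs-per-game gap gives a little over 3.3 wins, while rounding the gap to 0.2 gives the roughly 3.2 quoted above.

```python
GAMES_PER_SEASON = 162
RUNS_PER_WIN = 10.0  # assumed rule-of-thumb conversion, not a fitted value


def extra_wins(rpg_before, rpg_after,
               games=GAMES_PER_SEASON, runs_per_win=RUNS_PER_WIN):
    """Convert a change in runs per game into extra wins over a season."""
    return (rpg_after - rpg_before) * games / runs_per_win


# Simulated runs per game quoted in the post: pitcher batting ninth vs.
# an average substitute batting ninth.
wins = extra_wins(3.799, 4.006)
win_pct = 0.5 + wins / GAMES_PER_SEASON

print(f"extra wins per season: {wins:.2f}")
print(f"winning percentage for a formerly .500 team: {win_pct:.3f}")
```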

It's true, of course, that implementing such a system could incite a revolt from pitchers--but it seems like there is too much to be gained for it to be worth ignoring as a manager.

Tuesday, August 7, 2012

Results from the First Contest

A week ago I proposed a contest for readers to construct the best possible lineup from the 2002 Giants roster; the rules are here and here. The winner of the contest gets 3 Shadow-points; second place gets 2; and third place gets 1. In addition, everyone whose entry beats my own entry gets an additional Shadow-point. Every two months the person with the most Shadow-points gets their name in the Shadow Hall of Fame, $2, and the ability to write any one article for the blog.

-----------------------------------------------

Today, I'm announcing the winners of the contest. The average submission scored somewhere around 4.9 runs per game, and the best scored above 5. I have more comments to make, but without further ado, the three winners:

Last Chance for Contest

A week ago, I introduced a contest to design the best lineup from the 2002 Giants roster; details are here and here. Submissions for the contest are due tonight, so if you want to participate but haven't given me a lineup, either post it as a comment here or email it to me by 11:59 tonight.

I'll announce the results of the contests sometime tomorrow.

Also, I've run some more simulations with Basim on the 2000-2011 seasons; it looks like the correlation between RAA* and runs scored by a team is .963; without very many simulations (20,000 per player) the correlation between eWAA and runs scored is .952, but that number should go up with more simulations as the noise goes down (due to limited computing power it's taking a while to get a fuller result).

Also, if anyone has a suggestion for what I should write on (baseball, philosophy, or anything else), let me know.

_________________________________________________________________________________

*: RAA, runs above average, is the baseline offensive stat used to construct WAR; I'm using a modified version that removes ballpark advantage, etc. to do an apples-to-apples comparison.

Friday, August 3, 2012

Examining eWAA: The Fifteen Best Players of the Decade

A while ago I wrote a Python program, Basim, that simulates baseball games, and used it to construct a statistic for the offensive output of a player: eWAA. Also, check the bottom of the post for a few notes on the contest of the week.

----------------------------------

I've run a Basim simulation on all player-years from 2000-2011, inclusive, and used it to calculate the eWAA (empirical wins above average) for all player-seasons; think of this as the number of extra wins a team would be expected to get in the season if they replaced an average player with the given player. Below, I've listed something close to the best 15 player-seasons in the 2000-2011 period. I say "something close to" because, due to lack of available computer power, I haven't run enough simulations to get a stable result; so, the numbers below should be taken with a standard deviation due to limited simulations of something like 0.33 eWAA. (Once I've run enough simulations I'll do a more in-depth look at eWAA, including its predictive power.) For fun, I also put the spot in the batting order that Basim thought they should hit that year*.

Name	eWAA	Season	Best Spot
Barry Bonds	12.22	2001	2
Barry Bonds	12.03	2004	2
Barry Bonds	11.35	2002	2
Todd Helton	9.11	2000	2
Sammy Sosa	9.03	2001	2
Barry Bonds	8.91	2003	1
Luis Gonzalez	8.44	2001	4
Alex Rodriguez	8.35	2007	4
Albert Pujols	8.16	2003	1
Todd Helton	8.13	2004	1
Albert Pujols	8.04	2009	2
Todd Helton	8.02	2004	3
Jose Bautista	7.59	2011	2
Albert Pujols	7.51	2004	1
Jason Giambi	7.41	2001	4
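As a back-of-the-envelope check on the simulation noise quoted for these numbers, here is a rough sketch of how the standard error of a simulated eWAA shrinks with the number of simulated games. Everything in it is an assumption for illustration: a game-to-game standard deviation of about 3 runs, a 650 plate appearance season, 20,000 simulations per player (the count mentioned in another post on this blog), the 4.29 plate appearances per game from the eWAR post, and 10 runs per win. It also ignores noise in the baseline lineup.

```python
from math import sqrt


def ewaa_standard_error(n_sims, pa, runs_sd_per_game=3.0,
                        pa_per_game=4.2913713, runs_per_win=10.0):
    """Approximate Monte Carlo standard error of a simulated eWAA.

    runs_sd_per_game is an assumed spread of runs scored in a single game.
    """
    se_runs_per_game = runs_sd_per_game / sqrt(n_sims)  # error of the mean
    games_equivalent = pa / pa_per_game                  # season scaling
    return se_runs_per_game * games_equivalent / runs_per_win


print(f"{ewaa_standard_error(20_000, 650):.2f} eWAA at 20,000 simulations")
print(f"{ewaa_standard_error(80_000, 650):.2f} eWAA at 80,000 simulations")
```

Under these assumptions the error comes out near a third of a win, and quadrupling the number of simulations only halves it, which is why a stable ranking takes a lot of computing time.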

I then computed the best total eWAA throughout the period (summed over the years 2000-2011); if you want, the best offensive players (again, with roughly a 2.5% error) of the decade:

Name	eWAA
Albert Pujols	69.2
Barry Bonds	60.5
Alex Rodriguez	60.3
Todd Helton	54.0
Lance Berkman	46.3
Manny Ramirez	46.2
Chipper Jones	44.5
Bobby Abreu	39.8
Vladimir Guerrero	39.7
Jim Thome	39.2
Jason Giambi	36.0
Carlos Beltran	34.4
Miguel Cabrera	33.3
Brian Giles	32.9
Gary Sheffield	32.5

My initial reactions:

1) Bonds was, in fact, really good. Not only did he have the best seasons by far, but he has the second highest total eWAA despite retiring halfway through the period.

2) Pujols, unsurprisingly, is the man of the decade; he's been an MVP-level player for most of the years.

3) Coors Field is really friendly. (Also, Helton is really good.)

4) Berkman, Abreu, Thome, Beltran, and Giles are really underrated.

-------------------------------

A few notes on the contest of the week (a contest to create the best lineup you can from the 2002 Giants roster; see here for details).

1) For fun, I'm going to create a lineup submission of my own; it won't be in the contest, but in addition to the normal Shadow-point payout, I'm giving one extra Shadow-point to everyone who submits a lineup that does better than mine.

2) To clarify, the numbers I'll be drawing the stats from are the ones listed here; they are the players' stats from their time on the Giants in 2002.

3) So far there are 12 entries, all of which are unique.

4) I'm not going to show my lineup until I run the results, but as a teaser I'll say that there's one thing I think a lot of people are forgetting to think about.

Anyway, good luck in the contest; submissions are open until this coming Monday, August 6th at 11:59 pm.

_________________________________________________________________________________
*: I suspect that right now Basim is biasing too much toward having good hitters hit first and bad hitters hit last when calculating eWAA; I'll change that soon.

Tuesday, July 31, 2012

Introducing the Contest of the Week: What's Your Lineup?

What's the best lineup, order included, that can be made from the 2002 San Francisco Giants? Think you can come up with a better one than other readers?

---------------------------------------------------

Today I'm introducing a new feature of the blog: the contest of the week. Each week, I'll announce a contest to readers. Each week, the winner of the contest gets three Shadow-points, second place gets two Shadow-points, and third place gets one Shadow-point. Every two months, the person with the most Shadow-points in the period gets their name put on a Hall of Fame widget to the right of the blog (I'll create it once it's needed), a wallet-busting $2 prize, and the opportunity, if they want, to write a guest article for the blog.

So, on to the first contest:

What's Your Lineup?

I wrote Basim, a python script that simulates baseball games based on the stats of the people in the lineups. The details of how it works are in the link above; you can also see Basim simulating games live on the right of the blog. I've recently been looking into evaluating players with it, but this contest has to do with the original use of Basim: evaluating batting orders.

So, the first contest is to construct the best batting order from the 2002 San Francisco Giants roster. To enter the contest, submit a lineup from their players; I'll run Basim on all submissions, and the three highest average runs per game are the winners.

Rules:

1) The players you can draw from are listed here.

2) You can only use players who had at least 100 plate appearances with the Giants that year.

3) Your lineup must be defensively valid. That is to say you must have a first baseman, second baseman, etc. Shortstops and second basemen are considered interchangeable, and all outfielders and first basemen are also interchangeable. Third basemen and catchers can play first base, but not vice versa. The position that a person can play, up to interchangeability, is the one listed here. (Technical note: Dunston can play 2B, SS, OF, and 1B, in case you want to play him for some reason).

4) Your pitcher (in your lineup) must be Jason Schmidt.

5) I will run 1,000,000 simulations on each submitted lineup to find its average runs scored per game. The highest value will win and get three Shadow-points; second will get two, and third will get one.

So, for example, a submission might look like:

1. Benito Santiago (C)

2. J. T. Snow (LF)

3. Jeff Kent (SS)

4. Kenny Lofton (1B)

5. Barry Bonds (CF)

6. Jason Schmidt (P)

7. David Bell (3B)

8. Ramon Martinez (2B)

9. Marvin Benard (RF)

Some (quite obvious) things to think about: where do you put Bonds? What do you think of stolen bases? What do you want in a leadoff hitter? Where do you put the pitcher?

Submissions are due by Monday, August 6th; I'll announce winners the next day. To submit a lineup, either email it to me (sambf at mit dot edu), or post it as a comment on this post. (Make sure to include your name.)

Good luck!

Sunday, July 29, 2012

Basim Live: Like baseball, but without the commercial breaks

Earlier I posted about Basim, a baseball simulator I designed that simulates baseball games based on the stats of people in the lineups. My long term project has been to design a stat eWAR, empirical wins above replacement, that attempts to determine how good offensively a baseball player is by seeing how many runs lineups including them produce.

Recently, I put up a widget on the top right of the site that runs Basim simulations live every time you load the page. It randomly chooses two 2011 MLB teams from the same league and plays a game between them, the same way I simulate games to calculate eWAR.

I'll continue to work on it--eventually I'd like to set up a simulated baseball season, shown live on the blog--but for now I'm interested in feedback on how to improve it.

Known things that could be improved:

1) Groundouts and flyouts could be treated differently, with batters replacing runners on first on groundouts that don't turn into double plays.

2) The batting orders, roughly speaking, are the most frequently used lineups from the team from 2011, but this can have some weird consequences, like not using someone who is generally in the lineup because they switch spots frequently enough that no single lineup involving them has been used very much. So, some of the orders that I'm using aren't very good. To that end, if you see a lineup that is not very representative of a team, tell me and I can change it. (There are also potential oddities where the lineup lists I'm reading from only use last names, so, for example, for a while the Pirates were using Donald McCutchen in their lineup instead of Andrew McCutchen.)

3) Right now people are always thrown out trying to take an extra base on 3% of balls put in play; if someone could find better player-by-player data for this, that'd be awesome.

Anyway, feel free to shoot me any comments you have. And in the meantime, I hope you spend as much time staring at text-based baseball games as I have.

Addendum: if you know a good way to synthesize pitcher stats with hitter stats to predict an at bat, I'm all ears; right now the approximation I'm thinking of using is to add up the deviations from average of the two statistics, though of course in the end what I really want to do is do a regression....
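A minimal sketch of the additive approximation described in the addendum: start from the league-average rate and add the batter's and the pitcher's deviations from it. The function name `matchup_rate` and the example on-base numbers are hypothetical, and the clamp to the range 0 to 1 is an added safeguard rather than part of the idea as stated.

```python
def matchup_rate(batter_rate, pitcher_rate, league_rate):
    """Additive estimate of an outcome rate for one batter-pitcher matchup.

    Each argument is the same kind of rate (e.g. on-base percentage for the
    batter, on-base percentage allowed for the pitcher, and league average).
    """
    estimate = (league_rate
                + (batter_rate - league_rate)     # batter's deviation
                + (pitcher_rate - league_rate))   # pitcher's deviation
    # Clamp so that extreme inputs still produce a valid probability.
    return min(1.0, max(0.0, estimate))


# Hypothetical matchup: a .380 OBP hitter against a pitcher who allows a
# .300 OBP, in a league where the average OBP is .320.
print(f"{matchup_rate(0.380, 0.300, 0.320):.3f}")
```

A regression fitted on real matchup data, as the addendum suggests, would replace the two equal weights here with fitted ones.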

Friday, July 27, 2012

The Playoffs and the Trade Deadline

Long story short: you should make aging-all-star-for-prospect trades at the start of the season, instead of at the trade deadline, if you're at least 70% sure, before the season starts, that the team getting the aging all star will be in playoff contention and that the other team will not.

_______________________

In Major League Baseball, teams are not allowed to make trades (or, in some cases, are just significantly hindered in making them) after the July 31st trade deadline. This left me wondering: in what cases does it make sense to wait until the trade deadline to make trades, and when does it make sense to do it at the start of the year?

Put another way, it's been about ten years since the Seattle Mariners were in playoff contention, and about twenty years since the Yankees weren't--the fact that this is also true this year shouldn't surprise anyone. So why didn't the Mariners trade Ichiro to the Yankees at the beginning of the year?

Most trade deadline trades, like the one involving Ichiro Suzuki, involve a team in playoff contention trading some future prospects and/or money to a team not in contention in return for an (often aging) borderline all-star level player. The advantage of making this trade at the beginning of the year, instead of waiting until halfway through for the trade deadline, is that the team in contention gets the good player for longer and thus the player has more utility for them; the disadvantage is that you might accidentally trade away a good player and then find yourself in playoff contention, or vice versa.

The first thing I investigated was the following: how much does gaining an Ichiro-like player help in the regular season, and how much does it help in the post season? Here were the assumptions I made.

1) All that matters is maximizing chances of winning the world series.
2) The player will add roughly 3 wins to the team (i.e. have a WAR of 3 more than the person they're replacing).
3) In the playoffs, the team will face opponents that are as strong as it was before adding the player, and will win each game with a probability that exceeds .500 by the difference between its inherent winning percentage and its opponents'.
4) The team suspects that they will end up doing something between winning the division by 7 games, and losing by 10 games (roughly the average difference between division leader and second in division in 2011).

The probability of a team winning the world series is p_WS = p_PL * p_PLW: the probability of making the playoffs times the probability of winning the playoffs once there.

p_PL will be increased by ~3/20 by making the trade at the beginning of the season, meaning that p_PL will go from 50% to 65%, a proportional increase of .65/.5 = 1.3

To calculate the change in p_PLW, note that they have gained 3/162 ~ .0185 in winning percentage, meaning that they have a 51.85% chance of winning a playoff game (by assumption 3). So, whereas before p_PLW = 12.5%, now p_PLW = 15.6%, a proportional increase of about 1.25.
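One way to reproduce the jump from 12.5% to about 15.6% is to treat the playoffs as three independent series, a best-of-five followed by two best-of-sevens, with the same per-game win probability throughout. That format is an assumption made for this sketch (the post does not spell out the calculation), and the function names are made up.

```python
from math import comb


def series_win_prob(p, wins_needed):
    """Probability of winning a series that ends at wins_needed wins,
    given a per-game win probability p and independent games."""
    q = 1.0 - p
    # Win the clinching game after losing exactly k games along the way.
    return sum(comb(wins_needed - 1 + k, k) * p ** wins_needed * q ** k
               for k in range(wins_needed))


def playoff_win_prob(p, rounds=(3, 4, 4)):
    """Probability of winning every round; rounds lists wins needed per round."""
    prob = 1.0
    for wins_needed in rounds:
        prob *= series_win_prob(p, wins_needed)
    return prob


p_game = 0.5 + 3 / 162  # three extra wins over a 162-game season

print(f"without the player: {playoff_win_prob(0.5):.3f}")
print(f"with the player:    {playoff_win_prob(p_game):.3f}")
```

The small per-game edge compounds over eleven required wins, which is why the playoff boost ends up comparable to the regular-season one.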

I had not been expecting this result; I had assumed that the playoffs would be random enough that player quality wouldn't matter as much as it would in the regular season (because there are more games in the regular season). But it seems that, in fact, a good player added to a good team might make about as much of an impact in the playoffs as in the regular season. (Note, however, that if you give yourself some partial victory credit for making the playoffs even if you don't win, that would argue in the other direction.)

So, what's the upshot of this, as far as the trade deadline is concerned? Without trading you have a 50%*12.5% = 6.25% chance of winning the world series. If you trade at the beginning of the year, you have a .65*.156 = 10.14% chance of winning the world series. And if you make a trade at the trade deadline you get about 1.5 extra regular season wins from the player, so (by the same logic) you have roughly a .575*.156 = 8.98% chance of winning the world series.

So, making a before-season trade increases your probability of winning the world series by about 3.89%, whereas making it right before the trade deadline increases it by about 2.74%.

This means that the advantage of making the trade pre-season is that it's about 3.89/2.74 = 1.42 times as effective. Assuming that you'll be pretty sure by mid season whether you're in a playoff race, the question then becomes: before the season, are you at least 1/1.42 ~ 70% sure that your team will be in a competitive playoff race, and the other team won't be? In another post, I'll look at this question.
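The comparison of the three scenarios can be sketched in a few lines. The inputs are the rounded figures from the post (50% baseline playoff odds, 1/20 of playoff probability per extra regular-season win, and 12.5% versus 15.6% odds of winning the playoffs); the function and constant names are invented for the example. Because of rounding, the ratio printed here can differ from the quoted one in the second decimal place, but the threshold still lands near 70%.

```python
P_PLAYOFFS_BASE = 0.50  # chance of making the playoffs with no trade
P_TITLE_BASE = 0.125    # chance of winning the playoffs without the player
P_TITLE_WITH = 0.156    # chance of winning the playoffs with the player
GAMES_SWING = 20.0      # each extra win adds about 1/20 to playoff odds


def title_odds(extra_wins, has_player):
    """World series odds = P(make playoffs) * P(win playoffs)."""
    p_playoffs = P_PLAYOFFS_BASE + extra_wins / GAMES_SWING
    p_title = P_TITLE_WITH if has_player else P_TITLE_BASE
    return p_playoffs * p_title


no_trade = title_odds(0.0, False)
preseason = title_odds(3.0, True)  # full season of the player
deadline = title_odds(1.5, True)   # roughly half a season of the player

ratio = (preseason - no_trade) / (deadline - no_trade)
threshold = 1.0 / ratio  # confidence needed to prefer the preseason trade

print(f"no trade {no_trade:.4f}, preseason {preseason:.4f}, deadline {deadline:.4f}")
print(f"preseason gain is {ratio:.2f}x the deadline gain; threshold {threshold:.0%}")
```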

Monday, July 16, 2012

Some calculations about Tim Lincecum

Tim Lincecum is a starting pitcher for the San Francisco Giants. For the first five years of his career he was one of the best pitchers in the majors, with a cumulative ERA of 2.98. This year, however, he has been atrocious, with an ERA currently at 5.93. I decided to take a look at pitch-by-pitch data to see if I could make anything of it.

I noticed that he had an unusually large difference between home and away ERA--3.43 at home, but 9.00 on the road. Coupled with the source of home field advantage, I decided to investigate something: could his difference in play this year come from umpires restricting his strike zone?

As it turns out, no. In both 2011 and 2012 about 10% of his pitches that batters didn't swing at were balls misclassified as strikes by the umpires, and about 2.5% were strikes that were called balls; umpires were no harsher this year than last.

I then took a look at placement. In particular, all else equal better pitches are generally around the edge of the strikezone, and worse pitches are generally either right down the middle or way outside the strikezone. So, I decided to look at the average distance from his pitches to the vertical and horizontal edges of the strikezone; this time there was a difference. In 2011 the sum of the vertical and horizontal misses from the edges of the strikezone averaged .923 feet; in 2012 it averaged .961 feet. It doesn't seem like a huge difference, but it is statistically significant, with just a 2% chance of occurring randomly (having a t-test value of -3.23). So, it does seem like his control is down from last year.
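A minimal sketch of the edge-distance measure, under stated assumptions: each pitch is given as a horizontal location `px` (feet from the center of the plate), a height `pz` (feet), and the bottom and top of that batter's strike zone, and the zone is taken to be exactly as wide as the 17-inch plate. The coordinate names, the zone width, and the sample pitches are all illustrative choices, not necessarily those used for the numbers above.

```python
HALF_PLATE_FT = 17 / 12 / 2  # home plate is 17 inches wide


def edge_miss(px, pz, zone_bottom, zone_top, half_width=HALF_PLATE_FT):
    """Sum of horizontal and vertical distances to the nearest zone edges."""
    horizontal = abs(abs(px) - half_width)  # nearest side edge
    vertical = min(abs(pz - zone_bottom), abs(pz - zone_top))
    return horizontal + vertical


def mean_edge_miss(pitches):
    """Average edge miss over (px, pz, zone_bottom, zone_top) tuples."""
    misses = [edge_miss(*pitch) for pitch in pitches]
    return sum(misses) / len(misses)


# Made-up pitches: one down the middle, one on the low corner, one way outside.
sample = [
    (0.0, 2.5, 1.5, 3.5),
    (HALF_PLATE_FT, 1.5, 1.5, 3.5),
    (1.5, 4.0, 1.5, 3.5),
]
print(f"average miss: {mean_edge_miss(sample):.3f} feet")
```

Lower is better under this measure: a pitch on the corner scores zero, while both a pitch down the middle and a pitch far outside score high.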

I also looked at the velocity of his pitches. There's been a lot of talk about his velocity decline; the decline, as it turns out, is real but not that precipitous: his fastballs and sliders have slowed down by about a mile per hour on average, though his changeup and curveball are still at roughly the same speeds they were in 2011.

So Lincecum's pitches are slower and less controlled than last year; in the end it just looks like Lincecum is pitching worse this season.

Is there anything else I should look at?

Friday, July 13, 2012

Improving Basim and feedback

I'm working on improving Basim and eWAR (some new content and some bug fixes), but it's going to take time to run the new calculations. (I'm also going to run them on more data points, i.e. multiple seasons.) While that happens, is there anything you think it would be cool for me to research/write about? Is there any improvement you'd suggest to Basim? Anything whose value in time and dollars I should compare to voting and donating to a campaign? If so, leave suggestions here in the comments (or drop me an email).

Also, I've spent a lot of time trying to decide how to integrate into eWAR the fact that lineups aren't totally irrelevant; my new approach is to take an average lineup (i.e. average 1st hitter, average 2nd hitter, etc.), and look at the run differential between putting a test player in a given slot and putting the league average player in that slot. This approach isn't perfect, though, because it is at risk of relegating all below-average players to ninth, so that the total number of plate appearances in which they bat in place of the league average player is minimized (and of shoving all better than average players to leadoff). Suggestions?

Also, as always, feel free to sign up for the RSS feed (widget on right), follow us on twitter, and spread the blog around.

SBF

Tuesday, July 10, 2012

Introducing eWAR: Empirical (offensive) Wins Above Replacement

This is continuing a series of posts about a baseball game simulator I made. For the introductory post, look here. For a revised look at the 2012 SF Giants lineup, look here, and for a description of how the simulator works, look here. Also, I've grown tired of referring to the simulator as such, so from here on out I'll refer to it as Basim (short for baseball simulator).

________________________________________

The short version of this article is that I've constructed a new stat, eRAA (empirical runs above average), that seems to correlate more with the number of runs a team scores than does traditional RAA, even accounting for ballpark corrections. This leads me to believe that eRAA may be a "better" stat than RAA, in that it better predicts how good a player is than RAA does. This leads to the definition of eWAR, empirical wins above replacement on offense, as 0.1*eRAA (using the canonical value of 10 runs/win).

I originally built Basim, a Python script that simulates baseball games for a given lineup, with the intention of seeing which permutation of nine players produces the best results. It soon occurred to me, though, that there was something else I could use it for: I could use Basim to evaluate players. The idea was simple: put a given player in a lineup with eight average players and see how many runs above (or below) average that lineup scores by running hundreds of thousands of simulations of that lineup and recording the average runs per game; multiply it by the amount they play, and you get eRAA--a measure of how good a player is, relative to average.
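To make the idea concrete, here is a toy Monte Carlo simulator in the same spirit. It is not Basim: the outcome probabilities are made-up numbers, walks are treated like singles, and every runner advances exactly as many bases as the batter, with no double plays, steals, or extra bases. The names `AVERAGE`, `STAR`, `simulate_game`, `runs_per_game`, and `lineup_with` are all hypothetical.

```python
import random

# Per-plate-appearance outcome probabilities (illustrative, not real stats).
AVERAGE = {"out": 0.68, "walk": 0.08, "single": 0.15,
           "double": 0.05, "triple": 0.005, "homer": 0.035}
STAR = {"out": 0.58, "walk": 0.15, "single": 0.14,
        "double": 0.06, "triple": 0.005, "homer": 0.065}
BASES_GAINED = {"walk": 1, "single": 1, "double": 2, "triple": 3, "homer": 4}


def simulate_game(lineup, rng, innings=9):
    """Play one team's half of a game and return the runs it scores."""
    runs = 0
    batter = 0
    for _ in range(innings):
        outs = 0
        runners = []  # current base (1-3) of each runner
        while outs < 3:
            probs = lineup[batter % len(lineup)]
            batter += 1
            outcome = rng.choices(list(probs), weights=list(probs.values()))[0]
            if outcome == "out":
                outs += 1
                continue
            gained = BASES_GAINED[outcome]
            # Toy rule: runners and batter all advance the same number of bases.
            moved = [base + gained for base in runners] + [gained]
            runs += sum(1 for base in moved if base >= 4)
            runners = [base for base in moved if base < 4]
    return runs


def runs_per_game(lineup, n_games=5000, seed=0):
    """Average runs per game over n_games simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(n_games)) / n_games


def lineup_with(player, slot=5):
    """Eight average hitters plus the test player in the given slot."""
    lineup = [AVERAGE] * 9
    lineup[slot - 1] = player
    return lineup


baseline_rpg = runs_per_game([AVERAGE] * 9)
star_rpg = runs_per_game(lineup_with(STAR))
print(f"baseline {baseline_rpg:.2f} runs/game, with test player {star_rpg:.2f}")
```

The gap between the two averages, scaled by how much the player actually plays, is the quantity the post calls eRAA; with only a few thousand games the gap is still noisy, which is why the real runs use far more simulations.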

The way to test whether eRAA is a "better" or "worse" stat than RAA, the basis for WAR, is to see which, when aggregated for a team, better predicts the number of runs that the team scores. I decided to run the test on the 2011 baseball season. My procedure was pretty simple:

1) I took aggregate batting stats for 2011 to find the "average" player, avePlayer*.

2) I ran 10,000,000 simulations of a lineup of nine of that player to find the baseline runs per game (brpg); I found that, for 2011, brpg = 4.1267312.

3) I also recorded the number of plate appearances per game that the test player (batting fifth) would have in the baseline simulation; I got paPerGame = 4.2913712999999998.

4) For each of the ~1,500 2011 major league baseball players, I ran 100,000 Basim simulations with them batting fifth and every other player being avePlayer. I recorded the number of runs scored per game on average by that lineup, playerRPG. I then computed eRAA = (playerRPG-brpg)*playerPA/paPerGame, the number of runs above the average player that they produced during the season.

5) For each team, I totaled up the eRAA of each hitter on their team to get their teamRAA. I added that to the average number of runs scored by a team aveTeamRuns = 693.36, to get eRuns, the number of runs my model would predict them to score in a season.

6) For each team, I also added up Rbat, Rbaser, and Rdp, the three offensive stats contributing to RAA. All statistics were taken from baseball-reference.com. However, I believe that Rbat attempts to correct for the park that the hitter plays in. In order to get an apples-to-apples comparison, I reversed that by multiplying the predicted runs (total, not above average) by the team's ballpark adjustment factor. (It's possible I messed this step up; my understanding is that this factor should be applied multiplicatively to a player's runs created.) This got me the version of RAA I tested eRuns against; I'll call it RAAproduced.

7) I also recorded the total runs scored above average, runsScoredAboveAverage, by each team that season, and looked for correlations between that number, RAAproduced, and eRAA**.
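Steps 2 through 5 boil down to a little arithmetic. Here is a minimal sketch of it; the function names and the example player (playerRPG = 4.25 over 650 plate appearances) are hypothetical and only there for illustration, not taken from Basim. Only brpg, paPerGame, and aveTeamRuns come from the numbers above.

```python
BRPG = 4.1267312          # baseline runs per game (step 2)
PA_PER_GAME = 4.2913713   # plate appearances per game for the fifth hitter (step 3)
AVE_TEAM_RUNS = 693.36    # average runs scored by a team (step 5)


def eraa(player_rpg, player_pa, brpg=BRPG, pa_per_game=PA_PER_GAME):
    """Runs above average: per-game gain times the player's games' worth of PAs."""
    return (player_rpg - brpg) * player_pa / pa_per_game


def e_runs(player_eraas, ave_team_runs=AVE_TEAM_RUNS):
    """Predicted team runs: league-average runs plus the hitters' summed eRAA."""
    return ave_team_runs + sum(player_eraas)


# Hypothetical player, not a real 2011 stat line.
example = eraa(player_rpg=4.25, player_pa=650)
print(round(example, 1))
print(round(e_runs([example]), 1))
```

Dividing playerPA by paPerGame converts plate appearances into "games batting fifth", which is what lets a per-game difference be scaled up to a season total.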

The results: RAAproduced, the classic building block of WAR, had a correlation coefficient of 0.9459 with runsScoredAboveAverage, missing by an average of 28.6 runs per team in absolute value. eRAA had a correlation coefficient of 0.9592 with runsScoredAboveAverage, missing by an average of 19.3 runs per team.
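For anyone who wants to check numbers like these, here is a rough sketch of the two measures: a Pearson correlation coefficient and an average absolute miss. The function names are mine, and the four sample triples are just the first four entries of the per-team list in the second footnote, rounded to two decimals; the full 30-team list can be fed in the same way.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_abs_miss(predicted, actual):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# (eRAA, RAAproduced, runsScoredAboveAverage): first four footnote entries, rounded.
sample = [
    (45.20, 21.79, 37.26),
    (-44.77, -92.27, -51.84),
    (-5.11, -49.81, 14.58),
    (179.73, 160.32, 181.44),
]
eraa_vals, raa_vals, actual = zip(*sample)
print("eRAA:", pearson(eraa_vals, actual), mean_abs_miss(eraa_vals, actual))
print("RAAproduced:", pearson(raa_vals, actual), mean_abs_miss(raa_vals, actual))
```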

So, while both RAAproduced and eRAA correlated very well with the number of runs a team scored, eRAA correlated slightly more strongly and missed by less, leading me to believe that eRAA predicts a player's value to a team better than RAA does.

In order to see what eRAA said about individual players, I found the 2011 hitters with the highest eWAA, empirical offensive wins above average (eRAA/10), which I'll also call eWAR:

Name	eWAR
Jose Bautista	9.41
Matt Kemp	8.38
Miguel Cabrera	7.83
Ryan Braun	6.85
Prince Fielder	6.58
Jacoby Ellsbury	5.61
Lance Berkman	5.38
Joey Votto	5.34
Adrian Gonzalez	5.23
Justin Upton	4.94
Curtis Granderson	4.61
Mike Napoli	4.41
Jose Reyes	4.24
Albert Pujols	4.15
Troy Tulowitzki	4.14

There are of course a number of claims in this article with which one could take issue. Not everything is quite an apples-to-apples comparison; I tried to turn WAR into the form most apt to compare to eWAR, but WAR still attempts to correct for managerial decisions (e.g. intentional walks) in a way mine doesn't. WAR has also already developed adjustments for ballpark, position, and fielding that eWAR hasn't. I didn't tweak my formula at all after looking at the 2011 data set, but I should still run it on a truly out-of-sample year; I'll run it on 2010 and see how it performs there. There is also still a lot of work to be done on Basim. In particular, it makes arbitrary decisions when there are runners on first and other bases at the same time, and it doesn't have much granularity on taking extra bases. It also doesn't handle pitchers particularly well, and it arbitrarily bats the test player fifth in the lineup. But those are all the more reason to believe that with more tuning eWAR could be a potential complement to, or even substitute for, oWAR.

_____________________

*I summed up the stats of every player in the league in 2011 to create avePlayer:

PA: 185245
BB + IBB + HBP: 15018 + 1554 + 1231
Hits: 42267
Doubles: 8399
Triples: 898
Home Runs: 4552
Stolen Bases: 3279
Caught Stealing: 1261
Strikeouts: 34488
Ground into Double Play Rate: .1
Stolen Base Opportunities: 67623
Reached on Error: 1816
Extra Bases Taken Percentage: .41

**For reference, here is a list of (eRAA, RAAproduced, runsScoredAboveAverage) for each 2011 team, in alphabetical order by team abbreviation: [(45.200205677844551, 21.785200000000032, 37.259999999999991), (-44.774112414835749, -92.267200000000003, -51.840000000000032), (-5.1114567038280354, -49.814399999999978, 14.580000000000041), (179.73419545402641, 160.32159999999999, 181.44000000000005), (-38.23559280456594, -60.374400000000037, -38.879999999999995), (-51.724434797799972, -91.083600000000047, -38.879999999999995), (45.215654725565059, 42.115200000000073, 42.120000000000005), (-37.44421849491345, -46.934399999999982, 11.339999999999918), (54.75322142365065, 20.457599999999957, 42.120000000000005), (75.629775964620023, 81.570800000000077, 93.960000000000036), (-40.913050474099116, -62.37360000000001, -68.040000000000077), (-110.38232860437888, -66.712800000000016, -77.759999999999991), (20.932611680559809, 25.113600000000019, 37.259999999999991), (-3.4382776433257294, -30.865199999999959, -25.919999999999959), (-18.522117860088237, -56.98720000000003, -45.360000000000014), (58.721202940421378, 49.368000000000052, 27.539999999999964), (-96.570937732654357, -132.51800000000003, -74.520000000000095), (24.467027544318942, 21.412799999999947, 24.299999999999955), (118.53137657885713, 112.70880000000011, 173.33999999999992), (-103.13524462448632, -101.31079999999997, -48.600000000000023), (27.872957718666616, 26.268000000000029, 19.440000000000055), (-125.28292385233611, -117.97440000000006, -82.620000000000005), (-122.5675836066669, -137.34879999999998, -100.43999999999994), (-150.53768491204667, -156.28160000000003, -137.69999999999993), (-103.57234010489849, -132.33960000000002, -123.12), (79.896154919058134, 22.331999999999994, 68.039999999999964), (30.389992028888202, 0.6512000000000171, 12.960000000000036), (175.26261892090292, 236.04119999999989, 162.0), (41.654865473886879, -8.6655999999999267, 50.219999999999914), (-64.042960207148923, -77.0, -64.800000000000068)]

Thursday, July 5, 2012

Baseball Game Simulation details

Earlier I posted some early results from a baseball game simulator I wrote. Here I'll describe exactly how the simulator works.

As an input the simulator takes a lineup of nine players, along with the following stats for each player: plate appearances, walks+IBB+hit by pitch, singles, doubles, triples, homers, strikeouts, ground into double play rate, stolen base opportunities, stolen bases, caught stealing, number of times reached on error, and extra bases taken percentage; all stats were taken from baseball-reference.com.

The simulator then simulates an entire (offensive) game from that lineup.

Each time a batter steps to the plate, the following things are simulated:

1) The play that the batter creates is randomly chosen based on how frequently they hit into that type of play. The options are single, double, triple, home run, walk, strikeout, ground into double play opportunity, other out. Reached on error is treated as a single, and intentional walks and hit by pitches are treated as walks.

2) Baserunners (sometimes) try to advance. Each baserunner other than the batter attempts to take an extra base with probability governed by their extra bases taken percentage, and if they try to advance they are thrown out with 3% probability. Baserunners try to advance on all base hits and all outs other than strikeouts.

3) If it was a ground into double play opportunity, then if there was a runner on first a double play is executed. Otherwise it's just a normal groundout.

4) After each plate appearance, if there are runners on base with the next base empty they attempt to steal with probability (sb+cs)/sbo, and are thrown out with the appropriate frequency. Runners do not try to steal third with zero or two outs, and they do not try to steal home.
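Step 1 and the steal probability in step 4 can be sketched roughly as follows. This is not Basim's actual code: the dictionary keys, the function names, and the example hitter are hypothetical, and the way the ground-into-double-play rate splits the non-strikeout outs is my guess at one reasonable reading.

```python
import random

OUTCOMES = ["single", "double", "triple", "home_run", "walk",
            "strikeout", "gidp_opportunity", "other_out"]


def outcome_weights(stats):
    """One weight per entry of OUTCOMES, built from a player's counting stats."""
    singles = stats["singles"] + stats["reached_on_error"]  # ROE treated as a single
    walks = stats["walks"]                                  # already includes IBB and HBP
    reached = (singles + stats["doubles"] + stats["triples"]
               + stats["home_runs"] + walks)
    outs_in_play = stats["pa"] - reached - stats["strikeouts"]
    # Assumption: the GIDP rate is the share of non-strikeout outs that are DP chances.
    gidp_opps = outs_in_play * stats["gidp_rate"]
    return [singles, stats["doubles"], stats["triples"], stats["home_runs"],
            walks, stats["strikeouts"], gidp_opps, outs_in_play - gidp_opps]


def sample_outcome(stats, rng=random):
    """Pick one plate-appearance outcome with probability proportional to its weight."""
    return rng.choices(OUTCOMES, weights=outcome_weights(stats), k=1)[0]


def steal_attempt_probability(sb, cs, sbo):
    """Chance that a runner with an open base ahead tries to steal: (sb+cs)/sbo."""
    return (sb + cs) / sbo


def caught_probability(sb, cs):
    """Chance that a steal attempt ends with the runner thrown out."""
    return cs / (sb + cs)


# Hypothetical hitter, not a real stat line.
hitter = {"pa": 600, "walks": 60, "singles": 100, "doubles": 30, "triples": 3,
          "home_runs": 20, "strikeouts": 110, "gidp_rate": 0.1, "reached_on_error": 5}
print(sample_outcome(hitter, random.Random(0)))
```

The weights sum to the player's plate appearances, so each outcome is drawn at the frequency the player actually produced it.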

Using this method an inning is simulated, and the runs scored, along with the next batter up, are returned; the simulator then clears all bases and outs, and starts simulating the next inning.
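The inning-by-inning bookkeeping might look something like the sketch below. The names are hypothetical, and plate_appearance is a placeholder for everything in steps 1 through 4: it is assumed to take the batter, the bases, and the current outs, and to return the outs and runs produced. The toy version here ignores baserunning entirely.

```python
import random


def simulate_inning(lineup, next_batter, plate_appearance):
    """Simulate one inning; return (runs scored, index of the next batter up)."""
    outs, runs = 0, 0
    bases = [None, None, None]  # first, second, third: cleared every inning
    while outs < 3:
        new_outs, new_runs = plate_appearance(lineup[next_batter], bases, outs)
        outs += new_outs
        runs += new_runs
        next_batter = (next_batter + 1) % len(lineup)
    return runs, next_batter


def simulate_game(lineup, plate_appearance, innings=9):
    """Simulate an offensive game; the batting order carries over between innings."""
    total_runs, next_batter = 0, 0
    for _ in range(innings):
        runs, next_batter = simulate_inning(lineup, next_batter, plate_appearance)
        total_runs += runs
    return total_runs


_rng = random.Random(0)


def toy_plate_appearance(batter, bases, outs):
    """Hypothetical stand-in: 70% outs, 30% solo home runs; ignores the bases."""
    return (1, 0) if _rng.random() < 0.7 else (0, 1)


lineup = ["player%d" % i for i in range(1, 10)]
print(simulate_game(lineup, toy_plate_appearance))
```

Returning the next batter from each inning is what keeps the lineup from restarting at the top every inning.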

Thoughts on how to improve it? Right now the frequent bunting of pitchers is only implicitly accounted for in their high number of outs but low number of strikeouts and ground into double plays, which will advance runners a good deal (but not as much as they should).

Wednesday, July 4, 2012

More Giants Lineup comments

For the previous post in this series, click here.

First of all, I found a bug in the stealing bases section of my lineup code. I've re-run the simulations; new findings:

1) A random lineup of the Giants players scores, on average, 4.06 runs/game.
2) The current Giants lineup scores, on average, 4.13 runs/game.
3) The top lineup this time was: Buster Posey, Brandon Belt, Melky Cabrera, Pablo Sandoval, Gregor Blanco, Angel Pagan, Brandon Crawford, Ryan Theriot, Barry Zito, scoring 4.29 runs/game on average.
4) The old best lineup (with the buggy code) scored 4.22 runs/game on average.

So, it still looks like the Giants could get another ~.16 runs/game out of their lineup, translating to ~2.6 more wins in a season.

Also, I decided to see what effect stealing bases had on the runs scored by a lineup; the answer, essentially, was none: without allowing stolen bases a random lineup scored ~4.05 runs/game, just .01 less than with stolen bases. So it looks like stealing bases is close to a wash (note that the Giants have pretty good base stealers this year in Blanco, Pagan, and Cabrera).

It occurs to me, though, that by using a half season of one team I have some sample size issues. There are also some oddities surrounding pitchers (i.e. the high frequency of bunts, which my program only half accounts for). So, I'll next look at many different AL teams.

FWIW, here were the ten best lineups, along with their average runs scored per game (listed from tenth best to best):

4.215085: Brandon Belt, Angel Pagan, Gregor Blanco, Pablo Sandoval, Buster Posey, Melky Cabrera, Brandon Crawford, Barry Zito, Ryan Theriot
4.234345: Gregor Blanco, Melky Cabrera, Pablo Sandoval, Brandon Belt, Angel Pagan, Brandon Crawford, Buster Posey, Ryan Theriot, Barry Zito
4.240460: Gregor Blanco, Brandon Belt, Angel Pagan, Pablo Sandoval, Melky Cabrera, Buster Posey, Ryan Theriot, Barry Zito, Brandon Crawford
4.250200: Angel Pagan, Gregor Blanco, Brandon Belt, Pablo Sandoval, Melky Cabrera, Brandon Crawford, Ryan Theriot, Buster Posey, Barry Zito
4.252050: Gregor Blanco, Brandon Belt, Angel Pagan, Melky Cabrera, Pablo Sandoval, Brandon Crawford, Ryan Theriot, Barry Zito, Buster Posey
4.252905: Brandon Belt, Gregor Blanco, Angel Pagan, Melky Cabrera, Pablo Sandoval, Ryan Theriot, Barry Zito, Brandon Crawford, Buster Posey
4.260670: Gregor Blanco, Pablo Sandoval, Brandon Belt, Melky Cabrera, Angel Pagan, Buster Posey, Brandon Crawford, Ryan Theriot, Barry Zito
4.278770: Buster Posey, Gregor Blanco, Brandon Belt, Pablo Sandoval, Melky Cabrera, Angel Pagan, Ryan Theriot, Brandon Crawford, Barry Zito
4.284425: Buster Posey, Brandon Belt, Gregor Blanco, Pablo Sandoval, Melky Cabrera, Angel Pagan, Ryan Theriot, Brandon Crawford, Barry Zito
4.286445: Buster Posey, Brandon Belt, Melky Cabrera, Pablo Sandoval, Gregor Blanco, Angel Pagan, Brandon Crawford, Ryan Theriot, Barry Zito

Why are these lineups the best? I'm not completely sure. Thoughts?

Thursday, June 28, 2012

Batting Order and Simulating Games

Welcome to Measuring Shadows, a blog about thoughts, games, and ethics. We will post about more serious subjects later, but first a foray into baseball batting orders.

---------------------

Whether or not batting orders significantly impact a baseball offense is an often-asked question, and treatments similar to mine can be found all over the web, for instance here and here. To my knowledge, though, mine is slightly more thorough than the others I've found.

My strategy is to take a possible lineup of nine players, look at the frequency with which they create different hits, and simulate entire baseball games based on those, recording the number of runs the offense scores; averaging runs scored over many (~100,000) simulations gives an estimate of the strength of an offense.
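The averaging step is simple enough to sketch. In the snippet below, toy_game is a made-up stand-in for the real simulator (nine innings, each scoring 0, 1, or 2 runs at arbitrary odds), and the function names are hypothetical; the point is only to show the mean being reported along with its standard error.

```python
import math
import random
import statistics


def estimate_runs_per_game(simulate_game, trials=100_000, seed=0):
    """Average runs over many simulated games, with the standard error of that mean."""
    rng = random.Random(seed)
    runs = [simulate_game(rng) for _ in range(trials)]
    mean = statistics.fmean(runs)
    stderr = statistics.stdev(runs) / math.sqrt(trials)
    return mean, stderr


def toy_game(rng):
    """Hypothetical stand-in for a real game simulation; the odds are arbitrary."""
    return sum(rng.choice([0, 0, 0, 1, 2]) for _ in range(9))


mean, stderr = estimate_runs_per_game(toy_game, trials=20_000)
print(round(mean, 3), round(stderr, 4))
```

The standard error shrinks with the square root of the number of trials. Assuming a per-game standard deviation of around 3 runs, 100,000 trials would pin the average down to about 0.01 runs/game, which is worth keeping in mind when two lineups differ by only a few hundredths of a run.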

My test case has been this year's Giants lineup: Gregor Blanco, Ryan Theriot, Melky Cabrera, Buster Posey, Angel Pagan, Pablo Sandoval, Brandon Belt, Brandon Crawford, and Barry Zito (chosen as a pitcher with roughly average batting stats). In future posts I'll write about exactly what my method was and extensions, but to start off with, some results:

1) An average (i.e. random) ordering of those nine players creates about 3.78 runs/game.

2) The Giants' typical batting order creates roughly 3.82 runs/game.

3) The best performing lineups in the simulation create roughly 3.99 runs per game. In particular, the following lineup performed best in my simulation: Theriot, Sandoval, Blanco, Belt, Cabrera, Posey, Pagan, Crawford, Zito; it created 3.992 runs/game on average (over 200,000 trials).

Due to sample size issues there might be another lineup which performs slightly better, but I think that 3.99 runs/game is a reasonable value for the best the Giants can do. Taking the canonical value of 10 extra runs creating one win, the Giants could get roughly 2.75 more wins in the season (0.17 runs/game over 162 games), or roughly .017 extra winning percentage, by changing their batting order. This isn't a huge difference, but given that most winning percentages are between .400 and .600, the difference between .550 and .567 is nontrivial.
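The runs-to-wins conversion is just a couple of multiplications. Here is a small sketch; the function name is mine, and the 162-game season is an assumption I'm making explicit alongside the 10-runs-per-win rule of thumb.

```python
RUNS_PER_WIN = 10        # canonical rule of thumb: 10 extra runs is about one win
GAMES_PER_SEASON = 162   # assumption: a full regular season


def extra_wins(runs_per_game_gain, games=GAMES_PER_SEASON, runs_per_win=RUNS_PER_WIN):
    """Convert a per-game scoring gain into extra wins over a season."""
    return runs_per_game_gain * games / runs_per_win


gain = 3.99 - 3.82                 # best simulated lineup minus the typical lineup
wins = extra_wins(gain)
print(round(wins, 2))              # extra wins per season
print(round(wins / GAMES_PER_SEASON, 3))  # extra winning percentage
```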

This kind of calculation is kind of cool, but perhaps a more important application of these simulations is in valuing players: one thing that can be done is to take a lineup composed of nine of a given player and see how many runs it scores. How well does this predict how "good" a player is? Which has a stronger correlation with a team's number of runs scored: the (weighted) sum of the runs created per game (using these simulations) of each of its players, or the sum of the WAR of its players? I'll look at these questions in a future post.

--Sam
