Measuring Shadows: 2012

Monday, December 31, 2012

Pitcher Fatigue, Part 2: The Top 10

Earlier, I wrote a post on the declining effectiveness of starting pitchers as they get deeper into games, postulating that it came from two major sources: the first being the fact that it's difficult to throw 100 pitches in a night without your arm getting temporarily tired, and second that the second time a batter sees a pitcher, they already know what type of stuff the pitcher is throwing and so are better able to hit it. Overall I estimated that by rotating pitchers frequently each game so that no pitcher went through the lineup more than once, a team could save about 5.6 wins each season (ignoring other effects, like the fact that if you're in the NL you get to pinch hit more often).

Also, starting with this post I'm going to make a conscious effort to switch from using OPS as my default batting stat to wOBA. wOBA, which is on the same scale as on-base percentage, is basically a version of OPS that uses more accurate weightings for events.

______________________________

On average, in 2012, the first time pitchers saw a batter they allowed a wOBA of about 0.338. The second time they saw those batters, the wOBA jumped to about 0.350, for a difference in wOBA of about 0.011. I'm going to name this statistic--wOBA for second plate appearances minus wOBA for first--w-diff.

So the league average w-diff in 2012 was about 0.011. But different pitchers had different w-diffs.

Look, for instance, at R.A. Dickey. Dickey is a knuckleballer, and so one would expect hitters to be unusually bad the first time they see him--they have no practice hitting a knuckle-ball--but to get much better the second time, meaning one would expect him to have an unusually large w-diff. And, in fact, he does have a large w-diff over his career if you ignore all of the seasons in which he didn't have a large w-diff, which is a thing that makes a lot of sense to do if you have a personal vendetta against the year 2011.

Sunday, December 30, 2012

Being a Utilitarian, Part 2: Conventional Charities

This is the second post in a series on actually being a utilitarian in the world; for the first post, look here. Also, for a more theoretical series on utilitarianism, look here.

______________

So, say that you're a utilitarian, and you're wondering what to do with your life. (Even if you're not a utilitarian but are wondering what to do with your life, most of this will apply.) What should you do? What, in the current society, can an individual do to make the world a better place? And what causes should you care about?

Is there anything you can do with your life to make the world a better place?

Sunday, December 23, 2012

Less Stupid Use of Pitchers: Pitcher Fatigue

A while ago I wrote a post about one of the most unenlightened areas of baseball strategy: the use of pitchers. I proposed eliminating the distinction between starting pitchers, middle relievers, and closers in favor of a system that just uses a set of pitchers, each pitching different total numbers of innings, but no single pitcher pitching more than a few innings in a game; in other words, a starter would now throw two innings every few games instead of seven innings every five games.

The advantages of this, as I see it, are fourfold.

1) If you're an NL team, you can pinch hit for your pitchers whenever they come up.

2) Pitchers don't have to throw 100 pitches in a game.

3) Batters never get to see the same pitcher twice in a game, and so can't get used to their pitches.

4) You can get the pitcher-batter match-ups you want all the time, instead of being stuck with your same pitcher the first three times through the lineup.

In the first post I estimated the size of effect (1): pinch hitting for your pitcher every time would let you score about 0.2 more runs per game, translating into about 3.2 wins per season (the difference between a .500 team and a .520 team).

Now I'm going to look at effects (2) and (3).

Tuesday, December 4, 2012

Being a Utilitarian, Part 1

I've written a series of posts about the different types of utilitarianism arguing for aggregate, classical, act, one-level utilitarianism. I haven't, however, talked at all about what it would mean to be a utilitarian in the real world.

In the real world, obviously, you aren't faced with a series of trolley problems or utility monsters. If you don't think about it very much, you might conclude that utilitarianism isn't actually useful because you can't calculate the total utility of each possible action.

However, as it turns out, utilitarianism can be useful even if you don't know the exact state of the universe.

In future posts I'll examine thornier, more wide-reaching issues, but for now I'll just talk about one issue--the first issue that I actually thought about in utilitarian terms. For people familiar with utilitarianism it probably won't be that interesting or revolutionary, but it's a good way to remind yourself that just because a theory is complicated doesn't mean approximations can't be useful. (It also parallels an argument Peter Singer has made on the subject.)

Re-starting the blog, and results of the second contest

As you may have noticed, after a hiatus while the school year started, I'm back to blogging.

First, I never resolved the second contest. No one solved the puzzle but Matt Nass made partial progress, so he gets 3 Shadow-points. I'm going to leave the puzzle open and if anyone solves it they get one Shadow-point. Here's the puzzle again, with a little bit filled in as a hint:

Instructions for the puzzle are here.

Also, I think that weekly was probably too frequent for the contests, so they're going to change to bi-weekly; I'll have another one out soon.

If there's anything you want me to write about, put it in the comments here.

Friday, November 23, 2012

Newcomb's Decision

This post is partially a continuation of my previous posts on utilitarianism, and partially on philosophy in general; mostly, it's my two cents on one of the odder parts of consequentialist debate: decision theories.

Newcomb's Paradox

You, a mere mortal, encounter P, some super smart alien. Or maybe it's a supercomputer, or maybe a god; versions of the paradox differ on this. P comes up to you and says: "I have a deal for you. I'm going to give you two boxes--box A, and box B. Box B is transparent, and you can see $1,000 in it. You can't see what's in box A. I'm going to give you two choices. The first is to take box A--you get whatever is in it. The other choice is to take both boxes--you get box A, plus the $1,000 from box B."

So, you ask, why don't you take both boxes, getting the free $1,000? Well, says P, there's a catch: "I have predicted whether you will take one box or two boxes." (Or maybe I've simulated all of the atoms in the universe, or maybe studied your psychology, or maybe something else--versions of the paradox differ in how P knows how many boxes you're going to take. But however he knows it, you believe him; maybe he has, in the past, predicted everyone who's taken this challenge successfully.) "So I know what you're going to do", says P, "and before you arrived I decided how much money to put in box A. If I predicted that you were going to take only box A, I put $1,000,000 in it. Otherwise--if I predicted that you were going to take both boxes--I left box A empty."
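P's offer boils down to a small payoff table, which can be sketched in a few lines of Python. The dollar amounts come straight from the description above; the function name `payoff` and the labels "one" and "two" are just illustrative choices.

```python
# Dollar amounts taken from P's description of the two boxes.
BOX_B = 1_000                  # always visible in the transparent box
BOX_A_IF_ONE_BOX = 1_000_000   # placed in box A only if P predicted you'd take one box


def payoff(choice: str, prediction: str) -> int:
    """Winnings given your choice and P's prediction (each "one" or "two")."""
    for value in (choice, prediction):
        if value not in ("one", "two"):
            raise ValueError(f"expected 'one' or 'two', got {value!r}")
    box_a = BOX_A_IF_ONE_BOX if prediction == "one" else 0
    box_b = BOX_B if choice == "two" else 0
    return box_a + box_b


# Print all four combinations of prediction and choice.
for prediction in ("one", "two"):
    for choice in ("one", "two"):
        amount = payoff(choice, prediction)
        print(f"P predicted {prediction}, you take {choice}: ${amount:,}")
```

Whatever P predicted, taking both boxes pays exactly $1,000 more than taking one; the catch is that the prediction itself is supposed to track your choice.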

"So", says P, "How many boxes do you want to take?"

Sunday, November 11, 2012

Elections and the Future

The Democrats' victory in the 2012 elections--primarily President Obama's reelection but also the Democratic caucus in the senate growing by three senators*--has caused a fair amount of hand wringing among conservative circles about the future of the Republican party--the new fashion in political circles seems to be guessing which of opposition to comprehensive immigration reform, opposition to gay rights, and opposition to tax hikes for the wealthy will have been felled by the 2012 election. I agree in large part with the long term trend of American politics, but I think it's important to keep it in perspective.

Tuesday, November 6, 2012

Election 2012 liveblog

Welcome to the Measuring Shadows 2012 election liveblog! We'll be live chatting on the widget to the right, and writing longer comments on this page.

Monday, November 5, 2012

Lame38, and liveblogging the election

I will be liveblogging the election tomorrow night on this blog, starting with some posts early in the day but picking up around 7pm. But why should you follow this blog?

Because there is a crisis waiting to unfold tomorrow night: what if polls have just closed in Virginia but Nate Silver hasn't updated fivethirtyeight yet? How will you know if Obama's chances of winning the election have jumped to 92.5% or plummeted to 91.5%?

It is in the hopes that this catastrophe may be averted that I unveil Lame38: a shitty version of 538 that I promise to update frequently. What is Lame38? Well, I took all of the projections from fivethirtyeight, created a simplified model, and will run it with updated results--giving a real time projection of what Obama's chances of winning are, incorporating both states that have been called and the actual votes from noncompetitive states.

Q: Why is this better than 538?
A: Well maybe 538 will take like 15 minutes to update but Lame38 will only take 5 minutes. That'd be pretty cool, right?

Q: So it's just a shittier version of 538?
A: That's kind of a rude question.

Q: What information does your model incorporate as the night goes on?
A: I incorporate not just what states have been called, but also popular vote results from each state to try to estimate biases in the projections.

Q: So how does Lame38 work?
A: I took the projected vote differences and errors for each state from fivethirtyeight, as well as the standard deviation of the national popular vote. For each simulation I will sample a national popular vote, bias each state by its difference from the projected national popular vote (to estimate national bias), and then sample each state's vote. I then run about 1,000,000 simulations.

Q: So, what are the odds Obama wins the election?
A: Well, 538 says 92.2*%, and Lame38 says 93.5%, so I'd say the answer is about 92.2%.

Q: What else will you be blogging about?
A: Senate races, the race for California's 15th congressional district, and whatever else is on my mind.

Q: Will this be a great liveblogging, or the greatest liveblogging?
A: I'm shooting for "worth reading".
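
The simulation described in the Q&A above can be sketched roughly as follows. This is not the actual Lame38 code: the state names, margins, standard deviations, and the `SAFE_EV` count are all made-up toy inputs, and the normal distributions are an assumption. The only real-world number is the 270 electoral votes needed to win.

```python
import random

# Hypothetical toy inputs, NOT real projections:
# state -> (electoral votes, projected margin in points, state-level std dev)
STATES = {
    "State A": (20, 3.0, 3.0),
    "State B": (15, -1.0, 3.5),
    "State C": (10, 0.5, 4.0),
}
SAFE_EV = 250       # hypothetical electoral votes treated as already locked up
EV_TO_WIN = 270     # majority of the 538 electoral votes
NATIONAL_SD = 2.0   # hypothetical std dev of the national popular-vote error


def win_probability(states=STATES, n_sims=100_000, seed=0):
    """Estimate the chance of reaching EV_TO_WIN by Monte Carlo simulation."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        # One national error per simulation, shared by every state.
        national_shift = rng.gauss(0.0, NATIONAL_SD)
        electoral_votes = SAFE_EV
        for votes, margin, state_sd in states.values():
            # Each state also gets its own independent error.
            simulated_margin = margin + national_shift + rng.gauss(0.0, state_sd)
            if simulated_margin > 0:
                electoral_votes += votes
        if electoral_votes >= EV_TO_WIN:
            wins += 1
    return wins / n_sims


print(f"Estimated win probability: {win_probability(n_sims=20_000):.1%}")
```

Sharing one national shift across all states is what makes state outcomes correlated; sampling each state independently would understate the chance of a systematic polling miss.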

I hope to see you there!

*When I ran the simulations; 538 has since updated to 92.0%.

Thursday, August 16, 2012

Checking in on Tim Lincecum

A month ago I wrote an article looking at changes in San Francisco Giants' starting pitcher Tim Lincecum's pitch command and velocity this season. In particular, I found that the velocity on his fastball and slider had decreased by about a mile per hour, and that the average distance of his pitches from the edge of the strikezone had increased--assuming that, generally, pitches near the edge of the strikezone are better than those in the middle or nowhere near it.

Since the all-star break, though, Lincecum's ERA, at least, has been a respectable 3.66. Have his command and speed improved as well?

It turns out that his speed is the same as earlier this year, with a fastball averaging around 90.4 mph, and about a mile per hour slower than last year. So, no improvement on that front.

His average distance from the edge of the strikezone, on the other hand, has gotten a bit better--it's been at .940*, as compared to .923 last year and .961 for the first half of 2012**.

So, long story short, there are some signs that his pitching might be picking up but nothing conclusive.

By the way, I'll be announcing results from the second contest and introducing the third one in the next day or two.

_________________________________________________________________________________
The value for the second half of this year is only statistically significantly different from the value from last year, and from the value for the first half of this year, at the p= .30 level--so the jury's out on this one.

Monday, August 13, 2012

Checking in on the Giants' lineup

Earlier I wrote a few posts on what the SF Giants' optimal starting lineup should be using Basim, a baseball simulator I wrote. A lot has changed since then, though--Posey has become much better, Blanco and Pagan have cooled off, and Hunter Pence and Marco Scutaro have joined the club. So, what should the Giants' lineup look like now? What lineup do I hope they start tonight?

First off, here was my guess at their best lineup (once again assuming Zito is the pitcher):

1. Buster Posey
2. Brandon Belt
3. Melky Cabrera
4. Pablo Sandoval
5. Hunter Pence
6. Marco Scutaro
7. Angel Pagan
8. Brandon Crawford
9. Barry Zito

Running this lineup through the simulator*, it scored an average of 4.03 runs per game.

I then found that a random lineup (i.e. random ordering of the nine players) scored about 3.89 runs per game. The lineups that the simulator liked the best generally had Posey, Belt, Cabrera, or Pence batting leadoff, which is unsurprising--each has either a high OBP or a high ground into double play rate that would be very painful in the heart of the order. The single lineup that the simulator liked best** was the following:

1. Buster Posey
2. Angel Pagan
3. Hunter Pence
4. Brandon Belt
5. Melky Cabrera
6. Marco Scutaro
7. Pablo Sandoval
8. Brandon Crawford
9. Barry Zito

It scored an average of about 4.035 runs per game***.

I then looked at two lineups that were close to what I predict the Giants will run; they differ only in whether the Giants play Theriot or Crawford. The lineup Pagan, Scutaro, Cabrera, Posey, Sandoval, Pence, Belt, Theriot, Zito scored an average of 3.96 runs per game, while the same lineup but with Crawford batting for Theriot scored 4.00 runs per game on average. So, it seems like about half the difference between my lineup and the one with Theriot just comes from the fact that Crawford is better than Theriot.

Anyway, here's to hoping the Giants will do something smart.

_________________________________________________________________________________
*: For Pence and Scutaro I used their pre-Giants numbers.

**: This should be taken with a grain of salt--to actually find the best would take days of simulation; treat this as a lineup that is pretty close to the best.

***: FWIW, the ten best starting lineups, according to the simulator (with the same caveat as **), in the form "average runs scored by lineup per game: lineup", from lowest to highest:
4.01524: Buster Posey, Melky Cabrera, Brandon Belt, Marco Scutaro, Hunter Pence, Pablo Sandoval, Angel Pagan, Brandon Crawford, Barry Zito
4.0168675: Brandon Belt, Buster Posey, Pablo Sandoval, Melky Cabrera, Hunter Pence, Angel Pagan, Brandon Crawford, Barry Zito, Marco Scutaro
4.017915: Melky Cabrera, Buster Posey, Brandon Belt, Marco Scutaro, Pablo Sandoval, Brandon Crawford, Angel Pagan, Hunter Pence, Barry Zito
4.0218325: Brandon Belt, Buster Posey, Pablo Sandoval, Melky Cabrera, Hunter Pence, Brandon Crawford, Angel Pagan, Marco Scutaro, Barry Zito
4.02199: Melky Cabrera, Brandon Belt, Marco Scutaro, Pablo Sandoval, Angel Pagan, Hunter Pence, Buster Posey, Brandon Crawford, Barry Zito
4.0253225: Hunter Pence, Marco Scutaro, Buster Posey, Pablo Sandoval, Brandon Belt, Melky Cabrera, Angel Pagan, Brandon Crawford, Barry Zito
4.0315525: Brandon Belt, Hunter Pence, Pablo Sandoval, Angel Pagan, Melky Cabrera, Brandon Crawford, Marco Scutaro, Buster Posey, Barry Zito
4.0322525: Hunter Pence, Angel Pagan, Brandon Belt, Melky Cabrera, Buster Posey, Pablo Sandoval, Marco Scutaro, Brandon Crawford, Barry Zito
4.0353875: Marco Scutaro, Pablo Sandoval, Brandon Belt, Melky Cabrera, Buster Posey, Hunter Pence, Brandon Crawford, Angel Pagan, Barry Zito
4.0365225: Buster Posey, Angel Pagan, Hunter Pence, Brandon Belt, Melky Cabrera, Marco Scutaro, Pablo Sandoval, Brandon Crawford, Barry Zito

Second contest ends tonight (Monday night)

Last week I announced the second contest, to solve this intimidating puzzle:

Details are at the first link. So far no one has fully solved it, so send in your partial solutions--they'll probably make it to the top three.

The contest ends tonight (Monday night) at 11:59 pm. I'll announce the third contest sometime Tuesday.

Friday, August 10, 2012

Swing Vote!

Hey everyone. Sam made a post a couple weeks ago analyzing votes (and money) for congressional elections. This is a post about analyzing votes for US Presidential elections. This is the result of a 2am conversation-calculation with Sam and a mutual friend Gary Wang. Also, a lot of the intermediate steps in the result come from the website FiveThirtyEight.

The question we're going to try to answer is the following. You are Joe Smith from Iowa (or, insert your favorite state here), who decides to stay home on November 6, election day. What's the probability you wake up the next morning, open up Google News, and feel really stupid? Or more succinctly, what's the probability that a single vote in Iowa will make the difference in November?

Thursday, August 9, 2012

Traditionball: the most unenlightened area of baseball strategy

About ten years ago, baseball started to undergo a statistical revolution: youth became valued, OPS was born, and walks finally became valued. Fast forward a decade and OPS is now a mainstream stat, multiple sites are constructing competing ways to summarize the total value of a player, and even in baseball clubhouses sabermetrics are the new cool kid on the block.

But there are still a few areas of baseball strategy stuck in the dark ages of gut instincts and wild speculation, and chief among them is use of pitchers.

Right now in baseball there are three types of pitchers: starters, relievers, and closers. Starters come in to pitch the start of the game, stay in for at least five innings, and are eventually taken out. They pitch every five days. Closers come in in the ninth inning with a lead of between one and three runs. They never pitch more than an inning, and never come in otherwise. Middle relievers pitch in between starters and closers.

These roles bear an uncanny resemblance to two of the stupidest pitching statistics, wins and saves.

This system is, of course, not close to optimal. Frequent pitching changes at the beginning of the game would allow a manager to get better matchups, keep pitchers fresher by sparing them from having to throw too many pitches in one day, and allow pitchers to throw however many pitches is best for them--not a bimodal distribution with centers at fifteen and one hundred.

It would also give an NL team another advantage--they could always pinch hit for their pitchers (or at least as long as it wasn't a two out, none on situation).

I'll look at the first effects in a later post, but for now, how much would always pinch hitting help?

Well, first I found the number of runs scored by an average NL lineup from 2011 using Basim; it was 3.799.

Then, I substituted the average substitute player for the league in for the ninth spot in the lineup; Basim then simulated it and found an average of 4.006 runs per game.

That's roughly a 3.2 win difference right there--the difference between a .500 team and a .520 team.
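
As a rough check on that arithmetic, here is a minimal sketch of the runs-to-wins conversion. It assumes the common sabermetric rule of thumb that about ten extra runs are worth one win; the post doesn't say which conversion was actually used, so `RUNS_PER_WIN` is an assumption, and the function name is illustrative.

```python
GAMES_PER_SEASON = 162
RUNS_PER_WIN = 10.0  # assumption: rule-of-thumb conversion, not stated in the post


def extra_wins(runs_per_game_before, runs_per_game_after):
    """Convert a change in runs per game into extra wins per season."""
    extra_runs = (runs_per_game_after - runs_per_game_before) * GAMES_PER_SEASON
    return extra_runs / RUNS_PER_WIN


# The two Basim averages quoted above.
wins = extra_wins(3.799, 4.006)
print(f"Extra wins per season: {wins:.2f}")

# For comparison: the gap between a .500 team and a .520 team over a season.
print(f".500 vs .520 team: {0.020 * GAMES_PER_SEASON:.2f} wins")
```

Under that assumption the result lands in the same neighborhood as the roughly 3.2 wins quoted above; a different runs-per-win factor would shift it slightly.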

It's true, of course, that implementing such a system could incite a revolt from pitchers--but it seems like there is too much to be gained for it to be worth ignoring as a manager.

Tuesday, August 7, 2012

Utilitarianism, part 6: To do, or not to do

This is the sixth post in a series about utilitarianism. For an introduction, see the first post. For a look at total vs. average utilitarianism, see here. For a discussion of act vs. rule, hedonistic vs two level, and classical vs. negative utilitarianism, see here. For a response to the utility monster and repugnant conclusion, see here. And for a look at whether to count lives not yet in being, see here.

Also, note that I'm now putting page breaks in the middle of my posts so that you can see more than one on the front page...
-------------------------------------------------------------------------

I'm going to start off by making a note about something slightly different from the content of this post. Earlier, I defined a philosophy as a preference ordering on all possible universes; the ordering had to be transitive, reflexive, etc. Basically, a philosophy is something that compares any two possible universes; in other words, it tells you which options are the best (if you have complete information, that is). Perhaps for you a philosophy is something different. Maybe it's something that compares some situations but doesn't say anything about other comparisons. Maybe it's a binary function that calls all actions either morally permissible or impermissible. Maybe it is a framework to look at actions that doesn't necessarily tell you which are best, but instead some other difficult to define properties of them. Probably it's a mechanism to justify your current way of life. Anyway, if you don't think a philosophy should be a preference ordering on possible universes, there's probably very little I can do to convince you, just as if you think faith is more important than evidence or that gut instincts are more important than statistics in baseball there's probably little I can do to convince you. But from now on I am using that definition, and will look critically upon philosophies that fail to create a preference ordering.

Act and Omission

Anyway, there is a large debate in philosophy about whether taking an action should be treated asymmetrically from failing to take an action--the act/omission distinction. There are many phrasings of the problem, but here is one of the more famous ones: the trolley problem. The trolley problem is a thought experiment in which you, the actor, are near some trolley tracks. The tracks split, and past the split there are currently five people tied to one of the tracks, and three tied to the other. You're standing next to the lever which controls which path the trolley takes; in the first version of the problem the lever is currently such that three people will die, and in the second version of the problem the lever is currently such that the trolley will run over and kill the five people. A trolley is coming. You have time to pull the lever, if you want, but not to untie any of the people. In the first version it's pretty clear you don't pull the lever--not only are you causing the death of five people, but you're only saving three by doing it. But how about the second version? Do you pull the lever and switch the trolley, killing three other people, or not do anything and let the first five people die? That is, do you act, or not? And should morality treat the omission of action, which results in two extra deaths, the same way it would the action of killing two people? In other words, are these two scenarios the same? Does it matter which way the lever is currently pointing?

Contest Number Two: Two Degrees of Separation

Last week I introduced the contest of the week: each week I will propose a contest, and award Shadow-points to the winners; every two months the person with the most Shadow-points (in the period) will get their name posted on the side of the blog, a $2 reward, and the chance to write any article they want for the blog. The results of the first week's contest are here.

--------------------------------------------------------------

I'm going to try to alternate types of contests, so this week's will be a little bit different from last week's. This week, the contest is a word puzzle of sorts. The first person to solve it will get first place, second person second place, etc.

Without further ado, here's the puzzle:

Two Degrees of Separation*

Results from the First Contest

A week ago I proposed a contest for readers to construct the best possible lineup from the 2002 Giants roster; the rules are here and here. The winner of the contest gets 3 Shadow-points; second place gets 2; and third place gets 1. In addition everyone whose entry beats my own entry gets an additional Shadow-point. Every two months the person with the most Shadow-points gets their name in the Shadow Hall of Fame, $2, and the ability to write any one article for the blog.

-----------------------------------------------

Today, I'm announcing the winners of the contest. The average submission scored somewhere around 4.9 runs per game, and the best scored above 5. I have more comments to make, but without further ado, the three winners:

Monday, August 6, 2012

Last Chance for Contest

A week ago, I introduced a contest to design the best lineup from the 2002 Giants roster; details are here and here. Submissions for the contest are due tonight, so if you want to participate but haven't given me a lineup, either post it as a comment here or email it to me by 11:59 tonight.

I'll announce the results of the contests sometime tomorrow.

Also, I've run some more simulations with Basim on the 2000-2011 seasons; it looks like the correlation between RAA* and runs scored by a team is .963; without very many simulations (20,000 per player) the correlation between eWAA and runs scored is .952, but that number should go up with more simulations as the noise goes down (due to limited computing power it's taking a while to get a fuller result).

Also, if anyone has a suggestion for what I should write on (baseball, philosophy, or anything else), let me know.

_________________________________________________________________________________

RAA, runs above average, is the baseline offensive stat used to construct WAR; I'm using a modified version that removes ballpark advantage, etc. to do an apples-to-apples comparison.

Friday, August 3, 2012

Examining eWAA: The Fifteen Best Players of the Decade

A while ago I wrote a Python program, Basim, that simulates baseball games, and used it to construct a statistic for the offensive output of a player: eWAA. Also, check the bottom of the post for a few notes on the contest of the week.

----------------------------------

I've run a Basim simulation on all player-years from 2000-2011, inclusive, and used it to calculate the eWAA (empirical wins above average) for all player-seasons; think of this as the number of extra wins a team would be expected to get in the season if they replaced an average player with the given player. Below, I've listed something close to the best 15 player-seasons in the 2000-2011 period. I say "something close to" because, due to lack of available computer power, I haven't run enough simulations to get a stable result; so, the numbers below should be taken with a standard deviation due to limited simulations of something like 0.33 eWAA. (Once I've run enough simulations I'll do a more in-depth look at eWAA, including its predictive power.) For fun, I also put the spot in the batting order that Basim thought they should hit that year*.

Name	eWAA	Season	Best Spot
Barry Bonds	12.22	2001	2
Barry Bonds	12.03	2004	2
Barry Bonds	11.35	2002	2
Todd Helton	9.11	2000	2
Sammy Sosa	9.03	2001	2
Barry Bonds	8.91	2003	1
Luis Gonzalez	8.44	2001	4
Alex Rodriguez	8.35	2007	4
Albert Pujols	8.16	2003	1
Todd Helton	8.13	2004	1
Albert Pujols	8.04	2009	2
Todd Helton	8.02	2004	3
Jose Bautista	7.59	2011	2
Albert Pujols	7.51	2004	1
Jason Giambi	7.41	2001	4
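
The simulation noise mentioned above shrinks with more simulations in a predictable way: the standard error of an average over n independent simulated games falls like 1/sqrt(n). Here is a minimal sketch of that relationship. It is not Basim itself; the per-game standard deviation of 3.0 runs and the normal distribution are hypothetical stand-ins chosen for illustration.

```python
import math
import random
import statistics


def standard_error(per_game_sd, n_games):
    """Theoretical standard error of the mean over n independent games."""
    return per_game_sd / math.sqrt(n_games)


def empirical_standard_error(per_game_sd, n_games, n_repeats=200, seed=0):
    """Measure the spread of the average across repeated batches of simulated games."""
    rng = random.Random(seed)
    batch_means = [
        statistics.fmean(rng.gauss(0.0, per_game_sd) for _ in range(n_games))
        for _ in range(n_repeats)
    ]
    return statistics.stdev(batch_means)


PER_GAME_SD = 3.0  # hypothetical spread of runs per simulated game

for n in (400, 1_600, 6_400):
    print(
        f"n={n}: theory {standard_error(PER_GAME_SD, n):.4f}, "
        f"measured {empirical_standard_error(PER_GAME_SD, n):.4f}"
    )
```

Quadrupling the number of simulations only halves the noise, which is why a stable result takes so much computer time.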

I then computed the best total eWAA throughout the period (summed over the years 2000-2011); if you want, the best offensive players (again, with roughly a 2.5% error) of the decade:

Name	eWAA
Albert Pujols	69.2
Barry Bonds	60.5
Alex Rodriguez	60.3
Todd Helton	54.0
Lance Berkman	46.3
Manny Ramirez	46.2
Chipper Jones	44.5
Bobby Abreu	39.8
Vladimir Guerrero	39.7
Jim Thome	39.2
Jason Giambi	36.0
Carlos Beltran	34.4
Miguel Cabrera	33.3
Brian Giles	32.9
Gary Sheffield	32.5

My initial reactions:

1) Bonds was, in fact, really good. Not only did he have the best seasons by far, but he has the second highest total eWAA despite retiring halfway through the period.

2) Pujols, unsurprisingly, is the man of the decade; he's been an MVP-level player for most of the years.

3) Coors Field is really friendly. (Also, Helton is really good.)

4) Berkman, Abreu, Thome, Beltran, and Giles are really underrated.

-------------------------------

A few notes on the contest of the week (a contest to create the best lineup you can from the 2002 Giants roster; see here for details).

1) For fun, I'm going to create a lineup submission of my own; it won't be in the contest, but in addition to the normal Shadow-point payout, I'm giving one extra Shadow-point to everyone who submits a lineup that does better than mine.

2) To clarify, the numbers I'll be drawing the stats from are the ones listed here; they are the players' stats from their time on the Giants in 2002.

3) So far there are 12 entries, all of which are unique.

4) I'm not going to show my lineup until I run the results, but as a teaser I'll say that there's one thing I think a lot of people are forgetting to think about.

Anyway, good luck in the contest; submissions are open until this coming Monday, August 6th at 11:59 pm.

_________________________________________________________________________________
*: I suspect that right now Basim is biasing too much toward having good hitters hit first and bad hitters hit last when calculating eWAA; I'll change that soon.

Thursday, August 2, 2012

The Fetishization of the Old

In about eighty years I will be dead, and in another eighty everyone who ever really knew me will be too. I will be at risk of being forgotten; everyone alive now will be, but most importantly for me, I will be. I would like to think that I will be remembered. We all would.

--------------------------------------------------------

-Beatrice (F) and Benedick (M) are engaged; so are Hero (F) and Claudio (M). The men are best friends, as are the women. Claudio believes Hero is cheating on him and breaks off their engagement. Beatrice tells Benedick, in retaliation for casting shame upon Hero, to kill Claudio. Benedick eventually relents, and agrees to murder his best friend.
-Stradivarius string instruments are instruments made by Antonio Stradivari.
-"A well regulated Militia, being necessary to the security of a free State, the right of the people to keep and bear Arms, shall not be infringed."

The objects referenced above share three similarities.

The first is that they're really old. The first, a summary of a key plot point in William Shakespeare's Much Ado About Nothing, was written sometime in the late sixteenth century. The second, the most famous string instruments ever made, were constructed sometime around 1700. And the third, the second amendment to the constitution of the United States of America, was adopted on December 15th, 1791.

The second commonality is that they are revered. Shakespeare is widely considered to be the best author ever to have lived. His works are required reading at almost every level of school, the subject of quite a lot of academic research, and the focal point of many theater festivals around the world. Stradivarius violins sell for a few million dollars each, and cellos an order of magnitude above that--both an order of magnitude above the cost of other professional-level instruments. The constitution has become the focal point for almost every public policy debate in Washington, by far the most ubiquitously cited source, and it was the interpretation of the constitution that rested at the heart of the recent supreme court case on Obamacare. The second amendment itself has determined the balance of gun control laws in America, has been used to limit local attempts to ban certain guns and to determine which attempts to limit access to guns are allowed.

The third thing that these three old, revered works share in common is that they are ridiculous. The plot twist in Much Ado--typical of Shakespeare--relies on simultaneously one-dimensional and unrealistic characters, illogical plots, and obvious endings. I mean, come on--kill someone because he thinks, with good reason, that his fiancée is cheating on him? Beatrice is absurdly out of line in an unrealistic way; Benedick is absurd for listening to her, and this is all supposed to be taken in stride. Professional violinists don't show preferences for Stradivarius violins in double blind tests versus newer instruments. And we as a country should be able to decide what the best gun control laws are and enact them democratically, instead of listening to vaguely worded commands about gun laws from people who lived two hundred years ago when we were in open rebellion against a foreign occupier and didn't yet have a reliable police force or army. Instead of making decisions about what laws make sense in a country with internal security, a police force, and an army, we have to constantly make sure that "A well regulated Militia, being necessary to the security of a free State, the right of the people to keep and bear Arms, shall not be infringed." Whatever the fuck that means. All of Shakespeare's plays are like that, too, and the problem with the constitution is more general than the second amendment.

I could go on and on about the failings of Shakespeare and the constitution and Stradivarius violins, and at the bottom of this post I do*, but really I shouldn't need to: the Bayesian priors are pretty damning. About half of the people born since 1600 have been born in the past 100 years, but it gets much worse than that. When Shakespeare wrote, almost all Europeans were busy farming, and very few people attended university; few people were even literate--probably as few as about ten million. By contrast there are now upwards of a billion literate people in the Western sphere. What are the odds that the greatest writer would have been born in 1564? The Bayesian priors aren't very favorable.

And take a look at string instrument creation. Not only does current society have much more expendable income and energy to devote to things like creating instruments, but we now have machines capable of cutting wood with micrometer level precision available to consumers; what are the odds, really, that the best violins have been made by a human hand in 1700?

The problem is much more systemic than plays and violins and laws. Citizen Kane was finally unseated as the best film of all time and bumped to number two--still quite an achievement for an almost unwatchably empty film. Old wines sell for ridiculous prices despite the lack of correlation between price and taste. (See here for recently disgraced Jonah Lehrer's attempts to salvage expensive wine.) The framers of the constitution are easily the most revered people in America and--importantly--those most often looked to for advice on public policy--despite being, you know, people with slaves who wouldn't understand a thing about the modern economy or technology or society. I spent a fair chunk of my childhood trying to decide who the best ten baseball players ever were--how does Gehrig compare to Bonds?--even though any of the players from 1920 would flunk if forced to play against modern teams. Again and again in our culture, the same theme pops up: we fetishize the old.

We like old plays and old movies and old wines and old instruments and old laws and old people and old records and old music. We like them because they're old and come with stories but we convince ourselves that there's more: we convince ourselves that they really were better. We don't just read stories about the framing of the constitution at bedtime, we use it as our guide for public policy. We don't just like to listen to the Beatles but we convince ourselves that they are the best and that anyone who doesn't like them doesn't have good taste in music. We don't just respect the old; we think that the old is right and that those who prefer the new to the old are wrong.

So why is it that we have become so enamored of things made in 1700?

There are many reasons. One is that there is a whole lot of inertia in the system. If Shakespeare is the most respected thing in 1900 then teachers will teach it in 1900 and academics will write about it in 1900 and if you're young in 1900 and want to be "in the know" and want to become an insider in academic literature, then, well, you'd better study Shakespeare; and so it's passed on from generation to generation. Furthermore, once something acquires a label, it's very hard to dislodge the label--even if the label is as the best author ever and there are more and more authors every day giving the old one a run for its money (and then some). I think there's one more reason, though, that we fetishize the old. I was reminded of it about a month ago while in a taxicab heading toward the Atlanta airport, and I saw a billboard advertisement for a Church that said, superimposed on the pastor's face: "In these troubled times, some things never change."

In about eighty years I will be dead, and in another eighty everyone who ever really knew me will be too. I will be at risk of being forgotten; everyone alive now will be, but most importantly for me, I will be. I would like to think that I will be remembered. We all would. And if we as a society spend so much time looking backward, so much time romanticizing those who died two hundred years ago, so much time replicating traditions born hundreds of years ago, then the future doesn't look quite so divorced from the present. And the thought that your society and your town and your way of life and maybe even you might be remembered in two hundred years doesn't seem quite so hopeless.

It's easy to get caught up in romanticization of the past and forget that it's the reason that 46% of Americans don't believe in evolution.
-------------------------------------------

This is not, of course, to say that Shakespeare should be banned. Everyone should be entitled to read what they want. But our laws should not be based on two hundred year old unchangeable documents, and schools shouldn't base their curriculum around analyzing Shakespeare; and next time you want to go see Citizen Kane playing in your local artsy movie theater, I think I'll pass.

_________________________________________________________________________________

*: The recent ruling on Obamacare rested on the personalities of two judges--John Roberts and Tony Kennedy. The reason that such an important case rested on personality instead of law or fact is that the vagueness of the constitution gives the justices free rein to rule as they wish. Obamacare was originally believed to be extremely unlikely to be overturned, and then underwent a series of transitions in terms of likelihood of being overturned, peaking at almost 80% on Intrade, before being upheld. And during this time, the constitution did not change; only one of the bumps even came from legal arguments in front of the court. Instead a year of speculation over the personal opinions of John Roberts and Tony Kennedy occurred. The constitution not only sets arbitrary and vaguely worded rules from a time when the nation was very different that are now almost impossible to change, but also allows people to judge the legality as they wish on almost any issue--even if they're a judge tasked with deciding whether a law will be upheld. But, you say, wouldn't we become a police state devoid of free speech if we lost constitutional protections? Look at the UK, whose constitution's main role is to establish a now-figurehead monarchy. Look at almost any comparable country--it's not going to rely as much on its constitution as we do on ours. If we want free speech--which we absolutely do--then let's have a law saying so.

Similarly, Shakespeare's non-comedies fix few of the flaws found in Much Ado About Nothing. Romeo and Juliet are incredibly flimsy characters, and the plot is absurd. (For those interested, the number of lines between when Romeo is first made aware of Juliet's existence and when he recites his first love sonnet about her is 32, and none of those involve any action on Juliet's part, let alone interaction between them.) Sure, you could say that the play is attempting to highlight the immaturity of youth, but at that point you're attempting to cite the one-dimensionalness of the main characters of a work as a strength.

And Shakespeare isn't alone in being a shitty writer from hundreds of years ago. The most ambitious woman in Pride and Prejudice has a life goal of marrying a rich, handsome man who is also intelligent--the thought that a woman could have a career or even a hobby independent from her husband is outside the scope of the book. And don't get me started on The Canterbury Tales.

Tuesday, July 31, 2012

Introducing the Contest of the Week: What's Your Lineup?

What's the best lineup, order included, that can be made from the 2002 San Francisco Giants? Think you can come up with a better one than other readers?

---------------------------------------------------

Today I'm introducing a new feature of the blog: the contest of the week. Each week, I'll announce a contest to readers. Each week, the winner of the contest gets three Shadow-points, second place gets two Shadow-points, and third place gets one Shadow-point. Every two months, the person with the most Shadow-points in the period gets their name put on a Hall of Fame widget to the right of the blog (I'll create it once it's needed), a wallet-busting $2 prize, and the opportunity, if they want, to write a guest article for the blog.

So, on to the first contest:

What's Your Lineup?

I wrote Basim, a Python script that simulates baseball games based on the stats of the people in the lineups. The details of how it works are in the link above; you can also see Basim simulating games live on the right of the blog. I've recently been looking into evaluating players with it, but this contest has to do with the original use of Basim: evaluating batting orders.

So, the first contest is to construct the best batting order from the 2002 San Francisco Giants roster. To enter the contest, submit a lineup from their players; I'll run Basim on all submissions, and the three highest average runs per game are the winners.

Rules:

1) The players you can draw from are listed here.

2) You can only use players who had at least 100 plate appearances with the Giants that year.

3) Your lineup must be defensively valid. That is to say you must have a first baseman, second baseman, etc. Shortstops and second basemen are considered interchangeable, and all outfielders and first basemen are also interchangeable. Third basemen and catchers can play first base, but not vice versa. The position that a person can play, up to interchangeability, is the one listed here. (Technical note: Dunston can play 2B, SS, OF, and 1B, in case you want to play him for some reason).

4) Your pitcher (in your lineup) must be Jason Schmidt.

5) I will run 1,000,000 simulations on each submitted lineup to find its average runs scored per game. The highest value will win and get three Shadow-points; second will get two, and third will get one.
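Rules 3 and 4 can be checked mechanically. The following is a minimal sketch, not part of Basim: the function names, the `ELIGIBLE` table, and the natural positions in `NATURAL` are all assumptions made for illustration (the official positions are the ones in the linked list). It reads rule 3 narrowly, so a third baseman or catcher may move to first base but not on to the outfield, and it does not check rule 2's 100 plate appearance cutoff.

```python
# Sketch of a rule 3 / rule 4 validity check. All names here are hypothetical.
OUTFIELD = {"LF", "CF", "RF"}

# Field positions that each natural position may fill (narrow reading of rule 3).
ELIGIBLE = {
    "C": {"C", "1B"},
    "1B": OUTFIELD | {"1B"},
    "2B": {"2B", "SS"},
    "SS": {"2B", "SS"},
    "3B": {"3B", "1B"},
    "OF": OUTFIELD | {"1B"},
    "P": {"P"},
}
REQUIRED = ["C", "1B", "2B", "SS", "3B", "LF", "CF", "RF", "P"]


def allowed_positions(naturals):
    """Union of the field positions open to a player with these natural positions."""
    allowed = set()
    for nat in naturals:
        allowed |= ELIGIBLE[nat]
    return allowed


def is_valid_lineup(lineup, natural_positions):
    """lineup is a list of (player, field_position) pairs in batting order."""
    if len(lineup) != 9 or len({player for player, _ in lineup}) != 9:
        return False
    # Every field position must be filled exactly once.
    if sorted(pos for _, pos in lineup) != sorted(REQUIRED):
        return False
    for player, pos in lineup:
        naturals = natural_positions.get(player)
        if naturals is None or pos not in allowed_positions(naturals):
            return False
    # Rule 4: the pitcher must be Jason Schmidt.
    return ("Jason Schmidt", "P") in lineup


# Assumed natural positions, for illustration only.
NATURAL = {
    "Benito Santiago": ["C"],
    "J. T. Snow": ["1B"],
    "Jeff Kent": ["2B"],
    "Kenny Lofton": ["OF"],
    "Barry Bonds": ["OF"],
    "Jason Schmidt": ["P"],
    "David Bell": ["3B"],
    "Ramon Martinez": ["SS"],
    "Marvin Benard": ["OF"],
    "Shawon Dunston": ["2B", "OF"],  # per the technical note: 2B, SS, OF, 1B
}

EXAMPLE_LINEUP = [
    ("Benito Santiago", "C"),
    ("J. T. Snow", "LF"),
    ("Jeff Kent", "SS"),
    ("Kenny Lofton", "1B"),
    ("Barry Bonds", "CF"),
    ("Jason Schmidt", "P"),
    ("David Bell", "3B"),
    ("Ramon Martinez", "2B"),
    ("Marvin Benard", "RF"),
]

print(is_valid_lineup(EXAMPLE_LINEUP, NATURAL))
```

Under these assumed positions, a lineup that puts a first baseman at third base (or a third baseman in left field) is rejected.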

So, for example, a submission might look like:

1. Benito Santiago (C)

2. J. T. Snow (LF)

3. Jeff Kent (SS)

4. Kenny Lofton (1B)

5. Barry Bonds (CF)

6. Jason Schmidt (P)

7. David Bell (3B)

8. Ramon Martinez (2B)

9. Marvin Benard (RF)

Some (quite obvious) things to think about: where do you put Bonds? What do you think of stolen bases? What do you want in a leadoff hitter? Where do you put the pitcher?

Submissions are due by Monday, August 6th; I'll announce winners the next day. To submit a lineup, either email it to me (sambf at mit dot edu), or post it as a comment on this post. (Make sure to include your name.)

Good luck!

Sunday, July 29, 2012

Basim Live: Like baseball, but without the commercial breaks

Earlier I posted about Basim, a baseball simulator I designed that simulates baseball games based on the stats of people in the lineups. My long term project has been to design a stat, eWAR (empirical wins above replacement), that attempts to determine how good a baseball player is offensively by seeing how many runs lineups including them produce.

Recently, I put up a widget on the top right of the site that runs Basim simulations live every time you load the page. It randomly chooses two 2011 MLB teams from the same league and plays a game between them, the same way I simulate games to calculate eWAR.

I'll continue to work on it--eventually I'd like to set up a simulated baseball season, shown live on the blog--but for now I'm interested in feedback on how to improve it.

Known things that could be improved:

1) Groundouts and flyouts could be treated differently, with batters replacing runners on first on groundouts that don't turn into double plays.

2) The batting orders, roughly speaking, are the most frequently used lineups from the team from 2011, but this can have some weird consequences, like not using someone who is generally in the lineup because they switch spots frequently enough that no single lineup involving them has been used very much. So, some of the orders that I'm using aren't very good. To that extent, if you see a lineup that is not very representative of a team, tell me and I can change it. (There are also potential oddities where the lineup lists I'm reading from only use last names, so, for example, for a while the Pirates were using Donald McCutchen in their lineup instead of Andrew McCutchen.)

3) Right now people are always thrown out trying to take an extra base on 3% of balls put in play; if someone could find better player-by-player data for this, that'd be awesome.

Anyway, feel free to shoot me any comments you have. And in the mean time, I hope you spend as much time staring at text-based baseball games as I have.

Addendum: if you know a good way to synthesize pitcher stats with hitter stats to predict an at bat, I'm all ears; right now the approximation I'm thinking of using is to add up the deviations from average of the two statistics, though of course in the end what I really want to do is do a regression....
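The additive approximation mentioned in the addendum can be sketched as follows. This is an illustration rather than Basim's code: the function name and the sample rates are made up, and the clamp to the range 0 to 1 is an added safeguard that the addendum does not mention.

```python
def combine_additive(batter_rate, pitcher_rate, league_rate):
    """Estimate a matchup rate (e.g. on-base percentage) by adding the batter's
    and the pitcher's deviations from league average to the league average."""
    estimate = (
        league_rate
        + (batter_rate - league_rate)
        + (pitcher_rate - league_rate)
    )
    # Added assumption: keep the result a valid probability.
    return min(1.0, max(0.0, estimate))


# Hypothetical numbers: a .400 OBP hitter facing a pitcher who allows a .300 OBP,
# in a league where the average OBP is .330.
matchup_obp = combine_additive(0.400, 0.300, 0.330)
print(round(matchup_obp, 3))
```

The hitter is 70 points better than average and the pitcher 30 points better, so the estimate lands 40 points above the league rate. A regression would presumably replace the two implicit weights of 1 with fitted coefficients.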

Friday, July 27, 2012

The Playoffs and the Trade Deadline

Long story short: you should make aging-all-star-for-prospect trades at the start of the season, instead of at the trade deadline, if you're at least 70% sure, before the season starts, that the team getting the aging all star will be in playoff contention and that the other team will not.

_______________________

In Major League Baseball, teams are not allowed to make trades (or, in some cases, are just significantly hindered in making them) after the July 31st trade deadline. This left me wondering: in what cases does it make sense to wait until the trade deadline to make trades, and when does it make sense to do it at the start of the year?

Put another way, it's been about ten years since the Seattle Mariners were in playoff contention, and about twenty years since the Yankees weren't--the fact that this is also true this year shouldn't surprise anyone. So why didn't the Mariners trade Ichiro to the Yankees at the beginning of the year?

Most trade deadline trades, like the one involving Ichiro Suzuki, involve a team in playoff contention trading some future prospects and/or money to a team not in contention in return for an (often aging) borderline all-star level player. The advantage of making this trade at the beginning of the year, instead of waiting until halfway through for the trade deadline, is that the team in contention gets the good player for longer and thus the player has more utility for them; the disadvantage is that you might accidentally trade away a good player and then find yourself in playoff contention, or vice versa.

The first thing I investigated was the following: how much does gaining an Ichiro-like player help in the regular season, and how much does it help in the post season? Here were the assumptions I made.

1) All that matters is maximizing chances of winning the world series.
2) The player will add roughly 3 wins to the team (i.e. have a WAR of 3 more than the person they're replacing).
3) The team will be facing teams of equal strength not including the player traded for in the playoffs, and will win with a probability over .500 corresponding to the difference in their inherent winning percentage and their opponents'.
4) The team suspects that they will end up doing something between winning the division by 7 games, and losing by 10 games (roughly the average difference between division leader and second in division in 2011).

The probability of a team winning the world series is p_WS = p_PL * p_PLW: the probability of making the playoffs times the probability of winning the playoffs once you're in them.

p_PL will be increased by ~3/20 by making the trade at the beginning of the season, meaning that p_PL will go from 50% to 65%, a proportional increase of .65/.5 = 1.3

To calculate the change in p_PLW, note that they have gained 3/162 ~ .0185 in winning percentage, meaning that they have a 51.85% chance of winning a playoff game (by assumption 3). So, whereas before p_PLW = 12.5%, now p_PLW = 15.6%, a proportional increase of about 1.25.
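The 12.5% and 15.6% figures can be reproduced with a short calculation. The sketch below assumes a playoff format the post does not spell out: one best-of-5 series followed by two best-of-7 series, with every game won independently with the same probability. The function names are made up for illustration.

```python
from math import comb


def series_win_prob(p, wins_needed):
    """Probability of winning a best-of-(2 * wins_needed - 1) series when each
    game is won independently with probability p."""
    # Sum over the number of games lost before clinching the final win.
    return sum(
        comb(wins_needed - 1 + losses, losses) * p**wins_needed * (1 - p) ** losses
        for losses in range(wins_needed)
    )


def playoff_win_prob(p):
    """Assumed format: a best-of-5 series, then two best-of-7 series."""
    return series_win_prob(p, 3) * series_win_prob(p, 4) ** 2


baseline = playoff_win_prob(0.5)                # even matchups in every game
with_player = playoff_win_prob(0.5 + 3 / 162)   # three extra wins per 162 games

print(f"{baseline:.3f} {with_player:.3f} {with_player / baseline:.2f}")
```

With these assumptions the baseline comes out to exactly 1/8 and the improved team to roughly 0.156, matching the proportional increase of about 1.25.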

I had not been expecting this result; I had assumed that the playoffs would be random enough that player quality wouldn't matter as much as it would in the regular season (because there are more games in the regular season). But it seems that, in fact, a good player added to a good team might make about as much of an impact in the playoffs as in the regular season. (Note, however, that if you give yourself some partial victory credit for making the playoffs even if you don't win, that would argue in the other direction.)

So, what's the upshot of this, as far as the trade deadline is concerned? Without trading you have a 50%*12.5% = 6.25% chance of winning the world series. If you trade at the beginning of the year, you have a .65*.156 = 10.14% chance of winning the world series. And if you make a trade at the trade deadline you get about 1.5 extra regular season wins from the player, so (by the same logic) you have roughly a .575*.156 = 8.98% chance of winning the world series.

So, making a before-season trade increases your probability of winning the world series by about 3.89%, whereas making it right before the trade deadline increases it by about 2.74%.

This means that the advantage of making the trade pre-season is that it's about 3.89/2.74 = 1.42 times as effective. Assuming that you'll be pretty sure by mid season whether you're in a playoff race, the question then becomes: before the season, are you at least 1/1.42 ~ 70% sure that your team will be in a competitive playoff race, and the other team won't be? In another post, I'll look at this question.
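The comparison can be laid out as a short calculation using the rounded figures from the text. The variable names are made up, and the break-even step is one reading of the argument: a pre-season trade only pays off if your forecast turns out right, while a deadline trade is made once you already know, so the pre-season trade wins when confidence times the pre-season gain exceeds the deadline gain.

```python
# Rounded inputs taken from the text above.
P_PLAYOFFS_NO_TRADE = 0.50
P_PLAYOFFS_PRESEASON = 0.65    # about 3 extra regular season wins
P_PLAYOFFS_DEADLINE = 0.575    # about 1.5 extra regular season wins
P_WIN_PLAYOFFS_BASE = 0.125
P_WIN_PLAYOFFS_WITH_PLAYER = 0.156

no_trade = P_PLAYOFFS_NO_TRADE * P_WIN_PLAYOFFS_BASE
preseason = P_PLAYOFFS_PRESEASON * P_WIN_PLAYOFFS_WITH_PLAYER
deadline = P_PLAYOFFS_DEADLINE * P_WIN_PLAYOFFS_WITH_PLAYER

gain_preseason = preseason - no_trade
gain_deadline = deadline - no_trade

# How much more effective the pre-season trade is, and the confidence at which
# confidence * gain_preseason equals gain_deadline.
effectiveness_ratio = gain_preseason / gain_deadline
breakeven_confidence = gain_deadline / gain_preseason

print(f"ratio {effectiveness_ratio:.2f}, break-even {breakeven_confidence:.0%}")
```

Because the inputs are rounded, the ratio lands within a hundredth or so of the 1.42 quoted above, and the break-even confidence rounds to the same 70%.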

Thursday, July 26, 2012

Utilitarianism, part 5: Who Counts?

This is the fifth post in a series about utilitarianism. For an introduction, see the first post. For a look at total vs. average utilitarianism, see here. For a discussion of act vs. rule, hedonistic vs two level, and classical vs. negative utilitarianism, see here. For a response to the utility monster and repugnant conclusion, see here.

_______________________

Another question that sometimes gets asked about utilitarianism is: who, exactly, is included in the calculation of world utility when evaluating possible scenarios? People alive now? People who would be alive then? People who will live no matter what? This is a question which is also often bundled into total vs. average utilitarianism, but I've already written on that here, and, I hope, demonstrated that average utilitarianism doesn't work. I've also attempted to refute the "repugnant conclusion", the most commonly leveled criticism against total utilitarianism, with the "Sam is Great" rule here. So I will assume that no matter what group of people counts, the correct thing to do with them is to total up their utility in order to evaluate a scenario.

The first thing to note is that total utilitarianism naturally handles the somewhat unpleasant question of whether dead people matter, and if so at exactly what point someone is "dead": dead people have zero utility and so won't affect the total utility of the world anyway. You should include them in the calculation the whole time, but it will stop mattering once they're dead enough to not feel any pain or pleasure.

Not Yet Living

The most common form of not counting some people in utilitarian calculations is generally called prior-existence utilitarianism, which states that when calculating the utility of possible universes you should only include people who are already alive, i.e. not people who will be born sometime between the current time and the time that you are analyzing. There are a number of variants on this, but the central idea is the same: there are some people who shouldn't yet count for evaluating the utility of future scenarios.

This idea, however, has two damning flaws. To understand the first flaw, look at the following scenarios.

The Future Pain Sink

There are currently ten people in the world W, each with utility 1 (where 0 is the utility of the dead). You are one of the ten, and trying to decide whether to press a button which would have the following effect: a person would be born with -10 utility for the rest of their life, which would last for 100 years, and you would gain 1 happiness for a year. (If you like, you can imagine a much more realistic version of "pressing the button" which results in short-term pleasure gain in return for an increase in unhappy population, but for the sake of reducing complications I will stick to the button model.) Do you press the button? Well, if you are a prior-existence utilitarian, you would do it: you don't care about the not-yet-existing person's utility. You would prefer the button-pressed universe to the non-button-pressed one, because the ten existing people would be net happier, and so you would press the button. But then the new person comes into being, and you value its utility. And now you prefer the universe where you hadn't pressed the button to the one where you had, even though you haven't yet gotten any of the positive effects of the button press. You disagree with your past self, even though nothing unexpected has happened. Everything went exactly as planned. In fact, your past self could have predicted that your future self would, rationally, disagree with it on whether to press the button.

The Once King

Now, say that you're the king of an empire, the only empire on earth. You could manage your grain resources well, saving up enough for future years, encourage farming practices that preserve the soil, and reduce carbon emissions from anachronistic factories--in short, you could make some small short term sacrifices in order to let your kingdom flourish in fifty years. But the year is 1000, and so no currently living people are going to be able to survive that long except yourself (being a king, you have access to the best doctors of the age). And so, even though you're a utilitarian, you don't plan for the long term because it'll help people not yet born. But then fifty years pass and your kingdom is falling into ruin--and all of your subjects are suffering. And so you curse your past self for having been so insensitive to your current universe.

The problem with both of these scenarios, and in general with prior-existence utilitarianism, is that your utility function is essentially changing over time: its domain is the set of living people, a set which is constantly changing. And so your utility function will disagree with its past and future selves; it will not be consistent over time. This will give some really weird results, like someone repeatedly pressing a button, then un-pressing it, then pressing it again, etc., as your utility function goes back and forth between including a person and not including them. Any morality had better be constant in time, or it's going to behave really weirdly.
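The Future Pain Sink can be written out numerically to show the flip. This is a minimal sketch under one added assumption: utility is measured in util-years, so the button's effects are +1 for you (1 util for 1 year) and -1000 for the newborn (-10 utils for 100 years). The names are made up for illustration.

```python
def counted_value(effects, counted):
    """Net value of pressing the button, summing only over the people who count."""
    return sum(change for person, change in effects.items() if person in counted)


# The ten people alive when the decision is made.
existing = {"you"} | {f"person_{i}" for i in range(2, 11)}

# Change in util-years from pressing the button, relative to not pressing it.
button_effects = {"you": 1 * 1, "newborn": -10 * 100}

# Prior-existence utilitarianism: the domain is whoever is alive at evaluation time.
value_before_birth = counted_value(button_effects, existing)
value_after_birth = counted_value(button_effects, existing | {"newborn"})

# Total utilitarianism: everyone who will ever exist counts from the start.
total_value = sum(button_effects.values())

print(value_before_birth, value_after_birth, total_value)
```

The same action is scored as positive before the birth and strongly negative after it, purely because the set of counted people changed; the total utilitarian score is the same at both times.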

The second flaw is much simpler: why don't you care about future generations' happiness? Why did you think this would be a good philosophy in the first place? Why would you be totally insensitive to some people's utilities because of when they're born? Their happiness and their pain will be just as real as the current people's, and to ignore it would be incredibly short-sighted, and a little bit bizarre, like a bad approximation to attempting to weigh your friends' utilities more than strangers'.

Friends and Family (and Self)

Speaking of which, the other common form of valuing people differently comes in valuing people close to you more (call it self-preferential utilitarianism). So, for instance, you might weight your own happiness with an extra factor of 1,000, your immediate family's with a factor of 100, and your friends with a factor of 10, or something like that.

Let me first say that there are of course good practical reasons to generally worry more about your own utility, and that of close friends and family, than that of strangers: it's often a lot easier to influence your own utility than someone you've never heard of before; you can easily have significant control on your own life and the lives of those close to you; and maintaining close relationships (and living a reasonably well off life, at least by global standards) can help to prevent burnout. But this is all already built into normal utilitarianism; to the extent that the utilitarian thing to do is to make people happy by talking to them and hanging out with them it's naturally going to be the case that this is best done with friends because you already have an established connection with them, you know that you'll get along with them, and it'd be difficult to find a random person and have an interesting conversation with them. Once again, it's important to make sure not to double count intuitions by explicitly adding a term for something that is already implicitly taken care of.

Even if you are undeterred by this and want to weigh friends, family and self more than strangers, as a philosophy this is going to run into more problems. First, to the extent that your friend base changes over time you could run into the same problems as prior-existence utilitarianism has with non-constant utility functions. Second, it has the weird property that two people will disagree about what the right thing to do is even if they have the same information and see the same options for how the universe should be. Third, as a consequence of this, everyone being an optimal self-preferential utilitarian would, in many circumstances, be dominated by everyone being normal utilitarians. The easiest example of that is the prisoner's dilemma. Say there are two people in the world, A and B; each has a button they could press that would make them 1 util happier but cost the other 2 utils. If both are self-preferential utilitarians they would both push the button, leaving both worse off than if they had both acted as normal utilitarians--even by self-preferential utilitarian metrics. That is to say, everyone following self-preferential utilitarianism does not always lead to the optimal self-preferential outcome for everyone, and can in fact be dominated by other strategies that everyone could follow. Now that's not a problem with a way to describe people's motivations behind their actions, but it seems a bit weird to endorse a philosophy that produces sub-optimal results by its own measure. Fourth, there comes the sticky question of exactly what weights each person gets, and how that's decided.
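The two-button example can be checked directly. The sketch below borrows the self-weight of 1,000 from the earlier example weights and gives the other person a weight of 1; the function names are made up for illustration.

```python
SELF_WEIGHT = 1000  # weight on your own utility, as in the earlier example


def raw_utils(i_press, other_presses):
    """Unweighted utility changes (mine, theirs): pressing gains the presser
    1 util and costs the other person 2 utils."""
    mine = (1 if i_press else 0) - (2 if other_presses else 0)
    theirs = (1 if other_presses else 0) - (2 if i_press else 0)
    return mine, theirs


def weighted_value(mine, theirs, self_weight):
    """Value of an outcome to an agent who weights their own utility extra."""
    return self_weight * mine + theirs


def prefers_to_press(other_presses, self_weight):
    """Does pressing beat not pressing, given what the other person does?"""
    press = weighted_value(*raw_utils(True, other_presses), self_weight)
    hold = weighted_value(*raw_utils(False, other_presses), self_weight)
    return press > hold


# Pressing is the better choice for a self-preferential agent either way...
dominant = all(prefers_to_press(other, SELF_WEIGHT) for other in (True, False))

# ...yet both pressing scores worse than neither pressing, by the same weights.
both_press = weighted_value(*raw_utils(True, True), SELF_WEIGHT)
neither_presses = weighted_value(*raw_utils(False, False), SELF_WEIGHT)

print(dominant, both_press, neither_presses)
```

With a self-weight of 1 (a normal utilitarian), `prefers_to_press` is false in both cases, since pressing always costs the world a net 1 util. Any self-weight above 2 is enough to produce the dilemma.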

All of these problems, in the end, come from the fact that self-preferential utilitarianism took utilitarianism and added an arbitrary, hard to define, non-universal, non-constant in time wrench into it. This is the downfall of many flavors of utilitarianism; I've made similar points about average utilitarianism, negative utilitarianism, and high/low pleasure utilitarianism. Like with average utilitarianism, in the end in some sense the problem with self-preferential utilitarianism is that it's not normal utilitarianism.

The astute observer will note that there is another divide in utilitarianism that would fit well under this title. But it deserves quite a bit more space than this, and so it will have to wait until a later day.