Friday, November 23, 2012

Newcomb's Decision

This post is partially a continuation of my previous posts on utilitarianism, and partially on philosophy in general; mostly, it's my two cents on one of the odder parts of consequentialist debate: decision theories.

Newcomb's Paradox

You, a mere mortal, encounter P, some super smart alien.  Or maybe it's a supercomputer, or maybe a god; versions of the paradox differ on this.  P comes up to you and says: "I have a deal for you.  I'm going to give you two boxes--box A, and box B.  Box B is transparent, and you can see $1,000 in it.  You can't see what's in box A.  I'm going to give you two choices.  The first is to take box A--you get whatever is in it.  The other choice is to take both boxes--you get box A, plus the $1,000 from box B."

So, you ask, why don't you take both boxes, getting the free $1,000?  Well, says P, there's a catch: "I have predicted whether you will take one box or two boxes."  (Or maybe I've simulated all of the atoms in the universe, or maybe studied your psychology, or maybe something else--versions of the paradox differ in how P knows how many boxes you're going to take.  But however he knows it, you believe him; maybe he has, in the past, successfully predicted the choice of everyone who's taken this challenge.)  "So I know what you're going to do", says P, "and before you arrived I decided how much money to put in box A.  If I predicted that you were going to take only box A, I put $1,000,000 in it.  Otherwise--if I predicted that you were going to take both boxes--I left box A empty."

"So", says P, "How many boxes do you want to take?"


So, how many boxes should you take?  Well, this "paradox", and ones like it, have spawned countless arguments over "decision theories".  I would try to define them but I think it's easier to see the two main decision theories by example of how many boxes they take.  The first type of decision theory, evidential decision theory, says: well, what behavior is consistent with the highest expected value for me?  If I take one box (box A), I will get the $1,000,000 from it; if I take two, P won't put any money in box A, and so I'll just get the $1,000 from box B.  So, the evidential decision theorist would only take box A.

Causal decision theorists, on the other hand, say: what actions will cause the best results?  So, a causal decision theorist would say: if I take two boxes, then that'll cause me to get the $1,000 in box B, whereas if I only take one box I won't, and since P has already decided whether to put the money in box A, my decisions can't cause the money to exist or not exist.  And so the causal decision theorist would take both boxes.
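
The two calculations can be sketched in a few lines of Python.  This is only an illustration under assumptions that are not part of the problem statement: P's reliability is modelled as a single number `accuracy`, the causal theorist's belief about box A is a fixed probability `prob_full` that does not depend on the choice being made now, and all function names are made up for the example.

```python
BOX_A_PRIZE = 1_000_000  # what P puts in box A if he predicted one box
BOX_B_PRIZE = 1_000      # always visible in box B


def edt_value(boxes_taken: int, accuracy: float) -> float:
    """Expected payoff, treating the choice as evidence about P's prediction."""
    # If P is right with probability `accuracy`, box A is full with that
    # probability for a one-boxer and with 1 - accuracy for a two-boxer.
    prob_full = accuracy if boxes_taken == 1 else 1.0 - accuracy
    bonus = BOX_B_PRIZE if boxes_taken == 2 else 0
    return prob_full * BOX_A_PRIZE + bonus


def cdt_value(boxes_taken: int, prob_full: float) -> float:
    """Expected payoff, holding the contents of box A fixed."""
    bonus = BOX_B_PRIZE if boxes_taken == 2 else 0
    return prob_full * BOX_A_PRIZE + bonus


def best_choice(value, *args) -> int:
    """Return 1 or 2, whichever number of boxes scores higher."""
    return max((1, 2), key=lambda boxes: value(boxes, *args))


print(best_choice(edt_value, 0.99))  # evidential reasoning, 99% accurate P -> 1
print(best_choice(cdt_value, 0.5))   # causal reasoning, 50/50 belief about box A -> 2
```

Under these assumptions the causal calculation picks two boxes for every value of `prob_full`, since the second box always adds $1,000, while the evidential calculation picks one box whenever `accuracy` is high enough.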

So, who's right?  Well, the one-box-taking evidential decision theorist will take only box A, and--per the assumptions of the problem--find $1,000,000 in it.  The two-box-taking causal decision theorist, on the other hand, will take both box A and box B, confident that their actions can't change what's in box A--and then will find box A to be empty, and end up with $1,000.

So is evidential decision theory (ED) correct?  Did the causal decision theorist (CD) throw away $999,000 because they insisted on taking the $1,000 from box B?

Well, let's back up a second.  Why was there $1,000,000 in box A when the ED opened the box, but not when the CD did?  What exactly do we mean when we say that P knows how many boxes you will pick?

One of two things is true.  Either we live in a universe where, prior to you making your choice, P knows how many boxes you'll take, or we don't.

Say P doesn't know with certainty.  He's pretty sure--he's studied you a lot and studied psychology a lot and is pretty damn sure he knows how many boxes you'll take--but theoretically he could be wrong.  Well, in this case the two-box-taking CD's mistake was in how he lived his life up until that point.  He made lots of decisions that made P think he would take two boxes, and so P didn't put anything in box A for him.  If he really wanted that $1,000,000 he should have written lots of blog posts during his life about how he would take only one box so that when the day came he could convince P to put the $1,000,000 in box A--and then taken both boxes, to get the full $1,001,000.  But he didn't, and now that he's sitting there with both boxes in front of him he might as well take both of them--P has already decided that there isn't going to be any money in the first box.

But what if P knows for sure?  What if your blog posts can't fool him?  What if he's simulated every atom in the universe and knows whether you'll take one box or two?  Then shouldn't you choose to be an evidential decision theorist, and choose to take only box A, so that you can get the $1,000,000?

Well, the trick is in the word choose.  If P knows for sure how many boxes you'll take then it's already been decided--and it doesn't mean anything to talk about how many boxes you should choose to take.  It's already been decided how many boxes you're taking.

My point, I guess, is that evidential decision theory only makes sense in a universe where (a) P is sure how many boxes you'll take, but (b) you still have the option to take either one or both.  But these are contradictory assumptions--the contradictory assumptions behind Newcomb's paradox.

In fact, causal decision theory is the same thing as evidential decision theory in non-contradictory universes.  There is no distinction between actions that happen if and only if you make some decision and actions caused by that decision, except in inconsistent universes where something can be dependent on a decision but somehow not causally related to it.

So what would happen if P offered this deal to me?  Well, I'd talk a lot about how much I only intend to take one box, but P wouldn't buy it; he'd leave box A empty, and I'd take both boxes for a total of $1,000.

But if my goal is to get more than $1,000 from this process I've been screwed for a while.  I've been screwed since I first thought through this problem and realized that it made no sense to take only one box.  I've been screwed since the minute I was born and P realized I was going to be a two-boxer.  There's nothing I can do about it.

Though writing this blog post certainly won't help.


  1. If the reason P knows how many boxes you’d take is that he simulated every atom in the universe, then between the time he made the offer and the time you decided, how do you know that you are not part of the simulation?

    That is why I’m a one-boxer. (Universe simulators take note.)

    1. Oh man, I never thought of that before.

      (Though if they're actually simulating their simulation is probably deterministic, or at least random but not free-will-including, so it probably doesn't make sense to talk about making a decision in such a case.)

      Also, I guess I should note that if it's going to be an iterated Newcomb's paradox then obviously you take only one box for all but the last iteration.

    2. Anders, what if you KNOW that all P does is look at a scan of your brain, but instead of simulating it, does some heuristic checks that work for 99.99999999999% of human brains? Do you then one-box?

      Though I agree with your argument, I disagree if you two-box in that scenario.

    3. Well, I think the interesting core of this paradox is all about exactly what it would mean for P to predict your behavior so accurately. I suspect that the only way to get increasingly accurate heuristics for the behavior of an intelligent agent is to perform increasingly detailed simulations, indistinguishable from reality from the agent’s perspective. Whether this implies that the simulation has “free will” and/or “consciousness” is, of course, a Hard Question, but I sure don’t have any better theories.

      It’s true that the paradox “only” requires a P with accuracy greater than 0.5005, at least for a risk-neutral agent. (Note that I have to believe that P works that accurately on my brain; it’s not enough if I believe P works statistically accurately on other people’s brains.) But the idea of a decision that can’t be approximated even with probability 1/2 + Ω(1) is familiar to cryptographers—it would not surprise me at all to learn that this construction has that property.

    4. “(Note that I have to believe that P works that accurately on my brain; it’s not enough if I believe P works statistically accurately on other people’s brains.)”

      And perhaps this is the key: if P can predict me 51% of the time, why can’t an iterated P predict me 99.99999999999% of the time? And why can’t a mathematical analysis of P derive the exact probability of P’s prediction, resulting in a 100% accurate prediction?

    5. Why should simulation be the only way to get accurate heuristics? If we're guaranteed to halt on the decision to one-box or two-box, a lot of the theory with negative results disappears, right? I think it's quite plausible that the decision is basically cached somewhere, especially for people who have already thought about the problem. We can restrict the paradox to people who P can predict in this manner (people who have already thought about Newcomb's problem and have the answer cached somewhere). Clearly you should have "one-box" cached in your memory, right?

      I guess I think there is a separate interesting part to the problem, independent of this free will and simulation stuff. I am trying to make this aspect very explicit. How would you respond to the scenario as I described in the comments below?

      I don't understand your connection to cryptography. Are you saying that circuit obfuscation is a mechanism for hiding your behavior, or something? But 1) P is not necessarily computationally bounded, and 2) my brain's code is probably not obfuscated. The existence of crypto doesn't really tell much about the scenario right? (I think I may just be misunderstanding you, here)

  2. I think I disagree with both your claimed requirements for one-box to make sense. (Note that this does not mean I disagree with your arguments why two-box makes sense :P). I could elaborate, but that's not the point.

    I'm a physicist. Theory is good, but the real way to learn the truth is with experiments. Sam, let's do an experiment. We'll each write an AI for this game (with no randomness; if you allow coin flipping, then the game gets even more interesting :P). An impartial observer will read the AIs, perhaps run some diagnostic experiments, and then put money in the boxes and let them play. If you want the oracle to be less than perfect, we can instead do this: don't show him the code, but rather show him, with 1% probability, the result of a single run of the code, and with 99% probability a single random (50-50) result. Then the oracle has got a 50.5% chance; good enough for me.

    We'll run a million trials, and see who has a higher expected value. No matter what theoretical discussions we have, we must believe that whichever strategy makes more money on average is a better strategy.


    PS: Here's my AI:

    def newcomb(*whatever_inputs):
        # Always take one box, whatever the inputs are.
        return 1

    1. So, I totally agree that if you're participating in this experiment the correct program is return 1. The thing is, though, it's not the fact that the program will only choose one box that's relevant; it's that the impartial observer looks at the program and decides that it will only take one box. As it turns out these programs are going to be simple enough that those are the same.

      If we're offered this in real life I'm going to take two boxes and end up with $1,000, and you're going to take one box and end up with $1,000,000. But if for some reason I decided to take only one box I'd end up with nothing, and if you decided to take two boxes you'd end up with $1,001,000. What's screwing me over is not the fact that I'm taking two boxes; it's the fact that I am who I am. I've been screwed since I first heard about the scenario and decided what I'd do. In analogy to the AI scenario, I am the AI; if I want to get the $1,000,000 the trick isn't to take one box--it's to convince P that I'm going to take one box, and in order to do that you'd need to have a different AI than me.

    2. Sam, are you really claiming that the optimal strategy is different for an AI than for a human?

      In some games, someone is screwed from the beginning. If we play the "you win $1 per letter in your last name" game, then Sam is going to beat me no matter what.

      Some games are not like that. In any game where your reward depends only on your strategy (and everyone is free to pick any strategy), that can't be the case; everyone is equal.

      And this *is* such a game. Bluffing is irrelevant; once all the bluffing is done, you pick a strategy. Then, with some confidence p, the oracle puts money into the box according to your strategy (the temporal order of the things doesn't matter, assuming the oracle is good enough to see through bluffs). Then you implement your strategy. Then you get a reward.

      It's like Jeff said - we're not proposing that you change what the oracle thinks without changing your strategy; we're proposing that you actually change your strategy. If you do that, you will make more money.

      So you're in a situation where you're planning to play a game. You've got a strategy in mind, which has an expected value of $1,000 + $1,000,000*(1-p). We've demonstrated that if you switch to a different strategy, you will win $1,000,000*p in expectation. If p>0.5005, not switching is irrational.
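
The expected values quoted above, and the noisy-oracle experiment proposed earlier in this thread, can be checked with a short Python sketch. It rests on assumptions the thread does not spell out: the oracle simply predicts whatever result it is shown, each player is a fixed strategy (always one box or always two), and the function names, seeds, and trial count are made up for illustration.

```python
import random

BOX_A_PRIZE = 1_000_000
BOX_B_PRIZE = 1_000


def expected_payoff(boxes_taken: int, p: float) -> float:
    """Expected winnings when the oracle is right with probability p."""
    if boxes_taken == 1:
        return BOX_A_PRIZE * p
    return BOX_B_PRIZE + BOX_A_PRIZE * (1 - p)


def break_even_accuracy() -> float:
    """Accuracy at which one-boxing and two-boxing pay the same."""
    # Solve A*p = B + A*(1 - p) for p.
    return (BOX_A_PRIZE + BOX_B_PRIZE) / (2 * BOX_A_PRIZE)


def noisy_oracle_prediction(true_choice: int, rng: random.Random) -> int:
    """With 1% probability see the real choice, otherwise a 50-50 result."""
    if rng.random() < 0.01:
        return true_choice
    return rng.choice((1, 2))


def average_winnings(strategy: int, trials: int, seed: int) -> float:
    """Play a fixed strategy against the noisy oracle many times."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        predicted = noisy_oracle_prediction(strategy, rng)
        box_a = BOX_A_PRIZE if predicted == 1 else 0
        total += box_a + (BOX_B_PRIZE if strategy == 2 else 0)
    return total / trials


print(break_even_accuracy())       # 0.5005
print(expected_payoff(1, 0.505))   # about 505,000
print(expected_payoff(2, 0.505))   # about 496,000
print(average_winnings(1, 200_000, seed=0))
print(average_winnings(2, 200_000, seed=1))
```

With the 50.5% oracle the gap between the two strategies is only about $9,000 per game, which is why a large number of trials is needed before the simulated averages separate reliably.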

  3. The real choice is not whether to one-box or two-box. The choice is, "Which decision theory should I use?". Without explicitly introducing this choice, I get confused reasoning about the relationship between P's decision and your boxing decision.

    So you have the option of choosing a decision theory which precommits to taking only one box. And this decision causes you to end up with more money. So shouldn't you choose a decision theory like that?

    You say "There's nothing I can do about [being screwed]". But now it's clear that there is something you can do: choose to be the type of person to one-box. Of course, once you decide it makes no sense, you're indeed screwed. But surely it was you that screwed yourself over, then?

    Also, it's fine for there to be noise between the arrows.

    1. Yeah, if I want to get the $1,000,000 I should have been a one-box-taking-decision-theorist. But once P asks me how many boxes I should take it's too late--he's already left box A empty because he knows I'm a two-boxer. If I want the $1,000,000 I have to start posturing way before I actually take the boxes--and the posturing is going to be a lie, because even if I do try to posture like that I'll still take both boxes in the end.

      Really the only way for me to get the $1,000,000 is to convince myself to be a one-boxer so much that it fools P into thinking I am one--a price not worth paying because it'll mean I'm intellectually wrong just in case some scenario that'll never come up in real life comes up.

    2. Okay, but re-read your paragraph. It says "because he knows I'm a two-boxer". It seems like your counterargument is conditional on your being a two-boxer. It's not about posturing; it's about actually being a one-boxer.

    3. Ok, so assuming he's actually very correct (correct enough to see through posturing), then I'm screwed: I'm not a one-boxer, I'm a two-boxer, and if I "become" a one-boxer I won't actually become a one-boxer: I'll just change what I'm thinking to the extent that I can to seem more like a one-boxer.

      And if I actually "become" a one-boxer in a true sense then it's not the fact that I'll take one box that's correct--it's the fact that I've so convincingly branded myself as a one-boxer that P puts $1,000,000 in box A.

    4. (Forget about branding. Let's assume P is always correct.)

      That's right. If you "become" a one-boxer, it's not the fact that you took one box that's correct, it's the fact that you became a one-boxer. In some sense. But then you've basically said it yourself - deciding to be a one-boxer is correct. So why not just actually be one?

    5. The reason I'm not actually going to be one is because this particular scenario isn't ever actually going to happen--it's a contradictory scenario where I both have choice and my choices have already been determined. It's not worth taking positions which are incorrect in any physical setting so that you could make $1,000,000 in an inconsistent one.

    6. Sounds like Pascal's Wager. God puts heaven in the box only if He's sure you're not just pretending to believe in Him so you can get into heaven.

    7. Indeed, it will never happen in your lifetime, but I don't see how it is contradictory.

      Firstly, it's true both that in general, 1) you make choices, and 2) your choices are "already determined", in a cosmic sense. How do you resolve that?

      Let me be concrete. Here's the scenario: P is a person, who started a show which offers people Newcomb-like games. The audience can see him putting money into the boxes. He predicts people's decisions with remarkable accuracy, and he appears to do this using a cerebroscope to do a rough simulation of their decision process. That's it. Is there a contradiction here?

      I in fact claim that you should one-box in this scenario, despite having no "causal" influence on P's prediction.

      Lastly, how is my position incorrect in any physical setting? My reasoning doesn't preclude me from behaving identically to you in normal situations.

      Maybe its real-world irrelevance is reason to not think about it at all. I disagree with that, but if you're going to think about it, you might as well try to make the right choice. To me, the right choice is clearly to be the one-boxer, because I want the million dollars.

    8. Clarifying my claim about the above scenario:

      Indeed, it's not so much the one-boxing itself that is "correct". It's more the decision to be a one-boxer. But the two come hand in hand, since if you take two boxes, you've probably failed at being a one-boxer. Here is a weaker claim: If you were able to precommit right now to one-boxing would such a situation arise (e.g. by rewiring your brain to respond instinctively in that particular scenario) then you should do so. Burning bridges is sometimes useful.

    9. So, I would agree with your weaker claim in the limit where the amount in box A over the amount in box B goes to infinity and P's accuracy goes to 100%, but in this case both evidential and causal decision theories will agree: you should choose to rewire your brain because it will cause, and correlate with, you getting more money.

    10. That's kind of like saying that you shouldn't push a fat person in front of a trolley because you might go to jail: it's true, of course, but only because you're adding more moving pieces to the original problem.

  4. This comment has been removed by the author.

  5. This comment has been removed by the author.

  6. Sam, I'm curious what you would do in this game:

    2 people, call them A and B, are going to play a prisoner's dilemma; they each write down "1" or "2." Afterward, if they both wrote 1, they both get a thousand dollars; if they both wrote 2, they each get a dollar; and if one wrote 1 and one wrote 2, then mister 1 gets nothing and mister 2 gets $1,001.

    I tell you that you are player B, and that player A's made his move earlier; his move is written in this envelope right here.

    ok, that's just a regular prisoner's dilemma; probably you should defect like in a regular prisoner's dilemma.

    But now I tell you that player A was you yesterday. I took you into this experiment and told you everything I'm telling you now, including assuring you that player A's move was pre-written (although at the time I was lying and the envelope was empty). But then instead of evaluating your move, I wiped your memory and sent you home, and put your move into envelope A. In fact, you actually have no idea if, right now, it is really the last move of the game, or if I'm going to wipe your memory, put your move in the envelope, and bring you back tomorrow.

    What do you write?

    If you want to stand by 2-boxing, you have to say one of two things:
    1) You would still defect in a prisoner's dilemma where you get to write both moves (they have to be the same, but you get to write both).
    2) This is different from Newcomb's paradox, despite the fact that there's an envelope on the table which I claim predicts your move with reasonable confidence, and if it says "1" then you get $1000 for "1" and $1 for "2", and if it says "2" then you get $0 for "1" and $1 for "2".

    1. Obviously near the end there I meant '... and $1001 for "2", and ...'

    2. This is interesting, but different from Newcomb's paradox, because you're not sure whether it's currently the last move. You might want to cooperate because it will help your future self. Similar to what Anders was saying with the simulation.

      However, I think there is an interesting comparison to make along these lines:

      Scenario 1: You are playing Prisoner's Dilemma with yourself. Quite literally, you are looking into a mirror. Raise your right hand to cooperate, and left to defect. If your mirror image raises his left hand, he cooperates; if he raises his right, he defects. You don't care about his payoff.

      Scenario 2: You are playing Prisoner's Dilemma with someone you believe to be your psychological twin. You two appear to make the same decisions in pretty much all scenarios. You don't care about his payoff.

      In Scenario 1, I believe everyone would agree to cooperate. Scenario 2 is similar to scenario 1, and also similar to Newcomb's problem. I personally would cooperate.

  7. You're right, your scenario #2 is a much better way to phrase it, because you don't care about your twin's payoff. I think you can make stronger claims than "I would cooperate" and "it's similar to Newcomb's paradox."

    Surely everyone would cooperate in scenario #2: if you wouldn't cooperate, then I could tell you that this person has behaved opposite you in only 20% of circumstances. If you would still defect, then what if it were 1%? What about 0.001%? 0.1^10^30? There can't be a discontinuity at 100%, because if there were you would have to defect in scenario 1 because there's a *chance* that all the air molecules in the room will gather themselves in just the right way to refract the light so that your reflection behaves differently from you (I think it's about 0.1^10^30, plus or minus a few orders of magnitude in the order of magnitude.)

    Also surely scenario #2 is the same as Newcomb's paradox; just rename "twin" to "oracle," ask him for his move yesterday, and tell him to submit his move in the form of an amount of money in box #1, and you're done.
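
The continuity argument can be made concrete with a small Python sketch. It assumes the payoffs from the envelope game above ($1,000 each for two 1s, $1 each for two 2s, $0 and $1,001 for a split) and models the twin as matching your move with a fixed probability `match`, which is an evidential way of scoring the choice. Both the function names and that modelling choice are illustrative assumptions, not something the comments above specify.

```python
def cooperate_value(match: float) -> float:
    """Expected payoff for writing 1 when the twin copies you with probability `match`."""
    return match * 1_000 + (1 - match) * 0


def defect_value(match: float) -> float:
    """Expected payoff for writing 2 under the same assumption."""
    return match * 1 + (1 - match) * 1_001


for match in (0.5, 0.8, 0.99, 1.0):
    better = "cooperate" if cooperate_value(match) > defect_value(match) else "defect"
    print(match, cooperate_value(match), defect_value(match), better)
```

Under these assumptions cooperating scores higher whenever `match` exceeds 0.5005, the same break-even point as in Newcomb's problem with $1,000,000 and $1,000, so nothing special happens at exactly 100%. Someone who scores choices causally would not accept this calculation, which is the disagreement the thread is about.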

  8. Do you really mean that "everyone would cooperate"? I predict at least 10% of people would defect in scenario 2, but not scenario 1. I don't understand your argument, even at a high level. The less they are like you, the better it is to defect, no? I don't understand what you're saying with the discontinuity either.

    But yeah, it seems like it's the same as the version of Newcomb's paradox where P uses a psychological twin to predict your move.

    While I think anyone who defects in scenario 1 is completely absurd, I find it is more difficult to make an argument in favor of cooperating in scenario 2 than you suggest.

    I think to me, it boils down to something like this: Take actions such that you are (expected to be) in worlds where your objectives are achieved. Importantly, when updating the distribution of worlds you're in, you should condition on the fact that you took the actions you did. It's not about causality or "free will". In particular, when considering counterfactuals, your actions can update the distribution of worlds you were in, just like the actions of others can.

    If you want to continue this conversation, feel free to email/gchat me: WuTheFWasThat. I realize I could subscribe my email, but it's more annoying.