Comments on Measuring Shadows: Newcomb's Decision

Anonymous (2022-11-13 01:55):
You're a sick puppy. Get help.

Jeff Wu (2012-11-28 01:40):
Do you really mean that "everyone would cooperate"? I predict at least 10% of people would defect in scenario 2, but not scenario 1. I don't understand your argument, even at a high level. The less they are like you, the better it is to defect, no? I don't understand what you're saying about the discontinuity either.

But yeah, it seems like it's the same as the version of Newcomb's paradox where P uses a psychological twin to predict your move.

While I think anyone who defects in scenario 1 is being completely absurd, I find it more difficult to make an argument in favor of scenario 2 than you suggest.

I think, to me, it boils down to something like this: take actions such that you are (expected to be) in worlds where your objectives are achieved. Importantly, when updating the distribution of worlds you're in, you should condition on the fact that you took the actions you did. It's not about causality or "free will". In particular, when considering counterfactuals, your actions can update the distribution of worlds you were in, just as the actions of others can.

If you want to continue this conversation, feel free to email/gchat me: WuTheFWasThat.
I realize I could subscribe my email, but it's more annoying.

Anonymous (2012-11-26 19:10):
You're right, your scenario #2 is a much better way to phrase it, because you don't care about your twin's payoff. I think you can make stronger claims than "I would cooperate" and "it's similar to Newcomb's paradox."

Surely everyone would cooperate in scenario #2: if you wouldn't cooperate, then I could tell you that this person has behaved opposite to you in only 20% of circumstances. If you would still defect, then what if it were 1%? What about 0.001%? 0.1^10^30? There can't be a discontinuity at 100%, because if there were, you would have to defect in scenario 1, since there's a *chance* that all the air molecules in the room will gather themselves in just the right way to refract the light so that your reflection behaves differently from you (I think that chance is about 0.1^10^30, give or take a few orders of magnitude in the exponent).

Also, surely scenario #2 is the same as Newcomb's paradox: just rename "twin" to "oracle," ask him for his move yesterday, and tell him to submit his move in the form of an amount of money in box #1, and you're done.

Jeff Wu (2012-11-26 14:47):
This is interesting, but different from Newcomb's paradox, because you're not sure whether it's currently the last move. You might want to cooperate because it will help your future self.
Similar to what Anders was saying with the simulation.

However, I think there is an interesting comparison to make along these lines:

Scenario 1: You are playing Prisoner's Dilemma with yourself. Quite literally: you are looking into a mirror. Raise your right hand to cooperate, and your left to defect. If your mirror image raises his left hand, he cooperates; if he raises his right, he defects. You don't care about his payoff.

Scenario 2: You are playing Prisoner's Dilemma with someone you believe to be your psychological twin. You two appear to make the same decisions in pretty much all scenarios. You don't care about his payoff.

In Scenario 1, I believe everyone would agree to cooperate. Scenario 2 is similar to Scenario 1, and also similar to Newcomb's problem. I personally would cooperate.

Anonymous (2012-11-25 13:00):
Obviously near the end there I meant '... and $1001 for "2", and ...'

Anonymous (2012-11-25 12:47):
Sam, I'm curious what you would do in this game:

Two people, call them A and B, are going to play a prisoner's dilemma; they each write down "1" or "2."
Afterward, if they both wrote 1, they both get a thousand dollars; if they both wrote 2, they each get a dollar; and if one writes 1 and one writes 2, then mister 1 gets nothing and mister 2 gets $1,001.

I tell you that you are player B, and that player A made his move earlier; his move is written in this envelope right here.

OK, that's just a regular prisoner's dilemma; probably you should defect, as in a regular prisoner's dilemma.

But now I tell you that player A was you, yesterday. I took you into this experiment and told you everything I'm telling you now, including assuring you that player A's move was pre-written (although at the time I was lying and the envelope was empty). But then, instead of evaluating your move, I wiped your memory and sent you home, and put your move into envelope A. In fact, you have no idea whether, right now, it is really the last move of the game, or whether I'm going to wipe your memory, put your move in the envelope, and bring you back tomorrow.

What do you write?

If you want to stand by 2-boxing, you have to say one of two things:
1) You would still defect in a prisoner's dilemma where you get to write both moves (they have to be the same, but you get to write both).
2) This is different from Newcomb's paradox, despite the fact that there's an envelope on the table which I claim predicts your move with reasonable confidence, and if it says "1" then you get $1000 for "1" and $1 for "2", and if it says "2" then you get $0 for "1" and $1 for "2".

Anonymous (2012-11-25 12:27):
Sam, are you really claiming that the optimal strategy is different for an AI than for a human?

In some games, someone is screwed from the beginning.
If we play the "you win $1 per letter in your last name" game, then Sam is going to beat me no matter what.

Some games are not like that. In any game where your reward depends only on your strategy (and everyone is free to pick any strategy), that can't be the case; everyone is equal.

And this *is* such a game. Bluffing is irrelevant; once all the bluffing is done, you pick a strategy. Then, with some confidence p, the oracle puts money into the box according to your strategy (the temporal order of these things doesn't matter, assuming the oracle is good enough to see through bluffs). Then you implement your strategy. Then you get a reward.

It's like Jeff said: we're not proposing that you change what the oracle thinks without changing your strategy; we're proposing that you actually change your strategy. If you do that, you will make more money.

So you're in a situation where you're planning to play a game. You've got a strategy in mind, with expected value $1,000 + $1,000,000*(1-p). We've demonstrated that if you switch to a different strategy, you will win $1,000,000*p. If p > 0.5005, not switching is irrational.

Anonymous (2012-11-25 12:18):
[This comment has been removed by the author.]

Jeff Wu (2012-11-23 15:21):
Why should simulation be the only way to get accurate heuristics? If we're guaranteed to halt on the decision to one-box or two-box, a lot of the theory with negative results disappears, right?
I think it's quite plausible that the decision is basically cached somewhere, especially for people who have already thought about the problem. We can restrict the paradox to people whom P can predict in this manner (people who have already thought about Newcomb's problem and have the answer cached somewhere). Clearly you should have "one-box" cached in your memory, right?

I guess I think there is a separate interesting part to the problem, independent of this free-will-and-simulation stuff. I am trying to make that aspect very explicit. How would you respond to the scenario as I described it in the comments below?

I don't understand your connection to cryptography. Are you saying that circuit obfuscation is a mechanism for hiding your behavior, or something? But 1) P is not necessarily computationally bounded, and 2) my brain's code is probably not obfuscated. The existence of crypto doesn't really tell us much about the scenario, right? (I may just be misunderstanding you here.)

Anders Kaseorg (2012-11-23 15:12):
“(Note that I have to believe that P works that accurately on my brain; it’s not enough if I believe P works statistically accurately on other people’s brains.)”

And perhaps this is the key: if P can predict me 51% of the time, why can’t an iterated P predict me 99.99999999999% of the time?
And why can’t a mathematical analysis of P derive the exact probability of P’s prediction, resulting in a 100% accurate prediction?

Anonymous (2012-11-23 15:01):
That's kind of like saying that you shouldn't push a fat person in front of a trolley because you might go to jail: it's true, of course, but only because you're adding more moving pieces to the original problem.

Anonymous (2012-11-23 15:00):
So, I would agree with your weaker claim in the limit where the amount in box A over the amount in box B goes to infinity and P's accuracy goes to infinity, but in this case both evidential and causal decision theories will agree: you should choose to rewrite your brain, because doing so will both cause, and correlate with, your getting more money.

Jeff Wu (2012-11-23 14:57):
Clarifying my claim about the above scenario:

Indeed, it's not so much the one-boxing itself that is "correct". It's more the decision to be a one-boxer. But the two come hand in hand, since if you take two boxes, you've probably failed at being a one-boxer. Here is a weaker claim: if you were able to precommit right now to one-boxing should such a situation arise (e.g. by rewiring your brain to respond instinctively in that particular scenario), then you should do so. Burning bridges is sometimes useful.

Jeff Wu (2012-11-23 14:47):
Indeed, it will never happen in your lifetime, but I don't see how it is contradictory.

Firstly, it's true both that, in general, 1) you make choices, and 2) your choices are "already determined", in a cosmic sense. How do you resolve that?

Let me be concrete. Here's the scenario: P is a person who started a show that offers people Newcomb-like games. The audience can see him putting money into boxes. He predicts people's decisions with remarkable accuracy, and he appears to do this by using a cerebroscope to run a rough simulation of their decision process. That's it. Is there a contradiction here?

I in fact claim that you should one-box in this scenario, despite having no "causal" influence on P's prediction.

Lastly, how is my position incorrect in any physical setting? My reasoning doesn't preclude me from behaving identically to you in normal situations.

Maybe its real-world irrelevance is a reason not to think about it at all. I disagree with that, but if you're going to think about it, you might as well try to make the right choice.
To me, the right choice is clearly to be the one-boxer, because I want the million dollars.

Anders Kaseorg (2012-11-23 14:28):
Well, I think the interesting core of this paradox is all about exactly what it would mean for P to predict your behavior so accurately. I suspect that the *only* way to get increasingly accurate heuristics for the behavior of an intelligent agent is to perform increasingly detailed simulations, indistinguishable from reality from the agent’s perspective. Whether this implies that the simulation has “free will” and/or “consciousness” is, of course, a Hard Question, but I sure don’t have any better theories.

It’s true that the paradox “only” requires a P with accuracy greater than 0.5005, at least for a risk-neutral agent. (Note that I have to believe that P works that accurately on *my* brain; it’s not enough if I believe P works statistically accurately on other people’s brains.) But the idea of a decision that can’t be approximated even with probability 1/2 + Ω(1) is familiar to cryptographers; it would not surprise me at all to learn that this construction has that property.

Anonymous (2012-11-23 14:15):
Sounds like Pascal's Wager.
God puts heaven in the box only if He's sure you're not just pretending to believe in Him so you can get into heaven.

Anonymous (2012-11-23 14:05):
The reason I'm not actually going to be one is that this particular scenario isn't ever actually going to happen; it's a contradictory scenario in which I both have choice and my choices have already been determined. It's not worth taking positions that are incorrect in any physical setting so that you could make $1,000,000 in an inconsistent one.

Jeff Wu (2012-11-23 13:58):
(Forget about branding. Let's assume P is always correct.)

That's right. If you "become" a one-boxer, it's not the fact that you took one box that's correct; it's the fact that you became a one-boxer. In some sense. But then you've basically said it yourself: deciding to be a one-boxer is correct.
So why not just actually be one?

Anonymous (2012-11-23 13:53):
OK, so assuming he's actually very correct (correct enough to see through posturing), then I'm screwed: I'm not a one-boxer, I'm a two-boxer, and if I "become" a one-boxer I won't actually become a one-boxer: I'll just change what I'm thinking, to the extent that I can, to seem more like a one-boxer.

And if I actually "become" a one-boxer in a true sense, then it's not the fact that I'll take one box that's correct; it's the fact that I've so convincingly branded myself as a one-boxer that P puts $1,000,000 in box A.

Jeff Wu (2012-11-23 13:44):
Okay, but re-read your paragraph. It says "because he knows I'm a two-boxer". It seems like your counterargument is conditional on your being a two-boxer. It's not about posturing; it's about actually being a one-boxer.

Jeff Wu (2012-11-23 13:41):
Anders, what if you KNOW that all P does is look at a scan of your brain, but instead of simulating it, he does some heuristic checks that work for 99.99999999999% of human brains? Do you then one-box?
Though I agree with your argument, I disagree if you two-box in that scenario.

Jeff Wu (2012-11-23 13:39):
[This comment has been removed by the author.]

Anonymous (2012-11-23 13:30):
Yeah, if I want to get the $1,000,000 I should have been a one-box-taking decision theorist. But once P asks me how many boxes I'll take, it's too late: he's already left box A empty, because he knows I'm a two-boxer. If I want the $1,000,000 I have to start posturing way before I actually take the boxes, and the posturing is going to be a lie, because even if I do try to posture like that, I'll still take both boxes in the end.

Really, the only way for me to get the $1,000,000 is to convince myself to be a one-boxer so thoroughly that it fools P into thinking I am one: a price not worth paying, because it would mean being intellectually wrong just in case some scenario that will never come up in real life comes up.

Anonymous (2012-11-23 13:27):
So, I totally agree that if you're participating in this experiment, the correct program is `return 1`. The thing is, though, it's not the fact that the program will only choose one box that's relevant; it's that the impartial observer looks at the program and decides that it will only take one box.
As it turns out, these programs are going to be simple enough that those are the same.

If we're offered this in real life, I'm going to take two boxes and end up with $1,000, and you're going to take one box and end up with $1,000,000. But if for some reason I decided to take only one box, I'd end up with nothing; and if you decided to take two boxes, you'd end up with $1,001,000. What's screwing me over is not the fact that I'm taking two boxes; it's the fact that I am who I am. I've been screwed since I first heard about the scenario and decided what I'd do. In analogy to the AI scenario, I am the AI; if I want to get the $1,000,000, the trick isn't to take one box, it's to convince P that I'm going to take one box, and in order to do that you'd need a different AI than me.

Jeff Wu (2012-11-23 13:22):
The real choice is not whether to one-box or two-box. The choice is, "Which decision theory should I use?" Without explicitly introducing this choice, I get confused reasoning about the relationship between P's decision and your boxing decision.

So you have the option of choosing a decision theory which precommits to taking only one box. And this decision causes you to end up with more money. So shouldn't you choose a decision theory like that?

You say, "There's nothing I can do about [being screwed]". But now it's clear that there is something you can do: choose to be the type of person who one-boxes. Of course, once you decide it makes no sense, you're indeed screwed. But surely it was you who screwed yourself over, then?

Also, it's fine for there to be noise between the arrows.
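The expected-value arithmetic that recurs in the thread (two-boxing is worth $1,000 + $1,000,000*(1-p), one-boxing is worth $1,000,000*p, with a crossover at p > 0.5005) can be checked directly. Here is a minimal sketch, using the commenters' own payoff figures; the function names are illustrative, not from the original post:

```python
# Expected values in Newcomb's problem as a function of the oracle's
# accuracy p, with the thread's payoffs: $1,000 in the visible box,
# $1,000,000 in the opaque box.

def one_box_ev(p: float) -> float:
    # With probability p the oracle correctly predicted one-boxing
    # and filled the opaque box.
    return 1_000_000 * p

def two_box_ev(p: float) -> float:
    # You always keep the $1,000; the $1,000,000 is there only if the
    # oracle wrongly (probability 1 - p) predicted one-boxing.
    return 1_000 + 1_000_000 * (1 - p)

# One-boxing wins exactly when 1,000,000*p > 1,000 + 1,000,000*(1-p),
# i.e. 2,000,000*p > 1,001,000, i.e. p > 0.5005 -- the threshold Anders cites.
threshold = 1_001_000 / 2_000_000
print(threshold)                            # 0.5005
print(one_box_ev(0.5) < two_box_ev(0.5))    # True: below threshold, two-box
print(one_box_ev(0.51) > two_box_ev(0.51))  # True: above threshold, one-box
```

At exactly p = 0.5005 the two strategies tie at $500,500, which is why the paradox "only" requires accuracy strictly greater than that for a risk-neutral agent.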