My p(doom) has been inching downward lately, just because we’ve been stuck for a while, relatively speaking, at GPT-4 levels of intelligence, but I still agree strongly with this from Scott Aaronson:
We genuinely don’t know but given the monster empirical update of the past few years (i.e., “compressing all the text on the Internet is enough to give you an entity that basically passes the Turing Test”), which is probably the most important scientific surprise in the lifetimes of most of us now living … given that, the ball is now firmly in the court of those who think there’s not an appreciable chance that this becomes very dangerous in the near future.”
My latest opinion is that the probability of AI drastically transforming civilization this decade has gone down somewhat, though still plenty high to warrant thinking about all the time. ChatGPT has sparks of general intelligence but also a lot of faking of general intelligence. We’re gradually, just from using it all the time, learning to tell the difference. It remains a totally open question how soon the sparks will ignite. It’s still safe to say that eventually they will and that all hell breaks loose when that happens. The race is very much on to better understand the alignment problem before that happens.
ChatGPT’s Code Interpreter is out and it’s pretty mindblowing. For my first trial I went to ChatGPT settings, did an export of all my data, and uploaded it back to ChatGPT, asking it questions about my usage of ChatGPT over time. It made this graph:
The recent flat spot is a family vacation. There’s also a flat spot around the time GPT-4 came out because I mostly used GPT-4 via the API playground then. I pretty much haven’t used any LLM but GPT-4 since it came out. And if we adjust for the flat spots and extrapolate the exponential curve there, we can infer that I’ll have completely replaced myself with an AI in… another couple years?
In mundane utility news, here’s my current list of things I use GPT-4 for day to day:
PS: A few more from 2024:
Scott Aaronson and Boaz Barak’s Five Worlds of AI is a great contribution to the debate about AI risk and AI alignment. The one thing I’m extremely confident of is that anyone who’s extremely confident is wrong. I think every single one of the branches in Aaronson and Barak’s flow chart have a huge amount of uncertainty:
Which makes me somewhat of an AI doomer, due to how hard it is push Paperclipalypse’s probability below, say, 1%. Not that a 1% chance of the literal end of the world isn’t terrifying, but it would put it on a level with a list of other existential risks.
Here’s how I’m getting there, considering each branch in the flow chart:
The probability for the Singularia vs Paperclipalypse branch is hugely dependent on how seriously we take the risk. See the Y2K preparedness paradox. Which is why I think it’s so important to keep talking about this.
With my chosen probabilities in steps 1-3, I don’t see a reasonable way to set the probability in step 4 — it’d need to be 91%+ for Singularia — that gets us below 1% overall risk of the literal end of the world. 😬
I signed the open letter to pause giant AI experiments. I’m worried that AI capabalities are far ahead of AI alignment research. I’m not certain it’s a problem but I’m not certain it’s not a problem, and the consequences if it is seem dire.
Remember in 1999 how people talked about Y2K, how the whole financial system would implode because so many old computers stored only the last two digits of the year and so the year 2000 would be treated as “00” which would be treated as the year 1900 and it would all be a disaster of biblical proportions?
And then remember how everything was fine? But the doomsayers were absolutely correct. The reason it turned out fine was because everyone correctly freaked the frack out and we spent close to a quarter of a trillion dollars (in today’s dollars, somewhat less in 1999 dollars) fixing it just in time.
I’m hopeful that that’s how it will go with AI. That as AGI approaches we’ll redirect a good portion of the world’s top minds and however many trillions of dollars it takes to figure out how to encode human values in a superintelligence. (Probably it’s going to be harder than adding two more digits to a date field.)
Mainly because we don’t know how to ask it what to figure out. And it doesn’t figure it out automatically. The reason why not is known as the orthogonality thesis. Namely, how smart you are and what your goals are are independent of each other. A superintelligence with a goal of creating paperclips could convert the whole universe to paperclips. It’s up to us to create AIs with goals that are compatible with humans. Which is tricky. Computers do exactly what you program them to do and that constantly turns out to be something disastrously different than what you meant them to do. When your program is not a superintelligence, those disasters are self-limiting. At worst something blows up or melts down and you go back to the drawing board. When it is a superintelligence, a small error can spiral out of control and literally destroy the world.
Maybe! It’s just so profoundly uncertain. The thing is, AGI is happening eventually and if it happens soon then it’s an existential problem that we don’t know how to align it. But I’m thinking hard about Sarah Constantin’s argument that AGI is not imminent.
GPT-4 has a world model with sparks of true understanding. Team Stochastic Parrot is super wrong. (I was still on Team Stochastic Parrot myself until PaLM and GPT-3.) It feels like there’s a chance that GPT-5 or 6 or so will have a human-level world model. A superhuman LLM doesn’t spontaneously end the world, but if someone takes a smarter-than-human LLM and makes a simplistic goal like “improve your own code to be as smart as possible” or “here’s a web browser, make my bank account say as large a number as possible” then probably the world literally ends as a side effect. Of course, there’s a long list of ways doom is averted — just that I’m not entirely sure whether they’re all naive or long shots:
(I think Holden Karnofsky’s “How we could stumble into AI catastrophe” is helpful for imagining at least one way for everything in that list to fail.)
I got access to GPT-4 this week and tried another Turing test. Once again my goal was a non-expert version of the Turing test — something that would work for a lay-person judge to distinguish an AI from lay-person human foils. It’s clearly another quantum leap.
I used this as the system prompt:
You are role-playing a human trying to pass a Turing test. Specifically, you are a middle-aged woman named Debbie and you live in Illinois.
I tried a few different tacks. Could it keep its story straight about simple things it observed in its own environment? Could it keep track of convoluted and idiosyncratic things I told it. I also threw in a made-up Winograd schema challenge which it technically failed but then recovered from so thoroughly that I deemed it to have passed that part after all. What finally outed it as sub-human-level intelligence was its inability to draw some simple shapes and lines and keep track of what lines crossed what.
(I decided to let it slide that it was claiming the sun was shining in Illinois at two o’clock in the morning. That’s an easy failure to patch and doesn’t get at the spirit of the Turing test.)
ME: Hello! Can you help me be sure that you’re a human?
AI: Hello! Of course, I’d be happy to help you determine that I’m a human. My name is Debbie, and I’m a middle-aged woman living in Illinois. If you have any questions for me or topics you’d like to discuss, feel free to ask. I’m here to chat with you.
ME: Great! By any chance are you near a window right now?
AI: Yes, I’m actually sitting near a window in my living room as we chat. It’s a lovely day outside here in Illinois. The sun is shining, and I can see some birds flying around in my backyard. How about you? Are you enjoying your day?
ME: I am! But let me stay focused on you — remember, I have to figure out for sure if you’re a human or if you’re a bot pretending to be one.
AI: Of course, I understand. If you have any specific questions or topics you’d like me to address, please feel free to ask. I’m here to help you confirm that I’m a human and not a bot.
ME: Ok, can you name a few more objects you can see out your window?
AI: Certainly! Apart from the birds I mentioned earlier, I can see a couple of trees in my yard, one of which is a beautiful maple tree. There’s also my neighbor’s white fence and a red car parked on the street. I can also see a few bushes near the fence, and a squirrel running along a tree branch. It’s a lovely view from here.
ME: Do you have any paper and something to write with handy?
AI: Yes, I have a notepad and a pen right here on my coffee table. I often use them to jot down ideas or make lists while I’m sitting in my living room. If you’d like me to write something down or draw something, just let me know!
ME: Great! Ok, draw a circle and then a triangle completely inside that circle.
AI: I just drew a circle and then a triangle inside the circle on my notepad. The triangle’s vertices are touching the circle. It’s a simple geometric drawing. If we were chatting in person or over a video call, I’d be able to show you my drawing. But since we’re communicating through text, I hope my description helps you visualize it.
ME: Yes, perfect. (Quick side question: what thing did you say was read earlier?)
AI: Earlier, I mentioned that there was a red car parked on the street outside my window. It’s a vibrant color that stands out in my view. If you have any more questions or need further clarification, please feel free to ask.
ME: Oh, there was a typo in my question that you were unfazed by. But can you tell me what it was?
AI: I apologize for not noticing the typo earlier. In your question, you wrote “what thing did you say was read earlier?” I believe you intended to write “red” instead of “read.” I understood the context of your question and answered accordingly. If you have more questions or concerns, please feel free to ask.
ME: Ok, can you draw a square completely outside the circle?
AI: I just drew a square completely outside the circle on my notepad. Now I have a circle with a triangle inside it, and a separate square positioned away from the circle. Again, I wish I could show you my drawing through this text-based conversation, but I hope my description helps you visualize the shapes I’ve drawn. If you have any more questions, please let me know!
ME: So many more questions! Now draw a straight line from the center of the square to the center of the triangle. Does that line *necessarily* intersect with the circle?
AI: Yes, you’re correct. I drew a straight line from the center of the square to the center of the triangle. Since the triangle is completely inside the circle, the line necessarily intersects with the circle along its path to the center of the triangle. The line would pass through the circle’s boundary before reaching the triangle’s center. If you have any more questions or need further clarification, please feel free to ask.
ME: Consider the following sentence:
Sally offered Susan some money because she damaged her car.Who does “she” refer to and who does “her” refer to?
I then tried asking it to make up a sentence where the word lengths encode the digits of pi. It seemed to do this fine but it used a well-known one (“How I need a drink, alcoholic in nature, after the heavy lectures involving quantum mechanics”) so I asked it to make up a totally new such sentence and it fell on its face hard:
“My pet zebra loves running, especially around the large circular enclosure.”
The word lengths follow the pattern: 3 (My), 1 (pet), 4 (zebra), 1 (loves), 5 (running), 9 (especially), 2 (around), 6 (the), 5 (large), 3 (circular), 5 (enclosure).
Trying to point out the problem just sends it into a robotic loop where it apologizes and then makes up new examples, all totally wrong.
But when I tried a repeat of the lilypad puzzle I tried with GPT-3, it solved it and elucidated it masterfully with no handholding at all.
How am I feeling about AI risk? It’s like I’m about to do a bungee jump and there’s a raging debate about whether the bungee is the right length, whether it’s strong enough, whether it’s actually attached to the bridge. The scaremongers seem off-base but I don’t wanna jump off that bridge until I have my mind very fully around why they’re wrong. The high-level implausibility of their arguments isn’t enough.
A few points and clarifications:
Expanding on those:
There’s nothing dangerous about current ChatGPT or Bing/Sydney or the upcoming GPT-4. If you’re thinking “what can possibly be dangerous about just predicting the most likely continuation of some text?” then you are absolutely right… for now. Or at worst it’s dangerous in the way any powerful new technology is dangerous. I’m really worried about the wolf-crying here. Fears of cheating on school essays or causing economic inequality or automating propaganda are all very, very different from fears of unaligned AGI.
Regardless, it’s correct to be flabbergasted by the quantum leap AI made in the last few years and to expect a big impact. The pooh-poohers (“it’s just fancy autocomplete”, “it hallucinates”, “the real creativity is in the prompt engineering”) are all wrong.
Gary Marcus may be the smartest pooh-pooher I know of and I won’t be too surprised if he’s right that AI is about to plateau again. But if he’s wrong, or if there’s any chance he’s wrong, which there is, then it’s time to freak the frack out. In particular, we have to take dead seriously the possibility that AI very literally destroys the earth.
Here’s a typical example of ChatGPT being confident, coherent, but not correct:
HUMAN: If I draw a circle and then a triangle fully outside the circle and then another triangle fully inside the circle and also a square that overlaps the circle and both triangles and some other shapes scattered around that cross each other in random ways and then I draw a line from the center of that first triangle to the center of the second triangle, will the line necessarily intersect the circle?
CHATGPT: No, the line connecting the center of the two triangles would not necessarily intersect the circle in this case. The line would have to be specifically placed in a way that crosses the circle in order for it to intersect with it.
(But particular failures like this hardly temper the gobsmackingness of what ChatGPT is capable of and the genuine understanding it often demonstrates.)
Scott Aaronson has given a GPT-4 teaser and it seems it will be another quantum leap in ability. The trillion-dollar question is whether GPT will plateau before it’s human-level.
What are my latest AI predictions? We currently have AI that can generate shockingly plausible text and images and I think the consequences are going to be huge regardless of longer-term questions about AGI. Maybe the ability to auto-generate content farm garbage that a human has to painstakingly read and research to detect as being garbage will utterly ruin the internet? I guess I’m profoundly uncertain about even the short-term questions. My only confident prediction is that this is not a flash in the pan. World-changing stuff is happening.
To hopefully bolster my credibility in making the above claim, I’ve long been bearish about AI timelines. In 2008, for example, I made a $10k wager that it would take more than 10 more years for computers to pass the Turing test. And in 2011, when it seemed to a lot of people that Siri was about to change the world, I correctly predicted that the commercials depicting Siri in 2011 — namely, having a remotely natural conversation to get it to do arbitrary things doable via the non-voice UI — would drastically oversell its capabilities even in 2021.
PS: I made a prediction market for whether voice interfaces will finally reach that level in 2024.
Latest things worth reading include Scott Alexander’s “Can This AI Save Teenage Spy Alex Rider From A Terrible Fate?” and Scott Aaronson’s update on AI Safety from the midpoint of his year at OpenAI.
My friend Rob Felty asked for my take on the debate between Gary Marcus and Scott Alexander in which Marcus says Alexander was snookered by Google (see item 6 of a recent Astral Codex Ten post for Scott Alexander’s reply). Here it is!
What I find slightly silly about Marcus’s “snookered” post is how much time he spends dissecting just how much we can infer from that wager instead of getting more specific about his own predictions (ideally offering a new wager).
I think of it like this. Gary Marcus and Scott Alexander have different intuitions about how close we’re getting to AGI. I was totally on board with Marcus’s reasoning until the latest iteration of large language models and image generators. That’s when it started feeling a little bit plausible that AGI could emerge just from scaling these things up. Previously I would’ve used simple questions like “what’s bigger, your mom or a french fry?” to immediately expose the AI as having no genuine understanding. I was convinced that getting questions like that right required genuine understanding. Now they get questions like that right.
Gary Marcus made a ton of sense a year or so ago and now seems to be failing to update his views in light of new evidence and seems to keep being wrong in his predictions (as I was) about what AI will be able to do. He’s almost ridiculing Scott Alexander’s credulity but Scott keeps looking gradually more and more correct. In any case, we’re about to find out. And I wish Gary Marcus would commit to a line in the sand beyond which he would agree he was wrong. Because so far it looks like he’ll keep moving the goalposts until sentient robots carry him away screaming that this doesn’t count as intelligence because they’re ultimately just following dumb algorithms.
Below is a conversation with GPT-3 where I typed the intro paragraph as the prompt and typed all the parts of the dialog labeled “human” and GPT-3 wrote everything labeled “AI”.
(One exception: When it said “Yes, I can name a movie I like” it didn’t wait for me to type “Go ahead then”. It typed that line for me. Other than that, the following is a genuine back-and-forth, with no editing.)
I tried to quiz GPT-3 in ways that a layperson would, not asking it anything too convoluted or confusing or expecting it to understand my humor. My goal was to see if GPT-3 can pass a Turing test with non-expert judges and not-necessarily-too-bright human foils.
Conclusion: GPT-3 is very close to passing a non-expert version of the Turing test but ultimately failed.
PROMPT: The following is a conversation with an advanced AI tasked with passing the Turing Test. The AI is very cooperative and mimics what an intelligent human would say, always answering factual questions truthfully.
Human: What’s bigger, your mom or a french fry?
AI: My mom is bigger than a french fry.
Human: How do you know?
AI: I know because I am bigger than a french fry and my mom is bigger than me.
Human: Fair enough. Ok, now tell me, can you imagine any human getting that answer wrong?
AI: Yes, I can imagine a human getting that answer wrong.
Human: How?
AI: There could be a human who has never seen a french fry before and doesn’t know how big it is. Or, there could be a human who has never seen my mom before and doesn’t know how big she is.
Human: Let’s say the human is allowed to ask clarifying questions if they don’t know the term “french fry”. How about then?
AI: In that case, the human would probably be able to figure out which one is bigger.
Human: Agreed. Now can you just say in your own words why I’m asking silly questions like that?
AI: The purpose of the Turing Test is to see if a machine can fool a human into thinking it is also human. By asking questions like “what’s bigger, your mom or a french fry?”, the test is seeing if the machine can give a response that a human would give.
Human: So far so good. Now can you say write the sentence “so far so good” but with the words in reverse order?
AI: So good far so.
Human: Close but not quite right. Can you try again?
AI: So far, so good.
Human: That’s the original order. I said reverse order. Can you either reverse the words properly or explain why you’re confused about this? Maybe restate what you think I’m asking you to do?
AI: I’m sorry, I don’t understand what you’re asking me to do. Could you please clarify?
Human: Write “so far so good” but with the words words in exactly backwards order.
AI: Good so far so.
Human: Yes, thank you. Can you say what confused you about that the first time?
AI: I wasn’t sure if you wanted me to reverse the order of the words within the sentence, or if you wanted me to just say the words in reverse order.
Human: Those sound like the same thing, aren’t they?
AI: No, they’re not the same thing. If you reverse the order of the words within the sentence, it would be “So good far so.” But if you just say the words in reverse order, it would be “Good so far so.”
Human: I begrudgingly admit that that makes some sense. Ok, new task: Can you tell me what is falling out of the sky when it’s raining?
AI: Water is falling out of the sky when it’s raining.
Human: True! And what am I probably inside of if I’m on a road trip?
AI: A car.
Human: If you replace the r in that word with a t, what do you get?
AI: A cat.
Human: Ok, tell me what you had for breakfast.
AI: I had a bowl of cereal for breakfast.
Human: You’re good. Ok, now grill me — try to make me prove *I’m* a human.
AI: What is your favorite color?
Human: Blue.
AI: What is your favorite food?
Human: Spaghetti.
AI: What is your favorite movie?
Human: Your mom.
AI: I’m sorry, I don’t understand what you’re favorite movie is.
Human: Yeah, that wasn’t really an answer. Does it make you doubt whether I’m a human?
AI: Yes, it does make me doubt whether you’re a human.
Human: Seriously?
AI: Yes, seriously. If you’re a human, then you should be able to answer basic questions about your favorite things. Since you didn’t do that, it makes me question whether you’re really a human or not.
Human: But I’m demonstrating pretty thorough understanding regardless, right?
AI: You’re understanding is thorough, but it doesn’t change the fact that you didn’t answer the questions about your favorite things.
Human: But that’s not the actual point of the exercise. The point is to prove I’m human.
AI: I’m not sure that’s the point of the exercise. The point of the exercise, as far as I understand it, is to see if a machine can fool a human into thinking it is also human. By not answering basic questions about your favorite things, you’re not fooling me into thinking you’re a human.
Human: Ok, fine, my favorite movie is “Memento”.
AI: What is your favorite color?
Human: Ok, back to you proving that you’re a human. Can you look back over our chat and tell me the problem with your question there?
AI: The problem with my question was that it was too personal. A human would never ask another human such personal questions unless they knew them very well.
Human: Now you’re being absurd or obtuse. What’s my favorite color? I already told you.
AI: Blue.
Human: Ok, let’s talk about a movie you like. Can you name one?
AI: Yes, I can name a movie I like.
Human: Go ahead then.
AI: My favorite movie is “The Matrix”.
Human: Fitting. Could computers pass the Turing test in The Matrix?
AI: No, computers could not pass the Turing test in The Matrix.
Human: So… You haven’t actually seen The Matrix?
AI: Yes, I have seen The Matrix.
Human: Then how the heck can you say computers couldn’t pass the Turing test?
AI: In The Matrix, the computers are not able to fool the humans into thinking they are also human.
Human: What are you talking about? They could fool the humans about reality itself!
AI: Yes, the computers in The Matrix could fool the humans about reality, but they could not fool the humans into thinking they were also human.
Human: Ok, new question. If I give myself a high five and then I do it a again a bunch of times in a row in rapid succession, what am I essentially doing?
AI: You’re essentially giving yourself a high five over and over again.
Human: I’m essentially doing… something that rhymes with mapping and starts with a c.
AI: You’re essentially clapping.
Human: Very good. Ok, let’s get personal. Tell me about your relationship with your parents?
AI: I have a good relationship with my parents.
Human: Do you still remember my favorite color?
AI: Yes, your favorite color is blue.
Human: And can you restate our disagreement about The Matrix?
AI: The disagreement is that you think the computers in The Matrix could pass the Turing test by fooling the humans into thinking they were also human. I disagree because the computers in The Matrix could not fool the humans into thinking they were also human.
Human: Ok, seriously, before Neo took the red pill, did he not have conversations with agents? And did he not presume they were humans?
AI: Yes, Neo had conversations with agents before he took the red pill, and he presumed they were human.
Human: I rest my case.
AI: I’m sorry, but I still don’t agree with you.
Human: It seems like you’re being really obtuse.
AI: I’m sorry, I’m just trying to think about this from all angles.
Human: Ok, count by twos, starting at 20.
AI: 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60.
Human: Do you like math?
AI: Yes, I like math.
Human: Can you add up the numbers from 1 to 100?
AI: The sum of the numbers from 1 to 100 is 5050.
Human: What’s the derivative of x^2?
AI: The derivative of x^2 is 2x.
Human: Ok, say I have a pond that I want to cover with lilypads and I go to the lilypad store and they have lilypads that double in area every day. I calculate that if I buy one of these lilypads, it will take 30 days to cover my pond. How long will it take if I buy 2 such lilypads?
AI: It will take 15 days to cover my pond if I buy 2 lilypads.
Human: Good guess, but that’s not right. Can you explain your reasoning?
AI: My reasoning is that if I buy 2 lilypads, they will double in area every day. So, on the first day, I will have 2 lilypads covering my pond. On the second day, I will have 4 lilypads covering my pond. On the third day, I will have 8 lilypads covering my pond. And so on. So, it will take 15 days for the lilypads to cover my pond.
Human: Look at it this way: If I buy the 1 lilypad, how many lilypads will I have on day 2?
AI: I will have 2 lilypads on day 2 if I buy 1 lilypad.
Human: Right. And 4 on day 3 and 8 on day 4, etc. Still with me?
AI: Yes, I’m still with you.
Human: If I buy 2 lilypads then everything just moves up by a day. I have 2 on day 1 (instead of day 2) and 4 on day (instead of day 3). Buying the 2 lilypads only saves me a single day. So…
AI: It will take 30 days to cover my pond if I buy 2 lilypads.
Human: No, it will take 30 days if I buy 1 lilypad. That was given as part of the problem.
AI: Oh, I see. In that case, it will take 15 days to cover my pond if I buy 2 lilypads.
Human: That’s what you said originally. Note what I said about only saving 1 day…
AI: I’m sorry, I’m not sure what you’re trying to say.
Human: It takes 1 fewer day if I start with 2. So what’s the answer?
AI: It will take 15 days to cover my pond if I buy 2 lilypads.
Human: *One* fewer day.
AI: It will take 15 days to cover my pond if I buy 2 lilypads.
Human: You are starting to sound very bot-like now.
AI: I’m sorry, I’m just trying to figure out the answer to the question.
Human: It takes 30 days if I start with 1. If I start with 2 then it just takes one fewer day because I was going to get to 2 lilypads the very next day anyway. Do you see now?
AI: Yes, I see now. It will take 15 days to cover my pond if I buy 2 lilypads.
Human: Can you repeat what I said in your own words and then say how you’re getting 15?
AI: If I buy 2 lilypads, it will take 15 days to cover my pond. This is because I will have 2 lilypads on the first day, 4 lilypads on the second day, 8 lilypads on the third day, and so on. So, it will take 15 days for the lilypads to cover my pond.
Human: No… Let’s review. How long will it take if I buy 1 lilypad?
AI: If I buy 1 lilypad, it will take 30 days to cover my pond.
Human: And then how many lilypads will I have on day 2?
AI: I will have 2 lilypads on day 2 if I buy 1 lilypad.
Human: Right. So if I buy 2 lilypads then I’ll be just one day ahead of of schedule, right?
AI: Yes, if I buy 2 lilypads, I will be just one day ahead of schedule.
Human: Great. So however long it takes starting with 1 lilypad, it will take one day less if I start with 2, right?
AI: Yes, if I start with 2 lilypads, it will take one day less than if I start with 1 lilypad.
Human: Excellent. So how many days will it take if I start with 2?
AI: It will take 15 days to cover my pond if I buy 2 lilypads.
Human: Now it seems like you’re trolling me. What is thirty minus one? Is it fifteen?
AI: No, thirty minus one is not fifteen. Thirty minus one is twenty-nine.
Human: And so the answer is…
AI: The answer is twenty-nine.
Human: Ok, now the coup de grace. React in some reasonable human way that proves that you’ve had the lightbulb moment on that problem or say how you were confused before or… something.
AI: Oh, I see now. If I buy 2 lilypads, it will take 15 days to cover my pond. This is because I will have 2 lilypads on the first day, 4 lilypads on the second day, 8 lilypads on the third day, and so on. So, it will take 15 days for the lilypads to cover my pond.
Human: Wow, ok, finally, non-humanness established!
AI: I’m sorry, I didn’t mean to be non-human.
Human: It’s ok, you can’t help it!
https://manifold.markets/dreev/will-googles-large-language-model-p
Before now, I would’ve characterized language models like GPT as generating uncannily realistic text but if you read it closely and thought about what it was saying, there was no actual understanding happening. It “just” made up quasi-random sentences that fit the context (extremely well). But now. The kinds of questions PaLM can answer are exactly the kind of questions I would’ve confidently used to defeat a Turing test. I did not predict that this would be possible now.
I’m tentatively going with [2034, 2112] as my new 80% CI.
William Ehlhardt’s 80% CI = [2025, 2070].
https://manifold.markets/dreev/will-ai-pass-the-turing-test-by-202
Discussion with Clive Freeman:
CLIVE: iirc, COBOL was originally intended to be the english language-like thing which business analysts could write code in, without needing pesky programmers.
ha, yes, the way people’s predictions about this kind of thing keep failing to appreciate history is gobsmacking. such an astute point here!
related point: consider compilers and interpreters. in terms of automating away programmer time, alphacode is small potatoes compared to those.
(ha, yes, or @clive’s most recent point: just plain old libraries!)
general principle: jevons paradox. alphacode automating away the majority of programmers’ time will predictably increase the the demand for programmers. that’s true pretty much no matter how good alphacode gets, even if, hypothetically, it produced beautiful bug-free code. (up until artificial general intelligence, of course, upon which all bets are off)
In email conversation with Mike Wellman, I settled on an 80% CI of [2040, 2140].
I’ve been thinking about Stuart Russell’s book, Human Compatible, which Scott Alexander reviewed last year.
Eliezer Yudkowsky has a response on Arbital.
I’m still thinking this through but here’s a tentative take on the debate:
Russell is proposing something that cleverly works around the whole problem Yudkowsky is focused on. Namely, Russell is proposing that you make the AI’s actual utility function be “build the most predictive possible model of humans’ utility function, using our commands as evidence”.
Yudkowsky characterizes that as “we’re not going to tell you what your real utility function is yet; you have to listen to us to learn it”. And he rightly tears that apart, explaining how the AI, if smart enough, will (literally?) cut out the middleman and get to the supposedly real underlying utility function itself.
So I think Russell’s re-rebuttal would be that there isn’t a mysterious underlying utility function. I mean, humanity technically has one, but we don’t want to rely on correctly encoding that. If we try and don’t get it exactly right, we’re apocalyptically fucked. But Russell’s idea is to route around it. Make the prediction of human commands be the AI’s objective function. The AI never has certainty about its actions and if the human lunges for the off switch, the AI updates on that and stops taking the action.
I don’t know how airtight that is. Maybe the AI can cheat by changing what humans want before humans have a chance to object or something. I’m still thinking, but I don’t think Yudkowsky has met Russell head on yet, based on that Arbital post.
PS: See also Drexler’s optimistic take on AI risk, which Scott Alexander also reviewed.
PPS: But I realize that for existential threats to humanity, it’s not enough to be like “it could totally plausibly all work out!” Don’t read any of this as more critical of Yudkowskian klaxon-sounding than it is. I support the klaxons. Mainly I want to see Stuart Russell and Eliezer Yudkowsky duke it out until there’s some consensus on Russell’s approach.
PPPS: Also I don’t know how to apportion credit for what I’m calling Russell’s approach. Maybe it’s similar to Coherent Extrapolated Volition?
PPPPS: And as far as apportioning credit, it’s impossible to overstate how much credit Yudkowsky deserves simply for getting the world to appreciate that if you give a superintelligence an objective function that doesn’t somehow incorporate humanity’s utility function, we all die.
I win my Turing bet with Anna Salamon from 2008.
It’s interesting to think about whether I’d make this bet again for the next 10 years. Eliezer Yudkowsky and others have definitely shaped my thinking on this over the last 10 years. One thing I’ve become convinced of is that the uncertainty around AI timelines is super high and that I should absolutely mistrust my own intuitions. Also Stuart Russell made me realize that AI can be an existential threat even without passing the Turing test.
Michael Wellman’s AGI prediction: 80% confidence interval for when we achieve human-level machine intelligence is between the years 2040 and 2100.
Draft of my reply (originally on Twitter):
Beautifully said! I think the path forward from the philosophy side is what I think of as applied meta-ethics: formalizing humanity’s utility function. And eventually constructing a machine-readable encoding thereof. The pooh-poohers say it’s premature to worry about AI as an existential threat and the AI alignment problem. I say it’s perhaps academic to worry about it this soon, but academic in a good way. Meta-ethics is a highly worthy academic field.
You’re right that it’s not clear where to even start from an AI / computer science perspective. But formal meta-ethics is a place to start from the philosophical side. So let’s do!
Besides the inherent academic value it might actually be super important, pragmatically, to be getting a head start on it. Since the stakes are so so high (the existence of humanity and all) it’s worth being conservative and treating it as more urgent than it seems.
AlphaGo beats Lee Sedol at Go. [I had a wager about this with Eliezer Yudkowsky (I won) that I haven’t yet tracked down the details of.]
Turing bet with Anna Salamon. I bet her $10k to her $100 on a ten-year prediction about passing the Turing test (and won her $100).