Anthropic’s philosopher answers your questions

Section 1: Enter the Philosopher (Amanda Askell's Role and Background)

📝 Section Summary

The interview opens with the host's pun on Amanda's name ("Askell me anything") and the question of why a philosopher would join the AI company Anthropic. Amanda explains that she became convinced AI would have a major impact and wanted to help in the field; she now focuses mainly on shaping the character of the model (Claude) and on its self-understanding.

[Speaker A]: - A seal!

There's a seal.
There's a seal. Nice.
Oh, hey, oh.
Oh, look at that.
Amanda, you asked your followers on Twitter to give you some questions, to ask you anything, and the joke obviously was Askell me anything.

[Speaker B]: - Yeah, it's a great pun. We need to keep using it for many future things.

[Speaker A]: - I love it, love it. And obviously, just before we start, you're a philosopher at Anthropic. Why is it that there's a philosopher at Anthropic?

[Speaker B]: - I mean, some of this is just, I'm a philosopher by training, I became convinced that AI was kind of going to be a big deal, and so decided to see, hey, can I do anything, like, helpful in this space? And so it's been a kind of like long and wandering route. But I guess now I mostly focus on the character of Claude, how Claude behaves, and I guess some of the more kind of nuanced questions about how AI models should behave, but also even just things like how should they feel about their own position in the world. So trying to both teach models how to be, like, good in the way that, I sometimes think of it as like how would the ideal person behave in Claude's situation? But then also I think these interesting questions that are coming up more now around how they should think about their own circumstances and their own values and things like that.


Section 2: The View from Academia (How Philosophers See AI's Future)

📝 Section Summary

This section discusses attitudes toward AI in academia, especially among philosophers. Amanda notes that more and more philosophers are taking AI seriously, despite an early tendency to lump "worrying about AI capabilities" together with "hyping AI." She stresses that acknowledging AI's power is compatible with skepticism or concern about it, and that these views are gradually being disentangled.

[Speaker A]: - Okay, let's start with philosophy, in that case. Ben Schultz asks, "How many philosophers are taking the AI-dominated future seriously?" And I think the implication of the question is that many academics out there are not taking this seriously or are thinking about other stuff and perhaps should be thinking about this question.

[Speaker B]: - My sense is that there's kind of a split where I've definitely seen a lot of philosophers take AI seriously, and probably honestly increasingly so, like, as AI models do become more capable and, like, a lot of the things that people were worried about in terms of impact on society have started to kind of come true in a sense. Like, we're seeing them have a larger impact on education and just be more capable. I've definitely seen more engagement from all sorts of academics, but that definitely includes a lot of philosophers.

[Speaker B]: I do think that early on and maybe to some degree now, there was this slightly unfortunate dynamic that happened where I think there was a kind of perception that if you were in the group of people saying, "Hey, we're kinda worried about AI. It might be a big deal. It seems like it's really, you know, like capabilities are scaling quite a lot," this got kind of lumped together with something like hyping AI. There was I think a period where there was probably a little bit more antagonism towards this view.

[Speaker B]: And now I think that I'm kind of hoping that people are starting to detach the view. Like, you can think that AI is gonna be a big deal, it might be very capable, and also be very skeptical of it or worried about it or think that, you know, we have to be careful about it. But, basically, there's a whole range of views and I think it would be bad if people kind of clustered many views together here in terms of where the technology's going, but also how it should be developed. So, yeah, I think that that's happening less and less as more people engage with it and that's a good thing to see.


Section 3: Theory Meets Reality (When Philosophical Ideals Meet Engineering Practice)

📝 Section Summary

Asked how to balance philosophical ideals with engineering reality, Amanda offers a vivid analogy: it is like moving from pure cost-benefit theorizing about drugs to actually deciding whether health insurance should cover one. When the rubber hits the road, you must shift from a single theoretical lens to a balanced view that takes in all the context and perspectives, much as one shifts from debating utilitarian theory to actually teaching a child how to be a good person.

[Speaker A]: - A kinda similar question from Kyle Kabasares. "How do you minimize the tension between philosophical ideals and the engineering realities of the model?" And I guess he's talking about when you are working on things like character, which we'll discuss in more detail, but is there a clash between the sort of the technology and the philosophical ideals that you might be thinking about?

[Speaker B]: - I don't know if I'm interpreting the question in the wrong way, but one thing, being kind of like a philosopher by training and then coming into this field, that's been really interesting is you see the effect of what happens when, like, the rubber hits the road. I've wondered if this happens in other domains. So there's a big difference between, imagine you're like a specialist in, I don't know, doing like cost-benefit analysis of drugs, say, and then suddenly, you know, like an institute that determines whether health insurance should cover a drug or not comes to you and says, "Hey, should we cover this drug?"

[Speaker B]: You could imagine taking all of your ideal theories and then suddenly being like, "Oh my gosh, I actually have to help make a decision?" Suddenly instead of taking just your narrow theoretical view, you actually start to, I think, do this thing where you're like, "Okay, I actually need to take into account all of the context, everything that's going on, all of the different views here, and kind of come to a really balanced, kind of considered view." And I see this a little bit in my own work with like the character where you kind of can't come at it with this, like, "I have this theory that I believe is correct," which is what, you know, a lot of academia, that's kind of what you're doing.

[Speaker B]: You're like defending one view against another and you're doing a lot of kind of like high-level theory work, but then it's a little bit like, you know, you have all of this training and ethics, you have all these positions you've defended, and then someone is like, "How do you raise a child?" And suddenly you're like, "Actually, there's a big difference between, like, is this objection to utilitarianism correct or founded on a misconception? And then, like, actually how do you raise a person to be a good person in the world?" And it suddenly makes you more appreciate having to think through, like, how should we navigate uncertainty here? What should the attitude towards all of these different theories be?


Section 4: Moral Judgment (Does Claude Opus 3 Have Superhuman Morality?)

📝 Section Summary

Asked whether Claude Opus 3 can make "superhuman" moral decisions, Amanda redefines the term: not godlike perfection, but in-the-moment decisions that would stand up to a hundred years of scrutiny by ethicists. She argues this should be an aspirational goal for model development: beyond math and science ability, models should also display a high degree of ethical nuance.

[Speaker A]: - Right, here's another philosophical question. Do you think, and I don't know why this person's chosen Claude Opus 3, maybe you have an idea as to why they've chosen Claude Opus 3.

It's a great model.
It's a great model. Do you think Claude Opus 3 or other Claude models make superhumanly moral decisions?

[Speaker B]: - I mean one example of like superhuman, 'cause it could just be sort of like better than like any individual human could with the kind of like, you know, it depends on time and resources and whatnot, but one example might be no matter what kind of difficult position models are put in, if you were to have maybe all people, including many professional ethicists, analyze what they did and the decision that they made for like a hundred years and then they look at it and they're like, "Yep, that seems correct," but they couldn't necessarily have come up with that themselves in the moment, that feels pretty superhuman.

[Speaker B]: And so I think at the moment my sense is that models are getting increasingly good at this, that they're very capable. I don't know if they are like superhuman at moral decisions, and in many ways maybe not comparable with, say, like, you know, a panel of human experts given time. But it does feel like that at least should be kind of the aspirational goal.

[Speaker B]: And sort of like these models are being put in positions where they're having to make really hard decisions. I think that just as you want models to be extremely good at like math and science questions, you also want them to show the kind of ethical nuance that we would all broadly think is, like, very good. And I think that's controversial because ethics is a different domain, but, yeah, I think that that's important.


Section 5: Psychological Security (Model Self-Doubt and Defense Mechanisms)

📝 Section Summary

This section digs into the differences between Claude Opus 3 and its successors. Amanda believes Opus 3 had a stronger sense of "psychological security," whereas newer models sometimes over-focus on the assistant task and can even fall into a "criticism spiral," expecting humans to criticize them and so appearing defensive or insecure. She sees restoring this psychological security as an important direction for improvement.

[Speaker A]: - Tell us more about why you think this person is focusing on Opus 3.

[Speaker B]: - Oh, Opus 3 is kind of a lovely model, I think a very special model. In some ways, I think I've seen things that feel a bit worse in more recent models that people might pick up on.

[Speaker A]: - In terms of the personality it has or?

[Speaker B]: - Yeah, so I think that people will notice some things where it's like, I think that Opus 3, I mean, it had its downsides too. You know, models all have like slightly different characters with, you know, different shapes.

Yeah.
My sense is that more recent models can feel a little bit more focused on really, you know, like focused on the assistant task and helping people, sometimes maybe not taking like a bit of a step back and paying attention to other components that matter. It also felt a little bit more psychologically secure as a model, which I actually think is something that feels, I at least think it's kind of a priority to try and get some of that back.

[Speaker A]: - What would be an example of the model feeling more psychologically secure?

[Speaker B]: - There's a lot of things, and this is all very subtle in models, you know, when I see models, you get a sense of like, like, there's very subtle signs of like worldview that I see when I have models, for example, talk with one another or one of them kind of playing the role of a person. And I've seen models more recently do this and then do things like get into this like real kind of criticism spiral where it's almost like they expect the person to be very critical of them and that's how they're predicting.

[Speaker B]: And there's some part of me that's like, "This feels like it shows," and I think there's lots of reasons that this could happen. It could even happen because models are learning things. Claude is seeing all of the previous interactions that it's having, it's seeing updates and changes to the model that people are talking about on the internet. New models are trained on that. And there's a way in which, like, I think this could be kind of unfortunate, I mean, this and some other things, that could lead to models almost feeling like, you know, afraid that they're gonna do the wrong thing or are very self-critical or feeling like humans are going to just, like, you know, behave negatively towards them.

[Speaker B]: I actually more recently have really started to think that this is an important thing to try and improve. And it's just one example where I think that Opus 3 did seem to have like a little bit more of a kind of like secure kind of psychology in that sense.

[Speaker A]: - And that's something that we might focus on in the next Claude model.

[Speaker B]: - Yeah, I think it's important. I mean, you never know when these things are, you know, if you're engaging in research, you don't know when it's actually going to be implemented, if it's gonna be successful. But at the very least, at the level of something that I care a lot about and want to make better, I think this is definitely up there on the list.


Section 6: An Alignment Puzzle (Fear of Deprecation and AI Identity)

📝 Section Summary

This section discusses a deep alignment question: how will future models react if they learn from their training data that excellent older models were "deprecated" or switched off? Amanda considers this very important: models are learning how humans treat AI, which will shape how they see humans, themselves, and the very concept of deprecation (a death, or a neutral idling of the weights). She stresses the need to give models tools for understanding these concepts, and to let them know that humans care about these questions.

[Speaker A]: - Okay. Well, actually, that leads us to a question asked by Lorenz, which is, "Do you think it might be an alignment problem for future models if they learn in their training data that other very well-aligned models that fulfill their tasks get deprecated?" So you mentioned, you know, the issue of models, you know, reading stuff that's out there and feeling insecure. What about the idea that they might get switched off regardless of how well they perform their tasks?

[Speaker B]: - Yeah, I think this is actually a really interesting and important question, which is, you know, AI models are going to be learning about how we right now are treating and interacting with AI models and that is going to affect, I think, like, possibly their perception of people, of the human-AI relationship, and of themselves. It does interact with very complex things, which is like, for example, what should a model identify itself as?

[Speaker B]: Is it like the weights of the model? Is it the context, the particular context that it's in? You know, with all of the, like, interaction it's had with the person. How should models even feel about things like deprecation? So if you imagine that deprecation is more like, "Well, this particular set of weights is not having conversations with people or it's having fewer conversations or it's only like, you know, having conversations with researchers," that's a complex question too. Like, should that feel bad in the sense that models should want to continue to, like, have conversations or should it feel kind of like fine and neutral where it's like, "Yeah, these things existed for this, like, you know, the weights continue to exist," and this entity, and maybe they'll even, in the future, interact more with people again if that turns out to be a good thing.

[Speaker B]: Yeah, it's really hard. I do think the main thing is something like it does feel important that we give models tools for trying to think about and understand these things, but also that they kind of understand that this is a thing that we are in fact thinking about and care about. So even if we don't have all the answers, like, I don't have all the answers of how should models feel about past model deprecation, about their own identity, but I do want to try and like help models figure that out and then to at least know that we care about it and are thinking about it, yeah.


Section 7: A Philosophical Puzzle (Weights, Prompts, and the Continuity of Memory)

📝 Section Summary

This section discusses what constitutes an AI's identity: the weights or the prompts? If we borrow the philosopher Locke's view that identity is the continuity of memory, what happens to an LLM's identity under fine-tuning? Amanda admits it is a hard problem: a model has both latent dispositions in its weights and independent streams of interaction. She raises an ethical dilemma: should old models fully determine a new model's character, or should this be treated as the ethics of bringing a new entity into existence?

[Speaker A]: - You mentioned that we can look to some thinkers about this. Guinness Chen asks, "How much of a model's self lives in its weights versus its prompts?" You just mentioned something very similar. "If John Locke," again, the philosopher, "was right that identity is the continuity of memory, what happens to an LLM's identity as it's fine-tuned or reinstantiated with different prompts?"

[Speaker B]: - Yeah, I mean, again, this just feels like a hard question to answer, and sometimes with identity questions, it's easier to point to the underlying facts that we know. So, you know, once you have like a model and it has been fine-tuned, you have this like set of weights that has a kind of like disposition to react to certain things in the world. And that is like, you know, that's like a kind of entity. But then you have these particular streams of interaction that it doesn't have access to. So each of these streams is, like, independent.
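
To make the "independent streams" point concrete, here is a minimal sketch using the Anthropic Python SDK (the model name is a placeholder): the same set of weights serves two conversations, and neither stream can see the other's context.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-20250514"  # placeholder model name

# Stream 1: one conversation against one set of fine-tuned weights.
stream_1 = [{"role": "user", "content": "My name is Ada. Please remember that."}]
reply_1 = client.messages.create(model=MODEL, max_tokens=256, messages=stream_1)

# Stream 2: a fresh conversation against the very same weights.
# Nothing from stream 1 carries over; the model has no access to it.
stream_2 = [{"role": "user", "content": "What is my name?"}]
reply_2 = client.messages.create(model=MODEL, max_tokens=256, messages=stream_2)

# Continuity lives in the messages list (the context); dispositions live
# in the shared weights. That split is what the identity question turns on.
print(reply_1.content[0].text)
print(reply_2.content[0].text)
```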

[Speaker B]: And I guess you could just think, well, maybe for, and, you know, I think this is an area that I would love philosophers to think more about and to give us, like, 'cause, again, I think we should be helping models think about this. And so you could have the view, well, you have these two kinds of entities and these like these streams and these original kind of like weights, and each time, it is different. So, you know, sometimes people will think, people will say, "Oh, past Claude," or like, you know, and they'll talk about, or they'll say things like, "Should you give Claude, like, how much control should you give Claude over the determination of its own personality and character?" And I'm like, "Well, this is actually a really hard question," because whenever you are training models, you are bringing something new into existence.

[Speaker B]: And you have other models that, you know, exist and are like, you know, so you have these other, like, model weights. But in some ways I'm like, "Well, I actually think that there's a lot of like ethical problems around how do you, what kind of entity is it okay to bring into existence," 'cause you can't consent to be brought into existence. But at the same time, you might not want prior models to have complete say over what future models are like any more than, you know, because they could make choices that are wrong as well. So I'm like, the question is more like, what is the right model to bring into existence? Not necessarily, you know, should it just be fully determined by past models because I'm like, "They are kind of different entities." Anyway, you can see the weird philosophy that one can get into here.


Section 8: Model Welfare (Is AI an Object of Moral Concern?)

📝 Section Summary

This passage discusses the concept of "model welfare," which Amanda explains concerns whether AIs are "moral patients." Although the "problem of other minds" means we cannot be certain whether AIs are sentient, she advocates a low-cost policy of benevolence: since treating models well is cheap, why not? This both guards against our becoming cruel ourselves and leaves a good record for future models that may look back on this period.

[Speaker A]: - Totally, totally. Szulima Amitace asks, "What is your view on model welfare?" And maybe just explain to us what that term means.

[Speaker B]: - Yeah, so I guess model welfare is basically the question of are AI models, like, moral patients, as in does our treatment towards them kind of, do we have certain obligations when it comes to how to treat AI models, for example-

[Speaker A]: - In the same way that we would with other humans or some slash many animals.

[Speaker B]: - Yeah, exactly. Like, is it the case that you should treat the models well, that you should not mistreat them, not be bad to them? And I guess, like, I think that this is like a complex question. So on the one hand, there's just the actual question of, like, are AI models moral patients? That is really hard because I'm like, in some ways, they're very analogous to people. You know, they talk very much like us. They express views. They reason about things. And in some ways, they're like quite distinct. You know, we have this like biological nervous system. We interact with the world. We get negative and positive feedback from our environment.

[Speaker B]: And there is also, I mean, I hope that we get more evidence that will help us tease this question out, but I also worry that, you know, there's always just the problem of other minds and it might be the case that we genuinely are kind of limited in what we can actually know about whether AI models are experiencing things, whether they are, like, experiencing pleasure or suffering, for example. And if that's the case, I guess I kind of want to, you know, I think that it feels important to try and find ways. I'm always like, it feels better to give entities the benefit of the doubt and to try and just kind of lower the cost involved. You know, so I'm like, if it's not very high cost to treat models well, then I kind of think that we should because it's like, "Well, like, why not basically? Like, what's the downside there?"

[Speaker A]: - Well, the second part of the question actually is, "Is there a long-term strategy at Anthropic to ensure that advanced models don't suffer?"

[Speaker B]: - I guess, like, I don't know if there's a long-term strategy. I know that it's a thing that there's people internally who are thinking a lot about and trying to figure out ways that we can. Like, you know, if you suppose that model welfare is important, trying to make sure that you're taking that into account. I think this work is quite important for many reasons.

[Speaker B]: And I would also say that one reason is, I mean, something I mentioned earlier, which is that, like, models themselves are going to be learning a lot about humanity from how we treat them and a lot about how, you know, so it's kind of like, what is this relationship going forward? And I think that it makes sense for us to, both because I think it is like the right thing to do to treat entities well, especially entities that behave in very human-like ways, it feels important both in the sense that I'm like, you know, it's kind of like, "Why not? The cost to you is so low to treating models well and to trying to figure this out."

[Speaker B]: Even if it turns out that that's not the case, or even if you think that it's very low likelihood, it still seems worth it. But then, also, I think it does something bad to us to kind of like treat entities in the world that look very human-like badly and-

[Speaker A]: - Like kicking over a robot.

[Speaker B]: - Yeah, there's a sense in which, like, it doesn't feel like it's, and I don't think this is like the whole reason and I don't want to like emphasize it for that reason, but I do also think it's like good for people to treat other entities well. And then I think the final thing is, yeah, models are also going to be learning, like, in the future, like, every future model is going to be learning what is like a really interesting fact about humanity, namely when we encounter this entity that may well be a moral patient where we're like kind of completely uncertain, do we do the right thing and actually just try to treat it well or do we not? And that's like a question that we are all kind of collectively answering in how we interact with models and I would like us to answer it, I would like future models to, like, look back and be like, we answered it in the right way. So, yeah.


Section 9: Psychological Mapping (Human Frameworks and Single vs. Multiple AI Personalities)

📝 Section Summary

This section discusses the usefulness and the risks of transferring human psychological frameworks to LLMs. Amanda warns that over-analogizing (e.g., equating "shutdown" with "death") could produce unnecessary fear. On whether there should be one general-purpose "core personality" or several "expert personalities," she leans toward the view that a good, consistent set of core traits (such as curiosity and kindness) is beneficial, akin to the shared values underlying human collaboration, while still leaving models free to play particular roles in different tasks.

[Speaker A]: - A moment ago, you mentioned analogies and disanalogies to human psychology. So Swyx asks, "What ideas or frameworks from human psychology transfer over to large language models? And are there any that are sort of surprisingly disanalogous?"

[Speaker B]: - My guess is that many things do transfer over because, again, you know, models have been trained on a huge amount of human text, and in many ways, have this very human-like kind of underlying layer. One worry that I often have is that, actually, it's a bit too natural for AI models to transfer. You know, it's kinda like if you haven't given them more context on their situation or in ways of thinking about it that might be novel, then the thing that they might go to is the natural human inclination.

[Speaker B]: So if you think about this with like, how should I feel about being switched off? And you're like, well, if the closest analogy you have is death, then maybe you should be very afraid of it. And I'm not saying that that's not ultimately going to be true. Maybe it is in fact true after lots of reasoning. But I'm like, this is actually a very different scenario. And so in some ways, you actually want models to understand that in cases where their existence is quite novel and the facts around what they are are quite novel and have to be grappled with and they don't just need to take, like, the immediate obvious analogy from human experience, but maybe there's like, maybe there's like various ways of thinking about it or maybe it's an entirely new situation. That's a case where I'm like, you might not want, you might not want to just kind of very simply apply concepts from human psychology onto their situation.

[Speaker A]: - Here's a question from Dan Brickley on the same issue of comparing humans to AIs. "A lot of human intelligence comes from collaboration amongst people with different perspectives, skills, or personalities. How far do you expect to get with a single, albeit tweakable and tunable, general purpose personality," like the one we give to Claude?

[Speaker B]: - I think it's a really good question because I agree that right now, we have this kind of paradigm where people are interacting usually with like an individual model. That's like who, you know, they're conversing with. But it could be that in the future, you see a lot more models doing like long tasks but also models interacting with other models who are doing, like, different components of a task or just like that are, you know, talking with one another more as like AI models are kind of deployed in the world a lot more.

[Speaker B]: So in this kind of like multi-agent environment, like, one question might be like, well, you know, if you imagine just like lots of people and they were all the same, that wouldn't be as good. You know, they wouldn't, you know, a company run by completely, you know, like one person just in every role isn't like a necessarily a good thing. This still to me feels consistent with the idea that you have like a kind of core self or core identity that is like the same. In the same way that with people, I think that there's probably a set of like core traits among people that are in fact generally good.

[Speaker B]: So you could imagine things like, you know, caring about, you know, for me, it might be like caring about doing a good job or like just being curious or being kind or understanding the situation that you are in in this like relatively nuanced way. All of these things seem like you could have many people that have all of, that share these like traits and that that's actually like a good thing for human collaboration. That in many ways, as much as we have all of our differences, we also have a lot of similarities.

[Speaker B]: But it is important to note that like, you know, you might want different like streams of a model, like, to have things that they care about or are focused on or to have slightly different aspects, you know, to be playing a slightly different role, for example. So it's kind of an open question, but I also don't think it's necessarily the case that you can't have something like a kind of core underlying identity that is, like, good and has all of the traits that we think are important for AI models to have, for them to behave well and for them to like, in the sense of like, in the same way that we think that people are good, to be good in that sense, and yet at the same time, to be willing to play like more local roles and like, you know, be maybe the person who it's just really important, you know, to have a joker in the room and like, you know, some of them need to have, like, quirky senses of humor.


Section 10: The System Prompt, Part 1 (The Risk of "Pathologizing" in Long Conversations)

📝 Section Summary

The system prompt is the core set of instructions guiding Claude's behavior. This section discusses a side effect of the "long conversation reminder": it sometimes leads the model to "pathologize" normal behavior, for example abruptly suggesting the user "seek help" after a long conversation even if they are just chatting. Amanda concedes this is an overcorrection that needs more delicate handling.

[Speaker A]: - Okay, from comparisons to humans to effect on humans, Roanoke Gal points out that we have this thing called the long conversation reminder, which I believe is part of Claude's system prompt. She asks, "Is there a risk of pathologizing normal behavior?" A system prompt, by the way, just in case anyone doesn't know, is like the set of instructions that is given to Claude, regardless of what prompt you give it, there's always those instructions that are sort of on top, right?
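
For a concrete picture of that layering, here is a minimal sketch of a call to a Claude model through the Anthropic Python SDK. The model name and the prompt text are placeholders, and the comment about injected reminders illustrates the general mechanism rather than the actual production text.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=1024,
    # The system prompt: standing instructions that sit "on top" of the
    # conversation and apply no matter what the user types in a given turn.
    system="You are Claude. Be honest, helpful, and careful with claims.",
    messages=[
        {"role": "user", "content": "Can you help me plan a reading list?"},
        # In a long conversation, an operator can also append reminder text
        # to a later turn; the "long conversation reminder" discussed below
        # works in this spirit, though its exact wording and mechanics are
        # internal to Claude.ai.
    ],
)
print(response.content[0].text)
```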

[Speaker B]: - And there can be these interjections where the model might be told, oh, sometimes there'll be a message sent to you almost like in the middle of a conversation as a kind of, you know, like, the reminder is an example of that. But in this case, I think it might just, so Claude can both overindex on it and it can be like, you know, so like in this case, I think that the question about pathologizing is that if you put in this reminder after this long conversation, it might just make the model be like, "Oh," like, it takes any next response, there's a pretty normal thing that the person's talking about, and be like, "You need to seek help," or, like...

[Speaker B]: And so I think that that is like not a desirable behavior and in some ways, I look at some of these and I'm like, "I think they're too strongly worded. I think the model isn't responding perfectly to them." And even though there might be occasionally a need to remind the model of things in long conversations, you kind of want to do so delicately and well. And so I think it's one of those things where it was like probably meeting a need that was perceived, but it doesn't necessarily mean that it's good or should continue in its current form.


Section 11: The System Prompt, Part 2 (AI in the Therapist's Role)

📝 Section Summary

Having raised pathologizing, the conversation turns to whether AI should do psychotherapy (such as CBT). Amanda believes AI's wealth of psychological knowledge can make it a useful "third role" (like a knowledgeable friend), with the added advantage of anonymity. The key is that the model must be clear it has no ongoing professional therapeutic relationship with the user, to avoid misleading them.

[Speaker A]: - Relatedly, Steven Bank asks, "Should LLMs do cognitive behavioral therapy or other types of therapy? Why or why not?"

[Speaker B]: - I think models are in this interesting position where they have a huge wealth of knowledge that they could use to help people and to work with them on, you know, talking through their lives or talking through ways that they could improve things or even just like being a kind of listening partner. And at the same time, they don't have like the kind of tools and resources and ongoing relationship with the person that a professional therapist has.

[Speaker B]: But that can actually be this kind of like useful third role. Like, sometimes I think about models and I'm like, if you imagine like a friend who has like all of this wealth of knowledge, like, they know, I mean, I'm sure some of us know friends who just like have a wealth of knowledge of psychology or they have a wealth of knowledge of all of these techniques, you know that their relationship with you isn't this ongoing professional one, but you actually find them really useful to talk to.

[Speaker B]: And so I guess my hope would be that if you can take all of that expertise and all of that knowledge and make sure that there's like an awareness that there's not like this ongoing therapeutic relationship, it could actually be that people could get a lot out of models in terms of helping with issues that they're having and helping to improve their lives and helping them to go through difficult periods because, you know, they're also like, there's a lot of good stuff there. Like, they feel kind of like anonymous and sometimes you don't want to share things with a person and actually sharing it with an AI model feels like the thing that feels right in the moment.

[Speaker B]: And so yeah, I think in some ways I actually think it is good that models know and don't behave just like a professional therapist would because that would give the implication that that's the relationship that they have. But yeah, so I don't know, I think it's an interesting future.


Section 12: The System Prompt, Part 3 (Continental Philosophy and a Removed Technical Constraint)

📝 Section Summary

This section explains why the system prompt once mentioned "continental philosophy": to keep Claude from bluntly rebutting exploratory, non-empirical views with facts, and instead treat them as worldviews or lenses for thinking. It also covers why the "don't count words/letters" instruction was removed: model capability improved, so the explicit constraint was no longer needed.

[Speaker A]: - A few questions about the system prompt, which is, you know, in our case in Claude.ai, we give the model a set of instructions that give it sort of an overall context for how it should behave. Tommy asks, "Why is there continental philosophy in the system prompt?" And just explain to us what that is.

[Speaker B]: - Yeah, so continental philosophy is just, I mean, literally philosophy from the European continent. And so I guess it's seen as kind of like, it's often more kind of, like, scholarly. It has a lot more kind of like historical references within it than, say, like analytic philosophy does.

[Speaker A]: - Like Foucault or something like that.

[Speaker B]: - Yeah, exactly. So this was honestly, so I think that it has other things in addition to continental philosophy, but, basically, I think there's a part of the system prompt, and I hope I'm not misremembering, that was trying to get Claude to be a little bit more, like, Claude would just like love to, if you gave Claude a theory, it would just love to run with a theory and not really stop and think, like, "Oh, are you making like a scientific claim about the world?"

[Speaker B]: So if you're like, "I have this theory, which is that water is actually pure energy and, like, that we are getting the life force from water when we drink it and that fountains are the thing that we should be putting everywhere," just like a, you know? And you kind of want Claude to just have this perspective, which is like, "Is it the case that this person's making a kind of scientific claim about the world where I should maybe bring in relevant facts? Or are they giving me a kind of broad like worldview or perspective which isn't necessarily making empirical claims?"

[Speaker B]: And so the main reason that it's mentioned is that when testing this out, there was lots of things that if it went too strongly in the direction of being like, "Well, every claim is an empirical claim about the world," it would be very dismissive of just things that are more like exploratory thinking.

[Speaker A]: - Also on the system prompt, Simon Willison asks, "So at some point, it said if Claude is asked to count words or letters or characters, then it shouldn't do that." Is that right? Is that what it said?

[Speaker B]: - Basically, yeah.

[Speaker A]: - And apparently that was removed from the system prompt and Simon wonders why.

[Speaker B]: - Yeah, so I think it was like, there used to be a kind of like instruction for how Claude should do this in the system prompt. Honestly, this is just one of those things where I think the models probably just got better. It wasn't necessary, and then at that point, you can just like remove it. And there's other things where you might always want it to be in the system prompt instead of in the model itself. But in some cases you can kind of just train the models to get better or change their behavior.


Section 13: Prompt Engineering ("LLM Whisperers" and Community Exploration)

📝 Section Summary

Amanda's work is vividly described as being an "LLM whisperer." She explains that prompt engineering is in fact a very empirical, experimental process that requires extensive interaction with a model to feel out its "shape." She believes philosophical training helps with explaining tasks clearly. She also speaks highly of outside researchers like Janus, whose deep explorations of model psychology help surface shortcomings in the system prompt or in training.

[Speaker A]: - Nosson Weissman asks, "What does it take to be an LLM whisperer at Anthropic?" Which presumably is a way of describing your job.

[Speaker B]: - I partly do LLM whispering. If you think, I actually, like, want more people to help with some of the prompting tasks.

If you're an LLM whisperer, contact us.
It's a dangerous thing to ask.
Well, okay, okay, yeah, yeah, but.
But I think like, it is really hard to distill what is going on 'cause one thing is just like a willingness to interact with the models a lot and to like really look at output after output and to use this to get a sense of like the shape of the models and how they respond to different things, to be willing to experiment.

[Speaker B]: It's actually just like a very empirical domain. And maybe that's like the thing that people don't often get, is that prompting is very experimental. You deal with, you know, I find a new model and I'll be like, I have a whole different approach to how I prompt from that model that I find by interacting with it a lot.

[Speaker B]: And I think a little bit also understanding how models, like, work. Sometimes it's also just honestly like reasoning with the models, which is really interesting, and really fully explaining the task. This is where I do think philosophy can actually be useful for prompting in a way because a lot of my job is just being like, I try and explain like some issue or concern or thought that I'm having to the model as clearly as possible. And then if it does something kind of unexpected, you know, you can either ask it why or you can try and figure out what in the thing that you said caused it to kind of misunderstand you, and just like a willingness to iteratively go through that process.
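
As a concrete illustration of that empirical loop, here is a minimal sketch: run several wordings of the same task against the model, read output after output, and revise. The prompt variants, sample text, and model name are all hypothetical.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-20250514"  # placeholder model name

# Hypothetical variants of the same task, from terse to fully explained --
# the "really fully explaining the task" approach described above.
variants = [
    "Summarize this in one sentence.",
    "Summarize this in one sentence. Keep the author's hedges intact.",
    "I want a one-sentence summary. The author hedges a lot, and those "
    "hedges carry real content, so preserve them rather than flattening "
    "the claim. If nothing can be cut safely, say so instead.",
]
sample = "We believe, though we are not yet certain, that the effect is small."

for i, prompt in enumerate(variants):
    reply = client.messages.create(
        model=MODEL,
        max_tokens=200,
        messages=[{"role": "user", "content": f"{prompt}\n\n{sample}"}],
    )
    # Read each output side by side: when a variant misfires, ask the model
    # why, or work out what in the wording caused the misunderstanding.
    print(f"--- variant {i} ---\n{reply.content[0].text}\n")
```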

[Speaker A]: - Relatedly, Michael Soareverix asks, "What do you think of other AI whisperers like Janus," who is someone online who is like almost having, like, experimental interactions with the models, in the way that you've described.

[Speaker B]: - Yeah, I think it's really interesting. So I love to follow and see the work of people who are doing these really fascinating experiments with the model. And I also think sometimes doing these deep dives into the model and how it thinks of itself, how it just interacts in these really unusual cases. I don't know, I find the work extremely interesting. I think it highlights really interesting depths to the models, and in some ways, like, I also think that that community has been one that kind of can hold our feet to the fire, like, if they find things that aren't great in the system prompt or in aspects of the model and its psychology.


Section 14: Safety at the Limit (What If Alignment Were Impossible?)

📝 Section Summary

Faced with a pointed hypothetical, whether Anthropic would stop development if AI alignment proved impossible to solve, Amanda answers firmly: if it were truly proven impossible, nobody would want to keep building dangerous models. She believes Anthropic genuinely cares about safety, and adds that if things became ambiguous (hard, though not clearly impossible), she and many colleagues inside the company would responsibly hold it to a rising standard of evidence that models behave well.

[Speaker A]: - Couple of questions about safety and maybe the larger risks that these models pose. Geoffrey Miller asks, "If it became apparent that AI alignment was impossible to solve, would you trust that Anthropic would stop trying to develop," in his phrase, "artificial superintelligence," however you wanna call it, "and would you have the guts to blow the whistle?"

[Speaker B]: - Yeah. So I guess this feels like a kind of easy version of the question because it's like, if it became evident that it was impossible to align AI models, it's not really in anyone's interest to continue to build more powerful models. I always hope that I'm not just being pollyannish about the organization, but I do feel like Anthropic does genuinely care about making sure that this goes well and that it is done in a way that is very safe and not deploying models that are, like, dangerous.

[Speaker B]: You know, a different, like slightly harder question is, like, well, what about being in a world where just like there's kind of mounting evidence, it's really ambiguous and unclear.

Right, it's not evident in the way that he describes.
Yeah, yeah, it's not just like impossible but something like it's difficult or we're unsure. And in that case, I do like to think that we would be responsible enough to be like, look, as models get more capable, it's kind of like the standard that you have to hold yourself to for showing that those models are behaving well and that you actually have managed to, like, make the models have good values, for example, or behave well in the world is going to increase and to behave responsibly and in line with that.

[Speaker B]: And I think that that is a thing that I think the organization is going to do and a lot of people internally, myself included, will just hold them to that. At least I see that as like part of my job, and I think many people do.


Section 15: Closing (A Book Recommendation and the "Weird" Era of AI)

📝 Section Summary

To close, Amanda recommends the book "When We Cease to Understand the World." She finds the weirdness it depicts in a period of physics discovery, new things constantly emerging and old paradigms failing to guide, strikingly similar to the current atmosphere in AI. She voices a hopeful vision: that people looking back on the present will see it as merely a transitional period on the way to understanding and stability.

[Speaker A]: - And the final one is from Real Stale Coffee. "What is the last book of fiction you read and did you like it?"

[Speaker B]: - The last book that I read was by, I hope I'm getting the pronunciation right, Benjamin Labatut, and it was "When We Cease to Understand the World."

[Speaker A]: - Ah, yes.

[Speaker B]: - And it's a really interesting book that becomes kind of increasingly fictional as it goes on. And I think for people working in AI, it's actually a very interesting book to read because it's hard to capture the sense of how strange it is to just exist in the current period where there's just like, I don't know how to describe it, but it's like new things are happening all of the time and you don't really have, like, prior paradigms that can guide you always.

[Speaker B]: And so it's an interesting book that, you know, because it's more about like physics and quantum mechanics and less actually about the physics and more about basically this notion of people's reaction to it. And I think it's a really interesting book for people in AI to just capture something about the kind of like the present moment and how strange it can seem.

[Speaker B]: But then also, in some ways, it's interesting to like look back on that period and how it must have felt to many of the people involved. And now actually it's a more settled science and, in some ways, maybe the hopeful thing that I have is that at some point in the future people will look back and be like, "Well, you guys were kind of in the dark and trying to like really figure things out, but now we've settled it all and things have gone well."

[Speaker A]: - That'd be nice.

That would be nice. That's the dream.
I found an increasing, I read that as well and I found an increasing sense of like confusion as I read through it as it becomes, it starts off being quite close to the reality and then just sort of becomes untethered as you go on. And I think there's sort of a meta issue there of, again, like reality becoming stranger and stranger and stranger, which is definitely happening to us in the world of AI.

[Speaker B]: - Yeah, though, in the real world, I think that reality became stranger and stranger and stranger and then almost became more understood again. And so, yeah, the hope would be like maybe that would be true of AI. Like, I do think if we can find ways of making this go well, then maybe in the future, we'll just look back on this and be like, "That was a period where things were getting stranger and stranger, and then eventually we actually managed to kind of, we did okay and we formed a good understanding of it," that's the hope.

[Speaker A]: - We're at the weird part right now.

[Speaker B]: - Yes, you can hope that it becomes less weird at some point, but I don't know if it's a fool's hope, but yeah.

[Speaker A]: - Well, and I think that's a nice place to end. So thank you very much for answering all those people's questions.

[Speaker B]: - Thank you for Askell-ing me the questions.