Making AI accessible with Andrej Karpathy and Stephanie Zhan

章节 1:重聚与回溯:OpenAI的早期岁月

📝 本节摘要

在本节中,红杉资本的Stephanie Zhan隆重介绍了Andrej Karpathy。对话以一种独特的怀旧氛围开场——他们惊讶地发现,访谈现场正位于OpenAI最初的办公室旁边。Andrej回忆起创业初期楼下“巧克力工厂”飘来的香味,以及Jensen Huang(黄仁勋)亲自向OpenAI交付第一台NVIDIA DGX-1超级计算机的历史性时刻。Stephanie随后回顾了Andrej从斯坦福到OpenAI、再到Tesla最后回归OpenAI的传奇职业生涯,并戏称他现在终于享受到了“失业”后的终极自由。

[原文] [Stephanie Zhan]: I'm thrilled to introduce our next and final speaker, Andrej Karpathy. I think Karpathy probably needs no introduction; most of us have probably watched his YouTube videos at length. But he's renowned for his research in deep learning. He designed the first deep learning class at Stanford, was part of the founding team at OpenAI, led the computer vision team at Tesla, and is now a mystery man again now that he has just left OpenAI. So we're very lucky to have you here.

[译文] [Stephanie Zhan]: 我非常激动地介绍我们下一位也是最后一位演讲嘉宾,Andrej Karpathy。我想Karpathy大概不需要多作介绍了,我们大多数人可能都长时间观看过他的YouTube视频。但他……他在深度学习研究领域享有盛誉,他在斯坦福(Stanford)设计了第一门深度学习课程,是OpenAI的创始团队成员,曾领导Tesla的计算机视觉团队,而现在随着他刚刚离开OpenAI,他又再次成为了一位“神秘人”。所以我们非常幸运能邀请你来到这里。

[原文] [Stephanie Zhan]: Andrej, you've been such a dream speaker, and so we're excited to have you and Stephanie close out the day. Thank you. Andrej's first reaction as we walked up here was "oh my God" to his picture; it's very intimidating. I don't know what year it was taken, but he's impressed. Okay, amazing. Andrej, thank you so much for joining us today, and welcome back.

[译文] [Stephanie Zhan]: Andre,你一直是我们梦寐以求的演讲嘉宾,所以我们很兴奋能由你和我(Stephanie)来为今天做结。谢谢。Andrej走上台时的第一反应是对着他的照片惊呼“天哪”,那张照片看起来非常令人敬畏,我不知道是哪一年拍的,但他显然被震住了。好的,太棒了。Andre,非常感谢你今天加入我们,欢迎回来。

[原文] [Andrej Karpathy]: Yeah, thank you.

[译文] [Andrej Karpathy]: 是的,谢谢。

[原文] [Stephanie Zhan]: Fun fact that most people don't actually know: how many folks here know where OpenAI's original office was? That's amazing. Nick, I'm going to guess... right here, right here, on the opposite side of our San Francisco office, where actually many of you guys were just in huddles. So this is fun for us, because it brings us back to our roots, back when I first started at Sequoia and when Andrej first started co-founding OpenAI.

[译文] [Stephanie Zhan]: 有个事实大多数人实际上并不知道。在座有多少人知道OpenAI最初的办公室在哪里?太棒了。Nick,我猜就在这儿,就在这儿,就在这儿,就在我们旧金山办公室的对面,实际上你们很多人刚才就在那里开小组会。这对我们来说很有趣,因为它把我们带回了根源,带回了我刚开始在红杉(Sequoia)工作以及Andre刚开始联合创立OpenAI的时候。

[原文] [Stephanie Zhan]: Andrej, in addition to living out the Willy Wonka working-atop-a-chocolate-factory dream, what were some of your favorite moments working from here?

[译文] [Stephanie Zhan]: 嗯,Andre,除了实现了那种在巧克力工厂顶上工作的“威利·旺卡(Willy Wonka)”式的梦想之外,你在这里工作时最喜欢的时刻有哪些?

[原文] [Andrej Karpathy]: Yes, OpenAI was right there, and this was the first office after, I guess, Greg's apartment, which maybe doesn't count. So yeah, we spent maybe two years here, and the chocolate factory was just downstairs, so it always smelled really nice. And I guess the team was, you know, 10, 20 plus people.

[译文] [Andrej Karpathy]: 是的,OpenAI(当时的办公室)就在那边。我想这是继Greg公寓之后的第一个办公室——公寓也许不算数。所以是的,我们在这里度过了大概两年,巧克力工厂就在楼下,所以闻起来总是很香。而且,我想当时的团队大概有10到20多人。

[原文] [Andrej Karpathy]: And we had a few very fun episodes here. One of them was alluded to by Jensen at GTC, which happened just yesterday or two days ago. Jensen was describing how he brought the first DGX and how he delivered it to OpenAI, so that happened right there. That's where we all signed it; it's in the room over there.

[译文] [Andrej Karpathy]: 我们在这里经历了一些非常有趣的插曲。其中一个Jensen(黄仁勋)在GTC大会上刚刚提到过,就在昨天还是前两天。Jensen描述了他如何带来第一台DGX,以及如何将其交付给OpenAI,这事就发生在那里。那是我们所有人签名的地方,它就在那边的房间里。

[原文] [Stephanie Zhan]: So Andrej needs no introduction, but I wanted to give a little bit of backstory on some of his journey to date. As Sonya had introduced, he was trained by Geoff Hinton and then Fei-Fei. You know, his first claim to fame was his deep learning course at Stanford. He co-founded OpenAI back in 2015, and in 2017 he was poached by Elon.

[译文] [Stephanie Zhan]: 嗯,虽然Andre不需要介绍,但我想提供一些关于他迄今为止旅程的背景故事。正如Sonia介绍的那样,他师从Jeff Hinton,然后是Fay(李飞飞)。你知道,他的成名作是在斯坦福开设的深度学习课程。他在2015年联合创立了OpenAI,然后在2017年被Elon(马斯克)挖走了。

[原文] [Stephanie Zhan]: I remember this very, very clearly. For folks who don't remember the context: Elon had just transitioned through six different Autopilot leaders, each of whom lasted six months. I remember when Andrej took this job, I thought, congratulations and good luck. Not too long after that, you know, he went back to OpenAI and has been there for the last year. Now, unlike all the rest of us today, he is basking in the ultimate glory of freedom, in time and in responsibility.

[译文] [Stephanie Zhan]: 我对此记忆犹新。对于那些不了解或不记得当时背景的人来说,Elon当时刚刚换了六位不同的Autopilot(自动驾驶)负责人,每位都只干了六个月。我记得当Andre接下这份工作时,我心里想的是“恭喜,祝你好运”。没过多久,你知道他又回到了OpenAI,并在那里度过了过去的一年。现在,与我们在座的所有人不同,他正沐浴在完全的时间和责任自由的终极荣耀之中。

[原文] [Stephanie Zhan]: And so we're really excited to see what you have to share today. A few things that I appreciate the most about Andrej are that he is an incredible, fascinating futurist thinker, he is a relentless optimist, and he's a very practical builder. And so I think he'll share some of his insights around that today. So, to kick things off:

[译文] [Stephanie Zhan]: 所以我们真的很期待看到你今天会分享什么。我最欣赏Andre的几点是:他是一位极其迷人的未来主义思想家,他是一位不懈的乐观主义者,同时也是一位非常务实的建设者。所以我认为他今天会围绕这些方面分享一些见解,以此来开启我们的对话。


章节 2:AGI范式转移:迈向LLM操作系统(LLM OS)

📝 本节摘要

话题转向未来愿景。Stephanie指出AGI(通用人工智能)已从遥不可及变得触手可及。Andrej对此表示认同,并提出了他著名的“LLM OS”(大模型操作系统)概念:他将大语言模型(Transformer)比作新时代的CPU,而文本、图像、音频等模态则是连接到CPU的“外设”。他预言未来将充满各种专门针对经济各个细分领域(nooks and crannies)定制的智能体(Agents),而当下的任务是构建并引导这个多智能体的未来向好的方向发展。

[原文] [Stephanie Zhan]: AGI, even seven years ago, seemed like an incredibly impossible task to achieve even in the span of our lifetimes. Now it seems within sight. What is your view of the future over the next N years?

[译文] [Stephanie Zhan]: 嗯,哪怕在七年前,AGI(通用人工智能)看起来还像是一项在我们有生之年都极其不可能完成的任务。而现在,它似乎已近在咫尺。你对未来n年的看法是什么?

[原文] [Andrej Karpathy]: Yes, so I think you're right. I think a few years ago, I sort of felt like, with AGI, it wasn't clear how it was going to happen. It was very academic, and you would think about different approaches. And now I think it's very clear, and there's a lot of space, and everyone is trying to fill it, so there's a lot of optimization.

[译文] [Andrej Karpathy]: 嗯,是的,我觉得你是对的。我想几年前,我确实觉得AGI……嗯,当时还不清楚它将如何实现,它非常……算是非常学术化,你会思考各种不同的路径。而现在,我认为路径已经非常清晰了,而且有很大的空间,每个人都在试图填补这个空间。所以,现在有大量的优化工作正在进行。

[原文] [Andrej Karpathy]: And I think, roughly speaking, the way things are happening is that everyone is trying to build what I refer to as kind of like this LLM OS. Basically, I like to think of it as an operating system: you have to get a bunch of peripherals that you plug into this new CPU, or something like that. The peripherals are, of course, text, images, audio, and all the modalities, and then you have a CPU, which is the LLM Transformer itself. And then it's also connected to all the Software 1.0 infrastructure that we've already built up for ourselves.

[译文] [Andrej Karpathy]: 嗯,我想大体来说,目前的发展趋势是——大家都在试图构建我称之为“LLM OS(大模型操作系统)”的东西。基本上,我喜欢把它看作一个操作系统。你需要有一堆基本上像是“外设”的东西,插到这个新的CPU或类似的东西上。当然,这些外设就是文本、图像、音频以及所有的模态。然后你有一个CPU,也就是LLM Transformer本身。然后它还连接着我们已经为自己建立的所有“软件1.0”基础设施。
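Karpathy's LLM-OS analogy (modality "peripherals" plugged into an LLM "CPU", wired to Software 1.0 tools) can be sketched as a toy data structure. Every name here (`LLMOS`, `plug_in`, `register_tool`) is a hypothetical illustration, not a real API:

```python
# Toy sketch of the "LLM OS" analogy; all names are illustrative, not a real API.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class LLMOS:
    # Peripherals turn raw modality data into tokens for the "CPU".
    peripherals: Dict[str, Callable[[bytes], str]] = field(default_factory=dict)
    # Tools are the existing Software 1.0 infrastructure the model can call.
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def plug_in(self, modality: str, encoder: Callable[[bytes], str]) -> None:
        self.peripherals[modality] = encoder

    def register_tool(self, name: str, fn: Callable[..., object]) -> None:
        self.tools[name] = fn

    def run(self, modality: str, payload: bytes) -> str:
        tokens = self.peripherals[modality](payload)
        # Stand-in for the LLM Transformer "CPU": just wrap the tokens.
        return f"llm({tokens})"

osys = LLMOS()
osys.plug_in("text", lambda b: b.decode("utf-8"))
osys.register_tool("calculator", lambda x, y: x + y)
result = osys.run("text", b"hello")          # "llm(hello)"
tool_out = osys.tools["calculator"](2, 3)    # 5
```

The point of the shape: modalities and tools are pluggable around one central model, the same way drivers and applications are pluggable around an OS kernel.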

[原文] [Andrej Karpathy]: And so I think everyone is kind of trying to build something like that, and then make it available as something that's customizable to all the different nooks and crannies of the economy. So I think that's roughly what everyone is trying to build out, and what we sort of also heard about earlier today.

[译文] [Andrej Karpathy]: 所以我认为每个人都在某种程度上试图构建这样的东西,然后……嗯,让它成为一种可定制的产品,适用于经济中所有的不同角落和细分领域。所以我想这大致就是每个人都在努力构建的东西,也是我们今天早些时候听到的一些内容。

[原文] [Andrej Karpathy]: So I think that's roughly where it's headed: we can bring up and down these relatively, you know, self-contained agents that we can give high-level tasks to and specialize in various ways. So yeah, I think it's going to be very interesting and exciting, and it's not just one agent, it's many agents. What does that look like? And if that view of the future is true, how should we all be living our lives differently?

[译文] [Andrej Karpathy]: 嗯,所以我认为……大致的方向是,我们可以启动和关闭这些相对……你知道,相对独立的智能体(Agents),我们可以给它们下达高层级的任务,并让它们在不同方面进行专业化。所以是的,我认为这将会非常有趣和令人兴奋。而且这不仅仅是一个智能体,而是许多智能体。那会是什么样子?如果这种未来的愿景成真,我们所有人的生活方式应该发生怎样的改变?

[原文] [Andrej Karpathy]: I don't know. I guess we have to try to build it, influence it, make sure it's good, and just try to make sure it turns out well.

[译文] [Andrej Karpathy]: 嗯,我不……我不知道。我想我们必须尝试去构建它、影响它,确保它是好的,并且……嗯,就是试着……试着确保它的结果是好的。


章节 3:市场格局:巨头垄断下的应用生态与开源定义

📝 本节摘要

本节探讨了创业者最为关心的“OpenAI阴影”问题。Andrej通过操作系统的类比,指出即便巨头拥有“默认应用”(如浏览器),依然无法覆盖经济体系中的每一个细分角落,这正是初创企业的机会所在——如同早期的iPhone应用生态,虽然混乱但充满潜力。随后,他深入辨析了“开源模型”的定义,尖锐地指出目前许多所谓的开源模型(如Llama)实为“开放权重”的二进制文件。他强调,真正的开源应当包含数据与训练代码,只有这样开发者才能在不造成能力退化(Regression)的前提下进行有效微调。

[原文] [Stephanie Zhan]: So now that you're a free independent agent, I want to address the elephant in the room, which is that OpenAI is dominating the ecosystem, and most of our audience here today are founders who are trying to carve out a little niche, praying that OpenAI doesn't take them out overnight. Where do you think opportunities exist for other players to build new independent companies, versus what areas do you think OpenAI will continue to dominate, even as its ambition grows?

[译文] [Stephanie Zhan]: 既然你现在是一个自由独立的“代理人”,我想谈谈那个大家避而不谈却显而易见的问题(elephant in the room),那就是OpenAI正在主导整个生态系统。今天在座的大多数听众都是创业者,他们正试图开辟一个小小的细分市场,并祈祷OpenAI不会一夜之间把他们干掉。你认为其他参与者在哪些领域有机会建立新的独立公司?又有哪些领域是即便OpenAI野心不断膨胀,也依然会继续主导的?

[原文] [Andrej Karpathy]: Yes, so my high-level impression is that basically OpenAI is trying to build out this LLM OS, and I think, as we heard earlier today, it is trying to develop this platform on top of which you can position different companies in different verticals. Now, I think the OS analogy is also really interesting, because when you look at something like Windows, these are also operating systems, and they come with a few default apps; like, a browser comes with Windows, right? You can use the Edge browser. And so I think in the same way, OpenAI or any of the other companies might come up with a few default apps, quote unquote, but it doesn't mean that you can't have different browsers running on it, just like you can have different chat agents running on that infrastructure.

[译文] [Andrej Karpathy]: 嗯,是的,我的高层级印象基本上是,OpenAI正试图构建这个“LLM OS(大模型操作系统)”。正如我们今天早些时候听到的那样,他们试图开发这个平台,而在其之上可以定位不同的公司和不同的垂直领域。现在,我认为操作系统的类比非常有趣,因为当你观察像Windows之类的东西时,它们也是操作系统,它们自带一些默认应用,比如Windows自带浏览器,对吧?你可以使用Edge浏览器。所以我认为同样地,OpenAI或其他任何公司可能会推出一些“默认应用”,但这并不意味着你不能有运行在其上的其他浏览器,就像你可以有运行在该基础设施上的不同聊天智能体一样。

[原文] [Andrej Karpathy]: And so there will be a few default apps, but there will also potentially be a vibrant ecosystem of all kinds of apps that are fine-tuned to all the different nooks and crannies of the economy. And I really like the analogy of the early iPhone apps and what they looked like: they were all kind of like jokes, and it took time for that to develop. And I absolutely agree that we're going through the same thing right now. People are trying to figure out: what is this thing good at? What is it not good at? How do I work with it? How do I program with it? How do I debug it? How do I, you know, actually get it to perform real tasks?

[译文] [Andrej Karpathy]: 所以,会有一些默认应用,但也可能会有一个充满活力的生态系统,包含各种各样的应用,它们针对经济中所有不同的角落和细分领域进行了微调。我非常喜欢早期iPhone应用的那个类比,看看它们当时的样子,它们都有点像是玩笑,它们的发展需要时间。我认为绝对如此,我同意我们现在正在经历同样的过程。人们正试图弄清楚这东西擅长什么?不擅长什么?我该如何使用它?如何用它编程?如何调试它?甚至只是……你知道,如何真正让它执行实际任务?

[原文] [Andrej Karpathy]: And what kind of oversight? Because it's quite autonomous, but not fully autonomous, so what does the oversight look like? What does the evaluation look like? There are many things to think through, just to understand, sort of, the psychology of it. And I think that's what's going to take some time: figuring out exactly how to work with this infrastructure. So I think we'll see that over the next few years.

[译文] [Andrej Karpathy]: 还有什么样的监督机制?因为它相当自主,但又不是完全自主的。所以监督看起来应该是什么样的?评估看起来是什么样的?有很多事情需要思考,还要去理解它的某种“心理学”。我认为这需要一些时间来弄清楚究竟该如何与这个基础设施协同工作。所以我认为在接下来的几年里我们会看到这些发展。

[原文] [Stephanie Zhan]: So the race is on right now with LLMs: OpenAI, Anthropic, Mistral, Llama, Gemini, the whole ecosystem of open source models, and now a whole long tail of small models. How do you foresee the future of the ecosystem playing out?

[译文] [Stephanie Zhan]: 现在的LLM(大语言模型)竞赛已经开始了,OpenAI、Anthropic、Mistral、Llama、Gemini……整个开源模型生态系统,现在还有长尾的一大堆小模型。你预见这个生态系统的未来会如何发展?

[原文] [Andrej Karpathy]: Yeah, so again, I think the open source, sorry, the operating systems analogy is interesting, because we basically have an oligopoly of a few proprietary systems, like, say, Windows, macOS, etc., and then we also have Linux, and Linux has an infinity of distributions. And so I think maybe it's going to look something like that.

[译文] [Andrej Karpathy]: 是的,所以我再次认为开源的类比……抱歉,是操作系统的类比很有趣。因为我们基本上有少数几个专有系统的寡头垄断,比如Windows、Mac OS等,然后我们也有Linux。Linux有无数的发行版。所以我认为未来可能看起来会有点像那样。

[原文] [Andrej Karpathy]: I also think we have to be careful with the naming, because a lot of the ones that you listed, like Llama and Mistral, I wouldn't actually say are open source, right? It's kind of like tossing over a binary for an operating system: you can kind of work with it, and it's useful, but it's not fully useful, right? And there are a number of what I would call fully open source LLMs, so there are, you know, the Pythia models, LLM360, OLMo, etc., and they're fully releasing the entire infrastructure that's required to compile the operating system, right, to train the model, from the data to gathering the data, etc.

[译文] [Andrej Karpathy]: 我也认为我们在命名上必须小心,因为你列出的很多模型,比如Llama、Mistral,我实际上不会说它们是“开源”的,对吧?这有点像是给你扔过来一个操作系统的二进制文件(Binary)。你知道,你可以用它,它是有用的,但……但这并不是完全有用的,对吧?还有一些我认为是完全开源的LLM,比如Pythia模型、LLM360、OLMo等。它们完全发布了编译这个“操作系统”所需的所有基础设施,也就是从数据训练模型、收集数据等过程。

[原文] [Andrej Karpathy]: And so when you're just given a binary, it's much better, of course, because you can fine-tune the model, which is useful. But also, I think it's subtle, but you can't fully fine-tune the model, because the more you fine-tune the model, the more it's going to start regressing on everything else. And so what you actually really want to do, for example if you want to add a capability and not regress the other capabilities, is that you may want to train on some kind of mixture of the previous dataset distribution and the new dataset distribution, because you don't want to regress the old distribution; you just want to add knowledge.

[译文] [Andrej Karpathy]: 所以当你只拿到一个二进制文件时,当然这比没有好,因为你可以微调模型,这很有用。但我认为其中也有微妙之处,你无法“完全”微调模型。因为你越是微调模型,它就越可能在其他所有方面开始退化(Regressing)。所以你真正想做的,例如如果你想增加某种能力,同时又不让其他能力退化,你可能需要在旧数据集分布和新数据集分布的混合体上进行训练。因为你不想让旧的分布退化,你只是想增加知识。
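The mixture idea above, training on a blend of the old and new dataset distributions so added knowledge doesn't regress old capabilities, can be sketched as a sampling routine. The function name and the 25% mixing ratio are illustrative assumptions:

```python
# Sketch: draw each training example from a mixture of the old (pretraining)
# and new (added-domain) data distributions. mixture_batch and the ratio
# are illustrative, not a real training recipe.
import random

def mixture_batch(old_data, new_data, new_fraction=0.1, batch_size=8, seed=0):
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        # With probability new_fraction, draw from the new distribution;
        # otherwise keep rehearsing the old one so it isn't forgotten.
        pool = new_data if rng.random() < new_fraction else old_data
        batch.append(rng.choice(pool))
    return batch

old = [("old", i) for i in range(1000)]  # stand-in for the previous dataset
new = [("new", i) for i in range(100)]   # stand-in for the new knowledge
batch = mixture_batch(old, new, new_fraction=0.25, batch_size=8)
```

Each example is drawn from the old pool with probability 0.75 here; as Karpathy notes, doing this at all requires the dataset and training loop, which is exactly what an open-weights release withholds.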

[原文] [Andrej Karpathy]: And if you're just given the weights, you can't do that. You actually need the training loop, you need the dataset, etc., so you are actually constrained in how you can work with these models. And again, I think it's definitely helpful, but I think we need slightly better language for it, almost. So there are open weights models, open source models, and then proprietary models, I guess, and that might be the ecosystem. And yeah, probably it's going to look very similar to the ones that we have today, and hopefully you'll continue to help build some of that out.

[译文] [Andrej Karpathy]: 如果只给你权重,你实际上做不到这一点。你需要训练循环(Training Loop),你需要数据集等等。所以你在如何使用这些模型上实际上是受限的。再次强调,我认为这绝对有帮助,但我认为我们需要为此找到稍微更准确的措辞。所以我猜会有“开放权重模型(Open Weights Models)”、“开源模型(Open Source Models)”,然后是“专有模型(Proprietary Models)”,这可能就是未来的生态系统。是的,它可能看起来与我们今天拥有的非常相似,希望你能继续帮助构建其中的一部分。


章节 4:算力壁垒:规模法则(Scale)与基础设施的隐形门槛

📝 本节摘要

在本节中,Stephanie抛出了另一个关键问题:规模(Scale)是否是决定AI成败的唯一因素?Andrej坦诚地表示,规模确实是“第一主成分”,是训练大模型的基础门槛。然而,他深入揭示了隐藏在资金和硬件背后的真正挑战——基础设施的工程难度。他描述了管理数万张GPU集群的恐怖细节:硬件随机故障、分布式优化的复杂性以及相关人才的极度稀缺。他强调,仅仅给某人一堆GPU和钱并不意味着就能造出大模型,基础设施、算法和数据的综合专业能力才是关键。

[原文] [Stephanie Zhan]: So I'd love to address the other elephant in the room, which is scale. Simplistically, it seems like scale is all that matters: scale of data, scale of compute. And therefore the large research labs and large tech giants have an immense advantage today. What is your view of that? Is that all that matters, and if not, what else does?

[译文] [Stephanie Zhan]: 我想谈谈房间里的另一个“大象”(显而易见却被回避的问题),那就是规模(Scale)。简单来说,似乎规模就是一切——数据的规模、计算的规模,因此大型研究实验室和大型科技巨头在今天拥有巨大的优势。你对此有何看法?规模就是一切吗?如果不是,还有什么重要因素?

[原文] [Andrej Karpathy]: So I would say scale is definitely number one. I do think there are details there to get right, and I think, you know, a lot also goes into the dataset preparation and so on, making it very good, clean, etc. That matters a lot. These are all sort of compute-efficiency gains that you can get. So there's the data, the algorithms, and then, of course, the training of the model and making it really large.

[译文] [Andrej Karpathy]: 嗯,我会说规模绝对是第一位的。但我确实认为其中有很多细节需要处理好。你知道,很多工作也投入到了数据集的准备等方面,使其非常优质、干净等等,这非常重要。这些都能带来某种计算效率的收益。所以,有数据、算法,当然还有模型的训练以及将其做得非常大。

[原文] [Andrej Karpathy]: So I think scale will be the primary determining factor, like the first principal component of things, for sure, but there are many of the other things that you need to get right. It's almost like the scale sets some kind of a speed limit, but you do need some of the other things. It's like, if you don't have the scale, then you fundamentally just can't train some of these massive models.

[译文] [Andrej Karpathy]: 所以我认为规模将是主要的决定性因素,肯定就像是事物的第一主成分(First Principal Component)。但还有许多……许多其他事情你需要做对。这几乎就像是规模设定了某种速度限制,但你确实还需要其他东西。不过这就好比,如果你没有规模,你就从根本上无法训练这些大规模模型。

[原文] [Stephanie Zhan]: If you are going to be training models... If you're just going to be doing fine-tuning and so on, then I think maybe less scale is necessary, but we haven't really seen that fully play out just yet. Can you share more about some of the ingredients that you think also matter, maybe lower in priority, behind scale?

[译文] [Stephanie Zhan]: 如果你要训练模型的话……如果你只是做微调之类的,我想可能不需要那么大的规模,但这方面我们还没完全看到结果。你能分享更多你认为同样重要、但优先级排在规模之后的“配料”吗?

[原文] [Andrej Karpathy]: Yeah, so the first thing I think is that you can't just train these models if you're just given the money and the scale. It's actually still really hard to build these models, and part of it is that the infrastructure is still so new and still being developed, not quite there. Training these models at scale is extremely difficult and is a very complicated distributed optimization problem, and the talent for this is fairly scarce right now.

[译文] [Andrej Karpathy]: 嗯,是的。所以我认为第一件事是,即便给了你钱和规模,你也无法直接训练出这些模型。构建这些模型实际上仍然非常困难。部分原因在于基础设施仍然非常新,还在开发中,尚未完全成熟。但在大规模下训练这些模型极其困难,这是一个非常复杂的分布式优化问题,而且实际上,这方面的人才目前相当稀缺。

[原文] [Andrej Karpathy]: And it basically turns into this insane thing running on tens of thousands of GPUs, all of them failing at random at different points in time, and so instrumenting that and getting that to work is actually an extremely difficult challenge. GPUs were not intended for 10,000-GPU workloads until very recently, and so I think a lot of the infrastructure is sort of creaking under that pressure, and we need to work through that.

[译文] [Andrej Karpathy]: 它基本上变成了一件疯狂的事情:在数万个GPU上运行,所有这些GPU都会在不同时间点随机发生故障。所以,对其进行监控并使其正常运转实际上是一个极其困难的挑战。直到最近,GPU的设计初衷并不是为了应对上万个GPU协同工作的负载。所以我认为很多基础设施在那种压力下都有点不堪重负(Creaking under pressure),我们需要解决这个问题。
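A toy model of the failure problem Karpathy describes: at tens-of-thousands-of-GPU scale something is always dying, so the training loop has to checkpoint and roll back. The failure schedule below is simulated and the loop is a schematic, not a real trainer:

```python
# Toy fault-tolerant training loop: progress survives node failures only
# through periodic checkpoints. Failures at ticks 120 and 480 are simulated.

def train(total_steps, checkpoint_every, failures):
    step, checkpoint, restarts, tick = 0, 0, 0, 0
    while step < total_steps:
        tick += 1
        if tick in failures:      # a "GPU" dies mid-run
            step = checkpoint     # roll back to the last saved state
            restarts += 1
            continue
        step += 1
        if step % checkpoint_every == 0:
            checkpoint = step     # persist model state
    return step, restarts

steps, restarts = train(total_steps=1000, checkpoint_every=50,
                        failures={120, 480})
# Each failure costs only the work done since the previous checkpoint.
```

In a real cluster the "instrumenting" Karpathy mentions is exactly this machinery (failure detection, checkpointing, restart orchestration) multiplied across every layer of the stack.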

[原文] [Andrej Karpathy]: But right now, if you're just giving someone a ton of money or a ton of scale or GPUs, it's not obvious to me that they can just produce one of these models, which is why, you know, it's not just about scale. You actually need a ton of expertise, on the infrastructure side, the algorithms side, and then the data side, and you need to be careful with that. So I think those are the major components.

[译文] [Andrej Karpathy]: 但在目前,如果你只是给某人一大笔钱、大规模资源或GPU,这并不意味着他们就能直接造出这种模型,这一点对我来说并不显而易见。这就是为什么……你知道,这不仅仅关于规模。你实际上需要大量的专业知识,包括基础设施方面、算法方面,然后是数据方面,并且要小心处理。所以我认为这些是主要的组成部分。


章节 5:算法演进:模型统一、能效瓶颈与架构重构

📝 本节摘要

在本节中,Stephanie询问了Andrej目前最关注的研究挑战。Andrej首先指出了算法层面的一个“奇怪”现状:扩散模型(Diffusion)和自回归模型(Autoregressive)目前泾渭分明,他认为未来两者必将融合。随后,他通过对比人脑(约20瓦)与超级计算机(兆瓦级)的巨大能效差距,引出了硬件架构的变革需求。他预测未来的计算架构将打破传统的冯·诺依曼瓶颈,通过降低精度(甚至低至1.58位)和利用稀疏性(Sparsity),实现百万倍的效率提升。

[原文] [Stephanie Zhan]: The ecosystem is moving so quickly; even some of the challenges we thought existed a year ago are being solved more and more today: hallucinations, context windows, multimodal capabilities, inference getting better, faster, cheaper. What are the LLM research challenges today that keep you up at night? What do you think are meaty enough problems, but also solvable problems, that we can continue to go after?

[译文] [Stephanie Zhan]: 生态系统发展得太快了,甚至一些我们一年前认为存在的挑战,如今也正在被越来越多地解决。比如幻觉问题、上下文窗口、多模态能力、推理变得更好、更快、更便宜。那么,今天还有哪些LLM(大语言模型)的研究挑战让你夜不能寐?你认为有哪些问题既足够棘手(meaty),又是我们能够继续攻克的可解之题?

[原文] [Andrej Karpathy]: So I would say, on the algorithms side, one thing I'm thinking about quite a bit is this distinct split between diffusion models and autoregressive models. They're both ways of representing probability distributions, and it just turns out that different modalities are apparently a good fit for one of the two. I think there's probably some space to unify them, or to connect them in some way.

[译文] [Andrej Karpathy]: 嗯,我想说在算法方面,我思考得比较多的一点是——扩散模型(Diffusion Models)和自回归模型(Autoregressive Models)之间存在这种明显的割裂。它们都是表示概率分布的方式,结果表明不同的模态似乎分别适合其中一种。我认为可能存在某种空间将它们统一起来,或者以某种方式将它们连接起来。

[原文] [Andrej Karpathy]: And also get some best of both worlds, or figure out how we can get a hybrid architecture, and so on. So it's just odd to me that we have two separate points in the space of models, and they're both extremely good, and it just feels wrong to me that there's nothing in between. So I think we'll see that carved out, and I think there are interesting problems there.

[译文] [Andrej Karpathy]: 而且……嗯,各取所长,或者弄清楚我们如何能得到一种混合架构等等。对我来说这真的很奇怪,我们在模型空间里似乎有两个分离的“奇点”,它们都极好,但我总觉得中间空无一物是不对劲的。所以我认为我们会看到那个中间地带被开发出来,我认为那里有许多有趣的问题。
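The two "separate points in model space" can be contrasted as sampling loops: autoregressive models generate one token at a time, left to right, while diffusion models refine a whole sample in parallel over denoising steps. The stand-in functions below are toys, not trained networks:

```python
# Schematic contrast of the two families. predict_next and denoise are toy
# stand-ins for trained networks.

def sample_autoregressive(predict_next, prompt, n_tokens):
    seq = list(prompt)
    for _ in range(n_tokens):
        seq.append(predict_next(seq))  # condition on everything so far
    return seq

def sample_diffusion(denoise, noise, n_steps):
    x = noise
    for t in reversed(range(n_steps)):
        x = denoise(x, t)              # refine all positions in parallel
    return x

tokens = sample_autoregressive(lambda s: len(s) % 3, [0], 4)
sample = sample_diffusion(lambda x, t: [v / 2 for v in x], [8.0, 4.0], 3)
```

The structural difference (sequential conditioning versus parallel iterative refinement) is the gap a hybrid architecture would have to bridge.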

[原文] [Andrej Karpathy]: And then the other thing that maybe I would point to is that there's still a massive gap in just the energetic efficiency of running all this stuff. So my brain is 20 watts, roughly. Jensen was just talking at GTC about, you know, the massive supercomputers that they're going to be building; now these numbers are in megawatts, right? And so maybe you don't need all that to run, like, a brain. I don't know how much you need exactly, but I think it's safe to say we're probably off by a factor of a thousand to a million, somewhere in there, in terms of the efficiency of running these models.

[译文] [Andrej Karpathy]: 然后我也许会指出的另一件事是,在运行所有这些东西的能源效率上仍然存在巨大的差距。我的大脑大约只有20瓦。Jensen(黄仁勋)刚刚在GTC上谈论他们即将建造的大型超级计算机,这些数字都是兆瓦(Megawatts)级别的,对吧?也许像大脑那样运行并不需要那么多能量。我不知道确切需要多少,但我认为可以肯定地说,在运行这些模型的效率方面,我们可能偏离了1000倍到100万倍。
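The "factor of a thousand to a million" is easy to sanity-check: dividing an illustrative cluster power draw by the brain's roughly 20 W lands inside that range. The 10 MW figure is an assumption made for the sake of arithmetic, not a number from the talk:

```python
# Back-of-envelope check of the efficiency gap Karpathy cites.
brain_watts = 20          # rough power draw of a human brain
cluster_watts = 10e6      # 10 MW, an assumed figure for a large GPU cluster

gap = cluster_watts / brain_watts
# 10 MW / 20 W = 500,000x, squarely inside "a thousand to a million".
```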

[原文] [Andrej Karpathy]: And I think part of it is just because the computers we've designed, of course, are just not a good fit for this workload. I think NVIDIA GPUs are a good step in that direction, in the sense that you need extremely high parallelism. We don't actually care about sequential computation that is data-dependent in some way; we just need to blast the same algorithm across many different array elements, or something; you can think about it that way.

[译文] [Andrej Karpathy]: 我认为部分原因仅仅是因为我们设计的计算机当然并不适合这种工作负载。我认为Nvidia GPU在这方面迈出了很好的一步,就好像……你需要极高的并行度。我们实际上并不在乎那种某种程度上依赖数据的顺序计算,我们只是有这些……我们只需要把相同的算法“轰炸”到许多不同的数组元素上,你可以这样理解。

[原文] [Andrej Karpathy]: So I would say number one is just adapting the computer architecture to the new data workflows. Number two is pushing on a few things that we're currently seeing improvements on. So the first, maybe, is precision: we're seeing precision come down from what originally was 64 bits for a double; we're now down to, I don't know, 4, 5, 6, or even 1.58 bits, depending on which papers you read. So I think precision is one big lever for getting a handle on this, and then the second, of course, is sparsity, so that's also another big delta.

[译文] [Andrej Karpathy]: 所以我想说,第一点就是调整计算机架构以适应新的数据工作流。第二点是推动我们目前看到正在改进的一些事情。所以第一可能是精度(Precision),我们看到精度已经从最初的双精度64位下降了,现在降到了……我不知道,4位、5位、6位,甚至1.58位,取决于你读的是哪篇论文。所以我认为精度是掌控这一问题的一个重要杠杆。当然第二个就是稀疏性(Sparsity),那也是另一个巨大的增量。
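The oddly specific "1.58 bits" refers to ternary weights: three possible values carry log2(3) ≈ 1.585 bits of information each. A minimal ternary quantizer, with an arbitrary threshold chosen for illustration, might look like this sketch:

```python
# Ternary quantization sketch: weights in {-1, 0, +1} carry log2(3) ≈ 1.585
# bits each, hence "1.58-bit" models. The 0.5 threshold is arbitrary.
import math

bits_per_weight = math.log2(3)   # ≈ 1.585

def quantize_ternary(weights, threshold=0.5):
    scale = max(abs(w) for w in weights)   # per-tensor scale factor
    out = []
    for w in weights:
        r = w / scale
        out.append(0 if abs(r) < threshold else (1 if r > 0 else -1))
    return out, scale

q, scale = quantize_ternary([1.8, -0.1, -2.0, 0.4])
# q == [1, 0, -1, 0], scale == 2.0; dequantize as q[i] * scale
```

The zeros produced by the threshold also hint at the sparsity lever Karpathy mentions next: a ternary weight of 0 is a connection that costs nothing to "compute".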

[原文] [Andrej Karpathy]: I would say your brain is not always fully activated, and so sparsity, I think, is another big lever. But then the last lever: I also feel like the von Neumann architecture of computers, and how they're built, where you're shuttling data in and out and doing a ton of data movement between memory and, you know, the cores that are doing all the compute, this is all broken as well, kind of. It's not how your brain works, and that's why it's so efficient. So I think it should be a very exciting time in computer architecture.

[译文] [Andrej Karpathy]: 我想说你的大脑并不是一直处于全激活状态,所以我认为稀疏性是另一个大杠杆。但最后一个杠杆,我也觉得计算机的冯·诺依曼架构(Von Neumann architecture)以及它们的构建方式——你在内存和执行计算的核心之间不断地搬运数据,进行大量的数据移动——这也有点坏掉了(broken)。这不是你大脑的工作方式,这也是大脑如此高效的原因。所以我认为这应该是计算机架构领域一个非常激动人心的时刻。


章节 6:组织效能:Elon Musk的硬核管理哲学

📝 本节摘要

在本节中,Stephanie引用了一个关于“划船队”的笑话,生动地引出了Elon Musk的管理哲学。Andrej详细剖析了Elon极具个人色彩的硬核管理风格:他坚持组建小而精的技术团队,极度反感非技术类的中间管理层。Andrej特别提到了Elon对办公室“氛围(Vibes)”的重视,以及他越过高管直接与工程师对接的习惯。最令人印象深刻的是Elon解决瓶颈的方式——当团队因GPU短缺而受阻时,他会动用“大锤(Large Hammer)”,比如直接给黄仁勋打电话,以雷霆手段消除障碍。

[原文] [Stephanie Zhan]: Okay, switching gears a little bit. You've worked alongside many of the greats of our generation: Sam and Greg from OpenAI and the rest of the OpenAI team, Elon Musk. Who here knows the joke about the rowing team, the American team versus the Japanese team? Okay, great, so this will be a good one. Elon shared this at Base Camp, and I think it reflects a lot of his philosophy around how he builds cultures and teams.

[译文] [Stephanie Zhan]: 好的,稍微转换一下话题。你曾与咱们这代许多伟大的人物并肩工作过,比如OpenAI的Sam、Greg以及OpenAI团队的其他成员,还有Elon Musk。在座有谁知道那个关于“划船队”——美国队对阵日本队——的笑话吗?好的,太棒了,这会是个好例子。Elon在Al LS大本营分享过这个故事,我认为这很大程度上反映了他关于如何构建文化和团队的哲学。

[原文] [Stephanie Zhan]: So you have two teams. The Japanese team has four rowers and one steerer, and the American team has four steerers and one rower. Can anyone guess, when the American team loses, what do they do? Shout it out. Exactly: they fire the rower. Elon shared this example, I think, as a reflection of how he thinks about hiring the right people and building the right teams at the right ratio. From working so closely with folks like these incredible leaders, what have you learned?

[译文] [Stephanie Zhan]: 故事是这样的,有两支队伍。日本队有四名划船手和一名舵手,而美国队有四名舵手和一名划船手。有人能猜到当美国队输了的时候,他们会做什么吗?大声说出来。没错,他们把划船手解雇了。Elon分享这个例子,我认为是反映了他关于如何雇佣正确的人、培养正确的人、以及以正确的比例构建正确团队的思考。通过与这些不可思议的领导者如此密切地合作,你学到了什么?

[原文] [Andrej Karpathy]: Yeah, so I would say Elon definitely runs his companies in an extremely unique style. I don't actually think that people appreciate how unique it is. You sort of even read about it in some places, but you don't understand it. I think it's even hard to describe. I don't even know where to start, but it's a very unique, different thing. I like to say that he runs the biggest startups.

[译文] [Andrej Karpathy]: 嗯,是的。我会说Elon经营公司绝对是一种极其独特的风格。我其实觉得人们并没有真正意识到它有多独特。你可能在某些地方读到过,但你无法真正理解它。我觉得甚至很难去描述它,我都不知道该从何说起,但这确实是一种非常独特、不同的东西。比如我喜欢说,他经营着(世界上)最大的初创公司。

[原文] [Andrej Karpathy]: And I don't even know, basically, how to describe it; it almost feels like a longer sort of thing that I'd have to think through. But number one is: he likes very small, strong, highly technical teams. So that's number one. I would say at companies by default, the teams grow and they get large. Elon was always a force against growth. I would have to work and expend effort to hire people; I would have to basically plead to hire people.

[译文] [Andrej Karpathy]: 我觉得……我甚至不知道该怎么描述,感觉这需要更长时间的思考才能说清楚。但第一点是,他喜欢非常小、非常强、技术含量极高的团队。所以这是第一点。我会说在一般公司里,默认情况是团队会不断增长变大。但Elon总是像一股对抗增长的力量。我必须努力工作、费尽周折才能招人,我基本上得乞求才能招人。

[原文] [Andrej Karpathy]: And then the other thing is that at big companies, usually, it's really hard to get rid of low performers, and I think Elon is very friendly to, by default, getting rid of low performers. So I actually had to fight for people to keep them on the team, because he would by default want to remove people. So that's one thing: keep a small, strong, highly technical team, with no middle management that is non-technical, for sure.

[译文] [Andrej Karpathy]: 然后另一件事是,在大公司里通常……很难摆脱低绩效者。但我认为Elon默认是非常乐意摆脱低绩效者的。所以我实际上不得不为了把人留在团队里而据理力争,因为他默认会想把人移走。所以这是一点:保持一个小而强、技术含量极高的团队。绝对不要那种……非技术类的中间管理层。

[原文] [Andrej Karpathy]: So that's number one. Number two is kind of like the vibes: how everything runs and how it feels when he walks into the office. He wants it to be a vibrant place. People are walking around, they're pacing around, they're working on exciting stuff, they're charting something, they're coding. You know, he doesn't like stagnation; he doesn't like it to look that way. He doesn't like large meetings.

[译文] [Andrej Karpathy]: 这是第一点。第二点有点像是“氛围(Vibes)”,就是事情如何运作以及当他走进办公室时的感觉。他希望那里是一个充满活力的地方,人们走来走去,步履匆匆,做着令人兴奋的事情,他们在画图表,他们在写代码。你知道,他不喜欢停滞,他不喜欢看到那种死气沉沉的样子。他不喜欢大型会议。

[原文] [Andrej Karpathy]: He always encourages people to leave meetings if they're not being useful. So you actually do see this: you know, it's a large meeting, and if you're not contributing and you're not learning, just walk out, and this is fully encouraged. And I think this is something that you don't normally see. So I think vibes is a second big lever that I think he really instills culturally.

[译文] [Andrej Karpathy]: 他总是鼓励人们如果觉得会议没用就离开。所以你真的会看到这种情况,比如在一个大型会议上,如果你没有贡献也没有学到东西,就直接走出去。这完全是被鼓励的。我认为这是你通常看不到的。所以我认为“氛围”是他真正注入文化的第二个重要杠杆。

[原文] [Andrej Karpathy]: Maybe part of that also is that I think a lot of bigger companies pamper employees; there's much less of that. The culture of it is: you're there to do your best technical work, and there's the intensity, and so on. And I think maybe the last one that is very unique and very interesting and very strange is just how connected he is to the team.

[译文] [Andrej Karpathy]: 也许这其中一部分原因也是……我觉得很多大公司喜欢娇惯员工,而在他那里这种情况少得多。那里的文化是你来这里是为了做你最好的技术工作,有一种高强度等等。我认为最后一点非常独特、有趣且奇怪的,就是他与团队的连接程度。

[原文] [Andrej Karpathy]: So usually a CEO of a company is a remote person five layers up, who talks to their VPs, who talk to their, you know, reports and directors, and eventually you talk to your manager. That's not how his companies work, right? He will come to the office, he will talk to the engineers. Many of the meetings that we had were, like, 50 people in the room with Elon, and he talks directly to the engineers. He doesn't want to talk just to the VPs and the directors.

[译文] [Andrej Karpathy]: 通常一家公司的CEO就像是一个高高在上的遥远人物,隔着五层架构,他们跟副总裁(VP)谈,副总裁跟总监谈,最后你跟你的经理谈。但在他的公司不是这样的,对吧?他会来到办公室,他会跟工程师谈。我们开的很多会议都是……大概50个人和Elon在一个房间里,他直接跟工程师对话,他不想只跟VP和总监谈。

[原文] [Andrej Karpathy]: So, you know, normally people would spend maybe 99% of the time talking to the VPs; he spends maybe 50% of the time, and he just wants to talk to the engineers. Because if the team is small and strong, then the engineers and the code are the source of truth, and so they have the source of truth, not some manager, and he wants to talk to them to understand the actual state of things and what should be done to improve it.

[译文] [Andrej Karpathy]: 通常人们可能会花99%的时间跟VP谈,他大概只花50%的时间,他只想跟工程师谈。所以如果团队小而强,那么工程师和代码就是“真理之源(Source of Truth)”。他们掌握着真相,而不是某个经理。他想跟他们谈,以了解事情的真实状态以及应该怎么做来改进它。

[原文] [Andrej Karpathy]: so I would say the degree to which he's connected with the team, and not something remote, is also unique. And also just his large hammer and his willingness to exercise it within the organization. Say he talks to the engineers and asks what's blocking them, and someone says, "I just don't have enough GPUs to run my thing," and he's like, "oh, okay." And if he hears that twice, he's going to be like, "okay, this is a problem"

[译文] [Andrej Karpathy]: 所以我想说,他与团队的紧密联系而非疏远也是独特的。还有就是他的“大锤(Large Hammer)”以及他在组织内部挥舞这把锤子的意愿。比如他跟工程师谈话,问他们“是什么阻碍了你们?”,如果回答是“我只是没有足够的GPU来运行我的东西”,他会说“噢,好的”。如果他听到这话两次,他就会觉得“好吧,这是个问题”。

[原文] [Andrej Karpathy]: so then it's, "what is our timeline?" And when you don't have satisfying answers, he's like, "okay, I want to talk to the person in charge of the GPU cluster," and someone dials the phone, and he's just like, "okay, double the cluster right now. Let's have a meeting tomorrow. From now on, send me daily updates until the cluster is twice the size." And then they push back: "okay, well, we have this procurement set up, we have this timeline, and NVIDIA says we don't have enough GPUs and it will take six months" or something, and then you get a raise of an eyebrow

[译文] [Andrej Karpathy]: 然后他会问“我们的时间表是什么?”当你没有令人满意的答案时,他会说“好吧,我要跟负责GPU集群的人谈谈”。然后有人拨通电话,他就直接说“好的,马上把集群翻倍。我们明天开个会,从现在开始每天给我发更新,直到集群规模翻倍为止”。然后对方可能会推脱,说“好的,可是我们有采购流程,我们有时间表,Nvidia说我们没有足够的GPU,这需要六个月”之类的。然后你会看到他眉毛一挑。

[原文] [Andrej Karpathy]: and then he's like, "okay, I want to talk to Jensen," and then he just removes bottlenecks. So I think the extent to which he's extremely involved, removes bottlenecks, and applies his hammer is also not appreciated. There are a lot of these kinds of aspects that are very unique, I would say, and very interesting, and honestly, going to a normal company outside of that, you definitely miss aspects of it

[译文] [Andrej Karpathy]: 然后他会说“好吧,我要跟Jensen(黄仁勋)谈谈”。然后他就那样直接消除了瓶颈。所以我认为他极度介入、消除瓶颈并动用他的“大锤”的程度,也是外界所不了解的。所以我认为有很多这方面的特质是非常独特的,也非常有趣。老实说,离开那里去一家普通公司,你……你绝对会怀念那里的某些方面。


章节 7:终极愿景:培育多元共生的AI“珊瑚礁”

📝 本节摘要

访谈接近尾声,Stephanie询问Andrej在离开OpenAI后的下一篇章中,什么对他来说最有意义。Andrej表达了他超越单一公司的宏大愿景:他并不关心某一家具体的公司,而是深切关注整个AI生态系统的健康。他用“珊瑚礁(Coral Reef)”来比喻他理想中的未来——一个充满活力、由无数初创公司填满经济各个角落的多元化生态。他坦言,比起由“五家巨头公司(Mega Corps)”统治一切的未来,他更希望看到一个“沸腾的汤(boiling soup)”般热闹且去中心化的创新环境。

[原文] [Stephanie Zhan]: taking a step back, you've helped build some of the most generational companies. You've also been such a key enabler for many people, many of whom are in the audience today, of getting into the field of AI. Knowing you, what you care most about is democratizing access to AI education and tools, helping create more equality in the whole ecosystem at large, so there are many more winners. As you think about the next chapter in your life, what gives you the most meaning?

[译文] [Stephanie Zhan]: 嗯,退一步说,你曾帮助建立了一些最具划时代意义的公司。你也是许多人——其中许多人今天就在观众席中——进入AI领域的关键推动者。据我了解,你最关心的是普及AI教育工具的访问权限,帮助在整个大生态系统中创造更多的平等,让这里有更多的赢家。当你思考你人生的下一篇章时,什么能给你带来最大的意义?

[原文] [Andrej Karpathy]: yeah, I think you've described it in the right way. Where my brain goes by default is: I've worked for a few companies, but ultimately I care not about any one specific company; I care a lot more about the ecosystem. I want the ecosystem to be healthy

[译文] [Andrej Karpathy]: 嗯,是的,我觉得你描述得很准确。我的大脑默认思考的方向是……你知道,我为几家公司工作过,但我认为归根结底,我并不关心任何一家具体的公司,我更关心生态系统。我希望生态系统是健康的。

[原文] [Andrej Karpathy]: I want it to be thriving. I want it to be like a coral reef of a lot of cool, exciting startups in all the nooks and crannies of the economy, and I want the whole thing to be like this boiling soup of cool stuff. (Genuinely, Andrej dreams about coral reefs.) You know, I want it to be a cool place, and yeah, that's why I love startups and I love companies, and I want there to be a vibrant ecosystem of them

[译文] [Andrej Karpathy]: 我希望它是繁荣的,我希望它像一个珊瑚礁(Coral Reef),由许多酷炫、令人兴奋的初创公司组成,遍布经济的每一个角落和缝隙。我希望整个东西就像这一锅“沸腾的汤(boiling soup)”,充满了酷的东西。这真的是Andre梦寐以求的珊瑚礁,你知道,我希望它是一个很酷的地方。我想……是的,这就是为什么我热爱初创公司,我热爱公司,我希望那里有一个充满活力的生态系统。

[原文] [Andrej Karpathy]: and by default, I would say I'm a bit more hesitant about, you know, five mega-corps kind of taking over, especially with AGI being such a magnifier of power. I'm kind of worried about what that could look like, and so on, so I have to think that through more. But yeah, I love the ecosystem and I want it to be healthy and vibrant

[译文] [Andrej Karpathy]: 而且,哪怕只是默认情况下,我会说我对那种……你知道,那种由五家巨头公司(Mega Corps)接管一切的局面感到有些迟疑。特别是考虑到AGI是如此巨大的权力放大器,我会有点……我有点担心那会是什么样子等等。所以……所以我必须更深入地思考这个问题。但是是的,我热爱这个生态系统,我希望它是健康和充满活力的。

[原文] [Stephanie Zhan]: amazing. We'd love to have some questions from the audience

[译文] [Stephanie Zhan]: 太棒了。嗯,我们希望能听听观众的提问。


章节 8:深度问答:强化学习、模型优化与Transformer的未来

📝 本节摘要

在最后的观众互动环节,Andrej回答了涵盖管理、技术与生态的多个深度问题。他首先指出Elon Musk的管理风格取决于创始人的“DNA”,必须保持一致性。随后,他深入探讨了LLM的未来瓶颈,提出了一个振聋发聩的观点:目前的LLM只完成了AlphaGo的“第一步”(模仿学习),而尚未通过真正的强化学习(RL)迈出“第二步”。他批评现有的RLHF仅仅是一种“氛围检查(Vibes Check)”,并预言模型需要像研究生一样通过“自我练习”来内化知识。此外,他还建议创业者先追求模型性能再优化成本,呼吁建立更多帮助人们理解AI的“坡道(Ramps)”,并肯定了Transformer架构在并行计算上的卓越设计及未来的进化潜力。

[原文] [Brian]: hi, Brian Hallan. Would you recommend founders follow Elon's management methods, or is it kind of unique to him and you shouldn't try to copy him?

[译文] [Brian]: 嗨,我是Brian Hallan。你会推荐创业者遵循Elon的管理方法吗?还是说那是他独有的,不应该尝试去模仿?

[原文] [Andrej Karpathy]: yeah, I think that's a good question. I think it's up to the DNA of the founder; you have to have that same kind of DNA and that kind of vibe. And I think when you're hiring the team, it's really important that you make it clear upfront that this is the kind of company you have, and when people sign up for it, they're actually very happy to go along with it

[译文] [Andrej Karpathy]: 嗯,是的,我觉得这是个好问题。我认为这取决于创始人的DNA。你必须拥有同样的那种DNA和那种……某种氛围。我认为当你招聘团队时,非常重要的一点是你要预先通过这种方式明确:这就是你所经营的那种公司。当人们为此签约时,他们实际上会非常乐意追随这种风格。

[原文] [Andrej Karpathy]: but if you change it later, I don't think people are happy with that, and that's very messy. So as long as you do it from the start and you're consistent, I think you can run a company like that. It has its own pros and cons as well, so it's up to people, but I think it's a consistent model of company building and running

[译文] [Andrej Karpathy]: 但如果你后来改变了风格,我想人们不会乐意,那会非常混乱。所以只要你从一开始就这么做并且保持一致,我认为你可以那样经营一家公司。你知道,但这也有它自己的优缺点。所以这取决于个人,但我认为这是一种自洽(consistent)的公司构建和运营模式。

[原文] [Alex]: hi, I'm curious if there are any types of model composability that you're really excited about, maybe other than mixture of experts. I'm not sure what you think about model merges, Franken-merges, or any other things to make model development more composable

[译文] [Alex]: 嗨,我很好奇是否有你真正感到兴奋的模型可组合性(composability)类型?也许除了混合专家模型(Mixture of Experts)之外。我不确定你对模型合并(model merges)、“弗兰肯斯坦式合并(Franken merges)”或其他任何让模型开发更具可组合性的事物有何看法?

[原文] [Andrej Karpathy]: yeah, that's a good question. I see papers in this area, but I don't know that anything has really stuck. Maybe the composability... I don't exactly know what you mean, but there's a ton of work on things like parameter-efficient training. I don't know if you would put that in the category of composability in the way I understand it

[译文] [Andrej Karpathy]: 是的,这是个好问题。我看到过这方面的论文,但我不知道是否有任何东西真正立得住脚。关于可组合性,我不完全确定你指什么,但你知道在参数高效训练(parameter efficient training)等方面有大量工作,我不知道你是否会将其归类为你理解的可组合性。

[原文] [Andrej Karpathy]: but it is the case that traditional code is very composable, and I would say neural nets are a lot more fully connected and less composable by default. But they do compose, and you can fine-tune them as part of a whole. As an example, if you're doing a system where you want to have, say, ChatGPT plus images or something like that, it's very common that you pre-train components and then you plug them in and fine-tune, maybe through the whole thing

[译文] [Andrej Karpathy]: 但是……传统代码确实非常具有可组合性,而我会说神经网络在默认情况下是更加全连接(fully connected)的,可组合性较差。但它们确实可以组合,并作为一个整体进行微调。举个例子,如果你在做一个系统,想要结合ChatGPT和图像处理之类的,通常的做法是预训练各个组件,然后把它们插在一起,也许再对整个系统进行微调。
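The "pre-train components, plug them in, fine-tune the whole" pattern can be sketched in a minimal, purely illustrative way: a frozen pre-trained component plus a small trainable glue layer fitted on top. The encoder function, the 1-parameter adapter, and all numbers below are invented for this toy; a real system would use a deep-learning framework and fine-tune far more parameters.

```python
# Toy sketch of composing a frozen pre-trained component with a trainable
# "glue" layer. Only the adapter parameter is updated; the encoder is fixed.

def pretrained_encoder(x):
    """Stands in for a frozen pre-trained component (e.g. a vision encoder)."""
    return 2.0 * x + 1.0  # fixed weights; never updated during fine-tuning

def train_adapter(data, lr=0.01, steps=200):
    """Fit a 1-parameter linear adapter on top of the frozen encoder (MSE loss)."""
    w = 0.0  # the only trainable parameter: the glue between components
    for _ in range(steps):
        grad = 0.0
        for x, y in data:
            feat = pretrained_encoder(x)   # frozen forward pass
            pred = w * feat                # trainable adapter on top
            grad += 2 * (pred - y) * feat  # d(MSE)/dw
        w -= lr * grad / len(data)         # gradient step on the adapter only
    return w

# Toy task: targets are 3x the encoder's features, so the adapter should learn w ~ 3.
data = [(x, 3.0 * (2.0 * x + 1.0)) for x in [0.0, 1.0, 2.0, 3.0]]
w = train_adapter(data)
print(round(w, 2))
```

The design choice mirrors what the answer describes: the expensive component is reused as-is, and only the cheap connective tissue is trained, optionally followed by fine-tuning through the whole stack.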

[原文] [Nick]: so, you know, we've got these next-word-prediction things. Do you think there's a path towards building a physicist or a von Neumann type model that has a mental model of physics that's self-consistent and can generate new ideas, for how do you actually do fusion, how do you get faster than light, if it's even possible? Is there any path towards that, or is it a fundamentally different vector in terms of these AI model developments?

[译文] [Nick]: 嗯,你知道我们现在有这些“下一个词预测”的东西。你认为是否有一条路径能构建出一个物理学家或冯·诺依曼类型的模型?这种模型拥有自洽的物理心智模型,并能产生新想法,比如如何实现核聚变?如果可能的话,如何实现超光速?是有通向那里的路径,还是说这在AI模型开发方面是一个根本不同的方向?

[原文] [Andrej Karpathy]: I think it's fundamentally different in one aspect. I guess what you're talking about is maybe just a capability question, because the current models are just not good enough, and I think there are big rocks to be turned here. I think people still haven't really seen what's possible in this space at all. Roughly speaking, I think we've done step one of AlphaGo: we've done the imitation learning part

[译文] [Andrej Karpathy]: 我认为在某个方面它是根本不同的。我想你谈论的可能只是能力问题,因为当前的模型还不够好。我认为这里还有巨大的潜能未被挖掘(big rocks to be turned),人们还没真正看到这个领域可能发生什么……完全没有。粗略地说,我认为我们只完成了AlphaGo的第一步。这就是我们在做的——模仿学习(Imitation Learning)部分。

[原文] [Andrej Karpathy]: there's step two of AlphaGo, which is the RL, and people haven't done that yet. And I think it's going to be fundamental; this is the part that actually made it work and made something superhuman. So I think there are big rocks in capability still to be turned over here, and the details of that are potentially kind of tricky, but long story short, we just haven't done step two of AlphaGo; we've just done imitation

[译文] [Andrej Karpathy]: 还有AlphaGo的第二步,也就是强化学习(RL),而人们还没做这个。我认为这将是根本性的……这才是真正让它奏效并创造出超人类能力的部分。所以我认为在能力方面还有巨大的潜能。你知道,其中的细节可能有点棘手,但长话短说,我认为我们只是还没做AlphaGo的第二步,我们只做了模仿。

[原文] [Andrej Karpathy]: and I don't think people appreciate, for example, number one, how terrible the data collection is for things like ChatGPT. Say you have a problem: some prompt is some kind of mathematical problem. A human comes in and gives the ideal solution to that problem. The problem is that human psychology is different from model psychology. What's easy or hard for the human is different from what's easy or hard for the model

[译文] [Andrej Karpathy]: 我觉得人们并没有意识到,例如第一点,像GPT这类东西的数据收集有多糟糕。比方说你有一个问题,某种数学问题的提示词,一个人过来给出了该问题的理想解决方案,对吧?问题在于人类心理学与模型心理学是不同的。对人类来说容易或困难的事,与对模型来说容易或困难的事是不同的。

[原文] [Andrej Karpathy]: and so the human fills out some kind of trace that comes to the solution, but some parts of that are trivial to the model, and some parts of that are a massive leap that the model doesn't understand. So you're kind of just losing it, and then everything else is polluted by that later

[译文] [Andrej Karpathy]: 人类填写某种推导过程得出解决方案,但其中有些部分对模型来说是琐碎的,而有些部分则是模型无法理解的巨大跳跃。所以……你有点像是失去了它,然后后面的所有东西都被污染了。

[原文] [Andrej Karpathy]: so fundamentally, what you need is for the model to practice itself: how to solve these problems. It needs to figure out what works for it or does not work for it. Maybe it's not very good at four-digit addition, so it's going to fall back and use a calculator, but it needs to learn that for itself based on its own capability and its own knowledge. So that's number one: that's totally broken, though I think it's a good initializer for something agent-like

[译文] [Andrej Karpathy]: 所以根本上你需要的是模型……模型需要自我练习(practice itself)如何解决这些问题。它需要弄清楚什么对它有效,什么无效。也许它不擅长四位数加法,所以它会退一步去用计算器。但它需要基于自己的能力和知识去学习这一点。所以这是第一点,目前的做法完全坏掉了(totally broken)。虽然我认为作为某种智能体系统的初始化器(initializer),它还是不错的。

[原文] [Andrej Karpathy]: and then the other thing is, we're doing reinforcement learning from human feedback, but that's a super weak form of reinforcement learning; it doesn't even count as reinforcement learning, I think. Like, what is the equivalent of RLHF in AlphaGo? What is the reward model? It's what I call a vibe check

[译文] [Andrej Karpathy]: 另一件事是,我们正在做基于人类反馈的强化学习(RLHF),但这是一种超级弱的强化学习形式,我觉得它甚至算不上强化学习。我想,AlphaGo里对应RLHF的东西是什么?它的奖励模型是什么?我称之为“氛围检查(Vibe check)”。

[原文] [Andrej Karpathy]: imagine if you wanted to train AlphaGo with RLHF: you would be giving two people two boards and asking, "which one do you prefer?" Then you would take those labels, train a reward model, and then RL against that. What are the issues with that? Number one, it's just vibes of the board; that's what you're training against

[译文] [Andrej Karpathy]: 想象一下,如果你想训练一个“AlphaGo RLHF版”,那就好比给两个人看两个棋盘局面,然后问“你更喜欢哪一个?”。然后你拿这些标签去训练模型,再针对它进行强化学习。这有什么问题?第一,这只是针对棋盘的“氛围”进行训练。

[原文] [Andrej Karpathy]: number two, if the reward model is a neural net, then it's very easy for the model you're optimizing over to overfit to that reward model, and it's going to find all these spurious ways of hacking that massive model; that's the problem. AlphaGo gets around these problems because they have a very clear objective function you can RL against. So RLHF is nowhere near that. I would say RLHF is silly, and the other thing, imitation learning, is super silly. RLHF is a nice improvement, but it's still silly

[译文] [Andrej Karpathy]: 第二,如果奖励模型是一个神经网络,那么你正在优化的模型非常容易对那个奖励模型过拟合(overfit)。它会找到所有那些虚假的(spurious)方式来通过作弊搞定那个庞大的模型,这就是问题所在。AlphaGo避开了这些问题,因为它们有一个非常清晰的目标函数,你可以针对它进行RL。所以RLHF还差得远呢。我会说RL(目前的做法)很傻,模仿学习超级傻,RLHF是不错的改进,但依然很傻。
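For context, the pairwise-preference training that gets called a "vibe check" here is commonly fit with the Bradley-Terry model: a scalar reward r is learned so that sigmoid(r(A) - r(B)) explains which of two samples humans preferred. The following is a toy, framework-free illustration of that fit, not any production RLHF pipeline; the preference data is invented.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_reward(prefs, n_items, lr=0.5, steps=1000):
    """Fit one scalar reward per item from pairwise human comparisons.

    prefs: list of (winner_idx, loser_idx) pairs.
    Maximizes sum of log sigmoid(r[winner] - r[loser])  (Bradley-Terry).
    """
    r = [0.0] * n_items
    for _ in range(steps):
        for w, l in prefs:
            p = sigmoid(r[w] - r[l])  # model's current probability the winner wins
            r[w] += lr * (1 - p)      # gradient ascent on log-likelihood
            r[l] -= lr * (1 - p)
    return r

# Three candidate answers; humans consistently prefer 2 over 1 and 1 over 0.
prefs = [(2, 1), (1, 0), (2, 0)] * 10
r = fit_reward(prefs, 3)
print(r[2] > r[1] > r[0])
```

Note what the fit produces: only a relative "vibe" score recovered from human picks. There is no ground-truth objective like a game's win condition, which is exactly the contrast with AlphaGo being drawn in the answer, and why a policy optimized hard against such a learned score can exploit its errors.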

[原文] [Andrej Karpathy]: and I think people need to look for better ways of training these models, so that it's in the loop with itself and its own psychology, and I think there will probably be unlocks in that direction. So it's sort of like graduate school for AI models: it needs to sit in a room with a book and quietly question itself for a decade. Yeah, I think that would be part of it

[译文] [Andrej Karpathy]: 我认为人们需要寻找更好的方法来训练这些模型,让它与自身及自己的心理形成闭环。我认为那个方向可能会有解锁。所以这有点像是AI模型的“研究生院”——它需要拿着一本书坐在房间里,静静地自我提问十年。是的,我认为那将是其中的一部分。

[原文] [Yi]: yeah, it's cool to be optimal and practical at the same time. So I would ask: how would you align the priority of either (a) doing cost reduction and revenue generation, or (b) finding better-quality models with better reasoning capabilities? How would you align that?

[译文] [Yi]: 是的,既要优化又要实用是很酷的。所以我想问,你会如何排列优先级?是做降低成本和创收,还是去寻找具有更好推理能力的更高质量模型?你会如何协调这两者?

[原文] [Andrej Karpathy]: so, maybe I understand the question. I think what I see a lot of people do is they start out with the most capable model, no matter what the cost is. So you use GPT-4, you super-prompt it, etc., you do RAG, etc. You're just trying to get your thing to work, so you go after accuracy first, and then you make concessions later

[译文] [Andrej Karpathy]: 也许我理解了你的问题。我看到很多人做的是,他们从能力最强的模型开始,不管成本是多少。所以你使用GPT-4,你使用超级提示词(super prompt)等等,你做RAG(检索增强生成)等等。你只是试图让你的东西跑起来。所以你先追求准确性,然后再做让步。

[原文] [Andrej Karpathy]: so I would say go after performance first, and then you make it cheaper later. It's kind of like the paradigm that a few people I've talked to about this say works for them... just get your thing to work really well, because if you have a thing that works really well, then one other thing you can do is distill it, right... so I would always go after getting it to work as well as possible, no matter what, first, and then make it cheaper. That's the thing I would suggest

[译文] [Andrej Karpathy]: 所以我会说先追求性能,然后再让它变便宜。这有点像是我与之交谈过的几个人所说的对他们行之有效的范式……先把你的东西做得非常好。因为如果你有一个工作得非常好的东西,你能做的另一件事就是蒸馏(distill)它,对吧……所以我建议,无论如何先追求让它尽可能好地工作,然后再降低成本。
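The "get it to work, then distill it" recipe can be sketched as matching an expensive teacher's output distribution with a cheaper student by minimizing cross-entropy (equivalently, KL divergence up to a constant). This is a toy, framework-free illustration with invented logits, not any specific lab's distillation pipeline.

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill(teacher_logits, lr=0.5, steps=2000):
    """Train student logits to match the teacher's softmax distribution."""
    teacher = softmax(teacher_logits)
    student_logits = [0.0] * len(teacher)  # cheap model starts uninformed
    for _ in range(steps):
        student = softmax(student_logits)
        # Gradient of cross-entropy H(teacher, student) w.r.t. student logits
        # is (student - teacher), so descend along it.
        for i in range(len(student_logits)):
            student_logits[i] -= lr * (student[i] - teacher[i])
    return softmax(student_logits)

teacher_logits = [2.0, 0.5, -1.0]  # stand-in for the expensive model's confident output
teacher = softmax(teacher_logits)
student = distill(teacher_logits)
print(all(abs(s - t) < 1e-3 for s, t in zip(student, teacher)))
```

The ordering matters for exactly the reason given in the answer: you can only distill a teacher signal that already works well, so accuracy comes first and cost optimization second.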

[原文] [Sam]: hi, Sam here. One question: this past year we saw a lot of impressive results from the open-source ecosystem. I'm curious what your opinion is of how that will continue to keep pace, or not keep pace, with closed-source development as the models continue to improve in scale

[译文] [Sam]: 嗨,我是Sam。有个问题,过去一年我们看到开源生态系统取得了很多令人印象深刻的成果。我很好奇你的看法,随着模型规模不断提升,开源将如何继续跟上或跟不上闭源发展的步伐?

[原文] [Andrej Karpathy]: yeah, I think that's a very good question. I don't really know. Fundamentally, these models are so capital-intensive, right? One thing that is really interesting is, for example, you have Facebook and Meta and so on, who can afford to train these models at scale, but it's also not the thing that they do; their money printer is unrelated to that. So they have an actual incentive to potentially release some of these models, so that they empower the ecosystem as a whole and can actually borrow all the best ideas. That makes sense to me

[译文] [Andrej Karpathy]: 是的,我认为这是一个非常好的问题。我真的不知道。根本上讲,这些模型是如此资本密集型,对吧?一件真正有趣的事情是,例如Facebook和Meta等公司,他们负担得起大规模训练这些模型,但这又不是他们业务的核心部分,他们的“印钞机”与此无关。因此他们实际上有动力去发布其中一些模型,以赋能整个生态系统,这样他们实际上可以借鉴所有最好的想法。这对我来说是合理的。

[原文] [Peter]: yeah, maybe this has an obvious answer given the previous question, but what do you think would make the AI ecosystem cooler and more vibrant, or what's holding it back? Is it, you know, openness, or do you think there's other stuff that's also a big thing you'd want to work on?

[译文] [Peter]: 是的,鉴于前一个问题,这可能有一个显而易见的答案。但你认为什么会让AI生态系统更酷、更具活力?或者是什么在阻碍它?是开放性吗?还是你认为还有其他你想要致力于解决的大事情?

[原文] [Andrej Karpathy]: yeah, I certainly think one big aspect of it is just the stuff that's available. I had a tweet recently along the lines of: number one, build the thing; number two, build the ramp. I would say there are a lot of people building the thing, and a lot less happening in building ramps, so that people can actually understand all this stuff

[译文] [Andrej Karpathy]: 是的,我确实认为其中一个重要方面是可用的资源。我最近发了一条推文,大意是:第一,造出那个东西(build the thing);第二,造出坡道(build the ramp)。我会说有很多人在造东西,但我会说造“坡道”的人要少得多——那些让人们真正能理解所有这些东西的途径。

[原文] [Andrej Karpathy]: I would love for people to be a lot more open with respect to, you know, what they've learned, how they've trained all this, what works and what doesn't work for them, etc., just so we can learn a lot more from each other. That's number one

[译文] [Andrej Karpathy]: 我希望人们能更加开放,关于他们学到了什么、他们是如何训练这些的、什么对他们有用、什么没用等等。是的,只是为了让我们能更多地互相学习,这是第一点。

[原文] [Michael]: to get to the next big performance leap from models, do you think it's sufficient to modify the Transformer architecture with, say, thought tokens or activation beacons, or do we need to throw that out entirely and come up with a new fundamental building block to take us to the next big step forward, or AGI?

[译文] [Michael]: 为了实现模型的下一个重大性能飞跃,你认为仅仅修改Transformer架构(比如增加思维token或激活信标)就足够了吗?还是我们需要完全抛弃它,提出一个新的基础构建模块,来带我们迈向下一个大台阶或AGI?

[原文] [Andrej Karpathy]: yeah, I think that's a good question. Well, the first thing I would say is that the Transformer is amazing; it's just so incredible. I don't think I would have seen that coming, for sure. For a while before the Transformer arrived, I thought there would be an insane diversification of neural networks, and that was not the case. It's the complete opposite, actually: it's all the same model. So it's incredible to me that we have that

[译文] [Andrej Karpathy]: 是的,我觉得这是个好问题。我想首先我要说的是,Transformer太神奇了,简直不可思议。我以前肯定没想到会这样。在Transformer出现之前的一段时间里,我以为神经网络会出现疯狂的多样化,结果并非如此。实际上完全相反,完全就像是……实际上全是同一个模型。所以我们拥有它这件事对我来说太不可思议了。

[原文] [Andrej Karpathy]: I feel very optimistic that someone will be able to find a pretty big change to how we do things today... but also on the Transformer, like I mentioned, there are these levers of precision and sparsity, and as we drive those, together with the co-design of the hardware... To some extent, I would also say the Transformer is kind of designed for the GPU, by the way; that was the big leap, I would say, in the Transformer paper... because the recurrent neural network has sequential dependencies, terrible for the GPU, and the Transformer basically broke that through attention

[译文] [Andrej Karpathy]: 我非常乐观地认为,有人能够找到一种非常大的改变来改进我们要做的……但也可能是在Transformer基础上。就像我提到的精度(Precision)和稀疏性(Sparsity)这些杠杆,随着我们推动这些,并结合硬件的协同设计……在某种程度上,我也想顺便说一句,Transformer有点像是为GPU设计的。那是Transformer论文的一大飞跃……因为循环神经网络(RNN)有顺序依赖性,对GPU来说很糟糕。Transformer通过注意力机制基本上打破了这一点。
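The structural point about GPUs can be seen in a toy, framework-free example: an RNN's hidden state at step t depends on the state from step t-1, forcing a sequential loop, while each self-attention output depends only on the full input sequence, so every position can be computed independently, and hence in parallel. The numbers and the scalar-attention formulation below are invented purely for illustration.

```python
import math

x = [0.1, 0.4, -0.2, 0.3]  # a tiny 1-d input sequence

def rnn(xs):
    """Recurrent net: each step NEEDS the previous hidden state (sequential)."""
    h, out = 0.0, []
    for xt in xs:
        h = math.tanh(0.5 * h + xt)  # h at step t depends on h at step t-1
        out.append(h)
    return out

def attention(xs):
    """Scalar self-attention: each output mixes ALL inputs via softmax weights."""
    def out_at(t):
        # Depends only on xs, never on out_at(t-1): positions are independent.
        scores = [xs[t] * xs[j] for j in range(len(xs))]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        return sum(wi / z * xj for wi, xj in zip(w, xs))
    # This map over positions could be evaluated in parallel on a GPU;
    # the RNN loop above could not.
    return [out_at(t) for t in range(len(xs))]

print(len(rnn(x)), len(attention(x)))
```

Each attention output is a convex combination of the inputs, computed from the same shared data; that absence of step-to-step dependence is what makes the architecture map so well onto parallel hardware.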

[原文] [Andrej Karpathy]: I think it's very likely we'll still see changes to it, but it's been proven remarkably resilient, I have to say

[译文] [Andrej Karpathy]: 我认为很有可能我们仍会看到它的变化,但我不得不说,它已经被证明具有惊人的韧性。

[原文] [Stephanie Zhan]: yeah. As a parting message to all the founders and builders in the audience, what advice would you give them as they dedicate the rest of their lives to helping shape the future of AI?

[译文] [Stephanie Zhan]: 好的,作为给在座所有创始人和建设者的临别赠言,当他们致力于用余生帮助塑造AI的未来时,你会给他们什么建议?

[原文] [Andrej Karpathy]: so yeah, I don't usually have crazy generic advice. I think maybe the thing that's top of my mind is: founders of course care a lot about their startup, but I also want to ask, how do we have a vibrant ecosystem of startups? How do startups continue to win, especially with respect to big tech? How does the ecosystem become healthier, and what can you do about it?

[译文] [Andrej Karpathy]: 嗯,所以我没有……我通常没有什么疯狂的通用建议。我想我脑海中最先想到的是,我认为创始人当然非常关心他们自己的初创公司。我也希望……我们如何拥有一个充满活力的初创公司生态系统?初创公司如何继续赢下去,特别是相对于大型科技公司?我们如何让生态系统变得更健康?你能做什么?

[原文] [Stephanie Zhan]: sounds like you should become an investor. Amazing. Thank you so much for joining us, Andrej, for this, and also for the whole day today

[译文] [Stephanie Zhan]: 听起来你应该成为一名投资人。太棒了。非常感谢你加入我们,Andre,感谢你的分享,也感谢你今天一整天的陪伴。