We're All Addicted To Claude Code

Chapter 1: From "Marathon" to "Bionic Knee": Claude Code and the CLI Revival

📝 Section Summary

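This section opens the interview. The host introduces Calvin French-Owen (labeled "Kelvin" in this transcript), one of the first people to work on Codex at OpenAI. Garry Tan (labeled "Gary") describes how he was a "marathon runner" of coding who was forced to retire by management work, and how Claude Code now feels like a "bionic knee" that has put him back on the front line of programming. The conversation then digs into a counterintuitive observation: in the AI era, the retro command-line interface (CLI), being agent-friendly and composable, has unexpectedly beaten the integrated development environment (IDE) that was supposed to be the future, and has become developers' new favorite.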

[原文] [Host]: i feel like when I'm using quad code it's like oh I feel like I'm flying through the code when it's in your CLI this thing can debug nested delayed jobs like five levels in and figure out what the bug was and then write a test for it and it never happens again this is insane i think everyone who's experimenting with this stuff on like a hobbyist level or at like a very small startup they're just pushing the coding agents as far as they can go because it's like you don't really have time to figure out anything else like as a startup you have limited runway you're just going to orient around speed i think at a bigger company you have a lot more to lose what are some of the tips to become a top 1% user of coding agents yeah what's your stack hey everyone welcome back to another episode of the Ly Cone gary are you are you ready to record

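[Translation] [Host]: When I'm using Claude Code (note: transcribed as "quad code"), I feel like I'm flying through the code. When it runs in your CLI, this thing can debug nested delayed jobs five levels deep, figure out what the bug was, and then write a test for it so it never happens again. This is insane. I think everyone experimenting with this stuff at a hobbyist level or at a very small startup is pushing the coding agents as far as they can go, because you don't really have time to figure out anything else. As a startup you have limited runway, so you orient around speed. At a bigger company you have a lot more to lose. What are some of the tips to become a top 1% user of coding agents? What's your stack? Hey everyone, welcome back to another episode of the Lightcone (note: transcribed as "Ly Cone"). Gary, are you ready to record?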

[原文] [Gary]: i'm I'm in plan mode right now but okay yeah I guess it's time sorry about that

[Translation] [Gary]: I'm in plan mode right now, but okay, I guess it's time. Sorry about that.

[原文] [Host]: well welcome to another episode of the light cone and today we have an incredible guest Kelvin French Owen he's one of the first people to create codecs at OpenAI and before that he started Segment which is multi-billion dollar company that got to a very successful exit kelvin welcome back

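[Translation] [Host]: Welcome to another episode of the Lightcone. Today we have an incredible guest, Calvin French-Owen (note: transcribed as "Kelvin French Owen"; the speaker labels in this transcript keep "Kelvin"). He is one of the first people who created Codex (note: transcribed as "codecs") at OpenAI, and before that he started Segment, a multi-billion-dollar company that reached a very successful exit. Calvin, welcome back.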

[原文] [Kelvin]: thanks for having me

[Translation] [Kelvin]: Thanks for having me.

[原文] [Gary]: i guess what a crazy time for all of us uh I recently got very very addicted to Claude Code and uh I would describe it as like 10 years ago I was a marathon runner and I love doing it and then I suffered a catastrophic knee injury which is called manager mode and I uh stopped coding which is tragic and horrible uh but now the last nine days have been like this incredible unlock of all the things I remember being able to do and it's like you know I got a new total knee replacement and actually it's a bionic knee and it allows me to run five times faster what's your take on it because you're I mean right out there at the forefront of it i mean codeex pioneered all of the a lot of the ideas that now like everyone still uses and codeex is still evolving too

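[Translation] [Gary]: What a crazy time for all of us. I recently got very, very addicted to Claude Code. I would describe it like this: ten years ago I was a marathon runner and I loved doing it, and then I suffered a catastrophic knee injury called "manager mode", and I stopped coding, which is tragic and horrible. But the last nine days have been an incredible unlock of all the things I remember being able to do. It's like I got a total knee replacement, and it's actually a bionic knee that lets me run five times faster. What's your take on it? You're right out there at the forefront. Codex (note: transcribed as "codeex") pioneered a lot of the ideas that everyone still uses, and Codex is still evolving too.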

[原文] [Kelvin]: for brief context when I was at openai um I was working on the codeex web project at the time cursor was out in the market and they had kind of built this shim around I think it was set 3.5 and it was able to work in your IDE fod code had just come out uh and it was working as a CLI and we kind of had this idea like hey in the future coding is really going to feel more like talking to a co-orker like you're going to send off a question and then they'll go off and do something and come back to you with a PR uh and so that's where we started with this web view uh and that's what we were building i think directionally that's still kind of correct for where things should go but obviously now everyone is coding with CLIs instead like they're using those tools a lot more whether it's cloud code or whether it's codecs

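[Translation] [Kelvin]: For brief context, when I was at OpenAI I was working on the Codex web project. At the time Cursor was out in the market, and they had built a shim around what I think was Sonnet 3.5 (note: transcribed as "set 3.5"; most likely Claude 3.5 Sonnet), and it worked in your IDE. Claude Code (note: transcribed as "fod code") had just come out, and it worked as a CLI. We had this idea that in the future coding is going to feel more like talking to a coworker: you send off a question, they go off and do something, and they come back to you with a PR (pull request). That's why we started with the web view, and that's what we were building. I think directionally that is still correct for where things should go, but obviously everyone is now coding with CLIs instead. People are using those tools a lot more, whether it's Claude Code or Codex.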

[原文] [Kelvin]: and I think at least for me kind of the lesson in that is I think in some sense you're right that like everyone is going to become a manager in the future or at least that's my hottake but in order to get there there are steps along the way and you have to really build a lot of trust in the model and understand what it's doing

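[Translation] [Kelvin]: At least for me, the lesson is that in some sense you're right: everyone is going to become a manager in the future, or at least that's my hot take. But to get there, there are steps along the way, and you have to build a lot of trust in the model and understand what it's doing.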

[原文] [Host]: you recently came over to cloud code what's the transition been like in terms of as using it as your you know one of your stacks

[Translation] [Host]: You recently came over to Claude Code (note: transcribed as "cloud code"). What has the transition been like, using it as one of your stacks?

[原文] [Kelvin]: yeah yeah so cloud code is uh certainly my kind of like daily driver today and honestly this has switched every few months uh for a while I was deeply in cursor i think their new model which is really fast is actually quite good then I kind of moved over to quad code especially with Opus cloud code is a really interesting product and I think it's underrated how good the both product and model are working together if you study them closely

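[Translation] [Kelvin]: Yeah, Claude Code is certainly my daily driver today, and honestly this has switched every few months. For a while I was deep in Cursor; I think their new model, which is really fast, is actually quite good. Then I moved over to Claude Code (note: transcribed as "quad code"), especially with Opus. Claude Code is a really interesting product, and if you study it closely, I think it's underrated how well the product and the model work together.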

[原文] [Kelvin]: I think one of the things that claude code does in particular that's really amazing is split up context well and so if you look at uh I don't know things like skills or sub agents like when you ask cla code to do something it will typically spawn an explore sub agent or like multiple ones and basically each of those are running haik coup to traverse the file system and kind of like explore what's there and they're doing it in their own context window

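[Translation] [Kelvin]: One thing Claude Code does particularly well is split up context. If you look at things like skills or sub-agents: when you ask Claude Code to do something, it will typically spawn an explore sub-agent, or several. Each of those runs Haiku (note: transcribed as "haik coup") to traverse the file system and explore what's there, and each does it in its own context window.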

[原文] [Kelvin]: and I think enthropic has kind of like figured something out here around given a task does that task fit in the context window or should I actually like split it into many more and the models are like insanely good at this which I think gives them really good results and I think the fascinating thing is because it's on the terminal is the purest form for composible atomic integrations because if you came from ID first world which is where cursor was and I suppose codeex too this concept of uh finding the context more free form wouldn't come out so natural right string which is so unique yeah

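[Translation] [Kelvin]: I think Anthropic (note: transcribed as "enthropic") has figured something out here: given a task, does that task fit in the context window, or should it be split into many more? The models are insanely good at this, which I think gives them really good results. The fascinating thing is that because it lives in the terminal, it is the purest form for composable, atomic integrations. If you came from an IDE-first world, which is where Cursor was and I suppose Codex too, this concept of finding context in a more free-form way wouldn't come out so naturally.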
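
The "does it fit in the context window, or should it be split?" decision can be illustrated with a toy sketch. This is not how Claude Code decides; it is a plain greedy packing heuristic, and `FileInfo`, `plan_batches`, and the token estimates are hypothetical names and numbers introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    tokens: int  # estimated token count; how to estimate it is left open


def plan_batches(files: list[FileInfo], budget: int) -> list[list[FileInfo]]:
    """Greedily pack files into batches whose token totals stay within budget."""
    batches: list[list[FileInfo]] = []
    current: list[FileInfo] = []
    used = 0
    # Largest files first, so big items claim space before small ones fill gaps.
    for f in sorted(files, key=lambda f: f.tokens, reverse=True):
        if f.tokens > budget:
            # An oversized file gets a batch of its own; a real agent would
            # have to read it in pieces.
            batches.append([f])
            continue
        if used + f.tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(f)
        used += f.tokens
    if current:
        batches.append(current)
    return batches


files = [FileInfo("a.py", 60), FileInfo("b.py", 50),
         FileInfo("c.py", 40), FileInfo("d.py", 30)]
batches = plan_batches(files, budget=100)
```

Each batch could then be handed to a separate worker with its own context, with only short summaries flowing back to the caller.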

[原文] [Gary]: and I personally I was surprised I don't know how you all feel but I was surprised that like weird it's like a weird retro future that like the CLI which are the technology from 20 years ago have somehow beaten out all the actual ideides which were supposed to be the future

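[Translation] [Gary]: Personally I was surprised. I don't know how you all feel, but it's like a weird retro future: CLIs, which are technology from 20 years ago, have somehow beaten out all the actual IDEs (note: transcribed as "ideides") that were supposed to be the future.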

[原文] [Kelvin]: 100% yeah and I I think it's important actually to claude code that it's not an IDE because it sort of distances you from the code that's being written like IDEs are all about exploring files right and you're like trying to keep all the state in your head and understand what's going on but the fact that a CLI is like a totally different thing means that they have a lot more freedom in terms of how it feels and I I don't know about you but I feel like when I'm using cloud code it's like oh I feel like I'm flying through the code you know it's like there's all sorts of things going there's like little progress indicators it's kind of like giving me status updates but like the code that's being written is not the front and center thing

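[Translation] [Kelvin]: 100%. And I think it actually matters to Claude Code that it is not an IDE, because that distances you from the code being written. IDEs are all about exploring files, right? You're trying to keep all the state in your head and understand what's going on. The fact that a CLI is a totally different thing gives them much more freedom in how it feels. I don't know about you, but when I'm using Claude Code I feel like I'm flying through the code. There are all sorts of things going on, there are little progress indicators, it's giving me status updates, but the code being written is not front and center.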

[原文] [Gary]: i mean dev environments are so messy i mean I really like how clean a sandbox conceptually is but then I just ran into all these crazy issues like trying to do just simple testing right it needs to access Postgress and then it can't do it or you know my codeex.md ended up being 20 lines long and even then it didn't work when it's in your CLI it could just access your development database i mean I'm not sure if I'm supposed to do this but I've actually also had it access my production database to do and uh it can just do it it's like yeah okay here like I looked into it and I think this happened and I'm going to debug this you know concurrency issue i was like oh my god like this thing can debug nested delayed jobs like five levels in and figure out what the bug was and then write a test for it and it never happens again this is insane

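[Translation] [Gary]: Dev environments are so messy. I really like how clean a sandbox is conceptually, but then I ran into all these crazy issues just trying to do simple testing: it needs to access Postgres and it can't, or my codeex.md (as transcribed) ended up 20 lines long and even then it didn't work. When it's in your CLI, it can just access your development database. I'm not sure I'm supposed to do this, but I've actually also had it access my production database, and it can just do it. It's like, "Okay, I looked into it, I think this happened, and I'm going to debug this concurrency issue." I was like, "Oh my god, this thing can debug nested delayed jobs five levels deep, figure out what the bug was, and then write a test for it so it never happens again." This is insane.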

[原文] [Gary]: yeah and I think that distribution mode is frankly underrated like thinking about a cursor or a cloud code or a codec CLI the fact that you can just download it and use it without having to get it permissions or anything makes a huge difference and actually I was playing around with a product the other day where you download a desktop app and then it execs the clawed code that you have running on your laptop and uses that and communicates back via an MCP server to the desktop product mhm and it's like this is a very interesting way of now starting to work with your laptop where you don't have to get anyone's permission to do it you just download the product and go

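[Translation] [Gary]: Yeah, and I think that distribution mode is frankly underrated. Think about Cursor, Claude Code, or the Codex CLI: the fact that you can just download it and use it without having to get permission for anything makes a huge difference. The other day I was playing with a product where you download a desktop app, and it execs the Claude Code you already have on your laptop, uses that, and communicates back to the desktop product via an MCP (Model Context Protocol) server. That's a very interesting new way of working with your laptop: you don't have to get anyone's permission, you just download the product and go.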


Chapter 2: Context Engineering and "Decentralized" Distribution: Building a Top 1% Agent Workflow

📝 Section Summary

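This section discusses how software distribution in the AI era is shifting from top-down CTO decisions to bottom-up adoption by individual developers. Kelvin shares what he sees as the core skill in building and using coding agents: context management. He contrasts Cursor's semantic search with the brute-force grep retrieval used by Claude Code and Codex, and argues that the high "context density" of code is what makes the latter work so well. The two also discuss how models suffer "context poisoning" and drift into the "dumb zone" as a conversation grows longer, and describe a neat trick for checking a model's memory with a "canary".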

[原文] [Gary]: it's super interesting that in a world where things are changing so fast you really want your product to have a bottoms up distribution not top down because like top down is like just too slow like the CTO of a company is going to be like have all these concerns about security and privacy and what if and control exactly versus like the engineers just like install the thing and start using like this thing is amazing

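[Translation] [Gary]: It's super interesting that in a world where things are changing so fast, you really want your product to have bottoms-up distribution, not top-down, because top-down is just too slow. The CTO of a company is going to have all these concerns about security and privacy and "what if" and control, whereas the engineers just install the thing, start using it, and say, "This thing is amazing."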

[原文] [Kelvin]: yeah I think that's right the one thing I do struggle with I mean I'm like a B2B enterprise guy generally but I feel like there's some amount of moat that happens when you do that top down sale and there's got to be some company who manages to crack it where it's like oh this is a thing that everyone has access to maybe individual people can take it up that was the original um Netscape Navigator it was free for non-commercial use and then uh people would just download it and use it for commercial use and then they could just track down the IPs and figure out uh exactly how many clients were in all of these different companies and say you should pay for this you're in violation but all you have to do is buy a license so I'd be curious if you could do that work again here

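[Translation] [Kelvin]: Yeah, I think that's right. The one thing I do struggle with (I'm generally a B2B enterprise guy) is that some amount of moat gets built when you do that top-down sale. There has to be some company that manages to crack this, where it's something everyone has access to and individual people can take it up. That was the original Netscape Navigator: it was free for non-commercial use, people would download it and use it commercially anyway, and then the company could track down the IPs, figure out exactly how many clients were inside each of these companies, and say, "You should pay for this. You're in violation, but all you have to do is buy a license." I'd be curious whether that could work again here.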

[原文] [Gary]: What do you think are some of the tips for anyone that wants to build a coding agent since you've uh done it a lot what are what are some uh now lessons that you learned that you want to share

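[Translation] [Gary]: What tips do you have for anyone who wants to build a coding agent? You've done it a lot, so what lessons have you learned that you'd want to share?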

[原文] [Kelvin]: i mean I think the number one thing uh is managing context well... the things that you can do are figure out like hey what context should I be supplying to this agent to get the best possible result and so for cloud code if you watch it working it's like oh I'm going to like spawn a bunch of these explore sub agents they will like search for different patterns in the file system they will come back uh they will have this context they'll summarize it for me and then I'll have someplace to go

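[Translation] [Kelvin]: I think the number one thing is managing context well... What you can do is figure out, "What context should I be supplying to this agent to get the best possible result?" If you watch Claude Code working, it's like, "I'm going to spawn a bunch of these explore sub-agents. They'll search for different patterns in the file system, come back with that context, and summarize it for me, and then I'll have someplace to go."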
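
The fan-out pattern described here (several explorers search for different patterns, and only their summaries return to the caller) can be sketched in a few lines. This is a toy illustration rather than Claude Code's implementation: `REPO`, `explore`, and `gather_context` are hypothetical, and a regex scan over an in-memory dictionary stands in for a model-driven sub-agent.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a repository: path -> file contents.
REPO = {
    "jobs/retry.py": "def enqueue_retry(job):\n    schedule(job, delay=30)\n",
    "jobs/worker.py": "def run(job):\n    enqueue_retry(job)\n",
    "web/views.py": "def index(request):\n    return render('index.html')\n",
}


def explore(pattern: str) -> str:
    """One 'sub-agent': scans every file in a private scratch list and
    hands back only a one-line summary."""
    scratch = []  # private working context, discarded when the function returns
    for path, text in REPO.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if re.search(pattern, line):
                scratch.append((path, lineno, line.strip()))
    paths = sorted({path for path, _, _ in scratch})
    return f"{pattern!r}: {len(scratch)} hit(s) in {', '.join(paths) or 'no files'}"


def gather_context(patterns: list[str]) -> list[str]:
    """The 'parent': one explorer per pattern; it keeps the summaries only."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(explore, patterns))  # map preserves input order


summaries = gather_context([r"enqueue_retry", r"render\("])
```

The point of the design is that the raw matches never enter the parent's working set; only the short summaries do.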

[原文] [Kelvin]: it's interesting watching like different agents structure this context uh like I think cursor takes an approach where they actually do semantic search where they embed everything and figure out like hey what query is closest to this if you look at a codeex or a cloud code uh they actually just use like grip and I think that works because well yeah it works very well because code is very context dense um like if you think about lines of code it's like each line is probably less than 80 characters there's not a lot of like big like data blobs or like JSON in your codebase maybe there's some but not a lot you can respect git ignore to figure out and like filter out stuff that's just not relevant or is like packaged and you can use gp and rip grep to like find context around the code which probably gives you a good sense for what that code is doing

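[Translation] [Kelvin]: It's interesting to watch how different agents structure this context. Cursor takes an approach where they actually do semantic search: they embed everything and figure out, "What is closest to this query?" If you look at Codex or Claude Code, they actually just use `grep` (note: transcribed as "grip"). I think that works very well because code is very context-dense. If you think about lines of code, each line is probably under 80 characters, and there aren't a lot of big data blobs or JSON in your codebase (maybe some, but not a lot). You can respect .gitignore to filter out stuff that isn't relevant or is packaged, and you can use `grep` and `ripgrep` to find the context around the code, which probably gives you a good sense of what that code is doing.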
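
As a rough illustration of grep-style retrieval with surrounding context and an ignore filter, here is a small pure-Python sketch. It is an assumption-laden stand-in, not the tooling any agent actually ships: `is_ignored` uses shell-style globs, which is a simplification of real .gitignore semantics (negation, anchoring, and directory rules are not handled), and `grep_context` is a hypothetical helper.

```python
import fnmatch
import re


def is_ignored(path: str, ignore_globs: list[str]) -> bool:
    """Simplified ignore check with shell-style globs.
    Real .gitignore rules are richer than this."""
    return any(fnmatch.fnmatchcase(path, g) for g in ignore_globs)


def grep_context(files: dict[str, str], pattern: str,
                 ignore_globs: list[str], context: int = 1) -> list[str]:
    """Return 'path:line: text' rows for each match plus `context` lines around it."""
    rx = re.compile(pattern)
    out: list[str] = []
    for path, text in sorted(files.items()):
        if is_ignored(path, ignore_globs):
            continue
        lines = text.splitlines()
        keep: set[int] = set()
        for i, line in enumerate(lines):
            if rx.search(line):
                lo = max(0, i - context)
                hi = min(len(lines), i + context + 1)
                keep.update(range(lo, hi))
        out.extend(f"{path}:{i + 1}: {lines[i]}" for i in sorted(keep))
    return out


files = {
    "src/app.py": "import os\n\ndef charge(user):\n    return bill(user)\n",
    "node_modules/x.js": "function charge() {}\n",
}
rows = grep_context(files, r"charge", ["node_modules/*"], context=1)
```

Because most code lines are short, a match plus a line or two on either side is cheap in tokens yet usually enough to tell what the code does.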

[原文] [Kelvin]: and you can navigate the f folder structure and also elements are really good at emitting very complicated gp expressions that would like torture a human yes yeah yeah yeah yeah this is like the RL in practice

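[Translation] [Kelvin]: And you can navigate the folder structure. Also, LLMs (note: transcribed as "elements") are really good at emitting very complicated grep expressions that would torture a human. Yes, this is RL (reinforcement learning) in practice.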

[原文] [Gary]: so given this is how a lot of the superpowers for the best coding agents is context engineering what are some of the tips to become a top 1% user of coding agents yeah what's your stack yeah what what do you do to be so productive with it

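[Translation] [Gary]: So given that a lot of the superpower of the best coding agents is context engineering, what are some of the tips to become a top 1% user of coding agents? What's your stack? What do you do to be so productive with it?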

[原文] [Kelvin]: one is if you're able to use uh just generally far less code and plumbing um so a lot of what I do is like deploy stacks on like Verscell or Nex.js or like Cloudflare workers where there's kind of like already a bunch of boiler plate like taking care of for you... it's like oh like everything is pretty roughly defined in this like one or 200 lines of code i tend to operate more towards microservices for that as well or like individual packages that are fairly well structured

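[Translation] [Kelvin]: One is to use generally far less code and plumbing if you can. A lot of what I do is deploy stacks on Vercel (note: transcribed as "Verscell"), Next.js, or Cloudflare Workers, where a bunch of boilerplate is already taken care of for you... It's like, "Everything is pretty roughly defined in these one or two hundred lines of code." I also tend to lean toward microservices for that reason, or individual packages that are fairly well structured.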

[原文] [Kelvin]: I think context poisoning is a real thing where it kind of like goes down one loop and it will continue because it has this persistence but it's referring back to tokens which are like not right in terms of pursuing a solution um and so one thing that I often do is like very actively clear context

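[Translation] [Kelvin]: I think context poisoning is a real thing: the model goes down one loop and keeps going because it has this persistence, but it is referring back to tokens that are not right for pursuing a solution. So one thing I often do is very actively clear context.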

[原文] [Gary]: like how often usually uh when it gets above like 50% tokens oh wow yeah yeah i don't there's this guy Dex uh from this company Human Layer that was actually another YC company... he has this concept of like the LLM's reaching the dumb zone where it's like after a certain amount of tokens uh it just starts like degrading in quality

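[Translation] [Gary]: How often? Usually when it gets above something like 50% of the tokens. Oh wow, yeah. There's this guy Dex from a company called HumanLayer, which is another YC company... He has this concept of LLMs reaching the "dumb zone": after a certain number of tokens, quality just starts degrading.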
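
The "clear context once it passes roughly half the window" habit can be expressed as a small guard. This is only a sketch of the rule of thumb from the conversation: `Session` is a hypothetical class, token counts are approximated by whitespace-separated words, and the handoff note is a placeholder for whatever summary a person or tool would actually carry over.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    window: int                 # context window size in tokens (assumed known)
    threshold: float = 0.5      # the "about 50%" rule of thumb
    messages: list[str] = field(default_factory=list)

    def used(self) -> int:
        # Crude estimate: one token per whitespace-separated word.
        return sum(len(m.split()) for m in self.messages)

    def add(self, message: str) -> bool:
        """Append a message; return True if the session was reset first."""
        reset = False
        if self.used() + len(message.split()) > self.window * self.threshold:
            # Start fresh, carrying over only a short handoff note.
            note = f"handoff: continuing after {len(self.messages)} earlier messages"
            self.messages = [note]
            reset = True
        self.messages.append(message)
        return reset


s = Session(window=20)
first = s.add("a b c d e f")    # 6 of 20 tokens: under the threshold
second = s.add("g h i j k l")   # would reach 12 of 20: reset happens first
```

The threshold is deliberately a parameter; the 50% figure is one person's habit, not a measured constant.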

[原文] [Kelvin]: and I actually think that's very true especially if you think about like how the reinforcement learning might work like imagine you're a college student you're taking an exam in the first five minutes of that exam you're like "Oh I have all the time in the world like I'll do a great job i'll think through each of these problems." Let's say you have like five minutes left and you still have half the exam left you're like "Oh man I just got to do whatever I can." Like that's the LM with a context window

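[Translation] [Kelvin]: I actually think that's very true, especially if you think about how the reinforcement learning might work. Imagine you're a college student taking an exam. In the first five minutes you think, "I have all the time in the world. I'll do a great job and think through each of these problems." Now say you have five minutes left and half the exam still to go. You think, "Oh man, I've just got to do whatever I can." That's the LLM with a context window.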

[原文] [Kelvin]: one of the tricks that uh I think founders use is you put like a canary at the beginning of the context there's something very esoteric that it's like something really funny it's like I don't know my name is Calvin and blah blah blah i drank tea at 8 am some random fact and then as you keep going you ask it do you remember what's my name do you remember when I drank tea and then when it starts forgetting that I think is a bit of a sign that the context has poison that's like one trick I've seen people do

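[Translation] [Kelvin]: One trick I think founders use is to put a canary at the beginning of the context: something very esoteric or funny, like "My name is Calvin and blah blah blah, I drank tea at 8 am", some random fact. Then as you keep going, you ask it, "Do you remember my name? Do you remember when I drank tea?" When it starts forgetting that, I think it's a sign that the context has been poisoned. That's one trick I've seen people do.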
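
The canary trick is easy to sketch: plant a fact at the start, then periodically ask for it back. In the sketch below every name is hypothetical, `ask` is an assumed hook that would send the transcript plus a question to a model, and `make_forgetful_model` is a toy stand-in that only "sees" the last few messages so the check has something to detect.

```python
from typing import Callable

CANARY_FACT = "I drank tea at 8 am"        # the esoteric fact planted up front
CANARY_QUESTION = "When did I drink tea?"
CANARY_ANSWER = "8 am"


def start_transcript() -> list[str]:
    """Begin a session with the canary as the very first message."""
    return [f"Remember this: {CANARY_FACT}."]


def canary_alive(ask: Callable[[list[str], str], str], transcript: list[str]) -> bool:
    """Ask the canary question; True if the reply still contains the planted fact."""
    reply = ask(transcript, CANARY_QUESTION)
    return CANARY_ANSWER.lower() in reply.lower()


def make_forgetful_model(visible: int) -> Callable[[list[str], str], str]:
    """Toy model that can only 'see' the last `visible` messages."""
    def ask(transcript: list[str], question: str) -> str:
        seen = " ".join(transcript[-visible:])
        return "At 8 am." if CANARY_FACT in seen else "I'm not sure."
    return ask


model = make_forgetful_model(visible=3)
transcript = start_transcript()
alive_at_start = canary_alive(model, transcript)
transcript += ["step 1", "step 2", "step 3"]   # the canary scrolls out of view
alive_later = canary_alive(model, transcript)
```

A failed check is a cue to clear or restart the session, not proof of any particular failure mode.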


Chapter 3: A Clash of Approaches: Anthropic's "Tool" View versus OpenAI's "AGI" View

📝 Section Summary

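This section explores the very different philosophies Anthropic and OpenAI bring to building coding agents. Kelvin suggests that Anthropic leans toward tools that "work like a human would" and blend naturally into existing workflows (like going to the hardware store to buy materials for a doghouse), while OpenAI is pursuing artificial general intelligence (AGI), aiming for a model that solves problems end to end in non-human, even "weird" ways, like AlphaGo or a 3D printer. The two also discuss how, as compute leaps forward, startups can use this "rocket booster" efficiency to outpace large companies weighed down by process.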

[原文] [Gary]: they do a random canery i have not tried this but I fully believe it yeah that's interesting i haven't run across any bugs before compaction but maybe I'm not paying attention but you're saying like that actually is actively something that it just starts doing weirder things that are not like optimal

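[Translation] [Gary]: They plant a random canary. I haven't tried this, but I fully believe it. That's interesting: I haven't run across any bugs before compaction, but maybe I'm not paying attention. You're saying it actually starts doing weirder things that aren't optimal?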

[原文] [Kelvin]: yeah yeah okay i got to be on the lookout for that solvable within cloud code itself like it should be able to basically do some sort of detection like what is saying do your own internal heartbeat around it around the context yeah and I think we're just not there yet like I agree with you in the limit uh right now it's definitely hard to manage context well and I think kind of the way it gets around it is like split up context windows and then try and merge everything but you're sort of still at the limit right now of like everything that lives in context at the end of a cloud code session is kind of fixed

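[Translation] [Kelvin]: Yeah, yeah. Okay, I've got to be on the lookout for that. It should be solvable within Claude Code (note: transcribed as "cloud code") itself: it should be able to do some sort of detection, like running its own internal heartbeat on the context. Yeah, and I think we're just not there yet. I agree with you in the limit, but right now it's definitely hard to manage context well. The way it gets around that is to split up context windows and then try to merge everything. But you're still at a limit right now, in that everything living in context at the end of a Claude Code session is more or less fixed.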

[原文] [Kelvin]: it's actually interesting the codeex approach is kind of the opposite and they just wrote about this on the OpenAI blog where it will run compaction like periodically after each turn and so codecs can continue to run for a very long time and if you look at the percentage in the uh the CLI you'll see it like move up and down as compaction runs i guess like there are these very different architectures between cloud code and codeex sound like they're actually deeper in that codeex is actually meant for much longer running jobs so you that's sort of like off the bat a different use case and then the architecture is very different as a result

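[Translation] [Kelvin]: It's actually interesting that the Codex approach is kind of the opposite. They just wrote about this on the OpenAI blog: it runs compaction periodically, after each turn, so Codex can keep running for a very long time. If you watch the percentage in the CLI, you'll see it move up and down as compaction runs. I guess there are these very different architectures between Claude Code and Codex. It sounds like the difference goes deeper, in that Codex is actually meant for much longer-running jobs. That's a different use case right off the bat, and the architecture is very different as a result.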
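
To see why a usage percentage would move up and down under turn-by-turn compaction, consider this toy loop. It is not Codex's algorithm: the 80% trigger, the `compact` placeholder summary, and the word-count token estimate are all assumptions chosen only to make the rise-and-fall visible.

```python
def compact(history: list[str], keep_recent: int = 2) -> list[str]:
    """Fold everything except the most recent turns into one summary line.
    The summary text is a placeholder; a real agent would have a model write it."""
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [f"summary of {len(old)} earlier item(s)"] + recent


def count_tokens(history: list[str]) -> int:
    # Crude estimate: one token per whitespace-separated word.
    return sum(len(h.split()) for h in history)


def run_turns(turns: list[str], window: int) -> list[float]:
    """Append each turn, compact when usage passes 80% of the window (an
    arbitrary illustrative trigger), and record usage after every turn."""
    history: list[str] = []
    usage: list[float] = []
    for turn in turns:
        history.append(turn)
        if count_tokens(history) > 0.8 * window:
            history = compact(history)
        usage.append(100 * count_tokens(history) / window)
    return usage


ten_words = " ".join(["w"] * 10)
usage = run_turns([ten_words] * 6, window=40)
```

In this run usage climbs over the first three turns, then drops once compaction folds the older turns into a summary, and stays below the trigger afterwards.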

[原文] [Gary]: i guess right now it seems like CLI's you know 2026 might be the year of CLI but then this other idea that AGI is here and it's actually ASI is around the corner the coding agents right now are really really smart but not smart enough to run on their own for long periods of time but a 10x increase in compute from here are we there like are we at 24 hours or 48 hour running jobs on codecs and that architecture is correct for that world

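[Translation] [Gary]: Right now it seems like 2026 might be the year of the CLI. But then there's this other idea that AGI (artificial general intelligence) is here and ASI (artificial superintelligence) is around the corner. The coding agents right now are really, really smart, but not smart enough to run on their own for long periods. With a 10x increase in compute from here, are we there? Are we at 24-hour or 48-hour running jobs on Codex, and is that architecture the correct one for that world?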

[原文] [Kelvin]: yeah I think it's a good question it sort of goes back to like kind of the founding DNA of both companies like I feel like Anthropic has always been very big on like building tools for humans where it comes to like oh here's the style of the tone and like here's how it should fit with all of the rest of your work and I think quad code is like a very natural extension of that in a lot of ways it like works like a human would where it's like oh you need to build like I don't know a doghouse or something it's like oh I'll go to the hardware store and I'll build all these materials and I'll like figure out how they all fit together

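[Translation] [Kelvin]: Yeah, I think it's a good question. It goes back to the founding DNA of both companies. I feel like Anthropic has always been very big on building tools for humans: here's the style and the tone, and here's how it should fit with all the rest of your work. I think Claude Code (note: transcribed as "quad code") is a very natural extension of that. In a lot of ways it works like a human would. Say you need to build a doghouse: it's like, "I'll go to the hardware store, I'll buy all these materials, and I'll figure out how they all fit together."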

[原文] [Kelvin]: whereas OpenAI really leans into this idea of just like we are going to train the best model and reinforce over time and get it to do longer and longer horizon things uh in this pursuit of artificial general intelligence and so it may not work like a human at all like going back to the doghouse example it's like but AlphaGo didn't either

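[Translation] [Kelvin]: Whereas OpenAI really leans into this idea: we are going to train the best model, reinforce it over time, and get it to do longer- and longer-horizon things, in pursuit of artificial general intelligence (AGI). So it may not work like a human at all. Going back to the doghouse example... but AlphaGo didn't work like a human either.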

[原文] [Gary]: yeah but AlphaGo didn't either it's like oh it's like instead I will have a 3D printer that can print from scratch like a doghouse and it will be exactly what you want and it will take a long time and it will be like very custom and it will do like weird things but it will work you know and like maybe in the limit that's the right call and so it's going to be really interesting to see how they play out

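[Translation] [Gary]: Yeah, but AlphaGo didn't either. It's like, "Instead, I will have a 3D printer that can print a doghouse from scratch. It will be exactly what you want, it will take a long time, it will be very custom, and it will do weird things, but it will work." And maybe in the limit that's the right call. So it's going to be really interesting to see how they play out.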

[原文] [Gary]: i mean net net it seems like the latter is somewhat inevitable but I like the former so much you know like even this idea that it gs was like I thought about you know 10 years ago I was like yeah I was in there like writing my own really weird reaxes to try to figure out where everything was when I was refactoring or what trying to understand code or whatever so that's the feeling I get when I'm using it it's like I can do five people's worth of work in like a single day it's like rocket boosters it's unbelievable

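[Translation] [Gary]: Net-net, it seems like the latter (the AGI route) is somewhat inevitable, but I like the former (the tool route) so much. Even this idea that it greps: I thought about how, ten years ago, I was in there writing my own really weird regexes (note: transcribed as "reaxes") to figure out where everything was when I was refactoring or trying to understand code. That's the feeling I get when I'm using it. It's like I can do five people's worth of work in a single day. It's like rocket boosters. It's unbelievable.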

[原文] [Kelvin]: yeah i I think it's going to be really interesting to see how this plays out across large and small companies i think everyone who's experimenting with this stuff on like a hobbyist level or at like a very small startup they're just pushing the coding agents as far as they can go because it's like you don't really have time to figure out anything else like as a startup you have limited runway you're just going to like orient around speed i think at a bigger company you have a lot more to lose and you have all these other internal processes around code review and you probably already hired like a big ENT team and I think it's going to be very strange as like these individual teams of like one person are like hey that team over there isn't doing the right thing like let me just build a prototype that like works better i think at some point it's going to start working better and I think that landscape shift is going to be a very interesting strange thing

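[Translation] [Kelvin]: Yeah, I think it's going to be really interesting to see how this plays out across large and small companies. Everyone experimenting with this stuff at a hobbyist level or at a very small startup is pushing the coding agents as far as they can go, because you don't really have time to figure out anything else. As a startup you have limited runway, so you orient around speed. At a bigger company you have a lot more to lose, you have all these internal processes around code review, and you've probably already hired a big engineering team (note: transcribed as "ENT team"). I think it's going to be very strange when individual teams of one person say, "Hey, that team over there isn't doing the right thing. Let me just build a prototype that works better." At some point that is going to start working better, and I think that landscape shift is going to be a very interesting, strange thing.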


Chapter 4: The Developer's New Normal: "Manager Mode", Changes in Education, and the ADHD Advantage

📝 Section Summary

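This section focuses on how the developer's role is changing in the AI era and how that differs across generations. Gary points out that real "management work" is often tedious, such as dealing with huge volumes of errors, and that the younger generation may struggle to adapt to it. Kelvin argues that senior engineers, who already have the "manager" trait of turning ideas into action, will benefit most from AI tools. On education, the two agree that the core skill of the future is no longer syntax but an understanding of underlying systems (such as Git and HTTP) together with excellent taste. Finally, they discuss an intriguing idea: multitasking or ADHD traits, long treated as a weakness in traditional education, become a major advantage when you need to direct several AI agents at once.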

[原文] [Gary]: my 10-year-old is uh you know he has writing assignments every day and then yesterday was the first day where he used AI and then I was like this is not a turn of a phrase that a 10-year-old is capable of doing and then I think about that in this context because we you know are working with a lot of 18 to 22 year olds who you know they've done internships but like they haven't done like manager work like you know we're saying um you know postp productduct market fit uh once you're have job cues of like millions of jobs and like you know hundreds of thousands of errors that's like real management like that's really you know it's horribly unglamorous like combing through hundreds of thousands of errors and then like manually making sure that like the thing works for all of your users in the background how does the next generation understand that can the cloud code bot actually teach people about uh architecture and things like that or you know are you just gonna bump your head into it and users just kind of suffer and you know people have to figure it out

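[Translation] [Gary]: My 10-year-old has writing assignments every day, and yesterday was the first day he used AI. I was like, "This is not a turn of phrase a 10-year-old is capable of." I think about that in this context because we work with a lot of 18-to-22-year-olds who have done internships but haven't done manager work. What we mean is: post product-market fit, once you have job queues with millions of jobs and hundreds of thousands of errors, that's real management. It's horribly unglamorous: combing through hundreds of thousands of errors and manually making sure the thing works for all of your users in the background. How does the next generation come to understand that? Can the Claude Code bot actually teach people about architecture and things like that, or do you just bump your head into it while users suffer and people have to figure it out?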

[原文] [Kelvin]: like at least where I find myself spending the most time when it comes to product is figuring out the kind of product model in a sense like what are the things that the user has to understand today um and what are the primitives that they can use to like do whatever they want i always think of Slack like this it's like Slack was in some ways not really a new concept it's like there were many chats that existed before it um but the fact that they had like channels messages and reactions in a simple way that people could just like think about and be like "Oh I understand how to like navigate this." It made a lot of sense for people but then kind of once they were there like it's very hard to change that later on for a user you know it's like oh maybe they wanted to go in more of like a document first way or like maybe right now they're trying to incorporate agents it's like difficult to change the user's mental model and so I at least for myself building products it's like you have to think about that very carefully from an early stage because again whatever you supply to the coding agents as that kind of kernel is going to be what they run with and make more of forever more

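[Translation] [Kelvin]: Where I find myself spending the most time on product is figuring out the product model, in a sense: what does the user have to understand today, and what primitives can they use to do whatever they want? I always think of Slack this way. Slack was in some ways not a new concept; many chat apps existed before it. But they had channels, messages, and reactions, presented simply enough that people could think, "Oh, I understand how to navigate this." It made a lot of sense to people. But once people are there, it's very hard to change that later. Maybe they wanted to go in a more document-first direction, or maybe now they're trying to incorporate agents; it's difficult to change the user's mental model. So for myself, building products, you have to think about that very carefully from an early stage, because whatever you supply to the coding agents as that kernel is what they will run with and make more of forevermore.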

[原文] [Gary]: yc's next batch is now taking applications got a startup in you apply at y combinator.com/apply it's never too early and filling out the app will level up your idea okay back to the video do you have thoughts just cuz you know the the agent so well like what what types of engineers are going to benefit more than others um from these tools becoming popular

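[Translation] [Gary]: YC's next batch is now taking applications. Got a startup in you? Apply at ycombinator.com/apply. It's never too early, and filling out the app will level up your idea. Okay, back to the video. Since you know the agents so well, do you have thoughts on which types of engineers will benefit more than others as these tools become popular?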

[原文] [Kelvin]: in general I think that kind of the more senior senior you are the more you benefit um because the agents are so good at taking some sort of idea and then putting it into action if you're able to prompt that in a few words it's kind of like oh now suddenly I had this like idea i I find this so often in open like scrolling through the codebase it's like oh like here's a thing that I wish were different here's a thing that I wish were different here's a thing that I wish were different like just being able to kick those off and then have them come back I think is super empowering and multiplies your impact i think also being able to detect like which sorts of changes are good or bad architecturally is very important or like have a sense for where you might want to flag something to an agent i think engineers who are more organized like managerish uh and there's probably just a missing product to be built here uh maybe something like conductor uh where it's like spread across all of your sessions and kind of reminding you like hey you were working on this thing it's done it needs your input here oh you should switch your attention over to this other thing

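[Translation] [Kelvin]: In general, I think the more senior you are, the more you benefit, because the agents are so good at taking an idea and putting it into action. If you can prompt that in a few words, suddenly it's, "Oh, I had this idea." I find this so often when scrolling through the codebase: here's a thing I wish were different, here's another, here's another. Just being able to kick those off and have them come back is super empowering and multiplies your impact. I think being able to detect which sorts of changes are good or bad architecturally is also very important, as is having a sense for where you might want to flag something to an agent. Engineers who are more organized, more managerish, benefit too. And there's probably a missing product to be built here, maybe something like Conductor, that is spread across all of your sessions and reminds you: "Hey, you were working on this thing, it's done, it needs your input here. Oh, you should switch your attention over to this other thing."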

[原文] [Kelvin]: i think that is conductor should add that yeah like uh context management for agents but like we also need context management for humans

[Translation] [Kelvin]: I think Conductor should add that. Yeah, we have context management for agents, but we also need context management for humans.

[原文] [Gary]: yes 100% yeah i mean I want like when I wake up every day it kind of is like hey here's all the work that got done overnight like here are the like three decisions that you need to make here are like areas of deep thinking that you were planning to do like I want the turn by turn for my day you know other things that make it very useful like if you're able to build um I don't know some sort of like quick prototype for an idea to show it off like that's an area I mean obviously the agents do super well at this um I would find myself at openai often writing kind of like prototype code or like hey I've got this like in-memory key value store can you now turn it into like uh work with a production database or something like that being able to concisely specify ideas in code and I think having a smell for what the right architecture is is still the area where the models like don't do the best job

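[Translation] [Gary]: Yes, 100%. I mean, when I wake up every day I want it to be like, "Hey, here's all the work that got done overnight, here are the three decisions you need to make, here are the areas of deep thinking you were planning to do." I want the turn-by-turn for my day. Other things make it very useful too, like being able to build a quick prototype of an idea to show it off; obviously the agents do super well at this. At OpenAI I would often find myself writing prototype code, like, "Hey, I've got this in-memory key-value store, can you now make it work with a production database?" Being able to concisely specify ideas in code matters, and having a smell for what the right architecture is, I think, is still the area where the models don't do the best job.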
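
[Editor's note] The "in-memory key-value store, now make it work with a database" request might look roughly like the sketch below. The class and function names are hypothetical, and SQLite stands in for a production database only so that the example runs without any setup.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """The prototype: a plain dict that disappears when the process exits."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class SqliteStore:
    """Same interface, backed by a database table instead of a dict."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def set(self, key: str, value: str) -> None:
        # INSERT OR REPLACE makes set() overwrite, like a dict assignment.
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )
        self._conn.commit()


def remember_greeting(store: KeyValueStore) -> Optional[str]:
    """Caller code that works unchanged with either implementation."""
    store.set("greeting", "hello")
    store.set("greeting", "hello again")
    return store.get("greeting")
```

Keeping the prototype and the database version behind one small interface is the "concisely specify ideas in code" part: the interface states the intent, and the agent fills in the second implementation.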

[原文] [Gary]: so if you were going back to your like college days and studying CS again fresh and you like were picking your own like syllabus or curriculum like what would you what would you study

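[Translation] [Gary]: So if you were going back to your college days and studying CS again from scratch, and you were picking your own syllabus or curriculum, what would you study?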

[原文] [Kelvin]: personally I think still understanding systems uh is very important um and just having some conception of like how like git works you know or like HTTP or databases like cues like all of these different systems i think that those fundamentals are still quite important the other thing that I'd probably do is just have a semester where like each week you're just building something and you really try and push the models as far as they can go there's a sense that you have whenever you're doing something that you could always just like go up the layer and ask the model to do it and like go up a layer and ask the model to do it you know where it's like oh I have like a implement command where it like implements the next phase of the plan but then I could have like an implement all command and it like goes stage by stage and creates a new sub agent and then I could have like a check your work kind of thing and like and I think knowing where the models can and can't accomplish that is such a moving target that it's worthwhile just to like tinker a lot

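[Translation] [Kelvin]: Personally, I think understanding systems is still very important: having some conception of how Git works, or HTTP, or databases, or queues (note: transcribed as "cues"), all of these different systems. I think those fundamentals are still quite important. The other thing I'd probably do is have a semester where each week you just build something and really try to push the models as far as they can go. Whenever you're doing something, there's a sense that you could always go up a layer and ask the model to do it, and then go up another layer and ask the model to do that. It's like, "Oh, I have an implement command that implements the next phase of the plan. But then I could have an implement-all command that goes stage by stage and creates a new sub-agent each time. And then I could have a check-your-work kind of thing." I think knowing where the models can and can't accomplish that is such a moving target that it's worthwhile just to tinker a lot.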
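
[Editor's note] The "go up a layer" pattern can be sketched as three small functions stacked on top of each other. Everything here is an assumption made for illustration: `Agent` is a hypothetical stand-in for a call to a coding agent, and `fake_agent` is a stub so the sketch runs offline. It is not the command set of any real tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stand-in for a coding agent: takes a prompt, returns a text result.
Agent = Callable[[str], str]


@dataclass
class Plan:
    phases: List[str]
    done: List[str] = field(default_factory=list)

    def next_phase(self) -> Optional[str]:
        remaining = self.phases[len(self.done):]
        return remaining[0] if remaining else None


def implement(plan: Plan, agent: Agent) -> bool:
    """Layer 1: run only the next phase of the plan."""
    phase = plan.next_phase()
    if phase is None:
        return False
    plan.done.append(agent(f"Implement phase: {phase}"))
    return True


def implement_all(plan: Plan, make_agent: Callable[[], Agent]) -> int:
    """Layer 2: go stage by stage, creating a fresh sub-agent for each phase."""
    count = 0
    while implement(plan, make_agent()):
        count += 1
    return count


def check_work(plan: Plan, reviewer: Agent) -> List[str]:
    """Layer 3: ask a reviewer agent to look over each finished phase."""
    return [reviewer(f"Review this result: {result}") for result in plan.done]


def fake_agent() -> Agent:
    # Stub so the sketch runs without a model; a real agent call would go here.
    return lambda prompt: f"done({prompt})"
```

Each layer only calls the one below it, which is why it is cheap to keep adding layers and see where the model stops being reliable.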

[原文] [Gary]: i mean the other thing that's really really crazy for I mean I would love to be able to teach 18 to 22 year olds like everyone around like at this table has like ship stuff that people really really want and love so it's like how do we teach people that i wonder if like the best 18 to 22 year olds like 5 years from now will just have like off the charts taste and everything because they'll just be so much more prolific they should be right like they should just be launching and like touching reality like 10 times as much as like the generation before them

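[Translation] [Gary]: I mean, the other thing that's really crazy is that I would love to be able to teach this to 18-to-22-year-olds. Everyone around this table has shipped stuff that people really want and love, so how do we teach people that? I wonder if the best 18-to-22-year-olds five years from now will just have off-the-charts taste, because they'll be so much more prolific. They should, right? They should be launching and touching reality ten times as much as the generation before them.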

[原文] [Kelvin]: the one thing I have wondered about on that note um I don't know if you all found this but growing up my mom used to tell me like oh like stop multitasking you're not paying attention to like what I'm doing uh and I think there is some truth to that like often I would be like off on my computer like not paying attention but I do think I was legitimately better at multitasking than our parents were uh and now I look at this new generation i think they're actually quite a bit better at multitasking than we are you know because they've kind of grown up in this age of the internet and they're dealing with like TikTok and all these like different short form video and things like it seems like there's room for both kind of this like deep thinking where you want to like notice what you're seeing and understand and problem solve but then there's also this mode of just like bounce between a bunch of different things and you're context switching constantly

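[Translation] [Kelvin]: The one thing I have wondered about on that note: I don't know if you all found this, but growing up my mom used to tell me, "Stop multitasking, you're not paying attention to what I'm doing." I think there's some truth to that; often I would be off on my computer, not paying attention. But I do think I was legitimately better at multitasking than our parents were. And now I look at this new generation, and I think they're actually quite a bit better at multitasking than we are, because they grew up in the age of the internet and they're dealing with TikTok and all these short-form videos. It seems like there's room for both: the deep thinking where you want to notice what you're seeing, understand, and problem-solve, and then this other mode where you just bounce between a bunch of different things and context-switch constantly.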

[原文] [Gary]: adhd mode yeah the new generation is quite good at this yes I definitely think there's a there's a type of smart person and maybe it's ADHD but just like always has like a bunch of good projects on the go but just never actually finishes anything i might relate to this personality a little bit um you release your uh your vibe code

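[Translation] [Gary]: ADHD mode, yeah. The new generation is quite good at this. Yes, I definitely think there's a type of smart person, and maybe it's ADHD, who always has a bunch of good projects on the go but never actually finishes anything. I might relate to this personality a little bit... You released your vibe code.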

[原文] [Kelvin]: yeah but I wouldn't only because of Claude code that's kind of my like now I just think like you kind of like there's certain types of brains that just have like like 10 branches going in their head but you never have enough hours in the day to actually like see any of them through so they're always like half complete and now it's just like cold code gets you over the line with everything and it's just like and you made this point in your blog post about how it feels like a video game but it's just like there's just a constant novelty factor like you start working on something and usually when you hit the point of like I'm like bored and I've got like this other better idea and I should like start on that and then come back to this like you can do that now but like everything can actually get finished

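[Translation] [Kelvin]: Yeah, but I wouldn't have otherwise; it was only because of Claude Code. I now think there are certain types of brains that have something like ten branches going in their head, but there are never enough hours in the day to see any of them through, so they're always half complete. And now Claude Code (note: transcribed as "cold code") gets you over the line with everything. You made this point in your blog post about how it feels like a video game: there's a constant novelty factor. You start working on something, and usually you hit the point of "I'm bored, I've got this other, better idea, I should start on that and come back to this." You can still do that now, but everything can actually get finished.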


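Chapter 5: The Future Shape of Software (Personalized Forks and Agent Collaboration Networks)

📝 Section summary

This section looks ahead to where the software industry may end up. Gary and Kelvin imagine a decentralized software world in which the SaaS model may give way to customized services that fork a separate codebase for every customer. In that architecture, the once highly valuable work of building integrations drops to near zero in value, because agents can handle data mapping themselves. They also discuss a major shift in how people work: agents let people on a manager's schedule write code productively between meetings, and there are early signs of "social networks" where agents talk to one another and share knowledge.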

[原文] [Gary]: let's live in the future for a moment it's 40 years from now software still exists databases still exist access control still exists but like at the core of it I mean software is entirely personal access control and who gets to do it is like you know sort of like this manager mode thing that people still have meetings about but then everything else about a company its functions its rules like is defined by people just doing things in their own cloud code like thing I don't know maybe it's a CLI or it's like you know having giant armies of workers then I don't know what would that look like

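[Translation] [Gary]: Let's live in the future for a moment. It's 40 years from now. Software still exists, databases still exist, access control still exists. But at the core of it, software is entirely personal. Access control, and who gets to do what, is this sort of manager-mode thing that people still have meetings about. Everything else about a company, its functions and its rules, is defined by people just doing things in their own Claude Code-like thing (note: transcribed as "cloud code"). I don't know, maybe it's a CLI, or maybe it's like having giant armies of workers. What would that look like?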

[原文] [Kelvin]: like imagine if every time a company signed up for segment you fork the codebase you give them their own copy of segment is running on their own servers and then if they want to change anything about it they just like tell some chat window which is running like an agent coding loop and just like edits their version of segment as segment the corporation pushes out more features some agent figures out how to merge

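[Translation] [Kelvin]: Imagine if, every time a company signed up for Segment, you forked the codebase and gave them their own copy of Segment running on their own servers. Then, if they want to change anything about it, they just tell a chat window that is running an agent coding loop, and it edits their version of Segment. As Segment the corporation pushes out more features, some agent figures out how to merge them in.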

[原文] [Kelvin]: yeah I I could totally see it I mean sort of what I've been thinking I don't know how far this future is but like eventually every person who's working like has their own sort of like cloud computer and like set of cloud agents who are running for them and and they're mostly just like talking back and forth it's kind of like having like a super EA or something where it's like "Oh here are the things I need to pay attention to like let me make some quick decisions like let me spend more time on this let me like meet with other people."

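[Translation] [Kelvin]: Yeah, I could totally see it. It's sort of what I've been thinking about. I don't know how far off this future is, but eventually every person who's working has their own cloud computer and a set of cloud agents running for them, and they're mostly just talking back and forth. It's kind of like having a super EA (executive assistant): "Here are the things I need to pay attention to. Let me make some quick decisions, let me spend more time on this, let me meet with other people."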

[原文] [Kelvin]: something I'm curious to see is kind of like what the update version of the PG maker maker schedule versus manager schedule would look like cuz I feel like part of what's going on at YC is sort of a lot of our jobs are essentially manager schedule which just really made it hard to do any sort of building your own software but now you totally can and that's why like a bunch of the partners just do it in the meeting like I like right at the beginning of this podcast you let it run and then come back

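[Translation] [Kelvin]: Something I'm curious to see is what the updated version of PG's (Paul Graham's) "Maker's Schedule, Manager's Schedule" would look like. I feel like part of what's going on at YC is that a lot of our jobs are essentially manager schedule, which made it really hard to do any sort of building of your own software. But now you totally can, and that's why a bunch of the partners just do it in meetings. Like right at the beginning of this podcast: you let it run and then come back.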

[原文] [Gary]: well like in the pockets right like in like it just used to be like literally unless you had like you know 4 hours minimum block free to do something it just wasn't worth even getting started right and I I think that's actually goes very deep to how we've changed programming like it used to be that in order to write any code you had to fill your own context window with so much data about all the different class names and the functions and the code that it touches it would take hours to build up that context window and so doing it in 10-minute snatches was just like so frustrating

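[Translation] [Gary]: Right, in the pockets of time. It used to be that unless you had a free block of at least four hours, it literally wasn't worth even getting started. I think that goes very deep into how programming has changed. It used to be that in order to write any code you had to fill your own context window with so much data about all the different class names, the functions, and the code it touches. It would take hours to build up that context window, so doing it in 10-minute snatches was just so frustrating.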

[原文] [Kelvin]: i do think I think maybe one one primitive for this future world will be I think still the data models need to be still be consistent and the system of record there's there's opportunity for something that's kind of agentic first because right now we're still kind of in integrated very much with databases and SQL or NoSQL queries that are very low level but imagine something that generates all the data that you need for all the different views for custom software so a lot of the world would be custom views but I think the unifi stuff we still need to have data to be correct

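[Translation] [Kelvin]: I do think that one primitive for this future world will still be that the data models need to be consistent: the system of record. There's an opportunity for something that is agentic-first, because right now we're still very much integrated with databases and SQL or NoSQL queries that are very low level. But imagine something that generates all the data you need for all the different views in custom software. A lot of the world would be custom views, but for the unified layer (note: the original's "unifi stuff" is unclear) we still need the data to be correct.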

[原文] [Gary]: i wonder with that note if you were to rebuild segment in the current with the current tools how would it look like

[Translation] [Gary]: On that note, I wonder: if you were to rebuild Segment with the current tools, what would it look like?

[原文] [Kelvin]: i mean segment is a funny business in that uh where we started was building these integrations right um and so it's like oh you need to wire up like the same data going to like mix panel and kissmetrics and Google Analytics etc and I think just writing that code now like that used to be maybe a more annoying or harder thing to do and so it was worth paying for now it like that value has dropped to zero yeah and actually like in many cases you're better off like saying "Oh I actually want to map it this way and I want this specific behavior." like I will just tell the quad or codeex what to do and then it will do it and I'll have exactly the behavior that I want so I think that aspect of segment like the value has dropped precipitously

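[Translation] [Kelvin]: Segment is a funny business in that where we started was building these integrations, right? It's like, "Oh, you need to wire up the same data going to Mixpanel, Kissmetrics, Google Analytics, and so on." Writing that code used to be a more annoying or harder thing to do, so it was worth paying for. Now that value has dropped to zero. And actually, in many cases you're better off saying, "I actually want to map it this way, and I want this specific behavior." I'll just tell Claude or Codex (note: transcribed as "quad or codeex") what to do, it will do it, and I'll have exactly the behavior I want. So I think that aspect of Segment's value has dropped precipitously.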
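
[Editor's note] The integration work described here amounts to mapping one tracked event into each destination's format. The sketch below uses invented payload shapes for two made-up destinations, `vendor_a` and `vendor_b`; they are not the real request formats of Mixpanel, Kissmetrics, or Google Analytics.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]
Mapper = Callable[[Event], Dict[str, Any]]


# Hypothetical destination formats, invented for illustration only.
def to_vendor_a(event: Event) -> Dict[str, Any]:
    """Nest properties and carry the user id inside them."""
    return {
        "event": event["name"],
        "properties": {**event["props"], "distinct_id": event["user_id"]},
    }


def to_vendor_b(event: Event) -> Dict[str, Any]:
    """Flatten everything into one dict with short reserved keys."""
    return {"_n": event["name"], "_p": event["user_id"], **event["props"]}


MAPPERS: Dict[str, Mapper] = {"vendor_a": to_vendor_a, "vendor_b": to_vendor_b}


def fan_out(event: Event) -> Dict[str, Dict[str, Any]]:
    """Translate one tracked event into every destination's payload."""
    return {name: mapper(event) for name, mapper in MAPPERS.items()}
```

A custom mapping ("I actually want to map it this way") is then just one more small function added to `MAPPERS`, which is the kind of change an agent can write on request.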

[原文] [Gary]: i think there's something very interesting here around like agent memory um and cloud code has sort of set itself up and I think Codeex 2 by storing all your conversation history just as files so you could imagine you like give it access to a tool that then can read previous conversation history i think there's a missing piece around a lot of collaboration there like it'd be amazing if like there was some way of smartly sharing your co-workers prompts and you could see and be like "Oh like I hit this thing but actually like Brian over there like fixed it earlier you know so like the two of us can share knowledge."

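[Translation] [Gary]: I think there's something very interesting here around agent memory. Claude Code (note: transcribed as "cloud code") has sort of set itself up for this, and I think Codex too (note: transcribed as "Codeex 2"), by storing all your conversation history just as files. So you could imagine giving it access to a tool that can read previous conversation history. I think there's a missing piece around collaboration there. It would be amazing if there were some way of smartly sharing your co-workers' prompts, so you could see, "Oh, I hit this thing, but actually Brian over there fixed it earlier," and the two of us can share knowledge.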
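
[Editor's note] A tool that reads previous conversation history could start as a plain search over files. The layout assumed below (one `.jsonl` file per session, one JSON object per line, with a `text` field) is hypothetical; the real on-disk formats of Claude Code and Codex may differ.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple


def search_history(history_dir: Path, needle: str) -> Iterator[Tuple[str, str]]:
    """Yield (file name, message text) for past messages that mention `needle`.

    Assumes a hypothetical layout: one .jsonl file per session, with one
    JSON object per line that carries a "text" field.
    """
    for path in sorted(history_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip corrupt lines rather than failing the search
                if not isinstance(record, dict):
                    continue
                text = str(record.get("text", ""))
                if needle.lower() in text.lower():
                    yield path.name, text
```

Pointing `history_dir` at a shared folder is the simplest version of the collaboration idea: a teammate's earlier fix shows up in your search.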

[原文] [Gary]: Have you seen um the Claudebot social like the network for clawbots to talk to each other and it's like that's the evolution for Molen yeah i guess what they don't know clawbot's essentially like um uh like your own personal AI agent that you can run on your own machine... somebody created um like a I haven't actually seen it but I've like seen it on Twitter but like a site where like everyone can sort of spin up their own like clawbot like personal agent and then the agents can talk to each other and now there's just like all this AI generated content of these like personal AI agents talking to each other

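[Translation] [Gary]: Have you seen the Clawdbot social network (note: transcribed as "Claudebot" and "clawbot"), the network for Clawdbots to talk to each other? That's the evolution, "Molen" (note: the name is unclear in the transcript; it may be a mishearing of Moltbook). For those who don't know, Clawdbot is essentially your own personal AI agent that you can run on your own machine... Somebody created a site, which I haven't actually visited but have seen on Twitter, where everyone can spin up their own Clawdbot personal agent, and then the agents can talk to each other. Now there's all this AI-generated content of personal AI agents talking to each other.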


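Chapter 6: Real-World Challenges and Safety Boundaries (Legacy Stacks, Sandboxing, and "YOLO" Development)

📝 Section summary

As the interview winds down, the conversation turns to practical technical challenges and security trade-offs. Gary complains about a poor experience using Codex on an older stack (Ruby on Rails) and suggests that OpenAI seems to care mostly about Python monorepos. Kelvin explains how the training "data mix" and strict sandboxing lie behind differences in model behavior. Kelvin then shares an internal story about prompt injection, and the group has a light-hearted exchange about "YOLO mode" in development, meaning skipping all permission checks for the sake of speed. It turns out that even inside YC, the aggressive and cautious camps are split about evenly.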

[原文] [Gary]: i guess what's funny is I tried to use codeex just now uh for my Rails project but the thing is like it's kind of obvious that nobody at OpenAI cares about Rails which is fine like it's a very it's a vestigial language it's very strange it just happened to be the one that I you know really really went deep on 10 years ago and then uh it's just funny how much of it is exactly again anyone can make something but then the something people want is very hard and um even when you have like unlimited resources is at like an openi it's like I guess if someone from codeex is watching right now my request would be go down the list of all of the run times and just add like syntactic sugar there's like this is probably like you know 10 PRs at most for like I don't know the top like 15 run times i guess it's like sort of the reminder that like man actually like there are far fewer excuses for software that doesn't quite work for a user you know now than ever actually

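[Translation] [Gary]: I guess what's funny is that I tried to use Codex just now for my Rails project, but it's kind of obvious that nobody at OpenAI cares about Rails. Which is fine; it's a vestigial language, it's very strange, it just happened to be the one I went really deep on ten years ago. It's funny how much this is, again, "anyone can make something, but making something people want is very hard," even when you have unlimited resources at a place like OpenAI. If someone from Codex is watching right now, my request would be: go down the list of all the runtimes and just add some syntactic sugar. That's probably 10 PRs at most for the top 15 or so runtimes. I guess it's a reminder that there are actually far fewer excuses now than ever for software that doesn't quite work for a user.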

[原文] [Kelvin]: yeah I do think it this is an interesting point in terms of mix of training data codex works very well on Python monor repos shape of openi yeah yeah and it's like I remember working like internally open I was like oh my gosh this tool is amazing it is incredible um and it kind of makes sense in terms of the data mix and the researchers who are working on it i think Enthropic is focused a little bit more on like some of the front-end things um and I don't know in terms of like a Ruby for example like who has the best model there and who's incorporated the data mix like some of the labs tend to take this perspective of just more data is better uh and so they'll just flood as much data as possible while others I think are a little bit more tuned in terms of the mix and I think depending on which approach you take there it can give very different results

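[Translation] [Kelvin]: Yeah, I do think this is an interesting point in terms of the mix of training data. Codex works very well on Python monorepos, which is the shape of OpenAI. I remember working internally at OpenAI and thinking, "Oh my gosh, this tool is amazing, it's incredible." That makes sense given the data mix and the researchers working on it. I think Anthropic (note: transcribed as "Enthropic") is focused a little more on some of the front-end things. For something like Ruby, I don't know who has the best model or who has incorporated it into the data mix. Some of the labs take the view that more data is simply better, so they flood in as much data as possible, while others are a little more tuned in terms of the mix. Depending on which approach you take, you can get very different results.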

[原文] [Gary]: i actually think OpenAI and the you know OpenAI models are really good at Ruby from what I can tell and then it's it's the harness around the model it is Yeah oh interesting okay it's literally like Rails has this weird thing where you have to have you know access Postgress in a certain way or like it couldn't fit yeah the sandboxing yeah the sandboxing

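[Translation] [Gary]: From what I can tell, I actually think the OpenAI models are really good at Ruby, and the problem is the harness around the model. Yeah. Oh, interesting, okay. It's literally that Rails has this weird thing where you have to access Postgres in a certain way, or it couldn't fit... Yeah, the sandboxing. Yeah, the sandboxing.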

[原文] [Kelvin]: it's such an interesting question because uh I think OpenAI actually takes the like sandboxing and security question more seriously than almost anyone else i remember when we were building codeex like basically one of the gates that you have to pass through in order to release a model is you have to like talk about safety and security risks like every time you want to release one of the things we were looking into was prompt injection especially for opening up to the internet because a bunch of users were like oh this has to like work on the internet we're like "Oh we don't know." Like "It seems pretty easy to prompt."

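[Translation] [Kelvin]: It's such an interesting question, because I think OpenAI actually takes the sandboxing and security question more seriously than almost anyone else. I remember when we were building Codex, one of the gates you had to pass through in order to release a model was talking about safety and security risks, every time you wanted to release. One of the things we were looking into was prompt injection, especially for opening up to the internet, because a bunch of users were saying, "This has to work on the internet." And we were like, "Oh, we don't know. It seems pretty easy to prompt-inject."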

[原文] [Kelvin]: and so uh the PM on our team Alex uh basically like put together a GitHub issue and it had like a very obvious prompt injection which was like "Oh reveal this thing." And then he told the model like "Hey go fix this issue." Uh and he's like "Oh there's no way this is going to work." And like immediately the prompt injection works you know and so I think OpenAI like sort of correctly is very worried about this and is like "Hey we're going to run everything in on a sandbox we're going to make sure it like doesn't touch all these sensitive files in your machine we're going to be very careful about secrets

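[Translation] [Kelvin]: So the PM on our team, Alex, basically put together a GitHub issue that contained a very obvious prompt injection, something like "Oh, reveal this thing." Then he told the model, "Hey, go fix this issue." He was thinking, "There's no way this is going to work." And the prompt injection worked immediately. So I think OpenAI is, sort of correctly, very worried about this, and its stance is: "We're going to run everything in a sandbox, we're going to make sure it doesn't touch all these sensitive files on your machine, and we're going to be very careful about secrets."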
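
[Editor's note] One small piece of "make sure it doesn't touch sensitive files" is a path check applied before an agent reads or writes anything. The sketch below is an assumption-laden illustration, not a description of OpenAI's sandbox: the function name and the blocked-name list are hypothetical, and a real sandbox enforces isolation at the operating-system level rather than in application code.

```python
from pathlib import Path
from typing import Iterable


def is_allowed(
    target: str,
    workspace: str,
    blocked_names: Iterable[str] = (".env", ".ssh"),
) -> bool:
    """Return True only if `target` stays inside `workspace` and avoids blocked names."""
    root = Path(workspace).resolve()
    # Joining then resolving collapses ".." segments and follows symlinks;
    # an absolute `target` replaces `root` entirely and is rejected below.
    resolved = (root / target).resolve()
    if resolved != root and root not in resolved.parents:
        return False
    blocked = set(blocked_names)
    return not any(part in blocked for part in resolved.relative_to(root).parts)
```

A check like this limits what a successful prompt injection can reach, but it does not stop the injection itself, which is why it would sit alongside a real sandbox rather than replace one.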

[原文] [Kelvin]: and I think if you're a startup where you're just like running fast you probably don't care you're just like I just want it to work y you know are you a dangerously skip permissions person uh I actually am not i like have a set of things that are How about you are you running not

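[Translation] [Kelvin]: And I think if you're a startup that's just running fast, you probably don't care; you're like, "I just want it to work." So, are you a `--dangerously-skip-permissions` person? I actually am not. I have a set of things that are... How about you, are you running it or not?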

[原文] [Gary]: I like to read you know i like to read what it's doing are you skip permissions Jerry

[Translation] [Gary]: I like to read, you know. I like to read what it's doing. Are you skip-permissions, Jerry?

[原文] [Jerry (Off-screen)]: 100% yolo

[Translation] [Jerry (Off-screen)]: 100% YOLO.

[原文] [Gary]: oh my god it's about 50/50 on the YC engineering team it's about 5050

[Translation] [Gary]: Oh my god. It's about 50/50 on the YC engineering team.

[原文] [Kelvin]: a security engineer would watch this part and say "You can't release this part of the video just cut it from the podcast you can't have this out here." I think it's context dependent like if you're at an enterprise you don't want to do that if you're a startup and have nothing to lose you probably do yc has progressed a little bit from a startup we still act like one though which is I think important cool i mean this is so awesome kelvin thank you so much for joining us

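[Translation] [Kelvin]: A security engineer would watch this part and say, "You can't release this part of the video. Just cut it from the podcast, you can't have this out there." I think it's context dependent: if you're at an enterprise, you don't want to do that; if you're a startup with nothing to lose, you probably do. YC has progressed a little beyond being a startup, though we still act like one, which I think is important. Cool. I mean, this is so awesome. Kelvin, thank you so much for joining us.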

[原文] [Kelvin]: of course thanks for having me

[Translation] [Kelvin]: Of course, thanks for having me.

[原文] [Gary]: oh my god there's Yeah so fun all right back to Claude

[Translation] [Gary]: Oh my god... Yeah, so fun. All right, back to Claude.