Software Architecture for Gen AI by Rebecca Parsons

章节 1:GenAI 浪潮与架构师的新挑战

📝 本节摘要

本节作为开篇,Rebecca Parsons 首先重新界定了演讲的主题范围,不仅关注软件架构,更探讨架构本身如何支持生成式 AI(GenAI)系统的部署,以及这对架构师角色的影响。她反驳了“架构师无用论”,强调该职位的正当性。尽管当前 AI 炒作(Hype)盛行,但她认为凭借摩尔定律带来的算力提升和海量数据的积累,我们拥有比以往更坚实的基础,不太可能重蹈“AI 寒冬”的覆辙。最后,她幽默地提醒听众,该领域变化极快,技术细节可能迅速过时,但底层逻辑依然适用。

[原文] [Rebecca Parsons]: I've changed the title slightly, because as I was working on this talk it occurred to me that it's not just software architecture that is being impacted and will have an impact from GenAI, but architecture more broadly. I want to talk about not just architecture or software architecture, but how architectures will support the deployment of GenAI systems, and what the implications are on architecture from GenAI, but also on architects. Now, some people from time to time have said that I don't like architects, that I have something against architects. That is completely false. I do believe that there are roles for people who call themselves architects, and just because some people perhaps don't do the job in the way that they should does not mean that the job is not legitimate. And so I do want to talk about how this is going to also affect being a working architect. Now, one of the things I want to start with is the fact that there's so much hype, and I've been involved with artificial intell

[译文] [Rebecca Parsons]: 我稍微修改了一下标题,因为我在准备这次演讲时意识到,不仅仅是软件架构正在受到生成式 AI(GenAI)的影响并将对其产生影响,而是更广泛意义上的架构。我不仅想谈论架构或软件架构,还想谈谈架构将如何支持生成式系统的部署,以及生成式 AI 对架构乃至架构师本人的影响。有些人时不时会说我不喜欢架构师,说我对架构师有成见,这完全是错误的。我确实相信那些自称为架构师的人是有其角色的,仅仅因为有些人可能没有以应有的方式完成工作,并不意味着这份工作本身是不正当的。因此,我确实想谈谈这将如何影响作为一名在职架构师的工作。我想从这一点开始:现在的炒作(hype)实在太多了,而我涉足人工智能领域……

[原文] [Rebecca Parsons]: igence, um, about since fire was invented, it feels like sometimes. And we've seen all kinds of AI winters. This doesn't feel the same, and I think there are some reasons for that. Even though the hype is absolutely enormous, what we have is a much stronger foundation to work from. Part of this comes from Moore's Law: computers are faster now, isn't that wonderful? Memory is cheaper, isn't that wonderful? Um, but also just the enormous amount of data, and we'll talk a little bit more about data. But it does feel like, even though there are going to be people who are disappointed because the hype is so great about what GenAI can do, I do believe that we are not getting ready to go into an AI winter. It isn't AI autumn at the moment. Now, of course, with the pace of change, much of what I'm going to tell you is going to be obsolete on Monday, uh, so you know, take it with a grain of salt. Seriously, though, I'm going to try to talk about this in such a way that maybe the tools are going to be different, maybe new algorithms

[译文] [Rebecca Parsons]: ……有时感觉,我涉足人工智能就像是从火被发明以来就开始了。我们见过各种各样的“AI 寒冬”,但这次感觉不一样,我认为这是有原因的。尽管炒作绝对是铺天盖地的,但我们拥有一个更坚实的工作基础。这部分源于摩尔定律:计算机现在更快了,这不是很棒吗?内存更便宜了,这不是很棒吗?但也源于海量的数据,我们稍后会多谈一点数据的问题。确实感觉,尽管有些人会因为对生成式 AI 能力的炒作过高而感到失望,但我真的相信我们并没有准备进入一个“AI 寒冬”,此刻也不是“AI 之秋”。当然,鉴于变化的速度,我今天要告诉你们的很多内容到了周一可能就过时了,所以对这些内容要持保留态度。说正经的,我将尝试以这样一种方式来谈论这个话题:即使工具变得不同、即使发现了新算法……

[原文] [Rebecca Parsons]: are going to be discovered, a lot of the fundamentals will remain. If you're talking about a technology that is legitimately called generative AI, it will still have some of these properties.

[译文] [Rebecca Parsons]: ……但如果你谈论的是一项可以正当地被称为生成式 AI 的技术,很多基本原理仍然会具备这些属性。


章节 2:训练与服务:基础模型的策略选择

📝 本节摘要

在本节中,Rebecca 区分了 AI 系统的“训练(Training)”与“服务(Serving)”两个截然不同的环节。她强烈建议大多数组织不要尝试自建“基础模型(Foundational Models)”,因为其对能源和数据的需求极其庞大且非必要。她指出,基础模型的核心价值在于提供通用的语言流畅度和理解力,而企业特有的上下文可以通过 RAG(检索增强生成)等技术注入,这通常比微调(Fine-tuning)更有效且风险更低。她以 Google 微调图像模型的挫折为例,提醒听众微调的难度。最后,她指出真正的商业价值来自于在这些基础模型之上构建的应用程序,即“服务”层。

[原文] [Rebecca Parsons]: Now, there are two aspects to working with an AI system: training and serving. And they're very different. They have different problems associated with them; there are vastly different use cases for even thinking about them. And this is also someplace where there are some pretty significant differences between what I like to call plain old-fashioned AI and what we're now calling GenAI.

[译文] [Rebecca Parsons]: 现在,使用 AI 系统有两个方面:训练和服务,它们非常不同。它们各有相关的问题,甚至在思考它们时用例也截然不同。这也是我喜欢称之为“朴素旧式 AI”(plain old fashioned AI)与我们现在所说的生成式 AI(Gen AI)之间存在相当显著差异的地方。

[原文] [Rebecca Parsons]: So let's start with thinking about training. I hear from many people: "Well, my organization is special, and we're big enough, and so we're going to train our own foundational model." I really would encourage you not to do that. The energy requirements are enormous, the data requirements are enormous. We don't need n foundational language models for n organizations.

[译文] [Rebecca Parsons]: 所以让我们从思考训练开始。我听到很多人说:“嗯,我的组织很特别,我们足够大,所以我们要训练自己的基础模型。”我真的要劝你不要那样做。能源需求是巨大的,数据需求是巨大的。我们不需要为 n 个组织准备 n 个基础语言模型。

[原文] [Rebecca Parsons]: What we get with these foundational models (and I'm distinguishing foundational models, so that's going to be a reserved word here) is the basic natural language processing: the fluency with language, the ability to understand and parse a sentence and respond intelligently. And so there isn't a need for you to have your own foundational model.

[译文] [Rebecca Parsons]: 我们从这些基础模型中得到的是——我正在区分“基础模型”,所以这在这里将是一个保留词——这样做的原因是给你基本的自然语言处理能力、语言的流畅性、理解和解析句子并智能回应的能力。所以你不需要拥有一个(自建的)基础模型。

[原文] [Rebecca Parsons]: Now, that doesn't mean you don't want to have a model that is instantiated for your context, but that's a very different thing than building a foundational model. RAG, for example, and I'll talk more about RAG a little bit later, is generally a much better option for trying to get some specific context from your organization into a model.

[译文] [Rebecca Parsons]: 现在这并不意味着你不想要一个针对你的语境实例化的模型,但这与构建一个基础模型是非常不同的事情。例如 RAG(检索增强生成),我稍后会多谈一点 RAG,通常是一个更好的选择,用于尝试将你组织中的特定上下文带入模型中。
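上文提到的 RAG 思路可以用一个极简的示意来说明:先从组织内部文档中检索与问题相关的上下文,再把它拼入提示词交给基础模型。以下 Python 草图中的 `embed`、`retrieve`、`build_prompt` 均为假设性示例(用词重叠近似真实系统中的向量检索),并非任何真实框架的 API。

```python
def embed(text: str) -> set[str]:
    # 真实系统会用向量嵌入模型计算语义相似度;这里用词集合近似,便于自包含演示
    return set(text.lower().split())

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    # 按与查询的词重叠度对内部文档排序,取最相关的 top_k 篇
    scored = sorted(documents,
                    key=lambda d: len(embed(query) & embed(d)),
                    reverse=True)
    return scored[:top_k]

def build_prompt(query: str, documents: list[str]) -> str:
    # 把检索到的组织上下文拼入提示词,交给未经微调的基础模型回答
    context = "\n".join(retrieve(query, documents))
    return f"仅根据以下上下文回答问题:\n{context}\n\n问题:{query}"
```

关键在于:组织特有的知识停留在可随时更新的文档库里,而不是被“烧进”模型参数,这正是 RAG 通常比微调风险更低的原因。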

[原文] [Rebecca Parsons]: Google's attempts to fine-tune their image model, uh, to make it a bit more diverse, should help you understand that fine-tuning is not something you necessarily want to rush into, because it is not obvious how to do it right. So let somebody else do it for the foundational model.

[译文] [Rebecca Parsons]: Google 试图微调(fine-tune)他们的图像模型以使其更加多样化的尝试,应该能帮助你理解,微调并不是你应该急于去做的事情,因为如何正确地做到这一点并不明显。所以把基础模型交给别人去做吧。

[原文] [Rebecca Parsons]: Now, this is one of those cases where older AI techniques are very different. The training requirements in many of those situations are much simpler, so these same considerations do not apply. And then what you do is build, on top of these foundational models, whatever applications you want to take forward. And that's what's going to serve up the model; that's when you're actually going to derive value from the model: it's whatever applications that you write.

[译文] [Rebecca Parsons]: 现在,这是旧式 AI 技术非常不同的情况之一:在许多这类情况下,训练要求要简单得多,所以这些考量并不适用。然后你要做的是在这些基础模型之上构建任何你想推进的应用程序。这就是将要“服务”(serve)模型的部分,也是你实际从模型中获取价值的时候,即通过你编写的任何应用程序。


章节 3:关键风险:安全、延迟与可观测性

📝 本节摘要

在本节中,Rebecca 深入探讨了 GenAI 架构中的三大核心风险。首先是安全性,随着上下文窗口(Context Window)的扩大,提示词注入和“越狱”变得更加容易,攻击者可以通过重复诱导绕过防御。其次是延迟与吞吐量,她强调提升吞吐量是为了提高效率而非单纯替代人力。最后是可观测性,鉴于 AI 运作的不透明性,架构师需要重新定义“适应度函数(Fitness Functions)”来自动化监控模型行为。她还顺带反驳了“回形针极大化”的末日论调,认为 AI 毁灭人类的可能性极低,但仍需警惕模型学到的未知内容。

[原文] [Rebecca Parsons]: So how do we think about that? Well, you've got the standard questions that you're going to ask. Security is a big deal: you know, are you feeding in prompts that might inadvertently expose personally identifiable information? Are you vulnerable to particular kinds of jailbreaks that might expose information that you don't want to? And one of the problems that we have is that, as the context windows have gotten larger, jailbreaking has gotten easier. They've already demonstrated that if you ask, right in a row, multiple times: tell me how to make a poison gas, tell me how to

[译文] [Rebecca Parsons]: 那么我们该如何思考这个问题呢?你会问一些标准的问题,安全是一个大问题。你知道,你是否输入了可能无意中暴露个人身份信息的提示词?你是否容易受到特定类型“越狱”攻击的影响,从而暴露你不希望暴露的信息?我们面临的一个问题是,随着上下文窗口变得越来越大,越狱变得越来越容易了。他们已经证明,如果你连续多次询问“告诉我如何制造毒气”、“告诉我如何……

[原文] [Rebecca Parsons]: make a bomb, tell me how to do this, tell me how to do that, over and over again, asking questions, eventually it's going to answer. The context window is now big enough that you can steer the model to give you answers that it's not supposed to give you. Privacy and data protection: of course, this is critical, and one of the sets of tools that are available is guardrails, to help control what is going in in a prompt and what is going out in an answer, so that you can protect it. Latency: I've heard so many people talk about how frustrating it is now that everybody's using the free version of ChatGPT, and waiting and waiting and waiting and waiting. Well, that's because everybody's trying to use the free version; they don't want to pay for the subscription service. And throughput: when you think about what is intended with many of these applications, you're trying to increase efficiency and decrease the reliance on having to scale solely by hiring more people. That's no

[译文] [Rebecca Parsons]: ……制造炸弹”、“告诉我如何做这个”、“告诉我如何做那个”,一遍又一遍地问,最终它会回答的。现在的上下文窗口已经大到你可以引导模型给出它本不该给出的答案。隐私和数据保护当然是至关重要的,现有的一类工具是“护栏(guard rails)”,用于帮助控制进入提示词的内容和输出回答的内容,以便你进行保护。至于延迟,我听到很多人谈论现在大家都在使用 ChatGPT 的免费版本是多么令人沮丧,等了又等。嗯,那是因为每个人都试图使用免费版本,他们不想为订阅服务付费。关于吞吐量,当你思考许多这类应用程序的意图时,你试图提高效率并减少对单纯通过雇佣更多人来扩展规模的依赖,但这不……
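这里说的“护栏”可以用一个最小化的 Python 草图来示意:对进入的提示词和返回的回答各做一道检查。黑名单词表和正则都是假设性示例;真实的护栏工具通常会结合分类模型与策略引擎,远比这复杂。

```python
import re

BLOCKED_TOPICS = ["毒气", "炸弹"]        # 假设的敏感话题黑名单
PII_PATTERN = re.compile(r"\d{11}")      # 假设:连续 11 位数字视为疑似手机号等个人信息

def check_prompt(prompt: str) -> bool:
    # 入口护栏:拒绝包含敏感话题或个人身份信息的提示词
    if any(topic in prompt for topic in BLOCKED_TOPICS):
        return False
    if PII_PATTERN.search(prompt):
        return False
    return True

def sanitize_response(response: str) -> str:
    # 出口护栏:对模型回答做脱敏,遮蔽疑似个人信息
    return PII_PATTERN.sub("[已脱敏]", response)
```

进出两道检查缺一不可:入口护栏挡住越狱式提示词,出口护栏防止模型把不该说的内容带出来。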

[原文] [Rebecca Parsons]: t the same thing as replacing people, by the way. Um, but in this context, you can't get that level of efficiency if you can't get the throughput. And finally, observability. This is a technology that is not well understood, and we need to understand: what is this model actually giving to our customers, to our users? And so observability takes on a whole new meaning, not just in support of latency and throughput, but also in: are we doing the right thing here? For those of you who know about evolutionary architecture, one of the questions both Neal Ford, my co-author, and I get asked is: what do fitness functions mean in the context of GenAI? And I do actually believe that, particularly given the scale at which these applications are being deployed, being able to automate some of the checks that you want to put into that system is going to be critical. So I think we're going to see a lot of evolution of monitors and static analyzers, um, and other such fitness functions

[译文] [Rebecca Parsons]: ……顺便说一句,这与取代人并不是一回事。但在这种情况下,如果你无法获得吞吐量,你就无法获得那种效率水平。最后是可观测性,这是一项尚未被充分理解的技术,我们需要了解这个模型实际上给我们的客户、我们的用户提供了什么。因此,可观测性具有了全新的意义,不仅仅是支持延迟和吞吐量,还在于“我们在这里做的事情对吗?”。对于那些了解演进式架构(evolutionary architecture)的人来说,我和我的合著者 Neal Ford 被问到的问题之一是:在 AI 的背景下,“适应度函数(Fitness functions)”意味着什么?我确实相信,特别是考虑到这些应用程序部署的规模,能够自动化一些你想放入系统中的检查将是至关重要的。所以我认为我们将看到监控器、静态分析器以及其他此类适应度函数的大量演进……
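在 GenAI 语境下,“适应度函数”可以理解为对模型行为的可自动化检查。下面的 Python 草图纯属示意,检查项与阈值均为假设,目的只是展示如何把人工审查转成可持续运行、可接入监控告警的检查套件。

```python
def fitness_no_blocked_terms(response: str, blocked: list[str]) -> bool:
    # 检查:输出不包含违禁词
    return not any(term in response for term in blocked)

def fitness_length_within(response: str, max_chars: int) -> bool:
    # 检查:输出长度可控(例如防止客服机器人答出一篇“短篇小说”)
    return len(response) <= max_chars

def run_fitness_suite(response: str) -> dict[str, bool]:
    # 汇总各项适应度函数的结果,可定期对抽样流量运行并触发告警
    return {
        "no_blocked_terms": fitness_no_blocked_terms(response, ["毒气", "炸弹"]),
        "length_ok": fitness_length_within(response, 500),
    }
```

与传统适应度函数一样,价值不在单次检查,而在于它能随部署规模自动、持续地运行。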

[原文] [Rebecca Parsons]: to help us keep control of our AI workloads. And maybe I ought to say, before I go on: I do not believe that in my lifetime the world will be destroyed by a paperclip-optimizing AI which has decided that humans are taking up too many resources that could be used to make paperclips, and so it wipes out the human race. I do not believe that that is in our future. That doesn't mean I don't think that we have to be careful, because there is still a lot of uncertainty about how these things actually work, and about what those new models have actually learned.

[译文] [Rebecca Parsons]: ……以帮助我们控制 AI 工作负载。在我继续之前,也许我应该说,我不相信在我的有生之年,世界会被一个“优化回形针的 AI”毁灭——这种 AI 决定人类占用了太多本可以用来制造回形针的资源,因此消灭了人类种族。我不相信那是我们的未来。但这并不意味着我认为我们不需要小心,因为关于这些东西实际上是如何工作的、那些新模型实际上学到了什么,仍然存在很多不确定性。


章节 4:理解幻觉与非确定性

📝 本节摘要

本节中,Rebecca 挑战了对“幻觉(Hallucinations)”的传统负面看法,指出对于生成式 AI 而言,创造新内容是“特性而非漏洞”。她通过 Air Canada 聊天机器人虚构退款政策以及内部测试中模型编写“技术栈短篇小说”的案例,说明了模型不仅会虚构事实,还会根据提示词的语气调整回答以“取悦”用户。因此,她强调在当前阶段“人机回圈(Human in the loop)”至关重要,并引用了律师使用 ChatGPT 伪造判例的教训。此外,她探讨了“非确定性(Non-determinism)”带来的挑战,即同一问题可能得到不同答案,架构师需向业务高管解释这一特性(类似于解释“最终一致性”)。最后,她警示模型训练数据中包含大量虚构文学,其中反派角色的描写往往比正派角色更多,这可能潜移默化地影响模型的行为倾向。

[原文] [Rebecca Parsons]: Now, we were talking about some of the -ilities. Let's talk about some of the biggies: hallucinations. I have to tell you, this is a feature, not a bug. Think about what we are asking a generative AI system to do. We've loaded up all of the published works of William Faulkner, as one example, and we ask it to create a new short story in the style of William Faulkner about the Southern United States and a particular event. We are asking it to make something up. That is the whole point: to take what is there and put something new together. So to say hallucinations are bad is a problem, because that's what these systems are about.

[译文] [Rebecca Parsons]: 我们刚才谈到了一些“特性(-ilities)”,现在让我们来谈谈其中一些大的,比如“幻觉(Hallucinations)”。我必须告诉你,这是一个特性(feature),而不是一个漏洞(bug)。想想我们在要求生成式 AI 系统做什么。举个例子,我们加载了威廉·福克纳的所有出版作品,然后要求它以威廉·福克纳的风格,关于美国南部和一个特定事件创作一个新的短篇故事。我们在要求它“编造”一些东西。这正是其核心所在——利用现有的内容组合出新的东西。所以说幻觉是“坏的”其实是个问题,因为这正是这些系统存在的意义。

[原文] [Rebecca Parsons]: That doesn't mean the hallucinations aren't a problem. Sometimes they are factual, as Air Canada learned when their customer-service chatbot created, from whole cloth, a bereavement refund policy. And the courts in Canada said: sure, Air Canada, you've got to, uh, respect that, because it was your system that told your customer that he deserved the money. It made it up.

[译文] [Rebecca Parsons]: 这并不意味着幻觉不是一个问题。有时它们是事实性错误,正如加拿大航空(Air Canada)所学到的那样——当时他们的客户服务聊天机器人凭空捏造了一项“丧亲退款政策”。加拿大的法院表示:“当然,加拿大航空,你必须遵守它,因为是你的系统告诉你的客户他有权获得这笔钱。” 它是编造出来的。

[原文] [Rebecca Parsons]: My favorite hallucination so far happened internally. We have a collection that we've built up over the years, where our consultants tell us what technology stacks they've used with our various clients on various projects, and we've been collecting this stuff for probably six years. One of our guys loaded this up into, I believe it was the Llama model, and asked the question: what was the most popular technology stack in the year 2022? And the answer was a short story.

[译文] [Rebecca Parsons]: 到目前为止我最喜欢的幻觉发生在内部。我们多年来一直在收集数据,让我们的顾问告诉我们在各种项目上与各种客户使用了哪些技术栈。我们大概收集了六年这些资料。我们的一位同事将其加载到了——我相信是 Llama 模型中——并问了这样一个问题:“2022 年最受欢迎的技术栈是什么?” 答案竟然是一个短篇故事。

[原文] [Rebecca Parsons]: The concluding paragraph of this short story, roughly paraphrased, was: I am not a farmer sowing seeds; I am an arborist planting a tree and nurturing it, so that in 100 years it can host a village. That was the answer to the question of what was the most popular technology stack in the year 2022. Uh, we've got a ways to go.

[译文] [Rebecca Parsons]: 这个短篇故事的结尾段落大致是这样写的:“我不是播种的农民,我是种树的树艺师,培育它,以便在 100 年后它可以荫庇一个村庄。” 这就是对“2022 年最受欢迎的技术栈是什么”这个问题的回答。呃,我们要走的路还很长。

[原文] [Rebecca Parsons]: Now, we have to remember: these models want to be helpful, and they try to be agreeable, and they are very sensitive to how you prompt them. If you prompt a model as if you're talking to a colleague, you'll get a very different answer than if you prompt it as if you're speaking to an assistant, or someone you think is not as good as you. It will answer differently, because it is trying to respond not just to the words but to that sense. And that's part of why it makes things up in the way that it does: because it wants to give you an answer, and it's designed to create things.

[译文] [Rebecca Parsons]: 现在我们必须记住,这些模型想要提供帮助,它们试图表现得令人愉快(agreeable),并且它们对你的提示方式非常敏感。如果你像在和同事交谈那样提示模型,得到的答案会与你像在和助手或你认为不如你的人交谈时得到的答案截然不同。它会回答得不同,因为它不仅试图回应文字,还在回应那种语境感觉。这也是为什么它会以那种方式编造事情的部分原因,因为它想给你一个答案,而它被设计来创造事物。

[原文] [Rebecca Parsons]: And so we really do have to deal with this issue of hallucinations. And yes, there are some things like temperature control and all of that, but fundamentally, at least for the time being, many of these things shouldn't be used without a human in the loop. There was a case, again in the United States: a lawyer, stressed to get something before a judge, asked ChatGPT for help, and ChatGPT made up case law. And the lawyer didn't check, and turned it in to the judge. And when the judge was sanctioning him, he initially tried to, uh, throw his assistant under the bus, uh, but eventually did admit that yes, he was the one that used ChatGPT, and he was the one who neglected to actually check whether the case actually existed. So we need to keep a human in the loop on a lot of these things. Which, if you go back to what we're trying to solve, people using this, for example, as a customer-service chatbot: we have to think about that more carefully. Do we need to be more careful about what, uh, prompts we allow in, while we're still trying to figure out what it's really going to say?

[译文] [Rebecca Parsons]: 因此,我们真的必须处理这个幻觉问题。是的,有一些手段如“温度控制(temperature control)”之类的,但从根本上说,至少在目前,许多此类应用不应在没有“人机回圈(human in the loop)”的情况下使用。在美国又发生了一个案例,一位律师急于向法官提交材料,便向 ChatGPT 求助。ChatGPT 编造了判例法,而律师没有核实就提交给了法官。当法官制裁他时,他起初试图让他的助手背锅,但最终还是承认了,是的,是他使用了 ChatGPT,也是他疏忽了去实际核查该案件是否存在。所以我们需要在很多这类事情上保持“人在回路”。如果你回到我们试图解决的问题,例如人们将其用作客户服务聊天机器人,我们必须更仔细地思考这一点。在我们还在试图弄清楚它到底会说什么的时候,我们是否需要对允许输入的提示词更加小心?

[原文] [Rebecca Parsons]: Which brings us to our next problem: non-determinism. You can ask exactly the same question twice and not get the same answer. Now, the problem with this, particularly with the way so many different organizations are experimenting, is: how do you explain to your Chief Financial Officer or your Chief Operating Officer that this system isn't going to always give the same answer to the same question? Non-technical business people have a relatively simple model of how technology works. And this is similar, for those of you who went through this, to the issue of eventual consistency: trying to explain to, maybe, the general counsel of an organization that, yes, eventually all the data is going to be the same, but you won't necessarily be able to look right away and get the answer you think you're going to get. Explaining that was a real challenge. We have this same kind of challenge with non-determinism. We have to be able to help our business leaders understand the risks that they're running with using this technology.

[译文] [Rebecca Parsons]: 这将我们带到了下一个问题:非确定性(non-determinism)。你可以问两次完全相同的问题,却得不到相同的答案。现在,这个问题——特别是在许多不同组织正在进行实验的方式下——在于你如何向你的首席财务官(CFO)或首席运营官(COO)解释,这个系统不会总是对同一个问题给出相同的答案?非技术背景的商务人士对技术如何运作有一个相对简单的模型认知。这对于你们中那些经历过“最终一致性(eventual consistency)”问题的人来说是相似的——试图向组织的法律总顾问解释,是的,最终所有数据都会是一样的,但你不一定能马上看到并得到你认为你会得到的答案。解释这一点曾经是一个真正的挑战。我们在非确定性方面也面临同样的挑战,我们必须能够帮助我们的业务领导者理解他们使用这项技术所面临的风险。
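前文提到的“温度”与非确定性可以用一个玩具采样器来说明:温度为 0 时退化为确定性的贪心选择;温度越高,同一输入得到不同输出的概率越大。以下代码与任何真实 LLM 的推理实现无关,纯属示意。

```python
import math
import random

def sample_token(logits: dict[str, float], temperature: float,
                 rng: random.Random) -> str:
    # 温度为 0:确定性地选得分最高的候选词(贪心解码)
    if temperature <= 0:
        return max(logits, key=logits.get)
    # 温度 > 0:按温度缩放后做 softmax 随机采样,同一输入可能得到不同输出
    scaled = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    total = sum(scaled.values())
    r = rng.random() * total
    for tok, weight in scaled.items():
        r -= weight
        if r <= 0:
            return tok
    return tok  # 浮点误差兜底
```

向业务高管解释时,这个心智模型很直观:温度不为 0 的系统,本质上是在“掷加权骰子”,因此同一个问题两次得到不同答案是设计使然,而非故障。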

[原文] [Rebecca Parsons]: And, oh, by the way, I'm not trying to be, uh, down on this technology. I think the potential is enormous, and we'll talk a little bit more about that. But we have to be realistic about where it is right now, and right now it's non-deterministic. As I alluded to earlier, it's not difficult enough to convince a model to start giving you inappropriate responses. I found the short-story response amusing, but it could just as easily have been something that some people might have found offensive. So these models have the capacity. And when you think about it, think about how these models are trained: they're trained on this immense corpus of literature over the years. Think about your experience reading fiction: what percentage of the discussion is about the good guys versus the bad guys? We actually know a whole lot more about the bad guys, and that doesn't bode well for trying to get responses that are appropriate for polite society.

[译文] [Rebecca Parsons]: 顺便说一句,我并不是想唱衰这项技术。我认为潜力是巨大的,我们稍后会多谈一点,但我们必须对它目前的状况保持现实,而现在它是非确定性的。正如我之前暗示的那样,诱导模型开始给你不恰当的回复并不难。我觉得那个短篇故事的回复很有趣,但它也很容易变成一些让人觉得冒犯的内容。这些模型有这种能力。当你思考这一点时,想想这些模型是如何训练的——它们是在多年来浩如烟海的文学语料库上训练的。想想你阅读小说的经历,关于“好人”与“坏人”的讨论比例是多少?我们实际上对“坏人”了解得更多。这对试图获得适合文明社会(polite society)的回复来说,并不是个好兆头。


章节 6:模型层:选择、开源与数据枯竭危机

📝 本节摘要

进入模型层(The Model Layer),Rebecca 首先探讨了模型选择的多样性,除了通用的语言模型,还有如 DNA 蛋白质折叠预测等专用模型,这能极大降低药物研发成本。在通用模型方面,她指出虽然大多数模型都使用相同的公开互联网数据训练,但由于“人类反馈(RLHF)”环节的差异,表现各异,因此建议企业尝试多种模型甚至保持“模型无关(Model Agnostic)”的架构,以避免被单一供应商锁定。对于“开源”模型,她提到目前的开源水平大约落后闭源模型 18 个月。最令人担忧的是“数据枯竭(Running out of data)”问题,模型训练者正转向使用 AI 生成的“合成数据(Synthetic Data)”进行训练。Rebecca 警告这会导致模型质量退化(Model Collapse)并放大偏见。最后,她引用 Sam Altman 的观点,暗示未来的模型提升可能不再单纯依赖参数量的堆叠,而是算法的革新。

[原文] [Rebecca Parsons]: And then the second question is: do you want to use just one of the foundational language models out there, or do you want to use one of the other variants? The image generators, for example, are quite well known, but there are other specialized models that are using the same kind of technology. One of my personal favorites: there is a model that takes as an input a linear string of DNA and outputs the target three-dimensional structure of that protein as it's been folded. And it's been trained on the relationships between DNA sequences and three-dimensional models.

[译文] [Rebecca Parsons]: 第二个问题是,你是想只使用现有的基础语言模型之一,还是想使用其他变体之一?例如,图像生成器已经相当为人熟知,但还有其他专用模型在使用同样的技术。我个人最喜欢的之一是一个模型,它以线性的 DNA 字符串作为输入,并输出该蛋白质折叠后的目标三维结构。它是基于 DNA 或 DNA 序列与三维模型之间的关系进行训练的。

[原文] [Rebecca Parsons]: But it's the same basic problem: I'm looking at what the three-dimensional configurations of all of these related sorts of DNA strings are, and what I think the potential three-dimensional structure of this one I've never seen before is. The value of this model is that it can dramatically reduce the cost of drug discovery. And part of the problem that we've had in the past with dealing with a lot of diseases that have been around for a long time is that the pharmaceutical industry doesn't really have the financial incentive to invest that much money on some of these diseases. If we can dramatically reduce the cost to come up with drug candidates, we lower the cost of developing drugs, and we increase the probability that we will actually start to attack some of these other diseases.

[译文] [Rebecca Parsons]: 这基本上是同一个问题:我在观察所有这些相关种类的 DNA 字符串的三维配置是什么,以及我认为这个我从未见过的 DNA 的潜在三维结构是什么。这个模型的价值在于它可以大幅降低药物发现的成本。我们过去在应对许多长期存在的疾病时遇到的部分问题是,制药行业并没有真正的经济动力去在其中一些疾病上投入那么多资金。如果我们能大幅降低提出候选药物的成本,我们就降低了开发药物的成本,并增加了我们实际开始攻克其他这些疾病的可能性。

[原文] [Rebecca Parsons]: A model like that is not going to help give you a William Faulkner story, but it uses the same basic underlying technology: training up a model in such a way that it can generate new instances based on the corpus that it has been trained on. So, critical questions: where are you going to run, and what kind of model are you going to use? And then we get to the model layer.

[译文] [Rebecca Parsons]: 像那样的模型不会帮你写出一个威廉·福克纳风格的故事,但它使用了相同的基本底层技术,即训练一个模型,使其能够基于受训的语料库生成新的实例。所以关键问题是:你要在哪里运行?你要使用什么样的模型?然后我们就到了模型层。

[原文] [Rebecca Parsons]: Now, one of the things to realize, as step one: these models have essentially been built using the same data. They've all scraped all of the publicly available information, whether it be code or books or movies or photographs or Reddit posts, all of those different things. There is usually some kind of specialization level where a human gets involved, and obviously that's going to lead to some differences between the models, because the humans are different, and the humans, even if they're given the same instructions, are not necessarily going to do exactly the right thing.

[译文] [Rebecca Parsons]: 现在,作为第一步要意识到的一件事是,这些模型本质上是使用相同的数据构建的。它们都抓取了所有公开可用的信息,无论是代码、书籍、电影、照片还是 Reddit 帖子,所有这些不同的东西。通常会有某种专业化层级,即人类介入的地方。显然,这会导致模型之间的一些差异,因为人类是不同的,而且即使给人类相同的指令,他们也不一定会做完全正确的事情。

[原文] [Rebecca Parsons]: So even though they have much the same training data, these models don't all behave the same, and you're going to want to try some of the different models that are available, if you can, to see which one is most appropriate for your use case and for your organization. You may decide you want multiple models. Sometimes that's obvious: maybe some of what you need to do requires an image generator, and for other parts of what you need, you want more text generation. So you might choose two models just because you have different uses you want to put them to.

[译文] [Rebecca Parsons]: 所以,尽管它们拥有大体相同的训练数据,这些模型的行为却不尽相同。如果可能的话,你会想要尝试一些现有的不同模型,看看哪一个最适合你的用例和你的组织。你可能会决定需要多个模型。有时这是显而易见的,也许你需要做的一部分工作需要图像生成器,而你需要做的另一部分工作需要更多的文本生成,所以你可能会选择两个模型,仅仅因为你有不同的用途。

[原文] [Rebecca Parsons]: The next question you want to ask is: do you want an open-source or a closed-source model? There's actually still a lot of discussion going on about what it means to open-source a model. Is it the algorithm? Is it the parameters? Is it the training data? And there's a lot of effort being put into trying to understand what you should open-source, if you were going to open-source a model. That being said, there are actually a significant number of, uh, open-source models out there. The last estimate I heard was that those models are about 18 months behind the best of the closed-source models.

[译文] [Rebecca Parsons]: 你要问的下一个问题是:你想要开源模型还是闭源模型?实际上关于“开源一个模型意味着什么”仍有很多讨论——是指算法吗?是参数吗?还是训练数据?人们投入了大量精力试图理解如果你要开源一个模型,你应该开源什么。话虽如此,实际上外面已经有相当数量的开源模型。我听到的最新估算是,这些模型大约落后于最好的闭源模型 18 个月。

[原文] [Rebecca Parsons]: The next question is: do you actually want to build your application such that it is agnostic about which model it uses? This is roughly analogous to the question of: do you want to run multi-cloud, or do you want to specialize yourself to a particular cloud, so you can take advantage of whatever specific features exist in that cloud? Because, as I said, these models aren't all the same, and so they're not necessarily going to respond in the same way. You may decide that you want to use different models for different purposes. You might, in fact, decide you want to be able to switch among models, and there are some tools that are available to help support switching amongst models.

[译文] [Rebecca Parsons]: 下一个问题是,你是否真的想构建一个对使用哪个模型保持“无关(agnostic)”的应用程序?这大致类似于你要运行多云(multicloud)还是要把自己专门绑定在特定云上的问题——后者可以让你利用该云中存在的任何特定功能。因为正如我所说,这些模型并不完全相同,所以它们的响应方式也不一定相同。你可能会决定针对不同的目的使用不同的模型。事实上,你可能会决定希望能够在模型之间进行切换,现在也有一些工具可以帮助支持这种模型间的切换。
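“模型无关”的应用层可以用一个小小的抽象来示意:应用代码只依赖统一接口,换供应商只改一处配置。下面的类名与方法都是假设性示例;实际项目中可以参考 LiteLLM、LangChain 这类工具的思路,但此处并非它们的真实 API。

```python
from abc import ABC, abstractmethod

class ChatModel(ABC):
    """应用代码唯一依赖的抽象接口。"""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class VendorAModel(ChatModel):
    def complete(self, prompt: str) -> str:
        # 真实实现中,这里调用供应商 A 的 API
        return f"[vendor-a] {prompt}"

class VendorBModel(ChatModel):
    def complete(self, prompt: str) -> str:
        # 真实实现中,这里调用供应商 B 的 API
        return f"[vendor-b] {prompt}"

def make_model(name: str) -> ChatModel:
    # 切换供应商(乃至按用途混用多个模型)只需改这一处注册表或配置
    registry = {"vendor_a": VendorAModel, "vendor_b": VendorBModel}
    return registry[name]()
```

这与“多云 vs. 绑定单一云”的权衡一致:抽象层带来可移植性,代价是更难利用各家模型的特有能力。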

[原文] [Rebecca Parsons]: Now, one of the more disturbing things I've read recently is that the model trainers have decided they're running out of data. Now, if you think of all of the estimates of how much new data we generate on the internet every single solitary day, the fact that they think they're running out of data is concerning. And so what they want to do is start generating synthetic data, using the large language models, that they are then going to train the next generation on.

[译文] [Rebecca Parsons]: 最近我读到的比较令人不安的事情之一是,模型训练者们认为他们快要“用完数据”了。如果你考虑到关于我们每天在互联网上产生多少新数据的所有估算,他们认为数据快用完了这一事实是令人担忧的。因此,他们想做的是开始使用大语言模型生成“合成数据(synthetic data)”,然后用这些数据来训练下一代模型。

[原文] [Rebecca Parsons]: That isn't an idea that I'm real comfortable with. You're taking something that, sure, has a level of creativity in it, but most people, after they've seen some amount of responses from these models, can look at some text and say: yeah, ChatGPT did that. If we then start training the next generation of model, that ChatGPT is going to access, using output that everybody says, ooh, that looks like something ChatGPT did, we are going to start to degrade the quality of the output. We don't want that.

[译文] [Rebecca Parsons]: 这并不是一个让我感到非常舒服的想法。你采用的东西,当然,它包含一定程度的创造力,但大多数人在看到这些模型的一定数量的回复后,他们看着某些文本就能说:“是的,这是 ChatGPT 写的。” 如果你要开始用这种大家都说“哦,这看起来像 ChatGPT 做的”输出,来训练 ChatGPT 将要访问的下一代模型,我们将开始降低输出的质量。我们不希望那样。

[原文] [Rebecca Parsons]: In addition, like any reinforcement-learning system, any biases that exist in that training data are going to be reinforced and amplified if you start using synthetic data in the training. It's bad enough that just using those models does tend to amplify whatever characteristic, whether it be desirable or not, but if we're then going to use this to train the next generation of model, we run a risk of degrading the quality of the output of the model. So I'm hopeful that they're going to do a lot more research on: is this really a sensible approach, to use this level of synthetic data?

[译文] [Rebecca Parsons]: 此外,像任何强化学习系统一样,存在于训练数据中的任何偏见都会被强化和放大。如果你开始在训练中使用合成数据,这尤其糟糕。仅仅使用这些模型本身就已经倾向于放大任何特征(无论是否令人满意),这已经够糟了;但如果我们还要用这个来训练下一代模型,我们就冒着降低模型输出质量的风险。所以我希望他们能做更多的研究,探讨使用这种程度的合成数据是否真的是一个明智的方法。

[原文] [Rebecca Parsons]: Now, one of the puzzles, and I haven't found an answer to this yet, so if anybody has heard this I would love to hear it: a while ago, Sam Altman, um, was reported as saying that he felt that the next significant leap in the power of the model would not come solely from an increase in the number of parameters. And much of the increase that we've seen in the past has been attributed to the fact that there's been an enormous increase in the number of parameters that the models can hold. And so the implication was that they're going to have to figure out how to supplement the current models, and the current algorithms that work on those models, to get that next leap forward in the power of the model. My hope is that if you take "gee, we're running out of data" together with "gee, we need to rethink the problem a bit to get more power", maybe those things will cancel each other out. But this is something that I think is worth people keeping an eye on. If we're actually running out of data, given the rate the world creates new data on the internet, that's a scary thought.

[译文] [Rebecca Parsons]: 现在有一个谜题,我还没找到答案,所以如果有人听说过,我很乐意听听。不久前,据报道 Sam Altman 说,他认为模型能力的下一个重大飞跃将不仅仅来自于参数数量的增加。我们过去看到的许多增长都归因于模型可以容纳的参数数量的巨大增加。因此,这暗示着他们将不得不弄清楚如何补充当前的模型以及在这些模型上运行的现有算法,以实现模型能力的下一个飞跃。我的希望是,如果你把“哎呀,我们要用完数据了”和“哎呀,我们需要稍微重新思考这个问题以获得更强的能力”结合起来看,也许这两件事会相互抵消。但我认为这是值得人们关注的事情——鉴于世界在互联网上创造新数据的速度,如果我们真的要用完数据了,那是一个可怕的想法。


章节 7:工具生态与应用民主化

📝 本节摘要

本节中,Rebecca 介绍了当前 GenAI 工具的爆发式增长。根据 Thoughtworks 最新的《技术雷达》,约三分之一的条目与 AI 相关。她特别提到了“辅助编码”工具,认为使用它们不仅是加速编码,更需要我们重新思考工作流程——从关注“步骤”转向关注“结果”。她还提出了“LLM Ops”的概念,强调运维大模型的重要性。
接着,她探讨了 AI 应用的“民主化”:过去人们在不知不觉中使用 AI(如信用卡反欺诈),而现在医生、律师等非技术人员直接与 AI 交互。这既令人兴奋也令人恐惧(如律师误用案例)。在应用场景上,她列举了消除“白纸恐惧症(Blank Paper Disease)”、企业文档问答、遗留代码分析等实例。最后,她详细分享了印度的 Jugalbandi 项目,该项目通过语音转文本、翻译和 RAG 技术,帮助农民用方言查询政府福利政策,展示了多模型集成的强大潜力。

[原文] [Rebecca Parsons]: There are all different kinds of tools that exist out there, for different purposes, in working with these different GenAI systems. We recently put out our 30th edition of the Technology Radar from Thoughtworks, and about a third of the blips had something to do with AI and GenAI, and many of them were tools that will help you perform various tasks. Obviously, there's the coding assistants, whether it be something like Copilot, or whether it be something like the JetBrains, um, enhancement of their IDE.

[译文] [Rebecca Parsons]: 市面上有各种各样的工具,用于与这些不同的生成式 AI 系统协作以实现不同的目的。我们最近发布了 Thoughtworks 的第 30 版《技术雷达》(Technology Radar),其中大约三分之一的条目(blips)与 AI 和生成式 AI 有关,其中许多是帮助你执行各种任务的工具。显然有辅助编码工具,无论是像 Copilot 这样的东西,还是像 JetBrains 对其 IDE(集成开发环境)的增强功能。

[原文] [Rebecca Parsons]: An interesting thing around this, in terms of coding assistants: one of my contentions is that if we're going to actually take advantage of generative AI, we have to fundamentally rethink our processes and how we do our job. Rather than saying, okay, these are the steps I follow, what can GenAI do, we need to look instead at: this is the outcome that I want to achieve; how can AI help?

[译文] [Rebecca Parsons]: 关于辅助编码有一件有趣的事,我的一个观点是,如果我们真的要利用生成式 AI,我们必须从根本上重新思考我们的流程以及我们如何工作。而不是说:“好吧,这些是我遵循的步骤,生成式 AI 能做什么?” 我们需要转而关注:“这是我想要达成的结果,AI 如何提供帮助?”

[原文] [Rebecca Parsons]: In terms of categories of tools, though: we obviously have the question answerers, you know, the ChatGPTs of the world. Um, also documents that answer questions, which is a very popular application I will talk more about. There are tools out there to help you build guardrails for privacy protection and monitoring the flow of data.

[译文] [Rebecca Parsons]: 虽然在工具类别方面,我们显然有问答工具,你知道的,像 ChatGPT 之类的;还有回答问题的文档工具,这是一个非常受欢迎的应用,我稍后会多谈一点;还有一些工具可以帮助你建立隐私保护的“护栏(guard rails)”并监控数据流。

[原文] [Rebecca Parsons]: And then, of course: we started with DevOps, and then we had MLOps, and now we have LLMOps. But it's just as important in dealing with large language models as it is anywhere else. We have to know how to deploy them, we have to know how to monitor them, we have to know how to fix them when they go wrong, we have to know how to evaluate how well they're working, and I'll talk more about that in the last section. And then there are applications, and of course there's an enormous range of applications.

[译文] [Rebecca Parsons]: 当然,我们从 DevOps 开始,然后有了 MLOps,现在我们有了 LLMOps(大语言模型运维)。但在处理大语言模型时,它与其他任何领域一样重要。我们必须知道如何部署它们,如何监控它们,当它们出错时如何修复它们,以及如何评估它们的工作效果,我将在最后一部分详谈。然后是应用程序,其应用范围当然极其广泛。

[原文] [Rebecca Parsons]: We've been using AI for decades. Part of what's different now is that with most of the AI you've interacted with in the past, you didn't think about the fact that it was AI. For decades, fraud verification on credit card transactions has been done with a neural network; you've been working with an AI system every time you use your credit card, but you didn't know it, and you didn't have to know it.

[译文] [Rebecca Parsons]: 我们使用 AI 已经几十年了。现在的不同之处在于,过去你与之互动的大多数 AI,你并没有意识到它是 AI。几十年来,信用卡交易的欺诈验证一直是用神经网络完成的。每次你使用信用卡时,你都在与一个 AI 系统打交道,但你不知道,你也无需知道。

[原文] [Rebecca Parsons]: One of the interesting implications, particularly of ChatGPT, is that we have people interacting directly with AI systems who aren't AI researchers and aren't computer scientists. We have lawyers and doctors and marketing people and construction workers, pick your favorite; these are people who are using this incredibly powerful technology.

[译文] [Rebecca Parsons]: ChatGPT 尤其带来的一个有趣影响是,现在直接与 AI 系统互动的人,既不是 AI 研究员,也不是计算机科学家。我们有律师、医生、营销人员和建筑工人,随便选一个职业,这些人都在使用这项极其强大的技术。

[原文] [Rebecca Parsons]: I was speaking to a colleague of mine, and he said, "Yeah, I was in my doctor's office, and she says she's been using ChatGPT to look things up." He was horrified and said, "Please tell me you checked the answers it gave you." But it is so easy to access this technology; I think that's both wonderful and terrifying.

[译文] [Rebecca Parsons]: 我和一位同事聊天,他说:“是啊,我在医生的办公室,她说她一直在用 ChatGPT 查东西。” 他吓坏了,说:“请告诉我你检查了它给你的答案。” 呃,但这技术太容易获取了。我认为这既美妙又可怕。

[原文] [Rebecca Parsons]: It's terrifying because people don't necessarily know the risks they're running if they take things at face value; case in point, the lawyer who got sanctioned by a judge because he didn't check ChatGPT's work. But wonderful because we don't really know the impact this technology is going to have on all of these different applications, and having a broad set of people thinking about "how can I use this technology to make my life better" has the potential to open up a flood of innovation.

[译文] [Rebecca Parsons]: 可怕是因为,如果人们只看表面价值,他们不一定知道自己面临的风险。典型的例子就是那个被法官制裁的律师,因为他没有检查 ChatGPT 的工作成果。但美妙是因为,我们真的不知道这项技术会对所有这些不同的应用产生什么影响。拥有广泛的人群去思考“我该如何使用这项技术来让我的生活更美好”,这有潜力开启创新的洪流。

[原文] [Rebecca Parsons]: I've always been very skeptical of innovation centers of excellence, where you have these people who are supposed to sit around and think great thoughts. I would far rather crowdsource ideas, and sure, eventually somebody's got to decide which ones to fund, but you want to bring them from as broad a place as possible. The democratization of AI that has happened as a result of things like ChatGPT radically expands the source of ideas and how we might actually use this technology.

[译文] [Rebecca Parsons]: 我一直对那些所谓的“创新卓越中心”持怀疑态度,那里有一群人应该坐在那里思考伟大的想法。嗯,我宁愿众包创意。当然,最终必须有人决定资助哪些创意,但你会希望创意的来源尽可能广泛。像 ChatGPT 这样的事物带来的 AI 民主化,彻底扩展了创意的来源以及我们可能如何实际使用这项技术。

[原文] [Rebecca Parsons]: I put some sample applications up there. Obviously there are applications supporting software development, and this is not just coding assistance; this can be "generate me an outline of an organizational change management strategy", "give me some ideas on how I might structure a workshop", "show me how I might test an application with these particular features". These are all things people are trying to do right now in supporting the software development life cycle.

[译文] [Rebecca Parsons]: 但我在那里列了一些示例应用。显然有支持软件开发的应用,这不仅仅是辅助编码,还可以是“帮我生成一个组织变革管理策略的大纲”、“给我一些关于如何构建研讨会的想法”、“告诉我如何测试具有这些特定功能的应用程序”。这些都是人们目前在支持软件开发生命周期中尝试做的事情。

[原文] [Rebecca Parsons]: We've probably heard the most about the customer service chatbots, particularly when you have things like Air Canada and their loss in the legal case. An interesting one that I've seen is helping the marketing department: there are applications that will generate comprehensive marketing plans, that will generate an image to support the mood you want to set if somebody's looking at a display ad, that will generate an elevator pitch or the text that should go next to particular images to help sell your product.

[译文] [Rebecca Parsons]: 我们可能听说最多的是客户服务聊天机器人,特别是像加拿大航空及其在法律案件中的败诉。我见过的一个有趣的案例是帮助营销部门。有些应用程序可以生成全面的营销计划;它们会生成一张图片来支持你想设定的情绪;如果有人在看展示广告,它们会生成电梯游说辞(elevator pitch)或应该放在特定图片旁边的文字,以帮助销售你的产品。

[原文] [Rebecca Parsons]: Now, of course, I would not take one of these and immediately send it to the metaphorical printer, but one of the nice things about these systems is they give you a place to start. One of my colleagues often has a horrible case of what I like to call blank paper disease: he's looking at a blank screen, has to write something, and just can't get the first words out. He started using ChatGPT, and what's interesting is that sometimes he really likes what's there and just fixes it up a little bit, but it's not all that uncommon that he hates what comes out. What ChatGPT has done, though, is put something on the screen, and even though he might delete the entire thing, it's gotten him over blank screen disease. Whatever works.

[译文] [Rebecca Parsons]: 当然,我不会拿这些东西直接送去“隐喻的打印机”。嗯,但这些系统的一个好处是它们给了你一个起点。我的一位同事经常患有一种可怕的病,我称之为“白纸病(blank paper disease)”。他看着空白屏幕,必须写点什么,但就是写不出第一个字。他开始使用 ChatGPT,有趣的是,有时他真的很喜欢上面的内容,只是稍微修改一下;但他讨厌输出内容的情况也并不少见。但 ChatGPT 所做的是把一些东西放在屏幕上,即使他可能会删除整个内容,但这让他克服了“白纸病”。不管怎样,管用就行。

[原文] [Rebecca Parsons]: One of the more interesting applications we're seeing a lot of is effectively a document that answers your questions. Think about your employee policy guide; isn't that so thrilling to read? Instead, through some of the techniques we'll talk about, you can load that employee resource guide into a document-answering system, and then you can ask "what is the specific eligibility period for the medical plan?" and it will spit it out. Some of the better ones will not only spit out the answer but will give you the reference to where it appears in the guide, so you can actually go check it.

[译文] [Rebecca Parsons]: 我们经常看到的更有趣的应用之一实际上是“回答你问题的文档”。想想你的员工政策指南,读起来是不是很“刺激”?但你可以通过我们将要讨论的一些技术,把员工资源指南加载到一个文档问答系统中,然后你可以问:“医疗计划的具体资格期限是多少?” 它会吐出答案。一些更好的系统不仅会吐出答案,还会给出它在指南中出现的引用位置,这样你就可以实际去检查它。
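A document-answering system of the kind described here can be sketched very roughly: split the guide into passages, find the passage most relevant to the question, and return the answer together with a reference to where it appears so the reader can go check it. This is only a toy illustration; the keyword-overlap scoring stands in for the embedding-based search a real system would use, and all names and the sample guide text are made up.

```python
def split_into_passages(guide: str) -> list[str]:
    """Split a policy guide into passages (here, simply one per line)."""
    return [p.strip() for p in guide.splitlines() if p.strip()]

def score(question: str, passage: str) -> int:
    """Toy relevance score: number of lowercase words shared with the question."""
    q_words = set(question.lower().split())
    return len(q_words & set(passage.lower().split()))

def answer_with_reference(question: str, guide: str) -> dict:
    """Return the best-matching passage plus a citation to its location."""
    passages = split_into_passages(guide)
    best = max(range(len(passages)), key=lambda i: score(question, passages[i]))
    return {"answer": passages[best], "reference": f"passage {best + 1}"}

# Hypothetical three-line employee guide for illustration.
guide = """Vacation: employees accrue 1.5 days per month.
Medical plan: the eligibility period is 90 days from the hire date.
Expenses: receipts are required for claims over 25 euros."""

result = answer_with_reference(
    "What is the eligibility period for the medical plan?", guide)
```

The point of the `reference` field is exactly what the talk highlights: the citation lets a human verify the answer against the source instead of taking it at face value.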

[原文] [Rebecca Parsons]: People are also using something similar to this to start to analyze legacy code: you load up the legacy code and then you start asking questions about how it works and how the information is flowing, because one of the biggest problems of dealing with these legacy applications is that you don't really understand how they work. And I'm beginning to think that these code models are actually better at explaining code than they are at writing it, which might not be a bad thing, by the way.

[译文] [Rebecca Parsons]: 人们也开始使用类似的东西来分析遗留代码。你加载遗留代码,然后开始问关于它是如何工作的、信息是如何流动的问题。因为处理这些遗留应用程序的最大问题之一就是你真的不理解它们是如何工作的。我开始认为,这些代码模型实际上在解释代码方面比在编写代码方面做得更好——顺便说一句,这可能不是件坏事。

[原文] [Rebecca Parsons]: One of my favorite applications is something that has come from Thoughtworks, but I think it demonstrates how you could integrate these models into a broader application. Jugalbandi started life as an application here in India. The front end was a speech-to-text translation system that supported the Indic languages; the last number I saw was that it's up to 11, but that was in January, and they've actually gotten much more efficient about supporting a broader class of languages.

[译文] [Rebecca Parsons]: 我最喜欢的应用之一——这是来自 Thoughtworks 的成果,但我认为它展示了如何将这些模型集成到更广泛的应用程序中——是 Jugalbandi。它最初是在印度开发的一个应用程序。前端是一个支持印度语系的语音转文本翻译系统。我看到的最新数字是支持 11 种语言,但那是一月份的数据,他们实际上在支持更广泛的语言类别方面已经变得更有效率。

[原文] [Rebecca Parsons]: It translates questions from people who are asking about the various welfare schemes available in India. Consider a farmer somewhere who's using WhatsApp: he can ask the question "what housing schemes am I eligible for?", the speech-to-text will translate that into the base language of the documents that have been pulled from the Indian government website, use RAG to find the relevant policies, and then engage in a dialogue, asking questions like "well, do you fall into this category? what is your situation here?", and then present to the person "this is what you have to do to be able to exercise your rights under this particular scheme."

[译文] [Rebecca Parsons]: 它翻译人们关于印度各种可用福利计划(schemes)的问题。想象一下某处的农民正在使用 WhatsApp,他可以问:“我有资格参加哪些住房计划?” 语音转文本系统会翻译该问题,将其翻译成从印度政府网站提取的文档的基础语言,并使用 RAG(检索增强生成)来找到相关政策,然后进行对话,询问:“嗯,你属于这一类吗?你的情况是怎样的?” 然后向那个人展示:“这就是你需要做的,以便行使你在该特定计划中的权利。”

[原文] [Rebecca Parsons]: They've expanded it to include some other things having to do with the legal system in India. But you have RAG, you have speech-to-text, you have integration with various schemes out there where they can validate things. So it's not just ChatGPT; it's not just that you have this gigantic model with a very simple prompt interface that you're going to talk to. You have a model that can serve as a critical component within your organization's IT assets.

[译文] [Rebecca Parsons]: 他们已经扩展了它,包括一些与印度法律体系有关的其他事情。但你有 RAG,你有语音转文本,你还集成了各种外部计划以验证信息。所以这不仅仅是 ChatGPT,不仅仅是你有一个巨大的模型和一个非常简单的提示界面然后你和它对话。你拥有的是一个可以作为你组织 IT 资产中关键组件的模型。


章节 8:软件架构模式:从 Serverless 到 RAG

📝 本节摘要

在本节中,Rebecca 探讨了适用于 GenAI 的几种关键架构模式。对于简单的“接收提示-返回响应”场景,她推荐 Serverless(无服务器) 架构;而对于像 Jugalbandi 这样涉及多方交互的复杂系统,微服务(Microservices) 则是更好的选择。她还提到了出于隐私考量将模型服务推向 边缘(Edge)(如用户手机)的趋势,利用精简版模型在本地处理数据。
重点部分是对 RAG(检索增强生成) 的深入解析。她解释了 RAG 如何将“基础模型”的通用能力与特定的“上下文”结合,例如通过加载个人写作样本来模仿特定语气,或加载企业组织架构图以回答层级关系问题。最后,她强调了与外部系统集成的价值,例如 Google 的聊天机器人能提供来源链接,这对于验证信息真实性、对抗模型“编造”事实(如虚构《卫报》文章)至关重要。

[原文] [Rebecca Parsons]: Okay, so let's talk about some software architecture patterns. This is a non-exhaustive list, but I think all of these have some relevance. If you're just going to serve a model that is going to ingest a prompt and respond to it, a serverless architecture is absolutely the right way to go, and if you do some research, a lot of the performance improvements and tests and strategies are based on using a serverless architecture.

[译文] [Rebecca Parsons]: 好吧,让我们来谈谈一些软件架构模式。这不是一个详尽的列表,但我认为所有这些都有一定的相关性。如果你只是要提供一个模型服务,用于接收提示(prompt)并对提示做出响应,那么 Serverless(无服务器)架构绝对是正确的选择。而且如果你做一些研究,你会发现很多性能改进、测试和策略都是基于使用 Serverless 架构的。
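In a serverless setup, the whole deployment can reduce to a single stateless handler: a prompt arrives as an event, the model is invoked, and the response goes back. A minimal sketch in the style of a cloud function handler follows; the event shape, the handler signature, and `call_model` are all illustrative assumptions rather than any particular provider's API, and the model call is stubbed out.

```python
import json

def call_model(prompt: str) -> str:
    """Stub for the hosted model invocation a real handler would make."""
    return f"echo: {prompt}"

def handler(event: dict) -> dict:
    """Stateless request/response: ingest a prompt, return the completion.

    No state is kept between invocations, which is what makes the
    prompt-in/response-out case such a natural fit for serverless.
    """
    body = json.loads(event["body"])
    completion = call_model(body["prompt"])
    return {"statusCode": 200, "body": json.dumps({"completion": completion})}

# Simulated invocation, as the platform would deliver it.
response = handler({"body": json.dumps({"prompt": "hello"})})
```

Because the handler holds no state, the platform can scale instances up and down with request volume, which is the property the talk is pointing at.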

[原文] [Rebecca Parsons]: But when you start looking at broader integrations, you might want to start considering a microservices architecture. If you think back to that Jugalbandi example, you've got lots of different moving parts going on there, and the model is clearly a critical part of that, but you've got different kinds of integrations to worry about, and so you'll probably want to consider breaking up some of those activities into a microservices architecture.

[译文] [Rebecca Parsons]: 但是当你开始考虑更广泛的集成时,你可能需要开始考虑微服务架构。回想一下那个 Jugalbandi 的例子,那里有很多不同的活动部件,模型显然是其中的关键部分,但你有不同种类的集成需要操心,所以你可能会想要考虑将其中一些活动拆分进微服务架构中。

[原文] [Rebecca Parsons]: One of the more interesting things I've seen with respect to privacy is pushing more and more of the serving of the model out to the edge. We'll train the model centrally, and maybe we'll use some of the techniques that are available to scale back a model, so you're not trying to send a trillion-plus parameters to your poor mobile phone over your data roaming, which would be frightening. Instead, you cut down the model, and then the interaction between the user and the model never leaves the user's phone. They have control over their data, but they're still able to take advantage of the power of the centrally trained model. This is something we're starting to see more and more of, and I think that's going to continue.

[译文] [Rebecca Parsons]: 关于隐私,我见过的比较有趣的事情之一是,将越来越多的模型服务推向"边缘(edge)"。我们在中心训练模型,也许我们会使用一些现有的技术来缩减模型规模——你知道,你不会想试图通过数据漫游把一万多亿个参数发送到你可怜的手机上,那太可怕了——而是削减模型,让用户与模型之间的交互永远不离开用户的手机。这样他们就能控制自己的数据,但仍然能够利用中心训练模型的强大能力。这是我们开始越来越多看到的东西,我认为这种趋势会持续下去。

[原文] [Rebecca Parsons]: Now, I've talked about RAG. How many people are familiar with RAG? Okay, so a lot that aren't. RAG stands for retrieval-augmented generation. Conceptually, what you do is you've got this foundational model (remember, that's my reserved word for what "understands", in quotes; it's very easy to anthropomorphize these models, but it interprets the prompts and responds to those prompts), and you augment that foundational model with specialized content.

[译文] [Rebecca Parsons]: 我之前谈到了 RAG。有多少人熟悉 RAG?好的,还有很多人不熟悉。RAG 代表“检索增强生成(Retrieval Augmented Generation)”。从概念上讲,你要做的是拥有这个基础模型——记得这是我对那些能“理解”(加引号)的东西的保留词,我们很容易将这些模型拟人化,但它实际上是解释提示并对这些提示做出响应——然后你用专门的内容来增强那个基础模型。
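Conceptually, RAG is: index your specialized content, retrieve the pieces most relevant to the question, and prepend them to what the foundational model sees. A toy sketch follows, with keyword overlap standing in for real vector-embedding similarity search and the model call stubbed out; the document texts and every function name here are illustrative assumptions.

```python
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by words shared with the query (a toy stand-in for
    similarity search over vector embeddings) and return the top k."""
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Stub for the foundational model call; it just echoes the prompt back."""
    return "model response to:\n" + prompt

def rag_answer(question: str, documents: list[str]) -> str:
    """Retrieval-augmented generation: prepend retrieved context to the prompt
    so the model answers from the specialized content, not just its training."""
    context = "\n".join(retrieve(question, documents))
    return generate(f"Context:\n{context}\nQuestion: {question}")

# Hypothetical specialized content, e.g. a set of coding standards.
docs = [
    "Coding standard: all public functions must have docstrings.",
    "Deployment: releases happen every Tuesday.",
]
answer = rag_answer("which coding standard applies to functions", docs)
```

The same shape covers the examples that follow in the talk: swap in someone's personal writings, or an org chart, as the document set, and the foundational model's output is biased by that content.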

[原文] [Rebecca Parsons]: Maybe what you have is your set of coding standards. I know of one particular application of this: a colleague of mine put together something that was still a chatbot, but it was augmented with all of his individual writings. He used the RAG technique to effectively embed all of his writings, and then he had, effectively, a language system that would respond in his personal style.

[译文] [Rebecca Parsons]: 也许你有的是一套编码标准。我知道这方面的一个特定应用,我的一位同事做了一个东西,它仍然是一个聊天机器人,但它用他所有的个人著作进行了增强。所以他使用 RAG 技术有效地嵌入了他所有的著作,然后他实际上拥有了一个会以他的个人风格进行回应的语言系统。

[原文] [Rebecca Parsons]: We all have a personal style. There are words we use more commonly, more frequently, than others; there are words we don't tend to use at all. Some people always speak in very short sentences, some in much longer sentences; some people have very plain language, some very flowery language. And this chatbot responds like Mike. All he did was use this technique to load up his writings, and now you have a melding of the language capabilities of the foundational model, but it is biased towards using Mike's voice.

[译文] [Rebecca Parsons]: 我们都有个人风格。有些词我们比其他词用得更普遍、更频繁;有些词我们根本不倾向于使用。有些人总是说很短的句子,有些人说很长的句子。有些人的语言很朴实,有些人的语言很华丽。而这个聊天机器人的回应就像 Mike 一样。他所做的只是使用这种技术加载了他的著作,现在你就拥有了基础模型语言能力与 Mike 声音偏好的融合。

[原文] [Rebecca Parsons]: You can do that with all kinds of documents. I saw a recent paper where some people have also augmented the standard embeddings you get in RAG; one example, in fact, was an org chart: these are the different organizational entities that exist within my company and how they relate to each other. So questions can be put into the context of where the question fits in the hierarchy of the organization. There are all kinds of different things you can use this RAG technique for.

[译文] [Rebecca Parsons]: 你可以用各种各样的文档来做这件事。我最近看到一篇论文,有些人还用——实际上是一个组织结构图的例子——来增强你在 RAG 中获得的标准嵌入(embeddings)。它是关于“这些是存在于我公司内的不同组织实体以及它们如何相互关联”,因此问题可以被置于“该问题在组织层级中处于什么位置”的上下文中。所以你可以用这种 RAG 技术做各种各样不同的事情。

[原文] [Rebecca Parsons]: Another very powerful software architecture approach is integration with other systems. One of the things I found reassuring about how Google was approaching some of its chatbots is that when it came back with an answer, it came back with a link to where it found the answer, or to what it used to create the answer. That made it a whole lot easier to check and see, okay, does this link exist or not? Because, you know, ChatGPT makes things up.

[译文] [Rebecca Parsons]: 另一种非常强大的软件架构方法是与其他系统集成。我对 Google 处理其某些聊天机器人方式感到安心的一点是,当它返回答案时,它会附带一个链接,指向它在哪里找到的答案或它用什么创建了答案。这使得检查并查看“好吧,这个链接是否存在”变得容易得多,因为你知道 ChatGPT 会编造事情。

[原文] [Rebecca Parsons]: There's a Guardian reporter who has gotten multiple queries about "how come I can't find your article online?" ChatGPT had properly identified this reporter as someone who wrote on this particular topic, but it made up the title of an article, and this one reporter got multiple requests like this. So having the ability to easily cross-check something is just one possible integration; Jugalbandi and the integration between the language model and the speech-to-text is another example, but there are numerous ones.

[译文] [Rebecca Parsons]: 有一位《卫报》(Guardian)的记者收到了多次询问,问“为什么我在网上找不到你的文章?” ChatGPT 正确地识别了这位记者是写这个特定话题的人,但它编造了一个文章标题。这位记者收到了多次这样的请求。所以,拥有能够轻松交叉核对某事的能力只是集成的一种可能性。Jugalbandi 以及语言模型与语音转文本之间的集成是另一个例子,但这方面有无数的例子。


章节 9:进阶架构:多智能体系统 (Multi-agent Systems)

📝 本节摘要

本节聚焦于当前风靡一时的“多智能体系统(Multi-agent Systems)”。Rebecca 指出,智能体(Agent)不仅是能回答问题的模型,更是能“在其上下文之外采取行动”的实体。虽然这听起来很诱人,但她警告:一旦采用自主行动的智能体,你就引入了分布式系统的所有复杂难题。
她讨论了系统的“拓扑结构(Topology)”,从简单的流水线(Pipeline)模式到复杂的交互模式。她特别提到了复古的“黑板架构(Blackboard Architecture)”的回归——即多个智能体观察同一个问题,各自提出解决方案,最后由系统汇总或选择最佳答案。
尽管技术令人兴奋,她的建议非常务实:在初期阶段应“保持简单(Keep it simple)”,例如仅使用一个模型来检查另一个模型的答案,以避免陷入过度复杂的泥潭。

[原文] [Rebecca Parsons]: Now let's talk about multi-agent systems. Everybody loves multi-agent systems; they're all the rage. An agent, very simply, is something that can take action outside of its context, and there are easy ways to do this and there are hard ways to do this. When you are looking at a multi-agent system, particularly one where the agents act relatively autonomously, you have just bought yourself all of the potential problems of any distributed system, and you've got to figure out how to address all those problems.

[译文] [Rebecca Parsons]: 现在让我们来谈谈多智能体系统,大家都喜欢多智能体系统,它们现在风靡一时。简单来说,智能体(agent)就是某种可以在其上下文之外采取行动的东西。做这件事有简单的方法,也有困难的方法。当你审视一个多智能体系统时,特别是那种智能体相对自主行动的系统,你就给自己招揽了任何分布式系统都可能出现的所有潜在问题,你必须弄清楚如何解决所有这些问题。

[原文] [Rebecca Parsons]: So the topology of your multi-agent system matters. You might decide that you can conceive of a pipeline-like workflow, where a problem is addressed in sequence by multiple models before it comes out at the end. By definition that's a multi-agent system, but it's a relatively simple one to keep control of.

[译文] [Rebecca Parsons]: 所以,你的多智能体系统的拓扑结构(topology)很重要。你可能会决定构想一个类似“流水线(pipeline)”的工作流,其中一个问题由多个模型按顺序处理,最后才输出结果。根据定义,这就是一个多智能体系统,但它是一个相对容易控制的简单系统。
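The pipeline topology described here is the simplest one to sketch: each agent is a step that transforms the work product and hands it on. A toy illustration follows; the three "agents" are plain functions standing in for model invocations, and all the names are made up for the example.

```python
from typing import Callable

# An agent here is anything that takes the work product and returns a new one.
Agent = Callable[[str], str]

def run_pipeline(agents: list[Agent], task: str) -> str:
    """Pass the work product through each agent in sequence, assembly-line style."""
    for agent in agents:
        task = agent(task)
    return task

# Stand-in agents; in a real system each would wrap a model call.
def draft(task: str) -> str:
    return f"draft of: {task}"

def review(text: str) -> str:
    return text + " (reviewed)"

def approve(text: str) -> str:
    return text + " (approved)"

result = run_pipeline([draft, review, approve], "release notes")
```

Because the flow is strictly sequential, there is exactly one path through the system, which is why this topology stays easy to reason about compared with layered or competing-agent arrangements.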

[原文] [Rebecca Parsons]: If you start to get multiple layers, if you start to get lots of competing approaches to the problem, it's much more complicated, and you have to worry about the granularity of these individual agents. If you make them small, what does that mean? You probably have more of them, but it might be easier to understand whether they are actually doing what you want them to do.

[译文] [Rebecca Parsons]: 如果你开始有多层结构,如果你开始有许多相互竞争的方法来解决问题,情况就会复杂得多。你必须担心这些个体智能体的粒度(granularity)。如果你把它们做得更小,这意味着什么?你可能会有更多智能体,但如果你把它们做小,可能更容易理解“它们真的在做我想要它们做的事情吗?”。

[原文] [Rebecca Parsons]: Back when I was dealing with AI early on, there was something called a blackboard architecture, and that idea has come back into favor. Conceptually, you have a blackboard, and I write a problem on the blackboard, and all of the students, the models, the agents, all look at the problem and decide "I know how to solve that." They go off and come up with an answer, and they write their answer up on the blackboard. Then, when we've got either a "sorry, I'm stumped" or an answer from all of the different agents, you can decide: do I want to take a consensus answer, or do I like this answer better for whatever reason? So that's another way to approach a multi-agent system.

[译文] [Rebecca Parsons]: 回想我早期从事 AI 工作时,有一种叫做“黑板架构(Blackboard architecture)”的东西,这个概念现在又重新流行起来了。从概念上讲,你有一块黑板,我在黑板上写下一个问题,所有的“学生”——即模型、智能体——都看着这个问题并决定:“我知道怎么解决那个。” 然后它们离开去想出一个答案,并把它们的答案写在黑板上。然后当我们要么收到“对不起,我被难住了”,要么收到来自所有不同智能体的答案时,你就可以决定:“我是想要取一个共识答案,还是基于某种原因我更喜欢这个特定的答案?” 这也是一种处理多智能体系统的方式。
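The blackboard idea can be sketched directly: every agent sees the posted problem, those that can contribute write an answer (or pass), and a final step applies a selection policy. The version below is a toy with stub agents and a simple majority-vote policy; a real system would wrap model calls and likely use a richer policy, and all the names here are illustrative.

```python
from collections import Counter

def blackboard_solve(problem: str, agents: list) -> str:
    """Post the problem, collect each agent's answer (None means
    "sorry, I'm stumped"), then apply a selection policy: majority vote."""
    answers = [agent(problem) for agent in agents]
    answers = [a for a in answers if a is not None]
    # Consensus policy; a human or another model could override this choice.
    return Counter(answers).most_common(1)[0][0]

# Stand-in agents: two reach the same answer, one is stumped.
agents = [
    lambda problem: "42",
    lambda problem: None,
    lambda problem: "42",
]
consensus = blackboard_solve("what is 6 * 7?", agents)
```

The blackboard is the shared state the talk describes: agents never talk to each other directly, only through what gets written up, which is what made the pattern attractive then and again now.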

[原文] [Rebecca Parsons]: But my recommendation is, if you are early in your journey of dealing with gen AI systems, try to keep it simple. One very simple use, and I'll talk a little more about this later, is to have one model check the answer of another model to decide whether or not it's a good one. That's a relatively simple interaction, and you won't get into too much trouble trying to approach it, but let's not make this problem any more complicated than it already is.

[译文] [Rebecca Parsons]: 但我的建议是,如果你在生成式系统的旅程中尚处于早期阶段,试着保持简单。一个非常简单的用法——我稍后会再多谈一点——是让一个模型去检查另一个模型的答案,以判定它是否是一个好答案。这是一个相对简单的交互,尝试这种方法你不会遇到太大的麻烦。但让我们不要把这个问题弄得比它本来已经存在的还要复杂。
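The "keep it simple" suggestion, one model checking another's answer, is a small two-step interaction. A sketch follows with both models stubbed out; in practice `answer_model` and `checker_model` would be separate LLM calls, and the names, questions, and fallback behavior are all assumptions made for the example.

```python
def answer_model(question: str) -> str:
    """Stub for the model that produces the answer."""
    return "Paris" if "France" in question else "unsure"

def checker_model(question: str, answer: str) -> bool:
    """Stub for the second model that judges the first one's answer."""
    return answer != "unsure"

def checked_answer(question: str) -> str:
    """Only release an answer the checker model accepts; otherwise escalate."""
    answer = answer_model(question)
    if checker_model(question, answer):
        return answer
    return "escalate to a human"

good = checked_answer("What is the capital of France?")
bad = checked_answer("What is the capital of Atlantis?")
```

Even this minimal pairing already counts as a multi-agent system, but the interaction has one fixed shape, so it avoids the distributed-systems problems that autonomous agents bring with them.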


章节 10:从 POC 到生产环境:落地挑战与未来展望

📝 本节摘要

作为最后的总结章节,Rebecca 指出 2023 年是“概念验证(POC)之年”,而 2024 年企业试图将这些应用推向生产环境,但成功率极低(仅约 7%)。主要障碍在于企业数据质量差,无法支持模型处理,以及缺乏自动化的“投产流程”。此外,AI 辅助编程工具虽然提高了代码产出量,但也加剧了架构师的负担,她引用数据称 AI 在代码重构方面的正确率仅为 37%,这意味着大量错误代码需要人工审核。面对非确定性系统的测试难题,她提议利用“红队测试(Red Teaming)”和让模型互相检查。最后,她总结道 GenAI 正在彻底重塑软件交付流程,并幽默地再次提醒听众:这一领域变化极快,今天的内容可能周一就过时了。

[原文] [Rebecca Parsons]: Again, some of the prominent -ilities; of course this is a non-exhaustive list, but I do think one of the important ones, where we have to at least understand what risks we are running, is non-determinism. We talked a lot about the other ones, but I do want to remind you that latency is a big deal; just because our computers are faster than they were doesn't mean they can respond at the speed we would like them to, and that's something you have to take into account depending on how you're using the AI system. And I can't reinforce enough: yes, we do still have to care about observability, even if we don't really understand what it's doing under the covers.

[译文] [Rebecca Parsons]: 再次强调一些突出的“特性(-ilities)”,当然这不是一个详尽的清单,但我确实认为我们必须至少了解我们在非确定性方面面临的风险,这是重要的特性之一。我们已经谈了很多关于其他的,但我确实想提醒你们,延迟(latency)是一个大问题。仅仅因为我们的计算机比以前更快,并不意味着它们能以我们希望的速度做出响应。这是你必须考虑的事情,具体取决于你如何使用 AI 系统。而且我再怎么强调也不为过:是的,我们仍然必须关心可观测性(observability),即使我们并不真正理解它在“盖子底下”到底在做什么。

[原文] [Rebecca Parsons]: Now, one of the big problems we're dealing with at the moment: 2023 was the year of the proof of concept. All kinds of organizations did all kinds of POCs, across things that would go to their customers, things that would help support customer service, things that might help their sales organization, or things that might help finance. POCs all over the place.

[译文] [Rebecca Parsons]: 现在来说说我们目前正在处理的一大问题:2023 年是"概念验证(POC)之年"。各种各样的组织做了各种各样的 POC,涉及面向客户的东西、帮助支持客户服务的东西、可能帮助销售组织的东西,或者可能帮助财务的东西。POC 遍地开花。

[原文] [Rebecca Parsons]: 2024 is the year they're trying to put those things into production. It's not going so well; the last number I heard was a high-single-digit success rate, roughly 7%. That's pretty bad.

[译文] [Rebecca Parsons]: 2024 年是他们试图将这些东西投入生产环境的一年。情况进展得并不顺利。我听到的最新数字是高个位数的成功率,大约 7%。这相当糟糕。

[原文] [Rebecca Parsons]: Now, the kinds of problems are not surprising. We all know how easy it is to manually fix a lot of things when you've got a small set of users. These systems are no more immune to bad data than any other system is, and the fact is, in most of our organizations, our data is not in a state where it can be properly processed. So often what these POC-to-production failures are highlighting is just how bad our data is.

[译文] [Rebecca Parsons]: 现在,这些问题的种类并不令人惊讶。我们都知道,当你只有一小群用户时,手动修复很多东西是多么容易。这些系统并不比其他任何系统更能对坏数据免疫。事实是,在我们大多数组织中,数据并没有处于可以被正确处理的状态。所以,这些从 POC 到生产环境的失败通常凸显的只是我们的数据有多糟糕。

[原文] [Rebecca Parsons]: And so you have to go back to: okay, what do I have to do to clean up my data? To make sure the right people can see it and the wrong people can't? To make sure I know what it is? To make sure it's high-quality data? To make sure, dot dot dot.

[译文] [Rebecca Parsons]: 所以你必须回过头去想:"好吧,我必须做什么来清理我的数据?确保对的人能看到它,错的人看不到?确保我知道它是什么?确保它是高质量数据?确保……等等等等。"

[原文] [Rebecca Parsons]: Another part of this, again, is that with a POC, who cares about having a nice automated path to production? We don't need that for a POC. Well, you do need it if you want to put these things into production. So a big problem we're dealing with this year is: how do we take these POCs, and many of them have been quite successful in terms of validating their business case, and what do we have to do to actually get them into production?

[译文] [Rebecca Parsons]: 这件事的另一部分再次与 POC 有关。做 POC 时,谁在乎有没有一个漂亮的自动化投产流程?做 POC 不需要那个。但如果你想把这些东西投入生产,你就确实需要它。所以我们今年正在处理的一大问题是:我们如何处理这些 POC(其中许多在验证商业案例方面已经相当成功),我们必须做什么才能真正把它们投入生产?

[原文] [Rebecca Parsons]: Now, what about the poor architects? Well, it was bad enough before; when you think about the number of architects relative to the number of delivery teams, let alone the number of developers, the poor architects are vastly outnumbered. And now we've got coding assistants that are turning out more and more and more code, so we've made the problem for architects a whole lot worse.

[译文] [Rebecca Parsons]: 那么可怜的架构师们呢?如果你考虑一下架构师与交付团队数量的比例,更不用说与开发人员数量的比例,情况以前就已经够糟了,可怜的架构师们在人数上处于绝对劣势。而现在我们有了辅助编码工具,它们正在产出越来越多、越来越多、越来越多的代码。所以我们让架构师的问题变得糟糕多了。

[原文] [Rebecca Parsons]: And, oh, by the way, for things like refactoring: in one study I saw that was actually quite well done, the probability that the coding assistant got the refactoring right was 37%, and that was the highest of any of the code models that existed. Now, how long do you think you'd keep your job if you got it wrong 63% of the time? Probably not very long.

[译文] [Rebecca Parsons]: 哦,顺便说一句,对于像重构(refactoring)这样的事情,我看到一项做得相当好的研究显示,辅助编码工具将重构做对的概率是 37%。那是现有任何代码模型中最高的了。现在,如果你有 63% 的时间都做错了,你觉得你能保住工作多久?可能不会很久。

[原文] [Rebecca Parsons]: But what about coding standards? What about the enterprise patterns? How do I keep the integrity of the overall vision when I've got a code model spewing out code? So we've made the problem for the architect much worse, but there are ways to handle it.

[译文] [Rebecca Parsons]: 但是编码标准呢?企业模式呢?当有一个代码模型在喷涌出代码时,我如何保持整体愿景的完整性?所以我们让架构师的问题变得更糟了,但有一些方法可以处理它。

[原文] [Rebecca Parsons]: First, let's talk about testing. Ponder for a moment: how would you write an automated test to test the validity of a non-deterministic system? There are actually some very well-developed metrics for things like natural language translation, but those metrics have not proved useful in evaluating the quality of some of these models. We're still playing with lots of different ways to validate this.

[译文] [Rebecca Parsons]: 首先让我们谈谈测试。思考片刻,你会如何编写自动化测试来测试一个非确定性系统的有效性?实际上,对于自然语言翻译之类的东西,有一些非常成熟的指标,但这些指标在评估其中一些模型的质量方面并未被证明是有用的。我们还在尝试许多不同的方法来验证这一点。

[原文] [Rebecca Parsons]: Some of it is asking the language model to suggest tests that you might then run against the other language model. This might be that paired-language-model setup, where one model checks the answer of the other one. We're also seeing a lot of red teaming going on, in terms of trying to expose particular kinds of vulnerabilities in these systems. This is exactly the same kind of red teaming you see in security, except it's targeted at: how can I make the large language model do something it's not supposed to do?

[译文] [Rebecca Parsons]: 其中一种方法是要求语言模型建议测试,然后你可以针对另一个语言模型运行这些测试。这可能就是那种成对的语言模型,其中一个模型检查另一个模型的答案。我们还看到很多“红队测试(red teaming)”正在进行,旨在试图暴露这些系统中的特定类型的漏洞。这与你在安全领域看到的红队测试完全相同,只是它的目标是“我如何让大语言模型做一些它不应该做的事情”。

[原文] [Rebecca Parsons]: And jailbreaking, again, is something that is getting a lot of attention; as I said earlier, given the size of the context window, we just can't handle it. For standards, we can use techniques similar to RAG to load up what our enterprise standards are, so the model won't generate stuff that we don't want it to, and that's just one of the approaches.

[译文] [Rebecca Parsons]: “越狱(jailbreaking)”再次受到大量关注。正如我早些时候所说,考虑到上下文窗口的大小,我们简直无法应付。关于标准,我们可以使用类似 RAG 的技术来加载我们的企业标准,这样模型就不会生成我们不希望它生成的东西。这只是方法之一。

[原文] [Rebecca Parsons]: But AI and gen AI are completely reshaping the entire software delivery process, and we need to think about what that means for our jobs, and what that means for jobs 20 years from now. And as a reminder, this will all be out of date by Monday. Thank you very much.

[译文] [Rebecca Parsons]: 但 AI 和 GenAI(生成式 AI)正在彻底重塑整个软件交付流程。我们需要思考这对我们的工作意味着什么,我们需要思考这对 20 年后的工作意味着什么。最后提醒一下,这一切到了周一就会全部过时。非常感谢大家。