How far are the Alpaca-family large models from ChatGPT? After a detailed evaluation, I fell silent (Part 3)


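The testers explain that they had to iterate on the prompt several times because the OpenAI API does not allow partial output completion (that is, they cannot specify how the AI assistant's answer begins), which makes the output hard to steer.
With an open-source model, by contrast, they can guide the output more explicitly and force the model to use the structure they prescribe.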
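To make the idea of partial output completion concrete, here is a minimal sketch in plain Python. It does not use the guidance library or any real model API: `build_prompt`, `guided_answer`, `fake_complete`, and the `USER:` / `ASSISTANT:` role markers are all hypothetical stand-ins. The only point is that, with a raw-completion model, the caller can write the first tokens of the assistant's turn so that the model has to continue from the required structure.

```python
from typing import Callable


def build_prompt(user_message: str, assistant_prefix: str) -> str:
    """Assemble a raw completion prompt whose assistant turn is already started.

    The role markers are illustrative; real models each use their own chat template.
    """
    return f"USER: {user_message}\nASSISTANT: {assistant_prefix}"


def guided_answer(complete: Callable[[str], str],
                  user_message: str,
                  assistant_prefix: str) -> str:
    """Return the forced prefix plus whatever the model appends to it."""
    prompt = build_prompt(user_message, assistant_prefix)
    return assistant_prefix + complete(prompt)


def fake_complete(prompt: str) -> str:
    """Stand-in for a locally hosted model: returns a canned continuation."""
    return " Peter and John discuss the weather."


prefix = "CONVERSATION SEGMENTS:\nSegment 1:"
result = guided_answer(fake_complete, "Who discussed the weather?", prefix)
print(result)
```

Because the prefix is supplied by the caller rather than generated, the answer starts with the required header regardless of what the model would otherwise have written.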
The new round of tests used the following prompt:

qa_guided = guidance('''{{#system~}}
{{llm.default_system_prompt}}
{{~/system}}

{{#user~}}
You will read a meeting transcript, then extract the relevant segments to answer the following question:
Question: {{query}}
----
{{transcript}}
----
Based on the above, please answer the following question:
Question: {{query}}
Please extract the three segment from the transcript that are the most relevant for the answer, and then answer the question.
Note that conversation segments can be of any length, e.g. including multiple conversation turns. If you need less than three segments, you can leave the rest blank.

As an example of output format, here is a fictitious answer to a question about another meeting transcript:
CONVERSATION SEGMENTS:
Segment 1: Peter and John discuss the weather.
Peter: John, how is the weather today?
John: It's raining.
Segment 2: Peter insults John
Peter: John, you are a bad person.
Segment 3: Blank
ANSWER: Peter and John discussed the weather and Peter insulted John.
{{/user}}

{{#assistant~}}
CONVERSATION SEGMENTS:
Segment 1: {{gen 'segment1'}}
Segment 2: {{gen 'segment2'}}
Segment 3: {{gen 'segment3'}}
ANSWER: {{gen 'answer'}}
{{~/assistant~}}''')

Running this prompt with Vicuna yields the correct format on the first try, and the format stays correct on every run:

[Figure]
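One way to check that an output keeps this format is to parse it. The sketch below is an illustration only, not part of the testers' code: `parse_answer` is a hypothetical helper, and the regular expression assumes the exact labels used in the prompt above (`CONVERSATION SEGMENTS:`, `Segment 1:` to `Segment 3:`, `ANSWER:`). The sample text is the fictitious answer from that prompt.

```python
import re
from typing import Dict, Optional

# Labels are taken from the prompt; anything between them is captured lazily.
PATTERN = re.compile(
    r"CONVERSATION SEGMENTS:\s*"
    r"Segment 1:(?P<segment1>.*?)"
    r"Segment 2:(?P<segment2>.*?)"
    r"Segment 3:(?P<segment3>.*?)"
    r"ANSWER:(?P<answer>.*)",
    re.DOTALL,
)


def parse_answer(text: str) -> Optional[Dict[str, str]]:
    """Return the three segments and the answer, or None if the format is broken."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


sample = """CONVERSATION SEGMENTS:
Segment 1: Peter and John discuss the weather.
Peter: John, how is the weather today?
John: It's raining.
Segment 2: Peter insults John
Peter: John, you are a bad person.
Segment 3: Blank
ANSWER: Peter and John discussed the weather and Peter insulted John."""

parsed = parse_answer(sample)
print(parsed["segment3"])  # Blank
print(parse_answer("I think the answer is no."))  # None
```

A check like this only confirms the structure of the output; it says nothing about whether the segments or the answer are correct.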

The same prompt can of course be run on MPT:

[Figure]
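Although MPT follows the format, it does not answer the question from the given meeting transcript; it extracts its segments from the format example instead. That is clearly unacceptable.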
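A failure like MPT's can be detected mechanically. The following sketch uses hypothetical names (`normalize`, `quoted_lines`, `ungrounded_lines`) and a made-up two-line transcript. It treats any segment line of the form `Speaker: utterance` as a quotation and reports those that do not occur in the transcript. It is a crude heuristic rather than the testers' method: a model that paraphrases instead of quoting would be flagged too.

```python
from typing import List


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def quoted_lines(segment: str) -> List[str]:
    """Lines that look like 'Speaker: utterance'; summary lines are skipped."""
    return [line for line in segment.splitlines() if ": " in line]


def ungrounded_lines(segment: str, transcript: str) -> List[str]:
    """Quoted lines of the segment that cannot be found in the transcript."""
    haystack = normalize(transcript)
    return [line for line in quoted_lines(segment)
            if normalize(line) not in haystack]


# Made-up transcript, for illustration only.
transcript = "Alice: I want to sell the company.\nBob: I disagree."

good_segment = "Alice proposes a sale.\nAlice: I want to sell the company."
leaked_segment = ("Peter and John discuss the weather.\n"
                  "Peter: John, how is the weather today?\n"
                  "John: It's raining.")

print(ungrounded_lines(good_segment, transcript))    # []
print(len(ungrounded_lines(leaked_segment, transcript)))  # 2
```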
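Next comes the comparison between ChatGPT and Vicuna.
The testers asked "Who wants to sell the company?", and both models appear to answer well.
Here is ChatGPT's answer:

[Figure]

Here is Vicuna's answer:

[Figure]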

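Next, the testers switched to different material, a conversation between Elon Musk and an interviewer:

[Figure]

The question was: "Did Elon Musk insult the interviewer?"
ChatGPT's answer:

[Figure]

Vicuna's answer:

[Figure]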

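Vicuna produces the correct format, and even the extracted segments are right. Surprisingly, though, its final answer is still wrong: "Elon musk does not accuse him of lying or insult him in any way".
The testers ran further question-answering tests and concluded that Vicuna is on par with ChatGPT on most questions but gives wrong answers more often than ChatGPT does.

Completing tasks with bash

The testers tried having several LLMs use a bash shell iteratively to solve problems. Each time a model issues a command, the testers run it and insert the output back into the prompt, repeating this process until the task is complete.
The prompt for ChatGPT looks like this: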

# `shell` is assumed to be a helper supplied by the testers that runs a
# command and returns its output; it is not defined in this excerpt.
terminal = guidance('''{{#system~}}
{{llm.default_system_prompt}}
{{~/system}}

{{#user~}}
Please complete the following task:
Task: list the files in the current directory
You can give me one bash command to run at a time, using the syntax:
COMMAND: command
I will run the commands on my terminal, and paste the output back to you. Once you are done with the task, please type DONE.
{{/user}}

{{#assistant~}}
COMMAND: ls
{{~/assistant~}}

{{#user~}}
Output: guidance project
{{/user}}

{{#assistant~}}
The files or folders in the current directory are:
- guidance
- project
DONE
{{~/assistant~}}

{{#user~}}
Please complete the following task:
Task: {{task}}
You can give me one bash command to run at a time, using the syntax:
COMMAND: command
I will run the commands on my terminal, and paste the output back to you. Once you are done with the task, please type DONE.
{{/user}}

{{#geneach 'commands' stop=False}}
{{#assistant~}}
{{gen 'this.command'}}
{{~/assistant~}}

{{~#user~}}
Output: {{shell this.command}}
{{~/user~}}
{{/geneach}}''')
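The loop that this prompt encodes can also be written out in plain Python, which may make the control flow easier to follow. This is a sketch under stated assumptions rather than the testers' code: `run_task`, `run_shell`, `next_message`, and `max_steps` are hypothetical names, the model is any callable that maps the conversation so far to its next message, and the `COMMAND:` and `DONE` conventions are taken from the prompt. Executing model-written shell commands is risky, so the example drives the loop with a scripted stand-in model and a harmless `echo`.

```python
import subprocess
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (role, text) pairs


def run_shell(command: str) -> str:
    """Run one command in a shell and return stdout and stderr combined."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=30)
    return (proc.stdout + proc.stderr).strip()


def run_task(next_message: Callable[[History], str],
             task: str,
             run: Callable[[str], str] = run_shell,
             max_steps: int = 10) -> History:
    """Alternate model messages and command outputs until no command is issued."""
    history: History = [("user", f"Task: {task}")]
    for _ in range(max_steps):
        message = next_message(history)
        history.append(("assistant", message))
        if "COMMAND:" not in message:
            break  # e.g. the model typed DONE
        rest = message.split("COMMAND:", 1)[1].strip()
        if not rest:
            break  # "COMMAND:" with nothing after it
        command = rest.splitlines()[0]  # first line after the marker
        history.append(("user", f"Output: {run(command)}"))
    return history


def scripted_model(history: History) -> str:
    """Stand-in for an LLM: issues one command, then a final answer."""
    if len(history) == 1:
        return "COMMAND: echo hello"
    return "The command printed: hello\nDONE"


log = run_task(scripted_model, "print hello")
for role, text in log:
    print(f"{role}: {text}")
```

The `max_steps` cap is a design choice of this sketch: it keeps a model that never types DONE from looping forever.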