-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature request: new 'consensus' mode for multi-model consultation #3098
Comments
Hey! good to see a like-minded individual. I know there are tons more models and its quite a big miss to not see any one of the ai assistant move to having them collaborate together. like why choose just one over sonnet, r1, o3, etc, when you can have all of them? :) anyhoo, ive had this pr #2628 since december which does some sort of the roundtable discussion. I'd appreciate a look over it :) |
Hey, thanks for the comment! Looks like you beat me to the general idea - as I commented on your PR, I won't repeat myself here, but I'll just say that our approaches differ in the sense that mine is really just an enhancement of the architect mode plus my new proposed 'tweak' ability. I've tried to keep the changes to the original mode fairly minimal whilst maximising impact. Maybe we can work together on this if it gains traction and you're interested, as I see you're already a believer in the importance of this idea. Thanks for getting in touch! |
Imagine if software was actually designed through a lead architect giving his idea to a single software engineer. A lot of time is surely going to be wasted bouncing back and forth between these two people and biases would amplify.
Having a team of engineers listen to the architect’s ideas can bring fresh perspectives. Then, having the architect review the discussion at the end helps refine the final instructions. This approach leads to more realistic, polished results that meet the client’s needs while also reducing the workload on both the lead architect and engineers.
The "consensus" mode would in essence extend "architect" mode by allowing more than one model to be consulted as intermediates before we get to the editor-model. Each model would critique the previous responses, refining the approach while staying within scope. A lead architect model would then consolidate everything into a structured prompt before passing it to the editor model.
To illustrate this, I tested the concept manually by improving the UI for my midnight calculator project, which currently has a very basic design as you can see:
Let's suppose I want to give the following prompt to an AI model:
I know it's generally a bad idea to suggest multiple fixes at once with AI, however I do believe that AI is more than capable now to handle such a demand if it's handled right. For the normal architect mode (r1 and sonnet) we got the following result:
I then reverted this change back to the first screenshot above and tested the consensus mode manually by having multiple models refine the initial prompt in a 'discussion' before passing the final version to the editor. Each model was basically asked to review and add to the previous suggestions while staying strictly within scope and focusing on the client's requirements.
In the last prompt, we gave the whole conversation to a reasoning model along with some directions and a reminder of the original prompt.
Here's the result for the consensus mode, which was basically the same single prompt--> R1 --> 4o --> gpt-3.5-turbo --> R1 --> Sonnet (editor) with appended instructions at each step to each respective model. The user should also have the option to tweak it at each step before passing it to Sonnet to streamline the job (this is also in the works for architect mode), however it's important to note that this is supposed to just simply be a single prompt:
Although they might look similar at first glance, looking closely at the original prompt and seeing how well both modes matched the prompt's requirements, we get the following results from a single prompt in both modes:
Final Score:
If you want to see the calculations for how I arrived at these numbers here they are:
In terms of cost, the cost will largely depend on what models you use as part of your 'team'. There's no doubt about the effectiveness and speed of the consensus mode, and for more complex cases, money and tokens are expected to be saved when compared to the architect model due to higher efficiency, control and quality.
I picked a simple example so it's easy to see the difference. The result might be a lot better if there was a more complex task to work on or more specific requirements. If enough interest is shown I might consider developing a simple prototype of this myself and testing some benchmarks. This example more serves to illustrate how this mode would work. Obviously a lot of tweaking can happen to make this better. If anyone wants more details about the exact appended prompts I used for the intermediate models to have context please let me know.
The text was updated successfully, but these errors were encountered: