Feature request: new 'consensus' mode for multi-model consultation #3098

Open
gitkenan opened this issue Feb 1, 2025 · 2 comments


gitkenan commented Feb 1, 2025

Imagine if software were actually designed by a lead architect handing his idea to a single software engineer. A lot of time would surely be wasted bouncing back and forth between the two, and each person's biases would be amplified.

Having a team of engineers listen to the architect’s ideas can bring fresh perspectives. Then, having the architect review the discussion at the end helps refine the final instructions. This approach leads to more realistic, polished results that meet the client’s needs while also reducing the workload on both the lead architect and engineers.

The "consensus" mode would in essence extend "architect" mode by allowing more than one model to be consulted as intermediates before we get to the editor-model. Each model would critique the previous responses, refining the approach while staying within scope. A lead architect model would then consolidate everything into a structured prompt before passing it to the editor model.

Image
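A minimal sketch of the flow described above, assuming each model can be treated as a plain function from prompt text to reply text. The names `run_consensus`, `make_stub`, `reviewers`, `lead_architect` and `editor` are hypothetical and are not part of any existing implementation; real model clients are left out.

```python
from typing import Callable, List

# Assumption: a "model" is any callable that maps prompt text to reply text.
Model = Callable[[str], str]


def run_consensus(
    user_prompt: str,
    reviewers: List[Model],
    lead_architect: Model,
    editor: Model,
) -> str:
    """Consult reviewers in order, consolidate, then hand off to the editor."""
    discussion: List[str] = []
    for reviewer in reviewers:
        # Each reviewer sees the original request plus every earlier reply.
        context = "\n\n".join([f"Request: {user_prompt}", *discussion])
        discussion.append(reviewer(context))

    # The lead architect turns the whole discussion into one structured prompt.
    transcript = "\n\n".join([f"Request: {user_prompt}", *discussion])
    plan = lead_architect(transcript)

    # Only the consolidated plan reaches the editor model.
    return editor(plan)


def make_stub(name: str, log: List[str]) -> Model:
    """Build a stand-in model that records when it was called."""
    def stub(prompt: str) -> str:
        log.append(name)
        return f"{name}: ok"
    return stub
```

With stubs in place of real models, `run_consensus("fix UI", [make_stub("r1", log), make_stub("4o", log)], make_stub("lead", log), make_stub("editor", log))` calls the models in the order reviewers, lead architect, editor.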

To illustrate this, I tested the concept manually by improving the UI for my midnight calculator project, which currently has a very basic design as you can see:

Image

Let's suppose I want to give the following prompt to an AI model:

This site needs a series of UI improvements. Firstly, let's make it more visually appealing by changing the '- If Isha is at:' format for the Dawud's night prayer section. Secondly, the section under 'How do you know?' is inconsistent in its styling with the weird blue bar on the left and the font, italics etc styling and it being centred. The general feeling of the website is very dull when it should be a lot more sleek. Include an interesting subtitle for the website and make the 'Islamic Midnight' and 'Last Third of the Night' parts more visually appealing by shortening them.

I know it's generally a bad idea to request multiple fixes at once from an AI; however, I do believe that AI is now more than capable of handling such a demand if it's handled right. For the normal architect mode (R1 and Sonnet) we got the following result:

Image

I then reverted this change back to the first screenshot above and tested the consensus mode manually by having multiple models refine the initial prompt in a 'discussion' before passing the final version to the editor. Each model was basically asked to review and add to the previous suggestions while staying strictly within scope and focusing on the client's requirements.

In the last prompt, we gave the whole conversation to a reasoning model along with some directions and a reminder of the original prompt.

Here's the result for the consensus mode, which was basically the same single prompt passed along the chain prompt --> R1 --> 4o --> gpt-3.5-turbo --> R1 --> Sonnet (editor), with instructions appended for each respective model at each step. The user should also have the option to tweak the output at each step before it is passed to Sonnet, to streamline the job (this is also in the works for architect mode). It's important to note, however, that all of this is still driven by a single prompt:

Image
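The per-step prompts used in the manual test could be assembled roughly as follows. This is only a sketch: the wording of `REVIEW_NOTE` and `FINAL_NOTE` is invented for illustration (the exact appended prompts are not listed in this issue), and `build_step_prompt` and `apply_tweak` are hypothetical helper names.

```python
from typing import Callable, List, Optional

# Hypothetical wording; the issue does not give the actual appended instructions.
REVIEW_NOTE = "Review and add to the suggestions above. Stay strictly within scope."
FINAL_NOTE = "Consolidate the discussion into one structured prompt for the editor."


def build_step_prompt(original: str, replies: List[str], final: bool = False) -> str:
    """Assemble the prompt for one intermediate model (or for the final consolidation)."""
    parts = [f"Original request:\n{original}"]
    parts += [f"Suggestion {i}:\n{reply}" for i, reply in enumerate(replies, 1)]
    parts.append(FINAL_NOTE if final else REVIEW_NOTE)
    if final:
        # The last step repeats the original request as a reminder.
        parts.append(f"Reminder of the original request:\n{original}")
    return "\n\n".join(parts)


def apply_tweak(draft: str, tweak: Optional[Callable[[str], str]] = None) -> str:
    """Let the user edit a draft before it moves on; pass it through unchanged by default."""
    return tweak(draft) if tweak else draft
```

Keeping the tweak step as an optional callback means the default path stays a single prompt with no user interaction, while an interactive front end could plug in an editor at any step.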

Although they might look similar at first glance, looking closely at the original prompt and seeing how well both modes matched the prompt's requirements, we get the following results from a single prompt in both modes:

| Architect Mode | Consensus Mode | Requirement |
| --- | --- | --- |
| ✔️ | ✔️ | Removing dashes |
| | ✔️ | Removing blue bar from "How do you know" section |
| | ✔️ | Making the text in the last section un-centred |
| 35% | 35% | Making the website more sleek in feeling |
| ✔️ | ✔️ | Adding an interesting subtitle |
| ✔️ | | Shorten and make midnight results more appealing |

Final Score:

  • Architect Mode: 56%
  • Consensus Mode: 73%

If you want to see the calculations for how I arrived at these numbers here they are:

For the Architect:

    100 + 0 + 0 + 35 + 100 + 100 = 335
    Dividing by 6 tasks gives an average of about 55.8%.

For the Consensus:

    100 + 100 + 100 + 35 + 100 + 0 = 435
    Dividing by 6 tasks gives an average of about 72.5%.
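The same arithmetic as a short script, for anyone who wants to re-run it with different per-requirement scores. The list names and `average_score` are just illustrative; a met requirement counts as 100, an unmet one as 0, and the partial "sleek" result as 35.

```python
# Per-requirement scores in the order the requirements are listed above.
architect = [100, 0, 0, 35, 100, 100]
consensus = [100, 100, 100, 35, 100, 0]


def average_score(scores):
    """Mean score across all requirements, as a percentage."""
    return sum(scores) / len(scores)


print(f"Architect: {average_score(architect):.1f}%")  # Architect: 55.8%
print(f"Consensus: {average_score(consensus):.1f}%")  # Consensus: 72.5%
```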

In terms of cost, much will depend on which models you use as part of your 'team'. I'm confident in the effectiveness and speed of the consensus mode, and for more complex cases I expect it to save money and tokens compared to architect mode thanks to higher efficiency, control and quality.

I picked a simple example so it's easy to see the difference. The gap might be a lot bigger with a more complex task or more specific requirements. If enough interest is shown I might consider developing a simple prototype of this myself and running some benchmarks. This example serves more to illustrate how the mode would work; obviously a lot of tweaking can happen to make it better. If anyone wants more details about the exact appended prompts I used to give the intermediate models context, please let me know.

@gitkenan gitkenan changed the title New /chat-mode 'roundtable' for multi-model discussion New mode 'roundtable' for multi-model discussion Feb 1, 2025
@gitkenan gitkenan changed the title New mode 'roundtable' for multi-model discussion New mode 'concensus' for multi-model discussion Feb 1, 2025
@gitkenan gitkenan changed the title New mode 'concensus' for multi-model discussion New mode 'consensus' for multi-model discussion Feb 1, 2025
@gitkenan gitkenan changed the title New mode 'consensus' for multi-model discussion Feature request: new 'consensus' mode for multi-model consultation Feb 1, 2025
@gembancud

Hey! Good to see a like-minded individual. I know there are tons more models, and it's quite a big miss that none of the AI assistants have moved to having them collaborate. Like, why choose just one of Sonnet, R1, o3, etc. when you can have all of them? :) Anyhoo, I've had this PR #2628 open since December which does a sort of roundtable discussion. I'd appreciate a look over it :)


gitkenan commented Feb 1, 2025

Hey, thanks for the comment! Looks like you beat me to the general idea - as I commented on your PR, I won't repeat myself here, but I'll just say that our approaches differ in the sense that mine is really just an enhancement of the architect mode plus my new proposed 'tweak' ability. I've tried to keep the changes to the original mode fairly minimal whilst maximising impact. Maybe we can work together on this if it gains traction and you're interested, as I see you're already a believer in the importance of this idea. Thanks for getting in touch!
