dumb question but "why can't we just" 🙈 #70
mindplay-dk started this conversation in Off Topic · 1 comment · 4 replies
This suggestion is way too obvious to work, but humor me:
Why can't we just take the input/output from an existing CoT or ToT agent program and fine-tune on that?
Literally just run queries through the most successful CoT/ToT agent and train on its output?
We already have programs that produce the kind of reasoning workflow we'd like the LLM to learn, right?
Obviously someone has thought of this and it won't work; I'm just curious to learn why. 😌
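
For concreteness, here is a minimal sketch of what that proposal amounts to, assuming you already have some CoT/ToT agent to call: run prompts through it, capture the full reasoning trace plus the final answer, and write everything out as a chat-formatted JSONL file for supervised fine-tuning. `run_cot_agent`, the example prompt, and the OpenAI-style message format are all placeholders for illustration, not anything from this repo.

```python
import json


def run_cot_agent(prompt: str) -> tuple[str, str]:
    # Placeholder for whatever CoT/ToT agent you already have.
    # A real implementation would call that agent and return its full
    # reasoning trace plus the final answer it settled on.
    reasoning = f"Let's think step by step about: {prompt} ..."
    answer = "<the agent's final answer>"
    return reasoning, answer


def build_sft_dataset(prompts: list[str], out_path: str = "cot_traces.jsonl") -> None:
    # One chat-formatted training example per prompt: the user turn is the
    # original question, the assistant turn is the agent's reasoning followed
    # by its answer, so fine-tuning teaches the model to reason before answering.
    with open(out_path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            reasoning, answer = run_cot_agent(prompt)
            example = {
                "messages": [
                    {"role": "user", "content": prompt},
                    {"role": "assistant", "content": f"{reasoning}\n\nFinal answer: {answer}"},
                ]
            }
            f.write(json.dumps(example) + "\n")


if __name__ == "__main__":
    build_sft_dataset(["If 3x + 2 = 11, what is x?"])
```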
Reply:

Defeats the purpose of the project, which is to demonstrate that the o1 reasoning advance is replicable. Swiping their CoTs will not accomplish that goal.