-
The reasoning tokens are discarded from context for the end user to prevent poisoning the context, but the tokens that make up the chain are certainly saved and used for training.
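This matches how the API reports usage: reasoning tokens are counted (and billed) under the completion total, but the reasoning text itself is never returned to the caller. A minimal sketch of that accounting — the payload below is a fabricated example for illustration, not real API output:

```python
# Sketch of o1-style token accounting. The usage dict below is a
# made-up example payload, not an actual API response.
usage = {
    "prompt_tokens": 50,
    "completion_tokens": 900,
    "completion_tokens_details": {"reasoning_tokens": 700},
}

# Reasoning tokens are included in the billed completion_tokens,
# but the chain-of-thought text is hidden from the end user.
reasoning = usage["completion_tokens_details"]["reasoning_tokens"]
visible = usage["completion_tokens"] - reasoning

print(f"billed completion tokens: {usage['completion_tokens']}")
print(f"hidden reasoning tokens:  {reasoning}")
print(f"visible output tokens:    {visible}")
```

So the user pays for 900 completion tokens here while only 200 tokens of visible output ever reach them.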
-
We seem to have competing claims of multi-turn versus single-shot.
-
It doesn't seem possible to me for it to be single-shot; how would they limit the amount of compute if so? Regarding this post, the image looks weird, because it seems unlikely to me that the model would be able to catch its own errors if it only saw the output and not its reasoning.
-
https://x.com/IntuitMachine/status/1835256547179413672