fix: token suspicion adjustment #477

tazlin · 2025-01-17T14:56:59Z

Once upon a time, before batching and other optimizations, these were the speeds we considered unreasonable. New paradigms, backends, and breakthroughs have since made these numbers increasingly inaccurate or irrelevant.

While I do think there has to be some sort of longer-term solution (such as one addressing the problem detailed in #463), there have been virtually only false positives, and the few true positives boiled down to innocent misconfigurations.

Further, it appears that certain types of worker-reported failures can artificially inflate token count, which may be its own issue.

For the time being, I am advocating that the number be increased to 100 t/s for most models, as recommended by henky, and that we respond to any abuse of this relaxation with other, more complete and sound measures.
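To make the proposal concrete, here is a minimal sketch of what a graduated tokens-per-second check could look like. It is not the code in this PR: the function names, the tiering by model size, and every threshold other than the 100 t/s figure are hypothetical placeholders chosen for illustration.

```python
# Hypothetical tiers: (largest model size in billions of parameters, max plausible tokens/s).
# Only the 100 t/s value comes from the discussion above; the rest are placeholders.
SPEED_TIERS = [
    (20.0, 100.0),
    (80.0, 60.0),
    (float("inf"), 30.0),
]


def max_plausible_speed(model_size_b: float) -> float:
    """Return the tokens/s ceiling for a model of the given size (hypothetical helper)."""
    for max_size, limit in SPEED_TIERS:
        if model_size_b <= max_size:
            return limit
    # Unreachable while the last tier is unbounded; kept as a defensive fallback.
    return SPEED_TIERS[-1][1]


def is_suspicious(tokens: int, seconds: float, model_size_b: float) -> bool:
    """Flag a generation whose reported speed exceeds the ceiling for its tier."""
    if seconds <= 0:
        raise ValueError("generation time must be positive")
    return tokens / seconds > max_plausible_speed(model_size_b)


if __name__ == "__main__":
    # 512 tokens in 8 seconds is 64 t/s: under the 100 t/s tier, over the 60 t/s tier.
    print(is_suspicious(512, 8.0, model_size_b=7))
    print(is_suspicious(512, 8.0, model_size_b=70))
```

Keeping the tiers in a single table means a later relaxation or tightening only touches data, not the comparison logic.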


tazlin · 2025-01-17T14:59:42Z

And just to clarify, this is to resolve the problem of new workers very consistently being flagged and shadow banned. It happens with shocking regularity and, due to the nature of the sanction, goes undetected for long periods of time, reducing good compute and confusing users (who see a model as available when in reality it is not).

Further, I would invite any horde moderator to review the suspicion of any text worker: it will be absurdly high, in the hundreds at least, and I have seen several examples in the tens of thousands. This is almost certainly a direct result of tripping this particular safety check, and it really is a testament to the idea that the values are much too low.

horde/classes/kobold/processing_generation.py

tazlin added the allow-ci label Jan 17, 2025

fix: add two gradations to the text token/s thresholds

f6a7915

db0 reviewed Jan 18, 2025


horde/classes/kobold/processing_generation.py
