The results of Mean detector are consistent #7

@wrx1990 opened this issue:

In synthid_text_huggingface_integration.ipynb, in the "Option 1: Mean detector" part, I got the sample result. From the results, it cannot be distinguished whether there is a watermark or not. Whether or not I add the watermark to the text, I set the same random-seed conditions and use the two different models, and the output is the same. From my understanding, I use synthid_mixin.SynthIDGemmaForCausalLM and transformers.GemmaForCausalLM, and the results of the two models should not be consistent.

[screenshot: output of SynthIDGemmaForCausalLM]
[screenshot: output of GemmaForCausalLM]

What I understand is that after adding the watermark, the output results should be slightly different, but judging from the results, the two are currently completely consistent. Can you provide a sample of the notebook's result set? Thank you.
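For reference, here is a condensed sketch of the comparison being described, following the colab's general setup. The prompt, sampling parameters, and seeding below are illustrative assumptions, not the notebook's exact values:

```python
import torch
import transformers
from synthid_text import synthid_mixin

MODEL_NAME = "google/gemma-2b-it"  # illustrative; any supported Gemma checkpoint

# Watermarked variant: the mixin hooks SynthID's tournament sampling
# into generate(), so the watermark is applied while sampling tokens.
wm_model = synthid_mixin.SynthIDGemmaForCausalLM.from_pretrained(
    MODEL_NAME, device_map="auto", torch_dtype=torch.bfloat16
)
# Plain variant: ordinary sampling, no watermark.
base_model = transformers.GemmaForCausalLM.from_pretrained(
    MODEL_NAME, device_map="auto", torch_dtype=torch.bfloat16
)

tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_NAME)
inputs = tokenizer(
    "Write a short poem about the sea.", return_tensors="pt"
).to(wm_model.device)

torch.manual_seed(0)
wm_out = wm_model.generate(
    **inputs, do_sample=True, temperature=0.7, top_k=40, max_new_tokens=128
)

torch.manual_seed(0)
base_out = base_model.generate(
    **inputs, do_sample=True, temperature=0.7, top_k=40, max_new_tokens=128
)

# If the watermark is active, the two sampling procedures consume the
# RNG differently, so outputs should typically diverge even with
# identical seeds; fully identical outputs suggest the watermarked
# model was not set up as intended.
print(tokenizer.decode(wm_out[0], skip_special_tokens=True))
print(tokenizer.decode(base_out[0], skip_special_tokens=True))
```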
Comments

@wrx1990 Couple of questions: … as watermarked model? The watermarked model needs to be set up differently, which you can walk through in the colab implementation.
@sumedhghaisas2 Second, can you tell me how to run the program and get different results on the output with and without adding the watermark?
I met a similar problem. I ran the colab's code locally, with the same instance and settings. However, when I run the experiments on Gemma-2b-it, the results are as follows:

[screenshot: weighted mean scores for watermarked and unwatermarked responses]

It seems that I can't classify the watermarked and unwatermarked responses. Can you tell me if this is a normal result?
@Kegard the scores look as intended. The (weighted) mean scores for unwatermarked text should be close to 0.5, and for watermarked text, slightly higher than 0.5. Here, it looks like the weighted mean watermarked scores are between 0.514-0.533 and the unwatermarked scores are between 0.484-0.509, so they're fully separated. Longer lengths and higher temperatures will make the scores more separable. If you want more interpretable values, I would recommend turning them into p-values or z-scores as discussed in the paper.
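For concreteness, here is a minimal sketch of that conversion, assuming each per-token g-value behaves like an independent fair Bernoulli draw under the unwatermarked null. The repo's Mean detector actually combines g-values across several watermarking layers with a mask, so treat this as illustrative rather than the library's API:

```python
import math
from scipy.stats import norm  # for the normal-tail p-value

def mean_score_to_z(mean_score: float, num_tokens: int) -> float:
    """z-score of a mean g-value under the unwatermarked null.

    Assumes each g-value is an independent Bernoulli(0.5) draw, so the
    mean over n tokens has standard deviation sqrt(0.25 / n).
    """
    return (mean_score - 0.5) / math.sqrt(0.25 / num_tokens)

def z_to_p(z: float) -> float:
    """One-sided p-value: the chance of a score at least this high by luck."""
    return float(norm.sf(z))

# e.g. a mean score of 0.53 over 200 scored tokens:
z = mean_score_to_z(0.53, 200)  # ~0.85
p = z_to_p(z)                   # ~0.20 -- not yet significant
# The same 0.53 over 2000 tokens gives z ~ 2.68, p ~ 0.004,
# which is why longer responses separate much more cleanly.
```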
@sdathath Thanks. It seems that I can set 0.5 as the threshold to classify the watermarked text and the unwatermarked. What's more, I find that outputs is used as a tensor to compute the g_values. If I use a different tokenizer to convert the watermarked text to a tensor, will that affect the results?
I would recommend against using 0.5 directly as the threshold, since you would expect 50% of scores for unwatermarked samples to cross that threshold. Would it be possible to maybe use z-scores?
"If I use a different tokenizer to convert the watermarked text to a tensor, will that affect the results?" -- with the current implementation, you would need to use the same tokenizer used to generate the text. There maybe ways around this requirement, but that would require redesigning a few components to look the contents of the tokens instead of the tokenizer ids. |