The results of Mean detector are consistent #7

@wrx1990 opened this issue:

In synthid_text_huggingface_integration.ipynb, in the "Option 1: Mean detector" part, I got the sample result. From the results, it cannot be distinguished whether there is a watermark or not. Whether or not I add the watermark to the text, I set the same random-seed conditions and use the two different models, and the output is the same. From my understanding, I use synthid_mixin.SynthIDGemmaForCausalLM and transformers.GemmaForCausalLM, and the results of the two models should not be consistent.

[screenshot: output of SynthIDGemmaForCausalLM]
[screenshot: output of GemmaForCausalLM]

What I understand is that after adding the watermark, the output results should be slightly different, but judging from the results, the two are currently completely consistent. Can you provide a sample of the notebook's result set? Thank you.
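For reference, here is a condensed sketch of the comparison being described, following the colab's general setup. The prompt, sampling parameters, and seeding below are illustrative assumptions, not the notebook's exact values:

```python
import torch
import transformers
from synthid_text import synthid_mixin

MODEL_NAME = "google/gemma-2b-it"  # illustrative; any supported Gemma checkpoint

# Watermarked variant: the mixin hooks SynthID's tournament sampling
# into generate(), so the watermark is applied while sampling tokens.
wm_model = synthid_mixin.SynthIDGemmaForCausalLM.from_pretrained(
    MODEL_NAME, device_map="auto", torch_dtype=torch.bfloat16
)
# Plain variant: ordinary sampling, no watermark.
base_model = transformers.GemmaForCausalLM.from_pretrained(
    MODEL_NAME, device_map="auto", torch_dtype=torch.bfloat16
)

tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_NAME)
inputs = tokenizer(
    "Write a short poem about the sea.", return_tensors="pt"
).to(wm_model.device)

torch.manual_seed(0)
wm_out = wm_model.generate(
    **inputs, do_sample=True, temperature=0.7, top_k=40, max_new_tokens=128
)

torch.manual_seed(0)
base_out = base_model.generate(
    **inputs, do_sample=True, temperature=0.7, top_k=40, max_new_tokens=128
)

# If the watermark is active, the two sampling procedures consume the
# RNG differently, so outputs should typically diverge even with
# identical seeds; fully identical outputs suggest the watermarked
# model was not set up as intended.
print(tokenizer.decode(wm_out[0], skip_special_tokens=True))
print(tokenizer.decode(base_out[0], skip_special_tokens=True))
```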
Comments

@wrx1990 Couple of questions: … as watermarked model? The watermarked model needs to be set up differently, which you can walk through in the colab implementation.
@sumedhghaisas2 Second, can you tell me how to run the program and get different results on the output with and without adding the watermark?
I met a similar problem. I ran the colab's code locally, with the same instance and settings. However, when I run the experiments on Gemma-2b-it, the results are as follows:

[screenshot: weighted mean scores for watermarked and unwatermarked responses]

It seems that I can't classify the watermarked and unwatermarked responses. Can you tell me if this is a normal result?
@Kegard the scores look as intended. The (weighted) mean scores for unwatermarked text should be close to 0.5, and for watermarked text, slightly higher than 0.5. Here, it looks like the weighted mean watermarked scores are between 0.514-0.533 and the unwatermarked scores are between 0.484-0.509, so they're fully separated. Longer lengths and higher temperatures will make the scores more separable. If you want more interpretable values, I would recommend turning them into p-values or z-scores as discussed in the paper.
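For concreteness, here is a minimal sketch of that conversion, assuming each per-token g-value behaves like an independent fair Bernoulli draw under the unwatermarked null. The repo's Mean detector actually combines g-values across several watermarking layers with a mask, so treat this as illustrative rather than the library's API:

```python
import math
from scipy.stats import norm  # for the normal-tail p-value

def mean_score_to_z(mean_score: float, num_tokens: int) -> float:
    """z-score of a mean g-value under the unwatermarked null.

    Assumes each g-value is an independent Bernoulli(0.5) draw, so the
    mean over n tokens has standard deviation sqrt(0.25 / n).
    """
    return (mean_score - 0.5) / math.sqrt(0.25 / num_tokens)

def z_to_p(z: float) -> float:
    """One-sided p-value: the chance of a score at least this high by luck."""
    return float(norm.sf(z))

# e.g. a mean score of 0.53 over 200 scored tokens:
z = mean_score_to_z(0.53, 200)  # ~0.85
p = z_to_p(z)                   # ~0.20 -- not yet significant
# The same 0.53 over 2000 tokens gives z ~ 2.68, p ~ 0.004,
# which is why longer responses separate much more cleanly.
```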
@sdathath Thanks. It seems that I can set 0.5 as the threshold to classify the watermarked text and the unwatermarked. What's more, I find that outputs is used as a tensor to compute the g_values. If I use a different tokenizer to convert the watermarked text to a tensor, will that affect the results?
I would recommend against using 0.5 directly as the threshold, since you would expect 50% of scores for unwatermarked samples to cross that threshold. Would it be possible to maybe use z-scores?
"If I use a different tokenizer to convert the watermarked text to a tensor, will that affect the results?" -- with the current implementation, you would need to use the same tokenizer used to generate the text. There maybe ways around this requirement, but that would require redesigning a few components to look the contents of the tokens instead of the tokenizer ids. |