-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
question of a recent commit (and also reproducibility questions) #9
Comments
Hi Rose, Thank you for sharing the code! I also have the same question about simulate_brownian_bridge. I thought that equation (1) is used in producing a bridge during generation, so I am a bit confused about this line. Could you please explain a bit more about where dt=0.05 comes from? Thanks a lot! |
Hi both, The cause of the issues is my refactoring of two repositories (one for the encoder and decoder), which has introduced bugs in replicating the results; this includes the recent issue @da03 raised in #10. I want to make my plan transparent:
Apologies for the inconvenience, and thank you so much for your patience! Rose Update: We ran into some seeding issues and discrepancies in generation that we're currently fixing. Apologies for the delay! |
Hi Rose,
The recent commit changed x_tp1=x_tp to x_tp=x_tp1 (cb3d345?diff=split?), but are your results based on this new commit or the old one? Since I also got different numbers on Wikisection compared to your paper (based on the version before the above commit), as brought up by another person in the latest post of #7. (Just to make sure, the results in the paper are based on GPT2-small which is called gpt-2 in huggingface, but not GPT2-large/xl right?)
Besides, I have a question about simulate_brownian_bridge: in this function,
x_tp1 = x_t * (1- dt/(1-t)) + (dt/(1-t))*B_T + noise
, but why is there a dt=0.05? According to the Brownian bridge process, shouldn't this be eitherx_tp1 = x_0 * (1 - t) + t * B_T + noise
(if you use the older version always interpolating between x_0 and x_T), orx_tp1 = x_t * (1-1/(T-num_samples)) + 1/(T-num_samples) * B_T + noise
(if you use the newer version interpolating between x_t and x_T)? And why is this noise term fixed rather than depending on t and T as in Equation 1 in the paper?Lastly, I wonder if it's possible for you to share one setting of your model (Wikisection, TC-32) since that would address the issue of #7 as well. I can understand that it's hard to share big files, but I think google drive allows uploading big files, and you can remove optimizer.pt to make the checkpoint smaller.
Thanks,
Yuntian
The text was updated successfully, but these errors were encountered: