
Question about a recent commit (and reproducibility questions) #9

Open
da03 opened this issue Jun 12, 2022 · 2 comments
da03 commented Jun 12, 2022

Hi Rose,

The recent commit changed `x_tp1=x_tp` to `x_tp=x_tp1` (cb3d345), but are your results based on this new commit or the old one? I ask because I also got different numbers on Wikisection compared to your paper (using the version before that commit), as another person brought up in the latest post of #7. (Just to confirm: the results in the paper are based on GPT2-small, i.e. gpt2 in Hugging Face, and not GPT2-large/xl, right?)

Besides, I have a question about `simulate_brownian_bridge`: in this function, `x_tp1 = x_t * (1 - dt/(1-t)) + (dt/(1-t)) * B_T + noise`, but where does `dt=0.05` come from? According to the Brownian bridge process, shouldn't this be either `x_tp1 = x_0 * (1 - t) + t * B_T + noise` (if you use the older version, always interpolating between `x_0` and `x_T`), or `x_tp1 = x_t * (1 - 1/(T - num_samples)) + 1/(T - num_samples) * B_T + noise` (if you use the newer version, interpolating between `x_t` and `x_T`)? And why is the noise term fixed rather than depending on `t` and `T` as in Equation 1 in the paper?
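For reference, here is a minimal NumPy sketch (my own, hypothetical, and not the repository's `simulate_brownian_bridge`) of the exact Brownian bridge transition, where both the drift weight and the noise variance depend on `t`, as Equation 1 suggests:

```python
import numpy as np

def bridge_sketch(x_0, x_T, T=1.0, num_steps=20, rng=None):
    """Hypothetical sketch: exact forward simulation of a Brownian bridge
    pinned at x_0 (time 0) and x_T (time T). The one-step transition is
        x_{t+dt} | x_t ~ N(x_t + dt/(T-t) * (x_T - x_t),
                           dt * (T - t - dt) / (T - t)),
    so both the interpolation weight and the noise scale vary with t;
    neither is a fixed constant like dt=0.05 with fixed-variance noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / num_steps
    xs = [np.asarray(x_0, dtype=float)]
    t = 0.0
    for _ in range(num_steps - 1):
        x_t = xs[-1]
        mean = x_t + (dt / (T - t)) * (x_T - x_t)   # drift toward x_T
        var = dt * (T - t - dt) / (T - t)           # shrinks to 0 as t -> T
        xs.append(mean + np.sqrt(var) * rng.standard_normal(np.shape(x_t)))
        t += dt
    xs.append(np.asarray(x_T, dtype=float))
    return xs
```

Under this transition, a constant step weight and a fixed noise scale would only match the bridge if `T - t` were constant, which is what my question above is getting at.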

Lastly, I wonder if it's possible for you to share a trained checkpoint for one setting (Wikisection, TC-32), since that would address #7 as well. I understand that it's hard to share big files, but Google Drive allows large uploads, and you could remove optimizer.pt to make the checkpoint smaller.

Thanks,
Yuntian

@daniellaye

Hi Rose,

Thank you for sharing the code! I have the same question about `simulate_brownian_bridge`. I thought Equation (1) is what produces the bridge during generation, so I am a bit confused by this line. Could you please explain where `dt=0.05` comes from? Thanks a lot!

rosewang2008 (Owner) commented Aug 29, 2022

Hi both,
Thank you for raising these issues! I just wanted to post a small update and let you know that I have been actively looking into the issues.

The cause of the issues is my refactoring of two original repositories (one for the encoder, one for the decoder) into this one, which introduced bugs that affect replicating the results; this includes the recent issue @da03 raised in #10.

I want to make my plan transparent:

  • I plan to upload the two original repositories so that the reported results are replicable, update the README of this repository accordingly, and respond to the other issues. Time allowing (and assuming I find a good way to host the models), I will also upload the pretrained encoder and decoder so that running the discourse and long-text generation experiments is straightforward.
  • In the following weeks, I will rerun the long-text generation experiments with the decoder fix from this issue, and do a careful root-cause analysis of why long-text generation benefited significantly from the non-updated version of sampling.
  • After determining the causes, I will follow up on GitHub and update the paper accordingly with our findings.

Apologies for the inconvenience, and thank you so much for your patience!

Rose

Update: We ran into some seeding issues and discrepancies in generation that we're currently fixing. Apologies for the delay!
