simple zero-shot eval function: website description #88

Open
norakassner opened this issue Dec 13, 2021 · 1 comment
@norakassner
Collaborator

norakassner commented Dec 13, 2021

Compute perplexity (ppl) on a website-specific test set. Contact @cccntu and Christopher.

@shanyas10
Collaborator

	
import numpy as np
import torch

# `model` and `tokenizer` are assumed to be loaded beforehand (e.g. in the Colab linked below).

def calc_ppl(sentence):
    """Return the perplexity of `sentence` under the language model."""
    tokenize_input = tokenizer.encode(sentence)
    tensor_input = torch.tensor([tokenize_input])
    with torch.no_grad():  # evaluation only, no gradients needed
        # With labels passed, the first output is the loss
        loss = model(tensor_input, labels=tensor_input)[0]
    return float(np.exp(loss.item()))

def eval_website_desc(orig_text, orig_website_desc, website_desc_list):
    """
    Return True if `orig_website_desc ||| orig_text` has a perplexity no higher than
    `website_desc ||| orig_text` for every website_desc in website_desc_list.

    Example:
        orig_text = "George Orwell was an English novelist, essayist, journalist and critic."
        orig_website_desc = "Wikipedia is a multilingual encyclopedia"
        website_desc_list = <list of descriptions from different websites>
    """
    orig_concat = f"{orig_website_desc} ||| {orig_text}"
    orig_ppl = calc_ppl(orig_concat)
    for website_desc in website_desc_list:
        # Fail as soon as any other description yields a strictly lower ppl
        if calc_ppl(f"{website_desc} ||| {orig_text}") < orig_ppl:
            return False

    return True

Colab to test this: https://colab.research.google.com/drive/15ap5LvMuX_kIZTDpai1bCbCcWlF2FgWD#scrollTo=B812bcUG6KWW
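
To get a single number over a website-specific test set, the per-example check could be aggregated into an accuracy. This is only a minimal sketch, assuming each example is an `(orig_text, orig_website_desc, distractor_descs)` tuple. The names `Example`, `website_desc_accuracy`, and `ppl_fn` are hypothetical and not part of the repo; in practice `ppl_fn` would be `calc_ppl`.

```python
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical example layout: (orig_text, orig_website_desc, distractor_descs)
Example = Tuple[str, str, Sequence[str]]


def website_desc_accuracy(
    examples: Iterable[Example],
    ppl_fn: Callable[[str], float],
    sep: str = " ||| ",
) -> float:
    """Fraction of examples where the original description gives the lowest ppl."""
    correct = 0
    total = 0
    for orig_text, orig_desc, distractors in examples:
        orig_ppl = ppl_fn(f"{orig_desc}{sep}{orig_text}")
        # Count as correct only if no distractor scores a strictly lower ppl,
        # mirroring the comparison in eval_website_desc.
        if all(ppl_fn(f"{d}{sep}{orig_text}") >= orig_ppl for d in distractors):
            correct += 1
        total += 1
    # An empty test set yields 0.0 rather than a division error.
    return correct / total if total else 0.0
```

Passing the scoring function as an argument keeps the aggregation independent of the model, so it can be tried out with a stub before plugging in `calc_ppl`.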
