-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ROUGE-N added #69
Open
astariul
wants to merge
4
commits into
Maluuba:master
Choose a base branch
from
astariul:master
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
ROUGE-N added #69
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,6 +8,7 @@ | |
# Author : Ramakrishna Vedantam <[email protected]> | ||
|
||
import numpy as np | ||
import itertools | ||
import pdb | ||
|
||
def my_lcs(string, sub): | ||
|
@@ -33,19 +34,97 @@ def my_lcs(string, sub): | |
|
||
return lengths[len(string)][len(sub)] | ||
|
||
def _get_ngrams(n, text): | ||
"""TAKEN FROM https://github.com/pltrdy/seq2seq/blob/master/seq2seq/metrics/rouge.py | ||
|
||
Calcualtes n-grams. | ||
Args: | ||
n: which n-grams to calculate | ||
text: An array of tokens | ||
Returns: | ||
A set of n-grams | ||
""" | ||
ngram_set = set() | ||
text_length = len(text) | ||
max_index_ngram_start = text_length - n | ||
for i in range(max_index_ngram_start + 1): | ||
ngram_set.add(tuple(text[i:i + n])) | ||
return ngram_set | ||
|
||
def _split_into_words(sentences): | ||
"""TAKEN FROM https://github.com/pltrdy/seq2seq/blob/master/seq2seq/metrics/rouge.py | ||
|
||
Splits multiple sentences into words and flattens the result""" | ||
return list(itertools.chain(*[_.split(" ") for _ in sentences])) | ||
|
||
def _get_word_ngrams(n, sentences): | ||
"""TAKEN FROM https://github.com/pltrdy/seq2seq/blob/master/seq2seq/metrics/rouge.py | ||
|
||
Calculates word n-grams for multiple sentences. | ||
""" | ||
assert len(sentences) > 0 | ||
assert n > 0 | ||
|
||
words = _split_into_words(sentences) | ||
return _get_ngrams(n, words) | ||
|
||
def rouge_n(evaluated_sentences, reference_sentences, n=2): | ||
""" TAKEN FROM https://github.com/pltrdy/seq2seq/blob/master/seq2seq/metrics/rouge.py | ||
|
||
Computes ROUGE-N of two text collections of sentences. | ||
Sourece: http://research.microsoft.com/en-us/um/people/cyl/download/ | ||
papers/rouge-working-note-v1.3.1.pdf | ||
Args: | ||
evaluated_sentences: The sentences that have been picked by the summarizer | ||
reference_sentences: The sentences from the referene set | ||
n: Size of ngram. Defaults to 2. | ||
Returns: | ||
A tuple (f1, precision, recall) for ROUGE-N | ||
Raises: | ||
ValueError: raises exception if a param has len <= 0 | ||
""" | ||
if len(evaluated_sentences) <= 0 or len(reference_sentences) <= 0: | ||
raise ValueError("Collections must contain at least 1 sentence.") | ||
|
||
evaluated_ngrams = _get_word_ngrams(n, evaluated_sentences) | ||
reference_ngrams = _get_word_ngrams(n, reference_sentences) | ||
reference_count = len(reference_ngrams) | ||
evaluated_count = len(evaluated_ngrams) | ||
|
||
# Gets the overlapping ngrams between evaluated and reference | ||
overlapping_ngrams = evaluated_ngrams.intersection(reference_ngrams) | ||
overlapping_count = len(overlapping_ngrams) | ||
|
||
# Handle edge case. This isn't mathematically correct, but it's good enough | ||
if evaluated_count == 0: | ||
precision = 0.0 | ||
else: | ||
precision = overlapping_count / evaluated_count | ||
|
||
if reference_count == 0: | ||
recall = 0.0 | ||
else: | ||
recall = overlapping_count / reference_count | ||
|
||
f1_score = 2.0 * ((precision * recall) / (precision + recall + 1e-8)) | ||
|
||
# return overlapping_count / reference_count | ||
return f1_score, precision, recall | ||
|
||
class Rouge(): | ||
''' | ||
Class for computing ROUGE-L score for a set of candidate sentences for the MS COCO test set | ||
|
||
''' | ||
def __init__(self): | ||
def __init__(self, n=2): | ||
# vrama91: updated the value below based on discussion with Hovey | ||
self.beta = 1.2 | ||
self._n = n | ||
|
||
def calc_score(self, candidate, refs): | ||
""" | ||
Compute ROUGE-L score given one candidate and references for an image | ||
:param candidate: str : candidate sentence to be evaluated | ||
:param candidate: list of str : candidate sentence to be evaluated | ||
:param refs: list of str : COCO reference sentences for the particular image to be evaluated | ||
:returns score: int (ROUGE-L score for the candidate evaluated against references) | ||
""" | ||
|
@@ -54,6 +133,12 @@ def calc_score(self, candidate, refs): | |
prec = [] | ||
rec = [] | ||
|
||
# Compute ROUGE-n scores | ||
rouge_n_scores = [] | ||
for n in range(1, self._n + 1): | ||
f_score, _, _ = rouge_n(candidate, refs, n) | ||
rouge_n_scores.append(f_score) | ||
|
||
# split into tokens | ||
token_c = candidate[0].split(" ") | ||
|
||
|
@@ -72,7 +157,7 @@ def calc_score(self, candidate, refs): | |
score = ((1 + self.beta**2)*prec_max*rec_max)/float(rec_max + self.beta**2*prec_max) | ||
else: | ||
score = 0.0 | ||
return score | ||
return rouge_n_scores + [score] | ||
|
||
def compute_score(self, gts, res): | ||
""" | ||
|
@@ -98,8 +183,12 @@ def compute_score(self, gts, res): | |
assert(type(ref) is list) | ||
assert(len(ref) > 0) | ||
|
||
average_score = np.mean(np.array(score)) | ||
return average_score, np.array(score) | ||
score_type = [] | ||
for s_idx, s_type in enumerate(score[0]): | ||
score_type.append([s[s_idx] for s in score]) | ||
|
||
average_score = [np.mean(np.array(s)) for s in score_type] | ||
return average_score, [np.array(s) for s in score_type] | ||
|
||
def method(self): | ||
return "Rouge" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution!
Would you add some tests for the value of the ROUGE metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but I will not :
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other repo's code seems to be Apache-licensed, I'm not sure if we can merge it, particularly without including their license. I'm not too worried about slightly different values as long as we're clear about the methods used in the docs. Are you aware where differences might come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could at least test that the values are within some reasonable bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried comparing the results with the results using the rouge package. But scores are different (and not only a few digits...). Not only ROUGE-N are different, but also the existing ROUGE-L.
This package is also Apache-licensed. Not sure if we can just use it without including their license (without modifying their code).