
Commit b0aec23

Xarvalus authored and qamilnowak committed
Do not match second slash and dot in DOI (#1)
They are the only reserved characters, according to https://www.doi.org/doi_handbook/2_Numbering.html#2.5 Cf. mediawiki-utilities#7
1 parent a714bbd commit b0aec23


1 file changed: +1 -1 lines changed


Diff for: mwcites/extractors/doi.py

@@ -14,7 +14,7 @@
 TAGS_RE = re.compile(r'<(/\s*)?(' + '|'.join(HTML_TAGS) + ')(\s[^>\n\r]+)?>', re.I)
 
 '''
-DOI_RE = re.compile(r'\b(10\.\d+/[^\s\|\]\}\?\,]+)')
+DOI_RE = re.compile(r'\b(10\.\d+/[^\./\s\|\]\}\?\,]+)')
 
 def extract_regex(text):
     for match in DOI_RE.finditer(text):
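
To see what the one-line change does in practice, the two patterns from the diff can be run side by side. This is only a minimal sketch: the names OLD_DOI_RE and NEW_DOI_RE and the sample strings are hypothetical and not part of the commit; only the two regular expressions are copied from the diff.

```python
import re

# Pattern before the commit (copied from the "-" line of the diff).
OLD_DOI_RE = re.compile(r'\b(10\.\d+/[^\s\|\]\}\?\,]+)')
# Pattern after the commit (copied from the "+" line): '.' and '/' are
# added to the negated character class, so the suffix stops at either.
NEW_DOI_RE = re.compile(r'\b(10\.\d+/[^\./\s\|\]\}\?\,]+)')

# Hypothetical sample inputs, made up for illustration only.
plain_text = "doi:10.1000/182.foo/bar, see also"
wiki_text = "{{cite journal|doi=10.1371/journal.pone.0012345}}"

old_plain = OLD_DOI_RE.findall(plain_text)
new_plain = NEW_DOI_RE.findall(plain_text)
old_wiki = OLD_DOI_RE.findall(wiki_text)
new_wiki = NEW_DOI_RE.findall(wiki_text)

print(old_plain)  # ['10.1000/182.foo/bar']
print(new_plain)  # ['10.1000/182']
print(old_wiki)   # ['10.1371/journal.pone.0012345']
print(new_wiki)   # ['10.1371/journal']
```

With the new character class, each match ends at the first dot or slash that follows the slash separating the DOI prefix from its suffix, so anything after that point in the suffix is no longer captured.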
