Commit 618032d

Do not match second slash and dot in DOI
They are the only reserved characters, according to https://www.doi.org/doi_handbook/2_Numbering.html#2.5 Cf. mediawiki-utilities#7
1 parent 1c4bdb4 commit 618032d

1 file changed: +1 -1 lines changed
mwcites/extractors/doi.py

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
 TAGS_RE = re.compile(r'<(/\s*)?(' + '|'.join(HTML_TAGS) + ')(\s[^>\n\r]+)?>', re.I)
 
 '''
-DOI_RE = re.compile(r'\b(10\.\d+/[^\s\|\]\}\?\,]+)')
+DOI_RE = re.compile(r'\b(10\.\d+/[^\./\s\|\]\}\?\,]+)')
 
 def extract_regex(text):
     for match in DOI_RE.finditer(text):
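
To see what the one-character-class change does in practice, the two patterns from the diff can be run side by side. This is only a minimal sketch: the names `OLD_DOI_RE`, `NEW_DOI_RE` and `first_match`, as well as the sample strings, are made up for illustration and do not appear in `mwcites`.

```python
import re

# The two patterns are copied from the diff; the OLD_/NEW_ names are
# hypothetical and only used to compare them here.
OLD_DOI_RE = re.compile(r'\b(10\.\d+/[^\s\|\]\}\?\,]+)')
NEW_DOI_RE = re.compile(r'\b(10\.\d+/[^\./\s\|\]\}\?\,]+)')


def first_match(pattern, text):
    """Return the first captured DOI candidate in text, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


# Invented sample text: a DOI-like string followed by a path and extension.
sample = "see doi:10.1000/182/extra.html for details"

old_result = first_match(OLD_DOI_RE, sample)  # runs on until the whitespace
new_result = first_match(NEW_DOI_RE, sample)  # stops before the second slash

# Invented sample whose suffix contains a dot.
dotted = "10.1234/abc.def"
old_dotted = first_match(OLD_DOI_RE, dotted)
new_dotted = first_match(NEW_DOI_RE, dotted)

print(old_result, new_result)
print(old_dotted, new_dotted)
```

With the new pattern the match ends at the first `/` or `.` that follows the prefix slash, so a suffix that itself contains a dot is cut at that dot.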
