In some cases we might need to extract not just identifiers but also the rest of the metadata contained in {{cite}} templates. That task is less trivial: author lists, for instance, can be entered in many different ways. For this reason, I have wrapped the Lua code that parses citations on Wikipedia in a Python library, and the result is here: https://github.com/dissemin/wikiciteparser
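To illustrate why author lists are the awkward part, here is a minimal standard-library sketch. It is not the wikiciteparser API and not how the Lua code works; the names `parse_cites`, `split_params` and `extract_authors` are hypothetical, and it only handles three common input styles (numbered `last`/`first` pairs, `author`/`authorN`, and a free-text `authors` list separated by semicolons). It also assumes no nested templates or piped links inside the citation, which a real parser has to deal with.

```python
import re

# Naive matcher for {{cite xxx | ...}} and {{citation | ...}}.
# Assumption: no nested templates or [[a|b]] links inside the body.
CITE_RE = re.compile(r"\{\{\s*(cite\s+\w+|citation)\s*\|(.*?)\}\}",
                     re.IGNORECASE | re.DOTALL)


def split_params(body):
    """Turn 'k1=v1 | k2=v2' into a dict with lowercased keys."""
    params = {}
    for part in body.split("|"):
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip().lower()] = value.strip()
    return params


def extract_authors(params):
    """Collect authors from a few of the many accepted input styles."""
    authors = []
    n = 1
    while True:
        # The first author may be written with or without the '1' suffix.
        suffixes = [str(n)] + ([""] if n == 1 else [])
        last = next((params[f"last{s}"] for s in suffixes
                     if f"last{s}" in params), None)
        full = next((params[f"author{s}"] for s in suffixes
                     if f"author{s}" in params), None)
        if last is not None:
            first = next((params[f"first{s}"] for s in suffixes
                          if f"first{s}" in params), "")
            authors.append(f"{first} {last}".strip())
        elif full is not None:
            authors.append(full)
        else:
            break
        n += 1
    # Fallback: a single free-text field, split on semicolons only.
    if not authors and "authors" in params:
        authors = [a.strip() for a in params["authors"].split(";") if a.strip()]
    return authors


def parse_cites(wikitext):
    """Return one dict per citation template found in the wikitext."""
    results = []
    for match in CITE_RE.finditer(wikitext):
        params = split_params(match.group(2))
        results.append({
            "template": match.group(1).lower(),
            "params": params,
            "authors": extract_authors(params),
        })
    return results
```

Even this toy version needs three code paths for authors alone, which is the reason for reusing the existing Lua code rather than reimplementing it.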
Any comments / contributions / anything welcome!
@halfak: Just in case you are still interested in evaluating what proportion of citations do not have any identifier, I have run my citation parser on a fresh dump of the English Wikipedia.
Of course, this parser covers much more than just scholarly citations (it parses {{cite web}} for instance). It also misses a lot of citations that your method catches (all unformatted citations with an identifier matching your regular expressions). So the scope is quite different.
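For reference, the proportion in question could be computed along these lines once citations are available as dictionaries of template parameters. This is only a sketch: the function name `share_without_identifier` is hypothetical, the set of keys counted as identifiers is an assumption, and the sample data is made up purely to show the call.

```python
# Assumption: these parameter names are the ones treated as identifiers.
ID_KEYS = ("doi", "pmid", "pmc", "arxiv", "isbn")


def share_without_identifier(citations):
    """Fraction of citations with no non-empty identifier parameter."""
    if not citations:
        return 0.0
    missing = sum(
        1 for cite in citations
        if not any(cite.get(key) for key in ID_KEYS)
    )
    return missing / len(citations)


# Made-up sample, for illustration only.
sample = [
    {"title": "A", "doi": "10.1000/a"},
    {"title": "B", "isbn": "978-0-00-000000-2"},
    {"title": "C", "url": "http://example.org"},
    {"title": "D", "pmid": "12345"},
]
print(share_without_identifier(sample))
```

Because the parser also covers templates such as {{cite web}}, any figure obtained this way describes all templated citations, not only scholarly ones.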