External link not parsed #13

Closed
RobSchoenaker opened this issue Apr 22, 2020 · 7 comments

@RobSchoenaker

RobSchoenaker commented Apr 22, 2020

Example:
[[Bestand:Bundesarchiv Bild 146III-373, Modell der Neugestaltung Berlins ("Germania").jpg|miniatuur|260px|right| Schaalmodel van de [[Welthauptstadt Germania]], 1939]]

This is a link on this particular page:
https://nl.wikipedia.org/wiki/Albert_Speer

With the code
var ast = LoadAndParse(fileName.Trim(' ', '\t', '"'));
var text = ast.ToPlainText(NodePlainTextOptions.RemoveRefTags);

I would expect the text to read:
Schaalmodel van de Welthauptstadt Germania, 1939

I have been trying to get this sorted, but I am kind of lost in the code...

CXuesong self-assigned this Apr 22, 2020
@CXuesong
Owner

I think the parser currently lacks a parsing rule for images (the File: namespace). I'll try to add one, hopefully before the end of the week.

@RobSchoenaker
Author

Would it be an idea to have an option for specifying these namespaces? Their names are language-dependent.

@CXuesong
Owner

Well, that's a good point! I will add a new configuration option that lets you specify those namespace names.

Btw, there is a related discussion at earwig/mwparserfromhell#136.

CXuesong added this to the v0.3 milestone Apr 24, 2020
@RobSchoenaker
Author

I read the discussion. It is indeed the same issue. I think it would make sense to have a static language class for these situations. I can provide the Dutch version based on what I find across all Wikipedia articles.
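
A rough sketch of what such a static language class might look like. The class name `FileNamespaceNames`, the method `For`, and the German entries are illustrative assumptions, not part of MwParserFromScratch. The Dutch alias "Afbeelding" should be checked against the live wiki before relying on it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps a wiki language code to the prefixes
// that should be treated as the File: namespace on that wiki.
public static class FileNamespaceNames
{
    // Canonical English names, included for every language.
    private static readonly string[] Universal = { "File", "Image" };

    // Localized names per language code (illustrative, incomplete).
    private static readonly Dictionary<string, string[]> Localized =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            ["nl"] = new[] { "Bestand", "Afbeelding" },
            ["de"] = new[] { "Datei", "Bild" },
        };

    // Returns the universal names followed by any localized names
    // known for the given language code.
    public static string[] For(string languageCode)
    {
        var names = new List<string>(Universal);
        if (Localized.TryGetValue(languageCode, out var localized))
            names.AddRange(localized);
        return names.ToArray();
    }
}
```

An unknown language code falls back to the universal names only, so the helper never returns an empty list.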

CXuesong added a commit that referenced this issue Apr 25, 2020
@CXuesong
Owner

The updated ETA is before the end of next week 😂

CXuesong added a commit that referenced this issue Apr 26, 2020
Rename: FILE_LINK --> IMAGE_LINK
CXuesong added a commit that referenced this issue Apr 27, 2020
@CXuesong
Owner

CXuesong commented Apr 27, 2020

Published v0.3.0-int.3.

The following snippet shows how to use WikitextParserOptions to customize which namespace prefixes are treated as the File: namespace. The defaults are ["File", "Image"].

// Arguments: source wikitext, expected parse-tree dump, parser options.
root = ParseAndAssert(
    "[[Bestand:Bundesarchiv Bild 146III-373, Modell der Neugestaltung Berlins (\"Germania\").jpg|miniatuur|260px|right| Schaalmodel van de [[Welthauptstadt Germania]], 1939]]",
    "P[![[Bestand:Bundesarchiv Bild 146III-373, Modell der Neugestaltung Berlins (\"Germania\").jpg|P[miniatuur]|P[260px]|P[right]|P[ Schaalmodel van de [[Welthauptstadt Germania]], 1939]]]]",
    new WikitextParserOptions { ImageNamespaceNames = new[] { "File", "Image", "bestand" } });
// Only the caption survives as plain text; its inner wikilink is reduced to the link text.
Assert.Equal(" Schaalmodel van de Welthauptstadt Germania, 1939", root.ToPlainText());
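
Note that the snippet registers "bestand" in lower case while the link itself starts with "Bestand:", which suggests the prefix comparison ignores case. The following is only a minimal sketch of that kind of check, assuming a plain case-insensitive comparison. `ImageLinkSketch` and `HasImagePrefix` are hypothetical names, and this is not the library's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImageLinkSketch
{
    // Hypothetical check: does the link target start with one of the
    // configured image namespace names, ignoring case?
    public static bool HasImagePrefix(string linkTarget, IEnumerable<string> imageNamespaceNames)
    {
        var colon = linkTarget.IndexOf(':');
        if (colon <= 0) return false; // no namespace prefix at all

        var prefix = linkTarget.Substring(0, colon).Trim();
        return imageNamespaceNames.Any(
            name => string.Equals(name, prefix, StringComparison.OrdinalIgnoreCase));
    }
}
```

With the names from the snippet above, `HasImagePrefix("Bestand:Example.jpg", new[] { "File", "Image", "bestand" })` is true, while a plain article link such as "Welthauptstadt Germania" is not treated as an image.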

Additionally, if you are using WikiClientLibrary, you can read CanonicalName, CustomName, and Aliases from WikiClientLibrary.Sites.NamespaceInfo to get the namespace names that are actually valid on a live MediaWiki site.

using WikiClientLibrary;
using WikiClientLibrary.Client;
using WikiClientLibrary.Sites;

var client = new WikiClient();
var endpointUrl = await WikiSite.SearchApiEndpointAsync(client, "nl.wikipedia.org");
var site = new WikiSite(client, endpointUrl);
await site.Initialization;

site.Namespaces[BuiltInNamespaces.File]
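
The three properties could then be merged into a single list suitable for ImageNamespaceNames. A minimal sketch, assuming Aliases can be enumerated as strings; `NamespaceNameCollector` and `Collect` are hypothetical names rather than part of either library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NamespaceNameCollector
{
    // Merges the canonical name, the site-specific custom name, and any
    // aliases into one list, dropping blanks and case-insensitive duplicates.
    public static string[] Collect(string canonicalName, string customName, IEnumerable<string> aliases)
    {
        return new[] { canonicalName, customName }
            .Concat(aliases ?? Enumerable.Empty<string>())
            .Where(name => !string.IsNullOrWhiteSpace(name))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }
}
```

Given `var ns = site.Namespaces[BuiltInNamespaces.File];`, the result of `NamespaceNameCollector.Collect(ns.CanonicalName, ns.CustomName, ns.Aliases)` could be assigned to `ImageNamespaceNames` in a `WikitextParserOptions` instance.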

@RobSchoenaker
Author

This is perfect. I will complete this for the Dutch (NL) Wikipedia as I find the namespaces. It will take some time though :)

CXuesong closed this as completed May 1, 2020