Disable trim_text in Deserializer from_reader #285
Comments
Is there a way to decide whether to set trim_text by checking for xml:space="preserve"?
The trailing space here should not be trimmed. |
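As a rough illustration of the `xml:space` idea, here is a minimal std-only sketch of how the inherited preserve/default state could be tracked while walking start and end tags. `SpaceStack` and its methods are hypothetical names invented for this example; they are not part of quick-xml.

```rust
/// Hypothetical tracker for the inherited `xml:space` state (not a quick-xml type).
#[derive(Debug, Default)]
struct SpaceStack {
    // One entry per open element: `true` when whitespace must be preserved.
    stack: Vec<bool>,
}

impl SpaceStack {
    /// Call on every start tag with the value of its `xml:space` attribute, if any.
    fn push(&mut self, xml_space: Option<&str>) {
        let inherited = self.preserve();
        let preserve = match xml_space {
            Some("preserve") => true,
            Some("default") => false,
            // Attribute absent (or invalid): inherit from the parent element.
            _ => inherited,
        };
        self.stack.push(preserve);
    }

    /// Call on every end tag.
    fn pop(&mut self) {
        self.stack.pop();
    }

    /// Whether text in the current element must keep its whitespace.
    fn preserve(&self) -> bool {
        self.stack.last().copied().unwrap_or(false)
    }

    /// Return the text as the current element should see it.
    fn text<'a>(&self, raw: &'a str) -> &'a str {
        if self.preserve() { raw } else { raw.trim() }
    }
}
```

The stack mirrors the rule that `xml:space` applies to an element and its descendants until a nested element overrides it.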
There is very little customization available in the serde deserializer so far. I don't think there is any major blocker; someone just needs to write it. |
In the coming release Processing of |
Was this implemented already? |
Yes, this is an oversight. So, currently this is still not possible, even in |
According to the original use case -- I do not think that simply disabling |
Yeah, that's exactly what happened. I didn't get why that option, even though internal, exists. |
The trimming of spaces within elements probably ought to be separated from the trimming of spaces between elements. It should be possible (and probably the default) to ignore the latter without affecting the text contents of elements themselves. Having an option for trimming spaces around text contents is nice of course, but not at all necessary (the user could easily do this themselves) and as this issue points out it is more difficult to do "correctly" than originally envisioned. Maybe we should eliminate this feature and just keep the "ignore spaces between XML elements" functionality? |
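To make the distinction concrete, here is a minimal std-only sketch that drops whitespace-only text between elements while leaving the text content of elements untouched. The `Event` enum and `drop_inter_element_whitespace` are hypothetical, simplified stand-ins written for this example, not quick-xml's types, and the heuristic (a blank text node is kept only when it is the sole content between a start tag and an end tag) is an assumption.

```rust
/// Simplified, hypothetical event type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Text(String),
}

/// Remove whitespace-only text nodes that sit between elements.
/// Text inside an element is never trimmed, and a blank text node that is
/// the entire content of an element (`<c> </c>`) is kept.
fn drop_inter_element_whitespace(events: Vec<Event>) -> Vec<Event> {
    let mut out = Vec::with_capacity(events.len());
    for (i, ev) in events.iter().enumerate() {
        if let Event::Text(t) = ev {
            let blank = t.chars().all(char::is_whitespace);
            let after_start = i > 0 && matches!(events[i - 1], Event::Start(_));
            let before_end = matches!(events.get(i + 1), Some(Event::End(_)));
            if blank && !(after_start && before_end) {
                continue; // indentation or newlines between sibling elements
            }
        }
        out.push(ev.clone());
    }
    out
}
```

With this split, a value such as `<b> x </b>` keeps its leading and trailing spaces, and any trimming of the content becomes a separate, optional step.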
Yes, I think, we should move in that direction. A couple of thoughts:
|
Hello, little update here, since |
Just disabling Well, I think, that we could add a |
I've just created #572. Once it is merged, we could change the content of the type it introduces to:

```rust
use std::borrow::Cow;
use std::ops::Range;

struct Text<'a> {
    /// Untrimmed text after concatenating content of all
    /// [`Text`] and [`CData`] events
    text: Cow<'a, str>,
    /// A range into `text` which contains data after trimming
    content: Range<usize>,
}
```

Such a change would open the door to per-field control over trimming. |
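A minimal sketch of how such a struct could serve both trimmed and untrimmed consumers, using only the standard library. The constructor and the `trimmed`/`untrimmed` accessors are assumptions added for illustration; the thread only proposes the two fields.

```rust
use std::borrow::Cow;
use std::ops::Range;

struct Text<'a> {
    /// Untrimmed text.
    text: Cow<'a, str>,
    /// Byte range into `text` that excludes leading and trailing whitespace.
    content: Range<usize>,
}

impl<'a> Text<'a> {
    /// Hypothetical constructor: computes the trimmed range once, up front.
    fn new(text: impl Into<Cow<'a, str>>) -> Self {
        let text = text.into();
        let start = text.len() - text.trim_start().len();
        // For all-whitespace input `trim_end` is empty, so clamp to `start`
        // to keep the range valid (it then denotes an empty slice).
        let end = text.trim_end().len().max(start);
        Text { text, content: start..end }
    }

    /// What a `String` field that wants exact content could receive.
    fn untrimmed(&self) -> &str {
        &self.text
    }

    /// What a numeric or boolean field could receive before parsing.
    fn trimmed(&self) -> &str {
        &self.text[self.content.clone()]
    }
}
```

Because both views borrow from the same buffer, the choice between them can be deferred until the target field type is known, without re-reading or copying the input.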
Are you still looking at fixing this? If not, what remains to be done? This is a breaking issue for my team, and we may be interested in contributing in order to help fix it. |
I have not put any effort into this issue since my last comment. Now that #572 is merged, we can move forward in the way outlined in that comment. We could also add a way to disable trimming globally, but I think such a setting would be of limited use. If you wish, feel free to explore those opportunities. |
Is there any workaround for this problem short of a proper fix (#561 or other)? My case is mixed content such as Thanks for your work on this library! Any help appreciated. |
@mmcloughlin I started on #855 before getting sidetracked. No workaround as far as I can tell, and the fix is taking more hours than I have. |
Thanks for the update! I'd be willing to spend some time on a fix, but I'm not sure I understand exactly what the preferred resolution is. It looks like there are some nuances, and a multi-year discussion in this issue and others. Is there a description somewhere of what kind of PR would be accepted? Meanwhile, the only (horrific) workaround I am considering is a pre-processor that inserts sentinels around certain tags to prevent trimming, which can be removed after deserialization. So for example |
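The sentinel workaround could look roughly like the std-only sketch below. The `SENTINEL` character and the `protect`/`unprotect` helpers are hypothetical, and the sketch assumes the protected tags carry no attributes and that the sentinel never occurs in the real data.

```rust
/// Hypothetical sentinel: a private-use code point that is not whitespace,
/// so a trimming deserializer would stop at it.
const SENTINEL: char = '\u{E000}';

/// Pre-process the XML: wrap the content of every attribute-free `<tag>`
/// in sentinels so that surrounding whitespace is no longer at the edge.
fn protect(xml: &str, tag: &str) -> String {
    let open = format!("<{tag}>");
    let close = format!("</{tag}>");
    xml.replace(&open, &format!("{open}{SENTINEL}"))
        .replace(&close, &format!("{SENTINEL}{close}"))
}

/// Post-process a deserialized string value: remove the sentinels again.
fn unprotect(value: &str) -> String {
    value.chars().filter(|&c| c != SENTINEL).collect()
}
```

This operates on raw text rather than parsed XML, so it would also rewrite matching sequences inside comments or CDATA sections; it is a stopgap, not a fix.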
Trying to gather the options under discussion:
Which of these would be preferred? |
@mmcloughlin Thanks for distilling this. Personally, and despite having initially started on a different route, I like the idea in Option 1 of following the spec for string primitives if we're not to concern ourselves with backward compatibility. In #855, I started down the path of having a mutable |
Feels like a situation where even if semantic versioning allows it, we'd need to be careful about Hyrum's Law. |
Very valid point. I think we do at least have @Mingun's blessing to explore adjusting the default based on this comment in #561 (emphasis my own):
I'm no authority, but if the spec states that string primitives should preserve whitespace, and that simplifies our implementation using @Pastequee's work in b4355c6 as a base, then that approach has my vote. If compliance with the w3 spec is a stated or implied goal of quick-xml, then v0.x is our window to bring it into alignment. |
@mmcloughlin, thanks for the summary. Yes, we will definitely implement option 1. @Pastequee has caught the idea.
Yes, and it seems that only the I'm not sure about option 2 (global Option 3 could also be implemented to override the default behavior from option 1, but maybe later, if it is really needed. At least this is a purely non-breaking change that could be implemented in a patch update.
Yes, that is the goal (at least I want to have a mode in which we are fully compliant), and we will break compatibility if needed, because v0.x is designed for that. After that we will release v1. |
Sounds like a plan. @mmcloughlin I'm going to close my wip PR. Feel free to put me to work in whatever way is most helpful. |
Haha, I'm not sure I can put you to work! Perhaps @Mingun is the one to listen to :) According to @Mingun's last message, it seems there's consensus on trying Option 1: trimming for primitive types except strings. If you have time to work on that, that would be awesome. |
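Option 1 as summarized here (trim for primitive types, preserve for strings) can be sketched with the standard library alone. `parse_primitive` and `parse_string` are hypothetical helpers for illustration and do not reflect how quick-xml's deserializer is structured.

```rust
use std::str::FromStr;

/// The four characters XML treats as whitespace.
fn is_xml_space(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\r' | '\n')
}

/// Numbers, booleans and similar primitives: surrounding whitespace is
/// not significant, so strip it before parsing.
fn parse_primitive<T: FromStr>(raw: &str) -> Result<T, T::Err> {
    raw.trim_matches(is_xml_space).parse()
}

/// Strings: whitespace is part of the value, so return the text unchanged.
fn parse_string(raw: &str) -> String {
    raw.to_owned()
}
```

This matches the XML Schema whitespace facet, which is `preserve` for `xs:string` and `collapse` for the numeric and boolean types.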
Is there an easy way to set trim_text to false in Deserializer::from_str when I use quick_xml::de::from_str?
quick-xml/src/de/mod.rs, lines 160 to 167 in a4be484