
Add ability to deserialize serde types from Reader #611


Open
ndtoan96 opened this issue Jun 5, 2023 · 6 comments
Labels
enhancement help wanted serde Issues related to mapping from Rust types to XML

Comments

@ndtoan96

ndtoan96 commented Jun 5, 2023

When working with deeply nested XML, most of the time we are only interested in a portion of the tree close to the leaf nodes. My idea is to extract the string of the target node and deserialize it with serde, but I can't find a convenient way to do that.

Currently I use read_text to get the inner content of the node and add the start and end tags manually, but the resulting code looks really awkward, especially when the node has many attributes. It would be great if there were a method (read_node or something similar) that does this.
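The manual workaround described above can be sketched in std-only Rust. It assumes the attribute values have already been unescaped and collected as (name, value) pairs and that the inner markup came from read_text; `wrap_inner` and `escape_attr` are hypothetical helpers, not part of quick-xml. The sketch shows where the awkwardness comes from: every attribute has to be written back and re-escaped by hand before the string can be handed to serde.

```rust
/// Escape the characters that may not appear verbatim inside a
/// double-quoted XML attribute value.
fn escape_attr(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: rebuild `<name attr="...">inner</name>` from the
/// pieces a caller has after reading the start tag and its inner text.
fn wrap_inner(name: &str, attrs: &[(&str, &str)], inner: &str) -> String {
    let mut xml = format!("<{}", name);
    for (key, value) in attrs {
        xml.push_str(&format!(" {}=\"{}\"", key, escape_attr(value)));
    }
    xml.push('>');
    xml.push_str(inner);
    xml.push_str(&format!("</{}>", name));
    xml
}
```

The resulting string can then be passed to quick_xml::de::from_str, which parses the same bytes a second time.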

By the way, is there any reason why read_text is not implemented for Reader<File>?

Mingun added the enhancement and serde labels on Jun 5, 2023
@Mingun
Collaborator

Mingun commented Jun 5, 2023

A deserialize method on Reader that can deserialize a piece of XML into a type using serde, starting from the current position, is definitely a feature I also want, as a counterpart to #610. The implementation, however, is not so simple: the serde deserializer requires some (potentially unbounded) lookahead, so we need to buffer events somewhere.
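To illustrate why lookahead is needed, here is a std-only sketch using a hypothetical, simplified `Ev` type (not quick-xml's `Event`). After consuming a start tag, a deserializer that is asked for "any" value cannot tell whether the element holds plain text or nested elements until it inspects the following events, so those events must be peeked and kept in a buffer instead of being consumed.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for XML events (hypothetical, not quick-xml's `Event`).
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Start(String),
    Text(String),
    End(String),
}

/// An event source that can look ahead without losing events:
/// peeked events are parked in `buffer` and returned by `next` first.
struct Buffered<I: Iterator<Item = Ev>> {
    inner: I,
    buffer: VecDeque<Ev>,
}

impl<I: Iterator<Item = Ev>> Buffered<I> {
    fn new(inner: I) -> Self {
        Buffered { inner, buffer: VecDeque::new() }
    }

    /// Look at the `n`-th upcoming event (0-based), reading and buffering as needed.
    fn peek_nth(&mut self, n: usize) -> Option<&Ev> {
        while self.buffer.len() <= n {
            let ev = self.inner.next()?;
            self.buffer.push_back(ev);
        }
        self.buffer.get(n)
    }

    /// Consume the next event, serving buffered events first.
    fn next(&mut self) -> Option<Ev> {
        self.buffer.pop_front().or_else(|| self.inner.next())
    }
}

/// Decide how to treat the element whose start tag was just consumed:
/// text immediately followed by an end tag is a simple value,
/// anything else is treated as a nested structure.
fn is_text_only<I: Iterator<Item = Ev>>(events: &mut Buffered<I>) -> bool {
    matches!(
        (events.peek_nth(0).cloned(), events.peek_nth(1).cloned()),
        (Some(Ev::Text(_)), Some(Ev::End(_)))
    )
}
```

Two events of lookahead are enough for this particular decision; other cases may need to look much further ahead, which is where the "potentially unbounded" part comes from.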

A possible API could look like this:

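```rust
impl<'a> Reader<&'a [u8]> {
  // Borrowing reader: the result may borrow directly from the input slice
  fn deserialize<T>(&mut self, seed: Event<'a>) -> Result<T, DeError>
  where
    T: Deserialize<'a>,
  { todo!() }
}

impl<R: Read> Reader<R> {
  // Streaming reader: the result borrows from a caller-provided buffer instead
  fn deserialize_into<'de, T>(&mut self, seed: Event<'de>, buffer: &'de mut Vec<u8>) -> Result<T, DeError>
  where
    T: Deserialize<'de>,
  { todo!() }
}
```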

Here, seed is an event obtained from the Reader in a typical read loop, which will most likely be part of the type that we want to deserialize.
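One way to picture the seed parameter, sketched with a hypothetical, simplified `Ev` type: the caller's loop has already consumed the start tag, so the fragment deserializer must see that event again before the rest of the stream. In Rust this amounts to `std::iter::once(seed).chain(rest)`. The names `with_seed` and `find_fragment` are illustrative only, and nested elements with the same name are deliberately not handled here.

```rust
use std::iter;

/// Simplified stand-in for XML events (hypothetical, not quick-xml's `Event`).
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Start(String),
    Text(String),
    End(String),
}

/// Put the already-consumed `seed` event back in front of the remaining events.
fn with_seed<I>(seed: Ev, rest: I) -> impl Iterator<Item = Ev>
where
    I: Iterator<Item = Ev>,
{
    iter::once(seed).chain(rest)
}

/// Typical read loop: skip events until `wanted` starts, then feed the seed plus
/// the rest of the stream to a consumer (here it just collects up to the end tag).
fn find_fragment<I>(mut events: I, wanted: &str) -> Option<Vec<Ev>>
where
    I: Iterator<Item = Ev>,
{
    while let Some(ev) = events.next() {
        let is_wanted = matches!(&ev, Ev::Start(name) if name == wanted);
        if is_wanted {
            // Sketch only: a nested element with the same name would end this early.
            let end = Ev::End(wanted.to_string());
            let mut fragment = Vec::new();
            for e in with_seed(ev, events.by_ref()) {
                let done = e == end;
                fragment.push(e);
                if done {
                    break;
                }
            }
            return Some(fragment);
        }
    }
    None
}
```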

Another possible API (very schematic):

```rust
impl<R> Reader<R> {
  fn deserializer(&mut self, seed: Event) -> FragmentDeserializer { ... }
}

struct FragmentDeserializer<'a> { ... }
impl<'a> FragmentDeserializer<'a> {
  fn deserialize<T>(self) -> Result<T, DeError>
  where
    T: Deserialize<'a>,
  { ... }
  fn deserialize_into<'de, T>(self, buffer: &'de mut Vec<u8>) -> Result<T, DeError>
  where
    T: Deserialize<'de>,
  { ... }
}
```

Another question: in what state should we leave the Reader if deserialization fails? And how should we provide access to events that were consumed during lookahead but not used to deserialize the final type? What if we want to call deserialize twice? Then we must take into account the events buffered during lookahead by the first deserialize call. Probably we need a more generic API:

```rust
impl<R> Reader<R> {
  /// Convert to a reader that can store up to `count` events in the internal buffer
  fn lookahead(self, count: usize) -> LookaheadReader<R> { ... }
}

impl<'de, 'a, R> Deserializer<'de> for &'a mut LookaheadReader<R> { ... }
```
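A std-only sketch of the lookahead(count) idea over any iterator of events. The name LookaheadReader comes from the comment above, but the `peek`/`next` methods and the `LookaheadExceeded` error are assumptions for illustration. Events that were peeked but not consumed stay in the buffer, so a second deserialize call would see them first, which is one answer to the "call deserialize twice" question; asking to look further than `count` events ahead is reported as an error instead of growing the buffer.

```rust
use std::collections::VecDeque;

/// Hypothetical error returned when lookahead would exceed the configured capacity.
#[derive(Debug, PartialEq)]
struct LookaheadExceeded {
    capacity: usize,
}

/// Wraps an event source and buffers at most `capacity` not-yet-consumed items.
struct LookaheadReader<I: Iterator> {
    source: I,
    buffer: VecDeque<I::Item>,
    capacity: usize,
}

impl<I: Iterator> LookaheadReader<I> {
    fn new(source: I, capacity: usize) -> Self {
        LookaheadReader { source, buffer: VecDeque::with_capacity(capacity), capacity }
    }

    /// Peek at the `n`-th upcoming item (0-based) without consuming it.
    /// Returns `Ok(None)` at end of input.
    fn peek(&mut self, n: usize) -> Result<Option<&I::Item>, LookaheadExceeded> {
        if n >= self.capacity {
            return Err(LookaheadExceeded { capacity: self.capacity });
        }
        while self.buffer.len() <= n {
            match self.source.next() {
                Some(item) => self.buffer.push_back(item),
                None => return Ok(None),
            }
        }
        Ok(self.buffer.get(n))
    }

    /// Consume the next item: previously peeked items are returned first.
    fn next(&mut self) -> Option<I::Item> {
        self.buffer.pop_front().or_else(|| self.source.next())
    }
}
```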

@Mingun
Collaborator

Mingun commented Jun 5, 2023

> By the way, is there any reason why read_text is not implemented for Reader<File>?

It is not trivial, because we cannot simply reuse the read_to_end_into method: it stores only the content of the tags in the buffer and skips the markup characters (<, >, and so on). Attempts to implement it are tracked in #483.

Mingun changed the title "Deserialize a small node" to "Add ability to deserialize serde types from Reader" on Jun 5, 2023
@tstenner

tstenner commented Aug 6, 2024

I would also like this. Go makes it easy to mix pull-based parsing driven by a state machine with deserializing structs:

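```go
// Hypothetical handler; schema is the poster's own package.
func handle(r *http.Request) error {
	decoder := xml.NewDecoder(r.Body)
	decoder.Strict = true
	level := 0
	for {
		t, err := decoder.Token()
		if err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		switch se := t.(type) {
		case xml.StartElement:
			level++
			switch se.Name.Local {
			case "fooTag":
				var req schema.FooRequest
				if err := decoder.DecodeElement(&req, &se); err != nil {
					return err
				}
				level-- // DecodeElement also consumed the matching end tag
				// do stuff
			case "barRequest":
				var req schema.BarRequest
				if err := decoder.DecodeElement(&req, &se); err != nil {
					return err
				}
				level--
				// do stuff
			}
		case xml.EndElement:
			level--
		}
	}
}
```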

I could live with an implementation that ties the lifetime of the Reader and of the deserialized object to the source lifetime, i.e., one that only applies to readers backed by a &str.
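Restricting the feature to &str-backed readers helps because the deserialized value can borrow directly from the source text. A std-only sketch with a hypothetical `Request<'a>` type and a deliberately naive attribute lookup (no escaping, no real XML parsing) shows that lifetime relationship: the result lives as long as the source string, independently of any reader state.

```rust
/// Hypothetical borrowed type: `id` points into the original XML text, no copy.
#[derive(Debug, PartialEq)]
struct Request<'a> {
    id: &'a str,
}

/// Naive lookup of `name="..."` inside the first start tag of `src`.
/// Not a real XML parser: it ignores escaping, single quotes, and would also
/// match a longer attribute name ending in `name`.
fn borrow_attr<'a>(src: &'a str, name: &str) -> Option<&'a str> {
    let tag_end = src.find('>')?;
    let tag = &src[..tag_end];
    let key = format!("{}=\"", name);
    let start = tag.find(&key)? + key.len();
    let len = tag[start..].find('"')?;
    Some(&tag[start..start + len])
}

/// The returned value borrows from `src`, so it is valid for as long as `src` is.
fn parse_request<'a>(src: &'a str) -> Option<Request<'a>> {
    borrow_attr(src, "id").map(|id| Request { id })
}
```

With a streaming source there is no long-lived text to point into, which is why a buffer-taking variant such as the deserialize_into proposal needs a caller-provided buffer.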

@LiosK

LiosK commented Sep 1, 2024

Would it be possible to implement something like the following for Reader?

  • fn deserialize_to_end(&'de mut self, end: QName<'_>) -> Result<T<'de>, E>
  • fn deserialize_to_end_into(&mut self, end: QName<'_>, buf: &'de mut Vec<u8>) -> Result<T<'de>, E>

I would like to deserialize some specific <elem> ... </elem> ranges in a large document. Currently, to do this, I read events up to the end tag, write them with Writer into a separate buffer, and then pass the buffer to quick_xml::de::from_str(). This is clearly inefficient, because the XML is parsed twice and serialized once. It would be great if Reader could deserialize the elements while it first reads the content up to the end tag.
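The core of a deserialize_to_end-style method is finding the end tag that matches the already-read start tag, counting depth so that nested elements with the same name do not end the range early. Below is a std-only sketch over a hypothetical, simplified event type (not quick-xml's `Event` or `QName`); `events_to_end` is an illustrative name. A real implementation would presumably feed these events to the serde deserializer directly instead of re-serializing them with Writer.

```rust
/// Simplified stand-in for XML events (hypothetical, not quick-xml's `Event`).
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Start(String),
    Empty(String),
    Text(String),
    End(String),
}

/// Collect the events after an already-consumed `<end>` start tag, up to but
/// excluding its matching `</end>` (which is consumed). `None` if input ends first.
fn events_to_end<I>(events: &mut I, end: &str) -> Option<Vec<Ev>>
where
    I: Iterator<Item = Ev>,
{
    let mut depth = 0usize;
    let mut collected = Vec::new();
    while let Some(ev) = events.next() {
        match &ev {
            // A nested element with the same name opens another level.
            Ev::Start(name) if name == end => depth += 1,
            Ev::End(name) if name == end => {
                if depth == 0 {
                    return Some(collected);
                }
                depth -= 1;
            }
            _ => {}
        }
        collected.push(ev);
    }
    None
}
```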

@Mingun
Collaborator

Mingun commented Sep 1, 2024

As I already explained, the serde deserializer requires lookahead, which Reader does not provide. The plan is:

  • rename Reader to RawReader. Each RawReader will handle one XML source (called an entity in the XML specification)
  • create an intermediate Reader holding a stack of RawReaders. The new Reader will be able to handle DTD references to other entities. Because of that, it will naturally own internal storage (no more *_into methods), which also means that it can store cached events
  • use that reader in Deserializer
  • provide interface to deserialize types from the Reader
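A rough std-only sketch of that layering, with every name assumed for illustration: `RawReader` here is just a queue of already-parsed events, and an entity reference is modeled as a dedicated `EntityRef` event. The outer `Reader` owns a stack of raw readers plus a cache of events; it replays cached events first, descends into an entity's replacement content when it meets a reference, and drops raw readers as they are exhausted.

```rust
use std::collections::{HashMap, VecDeque};

/// Simplified events (hypothetical); `EntityRef` stands for `&name;`.
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Text(String),
    EntityRef(String),
}

/// Stand-in for one XML source: here simply a queue of pre-parsed events.
struct RawReader {
    events: VecDeque<Ev>,
}

/// Outer reader: a stack of raw readers plus owned storage for cached events.
struct Reader {
    stack: Vec<RawReader>,
    cache: VecDeque<Ev>,
    entities: HashMap<String, Vec<Ev>>, // replacement content declared in the DTD
}

impl Reader {
    fn next(&mut self) -> Option<Ev> {
        // Events cached by an earlier lookahead are replayed first.
        if let Some(ev) = self.cache.pop_front() {
            return Some(ev);
        }
        loop {
            let top = self.stack.last_mut()?;
            match top.events.pop_front() {
                // Entering an entity pushes a new raw reader for its content
                // (unknown entities expand to nothing in this sketch).
                Some(Ev::EntityRef(name)) => {
                    let content = self.entities.get(&name).cloned().unwrap_or_default();
                    self.stack.push(RawReader { events: content.into() });
                }
                Some(ev) => return Some(ev),
                // This source is exhausted: resume the one that referenced it.
                None => {
                    self.stack.pop();
                }
            }
        }
    }
}
```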

@LiosK
Copy link

LiosK commented Sep 1, 2024

Cool!

Projects
None yet
Development

No branches or pull requests

4 participants