SubString serialization #24266

Closed
bkamins opened this issue Oct 22, 2017 · 3 comments · Fixed by #24275
Labels
strings "Strings!"

Comments

@bkamins
Member

bkamins commented Oct 22, 2017

The method serialize(s::AbstractSerializer, ss::SubString{T}) seems to lead to an infinite loop of calls when used with Test.GenericString. I have changed Test.GenericString to work around this problem in #24255.

However, other <:AbstractString types might in general lead to a similar problem.
The intent of the method seems to be guaranteed only for String, because in general we do not know what convert(T,ss) will do (GenericString was wrapping the SubString instead of trimming it).

I can see the following solution:

  • redefine it to serialize(s::AbstractSerializer, ss::SubString{String}) which is probably the typical use case and should work as intended;

Other natural approaches, such as:

  • change convert(T,ss) to ss.string[start(ss.string)+offset:endof(ss.string)+offset], but this touches the internal implementation of SubString, so I am not sure if this is desired;
  • define a new method convert(::Type{T}, ss::SubString{T}) where {T<:AbstractString} = ss.string[start(ss.string)+offset:endof(ss.string)+offset] in strings/types.jl.

would not help, because in general getindex for AbstractString uses SubString to make slices.

@nalimilan
Member

It would make sense in general to provide a method that converts a SubString to a standard AbstractString, making a copy so that the parent can be freed. For SubArray, this is what copy does. We could define copy on strings, with a no-op for String and a fallback that converts to String for AbstractString types which do not implement it. This is similar to your convert suggestion, and indeed the two could be combined.
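
The SubArray analogy can be sketched as follows. This is only an illustration written against Julia 1.x syntax (an assumption; the thread predates it), and String(ss) stands in for the copy method on strings proposed above, which is not confirmed here to exist in that form.

```julia
# copy on a SubArray yields an independent Array, detached from the parent.
a = [10, 20, 30, 40]
v = view(a, 2:3)        # SubArray sharing memory with `a`
c = copy(v)             # independent Vector{Int}

a[2] = 99               # visible through the view, not through the copy

# The analogous operation for strings: String(ss) builds a new String that
# holds only the selected characters and no longer references the parent.
whole = "hello world"
ss = SubString(whole, 1, 5)
detached = String(ss)
```

Note that detached is a String rather than a SubString{String}, so this kind of conversion does not preserve the original type.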

@bkamins
Member Author

bkamins commented Oct 22, 2017

This could work, but then deserialize(serialize(s)) could change the type of s.
The more I think about it the more I prefer the restriction serialize(s::AbstractSerializer, ss::SubString{String}).

In general, String and SubString{String} are the only string types meant to be used in Base.
All other entries of subtypes(AbstractString) are being phased out from Base.
As I understand it, the only exceptions are:

  • GenericString, but it has a special purpose: to check what happens with non-standard strings;
  • SubstitutionString, where the probability of encountering SubString{SubstitutionString{T}} in actual code is minimal, so I would not try to optimize serialization for this case.

If someone introduces a new string type in the future, I would leave it to them to define an efficient serialization method if one is required.

@nalimilan
Member

> This could work, but then deserialize(serialize(s)) could change the type of s.

Ah, right.

It would be nice to automatically provide an efficient serialization method for custom AbstractString types by default, but as you note, that doesn't seem to be possible without requiring types to provide a convert(::Type{T}, ::SubString{T})::T method. Let's just restrict the signature to SubString{String} for now, then.
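
The type-preservation concern discussed above can be checked with a short round trip. This is a minimal sketch assuming Julia 1.x, where Serialization is a standard library; it does not reproduce the method touched by the fix and only observes the result for SubString{String}.

```julia
using Serialization

whole = "hello world"
ss = SubString(whole, 1, 5)   # SubString{String} viewing "hello"

# Serialize into an in-memory buffer, then read the value back.
io = IOBuffer()
serialize(io, ss)
seekstart(io)
out = deserialize(io)

# The round trip should keep both the content and the type of `ss`.
same_content = out == ss
same_type = typeof(out) == typeof(ss)
```

Serializing String(ss) instead would keep the content but hand back a String, which is the type change the thread wants to avoid.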
