Skip to content

Conversation

cdornan
Copy link
Contributor

@cdornan cdornan commented Dec 12, 2022

I am restoring #546 to see if it can be rescued as it seemed to be a good idea.

Rendered.

@cdornan cdornan self-assigned this Dec 12, 2022
@cdornan cdornan added the Pending shepherd recommendation The shepherd needs to evaluate the proposal and make a recommendataion label Dec 12, 2022
@cdornan cdornan changed the title restore proposal #546 for consideration Make Symbol a newtype over String Dec 12, 2022
@Ericson2314
Copy link
Contributor

With @adamgundry's suggestions, especially the notice on ghc-prim, this looks good to me.

@re-xyr
Copy link

re-xyr commented Dec 13, 2022

This all look pretty good, but I just have a tangentially related question: what are the performance implications of having type Symbol = String instead, specifically? In the last PR that was closed I saw mentions of Generic but I wasn't sure what that was exactly about.

@adamgundry
Copy link
Contributor

With @adamgundry's suggestions, especially the notice on ghc-prim, this looks good to me.

The credit here really goes to @phadej, I'm just interpreting. 😄

what are the performance implications of having type Symbol = String instead, specifically? In the last PR that was closed I saw mentions of Generic but I wasn't sure what that was exactly about.

At the moment, GHC represents type-level Symbols as packed strings. Naively changing the representation to be [Char] would significantly expand the AST size of the types being manipulated internally, which would hurt compile-time performance. In practice I think it would be possible to retain a packed representation, but then the constraint solver would need to be more complex (e.g. to solve an equality between a packed symbol and a cons). The added complexity doesn't seem worth it, compared to the alternative of having Symbol be a properly abstract type.

@cdornan cdornan added Pending committee review The committee needs to evaluate the proposal and make a decision and removed Pending shepherd recommendation The shepherd needs to evaluate the proposal and make a recommendataion labels Jan 19, 2023
@simonpj
Copy link
Contributor

simonpj commented Jan 19, 2023

Chris says

Oleg has proposed that we do essentially the same trick Richard proposed to unify Nat and Natural, but with Symbol and String.

Here is the (accepted) Nat/Natural proposal

But in fact under these proposal we'd ahve

type Nat = Natural
newtype Symbol = MkSymbol String

Question: why do we need a newtype. What goes wrong if we have type Symbol = String, which would line it up with the accepted Nat/Natural proposal.

Or should we have newtype Nat = MkNat Natural?

@cdornan
Copy link
Contributor Author

cdornan commented Jan 19, 2023

@simonpj yes, we do have options — see the earlier discussion at #546

TBH I was hoping not to resurrect the discussion that cause the proosal to get bogged down the first time, but maybe we have no choice.

@Ericson2314
Copy link
Contributor

Ericson2314 commented Jan 19, 2023

If GHC actually does [Char] for Symbol, things will get a lot slower, so we probably don't want it to do that, and have to explicitly convert between the two.

And frankly this is not so bad because at the term level we also don't want to promote the use of String either.

I suspect at some point we we will settle on Symbol = Text, and the newtype plan keeps that future open. (Making it not apart from other types --- a real abstract type, would also help.)

@cdornan
Copy link
Contributor Author

cdornan commented Jan 19, 2023

In the hope of not burning up too much precious attention, can I suggest that folks focus on destroying this proposal as is rather than an extended discussion of all of our options and their various merits and demerits. I know that isn't what we usually do but I think it might be the best here.

@simonpj
Copy link
Contributor

simonpj commented Jan 19, 2023

I don't want to re-open the discssion. I just want a reason for the difference. John has given one. OK, I'm in support.

@Ericson2314
Copy link
Contributor

Ericson2314 commented Jan 19, 2023

Yes I was just trying to summarize it. And perhaps the apartness part (which admittedly did not come up before) is worth adding to this proposal. If making it apart from all types is too inconvenient for users, it could just be apart from String and Text.

Copy link
Contributor

@aspiwack aspiwack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fail to see how the proposed change addresses the motivations. The Effects and interactions section doesn't address this point either. Could you elaborate?

Comment on lines +49 to +50
(Note: ``singletons-base`` demotes ``Symbol`` to ``Text`` which is
incorrect, as ``Text`` cannot represent all ``String`` values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't it? Do you have a reference for this claim? I'm infinitely curious now.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the haddock of text:

A Text value is a sequence of Unicode scalar values, as defined in §3.9, definition D76 of the Unicode 5.2 standard. As such, a Text cannot contain values in the range U+D800 to U+DFFF inclusive. Haskell implementations admit all Unicode code points (§3.4, definition D10) as Char values, including code points from this invalid range. This means that there are some Char values (corresponding to Surrogate category) that are not valid Unicode scalar values, and the functions in this module must handle those cases.

Within this module, many functions construct a Text from one or more Char values. Those functions will substitute Char values that are not valid Unicode scalar values with the replacement character "�" (U+FFFD). Functions that perform this inspection and replacement are documented with the phrase "Performs replacement on invalid scalar values". The functions replace invalid scalar values, instead of dropping them, as a security measure. For details, see Unicode Technical Report 36, §3.5.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice.

That being said, is it relevant here? I don't think we can create Symbol values with surrogate code points in them, can we?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can?

ghci> :set -XDataKinds
ghci> type A = "\xd800"
ghci> :i A
type A :: GHC.Types.Symbol
type A = "\55296" :: GHC.Types.Symbol

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a tangentially related idea, but we can make type Symbol = Text if we do replacement on Symbol literals. Text is uninhabited at type level so that isn't a problem either. Indeed this is breaking, so it's not very desirable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arfglbl. Indeed!

Though while Symbol = Text is technically breaking, it is probably not in a way that is depended upon in practice. So probably alright? Possibly maybe? But Text is not in base (yet?) so it's actually a pretty hard change to effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But Text is not in base (yet?)

It shouldn't be hard to move the type declaration to base (GHC.Types.Text maybe?) but keep all the combinators for working on it in text. Just move these few lines of code:

data Text = Text
    {-# UNPACK #-} !Array -- bytearray encoded as UTF-8
    {-# UNPACK #-} !Int     -- offset in bytes (not in Char!), pointing to a start of UTF-8 sequence
    {-# UNPACK #-} !Int     -- length in bytes (not in Char!), pointing to an end of UTF-8 sequence

data Array = ByteArray ByteArray#

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't even need to be base; ghc-prim is enough. text also depends on ghc-prim already, so all of this can be done without CLC.

@re-xyr
Copy link

re-xyr commented Jan 20, 2023

I fail to see how the proposed change addresses the motivations. The Effects and interactions section doesn't address this point either.

I think this proposal just prepares for a future CLC proposal where we add IsString Symbol. Then we can use the same syntax for both term and type level Symbols. Probably should put that in the proposal though, otherwise it's a bit unclear.

@adamgundry
Copy link
Contributor

Yes, this proposal addresses the motivation by allowing Symbol to be used sensibly at both the type and term levels. It requires a GHC-internal module import to use at the term level, but that's because specifying the term-level API for Symbol is a matter for the CLC (and the proposal says as much).

@cdornan
Copy link
Contributor Author

cdornan commented Jan 20, 2023

@re-xyr agreed; i am going to be afk until later — like to have a stab at clarifying; your input not the draft would be great; @Ericson2314 also, of course

@goldfirere
Copy link
Contributor

My concern here echoes @aspiwack's: the proposal does not seem to address the motivation. But, I do see how, if we had IsString Symbol and -XOverloadedStrings, then things would indeed be better. I would want a loud Haddock comment on the new Symbol definition that it is subject to change. (Exporting it only from ghc-prim is not enough, as I'm sure type-lit libraries will re-export!) With that caveat, I'm in support, in the hopes that we can do even better further in the future.

@cdornan
Copy link
Contributor Author

cdornan commented Mar 3, 2023

@goldfirere I think that reflects the sum of the discussion on the proposal — it is really initiating a process. I will add some language accordingly.

@cdornan cdornan added Needs revision The proposal needs changes in response to shepherd or committee review feedback and removed Pending committee review The committee needs to evaluate the proposal and make a decision labels Mar 3, 2023
@adamgundry
Copy link
Contributor

In the light of the new consensus at haskellfoundation/tech-proposals#51, I think we should perhaps close this proposal as out-of-scope for the GHC Steering Committee? As a change to GHC internal implementation details it almost doesn't require a proposal at all. However, it will necessarily make a very slight change to the base API, namely changing Symbol from a non-empty to an empty datatype, and thus might need to be run past the CLC.

@cdornan cdornan closed this Sep 12, 2023
@JakobBruenker
Copy link
Contributor

The proposal says

The patch is very small.

Does this patch exist somewhere? Maybe even as open merge request?

@adamgundry
Copy link
Contributor

I don't know if @phadej has pushed a branch anywhere, but I believe literally the entire change envisaged here is replacing data Symbol in GHC.Types with newtype Symbol = MkSymbol String.

@phadej
Copy link
Contributor

phadej commented Dec 8, 2023

@cdornan, @adamgundry is anyone working on this? Is anyone writing a CLC proposal, which apparently this change requires after all?

EDIT: happy anniversary to this feature request.

@adamgundry
Copy link
Contributor

Sorry for the delay @phadej. I'm not working on this, and thinking it over, I'm not convinced it needs a CLC proposal. The CLC seems to be quite burdened with the workload of responding to proposals, so I think it's better not to refer things that are not strictly in scope.

Assuming the constructor is exposed only from ghc-internal/ghc-experimental, then the only visible effect if you don't depend on those packages is that some (rare) programs that previously compiled without warnings will now compile with warnings. The fact that case x :: Symbol of {} currently does not emit a warning, when Symbol is morally an abstract type, seems to me a bug, and we can fix it by making the change you propose.

@phadej
Copy link
Contributor

phadej commented Dec 18, 2023

The fact that case x :: Symbol of {} currently does not emit a warning, when Symbol is morally an abstract type, seems to me a bug, ...

Or at least a missing feature. It's impossible to hide the fact that an empty type is empty. (There is no constructors to hide in export declaration, and even if there are constructors of GADTs, the pattern match exhaustiveness check still sees the emptyness at particular indices). Similarly, there is no way to make an empty PatternSynonyms match complete, the following doesn't work

module FastFin (Fin) where

import GHC.TypeNats

newtype Fin (n :: Nat) = MkFin Integer

{-# COMPLETE :: Fin 0 #-}

@phadej
Copy link
Contributor

phadej commented Dec 18, 2023

Assuming the constructor is exposed only from ghc-internal/ghc-experimental

I'd be happy with just exporting it from ghc-prim, where the type is defined currently. I don't know what's are the implications of re-exporting it from the ghc-* packages. As far as I understand the ghc-internal is as "don't use this package" as ghc-prim and AFAIU whether Symbol could be defined in ghc-internal or ghc-prim is arbitrary, but it is in ghc-prim because implicit params machinery (defined in ghc-prim) needs it.

EDIT: In comparison, ghc-bignat is not advertised as "don't use this directly" package. Maybe it should be, but OTOH there is stuff only available from there. IMO, it's morally a part of ghc-internal.

@adamgundry
Copy link
Contributor

@phadej agreed on all counts. I wasn't meaning to imply a ghc-internal export would be preferable to ghc-prim; as you say they are both equally internal and the boundary between them is merely down to implementation details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs revision The proposal needs changes in response to shepherd or committee review feedback
Development

Successfully merging this pull request may close these issues.