-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathdavid_luposchainsky.tex
131 lines (75 loc) · 7.32 KB
/
david_luposchainsky.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
\chapter{Algebraic blindness - David Luposchainsky}
\begin{quotation}
\noindent\textit{\textbf{William Yao:}}
\textit{The motivating problem: if I have a \texttt{Maybe Int}, I only implicitly know what the two branches \textbf{mean}. \texttt{Nothing} could mean ``something bad happened, abort'', or it could mean ``I haven't found anything yet, keep going.'' Conversely, a value of \texttt{Just x} could be a useful value, or it could be a subroutine signalling that an error code occurred. The names are \textbf{too generic}; the structure only tells us what we have, not what it's for.}
\textit{Goes through how we might solve this problem by defining new datatypes that are isomorphic to, say, \texttt{Bool}, but with more useful names. Note that the problem that the article talks about with regards to not having typeclass instances for your custom types can (since GHC 8.6) be solved using \texttt{DerivingVia}.}
\vspace{\baselineskip}
\noindent\textit{Original article: \cite{algebraic_blindness}}
\end{quotation}
\section{Abstract}
Algebraic data types make data more flexible, but also prone to a form of generalized Boolean Blindness, making code more difficult to maintain. Luckily, refactoring the issues is completely type-safe.
\section{Boolean blindness}
In programming, there is a common problem known as \textit{Boolean Blindness}, what it ``means'' to be \texttt{True} depends heavily on the context, and cannot be inferred by the value alone. Consider
\begin{minted}{haskell}
main = withFile True "file.txt" (\handle -> do stuff)
\end{minted}
The Boolean parameter could distinguish between read-only\-/\-write-only, or read-only\-/\-read+write, whether the file should be truncated before opening, and countless other possibilities.
And often there is an even worse issue: we have two booleans, and we do not know whether they describe something in the same problem domain. A boolean used for read-only-vs-write-only looks just the same as one distinguishing red from blue. And then, one day, we open a file in ``blue mode'', and probably nothing good follows.
The point is: in order to find out what the \texttt{True} means, you probably have to read documentation or code elsewhere.
\section{Haskell to the rescue}
Haskell makes it very natural and cheap to define our own data types. The example code above would be expressed as
\begin{minted}{haskell}
main = withFile ReadMode "file.txt" (\handle -> do stuff)
\end{minted}
in more idiomatic Haskell. Here, the meaning of the field is obvious, and since a data type \texttt{data IOMode = ReadMode | WriteMode} has nothing to do with a data type \texttt{data Colours = Red | Blue}, we cannot accidentally pass a \texttt{Red} value to our function, despite the fact that they would both have corresponded to \texttt{True} in the Boolean-typed example.
\section{The petting zoo of blindness}
Boolean blindness is of course just a name for the most characteristic version of the issue that most standard types share. An \texttt{Int} parameter to a server might be a port or a timeout, a \texttt{String} could be a host, a route, a log prefix, and so on.
The Haskell solution is to simply wrap things in newtypes, tagging values with phantom types, or introducing type synonyms. (I'm not a fan of the latter, which you can read more about in a \href{https://github.com/quchen/articles/blob/master/tag-dont-type.md}{previous article}.)
\section{Algebraic blindness}
Haskell has a lot more ``simple, always available, nicely supported'' types than most other languages, for example \texttt{Maybe}, \texttt{Either}, tuples or lists. These types naturally extend Boolean Blindness.
\begin{itemize}
\item \texttt{Maybe} adds another distinct value to a type. \texttt{Nothing} is sometimes used to denote an error case (this is what many assume by default, implicitly given by its Monad instance), sometimes the ``nothing weird happened'' case, and sometimes something completely different.
\texttt{Maybe a} is as blind as \texttt{a}, plus one value.
\item \texttt{Either} is similar: sometimes \texttt{Left} is an exceptional case, sometimes it's just ``the other'' case.
\item \texttt{Either a b} is as blind as \texttt{a} plus as blind as \texttt{b}, plus one for the fact that \texttt{Left} and \texttt{Right} do not have intrinsic meaning.
\item Pairs have two fields, but how do they relate to each other? Does one maybe tell us about errors in the other? We cannot know.
\texttt{(a,b)} is as blind as \texttt{a} times the blindness of \texttt{b}.
\item Unit is not very blind, since even synonyms of it mostly mean the same thing: we don't really care about the result. \end{itemize}
In GHCi's source code, there is a value of type \texttt{Maybe Bool} passed around, which has three possible values:
\begin{enumerate}
\item \texttt{Nothing} means there is no more input to process when GHCi is invoked via \texttt{ghc -e}.
\item \texttt{Just True} reports success of the last command.
\item \texttt{Just False} is an error in the last command, and \texttt{ghc -e} should abort with an exit code of 1, while a GHCi session should continue.
\end{enumerate}
It is very hard to justify this over something like
\begin{minted}{haskell}
data CommandResult
= NoMoreInput -- ^ Haddock might go here!
| Success
| Failure
\end{minted}
which is just as easy to implement, has four lines of overhead (instead of 0 lines of overheadache), is easily searchable in full-text search even, and gives us type errors when we use it in the wrong context.
\section{Haskell to the rescue, for real this time}
We're lucky, because Haskell has a built-in solution here as well. We can prototype our programs not worrying about the precise meaning of symbols, use \texttt{Either () ()} instead of Bool because we might need the flexibility, and do all sorts of atrocities.
The type system allows us to repair our code: just understand what the different values of our blind values mean, and encode this in a new data type. Then pick a random use site, and just put it in there. The compiler will yell at you for a couple of minutes, but it will report every single conflicting site, and since you're introducing something entirely new, there is no way you are producing undetected clashes. I found this type of refactoring to be \textit{one of the most powerful tools Haskell has to offer}, but we rarely speak about it because it seems so normal to us.
\section{Drawbacks}
Introducing new domain types has a drawback: we lose the API of the standard types. The result is that we sometimes have to write boilerplate to get specific parts of the API back, which is unfortunate, and sometimes this makes it not worth bothering with the anti-blindness refactoring.
\section{Conclusion}
When you have lots of faceless data types in your code, consider painting them with their domain meanings. Make them distinct, make them memorable, make them maintainable. And sometimes, when you see a type that looks like
\begin{minted}{haskell}
data IOMode
= ReadMode
| WriteMode
| AppendMode
| ReadWriteMode
\end{minted}
take a moment to appreciate that the author hasn't used
\begin{minted}{haskell}
Either Bool Bool
-- ^ ^
-- | |
-- | Complex modes: append, read+write
-- |
-- Simple modes: read/write only
\end{minted}
instead.