-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathoskar_wickstrom2.tex
1164 lines (988 loc) · 43.6 KB
/
oskar_wickstrom2.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\chapter{Time Travelling and Fixing Bugs with Property-Based Testing - Oskar Wickström}
\begin{quotation}
\noindent\textit{\textbf{William Yao:}}
\textit{Another large-ish case study of using property-based testing, this time introducing a technique to help ensure that you’re genuinely testing enough of your code’s input space.}
\vspace{\baselineskip}
\noindent\textit{Original article: \cite{time_travelling}}
\end{quotation}
Property-based testing (PBT) is a powerful testing technique that helps
us find edge cases and bugs in our software. A challenge in applying PBT
in practice is coming up with useful properties. This tutorial is based
on a simple but realistic system under test (SUT), aiming to show some
ways you can test and find bugs in such logic using PBT. It covers
refactoring, dealing with non-determinism, testing generators
themselves, number of examples to run, and coupling between tests and
implementation. The code is written in Haskell and the testing framework
used is \href{http://hackage.haskell.org/package/hedgehog}{Hedgehog}.
This tutorial was originally written as a book chapter, and later
extracted as a standalone piece. Since I'm not expecting to finish the
PBT book any time soon, I decided to publish the chapter here.
\section{System Under Test: User Signup
Validation}
\label{system-under-test-user-signup-validation}
The business logic we'll test is the validation of a website's user
signup form. The website requires users to sign up before using the
service. When signing up, a user must pick a valid username. Users must
be between 18 and 150 years old.
Stated formally, the validation rules are:
\begin{align*} 0 < \text{length}(\text{name}) &\leq 50 \\ 18 < \text{age} &\leq 150 \end{align*} \qquad(1)
\vspace{0.8\baselineskip}
\noindent The signup and its validation is already implemented by previous
programmers. There have been user reports of strange behaviour, and
we're going to locate and fix the bugs using property tests.
Poking around the codebase, we find the data type representing the form:
\begin{minted}{haskell}
data SignupForm = SignupForm
{ formName :: Text
, formAge :: Int
} deriving (Eq, Show)
\end{minted}
And the existing validation logic, defined as \texttt{validateSignup}.
We won't dig into to the implementation yet, only its type signature:
\begin{minted}{haskell}
validateSignup
:: SignupForm -> Validation (NonEmpty SignupError) Signup
\end{minted}
It's a pure function, taking \texttt{SignupForm} data as an argument,
and returning a \texttt{Validation} value. In case the form data is
valid, it returns a \texttt{Signup} data structure. This data type
resembles \texttt{SignupForm} in its structure, but refines the age as a
\texttt{Natural} when valid:
\begin{minted}{haskell}
data Signup = Signup
{ name :: Text
, age :: Natural
} deriving (Eq, Show)
\end{minted}
In case the form data is invalid, \texttt{validateSignup} returns a
non-empty list of \texttt{SignupError} values. \texttt{SignupError} is a
union type of the possible validation errors:
\begin{minted}{haskell}
data SignupError
= NameTooShort Text
| NameTooLong Text
| InvalidAge Int
deriving (Eq, Show)
\end{minted}
\subsection{The Validation Type}
\label{the-validation-type}
The \texttt{Validation} type comes from the
\href{https://hackage.haskell.org/package/validation}{validation}
package. It's parameterized by two types:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
the type of validation failures
\item
the type of a successfully validated value
\end{enumerate}
The \texttt{Validation} type is similar to the
\href{https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-Either.html\#t:Either}{Either}
type. The major difference is that it \emph{accumulates} failures,
rather than short-circuiting on the first failure. Failures are
accumulated when combining multiple \texttt{Validation} values using
\texttt{Applicative}.
Using a non-empty list for failures in the \texttt{Validation} type is
common practice. It means that if the validation fails, there's at least
one error value.
\section{Validation Property Tests}
\label{validation-property-tests}
Let's add some property tests for the form validation, and explore the
existing implementation. We begin in a new test module, and we'll need a
few imports:
\begin{minted}{haskell}
import Data.List.NonEmpty (NonEmpty (..))
import Data.Text (Text)
import Data.Validation
import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
\end{minted}
Also, we'll need to import the implementation module:
\begin{minted}{haskell}
import Validation
\end{minted}
We're now ready to define some property tests.
\subsection{A Positive Property
Test}
\label{a-positive-property-test}
The first property test we'll add is a \emph{positive} test. That is, a
test using only valid input data. This way, we know the form validation
should always be successful. We define
\texttt{prop\_valid\_signup\_form\_succeeds}:
\begin{minted}{haskell}
prop_valid_signup_form_succeeds = property $ do
let genForm = SignupForm <$> validName <*> validAge -- (1)
form <- forAll genForm -- (2)
case validateSignup form of -- (3)
Success{} -> pure ()
Failure failure' -> do
annotateShow failure'
failure
\end{minted}
First, we define \texttt{genForm} (1), a generator producing form data
with valid names and ages. Next, we generate \texttt{form} values from
our defined generator (2). Finally, we apply the \texttt{validateSignup}
function and pattern match on the result (3):
\begin{itemize}
\item
In case it's successful, we have the test pass with \texttt{pure\ ()}
\item
In case it fails, we print the \texttt{failure\textquotesingle{}} and
fail the test
\end{itemize}
The \texttt{validName} and \texttt{validAge} generators are defined as
follows:
\begin{minted}{haskell}
validName :: Gen Text
validName = Gen.text (Range.linear 1 50) Gen.alphaNum
validAge :: Gen Int
validAge = Gen.integral (Range.linear 1 150)
\end{minted}
Recall the validation rules (eq.~1). The ranges in these generators
yielding valid form data are defined precisely in terms of the
validation rules.
The character generator used for names is \texttt{alphaNum}, meaning
we'll only generate names with alphabetic letters and numbers. If you're
comfortable with regular expressions, you can think of
\texttt{genValidName} as producing values matching
\texttt{{[}a-zA-Z0-9{]}+}.
Let's run some tests:
\vspace{\baselineskip}
\begin{minipage}[l]{\textwidth}
\noindent$\lambda$\verb|> check prop_valid_signup_form_succeeds| \\
\hspace*{1cm}\checkmark \verb|<interactive> passed 100 tests.|
\end{minipage}
\vspace{\baselineskip}
\noindent Hooray, it works.
\subsection{Negative Property Tests}
\label{negative-property-tests}
In addition to the positive test, we'll add \emph{negative} tests for
the name and age, respectively. Opposite to positive tests, our negative
tests will only use invalid input data. We can then expect the form
validation to always fail.
First, let's test invalid names.
\begin{minted}{haskell}
prop_invalid_name_fails = property $ do
let genForm = SignupForm <$> invalidName <*> validAge -- (1)
form <- forAll genForm
case validateSignup form of -- (2)
Failure (NameTooLong{} :| []) -> pure ()
Failure (NameTooShort{} :| []) -> pure ()
other -> do -- (3)
annotateShow other
failure
\end{minted}
Similar to our the positive property test, we define a generator
\texttt{genForm} (1). Note that we use \texttt{invalidName} instead of
\texttt{validName}.
Again, we pattern match on the result of applying
\texttt{validateSignup} (2). In this case we expect failure. Both
\texttt{NameTooLong} and \texttt{NameTooShort} are expected failures. If
we get anything else, the test fails (3).
The test for invalid age is similar, expect we use the
\texttt{invalidAge} generator, and expect only \texttt{InvalidAge}
validation failures:
\begin{minted}{haskell}
prop_invalid_age_fails = property $ do
let genForm = SignupForm <$> validName <*> invalidAge
form <- forAll genForm
case validateSignup form of
Failure (InvalidAge{} :| []) -> pure ()
other -> do
annotateShow other
failure
\end{minted}
The \texttt{invalidName} and \texttt{invalidAge} generators are also
defined in terms of the validation rules (eq.~1), but with ranges
ensuring no overlap with valid data:
\begin{minted}{haskell}
invalidName :: Gen Text
invalidName =
Gen.choice [mempty, Gen.text (Range.linear 51 100) Gen.alphaNum]
invalidAge :: Gen Int
invalidAge = Gen.integral (Range.linear minBound 0)
\end{minted}
Let's run our new property tests:
\begin{quote}
$\lambda$\verb|> check prop_invalid_name_fails| \\
\hspace*{1cm}\checkmark \verb|<interactive> passed 100 tests.| \\
\\
$\lambda$\verb|> check prop_invalid_age_fails| \\
\hspace*{1cm}\checkmark \verb|<interactive> passed 100 tests.|
\end{quote}
All good? Maybe not. The astute reader might have noticed a problem with
one of our generators. We'll get back to that later.
\subsection{Accumulating All
Failures}
\label{accumulating-all-failures}
When validating the form data, we want \emph{all} failures returned to
the user posting the form, rather than returning only one at a time. The
\texttt{Validation} type accumulates failures when combined with
\texttt{Applicative}, which is exactly what we want. Yet, while the hard
work is handled by \texttt{Validation}, we still need to test that we're
correctly combining validations in \texttt{validateSignup}.
We define a property test generating form data, where all fields are
invalid (1). It expects the form validation to fail, returning two
failures (2).
\begin{minted}{haskell}
prop_two_failures_are_returned = property $ do
let genForm = SignupForm <$> invalidName <*> invalidAge -- (1)
form <- forAll genForm
case validateSignup form of
Failure failures | length failures == 2 -> pure () -- (2)
other -> do
annotateShow other
failure
\end{minted}
This property is weak. It states nothing about \emph{which} failures
should be returned. We could assert that the validation failures are
equal to some expected list. But how do we know if the name is too long
or too short? I'm sure you'd be less thrilled if we replicated all of
the validation logic in this test.
Let's define a slightly stronger property. We pattern match, extract the
two failures (1), and check that they're not equal (2).
\begin{minted}{haskell}
prop_two_different_failures_are_returned = property $ do
let genForm = SignupForm <$> invalidName <*> invalidAge
form <- forAll genForm
case validateSignup form of
Failure (failure1 :| [failure2]) -> -- (1)
failure1 /== failure2 -- (2)
other -> do
annotateShow other
failure
\end{minted}
We're still not being specific about which failures should be returned.
But unlike \texttt{prop\_two\_failures\_\-are\_\-returned}, this property at
least makes sure there are no duplicate failures.
\section{The Value of a Property}
\label{the-value-of-a-property}
Is there a faulty behaviour that would slip past
\texttt{prop\_two\_different\_failures\_are\_returned}? Sure. The
implementation could have a typo or copy-paste error, and always return
\texttt{NameTooLong} failures, even if the name is too short. Does this
mean our property is bad? Broken? Useless?
In itself, this property doesn't give us strong confidence in the
correctness of \texttt{validateSignup}. In conjuction with our other
properties, however, it provides value. Together they make up a stronger
test suite.
Let's look at it in another way. What are the \emph{benefits} of weaker
properties over stronger ones? In general, weak properties are
beneficial in that they are:
\begin{enumerate}
\item
easier to define
\item
likely to catch simple mistakes early
\item
less coupled to the SUT
\end{enumerate}
A small investment in a set of weak property tests might catch a lot of
mistakes. While they won't precisely specify your system and catch the
trickiest of edge cases, their power-to-weight ratio is compelling.
Moreover, a set of weak properties is better than no properties at all.
If you can't formulate the strong property you'd like, instead start
simple. Lure out some bugs, and improve the strength and specificity of
your properties over time.
Coming up with good properties is a skill. Practice, and you'll get
better at it.
\section{Testing Generators}
\label{testing-generators}
Remember how in section
\ref{negative-property-tests} we noted that there's a problem? The issue is, we're not
covering all validation rules in our tests. But the problem is not in
our property definitions. It's in one of our \emph{generators}, namely
\texttt{genInvalidAge}. We're now in a perculiar situation: we need to
test our tests.
One way to test a generator is to define a property specifically testing
the values it generates. For example, if we have a generator
\texttt{positive} that is meant to generate only positive integers, we
can define a property that asserts that all generated integers are
positive:
\begin{minted}{haskell}
positive :: Gen Int
positive = Gen.integral (Range.linear 1 maxBound)
prop_integers_are_positive = property $ do
n <- forAll positive
assert (n >= 1)
\end{minted}
We could use this technique to check that all values generated by
\texttt{validAge} are valid. How about \texttt{invalidAge}? Can we check
that it generates values such that all boundaries of our validation
function are hit? No, not using this technique. Testing the correctness
of a generator using a property can only find problems with
\emph{individual} generated values. It can't perform assertions over
\emph{all} generated values. In that sense, it's a \emph{local}
assertion.
Instead, we'll find the generator problem by capturing statistics on the
generated values and performing \emph{global} assertions. Hedgehog, and
a few other PBT frameworks, can measure the occurences of user-defined
\emph{labels}. A label in Hedgehog is a \texttt{Text} value, declared
with an associated condition. When Hedgehog runs the tests, it records
the percentage of tests in which the condition evaluates to
\texttt{True}. After the test run is complete, we're presented with a
listing of percentages per label.
We can even have Hedgehog fail the test unless a certain percentage is
met. This way, we can declare mininum coverage requirements for the
generators used in our property tests.
\subsection{Adding Coverage Checks}
\label{adding-coverage-checks}
Let's check that we generate values covering enough cases, based on the
validation rules in eq.~1 . In \texttt{prop\_invalid\_age\_fails}, we
use \texttt{cover} to ensure we generate values outside the boundaries
of valid ages. 5\% is enough for each, but realistically they could both
get close to 50\%.
\begin{minted}{haskell}
prop_invalid_age_fails = property $ do
let genForm = SignupForm <$> validName <*> invalidAge
form <- forAll genForm
cover 5 "too young" (formAge form <= 0)
cover 5 "too old" (formAge form >= 151)
case validateSignup form of
Failure (InvalidAge{} :| []) -> pure ()
other -> do
annotateShow other
failure
\end{minted}
Let's run some tests again.
\begin{figure}[htbp]
\centering
\includegraphics[width=.95\linewidth]{./pics/hedgehog1.png}
\caption{Hedgehog fails}
\label{fig:hedgehog1}
\end{figure}
\noindent 100\% too young and 0\% too old. The \texttt{invalidAge} generator is
clearly not good enough. Let's have a look at its definition again:
\begin{minted}{haskell}
invalidAge :: Gen Int
invalidAge = Gen.integral (Range.linear minBound 0)
\end{minted}
We're only generating invalid ages between the minimum bound of
\texttt{Int} and \texttt{0}. Let's fix that, by using
\texttt{Gen.choice} and another generator for ages greater than 150:
\begin{minted}{haskell}
invalidAge :: Gen Int
invalidAge = Gen.choice
[ Gen.integral (Range.linear minBound 0)
, Gen.integral (Range.linear 151 maxBound)
]
\end{minted}
Running tests again, the coverage check stops complaining. But there's
another problem:
\begin{figure}[htbp]
\centering
\includegraphics[width=.95\linewidth]{./pics/hedgehog2.png}
\caption{Hedgehog fails}
\label{fig:hedgehog2}
\end{figure}
\noindent OK, we have an actual bug. When the age is 151 or greater, the form is
deemed valid. It should cause a validation failure. Looking closer at
the implementation, we see that a pattern guard is missing the upper
bound check:
\begin{minted}{haskell}
validateAge age' | age' > 0 = Success (fromIntegral age')
| otherwise = Failure (pure (InvalidAge age'))
\end{minted}
If we change it to
\mintinline{haskell}{age' > 0 && age' <= 150},
and rerun the tests, they pass.
\begin{figure}[htbp]
\centering
\includegraphics[width=.95\linewidth]{./pics/hedgehog3.png}
\caption{Hedgehog passes}
\label{fig:hedgehog3}
\end{figure}
\noindent We've fixed the bug.
Measuring and declaring requirements on coverage is a powerful tool in
Hedgehog. It gives us visibility into the generative tests we run,
making it practical to debug generators. It ensures our tests meet our
coverage requirements, even as implementation and tests evolve over
time.
\section{From Ages to Birth Dates}
\label{from-ages-to-birth-dates}
So far, our efforts have been successful. We've fixed real issues in
both implementation and tests. Management is pleased. They're now asking
us to modify the signup system, and use our testing skills to ensure
quality remains high.
Instead of entering their age, users will enter their birth date. Let's
suppose this information is needed for something important, like sending
out birthday gifts. The form validation function must be modified to
check, based on the supplied birth date date, if the user signing up is
old enough.
First, we import the \texttt{Calendar} module from the \texttt{time}
package:
\begin{minted}{haskell}
import Data.Time.Calendar
\end{minted}
Next, we modify the \texttt{SignupForm} data type to carry a
\texttt{formBirthDate} of type \texttt{Date}, rather than an
\texttt{Int}.
\begin{minted}{haskell}
data SignupForm = SignupForm
{ formName :: Text
, formBirthDate :: Day
} deriving (Eq, Show)
\end{minted}
And we make the corresponding change to the \texttt{Signup} data type:
\begin{minted}{haskell}
data Signup = Signup
{ name :: Text
, birthDate :: Day
} deriving (Eq, Show)
\end{minted}
We've also been requested to improve the validation errors. Instead of
just \texttt{InvalidAge}, we define three constructors for various
invalid birthdates:
\begin{minted}{haskell}
data SignupError
= NameTooShort Text
| NameTooLong Text
| TooYoung Day
| TooOld Day
| NotYetBorn Day
deriving (Eq, Show)
\end{minted}
Finally, we need to modify the \texttt{validateSignup} function. Here,
we're faced with an important question. How should the validation
function obtain \emph{today's date}?
\subsection{Keeping Things
Deterministic}
\label{keeping-things-deterministic}
We could make \texttt{validateSignup} a non-deterministic action, which
in Haskell would have the following type signature:
\begin{minted}{haskell}
validateSignup
:: SignupForm -> IO (Validation (NonEmpty SignupError) Signup)
\end{minted}
Note the use of \texttt{IO}. It means we could retrieve the current time
from the system clock, and extract the \texttt{Day} value representing
today's date. But this approach has severe drawbacks.
If \texttt{validateSignup} uses \texttt{IO} to retrieve the current
date, we can't test it with other dates. What it there's a bug that
causes validation to behave incorrectly only on a particular date? We'd
have to run the tests on that specific date to trigger it. If we
introduce a bug, we want to know about it \emph{immediately}. Not weeks,
months, or even years after the bug was introduced. Furthermore, if we
find such a bug with our tests, we can't easily reproduce it on another
date. We'd have to rewrite the implementation code to trigger the bug
again.
Instead of using \texttt{IO}, we'll use a simply technique for keeping
our function pure: take all the information the function needs as
arguments. In the case of \texttt{validateSignup}, we'll pass today's
date as the first argument:
\begin{minted}{haskell}
validateSignup
:: Day -> SignupForm -> Validation (NonEmpty SignupError) Signup
\end{minted}
Again, let's not worry about the implementation just yet. We'll focus on
the tests.
\subsection{Generating Dates}
\label{generating-dates}
In order to test the new \texttt{validateSignup} implementation, we need
to generate \texttt{Day} values. We're going to use a few functions from
a separate module called \texttt{Data.Time.Gen}, previously written by
some brilliant developer in our team. Let's look at their type
signatures. The implementations are not very interesting.
The generator, \texttt{day}, generates a day within the given range:
\begin{minted}{haskell}
day :: Range Day -> Gen Day
\end{minted}
A day range is constructed with \texttt{linearDay}:
\begin{minted}{haskell}
linearDay :: Day -> Day -> Range Day
\end{minted}
Alternatively, we might use \texttt{exponentialDay}:
\begin{minted}{haskell}
exponentialDay :: Day -> Day -> Range Day
\end{minted}
The \texttt{linearDay} and \texttt{exponentialDay} range functions are
analoguous to Hedgehog's \texttt{linear} and \texttt{exponential} ranges
for integral numbers.
To use the generator functions from \texttt{Data.Time.Gen}, we first add
an import, qualified as \texttt{Time}:
\begin{minted}{haskell}
import qualified Data.Time.Gen as Time
\end{minted}
Next, we define a generator \texttt{anyDay}:
\begin{minted}{haskell}
anyDay :: Gen Day
anyDay =
let low = fromGregorian 1900 1 1
high = fromGregorian 2100 12 31
in Time.day (Time.linearDay low high)
\end{minted}
The date range [1900-01-01,2100-12-31]
is arbitrary. We could pick any centuries we like, provided the
\texttt{time} package supports the range. But why not make it somewhat
realistic?
\subsection{Rewriting Existing
Properties}
\label{rewriting-existing-properties}
Now, it's time to rewrite our existing property tests. Let's begin with
the one testing that validating a form with all valid data succeeds:
\begin{minted}{haskell}
prop_valid_signup_form_succeeds = property $ do
today <- forAll anyDay -- (1)
let genForm = SignupForm <$> validName <*> validBirthDate today
form <- forAll genForm -- (2)
case validateSignup today form of
Success{} -> pure ()
Failure failure' -> do
annotateShow failure'
failure
\end{minted}
A few new things are going on here. We're generating a date representing
today (1), and generating a form with a birth date based on today's date
(2). Generating today's date, we're effectively time travelling and
running the form validation on that date. This means our
\texttt{validBirthDate} generator must know which date is today, in
order to pick a valid birth date. We pass today's date as a parameter,
and generate a date within the range of 18 to 150 years earlier:
\begin{minted}{haskell}
validBirthDate :: Day -> Gen Day
validBirthDate today = do
n <- Gen.integral (Range.linear 18 150)
pure (n `yearsBefore` today)
\end{minted}
We define the helper function \texttt{yearsBefore} in the test suite. It
offsets a date backwards in time by a given number of years:
\begin{minted}{haskell}
yearsBefore :: Integer -> Day -> Day
yearsBefore years = addGregorianYearsClip (negate years)
\end{minted}
The \texttt{Data.Time.Calendar} module exports the
\texttt{addGregorianYearsClip} function. It adds a number of years,
clipping February 29th (leap days) to February 28th where necessary.
Let's run tests:
\begin{quote}
$\lambda$\verb|> check prop_valid_signup_form_succeeds| \\
\hspace*{1cm}\checkmark \verb|<interactive> passed 100 tests.|
\end{quote}
Let's move on to the next property, checking that invalid birth dates do
\emph{not} pass validation. Here, we use the same pattern as before,
generating today's date, but use \texttt{invalidBirthDate} instead:
\begin{minted}{haskell}
prop_invalid_age_fails = property $ do
today <- forAll anyDay
form <- forAll (SignupForm <$> validName <*> invalidBirthDate today)
cover 5 "not yet born" (formBirthDate form > today)
cover 5 "too young" (formBirthDate form > 18 `yearsBefore` today)
cover 5 "too old" (formBirthDate form < 150 `yearsBefore` today)
case validateSignup today form of
Failure (TooYoung{} :| []) -> pure ()
Failure (NotYetBorn{} :| []) -> pure ()
Failure (TooOld{} :| []) -> pure ()
other -> do
annotateShow other
failure
\end{minted}
Notice that we've also adjusted the coverage checks. There's a new
label, ``not born yet,'' for birth dates in the future. Running tests,
we see the label in action:
\begin{figure}[htbp]
\centering
\includegraphics[width=.95\linewidth]{./pics/hedgehog4.png}
\caption{Hedgehog results}
\label{fig:hedgehog4}
\end{figure}
\noindent Good coverage, all tests passing. We're not quite done, though. There's
a particular set of dates that we should be sure to cover: ``today''
dates and birth dates that are close to, or exactly, 18 years apart.
Within our current property test for invalid ages, we're only sure that
generated birth dates include at least 5\% too old, and at least 5\% too
young. We don't know how far away from the ``18 years'' validation
boundary they are.
We could tweak our existing generators to produce values close to that
boundary. Given a date $T$, exactly 18 years before today's date, then:
\begin{itemize}
\item
\texttt{invalidBirthDate} would need to produce birth dates just after
but not equal to $T$
\item
\texttt{validBirthDate} would need to produce birth dates just before
or equal to $T$
\end{itemize}
There's another option, though. Instead of defining separate properties
for valid and invalid ages, we'll use a \emph{single} property for all
cases. This way, we only need a single generator.
\section{A Single Validation
Property}
\label{a-single-validation-property}
In \href{https://www.youtube.com/watch?v=NcJOiQlzlXQ}{Building on
developers' intuitions to create effective property-based tests}, John
Hughes talks about ``one property to rule them all.'' Similarly, we'll
define a single property \texttt{prop\_validates\_age} for birth date
validation. We'll base our new property on
\texttt{prop\_invalid\_age\_fails}, but generalize to cover both
positive and negative tests:
\begin{minted}{haskell}
prop_validates_age = property $ do
today <- forAll anyDay
form <- forAll (SignupForm <$> validName <*> anyBirthDate today) -- (1)
let tooYoung = formBirthDate form > 18 `yearsBefore` today -- (2)
notYetBorn = formBirthDate form > today
tooOld = formBirthDate form < 150 `yearsBefore` today
oldEnough = formBirthDate form <= 18 `yearsBefore` today
exactly age = formBirthDate form == age `yearsBefore` today
closeTo age =
let diff' =
diffDays (formBirthDate form) (age `yearsBefore` today)
in abs diff' `elem` [0 .. 2]
cover 10 "too young" tooYoung
cover 1 "not yet born" notYetBorn
cover 1 "too old" tooOld
cover 20 "old enough" oldEnough -- (3)
cover 1 "exactly 18" (exactly 18)
cover 5 "close to 18" (closeTo 18)
case validateSignup today form of -- (4)
Failure (NotYetBorn{} :| []) | notYetBorn -> pure ()
Failure (TooYoung{} :| []) | tooYoung -> pure ()
Failure (TooOld{} :| []) | tooOld -> pure ()
Success{} | oldEnough -> pure ()
other -> annotateShow other >> failure
\end{minted}
There are a few new things going on here:
\begin{enumerate}
\item
Instead of generating exclusively invalid or valid birth dates, we're
now generating \emph{any} birth date based on today's date
\item
The boolean expressions are used both in coverage checks and in
asserting, so we separate them in a \texttt{let} binding
\item
We add three new labels for the valid cases
\item
Finally, we assert on both valid and invalid cases, based on the same
expressions used in coverage checks
\end{enumerate}
Note that our assertions are more specific than in
\texttt{prop\_invalid\_age\_fails}. The failure cases only pass if the
corresponding label expressions are true. The \texttt{oldEnough} case
covers all valid birth dates. Any result other than the four expected
cases is considered incorrect.
The \texttt{anyBirthDate} generator is based on today's date:
\begin{minted}{haskell}
anyBirthDate :: Day -> Gen Day
anyBirthDate today =
let -- (1)
inPast range = do
years <- Gen.integral range
pure (years `yearsBefore` today)
inFuture = do
years <- Gen.integral (Range.linear 1 5)
pure (addGregorianYearsRollOver years today)
daysAroundEighteenthYearsAgo = do
days <- Gen.integral (Range.linearFrom 0 (-2) 2)
pure (addDays days (18 `yearsBefore` today))
in -- (2)
Gen.frequency
[ (5, inPast (Range.exponential 1 150))
, (1, inPast (Range.exponential 151 200))
, (2, inFuture)
, (2, daysAroundEighteenthYearsAgo)
]
\end{minted}
We defines helper functions (1) for generating dates in the past, in the
future, and close to 18 years ago. Using those helper functions, we
combine four generators, with different date ranges, using a
\texttt{Gen.frequency} distribution (1). The weights we use are selected
to give us a good coverage.
Let's run some tests (see figure \ref{fig:hedgehog5})
\begin{figure}[htbp]
\centering
\includegraphics[width=.95\linewidth]{./pics/hedgehog5.png}
\caption{Hedgehog results}
\label{fig:hedgehog5}
\end{figure}
\noindent Looks good! We've gone from testing positive and negative cases
separately, to instead have a single property covering all cases, based
on a single generator. It's now easier to generate values close to the
valid/invalid boundary of our SUT, i.e.~around 18 years from today's
date.
\section{February 29th}
\label{february-29th}
For the fun of it, let's run some more tests. We'll crank it up to
20000 (see figure \ref{fig:hedgehog6}
\begin{figure}[htbp]
\centering
\includegraphics[width=\linewidth]{./pics/hedgehog6.png}
\caption{Hedgehog results}
\label{fig:hedgehog6}
\end{figure}
\noindent Failure! Chaos! What's going on here? Let's examine the test case:
\begin{itemize}
\item
Today's date is 1956-02-29
\item
The birth date is 1938-03-01
\item
The validation function considers this \emph{valid} (it returns a
\texttt{Success} value)
\item
The test does considers this \emph{invalid} (\texttt{oldEnough} is
\texttt{False})
\end{itemize}
This means that when the validation runs on a leap day,
\href{https://en.wikipedia.org/wiki/February_29\#Born_on_February_29}{February
29th}, and the person would turn 18 years old the day after (on March
1st), the validation function incorrectly considers the person old
enough. We've found a bug.
\subsection{Test Count and Coverage}
\label{test-count-and-coverage}
Two things led us to find this bug:
\begin{enumerate}
\item
Most importantly, that we generate today's date and pass it as a
parameter. Had we used the actual date, retrieved with an IO action,
we'd only be able to find this bug every 1461 days. Pure functions are
easier to test.
\item
That we ran more tests than the default of 100. We might not have
found this bug until much later, when the generated dates happened to
trigger this particular bug. In fact, running 20000 tests does not
always trigger the bug.
\end{enumerate}
Our systems are often too complex to be tested exhaustively. Let's use
our form validation as an example. Between 1900-01-01 and 2100-12-31
there are 73,413 days. Selecting today's date and the birth date from
that range, we have more than five billion combinations. Running that
many Hedgehog tests in GHCi on my laptop (based on some quick
benchmarks) would take about a month. And this is a simple pure
validation function!
To increase coverage, even if it's not going to be exhaustive, we can
increase the number of tests we run. But how many should we run? On a
continuous integration server we might be able to run more than we do
locally, but we still want to keep a tight feedback loop. And what if
our generators never produce inputs that reveal existing bugs,
regardless of the number of tests we run?
If we can't test exhaustively, we need to ensure our generators cover
interesting combinations of inputs. We need to carefully design and
measure our tests and generators, based on the edge cases we already
know of, as well as the ones that we discover over time. PBT without
measuring coverage easily turns into a false sense of security.
In the case of our leap day bug, we can catch it with fewer tests, and
on every test run. We need to make sure we cover leap days, used both as
today's date and as the birth date, even with a low number of tests.
\subsection{Covering Leap Days}
\label{covering-leap-days}
To generate inputs that cover certain edge cases, we combine specific
generators using \texttt{Gen.frequency}:
\begin{minted}{haskell}
(today, birthDate') <- forAll
(Gen.frequency
[ (5, anyDayAndBirthDate) -- (1)
, (2, anyDayAndBirthDateAroundYearsAgo 18) -- (2)
, (2, anyDayAndBirthDateAroundYearsAgo 150)
, (1, leapDayAndBirthDateAroundYearsAgo 18) -- (3)
, (1, leapDayAndBirthDateAroundYearsAgo 150)
, (1, commonDayAndLeaplingBirthDateAroundYearsAgo 18) -- (4)
, (1, commonDayAndLeaplingBirthDateAroundYearsAgo 150)
]
)
\end{minted}
Arbitrary values for today's date and the birth date are drawn most
frequently (1), with a weight of 5. Next, with weights of 2, are
generators for cases close to the boundaries of the validation function
(2). Finally, with weights of 1, are generators for special cases
involving leap days as today's date (3) and leap days as birth date (4).
Note that these generators return pairs of dates. For most of these
generators, there's a strong relation between today's date and the birth
date. For example, we can't first generate \emph{any} today's date, pass
that into a generator function, and expect it to always generate a leap
day that occured 18 years ago. Such a generator would have to first
generate the leap day and then today's date.
Let's define the generators. The first one, \texttt{anyDayAndBirthDate},
picks any today's date within a wide date range. It also picks a birth
date from an even wider date range, resulting in some future birth dates
and some ages above 150.
\begin{minted}{haskell}
anyDayAndBirthDate :: Gen (Day, Day)
anyDayAndBirthDate = do
today <- Time.day
(Time.linearDay (fromGregorian 1900 1 1)
(fromGregorian 2020 12 31)
)
birthDate' <- Time.day
(Time.linearDay (fromGregorian 1850 1 1)
(fromGregorian 2050 12 31)
)
pure (today, birthDate')
\end{minted}
Writing automated tests with a hard-coded year 2020 might scare you.
Won't these tests fail when run in the future? No, not these tests.
Remember, the validation function is deterministic. We control today's
date. The \emph{actual} date on which we run these tests doesn't matter.
Similar to the previous generator is
\texttt{anyDayAndBirthDateAroundYearsAgo}. First, it generates any date
as today's date (1). Next, it generates an arbitrary date approximately
some number of years ago (2), where the number of years is an argument
of the generator.
\begin{minted}{haskell}
anyDayAndBirthDateAroundYearsAgo :: Integer -> Gen (Day, Day)
anyDayAndBirthDateAroundYearsAgo years = do
today <- Time.day -- (1)
(Time.linearDay (fromGregorian 1900 1 1)
(fromGregorian 2020 12 31)
)
birthDate' <- addingApproxYears (negate years) today -- (2)
pure (today, birthDate')
\end{minted}
The \texttt{addingApproxYearsAgo} generator adds a number of years to a
date, and offsets it between two days back and two days forward in time.
\begin{minted}{haskell}
addingApproxYears :: Integer -> Day -> Gen Day
addingApproxYears years today = do
days <- Gen.integral (Range.linearFrom 0 (-2) 2)
pure (addDays days (addGregorianYearsRollOver years today))
\end{minted}
The last two generators used in our \texttt{frequency} distribution
cover leap day edge cases. First, let's define the
\texttt{leapDayAndBirthDateAroundYearsAgo} generator. It generates a
leap day used as today's date, and a birth date close to the given
number of years ago.
\begin{minted}{haskell}
leapDayAndBirthDateAroundYearsAgo :: Integer -> Gen (Day, Day)
leapDayAndBirthDateAroundYearsAgo years = do