WIP: Quick check implementation #87

wismill · 2022-06-07T17:07:12Z

Implement Quick Check algorithm.

This is very much WIP; it is intended to open the discussion on the implementation.

It relies on an ongoing PR of unicode-data.

Add proper benchmarks (not/almost) normalized.
~~Release unicode-data with QuickCheck properties.~~
~~Delete cabal.project;~~ change stack.yaml ~~and s390x.yaml~~ to use unicode-data release.

Fixes #1
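
For readers unfamiliar with the algorithm, here is a minimal sketch of the generic quick check described in UAX #15 ("Detecting Normalization Forms"). It is not the code in this PR: the names `QuickCheck`, `quickCheckWith`, `toyCombiningClass` and `toyNFCQuickCheck` are hypothetical, and the two toy tables cover only a handful of characters (everything else is treated as class 0 and `Yes`).

```haskell
-- | Possible values of a Quick Check property such as NFC_QC.
data QuickCheck = QCNo | QCMaybe | QCYes
  deriving (Eq, Show)

-- | Toy canonical combining class table (illustration only).
toyCombiningClass :: Char -> Int
toyCombiningClass c = case c of
  '\x0301' -> 230 -- COMBINING ACUTE ACCENT
  '\x0323' -> 220 -- COMBINING DOT BELOW
  '\x0340' -> 230 -- COMBINING GRAVE TONE MARK
  _        -> 0

-- | Toy NFC_QC table (illustration only).
toyNFCQuickCheck :: Char -> QuickCheck
toyNFCQuickCheck c = case c of
  '\x0301' -> QCMaybe
  '\x0323' -> QCMaybe
  '\x0340' -> QCNo
  '\x212B' -> QCNo -- ANGSTROM SIGN
  _        -> QCYes

-- | Scan the input once. The answer is No as soon as combining marks are
-- out of canonical order or a character has the property value No;
-- otherwise it is Maybe if any character was Maybe, and Yes if none was.
quickCheckWith :: (Char -> Int) -> (Char -> QuickCheck) -> String -> QuickCheck
quickCheckWith ccc qc = go 0 QCYes
  where
    go _ result [] = result
    go lastClass result (c : cs)
      | cls /= 0 && lastClass > cls = QCNo
      | otherwise = case qc c of
          QCNo    -> QCNo
          QCMaybe -> go cls QCMaybe cs
          QCYes   -> go cls result cs
      where
        cls = ccc c
```

Only a `Yes` result lets the caller skip normalization entirely; `No` and `Maybe` both require the full algorithm.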

Data/Unicode/Internal/NormalizeStream.hs

wismill · 2022-06-19T10:25:34Z

Latest benchmark

All
  unicode-transforms-text
    NFD/AllChars:    OK (2.76s)
      5.41 ms ± 126 μs,  7% faster than baseline
    NFD/Deutsch:     OK (0.59s)
      2.27 ms ±  66 μs, 27% faster than baseline
    NFD/Devanagari:  OK (0.33s)
      5.09 ms ± 184 μs
    NFD/English:     OK (13.85s)
      1.67 ms ±  11 μs, 32% faster than baseline
    NFD/Japanese:    OK (0.42s)
      6.49 ms ± 227 μs
    NFD/Korean:      OK (1.22s)
      9.58 ms ±  99 μs
    NFD/Vietnamese:  OK (0.30s)
      4.68 ms ± 179 μs, 17% faster than baseline
    NFKD/AllChars:   OK (1.11s)
      8.68 ms ± 201 μs, 11% faster than baseline
    NFKD/Deutsch:    OK (0.63s)
      2.42 ms ±  65 μs, 32% faster than baseline
    NFKD/Devanagari: OK (0.35s)
      5.37 ms ± 207 μs
    NFKD/English:    OK (4.06s)
      2.00 ms ±  24 μs, 34% faster than baseline
    NFKD/Japanese:   OK (0.47s)
      7.51 ms ± 238 μs,  8% slower than baseline
    NFKD/Korean:     OK (0.64s)
      10.0 ms ± 292 μs
    NFKD/Vietnamese: OK (0.63s)
      4.95 ms ± 167 μs, 18% faster than baseline
    NFC/AllChars:    OK (1.24s)
      9.80 ms ±  92 μs, 10% slower than baseline
    NFC/Deutsch:     OK (0.53s)
      4.16 ms ±  99 μs, 14% slower than baseline
    NFC/Devanagari:  OK (0.82s)
      6.39 ms ±  92 μs,  8% slower than baseline
    NFC/English:     OK (0.44s)
      3.40 ms ± 121 μs, 18% slower than baseline
    NFC/Japanese:    OK (0.64s)
      10.1 ms ± 186 μs
    NFC/Korean:      OK (0.31s)
      4.89 ms ± 174 μs,  3% slower than baseline
    NFC/Vietnamese:  OK (0.39s)
      12.5 ms ± 463 μs,  6% slower than baseline
    NFKC/AllChars:   OK (0.44s)
      14.0 ms ± 368 μs
    NFKC/Deutsch:    OK (0.32s)
      5.28 ms ± 210 μs, 31% slower than baseline
    NFKC/Devanagari: OK (0.42s)
      6.52 ms ± 170 μs, 13% slower than baseline
    NFKC/English:    OK (0.47s)
      3.64 ms ± 103 μs, 41% slower than baseline
    NFKC/Japanese:   OK (0.36s)
      11.7 ms ± 464 μs
    NFKC/Korean:     OK (1.40s)
      5.53 ms ±  64 μs, 18% slower than baseline
    NFKC/Vietnamese: OK (0.40s)
      12.8 ms ± 447 μs,  6% slower than baseline

So there are some improvements for decomposition, but composition is slower. NFKC also fails from time to time when compared to ICU.

Edited: use <summary> tag.

wismill · 2022-07-07T18:36:15Z

@harendra-kumar I managed to obtain nice improvements:

Benchmark results

All
  unicode-transforms-text
    NFD/AllChars:    OK (1.31s)
      5.13 ms ±  66 μs, 9.2 MB allocated, 815 B  copied,  31 MB peak memory, 0.72x, 17% faster than baseline
    NFD/Deutsch:     OK (1.12s)
      2.19 ms ±  24 μs, 6.3 MB allocated, 807 B  copied,  31 MB peak memory, 0.62x, 33% faster than baseline
    NFD/Devanagari:  OK (1.21s)
      4.70 ms ±  44 μs, 5.7 MB allocated, 799 B  copied,  31 MB peak memory, 0.61x,  7% faster than baseline
    NFD/English:     OK (1.70s)
      1.65 ms ±  17 μs, 5.7 MB allocated, 791 B  copied,  31 MB peak memory, 0.54x, 41% faster than baseline
    NFD/Japanese:    OK (0.77s)
      6.02 ms ±  84 μs, 9.3 MB allocated, 775 B  copied,  31 MB peak memory, 0.70x,  2% faster than baseline
    NFD/Korean:      OK (1.25s)
      9.71 ms ± 154 μs,  15 MB allocated, 1.6 KB copied,  32 MB peak memory, 0.32x
    NFD/Vietnamese:  OK (1.59s)
      6.21 ms ±  44 μs, 9.3 MB allocated, 759 B  copied,  32 MB peak memory, 1.12x, 11% faster than baseline
    NFKD/AllChars:   OK (1.14s)
      8.97 ms ± 101 μs,  12 MB allocated, 1.5 KB copied,  32 MB peak memory, 0.94x,  9% faster than baseline
    NFKD/Deutsch:    OK (1.40s)
      2.71 ms ±  50 μs, 6.3 MB allocated, 743 B  copied,  32 MB peak memory, 0.77x, 23% faster than baseline
    NFKD/Devanagari: OK (0.63s)
      4.91 ms ±  96 μs, 5.7 MB allocated, 735 B  copied,  32 MB peak memory, 0.63x,  9% faster than baseline
    NFKD/English:    OK (1.14s)
      2.20 ms ±  22 μs, 5.7 MB allocated, 727 B  copied,  32 MB peak memory, 0.72x, 27% faster than baseline
    NFKD/Japanese:   OK (0.85s)
      6.64 ms ± 117 μs, 9.7 MB allocated, 729 B  copied,  32 MB peak memory, 0.70x,  3% faster than baseline
    NFKD/Korean:     OK (1.27s)
      10.0 ms ±  97 μs,  15 MB allocated, 1.5 KB copied,  32 MB peak memory, 0.33x
    NFKD/Vietnamese: OK (1.68s)
      6.57 ms ±  45 μs, 9.3 MB allocated, 703 B  copied,  32 MB peak memory, 0.62x,  8% faster than baseline
    NFC/AllChars:    OK (1.14s)
      4.51 ms ±  57 μs, 3.8 MB allocated, 347 B  copied,  32 MB peak memory, 1.32x, 54% faster than baseline
    NFC/Deutsch:     OK (1.01s)
      1.98 ms ±  38 μs, 1.9 MB allocated, 227 B  copied,  32 MB peak memory, 1.30x, 47% faster than baseline
    NFC/Devanagari:  OK (1.48s)
      5.77 ms ±  45 μs, 6.1 MB allocated, 679 B  copied,  32 MB peak memory, 0.71x,  3% faster than baseline
    NFC/English:     OK (2.04s)
      1.97 ms ±  18 μs, 1.9 MB allocated, 223 B  copied,  32 MB peak memory, 1.28x, 23% faster than baseline
    NFC/Japanese:    OK (1.60s)
      3.14 ms ±  24 μs, 1.9 MB allocated, 219 B  copied,  32 MB peak memory, 1.25x, 68% faster than baseline
    NFC/Korean:      OK (1.12s)
      4.37 ms ±  56 μs, 1.9 MB allocated, 215 B  copied,  32 MB peak memory, 1.10x,  2% faster than baseline
    NFC/Vietnamese:  OK (1.56s)
      12.3 ms ± 205 μs,  11 MB allocated, 1.4 KB copied,  32 MB peak memory, 0.86x, 25% faster than baseline
    NFKC/AllChars:   OK (1.21s)
      9.58 ms ±  86 μs,  13 MB allocated, 1.2 KB copied,  32 MB peak memory, 1.05x, 36% faster than baseline
    NFKC/Deutsch:    OK (1.33s)
      2.58 ms ±  27 μs, 5.7 MB allocated, 623 B  copied,  32 MB peak memory, 0.92x, 40% faster than baseline
    NFKC/Devanagari: OK (0.73s)
      5.73 ms ± 101 μs, 6.1 MB allocated, 615 B  copied,  32 MB peak memory, 0.71x, 10% faster than baseline
    NFKC/English:    OK (1.03s)
      1.98 ms ±  29 μs, 1.9 MB allocated, 201 B  copied,  32 MB peak memory, 1.32x, 15% faster than baseline
    NFKC/Japanese:   OK (4.59s)
      4.58 ms ±  17 μs, 6.7 MB allocated, 599 B  copied,  32 MB peak memory, 0.94x, 64% faster than baseline
    NFKC/Korean:     OK (1.36s)
      5.33 ms ±  51 μs, 5.7 MB allocated, 591 B  copied,  32 MB peak memory, 0.71x,  4% slower than baseline
    NFKC/Vietnamese: OK (0.75s)
      12.1 ms ± 234 μs,  11 MB allocated, 1.3 KB copied,  32 MB peak memory, 0.84x, 31% faster than baseline

Some remarks:

I added lots of comments and reorganized the code to better understand how it works.
I used {-# SCC #-} annotations for profiling; should I keep them?
Quick Check values are encoded in this package and not in unicode-data, because for NFC & NFKC they use a different encoding than No, Maybe, Yes.
Allocated and copied memory are noticeably reduced for Japanese & AllChars.
I do not see any other optimization at the moment.
Do we want an isNormalized function? If so, the current encoding of QC for composition may not be usable.
isNFC_QC and isNFKC_QC encode each value in a byte, but really use only 2 bits. Do we need to improve that? We could encode like this: [isNFC_QC_custom,isNFC_QC_original,isNFKC_QC_original,isNFKC_QC_custom] and select the corresponding pair of bits we need, but it implies overhead.
We need to add benchmarks for variants of the files that are already normalized.
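
To make the packing idea in the remark about 2-bit values concrete, here is a minimal sketch of storing the four values in one byte and selecting one pair of bits. The `Field` type, `pack` and `select` are hypothetical names, and the bit order (first field in the two high bits) is an assumption made for the example, not something decided in this PR.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- | The four 2-bit fields, in the order listed in the remark above.
data Field = NFCCustom | NFCOriginal | NFKCOriginal | NFKCCustom
  deriving (Eq, Show, Enum)

-- | Pack four values (only the low 2 bits of each are kept) into one byte,
-- with the first argument stored in the two highest bits.
pack :: Word8 -> Word8 -> Word8 -> Word8 -> Word8
pack a b c d =
      ((a .&. 3) `shiftL` 6)
  .|. ((b .&. 3) `shiftL` 4)
  .|. ((c .&. 3) `shiftL` 2)
  .|. (d .&. 3)

-- | Extract one 2-bit field: shift it down to the low bits, then mask.
select :: Field -> Word8 -> Word8
select field byte = (byte `shiftR` (6 - 2 * fromEnum field)) .&. 3
```

The overhead mentioned above is visible in `select`: every lookup pays for one shift and one mask on top of the table access.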

harendra-kumar · 2022-07-07T21:10:09Z

@wismill these results look great, considering that we were already comparable to or faster than text-icu in most cases. Now we are even better. We are also now comparable in the cases where we were slower earlier (NFC and NFKC for Japanese, Deutsch and AllChars).

I will take a better look when I get some time.

wismill · 2022-07-08T10:48:05Z

I updated the benchmark results to include allocated memory and the difference with text-icu.

harendra-kumar · 2022-07-24T08:33:50Z

I am looking at it, will update soon.

harendra-kumar

We can put Data/Unicode/Internal/Char/DerivedNormalizationProperties.hs in the unicode-data package, even though as of now it may be custom code for consumption by unicode-transforms. If other consumers of this data arrive at some later point, we can try adapting to the requirements of all consumers in the same module. In the best case, other consumers may also find this format useful without having to make any changes to it.

We can keep it Internal/experimental to begin with.

I did not review the QC data generation. We are relying on your due diligence and the tests.

Data/Unicode/Internal/NormalizeStream.hs

harendra-kumar · 2022-07-24T10:49:46Z

Data/Unicode/Internal/NormalizeStream.hs

+            ComposeStarter s -> {-# SCC compose_YesStarter #-} case quickCheck ch of
+                -- QC = Yes, starter (includes Jamo L & Hangul syllables),
+                -- may decompose, may compose with next
+                QC.YesStarter -> {-# SCC compose_YesStarter_YesStarter #-} do


I have not looked carefully into QC. I assume that we will never be here if the two starters can combine. Perhaps that will go in the QC.combining case.

QC.YesStarter means that the second starter cannot combine with the previous character; it can only combine with the next one.

harendra-kumar · 2022-07-24T11:08:48Z

Data/Unicode/Internal/NormalizeStream.hs

+                    -- Pending decomposition
+                    | UC.isDecomposable mode s ->
+                        {-# SCC compose_YesStarter_Decomposable_decomp #-}
+                        go (UC.decompose mode s ++ UC.decompose mode ch) i ComposeNone


I am hoping that these string appends won't be costly, as the strings are small and there is at most one append.

There was similar code in the original version. I do not think it is avoidable.

harendra-kumar · 2022-07-24T11:26:27Z

Data/Unicode/Internal/NormalizeStream.hs

+                        -- Jamo L + jamo V
+                        | UC.jamoLFirst <= cp && cp <= UC.jamoLLast &&
+                            UC.jamoVFirst <= ich && ich <= UC.jamoVLast ->
+                            pure (i, composeJamoL s ch)


The where block defining cp can be moved here as this is the only use of cp.

I could not manage to find the correct syntax.
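
One syntax that may work here is a `let` inside a pattern guard, which is standard Haskell 2010 and also valid in `case` alternatives. The sketch below is standalone and only illustrates the syntax: `classify` and `Pair` are hypothetical, and the constants are local copies of the Unicode Jamo L and V ranges rather than the `UC.*` values used in the diff.

```haskell
import Data.Char (ord)

-- Local stand-ins for the UC.jamo* constants (conjoining Jamo L and V ranges).
jamoLFirst, jamoLLast, jamoVFirst, jamoVLast :: Int
jamoLFirst = 0x1100
jamoLLast  = 0x1112
jamoVFirst = 0x1161
jamoVLast  = 0x1175

data Pair = JamoLV | Other
  deriving (Eq, Show)

-- | Classify a (previous starter, current char) pair.
classify :: Char -> Char -> Pair
classify s ch = case ord ch of
  ich
    -- `cp` is bound by the guard itself, so no `where` block is needed
    -- and the binding is only in scope for this alternative.
    | let cp = ord s
    , jamoLFirst <= cp && cp <= jamoLLast
    , jamoVFirst <= ich && ich <= jamoVLast -> JamoLV
    | otherwise -> Other
```

The `let` binding is visible in the remaining conditions of the same guard and in its right-hand side, but not in the following guards.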

harendra-kumar · 2022-07-24T11:33:04Z

Data/Unicode/Internal/NormalizeStream.hs

+                            UC.jamoVFirst <= ich && ich <= UC.jamoVLast ->
+                            pure (i, composeJamoL s ch)
+                        -- Hangul LV + T
+                        | UC.isHangul s && UC.isHangulLV s &&


We should clarify the isHangulLV documentation: it should say that the char being tested must be a Hangul char, since the function does not check whether it is Hangul or not.
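
A small sketch of why the precondition matters, assuming the usual arithmetic test on the syllable index. These are hypothetical reimplementations written for this comment (`isHangulToy`, `isHangulLVUnchecked`, `isHangulLVChecked`), not the library's definitions.

```haskell
import Data.Char (ord)

-- Precomposed Hangul syllables occupy U+AC00..U+D7A3; each LV syllable is
-- followed by its 27 LVT variants, so LV syllables sit at multiples of 28.
hangulFirst, hangulLast, jamoTCount :: Int
hangulFirst = 0xAC00
hangulLast  = 0xD7A3
jamoTCount  = 28

isHangulToy :: Char -> Bool
isHangulToy c = hangulFirst <= ord c && ord c <= hangulLast

-- | Precondition: the argument is a precomposed Hangul syllable.
-- Without that guarantee the arithmetic test is meaningless.
isHangulLVUnchecked :: Char -> Bool
isHangulLVUnchecked c = (ord c - hangulFirst) `mod` jamoTCount == 0

-- | Total variant: performs the range check itself.
isHangulLVChecked :: Char -> Bool
isHangulLVChecked c = isHangulToy c && isHangulLVUnchecked c
```

For example, the unchecked test accepts the ASCII letter 'd' (code point 100), because 100 - 0xAC00 happens to be a multiple of 28, whereas the checked variant rejects it.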

harendra-kumar · 2022-07-24T11:43:02Z

Data/Unicode/Internal/NormalizeStream.hs


+    -- Recursive decomposition
    go [] !i !st = pure (i, st)
    go (ch : rest) i st =


We can check if making the list strict and adding a SPEC argument helps any of the benchmarks.
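
As a rough illustration of that suggestion, the sketch below shows a spine-strict list and a recursive worker that takes a `SPEC` argument, which asks GHC's SpecConstr pass to specialise the loop aggressively. `SList`, `fromList` and `sumCodePoints` are hypothetical names; this is not the `go` from the diff, and whether the trick helps there can only be settled by benchmarking.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Char (ord)
import GHC.Types (SPEC (..))

-- | A list that is strict in both its elements and its spine.
data SList a = SNil | SCons !a !(SList a)

fromList :: [a] -> SList a
fromList = foldr SCons SNil

-- | Toy loop with the same shape as a recursive worker: a SPEC argument,
-- a strict accumulator and a strict list being consumed.
sumCodePoints :: SList Char -> Int
sumCodePoints = go SPEC 0
  where
    go :: SPEC -> Int -> SList Char -> Int
    go !_ !acc SNil = acc
    go !sPEC !acc (SCons c rest) = go sPEC (acc + ord c) rest
```

The `SPEC` value carries no information at run time; it is only a hint to the optimiser, so it should be dropped again if the benchmarks show no gain.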

harendra-kumar · 2022-07-24T12:23:14Z

unicode-transforms.cabal

      Data.Text.Normalize

+      -- Internal
+      Data.Unicode.Internal.Char.DerivedNormalizationProperties


This can go in the other modules section below.

Not possible, because it is used in tests.

harendra-kumar · 2022-07-24T12:54:41Z

I see the following benchmarks getting slower:

All
  unicode-transforms-text
    NFD/Korean:      OK (0.20s)
      13.5 ms ± 983 μs, 32% slower than baseline
    NFKD/Korean:     OK (0.21s)
      13.8 ms ± 746 μs, 36% slower than baseline
    NFC/Korean:      OK (0.26s)
      8.20 ms ± 348 μs, 30% slower than baseline
    NFC/Deutsch:     OK (0.18s)
      5.63 ms ± 350 μs, 18% slower than baseline
    NFC/English:     OK (0.18s)
      5.61 ms ± 453 μs, 23% slower than baseline
    NFKC/Korean:     OK (0.27s)
      8.59 ms ± 374 μs, 25% slower than baseline
    NFKC/Deutsch:    OK (2.90s)
      5.72 ms ±  66 μs,  6% slower than baseline
    NFKC/English:    OK (1.10s)
      4.26 ms ± 234 μs, 11% slower than baseline

Mainly Korean has significant regression. We can try to review and see if there are any opportunities to correct these.

harendra-kumar · 2022-07-24T13:01:06Z

I added lots of comments to understand and reorganized the code to understand better how it works.

thanks

I used {-# SCC #-} annotations for profiling; should I keep them?

Your judgement; if you think they might be helpful, keep them.

Quick Check values are encoded in this package and not in unicode-data, because for NFC & NFKC they use a different encoding than No, Maybe, Yes.

We can still have it in unicode-data. It is not necessary to use the same encoding as in the original db; whatever seems generally useful for the algorithm should be ok.

Do we want isNormalized function? If so the current encoding of QC for composition may not be usable.

I do not know. Unless someone requests it.

harendra-kumar · 2022-07-24T17:53:16Z

We are faster than text-icu in all cases except the following:

unicode-transforms
    NFC/Korean:      OK (0.26s)
      8.20 ms ± 348 μs, 30% slower than baseline
    NFC/Deutsch:     OK (0.18s)
      5.63 ms ± 350 μs, 18% slower than baseline
    NFC/English:     OK (0.18s)
      5.61 ms ± 453 μs, 23% slower than baseline
    NFC/Japanese:    OK (0.17s)
      6.10 ms ± 383 μs, 61% faster than baseline
    NFC/AllChars:    OK (0.24s)
      7.61 ms ± 340 μs, 44% faster than baseline
    NFKC/Deutsch:    OK (2.90s)
      5.72 ms ±  66 μs,  6% slower than baseline
    NFKC/English:    OK (1.10s)
      4.26 ms ± 234 μs, 11% slower than baseline
    NFKC/AllChars:   OK (0.20s)
      13.5 ms ± 1.3 ms, 31% faster than baseline

text-icu
    NFC/Korean:      OK (0.15s)
      5.31 ms ± 366 μs
    NFC/Deutsch:     OK (0.19s)
      2.90 ms ± 197 μs
    NFC/English:     OK (0.37s)
      2.87 ms ± 211 μs
    NFC/Japanese:    OK (0.13s)
      4.02 ms ± 374 μs
    NFC/AllChars:    OK (0.14s)
      4.47 ms ± 354 μs
    NFKC/Deutsch:    OK (0.29s)
      4.56 ms ± 269 μs
    NFKC/English:    OK (0.17s)
      2.92 ms ± 184 μs
    NFKC/AllChars:   OK (0.18s)
      11.9 ms ± 987 μs

We are slower only in some NFC cases and some of those are the ones that regressed with this change. So maybe we can improve those.

wismill · 2022-09-13T18:40:19Z

I see the following benchmarks getting slower:

All
  unicode-transforms-text
    NFD/Korean:      OK (0.20s)
      13.5 ms ± 983 μs, 32% slower than baseline
    NFKD/Korean:     OK (0.21s)
      13.8 ms ± 746 μs, 36% slower than baseline
    NFC/Korean:      OK (0.26s)
      8.20 ms ± 348 μs, 30% slower than baseline
    NFC/Deutsch:     OK (0.18s)
      5.63 ms ± 350 μs, 18% slower than baseline
    NFC/English:     OK (0.18s)
      5.61 ms ± 453 μs, 23% slower than baseline
    NFKC/Korean:     OK (0.27s)
      8.59 ms ± 374 μs, 25% slower than baseline
    NFKC/Deutsch:    OK (2.90s)
      5.72 ms ±  66 μs,  6% slower than baseline
    NFKC/English:    OK (1.10s)
      4.26 ms ± 234 μs, 11% slower than baseline

Mainly Korean has significant regression. We can try to review and see if there are any opportunities to correct these.

I do not observe these results. Mine are all same or faster than baseline (ghc 9.2.4).

Results (NNNx is compared to ICU, NNN% is against baseline)

All
  unicode-transforms-te
    NFD/AllChars:    OK (0.37s)
      5.73 ms ± 207 μs, 0.76x,  7% less than baseline
    NFD/Deutsch:     OK (0.76s)
      2.94 ms ±  85 μs, 0.80x, 17% less than baseline
    NFD/Devanagari:  OK (0.33s)
      5.17 ms ± 189 μs, 0.66x,       same as baseline
    NFD/English:     OK (0.33s)
      2.49 ms ±  90 μs, 0.79x, 18% less than baseline
    NFD/Japanese:    OK (0.41s)
      6.46 ms ± 172 μs, 0.69x,  7% more than baseline
    NFD/Korean:      OK (0.62s)
      9.86 ms ± 183 μs, 0.32x,       same as baseline
    NFD/Vietnamese:  OK (0.44s)
      6.88 ms ± 216 μs, 1.21x,       same as baseline
    NFKD/AllChars:   OK (0.54s)
      8.51 ms ± 271 μs, 0.85x, 10% less than baseline
    NFKD/Deutsch:    OK (0.62s)
      2.40 ms ±  81 μs, 0.67x, 26% less than baseline
    NFKD/Devanagari: OK (0.59s)
      4.66 ms ± 109 μs, 0.60x, 13% less than baseline
    NFKD/English:    OK (0.50s)
      1.92 ms ±  45 μs, 0.61x, 30% less than baseline
    NFKD/Japanese:   OK (0.83s)
      6.56 ms ± 107 μs, 0.64x,  6% less than baseline
    NFKD/Korean:     OK (0.61s)
      9.66 ms ± 311 μs, 0.31x,       same as baseline
    NFKD/Vietnamese: OK (0.39s)
      6.06 ms ± 169 μs, 0.55x, 13% less than baseline
    NFC/AllChars:    OK (0.45s)
      3.53 ms ±  87 μs, 0.97x, 63% less than baseline
    NFC/Deutsch:     OK (0.38s)
      1.43 ms ±  48 μs, 0.88x, 62% less than baseline
    NFC/Devanagari:  OK (0.67s)
      5.23 ms ± 120 μs, 0.63x, 12% less than baseline
    NFC/English:     OK (0.38s)
      1.42 ms ±  56 μs, 0.86x, 49% less than baseline
    NFC/Japanese:    OK (0.33s)
      2.62 ms ±  90 μs, 0.98x, 75% less than baseline
    NFC/Korean:      OK (0.53s)
      4.08 ms ±  89 μs, 1.00x, 14% less than baseline
    NFC/Vietnamese:  OK (0.73s)
      11.8 ms ± 307 μs, 0.82x, 24% less than baseline
    NFKC/AllChars:   OK (0.56s)
      9.12 ms ± 202 μs, 0.97x, 36% less than baseline
    NFKC/Deutsch:    OK (0.66s)
      2.55 ms ±  56 μs, 0.87x, 37% less than baseline
    NFKC/Devanagari: OK (0.72s)
      5.55 ms ±  94 μs, 0.68x,  7% less than baseline
    NFKC/English:    OK (0.52s)
      1.98 ms ±  45 μs, 1.23x, 30% less than baseline
    NFKC/Japanese:   OK (0.47s)
      3.77 ms ± 111 μs, 0.75x, 67% less than baseline
    NFKC/Korean:     OK (0.30s)
      4.83 ms ± 177 μs, 0.64x,  6% less than baseline
    NFKC/Vietnamese: OK (0.74s)
      12.0 ms ± 255 μs, 0.83x, 22% less than baseline

You may want to use --stdev 1 as the results can be fickle.

harendra-kumar · 2022-09-13T20:41:57Z

I do not observe these results. Mine are all same or faster than baseline (ghc 9.2.4).

It may depend on the CPU as well. I tested on Linux, AWS VM - which says "model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz" in "/proc/cpuinfo". Which CPU/Hardware are you testing on?

wismill · 2022-09-14T09:25:07Z

Sure thing. I tested on a laptop equipped with AMD Ryzen 5 2500U @ 2.0GHz, on Linux.

wismill · 2022-09-20T08:35:29Z

I tested on a different machine:

Results with GHC 9.4.1 on 4 × Intel® Core™ i5-3340M CPU @ 2.70GHz

All
  unicode-transforms-text
    NFD/AllChars:    OK (2.83s)
      11.0 ms ±  62 μs,  5% less than baseline
    NFD/Deutsch:     OK (0.87s)
      6.81 ms ± 121 μs,  5% less than baseline
    NFD/Devanagari:  OK (1.56s)
      12.2 ms ± 206 μs,  2% less than baseline
    NFD/English:     OK (1.60s)
      6.23 ms ±  51 μs,  4% less than baseline
    NFD/Japanese:    OK (0.97s)
      15.3 ms ± 267 μs,       same as baseline
    NFD/Korean:      OK (1.19s)
      18.8 ms ± 270 μs,  4% more than baseline
    NFD/Vietnamese:  OK (1.50s)
      11.8 ms ±  93 μs,  4% less than baseline
    NFKD/AllChars:   OK (0.98s)
      15.4 ms ± 289 μs,  3% less than baseline
    NFKD/Deutsch:    OK (1.91s)
      7.67 ms ±  98 μs,  2% more than baseline
    NFKD/Devanagari: OK (1.59s)
      12.2 ms ± 112 μs,  2% less than baseline
    NFKD/English:    OK (0.77s)
      5.99 ms ± 102 μs, 12% less than baseline
    NFKD/Japanese:   OK (0.99s)
      15.7 ms ± 192 μs,  1% less than baseline
    NFKD/Korean:     OK (1.19s)
      18.7 ms ± 363 μs,  3% more than baseline
    NFKD/Vietnamese: OK (2.96s)
      11.6 ms ± 175 μs,  4% less than baseline
    NFC/AllChars:    OK (5.88s)
      11.5 ms ±  95 μs, 43% less than baseline
    NFC/Deutsch:     OK (2.12s)
      8.24 ms ± 111 μs, 28% less than baseline
    NFC/Devanagari:  OK (1.93s)
      15.2 ms ± 175 μs, 18% less than baseline
    NFC/English:     OK (2.09s)
      8.16 ms ± 144 μs, 21% less than baseline
    NFC/Japanese:    OK (3.00s)
      11.7 ms ± 123 μs, 50% less than baseline
    NFC/Korean:      OK (1.80s)
      14.1 ms ± 195 μs,  2% more than baseline
    NFC/Vietnamese:  OK (0.61s)
      19.5 ms ± 351 μs, 21% less than baseline
    NFKC/AllChars:   OK (0.99s)
      15.6 ms ± 220 μs, 34% less than baseline
    NFKC/Deutsch:    OK (0.56s)
      8.81 ms ± 169 μs, 22% less than baseline
    NFKC/Devanagari: OK (0.97s)
      15.2 ms ± 169 μs, 16% less than baseline
    NFKC/English:    OK (1.09s)
      8.49 ms ±  96 μs, 16% less than baseline
    NFKC/Japanese:   OK (0.80s)
      12.6 ms ± 212 μs, 45% less than baseline
    NFKC/Korean:     OK (0.91s)
      14.6 ms ± 250 μs,  2% more than baseline
    NFKC/Vietnamese: OK (1.24s)
      19.5 ms ± 218 μs, 21% less than baseline

@harendra-kumar I observe some regressions, but nowhere near as large as the ones you got. Could you please re-run the benchmark carefully?

harendra-kumar · 2022-10-26T00:28:40Z

@wismill can you rebase this on master?

wismill · 2022-10-26T06:47:27Z

@harendra-kumar done

harendra-kumar · 2022-11-01T01:23:09Z

Sorry for the long delay. I benchmarked again, and the benchmarks look good.

It seems tests are failing in CI. Can you take a look at those and make them pass?

wismill force-pushed the wip/quick_check branch from f6e236a to 916ea7c on June 7, 2022 17:08

harendra-kumar reviewed Jun 13, 2022


wismill force-pushed the wip/quick_check branch 2 times, most recently from e188506 to 6b50a63 on June 20, 2022 07:30

wismill force-pushed the wip/quick_check branch 3 times, most recently from ab9a608 to bf85902 on July 7, 2022 17:54

wismill force-pushed the wip/quick_check branch from bf85902 to 5e6912b on July 7, 2022 18:49

harendra-kumar reviewed Jul 24, 2022

wismill force-pushed the wip/quick_check branch from 2955acc to 91fdefb on September 20, 2022 07:26

wismill force-pushed the wip/quick_check branch 2 times, most recently from e601c68 to 68b5bf6 on September 26, 2022 07:58

wismill added 5 commits October 26, 2022 08:46

Add .editorconfig from unicode-data

e20f603

Add Quick Check data

4a6cf11

.gitignore

00e4612

Add tests

152f085

Fix benchmark

44a7a66

wismill added 8 commits October 26, 2022 08:46

Improve Text normalization

52344c8

Fixes (cabal, stack, pattern synonym)

1f09fd5

Update instructions to update Unicode standard.

192835b

Benchmark: add built-in comparison with text-icu (tasty-bench only).

e5dc78d

Fix benchmark instructions: differentiate tasty-bench & gauge options…

adee3c7

…. [skip ci]

Review fixes

9f14385

Fix s390x

50b129c

Fix benchmark for old GHC

c89cb0d

wismill force-pushed the wip/quick_check branch from 68b5bf6 to c89cb0d on October 26, 2022 06:46
