<!DOCTYPE html>
<html lang="en">
<head>
<title>SemanticFinder - Frontend-only Semantic Search with transformers.js</title>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="description" content="Frontend-only Semantic Search with transformers.js.">
</head>
<body>
<div class="toast" id="toastMessage">
<span id="toastText"></span>
<span id="closeToastButton" class="close-button">X</span>
</div>
<div class="container mt-1">
<div class="row justify-content-left">
<div id="introContainer" class="col-sm-9">
<img id="SemanticFinderLogo" src="./SemanticFinder.svg" alt="SemanticFinder Logo">
<div class="content" id="introContentDiv">
<br>
<h4>Frontend-only live semantic search with <a href="https://xenova.github.io/transformers.js/"
target="_blank">transformers.js</a>. <a
href="https://github.com/do-me/SemanticFinder">GitHub</a></h4>
<p>Semantic search right in your browser! Embeddings and cosine similarity are calculated
client-side, without any server-side inference.
Your <b>data is private</b> and stays in your browser.<br>
Just copy and paste any text into the text area, or load one or more PDFs and web pages in the advanced settings, and hit <b>Find</b>. Set a different chunk size for
a finer or coarser search.<br>
<b>Large books</b> can be indexed too and searched in less than 2 seconds!
Open the fastest <a href="https://do-me.github.io/SemanticFinder/webgpu/">WebGPU version here</a>.<br> Examples:
<a href="?hf=King_James_Bible_24f6dc4c">The Bible (en)</a>,
<a href="?hf=Les_Misérables_2239df51">Les Misérables (fr)</a>,
<a href="?hf=Das_Kapital_c1a84fba">Das Kapital (de)</a>,
<a href="?hf=Don_Quijote_14a0b44">Don Quijote (es)</a>,
<a href="?hf=Divina_Commedia_d5a0fa67">Divina Commedia (it)</a>,
<a href="?hf=Iliad_8de5d1ea">Iliad (gr)</a>,
<a href="?hf=IPCC_Report_2023_2b260928">IPCC Report 2023 (en)</a>.
Full catalogue with pre-indexed examples on <a
href="https://huggingface.co/datasets/do-me/SemanticFinder">Hugging Face</a>.
<a href="https://huggingface.co/datasets/do-me/SemanticFinder#create-semanticfinder-files"
target="_blank">Contribute the indices of the documents you indexed</a>
or open a <a href="https://github.com/do-me/SemanticFinder/issues/new" target="_blank">request
on GitHub</a> with a source URL.
</p>
</div>
</div>
</div>
</div>
<div class="container mt-1">
<div class="row justify-content-center">
<div class="col-sm-9"> <!-- 75% column (9 of 12) for the text region -->
<form class="form-floating">
<div class="form-group" id="formGroupCenter">
<div class="row no-gutters">
<div class="col-md-10">
<div class="form-floating input-group mb-2">
<input type="text" id="query-text" class="form-control"
placeholder="Enter query here" value="food" />
<label for="query-text">Semantic Query</label>
<button class="btn btn-secondary" type="button" data-bs-toggle="collapse"
data-bs-target="#advancedFeaturesContent" id="settingsButton">
<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24"
fill="currentColor" class="bi bi-gear-fill" viewBox="0 0 16 16">
<path
d="M9.405 1.05c-.413-1.4-2.397-1.4-2.81 0l-.1.34a1.464 1.464 0 0 1-2.105.872l-.31-.17c-1.283-.698-2.686.705-1.987 1.987l.169.311c.446.82.023 1.841-.872 2.105l-.34.1c-1.4.413-1.4 2.397 0 2.81l.34.1a1.464 1.464 0 0 1 .872 2.105l-.17.31c-.698 1.283.705 2.686 1.987 1.987l.311-.169a1.464 1.464 0 0 1 2.105.872l.1.34c.413 1.4 2.397 1.4 2.81 0l.1-.34a1.464 1.464 0 0 1 2.105-.872l.31.17c1.283.698 2.686-.705 1.987-1.987l-.169-.311a1.464 1.464 0 0 1 .872-2.105l.34-.1c1.4-.413 1.4-2.397 0-2.81l-.34-.1a1.464 1.464 0 0 1-.872-2.105l.17-.31c.698-1.283-.705-2.686-1.987-1.987l-.311.169a1.464 1.464 0 0 1-2.105-.872l-.1-.34zM8 10.93a2.929 2.929 0 1 1 0-5.86 2.929 2.929 0 0 1 0 5.858z" />
</svg>
</button>
</div>
</div>
<div class="col-md-2">
<button onclick="onSubmit(); return false;" id="submit_button"
class="btn btn-primary submit-button btn-lg mb-2 form-control" disabled>
Loading...
</button>
</div>
</div>
<!-- settings -->
<div class="col-12">
<div id="advancedFeaturesContent" class="collapse">
<div class="card">
<!-- <div class="card-header">
<h5 class="card-title">Settings</h5>
</div>-->
<div class="card-body">
<div class="row">
<h6>Model Selection</h6>
<div class="col-md-8">
<div class="form-floating mb-2">
<!-- all models from https://github.com/do-me/trending-huggingface-models + gte-tiny (lacking right tag: https://huggingface.co/TaylorAI/gte-tiny/discussions/5) -->
<select class="form-select form-control " id="model-name">
<option selected value="TaylorAI/gte-tiny">TaylorAI/gte-tiny | 💾22.9MB 📥1752 ❤️102</option> <!-- smallest great model as default, rest based on HF trends -->
<option value="nomic-ai/nomic-embed-text-v1.5">nomic-ai/nomic-embed-text-v1.5 | 💾548MB | 138MB 📥6487601 ❤️252</option>
<option value="Alibaba-NLP/gte-large-en-v1.5">Alibaba-NLP/gte-large-en-v1.5 | 💾1.75GB | 361MB | 873MB | 446MB | 387MB | 446MB | 446MB 📥1038888 ❤️116</option>
<option value="mixedbread-ai/mxbai-embed-large-v1">mixedbread-ai/mxbai-embed-large-v1 | 💾1.34GB | 669MB | 337MB 📥1251460 ❤️395</option>
<option value="jinaai/jina-embeddings-v2-base-zh">jinaai/jina-embeddings-v2-base-zh | 💾641MB | 321MB | 162MB 📥15757 ❤️135</option>
<option value="Snowflake/snowflake-arctic-embed-m">Snowflake/snowflake-arctic-embed-m | 💾436MB | 144MB | 218MB | 110MB | 149MB | 110MB | 110MB 📥38632 ❤️102</option>
<option value="jinaai/jina-embeddings-v2-base-code">jinaai/jina-embeddings-v2-base-code | 💾642MB | 321MB | 162MB 📥19763 ❤️35</option>
<option value="WhereIsAI/UAE-Large-V1">WhereIsAI/UAE-Large-V1 | 💾1.34GB | 669MB | 337MB 📥651948 ❤️194</option>
<option value="jinaai/jina-reranker-v1-turbo-en">jinaai/jina-reranker-v1-turbo-en | 💾151MB | 103MB | 75.8MB | 38.3MB | 104MB | 38.3MB | 38.3MB 📥9533 ❤️40</option>
<option value="Alibaba-NLP/gte-base-en-v1.5">Alibaba-NLP/gte-base-en-v1.5 | 💾556MB | 167MB | 278MB | 147MB | 174MB | 147MB | 147MB 📥223934 ❤️23</option>
<option value="Supabase/gte-small">Supabase/gte-small | 💾133MB | 66.7MB | 34MB 📥1599458 ❤️54</option>
<option value="Snowflake/snowflake-arctic-embed-xs">Snowflake/snowflake-arctic-embed-xs | 💾90.4MB | 53.9MB | 45.3MB | 23MB | 54.6MB | 23MB | 23MB 📥32228 ❤️22</option>
<option value="Xenova/all-MiniLM-L6-v2">Xenova/all-MiniLM-L6-v2 | 💾90.4MB | 45.3MB | 23MB | 23MB 📥112 ❤️42</option>
<option value="Xenova/multilingual-e5-large">Xenova/multilingual-e5-large | 💾546kB | 1.12GB | 562MB 📥13 ❤️7</option>
<option value="jinaai/jina-embeddings-v2-base-de">jinaai/jina-embeddings-v2-base-de | 💾641MB | 321MB | 162MB 📥24919 ❤️58</option>
<option value="Xenova/colbertv2.0">Xenova/colbertv2.0 | 💾436MB | 218MB | 110MB 📥2 ❤️5</option>
<option value="nomic-ai/nomic-embed-text-v1-unsupervised">nomic-ai/nomic-embed-text-v1-unsupervised | 💾548MB | 138MB 📥11051 ❤️10</option>
<option value="nomic-ai/nomic-embed-text-v1">nomic-ai/nomic-embed-text-v1 | 💾548MB | 138MB 📥1470792 ❤️394</option>
<option value="Xenova/bge-m3">Xenova/bge-m3 | 💾607kB | 1.13GB | 570MB 📥680 ❤️20</option>
<option value="mixedbread-ai/mxbai-embed-2d-large-v1">mixedbread-ai/mxbai-embed-2d-large-v1 | 💾1.34GB | 669MB | 337MB 📥6873 ❤️32</option>
<option value="Snowflake/snowflake-arctic-embed-m-long">Snowflake/snowflake-arctic-embed-m-long | 💾548MB | 158MB | 274MB | 138MB | 165MB | 138MB | 138MB 📥17826 ❤️28</option>
<option value="Snowflake/snowflake-arctic-embed-s">Snowflake/snowflake-arctic-embed-s | 💾133MB | 60.1MB | 66.7MB | 34MB | 61.4MB | 34MB | 34MB 📥35466 ❤️11</option>
<option value="Snowflake/snowflake-arctic-embed-l">Snowflake/snowflake-arctic-embed-l | 💾1.34GB | 299MB | 669MB | 337MB | 318MB | 337MB | 337MB 📥21274 ❤️73</option>
<option value="jinaai/jina-reranker-v1-tiny-en">jinaai/jina-reranker-v1-tiny-en | 💾132MB | 99.9MB | 66.3MB | 33.4MB | 101MB | 33.4MB | 33.4MB 📥3492 ❤️11</option>
<option value="Xenova/paraphrase-albert-small-v2">Xenova/paraphrase-albert-small-v2 | 💾44.6MB | 22.4MB | 39.7MB 📥1 ❤️0</option>
<option value="Xenova/paraphrase-albert-base-v2">Xenova/paraphrase-albert-base-v2 | 💾44.7MB | 22.7MB | 40MB 📥1 ❤️0</option>
<option value="Xenova/squeezebert-uncased">Xenova/squeezebert-uncased | 💾202MB | 101MB | 51.2MB 📥2 ❤️0</option>
<option value="Xenova/squeezebert-mnli">Xenova/squeezebert-mnli | 💾202MB | 101MB | 51.3MB 📥1 ❤️0</option>
<option value="Xenova/all-distilroberta-v1">Xenova/all-distilroberta-v1 | 💾326MB | 163MB | 82.1MB 📥1 ❤️0</option>
<option value="Xenova/paraphrase-multilingual-MiniLM-L12-v2">Xenova/paraphrase-multilingual-MiniLM-L12-v2 | 💾470MB | 235MB | 118MB 📥2 ❤️8</option>
<option value="Xenova/paraphrase-MiniLM-L6-v2">Xenova/paraphrase-MiniLM-L6-v2 | 💾90.4MB | 45.3MB | 23MB 📥7 ❤️0</option>
<option value="Xenova/all-mpnet-base-v2">Xenova/all-mpnet-base-v2 | 💾436MB | 218MB | 110MB 📥8 ❤️1</option>
<option value="Xenova/all-roberta-large-v1">Xenova/all-roberta-large-v1 | 💾1.42GB | 709MB | 357MB 📥1 ❤️0</option>
<option value="Xenova/bert-base-nli-mean-tokens">Xenova/bert-base-nli-mean-tokens | 💾436MB | 218MB | 110MB 📥4 ❤️0</option>
<option value="Xenova/distilbert-base-nli-mean-tokens">Xenova/distilbert-base-nli-mean-tokens | 💾266MB | 133MB | 66.9MB 📥2 ❤️0</option>
<option value="Xenova/distilbert-base-nli-stsb-mean-tokens">Xenova/distilbert-base-nli-stsb-mean-tokens | 💾266MB | 133MB | 66.9MB 📥2 ❤️0</option>
<option value="Xenova/distiluse-base-multilingual-cased-v1">Xenova/distiluse-base-multilingual-cased-v1 | 💾539MB | 270MB | 135MB 📥66 ❤️0</option>
<option value="Xenova/distiluse-base-multilingual-cased-v2">Xenova/distiluse-base-multilingual-cased-v2 | 💾539MB | 270MB | 135MB 📥5218 ❤️2</option>
<option value="Xenova/msmarco-distilbert-base-v4">Xenova/msmarco-distilbert-base-v4 | 💾266MB | 133MB | 66.9MB 📥4 ❤️0</option>
<option value="Xenova/multi-qa-MiniLM-L6-cos-v1">Xenova/multi-qa-MiniLM-L6-cos-v1 | 💾90.4MB | 45.3MB | 23MB 📥2 ❤️2</option>
<option value="Xenova/multi-qa-distilbert-cos-v1">Xenova/multi-qa-distilbert-cos-v1 | 💾266MB | 133MB | 66.9MB 📥4 ❤️0</option>
<option value="Xenova/multi-qa-mpnet-base-cos-v1">Xenova/multi-qa-mpnet-base-cos-v1 | 💾436MB | 218MB | 110MB 📥1 ❤️0</option>
<option value="Xenova/multi-qa-mpnet-base-dot-v1">Xenova/multi-qa-mpnet-base-dot-v1 | 💾436MB | 218MB | 110MB 📥2 ❤️1</option>
<option value="Xenova/nli-mpnet-base-v2">Xenova/nli-mpnet-base-v2 | 💾436MB | 218MB | 110MB 📥1 ❤️0</option>
<option value="Xenova/paraphrase-MiniLM-L3-v2">Xenova/paraphrase-MiniLM-L3-v2 | 💾69MB | 34.6MB | 17.5MB 📥1 ❤️0</option>
<option value="Xenova/paraphrase-mpnet-base-v2">Xenova/paraphrase-mpnet-base-v2 | 💾436MB | 218MB | 110MB 📥0 ❤️0</option>
<option value="Xenova/paraphrase-multilingual-mpnet-base-v2">Xenova/paraphrase-multilingual-mpnet-base-v2 | 💾1.11GB | 555MB | 279MB 📥1635 ❤️2</option>
<option value="Xenova/xlm-r-100langs-bert-base-nli-stsb-mean-tokens">Xenova/xlm-r-100langs-bert-base-nli-stsb-mean-tokens | 💾1.11GB | 555MB | 279MB 📥2 ❤️0</option>
<option value="Xenova/all-MiniLM-L12-v2">Xenova/all-MiniLM-L12-v2 | 💾133MB | 66.7MB | 34MB 📥2 ❤️3</option>
<option value="Xenova/scibert_scivocab_uncased">Xenova/scibert_scivocab_uncased | 💾438MB | 219MB | 111MB 📥1 ❤️0</option>
<option value="Xenova/spanbert-large-cased">Xenova/spanbert-large-cased | 💾1.33GB | 666MB | 335MB 📥2 ❤️0</option>
<option value="Xenova/spanbert-base-cased">Xenova/spanbert-base-cased | 💾431MB | 216MB | 109MB 📥2 ❤️0</option>
<option value="sdan/simple-embeddings">sdan/simple-embeddings | 💾90.4MB | 23MB 📥7 ❤️0</option>
<option value="Xenova/sentence_bert">Xenova/sentence_bert | 💾436MB | 218MB | 110MB 📥2 ❤️0</option>
<option value="Xenova/e5-small-v2">Xenova/e5-small-v2 | 💾133MB | 66.7MB | 34MB 📥2 ❤️3</option>
<option value="Xenova/SapBERT-from-PubMedBERT-fulltext">Xenova/SapBERT-from-PubMedBERT-fulltext | 💾436MB | 218MB | 110MB 📥4 ❤️0</option>
<option value="Xenova/indobert-base-p1">Xenova/indobert-base-p1 | 💾496MB | 248MB | 125MB 📥0 ❤️0</option>
<option value="Xenova/UMLSBert_ENG">Xenova/UMLSBert_ENG | 💾436MB | 218MB | 110MB 📥2 ❤️1</option>
<option value="Xenova/rubert-base-cased">Xenova/rubert-base-cased | 💾709MB | 355MB | 178MB 📥1 ❤️0</option>
<option value="Xenova/kobert">Xenova/kobert | 💾367MB | 184MB | 92.8MB 📥2 ❤️0</option>
<option value="Xenova/e5-small">Xenova/e5-small | 💾133MB | 66.7MB | 34MB 📥2 ❤️0</option>
<option value="Xenova/e5-large">Xenova/e5-large | 💾1.34GB | 669MB | 337MB 📥2 ❤️0</option>
<option value="Xenova/e5-large-v2">Xenova/e5-large-v2 | 💾1.34GB | 669MB | 337MB 📥3 ❤️5</option>
<option value="Xenova/e5-base">Xenova/e5-base | 💾436MB | 218MB | 110MB 📥2 ❤️0</option>
<option value="Xenova/e5-base-v2">Xenova/e5-base-v2 | 💾436MB | 218MB | 110MB 📥2 ❤️0</option>
<option value="Xenova/multilingual-e5-base">Xenova/multilingual-e5-base | 💾1.11GB | 555MB | 279MB 📥2 ❤️2</option>
<option value="Xenova/instructor-base">Xenova/instructor-base | 💾552MB | 552MB | 140MB | 139MB | 495MB | 125MB | 439MB | 110MB 📥15 ❤️0</option>
<option value="Xenova/instructor-large">Xenova/instructor-large | 💾1.74GB | 1.74GB | 439MB | 438MB | 1.54GB | 388MB | 1.34GB | 337MB 📥8 ❤️1</option>
<option value="Xenova/sentence-t5-large">Xenova/sentence-t5-large | 💾1.74GB | 1.74GB | 439MB | 438MB | 1.54GB | 388MB | 1.34GB | 337MB 📥7 ❤️0</option>
<option value="Xenova/multilingual-e5-small">Xenova/multilingual-e5-small | 💾470MB | 235MB | 118MB 📥3 ❤️2</option>
<option value="Xenova/mms-300m">Xenova/mms-300m | 💾1.26GB | 632MB | 318MB 📥8 ❤️0</option>
<option value="Xenova/mms-1b">Xenova/mms-1b | 💾1.13MB | 1.93GB | 969MB 📥2 ❤️0</option>
<option value="Supabase/e5-small-v2">Supabase/e5-small-v2 | 💾133MB | 66.7MB | 34MB 📥4 ❤️1</option>
<option value="Supabase/all-MiniLM-L6-v2">Supabase/all-MiniLM-L6-v2 | 💾90.4MB | 45.3MB | 23MB 📥21 ❤️2</option>
<option value="Xenova/gte-small">Xenova/gte-small | 💾133MB | 66.7MB | 34MB 📥4 ❤️12</option>
<option value="Xenova/gte-base">Xenova/gte-base | 💾436MB | 218MB | 110MB 📥1 ❤️0</option>
<option value="Xenova/gte-large">Xenova/gte-large | 💾1.34GB | 669MB | 337MB 📥2 ❤️1</option>
<option value="Xenova/bge-small-en">Xenova/bge-small-en | 💾133MB | 66.7MB | 34MB 📥2 ❤️0</option>
<option value="Xenova/bge-base-en">Xenova/bge-base-en | 💾436MB | 218MB | 110MB 📥3 ❤️0</option>
<option value="Xenova/bge-large-en">Xenova/bge-large-en | 💾1.34GB | 669MB | 337MB 📥2 ❤️0</option>
<option value="ggrn/bge-small-en">ggrn/bge-small-en | 💾133MB | 66.7MB | 34MB 📥1 ❤️1</option>
<option value="Supabase/bge-small-en">Supabase/bge-small-en | 💾133MB | 66.7MB | 34MB 📥2 ❤️1</option>
<option value="Xenova/bge-base-zh">Xenova/bge-base-zh | 💾407MB | 204MB | 103MB 📥3 ❤️0</option>
<option value="Xenova/bge-large-zh">Xenova/bge-large-zh | 💾1.3GB | 650MB | 327MB 📥2 ❤️0</option>
<option value="Xenova/bge-large-zh-noinstruct">Xenova/bge-large-zh-noinstruct | 💾1.3GB | 650MB | 327MB 📥3 ❤️0</option>
<option value="Xenova/bge-small-zh">Xenova/bge-small-zh | 💾94.9MB | 47.5MB | 24MB 📥3 ❤️1</option>
<option value="Xenova/ClinicalBERT">Xenova/ClinicalBERT | 💾909MB | 455MB | 229MB 📥3 ❤️0</option>
<option value="Xenova/LaBSE">Xenova/LaBSE | 💾1.88GB | 941MB | 472MB 📥1 ❤️0</option>
<option value="Xenova/wavlm-base">Xenova/wavlm-base | 💾378MB | 189MB | 95.4MB 📥1 ❤️0</option>
<option value="Xenova/wavlm-base-plus">Xenova/wavlm-base-plus | 💾378MB | 189MB | 95.4MB 📥1 ❤️0</option>
<option value="Xenova/wavlm-large">Xenova/wavlm-large | 💾1.26GB | 632MB | 318MB 📥1 ❤️1</option>
<option value="Xenova/sentence-camembert-large">Xenova/sentence-camembert-large | 💾1.34GB | 672MB | 339MB 📥3 ❤️0</option>
<option value="Xenova/herbert-base-cased">Xenova/herbert-base-cased | 💾496MB | 248MB | 125MB 📥3 ❤️0</option>
<option value="Xenova/herbert-large-cased">Xenova/herbert-large-cased | 💾1.42GB | 709MB | 357MB 📥3 ❤️0</option>
<option value="Xenova/bge-large-en-v1.5">Xenova/bge-large-en-v1.5 | 💾1.34GB | 669MB | 337MB 📥4407 ❤️4</option>
<option value="Xenova/bge-base-en-v1.5">Xenova/bge-base-en-v1.5 | 💾436MB | 218MB | 110MB 📥5 ❤️6</option>
<option value="Xenova/bge-small-en-v1.5">Xenova/bge-small-en-v1.5 | 💾133MB | 66.7MB | 34MB 📥2 ❤️8</option>
<option value="Xenova/bge-large-zh-v1.5">Xenova/bge-large-zh-v1.5 | 💾1.3GB | 650MB | 327MB 📥7 ❤️3</option>
<option value="Xenova/bge-base-zh-v1.5">Xenova/bge-base-zh-v1.5 | 💾407MB | 204MB | 103MB 📥1 ❤️1</option>
<option value="Xenova/bge-small-zh-v1.5">Xenova/bge-small-zh-v1.5 | 💾94.9MB | 47.5MB | 24MB 📥2 ❤️1</option>
<option value="leolee9086/text2vec-base-chinese">leolee9086/text2vec-base-chinese | 💾103MB 📥1 ❤️0</option>
<option value="Xenova/long-t5-encodec-tglobal-base">Xenova/long-t5-encodec-tglobal-base | 💾922MB | 463MB | 291MB 📥1 ❤️0</option>
<option value="Xenova/jina-embeddings-v2-small-en">Xenova/jina-embeddings-v2-small-en | 💾130MB | 65MB | 32.8MB 📥23044 ❤️1</option>
<option value="Xenova/jina-embeddings-v2-base-en">Xenova/jina-embeddings-v2-base-en | 💾547MB | 274MB | 138MB 📥4846 ❤️7</option>
<option value="do-me/jina-embeddings-v2-base-en">do-me/jina-embeddings-v2-base-en | 💾547MB 📥1 ❤️0</option>
<option value="do-me/jina-embeddings-v2-small-en">do-me/jina-embeddings-v2-small-en | 💾130MB 📥1 ❤️0</option>
<option value="ProsESportu/mpweb">ProsESportu/mpweb | 💾436MB 📥1 ❤️0</option>
<option value="Xenova/clap-htsat-unfused">Xenova/clap-htsat-unfused | 💾619MB | 312MB | 161MB 📥1 ❤️0</option>
<option value="Xenova/tiny-random-ClapModel">Xenova/tiny-random-ClapModel | 💾14.8MB | 9.3MB | 6.34MB 📥1 ❤️0</option>
<option value="Xenova/larger_clap_general">Xenova/larger_clap_general | 💾783MB | 395MB | 205MB 📥1 ❤️0</option>
<option value="Xenova/larger_clap_music_and_speech">Xenova/larger_clap_music_and_speech | 💾783MB | 395MB | 205MB 📥1 ❤️2</option>
<option value="Xenova/conv-bert-base">Xenova/conv-bert-base | 💾423MB | 212MB | 107MB 📥1 ❤️0</option>
<option value="Xenova/conv-bert-medium-small">Xenova/conv-bert-medium-small | 💾70.4MB | 35.6MB | 18.6MB 📥1 ❤️0</option>
<option value="Xenova/electra-base-discriminator">Xenova/electra-base-discriminator | 💾436MB | 218MB | 110MB 📥1 ❤️0</option>
<option value="Xenova/conv-bert-small">Xenova/conv-bert-small | 💾53MB | 26.9MB | 14.1MB 📥2 ❤️0</option>
<option value="Xenova/electra-small-discriminator">Xenova/electra-small-discriminator | 💾54.2MB | 27.3MB | 14.2MB 📥1 ❤️0</option>
<option value="Xenova/nucleotide-transformer-500m-human-ref">Xenova/nucleotide-transformer-500m-human-ref | 💾1.92GB | 958MB | 482MB 📥1 ❤️0</option>
<option value="Xenova/nucleotide-transformer-500m-1000g">Xenova/nucleotide-transformer-500m-1000g | 💾1.92GB | 958MB | 482MB 📥1 ❤️0</option>
<option value="Xenova/hubert-base-ls960">Xenova/hubert-base-ls960 | 💾378MB | 189MB | 95.6MB 📥2 ❤️0</option>
<option value="Xenova/UAE-Large-V1">Xenova/UAE-Large-V1 | 💾1.34GB | 669MB | 337MB 📥2 ❤️2</option>
<option value="Todai/robbert-2022-dutch-sentence-transformers-onnx">Todai/robbert-2022-dutch-sentence-transformers-onnx | 💾473MB | 119MB 📥3 ❤️0</option>
<option value="odunola/UAE-Large-VI">odunola/UAE-Large-VI | 💾1.34GB | 337MB 📥8 ❤️0</option>
<option value="Xenova/tiny-random-RoFormerModel">Xenova/tiny-random-RoFormerModel | 💾6.69MB | 3.47MB | 1.88MB 📥2 ❤️0</option>
<option value="karrar-alwaili/UAE-Large-V1">karrar-alwaili/UAE-Large-V1 | 💾1.34GB | 337MB 📥2 ❤️0</option>
<option value="Cohee/jina-embeddings-v2-base-en">Cohee/jina-embeddings-v2-base-en | 💾434MB | 217MB | 109MB 📥3 ❤️1</option>
<option value="aurantium/clip-ViT-B-32-multilingual-v1">aurantium/clip-ViT-B-32-multilingual-v1 | 💾541MB | 136MB 📥1 ❤️1</option>
<option value="Xenova/w2v-bert-2.0">Xenova/w2v-bert-2.0 | 💾988kB | 1.16GB | 586MB 📥2 ❤️0</option>
<option value="Xenova/jina-embeddings-v2-base-zh">Xenova/jina-embeddings-v2-base-zh | 💾641MB | 321MB | 162MB 📥4 ❤️0</option>
<option value="Xenova/jina-embeddings-v2-base-de">Xenova/jina-embeddings-v2-base-de | 💾641MB | 321MB | 162MB 📥4 ❤️3</option>
<option value="Xenova/nomic-embed-text-v1">Xenova/nomic-embed-text-v1 | 💾548MB | 274MB | 138MB 📥2 ❤️0</option>
<option value="Xenova/nomic-embed-text-v1-unsupervised">Xenova/nomic-embed-text-v1-unsupervised | 💾548MB | 274MB | 138MB 📥1 ❤️0</option>
<option value="Xenova/nomic-embed-text-v1-ablated">Xenova/nomic-embed-text-v1-ablated | 💾548MB | 274MB | 138MB 📥1 ❤️0</option>
<option value="koxy-ai/gte-small">koxy-ai/gte-small | 💾133MB | 66.7MB | 34MB 📥1 ❤️0</option>
<option value="Xenova/tiny-random-ErnieModel">Xenova/tiny-random-ErnieModel | 💾450kB | 343kB | 316kB | 273kB | 348kB | 273kB | 273kB 📥1 ❤️0</option>
<option value="Xenova/ernie-2.0-large-en">Xenova/ernie-2.0-large-en | 💾1.34GB | 299MB | 669MB | 337MB | 318MB | 337MB | 337MB 📥1 ❤️0</option>
<option value="Xenova/ernie-2.0-base-en">Xenova/ernie-2.0-base-en | 💾436MB | 144MB | 218MB | 110MB | 149MB | 110MB | 110MB 📥1 ❤️0</option>
<option value="Xenova/ernie-health-zh">Xenova/ernie-health-zh | 💾411MB | 120MB | 206MB | 104MB | 125MB | 104MB | 104MB 📥1 ❤️0</option>
<option value="Xenova/ernie-3.0-mini-zh">Xenova/ernie-3.0-mini-zh | 💾107MB | 70.9MB | 53.8MB | 27.2MB | 71.5MB | 27.2MB | 27.2MB 📥1 ❤️0</option>
<option value="Xenova/ernie-3.0-nano-zh">Xenova/ernie-3.0-nano-zh | 💾71.4MB | 55.3MB | 35.8MB | 18.1MB | 55.6MB | 18.1MB | 18.1MB 📥2 ❤️0</option>
<option value="Xenova/ernie-3.0-micro-zh">Xenova/ernie-3.0-micro-zh | 💾93.1MB | 68.8MB | 46.6MB | 23.5MB | 69.2MB | 23.5MB | 23.5MB 📥1 ❤️0</option>
<option value="Xenova/ernie-gram-zh">Xenova/ernie-gram-zh | 💾397MB | 105MB | 199MB | 100MB | 111MB | 100MB | 100MB 📥1 ❤️0</option>
<option value="Xenova/text2vec-base-chinese-paraphrase">Xenova/text2vec-base-chinese-paraphrase | 💾470MB | 178MB | 235MB | 119MB | 183MB | 119MB | 119MB 📥1 ❤️1</option>
<option value="Xenova/text2vec-base-chinese-sentence">Xenova/text2vec-base-chinese-sentence | 💾470MB | 178MB | 235MB | 119MB | 183MB | 119MB | 119MB 📥1 ❤️0</option>
<option value="Xenova/tiny-random-ErnieMModel">Xenova/tiny-random-ErnieMModel | 💾32.3MB | 32.2MB | 16.2MB | 8.23MB | 32.2MB | 8.23MB | 8.23MB 📥1 ❤️0</option>
<option value="Xenova/GIST-small-Embedding-v0">Xenova/GIST-small-Embedding-v0 | 💾133MB | 60.1MB | 66.7MB | 34MB | 61.4MB | 34MB | 34MB 📥0 ❤️0</option>
<option value="lightbird-ai/nomic">lightbird-ai/nomic | 💾548MB | 138MB 📥10 ❤️0</option>
<option value="sirius422/multilingual-e5-large-onnx">sirius422/multilingual-e5-large-onnx | 💾546kB | 562MB 📥1 ❤️0</option>
<option value="corto-ai/nomic-embed-text-v1">corto-ai/nomic-embed-text-v1 | 💾548MB | 138MB 📥9058 ❤️0</option>
<option value="michaelfeil/jina-embeddings-v2-base-code">michaelfeil/jina-embeddings-v2-base-code | 💾642MB | 321MB | 162MB 📥22 ❤️0</option>
<option value="bdx33/stella-mrl-large-zh-v3.5-1792d">bdx33/stella-mrl-large-zh-v3.5-1792d | 💾1.3GB | 327MB 📥2 ❤️0</option>
<option value="aseovic/all-mpnet-base-v2">aseovic/all-mpnet-base-v2 | 💾436MB | 218MB | 110MB 📥0 ❤️0</option>
<option value="onnx-community/decision-transformer-gym-hopper-expert">onnx-community/decision-transformer-gym-hopper-expert | 💾4.6MB 📥0 ❤️0</option>
<option value="onnx-community/decision-transformer-gym-hopper-medium">onnx-community/decision-transformer-gym-hopper-medium | 💾4.6MB 📥0 ❤️0</option>
<option value="onnx-community/decision-transformer-gym-hopper-medium-replay">onnx-community/decision-transformer-gym-hopper-medium-replay | 💾4.6MB 📥0 ❤️0</option>
<option value="onnx-community/decision-transformer-gym-hopper-expert-new">onnx-community/decision-transformer-gym-hopper-expert-new | 💾4.6MB 📥0 ❤️0</option>
<option value="onnx-community/decision-transformer-gym-halfcheetah-expert">onnx-community/decision-transformer-gym-halfcheetah-expert | 💾4.61MB 📥0 ❤️0</option>
<option value="onnx-community/decision-transformer-gym-halfcheetah-medium">onnx-community/decision-transformer-gym-halfcheetah-medium | 💾4.61MB 📥0 ❤️0</option>
<option value="onnx-community/decision-transformer-gym-halfcheetah-medium-replay">onnx-community/decision-transformer-gym-halfcheetah-medium-replay | 💾4.61MB 📥0 ❤️0</option>
<option value="onnx-community/decision-transformer-gym-walker2d-expert">onnx-community/decision-transformer-gym-walker2d-expert | 💾4.61MB 📥0 ❤️0</option>
<option value="onnx-community/decision-transformer-gym-walker2d-medium">onnx-community/decision-transformer-gym-walker2d-medium | 💾4.61MB 📥0 ❤️0</option>
<option value="onnx-community/decision-transformer-gym-walker2d-medium-replay">onnx-community/decision-transformer-gym-walker2d-medium-replay | 💾4.61MB 📥0 ❤️0</option>
</select>
<label for="model-name">Model:</label>
</div>
</div>
<div class="col-md-2">
<div class="form-check form-switch mb-2">
<input class="form-check-input" type="checkbox" role="switch"
id="quantized" checked>
<label class="form-check-label" for="quantized"
title="Use quantized models: ~4x smaller and faster, but less accurate (good for low bandwidth)">Quantized</label>
</div>
</div>
<div class="col-md-6">
<h6>Chunking Settings</h6>
<div class="row">
<div class="col-md-6">
<div class="form-group">
<div class="form-floating">
<select class="form-select form-control"
id="split-type">
<option value="Sentence">Sentence</option>
<option value="Chars" selected># Chars</option>
<option value="Words"># Words</option>
<option value="Tokens"># Tokens</option>
<option value="Regex">Regex</option>
<option value="JinaAI">JinaAI Segmenter API</option>
</select>
<label for="split-type">Split by</label>
</div>
</div>
</div>
<div class="col-md-6">
<div class="form-group">
<div class="form-floating mb-2">
<input id="split-param" class="form-control"
type="number" min="1" value="100" />
<label for="split-param"># Chars</label>
</div>
</div>
</div>
</div>
</div>
<div class="col-md-6">
<h6>App Settings</h6>
<div class="row">
<div class="col-md-4">
<div class="form-group">
<div class="form-floating mb-2">
<input type="number" id="threshold" class="form-control"
value="20" min="1" step="1" />
<label for="threshold"># Results</label>
</div>
</div>
</div>
<div class="col-md-4">
<div class="form-group">
<div class="form-floating mb-2">
<input id="update-rate" class="form-control"
type="number" min="1" value="5" />
<label for="update-rate"># Updates</label>
</div>
</div>
</div>
<div class="col-md-4">
<div class="form-check form-switch mb-2">
<input class="form-check-input" type="checkbox" role="switch"
id="autoScrollIntoView">
<label class="form-check-label" for="autoScrollIntoView"
title="Turn on automatic scrolling to the most relevant result">Autoscroll</label>
</div>
</div>
</div>
</div>
<div class="col-md-6">
<h6>Include Words</h6>
<div class="row">
<div class="col-md-6">
<div class="form-group">
<div class="form-floating mb-2">
<input id="wordsToCheckAny" class="form-control"
type="text" value="" />
<label for="wordsToCheckAny">Any of</label>
</div>
</div>
</div>
<div class="col-md-6">
<div class="form-group">
<div class="form-floating mb-2">
<input id="wordsToCheckAll" class="form-control"
type="text" value="" />
<label for="wordsToCheckAll">All of</label>
</div>
</div>
</div>
</div>
</div>
<div class="col-md-6">
<h6>Exclude Words</h6>
<div class="row">
<div class="col-md-6">
<div class="form-group">
<div class="form-floating mb-2">
<input id="wordsToAvoidAny" class="form-control"
type="text" value="" />
<label for="wordsToAvoidAny">Any of</label>
</div>
</div>
</div>
<div class="col-md-6">
<div class="form-group">
<div class="form-floating mb-2">
<input id="wordsToAvoidAll" class="form-control"
type="text" value="" />
<label for="wordsToAvoidAll">All of</label>
</div>
</div>
</div>
</div>
</div>
<div class="col-md-6">
<h6>Import One or More PDF Files</h6>
<div class="row">
<div class="col-md-8">
<div class="d-flex mb-2 align-items-center">
<input type="file" class="form-control mr-2"
id="pdf-upload" multiple accept="application/pdf"/>
</div>
</div>
<div class="col-md-4">
<div class="d-flex mb-2 align-items-center">
<button type="button" class="btn btn-primary"
id="confirm-pdf-upload">
📂 Import</button>
</div>
</div>
</div>
</div>
<div class="col-md-6">
<h6>Import Remote PDF Files and Web Pages (space-separated) via <a href="https://corsproxy.io/" target="_blank">corsproxy.io</a></h6>
<div class="row">
<div class="col-md-8">
<div class="form-floating mb-2">
<input id="importPdfURL" class="form-control w-100" type="text"
value="https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf" />
<label for="importPdfURL">Import URL(s)</label>
</div>
</div>
<div class="col-md-4">
<div class="d-flex mb-2 align-items-center">
<button type="button" class="btn btn-primary"
id="confirm-remote-pdf-upload">
⬇️ Import</button>
</div>
</div>
</div>
</div>
<div class="col-md-6">
<h6>Import Local Index File</h6>
<div class="row">
<div class="col-md-8">
<div class="d-flex mb-2 align-items-center">
<input type="file" class="form-control mr-2"
id="file-upload" />
</div>
</div>
<div class="col-md-4">
<div class="d-flex mb-2 align-items-center">
<button type="button" class="btn btn-primary"
id="confirm-upload">
📂 Import</button>
</div>
</div>
</div>
</div>
<div class="col-md-6">
<h6>Import Remote Index File (<a href="https://huggingface.co/datasets/do-me/SemanticFinder#catalogue" target="_blank">Examples</a>)</h6>
<div class="row">
<div class="col-md-8">
<div class="form-floating mb-2">
<input id="importURL" class="form-control w-100" type="text"
value="https://huggingface.co/datasets/do-me/SemanticFinder/resolve/main/Hansel_and_Gretel_4de079eb.json.gz" />
<label for="importURL">Import URL</label>
</div>
</div>
<div class="col-md-4">
<div class="d-flex mb-2 align-items-center">
<button type="button" class="btn btn-primary"
id="confirm-remote-upload">
⬇️ Import</button>
</div>
</div>
</div>
</div>
<h6>Export Index File</h6>
<div class="col-md-3">
<div class="form-floating mb-2">
<input id="textTitle" class="form-control w-100" type="text"
value="Hansel and Gretel" />
<label for="textTitle">Title</label>
</div>
</div>
<div class="col-md-3">
<div class="form-floating mb-2">
<input id="textAuthor" class="form-control w-100" type="text"
value="Brothers Grimm" />
<label for="textAuthor">Author</label>
</div>
</div>
<div class="col-md-2">
<div class="form-floating mb-2">
<input id="textYear" class="form-control w-100" type="number"
min="1" value="1812" />
<label for="textYear"># Year</label>
</div>
</div>
<div class="col-md-2">
<div class="form-floating mb-2">
<input id="textLanguage" class="form-control w-100" type="text"
value="en" />
<label for="textLanguage">Language (en)</label>
</div>
</div>
<div class="col-md-2">
<div class="d-flex mb-2 align-items-center">
<button type="button" class="btn btn-primary" id="resetMetadata">🔄
Metadata</button>
</div>
</div>
<div class="col-md-4">
<div class="form-floating mb-2">
<input id="textSourceURL" class="form-control w-100" type="text"
value="https://www.grimmstories.com/en/grimm_fairy-tales/hansel_and_gretel" />
<label for="textSourceURL">Source URL</label>
</div>
</div>
<div class="col-md-4">
<div class="form-floating mb-2">
<input id="textNotes" class="form-control w-100" type="text"
value="" />
<label for="textNotes">Notes</label>
</div>
</div>
<div class="col-md-2">
<div class="form-floating mb-2">
<input id="exportDecimals" class="form-control w-100" type="number"
min="1" value="5" />
<label for="exportDecimals"># Emb. Decimals</label>
</div>
</div>
<!-- Button for downloading only the index. However, since the index
makes up 99% of the file size, I don't see a good use case yet... -->
<div class="col-md-3" hidden>
<button type="button" id="exportEmbeddingsDict"
class="btn btn-primary w-100">Index only</button>
</div>
<div class="col-md-2">
<div class="d-flex mb-2 align-items-center">
<button type="button" class="btn btn-primary"
id="exportEmbeddingsDictWithText">💾 Export</button>
</div>
</div>
<div class="col-md-6">
<h6>Style Preferences</h6>
<div class="row">
<div class="col-md-8"
title="Monospace | Calibri | Open Sans | Arial | Arial Black | Comic Sans MS | Courier New | Georgia | Impact | Times New Roman | Trebuchet MS | Verdana | Tahoma | Lucida Console | Lucida Sans Unicode | Palatino Linotype | Book Antiqua | Palatino | Symbol | Wingdings">
<div class="form-floating mb-2">
<input id="font-family" class="form-control w-100" type="text"
value="Open Sans" />
<label for="font-family">Font-Family</label>
</div>
</div>
<div class="col-md-4">
<div class="form-floating mb-2">
<input id="font-size" class="form-control w-100" type="number"
min="1" value="15" />
<label for="font-size"># Font-Size</label>
</div>
</div>
</div>
</div>
<div class="col-md-6">
<h6>Experimental Expert Settings (best to leave the defaults)</h6>
<div class="row">
<div class="col-md-4">
<div class="form-check form-switch mb-2">
<input class="form-check-input" type="checkbox" role="switch"
id="inferencingActive" checked>
<label class="form-check-label" for="inferencingActive"
title="When disabled, use only existing embeddings from the index and run inference on nothing but the user query">Inferencing</label>
</div>
</div>
<div class="col-md-4">
<div class="form-check form-switch mb-2">
<input class="form-check-input" type="checkbox" role="switch"
id="firstOnly">
<label class="form-check-label" for="firstOnly"
title="Highlight only first match">First match only</label>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<!-- progress bar -->
<div class="col-12">
<div class="progress" role="progressbar" aria-label="Loading model" aria-valuenow="0"
aria-valuemin="0" aria-valuemax="100">
<div class="progress-bar bg-secondary" id="loading-progress" style="width: 0;">""</div>
</div>
</div>
<div>
<div class="progress" role="progressbar" aria-label="Progress Bar" aria-valuenow="0"
aria-valuemin="0" aria-valuemax="100" style="height: 75%">
<div id="progressBarProgress" class="progress-bar progress-bar-striped"
style="width: 0%;">""
</div>
</div>
</div>
<!-- text box -->
<div class="form-floating">
<textarea id="input-text" class="form-control"
placeholder="Enter">Near a great forest there lived a poor woodcutter and his wife, and his two children; the boy's name was Hansel and the girl's Grethel. They had very little to bite or to sup, and once, when there was great dearth in the land, the man could not even gain the daily bread. As he lay in bed one night thinking of this, and turning and tossing, he sighed heavily, and said to his wife, "What will become of us? we cannot even feed our children; there is nothing left for ourselves."
"I will tell you what, husband," answered the wife; "we will take the children early in the morning into the forest, where it is thickest; we will make them a fire, and we will give each of them a piece of bread, then we will go to our work and leave them alone; they will never find the way home again, and we shall be quit of them."
"No, wife," said the man, "I cannot do that; I cannot find in my heart to take my children into the forest and to leave them there alone; the wild animals would soon come and devour them." - "O you fool," said she, "then we will all four starve; you had better get the coffins ready," and she left him no peace until he consented. "But I really pity the poor children," said the man.
The two children had not been able to sleep for hunger, and had heard what their step-mother had said to their father. Grethel wept bitterly, and said to Hansel, "It is all over with us."
"Do be quiet, Grethel," said Hansel, "and do not fret; I will manage something." And when the parents had gone to sleep he got up, put on his little coat, opened the back door, and slipped out. The moon was shining brightly, and the white flints that lay in front of the house glistened like pieces of silver. Hansel stooped and filled the little pocket of his coat as full as it would hold. Then he went back again, and said to Grethel, "Be easy, dear little sister, and go to sleep quietly; God will not forsake us," and laid himself down again in his bed. When the day was breaking, and before the sun had risen, the wife came and awakened the two children, saying, "Get up, you lazy bones; we are going into the forest to cut wood." Then she gave each of them a piece of bread, and said, "That is for dinner, and you must not eat it before then, for you will get no more." Grethel carried the bread under her apron, for Hansel had his pockets full of the flints. Then they set off all together on their way to the forest. When they had gone a little way Hansel stood still and looked back towards the house, and this he did again and again, till his father said to him, "Hansel, what are you looking at? take care not to forget your legs."
"O father," said Hansel, "I am looking at my little white kitten, who is sitting up on the roof to bid me good-bye." - "You young fool," said the woman, "that is not your kitten, but the sunshine on the chimney-pot." Of course Hansel had not been looking at his kitten, but had been taking every now and then a flint from his pocket and dropping it on the road. When they reached the middle of the forest the father told the children to collect wood to make a fire to keep them warm; and Hansel and Grethel gathered brushwood enough for a little mountain; and it was set on fire, and when the flame was burning quite high the wife said, "Now lie down by the fire and rest yourselves, you children, and we will go and cut wood; and when we are ready we will come and fetch you."
So Hansel and Grethel sat by the fire, and at noon they each ate their pieces of bread. They thought their father was in the wood all the time, as they seemed to hear the strokes of the axe: but really it was only a dry branch hanging to a withered tree that the wind moved to and fro. So when they had stayed there a long time their eyelids closed with weariness, and they fell fast asleep.
When at last they woke it was night, and Grethel began to cry, and said, "How shall we ever get out of this wood?" But Hansel comforted her, saying, "Wait a little while longer, until the moon rises, and then we can easily find the way home." And when the full moon got up Hansel took his little sister by the hand, and followed the way where the flint stones shone like silver, and showed them the road. They walked on the whole night through, and at the break of day they came to their father's house. They knocked at the door, and when the wife opened it and saw that it was Hansel and Grethel she said, "You naughty children, why did you sleep so long in the wood? we thought you were never coming home again!" But the father was glad, for it had gone to his heart to leave them both in the woods alone.
Not very long after that there was again great scarcity in those parts, and the children heard their mother say at night in bed to their father, "Everything is finished up; we have only half a loaf, and after that the tale comes to an end. The children must be off; we will take them farther into the wood this time, so that they shall not be able to find the way back again; there is no other way to manage." The man felt sad at heart, and he thought, "It would be better to share one's last morsel with one's children." But the wife would listen to nothing that he said, but scolded and reproached him. He who says A must say B too, and when a man has given in once he has to do it a second time.</textarea>
</div>
<!-- navigation buttons -->
<div class="text-center" id="submitGroup">
<button id="prev" class="btn btn-md btn-secondary mb-2 nav-button" disabled>👈
prev
</button>
<button id="next" class="btn btn-md btn-secondary mb-2 nav-button" disabled>next
👉
</button>
</div>
</div>
</form>
</div>
<div class="col-sm-3"> <!-- 20% column -->
<div id="results">
<ul id="results-list"></ul>
</div>
</div>
<div class="row justify-content-center">
<div class="col-12">
<div>
<hr />
<h4>Dimensionality Reduction (New🔥) </h4>
<p>Run a search as usual or load an index, then hit "Dim-Reduction" in the advanced settings.
More iterations yield better results but take longer to compute. If the points are too
small, increase the radius. Uses a fast WebAssembly implementation of Barnes-Hut t-SNE (<a
href="https://github.com/Lv-291/wasm-bhtsne" target="_blank">wasm-bhtSNE</a>).</p>
<div class="d-flex flex-row">
<button type="button" class="btn btn-md btn-primary mb-2 nav-button"
id="dimensionalityReduction" disabled>🌌 Dim-Reduction</button>
<div class="row">
<div class="col-md-3">
<div class="form-floating mb-2">
<input id="dimReductionIterations" class="form-control w-100" type="number"
min="1" value="1000" />
<label for="dimReductionIterations"># Iterations</label>
</div>
</div>
<div class="col-md-3">
<div class="form-floating mb-2">
<input id="scatterplotRadius" class="form-control w-100" type="number" min="1"
value="800" />
<label for="scatterplotRadius"># Radius</label>
</div>
</div>
<div class="col-md-4">
<div class="form-floating mb-2">
<input id="dimensionalityReductionSimilarityThreshold" class="form-control w-100" type="number" min="0" max="1" step="0.01" value="0" />
<label for="dimensionalityReductionSimilarityThreshold"># Similarity Threshold</label>
</div>
</div>
</div>
</div>
<div id="plot-container">
<canvas id="deckgl"></canvas>
</div>
</div>
</div>
</div>
<div class="row justify-content-center">
<div class="col-12">
<div>
<hr />
<h4>Chat</h4>
<p>Enter a question to be answered and use the placeholders <b>SEARCH_RESULTS</b> or <b>FULL_TEXT</b> for context (Retrieval Augmented Generation, RAG).<br> If you encounter errors, the input is probably
too long (too many results, results that are too long, or a prompt that is too long). Also, make
sure to use the right prompting style for the selected model! Xenova/Qwen1.5-1.8B-Chat is by far the best
quantized model currently available and delivers good results. At some point <a href="https://github.com/xenova/transformers.js/pull/379"
target="_blank">Falcon &amp; Mistral/Zephyr models</a> will probably become available
here.<br><b>Attention</b>: Loads very large models with
more than 1.5 GB (!) of resources.</p>
<div class="form-floating input-group mb-2">
<input id="chat_query" class="form-control"
value="Based on the following context, answer the question: What are these paragraphs about? Context: SEARCH_RESULTS">
<label for="chat_query">Chat Query</label>
</div>
<div class="d-flex flex-row">
<button id="get_chat" class="btn btn-md btn-primary mb-2 nav-button">💬 Chat</button>
<div class="row">
<div class="col-md-8">
<div class="form-floating mb-2">
<select class="form-select form-control" id="chat-model-name">
<option selected value="Xenova/Qwen1.5-0.5B-Chat">Xenova/Qwen1.5-0.5B-Chat (0.482 GB)</option>
<option value="Xenova/Qwen1.5-1.8B-Chat">Xenova/Qwen1.5-1.8B-Chat (1.87 GB)</option>
<option value="Xenova/LaMini-Flan-T5-783M">
Xenova/LaMini-Flan-T5-783M | 💾1.65 GB 📥17 ❤️20</option>
<option value="Xenova/t5-small">Xenova/t5-small | 💾81 MB 📥2
❤️2</option>
<option value="Xenova/flan-t5-small">Xenova/flan-t5-small</option>
<option value="Xenova/LaMini-Flan-T5-248M">Xenova/LaMini-Flan-T5-248M
</option>
<option value="Xenova/LaMini-Flan-T5-77M">Xenova/LaMini-Flan-T5-77M</option>
<option value="Xenova/LaMini-T5-61M">Xenova/LaMini-T5-61M</option>
<option value="Xenova/LaMini-T5-738M">Xenova/LaMini-T5-738M</option>
<option value="Xenova/LaMini-T5-223M">Xenova/LaMini-T5-223M</option>
<option value="Xenova/mt5-small">Xenova/mt5-small</option>
<option value="Xenova/mt5-base">Xenova/mt5-base</option>
<option value="Xenova/t5-base">Xenova/t5-base</option>
<option value="Xenova/t5-v1_1-base">Xenova/t5-v1_1-base</option>
<option value="Xenova/flan-t5-base">Xenova/flan-t5-base</option>
<option value="Xenova/t5-v1_1-small">Xenova/t5-v1_1-small</option>
<option value="Xenova/blenderbot-400M-distill">
Xenova/blenderbot-400M-distill</option>
<option value="Xenova/blenderbot_small-90M">Xenova/blenderbot_small-90M
</option>
<option value="Xenova/long-t5-tglobal-base">Xenova/long-t5-tglobal-base
</option>
<option value="Xenova/long-t5-local-base">Xenova/long-t5-local-base</option>
<option value="Xenova/long-t5-tglobal-base-16384-book-summary">
Xenova/long-t5-tglobal-base-16384-book-summary</option>
</select>
<label for="chat-model-name">Model:</label>
</div>
</div>
<div class="col-md-4">
<div class="form-floating mb-2">
<input type="number" id="chat_max_new_tokens" class="form-control" value="10000"
min="1" step="1" />
<label for="chat_max_new_tokens"># max new tokens</label>
</div>
</div>
</div>
</div>
<div id="chat_text" class="ml-2"></div>
<!-- progress bar -->
<div class="col-12">
<div class="progress" role="progressbar" aria-label="Loading model" aria-valuenow="0"
aria-valuemin="0" aria-valuemax="100">
<div class="progress-bar bg-secondary" id="chat-progress" style="width: 0;">""</div>
</div>
</div>
<div>
<div class="progress" role="progressbar" aria-label="Progress Bar" aria-valuenow="0"
aria-valuemin="0" aria-valuemax="100" style="height: 75%">
<div id="progressBarChat" class="progress-bar progress-bar-striped" style="width: 0%;">
""
</div>
</div>
</div>
</div>
</div>
<div class="col-12">
<div>
<hr />
<h4>Ollama Chat Integration (New🔥)</h4>
<p>Enter a question to be answered and use the placeholders <b>SEARCH_RESULTS</b> or <b>FULL_TEXT</b> for context.<br> Install <a href="https://ollama.com/" target="_blank">Ollama</a> locally on macOS, Linux or Windows and connect your server (currently only the default http://localhost:11434 is supported).<br>
Make sure to set the <code>OLLAMA_ORIGINS</code> environment variable so that requests from SemanticFinder are allowed:<br>
- on Windows PowerShell: <code>$env:OLLAMA_ORIGINS="https://do-me.github.io"; ollama serve</code><br>
- on Ubuntu: <code>OLLAMA_ORIGINS="https://do-me.github.io" ollama serve</code><br>
Due to <a href="https://github.com/ollama/ollama/issues/669" target="_blank">CORS issues</a>, this currently only works in Chromium-based browsers like Chrome and Edge.</p>
<div class="form-floating input-group mb-2">
<input id="ollama_chat_query" class="form-control"
value="Based on the following context, answer the question: What is this text about? Context: FULL_TEXT">
<label for="ollama_chat_query">Chat Query</label>
</div>
<div class="d-flex flex-row">
<button id="ollama_get_chat" class="btn btn-md btn-primary mb-2 nav-button">🦙 Chat</button>
<div class="row">
<div class="col-md-4" hidden>
<div class="form-floating mb-2">
<input type="text" id="ollama_chat_server" class="form-control" value="http://localhost:11434"/>
<label for="ollama_chat_server">Ollama Server</label>
</div>
</div>
<div class="col-md-8">
<div class="form-floating mb-2">
<input type="text" id="ollama_chat_model" class="form-control" value="llama2"/>
<label for="ollama_chat_model">Model</label>
</div>
</div>
<div class="col-md-4" hidden>
<div class="form-floating mb-2">
<input type="number" id="ollama_chat_max_new_tokens" class="form-control" value="100"
min="1" step="1" />
<label for="ollama_chat_max_new_tokens"># max new tokens</label>
</div>
</div>
</div>
</div>
<div id="ollama_chat_text" class="ml-2"></div>
</div>
</div>
<div class="col-12">
<div>
<hr />
<h4>Summary (Retrieval Augmented Generation, RAG)</h4>
<p>Summarizes the top search results. Works best with non-fiction texts and longer text
chunks (&gt;200 characters).<br><b>Attention</b>: Loads very large models of several hundred MB!</p>
<br>
<div class="d-flex flex-row">
<button id="get_summary" class="btn btn-md btn-primary mb-2 nav-button"
disabled>📝 Summarize</button>
<div class="row">
<div class="col-md-8">
<div class="form-floating mb-2">
<select class="form-select form-control" id="summary-model-name">
<option value="Xenova/distilbart-cnn-6-6">Xenova/distilbart-cnn-6-6</option>
<option value="Xenova/bart-large-cnn">Xenova/bart-large-cnn</option>
<option value="ahmedaeb/distilbart-cnn-6-6-optimised">
ahmedaeb/distilbart-cnn-6-6-optimised</option>
<option value="Xenova/distilbart-xsum-12-1">Xenova/distilbart-xsum-12-1
</option>
<option value="Xenova/distilbart-xsum-6-6">Xenova/distilbart-xsum-6-6
</option>
<option value="Xenova/distilbart-xsum-12-3">Xenova/distilbart-xsum-12-3
</option>
<option value="Xenova/distilbart-xsum-9-6">Xenova/distilbart-xsum-9-6
</option>
<option selected value="Xenova/distilbart-xsum-12-6">
Xenova/distilbart-xsum-12-6</option>
<option value="Xenova/distilbart-cnn-12-3">Xenova/distilbart-cnn-12-3
</option>
<option value="Xenova/distilbart-cnn-12-6">Xenova/distilbart-cnn-12-6
</option>
<option value="Xenova/bart-large-xsum">Xenova/bart-large-xsum</option>
</select>
<label for="summary-model-name">Model:</label>
</div>
</div>
<div class="col-md-4">
<div class="form-floating mb-2">
<input type="number" id="summary_max_new_tokens" class="form-control"
value="100" min="1" step="1" />
<label for="summary_max_new_tokens"># max new tokens</label>
</div>
</div>
</div>
</div>
<div class="d-flex align-items-center flex-row">
<p id="summary_text" class="ml-2"></p>
</div>
<!-- progress bar -->
<div class="col-12">
<div class="progress" role="progressbar" aria-label="Loading model" aria-valuenow="0"
aria-valuemin="0" aria-valuemax="100">
<div class="progress-bar bg-secondary" id="summary-progress" style="width: 0;">""</div>
</div>
</div>
<div>
<div class="progress" role="progressbar" aria-label="Progress Bar" aria-valuenow="0"
aria-valuemin="0" aria-valuemax="100" style="height: 75%">
<div id="progressBarSummary" class="progress-bar progress-bar-striped"
style="width: 0%;">""
</div>
</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
<div id="tooltip" class="tooltip"></div>
</body>
</html>