There are two strings being embedded: the first one goes through, but the second one fails.
Here is the output of the program:
/usr/local/go/bin/go tool test2json -t /home/alok/.cache/JetBrains/GoLand2024.2/tmp/GoLand/___1TestEmbedding_in_tokenizerissue.test -test.v=test2json -test.paniconexit0 -test.run ^\QTestEmbedding\E$
2024/09/17 15:19:17 INFO: CachedDir="/home/alok/.cache/tokenizer"
=== RUN TestEmbedding
AS YOU LIKE IT
DRAMATIS PERSONAE
DUKE SENIOR living in banishment.
DUKE FREDERICK his brother, an usurper of his dominions.
AMIENS | | lords attending on the banished duke. JAQUES |
LE BEAU a courtier attending upon Frederick.
CHARLES wrestler to Frederick.
OLIVER | | JAQUES (JAQUES DE BOYS:) | sons of Sir Rowland de Boys. | ORLANDO |
ADAM | | servants to Oliver. DENNIS |
TOUCHSTONE a clown.
SIR OLIVER MARTEXT a vicar.
CORIN |
| shepherds.
SILVIUS |
WILLIAM a country fellow in love with Audrey.
A person representing HYMEN. (HYMEN:)
ROSALIND daughter to the banished duke.
CELIA daughter to Frederick.
673
AS YOU LIKE IT
DRAMATIS PERSONAE
DUKE SENIOR living in banishment.
DUKE FREDERICK his brother, an usurper of his dominions.
AMIENS | | lords attending on the banished duke. JAQUES |
LE BEAU a courtier attending upon Frederick.
CHARLES wrestler to Frederick.
OLIVER | | JAQUES (JAQUES DE BOYS:) | sons of Sir Rowland de Boys. | ORLANDO |
ADAM | | servants to Oliver. DENNIS |
TOUCHSTONE a clown.
SIR OLIVER MARTEXT a vicar.
CORIN |
| shepherds.
SILVIUS |
WILLIAM a country fellow in love with Audrey.
A person representing HYMEN. (HYMEN:)
ROSALIND daughter to the banished duke.
CELIA daughter to Frederick.
673
[[-0.03914698 -0.016538322 0.023178136 0.011642908 0.0564912 0.007894647 0.025433742 0.035716955 -0.019091763 -0.020351494 -0.01514089 0.012821367 -0.054411728 0.032000016 -0.018131282 0.08451082 0.049022898 0.017406626 0.017041788 -0.04623328 0.035323188 0.032879945 -0.00037763309 0.06970637 0.043458235 0.016123137 0.018936096 0.02063325 -0.05700099 -0.025971899 0.035846584 -0.011900318 -0.019333359 -0.02900332 -0.008514504 -0.04624489 -0.014379387 -0.020486495 0.011699142 0.006409424 -0.07082859 -0.013937539 0.005432031 -0.005396163 -0.06794449 -0.010348638 -0.036275588 0.017526548 -0.0058838953 -0.024287518 -0.048917953 0.039226435 -0.009778961 0.012310327 -0.023821943 0.022571076 -0.032029998 -0.04378382 0.028337928 -0.05277187 0.0017444869 0.006742158 0.0372529 0.0010056419 -0.00060814136 0.006685288 1.7267339e-05 0.06339974 -0.059310623 -0.021789877 -0.015668927 -0.0010967483 0.0025227757 -0.008613313 0.027283266 -0.063864276 -0.004863343 0.027290193 0.056868467 0.023454076 -0.016763298 -0.0054399003 0.006188793 0.048575606 -0.026876703 -0.03610761 -0.00043335767 0.0003789369 -0.081276685 0.039349385 -0.008940972 -0.049587313 0.03761981 0.042609006 -0.0079110265 -0.0035721026 0.051356286 -0.0044129933 0.0015475124 0.021560663 -0.031360585 -0.038248923 0.016443029 -0.00034044942 -0.09568959 -0.01081435 -0.009083789 0.03689808 0.0075237486 0.027368983 -0.005172214 -0.009384759 0.009782874 0.021921514 -0.045917857 0.06161481 0.043410208 -0.013445491 -0.0077494937 0.023404986 0.06215935 0.020469893 -0.0072953156 0.10336666 0.024090728 0.020731067 -0.011793335 0.05349762 -0.013003309 -0.06273616 0.005801809 0.05778524 -0.01770478 -0.010948908 -0.001877506 -0.042103637 0.014173885 -0.0043255757 0.035545927 0.010555955 0.0022608016 -0.01118194 0.013429064 0.020862108 0.043351796 -0.030716933 -0.017455425 -0.041980993 0.012616989 0.048666902 -0.012770984 0.031477388 0.030343024 -0.036949676 0.012917157 0.058238946 0.029963208 0.026458332 -0.021398345 0.02069551 
0.03828373 0.027362816 0.01144157 0.01773172 0.022156883 -0.010513427 0.0060478807 0.01718434 -0.04091684 0.021870496 -0.072977014 -0.016520886 0.06187403 -0.041432407 -0.018332282 0.039845582 0.09559854 0.041445937 0.04684613 -0.0069881086 -0.07639642 0.040348083 0.02335101 -0.0046012397 0.026520465 -0.007890131 0.052469924 0.010194702 0.014858035 0.03759209 -0.06054353 -0.064428255 -0.02382746 0.0030103163 0.045369398 -0.019959413 -0.004659652 0.03708061 -0.0038971528 0.06279973 -0.012340761 0.020642685 0.04060641 -0.006453256 -0.061737575 0.018255593 -0.001301643 0.024874456 0.032203175 -0.011828143 -0.03851288 -0.012062547 0.0664155 -0.049623474 0.02326029 -0.015502906 0.052531365 -0.037508376 0.017440705 -0.0735822 0.025554365 -0.012990718 -0.041517846 0.023062894 -0.024853319 0.100043885 0.056865305 -0.05963884 -0.04027259 0.024926204 -0.01888787 -0.025096748 -0.0013074251 -0.01325122 -0.010748644 0.011728527 0.004855801 -0.046975892 0.03411985 -0.056537498 0.0056181317 0.053715814 0.011858979 0.079618104 0.017376544 0.01665108 0.034709867 0.0006663871 -0.056170613 -0.02711519 -0.0014543701 -0.03524299 0.0075247423 -0.022341667 0.008779559 -0.05686332 -0.032249086 0.049802165 0.03996286 0.05161114 -0.042233754 0.014376176 -0.016475571 -0.018463984 -0.013941526 -0.036131732 -0.037772164 -0.012133741 0.033861097 -0.005092063 0.02904495 -0.002741557 -0.0012500169 0.004346772 0.005123095 8.7455184e-05 0.072036505 0.00032354923 0.024320055 -0.039208207 -0.01390895 0.074305646 -0.047137924 -0.03887461 0.001901088 -0.10452757 0.03541344 -0.051450636 -0.039202023 0.0037135687 0.038421314 0.037239667 0.030913997 0.00741533 0.03195537 0.00699422 0.0046604634 0.035519995 -0.015194695 -0.0059102173 -0.0125123635 -0.0060820356 0.013914759 -0.0015158656 0.02122563 -0.02741586 0.0085247895 -0.031034654 -0.26160786 0.0021040207 0.034374084 -0.040845644 0.049236394 -0.019883346 0.040674936 -0.03596126 -0.05063188 0.035107706 -0.029123846 -0.0457412 0.010796176 0.042915713 
0.039227314 0.015756665 -0.018854285 -0.045812745 -0.009172312 0.037980657 -0.021215655 -0.054714758 -0.030052204 0.024671923 0.025940834 0.059799552 -0.050287012 -0.0030677565 -0.086370826 -0.03388499 0.0021782645 0.0038881723 0.033259008 -0.015950117 -0.0035676898 -0.041105423 0.04366649 -0.0068972823 -0.021965034 0.0011830716 -0.02629944 -0.044561606 -0.023651939 0.009472122 0.05867902 -0.016693924 -0.06414829 -0.0066306265 -0.03840866 0.06758065 -0.02788127 -0.011591923 -0.005063631 0.002926456 -0.0056525413 0.0028303913 -0.0055221547 0.01315445 -0.06290255 -0.04398332 -0.012094841 -0.04480738 -0.041383084 -0.023820942 -0.008420631 -0.057843395 -0.04899028 -0.01342248 0.09446904 0.038170658 -0.040533535 -0.007015521 0.01192337 -0.08365915 0.0017968907 -0.0025380466 -0.009427974 -0.009932006 0.00026163805 0.02594476 -0.030752674 -0.026657093 0.021098124 0.008863274 -0.006488929 -0.03697985 0.023044562 -0.02419331 -0.036591124 -0.024120301 0.06960363 -0.010372081 -0.025158368 -0.013693026 0.01300504 0.02227767 -0.0015247545 -0.015730513 0.0238872 0.01825556 0.0370508 -0.074274346 0.033484608 -0.0060399654 0.0067823497 -0.0060035777 -0.05207626 0.029591309 0.03991352 0.017776724 0.056803543 -0.0036727912 0.034457386 -0.046009373 0.00023433224 -0.071260884 0.02851205 0.07166555 0.0063079665 0.038949873 -0.05573132 0.041894786 -0.036953613 -0.020935554 -0.0922639 -0.012961587 0.009381917 -0.011597907 -0.019261444 -0.00639877 -0.004511787 -0.0033551345 0.027393656 -0.024261534 0.017353045 0.00080475234 -0.05555794 -0.052705985 -0.0014381633 0.0018840844 0.021800129 0.010827761 -0.0063026166 0.03353285 0.044599503 0.0077848737 -0.0029283045 -0.00039049625 0.018483976 0.035987716 0.005219433 0.003641071 0.030400632 -0.059704714 -0.021531524 -0.032892182 0.013581656 -0.006007797 0.008786557 -0.02286594 -0.02111237 -0.04407928 -0.025530605 -0.0068782447 0.0074550346 0.062660806 -0.010601268 -0.010685531 -0.015256402 0.019312108 0.025710458 0.014006963 -0.045301154 
0.01740028 -0.009736621 0.0066993353 -0.022136973 0.013612366 0.05849686 0.029680526 0.001417695 -0.03254062 -0.0018819447 0.0041718297 0.06276969 0.035705727 0.005127659 -0.06511382 -0.0036923448 -0.0047796667 -0.0006886609 0.028202135 -0.03349943 0.013126994 -0.057374008 -0.07305299 0.02789206 -0.0026524563 0.024118802 0.010876676 0.016884591 -0.006562245 -0.04699496 0.028407542 0.043413766 -0.072359815 0.061121542 0.0023021614 -0.009506745 0.017742652 -0.011882974 -0.051569894 -0.0032277745 0.013072393 0.0252644 -0.06367772 -0.012006346 -0.039752934 0.016992357 -0.01946568 0.017556485 -0.039766937 -0.015146741 0.0043553817 -0.03300536 0.041409392 -0.029696869 -0.034427825 0.03265753 -0.033445444 0.029599441 -0.015332254 0.0038055116 0.04395136 -0.019857742 -0.0037471876 -0.019987168 -0.027075827 0.0051693665 0.057406757 0.033968635 0.018858982 -0.032702416 -0.02568262 -0.015521807 0.02559059 0.011727608 -0.017817227 0.0022101407 0.04306708 0.0001521992 -0.002650939 -0.021742256 -0.012054737 0.068472214 -0.047306042 -0.014674873 0.017066197 -0.051577978 0.030212536 0.002544334 0.02917181 -0.019093212 0.02930066 0.05152553 0.009152614 0.029787736 0.0011963875 0.052472897 -0.0361598 0.00010058674 -0.06904818 0.016232267 -0.0039677448 0.011245551 0.013937295 -0.015575298 -0.046503574 0.06782438 -0.08391851 -0.026548455 0.04568559 -0.030084113 0.010012481 0.020641306 -0.069049835 0.0027308327 0.021092122 -0.03908603 0.0064549767 0.014999664 0.052215375 0.0031571654 0.02453982 0.015449896 -0.009599123 0.054865893 0.038270622 0.008379506 0.05169393 -0.0635431 0.05361065 0.027451267 -0.02504078 -0.0318296 0.021326253 -0.008771796 -0.07166529 0.0046098814 0.008210814 -0.012494197 -0.07983677 0.0322951 0.016638167 -0.027372014 -0.04498509 -0.0115331495 -0.026469693 -0.03370635 0.000676141 0.011307931 -0.011655599 0.06414379 0.018598035 0.025064886 0.063107245 -0.017471809 0.037015863 -0.0041355346 0.09167845 0.06278827 0.049575448 -0.032504965 0.094415836 -0.0070365896 
-0.06828078 0.03029201 0.03385621 -0.023417555 -0.019534213 0.008425382 0.058012586 0.0021701755 0.050336093 -0.013609865 -0.011643509 -0.0058129276 -0.0142343035 0.04619372 0.015765378 0.028137436 0.038674865 0.018905077 -0.06938297 0.039243255 0.020575562 -0.027785309 0.0044124466 -0.041977398 0.033078786 0.0023755538 0.0013827555 0.080165684 0.021713875 -0.008895852 0.010854239 0.030240793 0.010076886 -0.0068736626 -0.010659401 0.0091342125 -0.016192537 -0.03269065 0.0015859033 0.014045188 -0.005773467 0.025777139 -0.03233787 0.0020606334 0.022983052 0.036939822 -0.043826174 -0.04531051 -0.052388918 -0.048537176 -0.05221436 -0.023132278 -0.008065607 -0.041005827 -0.048821874 -0.018616289 -0.036834672 -0.0131818615 0.00032311416 -0.0608724 -0.0473172 0.017388172 0.03620469 0.016872536 0.009612658 0.06283182 0.0266591 -0.0407606 -0.018680993 0.009808718 0.045869667 0.0017224478 0.020221831 -0.106909215 0.032913286 0.045634817 -0.011272518 -0.07594389 0.03301969 -0.014931814 -0.03439635 0.051964276 0.014607602 -0.0019748472 -0.031476032 -0.014223328 0.0025003545 0.010445406 0.049866706 -0.060485397 0.08876377 0.033138666 0.01942703 -0.052508734 0.015518047 0.0050181053 0.023438185 -0.06435748 -0.007261127 -0.009940068 -0.08559045 -0.02445086 0.01683098 -0.041163374 -0.044273637 0.017937073 -0.023909848 0.0026623239 0.019933624 -0.022201682 -0.029950371 -0.032257035 -0.0068081166 -0.043268044 0.032621004 0.02144448 -0.0013739939 0.019817922 -0.052019957 -0.0036603028 -0.009124586 -0.009007775 0.01633006 0.0038869274 0.010353903]]
AS YOU LIKE IT
DRAMATIS PERSONAE
DUKE SENIOR living in banishment.
DUKE FREDERICK his brother, an usurper of his dominions.
AMIENS | | lords attending on the banished duke. JAQUES |
LE BEAU a courtier attending upon Frederick.
CHARLES wrestler to Frederick.
OLIVER | | JAQUES (JAQUES DE BOYS:) | sons of Sir Rowland de Boys. | ORLANDO |
ADAM | | servants to Oliver. DENNIS |
TOUCHSTONE a clown.
SIR OLIVER MARTEXT a vicar.
CORIN |
| shepherds.
SILVIUS |
WILLIAM a country fellow in love with Audrey.
A person representing HYMEN. (HYMEN:)
ROSALIND daughter to the banished duke.
CELIA daughter to Frederick.
PHEBE a shepherdess.
AUDREY a country wench.
Lords, pages, and attendants, &c. (Forester:) (A Lord:) (First Lord:) (Second Lord:) (First Page:) (Second Page:)
835
AS YOU LIKE IT
DRAMATIS PERSONAE
DUKE SENIOR living in banishment.
DUKE FREDERICK his brother, an usurper of his dominions.
AMIENS | | lords attending on the banished duke. JAQUES |
LE BEAU a courtier attending upon Frederick.
CHARLES wrestler to Frederick.
OLIVER | | JAQUES (JAQUES DE BOYS:) | sons of Sir Rowland de Boys. | ORLANDO |
ADAM | | servants to Oliver. DENNIS |
TOUCHSTONE a clown.
SIR OLIVER MARTEXT a vicar.
CORIN |
| shepherds.
SILVIUS |
WILLIAM a country fellow in love with Audrey.
A person representing HYMEN. (HYMEN:)
ROSALIND daughter to the banished duke.
CELIA daughter to Frederick.
PHEBE a shepherdess.
AUDREY a country wench.
Lords, pages, and attendants, &c. (Forester:) (A Lord:) (First Lord:) (Second Lord:) (First Page:) (Second Page:)
835
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x730074]
goroutine 35 [running]:
github.com/sugarme/tokenizer.(*Encoding).GetIds(...)
/path/to/go/pkg/mod/github.com/sugarme/tokenizer@…/encoding.go:215
github.com/sugarme/tokenizer.TruncateEncodings(0xc0009d2d00, 0x0, 0xc0009d2c30?)
/path/to/go/pkg/mod/github.com/sugarme/tokenizer@…/util.go:108 +0x54
github.com/sugarme/tokenizer.(*Tokenizer).PostProcess(0xc000465680, 0xc0009d2d00?, 0x0?, 0x1)
/path/to/go/pkg/mod/github.com/sugarme/tokenizer@…/tokenizer.go:602 +0xe5
github.com/sugarme/tokenizer.(*Tokenizer).Encode(0xc000465680, {0x7a1e20, 0xc0003121e0}, 0x1)
/path/to/go/pkg/mod/github.com/sugarme/tokenizer@…/tokenizer.go:464 +0x2e5
github.com/sugarme/tokenizer.(*Tokenizer).EncodeBatch.func1(0x0)
/path/to/go/pkg/mod/github.com/sugarme/tokenizer@…/tokenizer.go:647 +0x90
created by github.com/sugarme/tokenizer.(*Tokenizer).EncodeBatch in goroutine 34
/path/to/go/pkg/mod/github.com/sugarme/tokenizer@…/tokenizer.go:644 +0xf5
Process finished with the exit code 1
The first chunk has 634 characters and is embedded successfully. The next chunk has 835 characters (i.e. the first 634 characters plus an additional 201) and fails with the nil pointer dereference in the tokenizer.
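Reading the stack trace, `TruncateEncodings` is called with `0x0` as its second argument, which appears to be the optional pair encoding (only a single text is being encoded), and the inlined `(*Encoding).GetIds` then reads a field through that nil pointer. The sketch below reproduces that pattern with hypothetical, simplified types and helpers (`Encoding`, `totalLenUnsafe`, `totalLen`, `panics` are illustrative only, not the library's actual code) and shows the kind of nil check that avoids the panic.

```go
package main

import "fmt"

// Encoding is a hypothetical, simplified stand-in for the tokenizer's
// Encoding type; only the token IDs are modelled here.
type Encoding struct {
	Ids []int
}

// GetIds reads a field through the receiver, so calling it on a nil
// *Encoding panics with "invalid memory address or nil pointer dereference".
func (e *Encoding) GetIds() []int {
	return e.Ids
}

// totalLenUnsafe mirrors the failing pattern: it assumes the optional
// pair encoding is always non-nil.
func totalLenUnsafe(enc, pair *Encoding) int {
	return len(enc.GetIds()) + len(pair.GetIds())
}

// totalLen checks the optional pair encoding before touching it.
func totalLen(enc, pair *Encoding) int {
	n := len(enc.GetIds())
	if pair != nil {
		n += len(pair.GetIds())
	}
	return n
}

// panics reports whether f panics. recover works here only because f runs
// in the same goroutine as the deferred call.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	single := &Encoding{Ids: []int{101, 2054, 102}}

	fmt.Println(totalLen(single, nil))                          // 3
	fmt.Println(panics(func() { totalLenUnsafe(single, nil) })) // true
}
```

Note that in the trace the panic happens in a goroutine started by `EncodeBatch` (the `created by ... EncodeBatch` frame). Go's `recover` only stops a panic in the goroutine where it is deferred, so wrapping the embedding call in the application cannot catch this crash; the nil check has to live in the library itself.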
Has anybody run into this before? Is it a known issue, and if so, is there a workaround?
Please let me know if any additional information is required.
Since a similar issue had been reported on the tokenizer side (and closed via a code change / PR), I forked both tokenizer and fastembed-go, published their latest master / main branches, and used them as dependencies. With that, the error is gone.
Perhaps all that needs to be done is to publish the latest versions of both?
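One common way to build against a patched copy without renaming any imports is a `replace` directive in the application's go.mod. A minimal sketch, assuming the patched tokenizer is cloned into a sibling directory named `tokenizer` (the module path `example.com/myapp` and the directory layout are hypothetical; the usual `require` lines are omitted):

```go
// go.mod of the application (hypothetical module path).
module example.com/myapp

go 1.21

// Build against a local clone of the patched tokenizer instead of the
// published release. Replace directives in the main module apply to the
// whole build, including the tokenizer imports made by fastembed-go.
replace github.com/sugarme/tokenizer => ../tokenizer
```

If the latest fastembed-go changes are needed as well, a second `replace` line for its module path works the same way. Keep in mind that `replace` directives are honoured only in the main module's go.mod and are ignored when that module is itself consumed as a dependency.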
@alkuma, I'd recommend keeping your project dependent on your fork. It gives you the flexibility to make any changes you need.
As far as I can see, neither fastembed-go nor https://github.com/sugarme/tokenizer is under active maintenance.
I am getting a nil pointer error with specific texts. I created a test at https://github.com/alkuma/tokenizerissue to demonstrate the issue.
To execute the tests, follow these steps:
1. git clone the https://github.com/alkuma/tokenizerissue repository.
2. Set ONNX_PATH to the correct value.
3. Run TestEmbedding, which is in the file embed_test.go, and you should get the error.