Conversation
…alidation processes
…in import processes
…nts and adding a composite primary key
…SV price processing
The db object is properly initialized from settings.get_db()
Thanks for the PR, but please break this up into several small independent ones, because reviewing 1300 lines of changes with a lot of refactoring isn't exactly pleasant. I'd also appreciate it if we agreed on refactors and/or database changes in an issue first, because otherwise there's a risk that I receive a PR you've put a lot of effort into while our visions don't align. For the parallelization itself, I think it's enough not to await each call but to store the futures in an array and then await them all, i.e. a 2-3 line change (plus another ten or so if we want it as a non-default option).
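The "store the futures in an array and await them all" suggestion above can be sketched as follows. This is a minimal illustration, not the project's actual code: `import_store` is a hypothetical stand-in for the real per-store import routine, and an asyncio-based pipeline is assumed.

```python
import asyncio

async def import_store(store: str) -> str:
    # Hypothetical stand-in for the real per-store import.
    await asyncio.sleep(0.01)
    return store

async def import_all(stores: list[str]) -> list[str]:
    # Instead of awaiting each import in turn (sequential), start them
    # all first, then await the whole batch at once.
    tasks = [asyncio.create_task(import_store(s)) for s in stores]
    return await asyncio.gather(*tasks)

results = asyncio.run(import_all(["store_a", "store_b", "store_c"]))
print(results)  # gather preserves task order
```

The only change from the sequential version is collecting the tasks before awaiting, which is what makes it a 2-3 line diff.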
You're right, this was partly exploratory, to see what could be done. What I found is a race condition in EAN processing when running in parallel; more precisely, deadlocks at the database level. I think the problem was that an EAN code wouldn't exist in the database yet while we were adding a product, or something like that. So there are two phases here: the first is sequential processing of all EANs into a dictionary shared by all the parallel processes, and the second phase is parallel processing of prices. EANs mutate only in the first phase. I also tried DB locks and semaphores, but things got too complicated over time and the code became overly complex. I've somewhat forgotten the exact details of the problem. BTW, I tried to move away from I'll try to break this up into independent PRs.
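The two-phase scheme described above (sequential EAN registration, then parallel price processing against a read-only shared dictionary) can be sketched like this. All names here are hypothetical; the real DB upsert is replaced by an in-memory dict to keep the example self-contained:

```python
import asyncio

async def ensure_ean(ean: str, ean_ids: dict[str, int]) -> None:
    # Phase 1 (sequential): register unknown EANs and cache their ids.
    # Hypothetical stand-in for the real DB upsert.
    if ean not in ean_ids:
        ean_ids[ean] = len(ean_ids) + 1

async def process_prices(rows: list[tuple[str, float]],
                         ean_ids: dict[str, int]) -> int:
    # Phase 2 (parallel): only *reads* ean_ids, so concurrent workers
    # can't race on EAN inserts or deadlock in the database.
    count = 0
    for ean, price in rows:
        _product_id = ean_ids[ean]  # guaranteed to exist after phase 1
        count += 1
    return count

async def run_import(stores: dict[str, list[tuple[str, float]]]) -> int:
    ean_ids: dict[str, int] = {}
    # Phase 1: strictly sequential, so EAN inserts never run concurrently.
    for rows in stores.values():
        for ean, _ in rows:
            await ensure_ean(ean, ean_ids)
    # Phase 2: price rows for all stores processed in parallel.
    counts = await asyncio.gather(
        *(process_prices(rows, ean_ids) for rows in stores.values()))
    return sum(counts)

total = asyncio.run(run_import({
    "store_a": [("111", 1.99), ("222", 2.49)],
    "store_b": [("111", 1.89)],
}))
print(total)  # total price rows processed
```

Splitting the mutation into a sequential phase sidesteps the locking entirely, which is why the DB-lock and semaphore variants could be dropped.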
This pull request aims to significantly improve performance and reduce the time required to import large datasets. The changes primarily affect the import logic and related modules.
The previous sequential import process was a bottleneck for large data files. By parallelizing the workload, we can leverage multi-core CPUs and achieve much faster import times, making the system more scalable and responsive. There are other possible approaches as well: CSV bulk imports, temporary-table copies, code optimizations, race-condition avoidance, and probably more.
Key Changes
Daily import speed comparison
On MacBook Pro M2:
Before optimization: ~350 seconds for 20 stores
After optimization: ~100 seconds for 20 stores
I've tested the data for consistency with the previous import. Please review that part as well.
I've tried extracting `anchor_price` into a new table because that value never changes and is always the same in every import. Currently, it is duplicated per row per day. However, whatever I tried, it would slow down the import process, so I've postponed that optimization for later. It would shrink the `prices` table, and the `anchor_price` insert should be 99% skippable.
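One way the postponed normalization could look is sketched below. The schema and column names are hypothetical (the real `prices` table isn't shown in this PR); the point is that `INSERT OR IGNORE` makes the `anchor_price` write a no-op once the EAN is already present, which is the "99% skippable" part:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Proposed split: daily prices no longer carry anchor_price...
    CREATE TABLE prices (ean TEXT, day TEXT, price REAL);
    -- ...which instead lives in one row per EAN.
    CREATE TABLE anchor_prices (ean TEXT PRIMARY KEY, anchor_price REAL);
""")

def insert_price(ean, day, price, anchor_price):
    # The anchor_price insert is skipped whenever the EAN already
    # has a row, so repeated daily imports pay almost nothing for it.
    conn.execute("INSERT OR IGNORE INTO anchor_prices VALUES (?, ?)",
                 (ean, anchor_price))
    conn.execute("INSERT INTO prices VALUES (?, ?, ?)",
                 (ean, day, price))

insert_price("111", "2024-01-01", 1.99, 9.99)
insert_price("111", "2024-01-02", 1.89, 9.99)
anchor_rows = conn.execute("SELECT COUNT(*) FROM anchor_prices").fetchone()[0]
price_rows = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
print(anchor_rows, price_rows)
```

Whether this pays off in practice depends on the insert path, which matches the observation above that the variants tried so far slowed the import down.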