POC C static shared string HEKPOOL API #23042

Open
wants to merge 2 commits into
base: blead


@bulk88 bulk88 commented Feb 27, 2025

POC C static shared string HEKPOOL API.

Related/semi potential fix to #22872

Slightly related to my inter-ithread malloc ShHEK code in #14725

Fails a couple of tests, fails asserts in perl_destruct, and has bugs: hvkvsplit isn't working correctly on HV* PL_strtab, and collisions aren't being sorted with malloc ShHEKs on top of the LL and GShHEKs at the bottom of the LL.

Note: all ithreads use/see the same GShHEK ptrs, but each ithread keeps its own independent HV* PL_strtab.

Parts of this patch aren't fully baked. Computing the hash keys, which are per OS proc, is done lazily, but there are plenty of HEKs/strings that P5/P5P involuntarily forces onto users, even for perl -e"0;".

Not baked: initially I planned for malloc HEKs and GHEKs with duplicate hash num, len, and bytes to both live in the same HvARRAY slot in PL_strtab as collisions; later, for GHEKs to be sorted below malloc HEKs; later still, to prohibit creating malloc HEKs that conflict with/are identical to GHEKs and vivify the GHEK instead. This search code is half baked. I tried some optimizations, like a bitfield of known lengths, vs GCC's horrible multi KB C switch jump tables and MSVC's massive if/else trees (msvc -O1 doesn't emit C switch jump tables (fn ptrs halfway into a fn); msvc -O2 as a switch is too broken to ever use in production without profile guided optimization and data training/profiling). But my rapid bitfield reject mask doesn't really reject very much: |ing together, per length, all of the ASCII UC A-Z chars at each char position produced 0x5F, the full A-Z range, 3 of 4 times or 1 of 2, rejecting nothing.
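
To illustrate why an OR-based reject mask saturates so quickly, here is a minimal sketch. It is not the code from this patch: `build_pos_mask`, `mask_may_match`, and the token list in the usage note are hypothetical and chosen only for illustration. It ORs together the byte found at one position across a token list, then tests a candidate byte against that mask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): OR together the byte at
 * position `pos` of every token. Every token is assumed to be at least
 * pos+1 bytes long. */
static unsigned char
build_pos_mask(const char *const *tokens, size_t ntokens, size_t pos)
{
    unsigned char mask = 0;
    size_t i;

    assert(tokens != NULL || ntokens == 0);
    for (i = 0; i < ntokens; i++)
        mask |= (unsigned char)tokens[i][pos];
    return mask;
}

/* A byte can only match some token at this position if it sets no bit
 * outside the mask. 1 means "maybe a match", 0 means "definitely not". */
static int
mask_may_match(unsigned char mask, unsigned char c)
{
    return (c & ~mask) == 0;
}
```

With the four illustrative tokens "ARGV", "ENV", "INC", and "STDOUT", the position 0 mask is already 0x41|0x45|0x49|0x53 = 0x5F. From then on every byte in A-Z passes `mask_may_match()`, so the mask only rejects bytes such as lowercase letters.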

The code that checks whether a malloc HEK conflicts with/is identical to a GHEK needs a binary search algo, char by char, b/c the token list is really long per len, but it doesn't have one rn: I can't find a single use of a binary search algo in the p5p repo in a .xs or .c, except for win32/perlhost.h and the Compress::Raw::* libs.
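
A minimal sketch of what such a lookup could look like, assuming the tokens of one length are kept in an array sorted in ascending memcmp() order. `find_token` is a hypothetical name, not an existing perl API, and this version compares whole tokens per probe with memcmp() rather than strictly char by char.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup (not from the patch): `tokens` holds `ntokens`
 * strings, each exactly `len` bytes long and sorted in ascending
 * memcmp() order. Returns the index of the match, or -1 if `key` is
 * not in the list. */
static long
find_token(const char *const *tokens, size_t ntokens,
           const char *key, size_t len)
{
    size_t lo = 0, hi = ntokens;   /* search the half-open range [lo, hi) */

    assert(key != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = memcmp(key, tokens[mid], len);

        if (cmp == 0)
            return (long)mid;
        if (cmp < 0)
            hi = mid;              /* key sorts before tokens[mid] */
        else
            lo = mid + 1;          /* key sorts after tokens[mid] */
    }
    return -1;
}
```

Each probe halves the remaining range, so a long per-length token list costs O(log n) comparisons instead of a linear scan.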


  • This set of changes does not require a perldelta entry.
  • WIP POC not prod quality

All the code in gv.c is very old and has gotten zero optimizing since
5.000 alpha.  SV*s are instantly turned into PVNs on the front end,
instantly losing any chance of [future] SVPV COW Shared HEK key
string optimization.  HEK*s are unknown to the gv_* API.  All inputs are
continuously parsed for ' and :: without exception, even if they are
read only (SEGV) C literals, PP SvREADONLY()/SvPROTECT() read only
literals, or API contract read only HEK* PV buffers.  HE*s returned from
hv_store*()/hv_fetch*() aren't exploited to pass the shared HEK*
on to gv_init_*() or gv_name_set(), and gv_name_set() only understands
PVNs on the front end, but on the back end, in the GP struct and GV body
struct, it ONLY understands HEK*s.  Therefore there is no RC++, and the
ShHEK gets looked up again in PL_strtab.

The large number of tiny extern exported symbol wrapper funcs added over
the years also causes C dbg call stacks, even at -O1/-O2, to be 2-5 call
frames deep of 3 line shim/stub functions before reaching the main
logic. I can't tell what is a mathom and what isn't.

So, to lay the provisions needed for future commits that add proper
SV*/HEK*/U32 hash precalculation (not to mention that the memcmp() in
hv_common() is skipped if the left and right ptr addrs are equal), the
front end of gv_* needs cleanup.
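
The pointer equality shortcut mentioned above can be sketched as follows. This is an illustrative stand-alone helper, not the actual hv_common() code, and `key_eq` is a hypothetical name. The point is that two callers holding the same shared key buffer never pay for the byte compare.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key comparison (not the hv_common() source): returns 1
 * if both keys have the same length and bytes, 0 otherwise. */
static int
key_eq(const char *a, size_t alen, const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    if (alen != blen)
        return 0;
    if (a == b)
        return 1;                 /* same shared buffer, skip memcmp() */
    return memcmp(a, b, alen) == 0;
}
```

Passing a PVN copy of the key always takes the memcmp() path, while passing the shared buffer itself returns at the pointer test.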

-move U32 flags to the start of the func, so flags can encode details of
what void * #1 means, and whether vararg void * #2 exists
(PVN with N as size_t is the only 2nd arg user right now).
gv_stashpvs() is very common in core and CPAN, and gets called over and
over in 1 proc, since most interp core and CPAN XS devs don't know GV*s
have an RC that can be ++ed and stored in a MY_CXT struct. Also nobody
knows "stashes" are HV*s or that PP packages/classes are implemented
with HV*s. So there is reason to pay extra attention to gv_stashpvs()
b/c of its high usage/call sites per library. So if the STRLEN can be CC
constant folded, and fits in a U8, store the length in the flags arg.
That saves CPU ops in all the callers, which push 2 args instead of 3.
Public API gv_stashpvs(str, create)'s create arg [flags in reality]
can't be optimized away or removed, so combine the 2 CC time constant
args, so they fold/optimize into 1 cpu op.
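
A minimal sketch of the "length in the flags word" idea, under assumed bit positions. Every `XGV_*`/`xgv_*` name and the bit layout below are hypothetical and are not the layout used by this commit. For a string literal, `sizeof(lit) - 1` and the flags are both compile time constants, so a compiler can fold the packed word into a single immediate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout (not the patch's): */
#define XGV_ADD          0x00000001u  /* stand-in for a "create" flag       */
#define XGV_LEN_INFLAGS  0x00800000u  /* set when the top byte holds a len  */
#define XGV_LEN_SHIFT    24

/* Pack `len` into the top byte of `flags` when it fits in 8 bits;
 * otherwise return `flags` unchanged so the caller passes len as an arg. */
static uint32_t
xgv_pack_len(size_t len, uint32_t flags)
{
    if (len <= 0xFF)
        return flags | XGV_LEN_INFLAGS | ((uint32_t)len << XGV_LEN_SHIFT);
    return flags;
}

/* Returns 1 and stores the length if `flags` carries one, else 0. */
static int
xgv_unpack_len(uint32_t flags, size_t *len_out)
{
    assert(len_out != NULL);
    if (!(flags & XGV_LEN_INFLAGS))
        return 0;
    *len_out = (size_t)(flags >> XGV_LEN_SHIFT);
    return 1;
}

/* For string literals only: both operands are compile time constants. */
#define XGV_LITERAL_FLAGS(lit, flags) xgv_pack_len(sizeof(lit) - 1, (flags))
```

With this layout a caller would pass the string pointer plus one packed word, e.g. `XGV_LITERAL_FLAGS("Foo::Bar", XGV_ADD)`, and the callee recovers the length 8 with `xgv_unpack_len()`.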

-at some point perl core needs to cache/create/move around C level
arrays of RC++ed ShHEKs to pass to the gv_*() APIs. SVPVs aren't exactly
the right format for storing sanitized (no */::/'/SUPER/main/UNIVERSAL)
and pre-parsed/split "package tokens", since SVs easily wind up in or
escape into PP-state, and SV RO flags/COW flags aren't the most honored
and respected parts of the API by CPAN XS/maybe core.

ShHEKs escaping into PP-state is rarer than "generic SVs" escaping into
PP-state or CPAN XS state.  All legacy XS code of any quality, and
entry/beginner XS people, will pick "char *" getter macros over an
unknown opaque "HEK" type (and newSVpvn() to capture/move those char *s).
Users who know what a HEK* is and how to RC++ it know not to write to it.
Also, a bad write to a ShHEK will cause PP or SEGV breakage/panics
or proc exits a lot faster than a bad write to a SVfRO "SVPV" buffer.
A hash that doesn't match the char string in a ShHEK will terminate the
proc faster. So vararg on gv_*() is a provision for a future prototype
that accepts 1, 2, 3 or more HEK*s passed array style, already sanitized
to not have ::s.
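
The flags-first vararg shape described above can be sketched like this. All names and flag values are hypothetical, plain C strings stand in for HEK*s, and the function only reports how many name bytes it was handed instead of doing a stash lookup. It shows the flags word deciding what argument 1 is and whether a second argument exists.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "what is argument 1" codes carried in the flags word. */
#define XGV_KIND_MASK  0x3u
#define XGV_KIND_PV    0x0u  /* arg 1: NUL-terminated char *, no arg 2     */
#define XGV_KIND_PVN   0x1u  /* arg 1: char *, arg 2: size_t length        */
#define XGV_KIND_TOKS  0x2u  /* arg 1: array of tokens, arg 2: size_t count */

/* Returns the total number of name bytes described by the arguments.
 * A real entry point would do the lookup; this only shows the dispatch. */
static size_t
xgv_name_bytes_p(uint32_t flags, const void *arg1, ...)
{
    va_list ap;
    size_t total = 0;

    assert(arg1 != NULL);
    va_start(ap, arg1);
    switch (flags & XGV_KIND_MASK) {
    case XGV_KIND_PV:
        total = strlen((const char *)arg1);
        break;
    case XGV_KIND_PVN:
        total = va_arg(ap, size_t);       /* caller must pass a size_t */
        break;
    case XGV_KIND_TOKS: {
        const char *const *toks = (const char *const *)arg1;
        size_t n = va_arg(ap, size_t), i;

        for (i = 0; i < n; i++)           /* tokens are already "::"-free */
            total += strlen(toks[i]);
        break;
    }
    default:
        break;
    }
    va_end(ap);
    return total;
}
```

Because varargs are untyped, callers have to cast the second argument explicitly, e.g. `xgv_name_bytes_p(XGV_KIND_PVN, "Foo::Bar", (size_t)3)`. That is one reason to hide such an entry point behind typed wrappers.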

0xFF length was picked b/c there was bitfield space. Shaving to
32/64/128 chars for gv_stashpvs(str, create) is possible if the bits
are needed: b/c a terminal is 80 chars, that would still fit almost all
absolute ("::") C string package names, and everything in core and CPAN.

-the stubs remain as exported stub funcs, on purpose for now; it makes
 certain diag tools I use slightly easier to use vs optimized out inlines
 or macros. In 5.43 or 5.45 the exported stub funcs can be converted to
 macros no static inline, which is the intent of this commit. The vararg
 func is the 1 and only entry point to all of the gv_stash* logic.

-flipping I32 flags to the front requires "_p" suffixes for private, for
 ABI reasons; public API still thinks I32 flags is always the last arg
-since all front end wrappers are 1 frame away instead of multiple frames
 away, they are more likely to LTO inline away inside of libperl (not XS)
 on any CC. CCs have cost/benefit/wall time cut offs for scoring
 potential inline opportunities. Going 2 layers, or 3+ layers of small
 inlines, is asking a lot from a CC that has to traverse a tree of nodes
 to do each inline, and the cut off could be as low as 1 inline fn and no
 more unrolling or folding.