Hide function prototypes from unauthorized callers #23570

Open
wants to merge 6 commits into
base: blead

Conversation

khwilliamson
Contributor

0351a62 extended hiding private functions from callers into the gcc world.

Some functions are allowed only in extensions; so can not be marked as hidden; this commit discourages their use however, by hiding their prototypes to all but the core and extensions.

It turns out that three functions were being used in modules we ship with that were marked as extensions-only; so they had to be made globally accessible.
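
A minimal sketch of the prototype-hiding idea described above, assuming the generated headers guard extensions-only prototypes with `PERL_CORE` / `PERL_EXT`. Every `hypo_` / `HYPO_` name is hypothetical and only there for illustration; this is not the code the PR generates.

```c
#include <assert.h>

/* Assumption: this file is compiled as an extension, so it defines PERL_EXT
 * before the (hypothetical) header content below is seen. */
#define PERL_EXT 1

/* Header side: the prototype is visible only to the core and to extensions. */
#if defined(PERL_CORE) || defined(PERL_EXT)
int hypo_ext_only(int x);
#  define HYPO_PROTO_VISIBLE 1
#else
#  define HYPO_PROTO_VISIBLE 0
#endif

/* Library side: the definition always exists, so hiding the prototype
 * discourages outside callers but does not by itself stop linking. */
int hypo_ext_only(int x)
{
    assert(x >= 0);
    return x + 1;
}

/* Caller side: only compiled where the prototype is visible. */
#if HYPO_PROTO_VISIBLE
int hypo_extension_caller(int x)
{
    return hypo_ext_only(x) * 2;
}
#endif
```

Without the `#define PERL_EXT 1` line, a caller of `hypo_ext_only` would have no prototype in scope, which modern compilers report as an implicit declaration.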

  • This set of changes does not require a perldelta entry.

khwilliamson requested a review from xenu August 13, 2025 17:30
regen/embed.pl Outdated
Comment on lines 145 to 152
my $count = 0;
$count++ while $flags =~ /[ACX]/g;
die_at_end "$plain_func: A, C, and X flags are mutually exclusive"
    if $count > 1;
$count = 0;
$count++ while $flags =~ /[ACE]/g;
die_at_end "$plain_func: A, C, and E flags are mutually exclusive"
    if $count > 1;
Contributor

These could be simplified:

die_at_end "..."
    if $flags =~ tr/ACX// > 1;

I thought the code was duplicated at first.

Contributor Author

There were several pre-existing places in the code where this could be used. I've added a commit to change those.
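
For readers who don't use `tr///` as a counter every day, here is a rough C analogue of the check under discussion. The function names are hypothetical; the flag strings in the usage note are taken from the embed.fnc hunks in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how many characters of `flags` belong to `set`; a C analogue of
 * using Perl's `$flags =~ tr/ACX//` for its return value. */
size_t hypo_count_in_set(const char *flags, const char *set)
{
    size_t n = 0;
    assert(flags != NULL && set != NULL);
    for (; *flags; flags++)
        if (strchr(set, *flags))
            n++;
    return n;
}

/* Returns 1 when a flag string breaks either exclusivity rule:
 * at most one of A, C, X and at most one of A, C, E. */
int hypo_flags_conflict(const char *flags)
{
    return hypo_count_in_set(flags, "ACX") > 1
        || hypo_count_in_set(flags, "ACE") > 1;
}
```

With these definitions, `hypo_flags_conflict("CRTXip")` reports a conflict (both C and X are present), while `"CRTip"` and `"EXop"` pass.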

Contributor

tonycoz commented Aug 14, 2025

This produces a warning when building pretty much every XS module with -DPERL_RC_STACK:

../../inline.h: In function ‘Perl_rpp_invoke_xs’:
../../inline.h:1181:9: warning: implicit declaration of function ‘Perl_xs_wrap’; did you mean ‘Perl_runops_wrap’? [-Wimplicit-function-declaration]
 1181 |         Perl_xs_wrap(aTHX_ CvXSUB(cv), cv);
      |         ^~~~~~~~~~~~
      |         Perl_runops_wrap

and cpphdrcheck hard fails for pretty much the same reason:

# cmd: c++ -c -DPERL_RC_STACK -fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/home/runner/work/perl5/perl5/t/.. -I/home/runner/work/perl5/perl5/t/../lib/CORE source.cpp 2>&1
# out: In file included from /home/runner/work/perl5/perl5/t/../perl.h:7956,
#                  from source.cpp:4:
# /home/runner/work/perl5/perl5/t/../inline.h: In function ‘void Perl_rpp_invoke_xs(CV*)’:
# /home/runner/work/perl5/perl5/t/../inline.h:1181:9: error: ‘Perl_xs_wrap’ was not declared in this scope
#  1181 |         Perl_xs_wrap(aTHX_ CvXSUB(cv), cv);
#       |         ^~~~~~~~~~~~
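
The failure mode above can be reduced to a small sketch: an inline function in a header is compiled into every file that includes it, so whatever it calls needs a prototype in scope there too. All `hypo_` / `HYPO_` names are hypothetical, and the unconditional declaration is just one possible remedy, not necessarily the fix this PR adopts.

```c
#include <assert.h>

/* Prototype hidden behind a guard that ordinary XS code does not define
 * (HYPO_CORE and HYPO_EXT stand in for the real guard macros). */
#if defined(HYPO_CORE) || defined(HYPO_EXT)
int hypo_xs_wrap(int arg);
#endif

/* One possible remedy (an assumption): declare the callee unconditionally
 * next to the inline caller, since the caller's body is seen everywhere. */
int hypo_xs_wrap(int arg);

/* Inline caller that lives in a header; without a visible prototype for
 * hypo_xs_wrap this is an implicit declaration in C and an error in C++. */
static inline int hypo_invoke_xs(int arg)
{
    return hypo_xs_wrap(arg);
}

/* The definition, which in the real case lives inside the library. */
int hypo_xs_wrap(int arg)
{
    assert(arg >= 0);
    return arg * 2;
}
```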

Contributor

bulk88 commented Aug 14, 2025

TLDR:

60% chance this PR is a good idea with zero side effects and zero risk of current or future bug reports, and shaves a couple milliseconds off each CPAN .so/.dll build

40% chance bad idea, and there will be formal SEGV/fatal erroring bug tickets by 2030 linking to this commit's SHA1, or Perl ecosystem social media complaints and criticism published by October/November 2026 about this commit.

Long story:

If an embed.fnc entry has the X flag, its full long-name extern "C" declaration can't be removed at the .c/.i level.

If it's declared extern "C" in C but doesn't appear in perl5xx.dll's export table, because it's some function shared only between, for example, the toke.c TU and the op.c TU, and it's 100% LTO safe/eligible on all current, former, and future "brands" of C compilers, then seeing it at the .i file level in a CPAN .xs converted to a .i file is 100% bloat/CPU waste/CC and C linker wall time wasting.

During the .i->.S->.o phase the C compiler always has to lex/parse and arena pool || malloc() a couple of C structs and a string hashing tree node for that extern "C" declaration to exist in machine code land, AKA C language VM state. The C toolchain won't know until the very last moment whether that extern "C" symbol has a body/definition or is a fatal error at the C linker phase.

How else would the C compiler flood STDERR with warnings for undeclared functions?

On ELF, I think because of its lazy symbol resolution, end users won't know whether that C symbol has a body/definition, or whether it's time for a SEGV, until they inject the .so into the address space with DynaLoader.pm or make the CPU attempt to deref/read/write/execute that symbol.

My slight concern here is that the X character is the final chance to flip the light switch, the final """judge""" that has absolute control over the export symbol table entries of the files perl.exe/perl-static.exe/perl5xx.dll.

There are a dozen+ reasons for an entry to be marked with X. The primary reason (75%) is that the symbol is the undocumented/private API behind 1 public API macro.

A bunch of other, infrequent reasons it's marked X:

  • alpha-grade no-support-offered
  • perma-experimental API whose only consumers are CPAN modules starting with B::* or Devel::* and the perma-experimental API breaks all its CPAN consumers every 24-36 months, for a couple weeks during normal unstable-by-design blead perl code evolution
  • part of a blead-only experiment that is scheduled to be #define disabled before the next stable perl
  • no POD by p5p, officially undocumented, can disappear at any time, re.pm/re.dll and stuff in p5p/.git/ext are the only users of it
  • some function symbol that only appears as part of a rarely used always off -DDEBUGGING_TURNED_TO_LEVEL_ELEVEN or a -D50_MEGABYTES_PER_SEC_OF_TRACING_LOGS build option, and the always #defined off function symbol is exported for P5P-ppl only, so it can be used from Inline::C vs relinking/recompiling/waiting 3 minutes to re-LTO a private hacked up libperl.so.dll
  • basically it's a convenience export for hacking libperl/perl.bin the binary; it's not a statement of "public API" status, or a statement on the political correctness of "CPAN authorized" status, or officially "CPAN unauthorized" with a long-time "no-comment" status from P5P

If it's really a convenience export for hacking libperl/perl.bin the binary, the fn body should have a comment like

/* not for CPAN, will be removed at any time, without notice, private to CORE */

or

 /* Not for CPAN and is a private CORE-only API that is now deprecated/mathomed.
     It will be removed in the near future after future refactoring removes all usage. */

All of these kinds of X functions need to keep their ASCII linker names in the export table, and C lang full name decls in the .i/.c files made when compiling CPAN XS shared libs.

Note, dropping the X is sometimes a good idea, because that allows more LTO opportunity, but at least on the Win32/Win64 ABI, I often see real, PE-exported function symbols, declared extern "C" by P5P, get automatically inlined 100% of the time inside perl5xx.dll, leaving 0 call sites to that C function.

The 1 and only reference inside perl5xx.dll to that C function body is the function pointer in perl5xx.dll's export table exposing that C function symbol to CPAN. LTO magical auto-inlining heuristics. IDK the rules for ELF visibility decl tag vs LD_PRELOAD vs LTO eligibility/legality well enough to say whether what I described above is legal, or engineering-wise impossible and illegal, on ELF OSes.
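
For reference, the "ELF visibility decl tag" mentioned above looks roughly like this with GCC or Clang. This is a sketch with hypothetical `hypo_` / `HYPO_` names, not Perl's actual macros: a hidden function can still be called from other translation units inside the same shared object, but it gets no entry in the dynamic symbol table, so outside code cannot resolve it and the optimizer is free to inline it away.

```c
#include <assert.h>

/* Assumption: GCC/Clang-style attributes; other compilers get empty macros. */
#if defined(__GNUC__) || defined(__clang__)
#  define HYPO_HIDDEN   __attribute__((visibility("hidden")))
#  define HYPO_EXPORTED __attribute__((visibility("default")))
#else
#  define HYPO_HIDDEN
#  define HYPO_EXPORTED
#endif

/* Internal helper: usable across TUs of the library, invisible outside it. */
HYPO_HIDDEN int hypo_internal_helper(int x);

/* Public entry point: stays in the library's export table. */
HYPO_EXPORTED int hypo_public_entry(int x);

int hypo_internal_helper(int x)
{
    assert(x >= 0);
    return x * 3;
}

int hypo_public_entry(int x)
{
    /* The only caller of the helper; a good inlining candidate. */
    return hypo_internal_helper(x) + 1;
}
```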

I also am starting to smell a tiny bit of smoke coming from unusual RISC CPUs/unusual commercial Unix OS/unusual and very rare non-Unix OSes. But this is all future-proof thinking, not middle of Aug 2025 thinking.

Most likely real world failure mode for Perl dropping C .i decls for inter-disk file symbol table visible extern symbols in 2025, is breaking Perl on OSX + some optimization feature in Mach-O .pdf spec nobody knows about except the full time Apple employee paid to add new errata entries every 12 months to Apple's official Mach-O .pdf document.

Historical reasons would've been the "ordinal" (aka 0 based number) or .map file, shared library symbol linking systems of OS/2, Win16, maybe Symbian. Where somehow, the .i level function declarations even if you don't invoke them in your shared library, are used as input to create U32 hash-es that are needed to find ANY and ALL C lang symbols inside libperl.dll.so that need to cross disk files to another .dll/.so, or somehow, each inter-disk file extern "C" declaration, affects the de-name-mangling or re-name-mangling of the previous C Lang linker symbol, or the next C Lang linker symbol in the symbol table.

More 2025-ish tech reason failure modes are "API versioning" or separating Swift vs Objective C vs C99 vs C++ vs JVM vs Go vs Fortran vs Rust symbols, all of which have the same source code ASCII identifier in their respective source code. Probably not a Perl problem because Perl VM uses PL_ and Perl_ and PerlIO and win32_ prefixes on everything.

But thinking in 2025-2030 era, what if a downstream consumer/sponsor of LLVM project, such as Apple or Google or Microsoft permanently removed the concept of .a/.lib files from the official Clang/LLVM toolchain for the Apple/Google/Microsoft platforms?

Now the theoretical concept of Swift vs Objective C vs C99 vs C++ vs JVM vs Go vs Fortran vs Rust symbols and C++-style ASCII string symbol name mangling isn't enough to stop a collision at process runtime. There would be 9 months to 3 years of short-lived chaos for mainstream private or FOSS C23/C++23 code bases if Clang/LLVM/iOS/Android/OSX dropped the concept of .a/.lib files from their toolchain.

There is a real risk that by 2030, one of Perl's major OSes WILL NOT support execution of homebrew ARM64 machine code. The insides of the executable ELF/Mach-O disk file called /usr/bin/perl would contain only JVM/WASM/LLVM IR machine code, not ARM64 machine code. That JVM/WASM/LLVM IR machine code inside the /usr/bin/perl file gets Rosetta-ed/JIT-ed/partially AOT-ed as needed, along with an intensive automated algorithmic "security analysis" of some kind; then crypto-signed/crypto-secure chunks of /usr/bin/perl, now converted to 4KB blocks of ARM64 code, get cached in some DRM/"Secure Computing Cache" by the OSX/iOS/Android kernel.

All ISO C lang code and all ISO C++ lang code and the Perl interp should never notice this 180 degree OS/CPU design change happened, but I can promise, there will be very visible Perl C level bugs, and Pure Perl level bugs when it happens.

I want to say of the last 3 paragraphs: "This will never happen! Don't waste time planning for nonsense!"

It already happened in 2024/2025 on a Production Grade Perl install on a real Production Grade OS with the highest Security, Uptime, and Performance in the world that money can buy.

See this ticket: #23399

There have also been 4 failed attempts in my life over the last 20 years to permanently kill off C/C++/Assembly code once and forever, on the CPU silicon level.

https://en.wikipedia.org/wiki/Jazelle
https://en.wikipedia.org/wiki/Intel_MPX (strongest removal of C/C++ attempt ever, fastest failure ever)
https://en.wikipedia.org/wiki/Transmeta (partial)
https://en.wikipedia.org/wiki/Google_Native_Client (this CPU/ISA only has forward compatibility with i386/x64, not back compatibility)

WASM Perl + full C POSIX API access +full normal POSIX user perms, might be the only Perl build config for OSX by 2030.

@@ -2063,7 +2069,7 @@ Cp |const char *|moreswitches \
 Adp |void |mortal_destructor_sv \
 |NN SV *coderef \
 |NULLOK SV *args
-CRTXip |char * |mortal_getenv |NN const char *str
+CRTip |char * |mortal_getenv |NN const char *str
Contributor

All of libperl's *getenv*() are messy, especially WinPerl's sandwich of functions and macros. 3 identical SV mortals/Newx blocks are made each time on WinPerl for no reason right now since Perl mid-5.30s.

I think, but I'm not sure unless I do a deep-dive study, that this function is one of the functions that will become semi-public CPAN API if WinPerl becomes MS UTF16-aware in the near future.

 |int flags
-EXop |bool |try_amagic_un |int method \
+Xop |bool |try_amagic_un |int method \
Contributor

B::C and B::CC style modules might want these symbols, but for many months now I have wanted to keep them (not delete them) and do major plastic surgery on their C prototypes/parameter order/parameter meanings.

The branch is 40% done and compiles without syntax errors for 1 of these overload.pm getter callouts. I'm targeting the -X file test operators first in my branch, because they have the worst abstraction split between what the pp_*() func is responsible for and what the Public-to-CORE-only front-end C functions of the overload.pm API are responsible for.

It's mostly about the expression PL_op->op_type: is it the caller (the pp_*() functions) or the callees (the overload.pm front ends) that derefs my_perl->Iop->op_type and runs a tree of if/else branch logic on the U16 inside op_type?

Would there be any interest from Unix-only P5P people in making the Win32/Win64 GH CI runners dump the real symbol table of perl5xx.dll to STDERR/STDOUT on each build?

It's 1279 lines (C symbols) long on perl543.dll right now.

Move these to a place common to both
I, and previous authors, had forgotten that tr returns a count of
matching characters, so can tell you if more than one character in the
input string matches; as well as if none do.
It makes no sense to specify more than one of A C X, nor more than one
of A C E.  This commit enforces that, showing one entry in embed.fnc
that needed to be changed.
0351a62 extended hiding private
functions from callers into the gcc world.

Some functions are allowed only in extensions; so can not be marked as
hidden; this commit discourages their use however, by hiding their
prototypes to all but the core and extensions.

It turns out that four functions were being used in modules we ship
with that were marked as extensions-only; so they had to be made
globally accessible.