Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up, simplify, and re-work Perl_seed(). Fixes #22968 #23014

Open
wants to merge 2 commits into
base: blead
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
69 changes: 30 additions & 39 deletions util.c
Original file line number Diff line number Diff line change
Expand Up @@ -4683,36 +4683,10 @@ U32
Perl_seed(pTHX)
{
/*
* This is really just a quick hack which grabs various garbage
* values. It really should be a real hash algorithm which
* spreads the effect of every input bit onto every output bit,
* if someone who knows about such things would bother to write it.
* Might be a good idea to add that function to CORE as well.
* No numbers below come from careful analysis or anything here,
* except they are primes and SEED_C1 > 1E6 to get a full-width
* value from (tv_sec * SEED_C1 + tv_usec). The multipliers should
* probably be bigger too.
* Attempt to read from /dev/urandom to generate a pseudo-random number.
* If that does not work, or it is unavailable, we fall back to gathering
* several state variables and hashing them into a seed value.
*/
#if RANDBITS > 16
# define SEED_C1 1000003
#define SEED_C4 73819
#else
# define SEED_C1 25747
#define SEED_C4 20639
#endif
#define SEED_C2 3
#define SEED_C3 269
#define SEED_C5 26107

#ifndef PERL_NO_DEV_RANDOM
int fd;
#endif
U32 u;
#ifdef HAS_GETTIMEOFDAY
struct timeval when;
#else
Time_t when;
#endif

/* This test is an escape hatch, this symbol isn't set by Configure. */
#ifndef PERL_NO_DEV_RANDOM
Expand All @@ -4727,7 +4701,9 @@ Perl_seed(pTHX)
# define PERL_RANDOM_DEVICE "/dev/urandom"
# endif
#endif
fd = PerlLIO_open_cloexec(PERL_RANDOM_DEVICE, 0);
U32 u;

int fd = PerlLIO_open_cloexec(PERL_RANDOM_DEVICE, 0);
if (fd != -1) {
if (PerlLIO_read(fd, (void*)&u, sizeof u) != sizeof u)
u = 0;
Expand All @@ -4737,20 +4713,35 @@ Perl_seed(pTHX)
}
#endif

/* We only get this far if /dev/urandom is not available or the read fails.
* Grab several state variables and hash those for randomness instead.
*/

#ifdef HAS_GETTIMEOFDAY
struct timeval when;

PerlProc_gettimeofday(&when,NULL);
u = (U32)SEED_C1 * when.tv_sec + (U32)SEED_C2 * when.tv_usec;
/* Milliseconds */
UV uptime = (when.tv_sec * 1000000) + when.tv_usec;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't make this change. I'd prefer to see it as

u = (U32)SEED_C1 * when.tv_sec + (U32)SEED_C4 * when.tv_usec;

That is id avoid SEED_C2 (3 is prime but its tiny) in favour of SEED_C4. Consider:

$ perl -le'printf "%032b\n", 1000003'
00000000000011110100001001000011
$ perl -le'printf "%032b\n", 73819'
00000000000000010010000001011011

The purpose of multiplying by a prime is to cause single bit changes to result in multi-bit changes. Simply summing the raw time data will not have this affect. Also remember that multiplication by a prime (actually any odd number) is a reversible operation, meaning that it is collision free.

Copy link
Contributor Author

@scottchiefbaker scottchiefbaker Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal on this line isn't to make a good random number with lots of entropy, it's to calculate milliseconds.

The previous constant multipliers were a hack because we didn't have hashing in core. Now that we do, conceptually things are a lot simpler: gather some semi-random state variables and hash them to generate better entropy.

The hashing will handle all the bit spreading better than any simple multiplication ever did.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are missing the point, before we were multiplying before we added. Now you are adding and then doing the ptr hash. Its not the same thing.

perl -le'printf "%10d - %032b\n", ($_)x2 for 1*73819,2*1000003,1*73819+2*1000003
> '
     73819 - 00000000000000010010000001011011
   2000006 - 00000000000111101000010010000110
   2073825 - 00000000000111111010010011100001

If you want use the ptr_hash() on each element of when. Or multiply them by different primes. Whatever. Just dont add them first and then do ptr_hash().

Copy link
Contributor Author

@scottchiefbaker scottchiefbaker Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm purposely not trying to do the same thing here.

This particular line calculates milliseconds of uptime, that's all. This happens to require multiplication but I am not using it to generate entropy like we were previously.

Previously we were trying to generate entropy by multiplying a bunch of state variables together. In this PR I'm hashing a bunch of variables together to generate entropy. Hashing should provide better entropy and bit spreading than multiplication.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perl's GTOD() portability macro, has no guarantees on update frequency or granularity. Just b/c the milli/micro/ns portion/precision is there as a data type, Means nothing if the low half is never updated by Perl's OS specific port code, or by that OS'es libc.

GTOD() has stung (bugged broken) WinPerl and WinPerl CPAN eco system (NYTProf/p5ps Time::HiRES) many times in history. Specific reasons are Metaconfig picking ancient 1 SECOND fallbacks, Or Window kernels 15 ms rule, for updating/polling wall time from the HW wall time clock. 15 ms can be changed to 1 ms if process has semi-root privilege, but its very un-user friendly to ever turn that setting. Battery life/IPC sluggishness reasons.

Instead of GTOD, you probably want RDTSC/QueryPerformanceCounter/or the unix analogs in Time::HiRes'es src code.

See also Mingw's static linked into user binds code for the Win32/MSVCs stack cookie sentinal feature

mingw-w64/mingw-w64@e6ac7e4#diff-de3c44bfb45c9896cfe8a93c1e3b87001659d644876201678f07044edc8e714dL62

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I purposely chose four different state variables so that a failure in any one (or two?) should not affect the randomness we generate from the hash of all the variables. I venture a guess that at least 80% of Perl installations these days have /dev/urandom and will never hit the GTOD code path anyway.

Even without using time of any kind we still use PID, a stack pointer, and the local variable pointer which should be psuedo-random enough to generate a simple seed.

If we didn't have /dev/urandom and GTOD was broke, AND the local pointer was compromised we might start to have a problem.

Blead today uses these same state variables (and has for 20+ years now), but used multiplication to generate entropy instead of a hash. In other words, if it hasn't been broken for the last 20 years, nothing is changing in this implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As one small follow-up point. This code has one purpose: generating a single 32bit psuedo-random number. That number should only be used as a seed for a real PRNG. As long as this number is mostly-sorta-kinda good then the PRNG will handle generating "good" random numbers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, i do not feel comfortable with this part of your patch. Especially given @bulk88 's comments. I like the general direction you want to go, but I feel like this patch could do better, and I feel like adding the components of the time together like that without any kind of difference magnification could have negative consequences, so I could not in good conscience approve the patch with this change.

All I know is that historically using time for initializing RNG's has been problematic, and id feel more more confident we weren't doing any harm if we were a bit more aggressive about magnifying differences in the time data before we mixed it together for the final initialization.

For instance I would feel much better if the code in question was reformulated as:

 UV uptime = ptr_hash(when.tv_sec) + ptr_hash(when.tv_usec);

Multiplying by different primes is also an option.

If some other committer, especially one who uses Win32 or one of the other affected platforms (VMS maybe?) felt differently then that would be their call and on their conscience. As I wont be affected by this patch, and as I feel uncomfortable about it, I wont be approving it in its current state.

Honestly I do not understand the vehemence of your objection to being more careful with this data. It doesn't make sense to me. Being more aggressive about it can do no harm, and might do some good, so why object so strongly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm misunderstanding the goal of changing things as you requested.

We're mixing up a bunch of state variables to generate a PRNG seed that is neither predictable or repeatable. The hashing should be doing all the bit mixing of the variables, but you're talking about doing some pre-mixing of the variables by doing arbitrary multiplication. I don't think multiplying the variables before we mix them will make things any more secure or random. The "difference magnification" in this case is handled by the ptr_hash(uptime) on line 4739 after we have calculated milliseconds.

It's like we are making a milkshake and one of the steps is pre-mixing the milk, chocolate, and ice cream with a fork in a bowl before you put it in the blender. It's just another step that would be done better by the blender anyway.

It won't hurt anything, but it seems unnecessary to me. It's not a hill I want to die on though, so I'm happy to capitulate and make the change you suggested. For the sake of moving things forward I will implement:

 UV uptime = ptr_hash(when.tv_sec) + ptr_hash(when.tv_usec);

in the PR if this is something you're will to buy off on and we can move forward with getting this landed.

#else
Time_t when;

(void)time(&when);
u = (U32)SEED_C1 * when;
#endif
u += SEED_C3 * (U32)PerlProc_getpid();
u += SEED_C4 * (U32)PTR2UV(PL_stack_sp);
#ifndef PLAN9 /* XXX Plan9 assembler chokes on this; fix needed */
UV ptruv = PTR2UV(&when);
u += SEED_C5 * ptr_hash(ptruv);
/* Seconds */
UV uptime = when;
#endif
return u;

UV pid = PerlProc_getpid();
UV stack_ptr = PTR2UV(PL_stack_sp);
UV time_ptr = PTR2UV(&when);

/* Mix all the states together with XOR and then hash them */
U32 ret = ptr_hash(uptime) ^ ptr_hash(pid) ^ ptr_hash(stack_ptr) ^ ptr_hash(time_ptr);

/* PerlIO_printf(Perl_debug_log, "XXXX: TIME:%lu PID:%lu Stack:%lu PTR:%lu\n", uptime, pid, stack_ptr, time_ptr); */
/* PerlIO_printf(Perl_debug_log, "SEED: %u\n", ret); */

return ret;
}

void
Expand Down