-
Notifications
You must be signed in to change notification settings - Fork 349
Replace uses of strtok() with g_strsplit() and friends #3967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This is pretty big regardless. So depending on the speed of reviewing it, we could break off the fencer commits into a later PR, and maybe work on sharing code with I don't love the idea of pulling this parsing code into |
|
One Edit: Added a commit to fix this in the test |
bf6eb20 to
b936817
Compare
| * into fewer than two mappings avoids a "Malformed mapping" error message | ||
| * below. | ||
| */ | ||
| if (g_strv_length(mappings) < 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong. There could be a single mapping -- though if there are no delimiters, we don't want to throw an error if it's an interval instead of a mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still need to be addressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
| g_string_append_c(buf, '"'); | ||
| g_string_append(buf, value); | ||
| g_string_append(buf, *value); | ||
| g_string_append_c(buf, '"'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These calls could be replaced with:
g_string_append_printf(buf, "\"%s\"", *value);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, they can. It's a question of whether we want to.
If you look around at all the places where we build GStrings, we almost never use g_string_append_printf(). We basically use it only when we need to stringify a number or something. Otherwise, we use g_string_append(), g_string_append_c(), and pcmk__g_strcat() (a macro that chains together g_string_append() calls).
The reason? The usual. Performance. When we first decided to start moving to GStrings to avoid static buffer size limitations, we obsessively ran performance tests using crm_simulate and pcmk_simtimes. We found that using string formatting functions like g_string_append_printf() or even crm_strdup_printf() carried a lot of overhead. They reliably made a difference of a few percentage points in the runtimes of simulations on certain scheduler inputs. So we decided to avoid them wherever possible. A string or a character can be appended to a GString without any formatting.
With all of that said... I'm not convinced that a 5% slowdown (for example) is important enough to matter, if the printf() alternatives are more readable. The flip side of course is that if we make several changes that each cause a 5% slowdown, it will start to matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should explicitly say: let me know your thoughts on the above.
In any case, performance shouldn't matter much when displaying options. I continued to follow that "rule" here more for consistency.
|
|
||
| for (char *dir = strtok(dirs, ":"); !found && (dir != NULL); | ||
| dir = strtok(NULL, ":")) { | ||
| dirs = g_strsplit(PCMK__OCF_RA_PATH, ":", 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if PCMK__OCF_RA_PATH is NULL, this will assert instead of return false like we previously did. I have not gone back and checked previous patches to see if this is taken into consideration in those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like it can be NULL, since it's set by the configure script. I can add a comment to that effect. I'm afraid Coverity will complain about an always-true condition if I check whether it's NULL.
Edit: On second thought it feels silly to add a comment, since we generally assume our macros are defined in a reasonable way. I can add assertions for dirs here and elsewhere though.
| * into fewer than two mappings avoids a "Malformed mapping" error message | ||
| * below. | ||
| */ | ||
| if (g_strv_length(mappings) < 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still need to be addressed?
| } | ||
| } | ||
| value[k] = '\0'; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs (doc/sphinx/Pacemaker_Explained/fencing.rst) say this about pcmk_host_map:
- A mapping of node names to ports for devices that do not understand the
node names. For example, ``node1:1;node2:2,3`` tells the cluster to use
port 1 for ``node1`` and ports 2 and 3 for ``node2``. If
``pcmk_host_check`` is explicitly set to ``static-list``, either this or
``pcmk_host_list`` must be set. The port portion of the map may contain
special characters such as spaces if preceded by a backslash *(since 2.1.2)*.
I'm unclear on what a "port" is in this context, and what characters are valid in the definition of a port. I'd like to make sure that if we are expected to support characters such as spaces in port names, that the new parsing code handles that without backslashes. Also, since this has been present since 2.1.2, I'm a little concerned about breaking whoever is using this right now regardless of how little sense it makes or how it is currently mangled in the XML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Port" comes from a time when it was more common for fencing to target a shared power supply or network switch. For example:
the nodes each have two independent Power Supply Units (PSUs) connected to two independent Power Distribution Units (PDUs) reachable at 198.51.100.1 (port 10 and port 11) and 203.0.113.1 (port 10 and port 11)
It has since been overloaded -- for example, fence_vmware_rest uses it for the VM name. Ultimately, the meaning of a port, and thus the characters that make sense, are up to the fence agent. If the fence agent advertises support for a --plug or --port option, and pcmk_host_map is configured, then the "ports" get passed as that option by default (or as pcmk_host_argument if set).
I'd like to make sure that if we are expected to support characters such as spaces in port names, that the new parsing code handles that without backslashes
Previously, the backslash was just getting ignored. Wouldn't hurt to do some testing (or re-testing... not sure what I did, if anything) to be sure. But the reason I felt comfortable making this change in the first place, is that it doesn't appear to change the end result; we simply no longer skip a backslash if present.
FWIW Oyvind has no idea why e95198f was done.
Also, since this has been present since 2.1.2, I'm a little concerned about breaking whoever is using this right now regardless of how little sense it makes or how it is currently mangled in the XML.
Hopefully no one... but I understand your concern. This is a case where I'd be willing to take my chances in order to make the code look sane. But I'm not the one who has to do the z-stream builds.
|
|
||
| if (pcmk__str_empty(nvpair[0]) || pcmk__str_empty(nvpair[1])) { | ||
| crm_err(PCMK_FENCING_HOST_MAP ": Malformed mapping '%s'", *mapping); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're changing behavior... I wonder if it makes sense to return an empty mapping in the case where it's malformed. I can see going either way with it. If there's a parse error, who knows what they screwed up so maybe the entire thing is invalid. On the other hand, maybe we should try to use what we can from before the error occurred (what we're doing right now) to implement the user's configuration as much as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not even just "before the error occurred," but also after the error occurred. This is a sanity check to catch mappings with an empty name (node) or value (port). Note that this is happening after splitting on semicolons and whitespace. So consider the following (colons are equivalent to equal signs):
node1:port1 node2: node3:port3 :port4 node5:port5 :
The mappings for node1, node3, and node5 would get used. The mappings node2:, :port4, and : would get ignored because either the name or value was empty. I lean against throwing away the entire set of mappings in that case. But I am open to it.
For readability. Similarly for the use of pcmk__str_eq() -- using strcmp() in add_possible_values_default() was not a bug in this case, as both arguments were guaranteed to be non-NULL, but this was not obvious. Signed-off-by: Reid Wahl <[email protected]>
For readability and reliability. strtok() modifies its argument string, but according to the C11 standard (per the following Stack Overflow answer), the setlocale() return value should not be modified by the program. * https://stackoverflow.com/a/29116455 Signed-off-by: Reid Wahl <[email protected]>
The name is overly long and uses the public API prefix ("services_" with
a single underscore). Also rename the lpc variable to "i" and scope it
to the loop, rename the "root" argument to "dir", and use bool instead
of gboolean.
I'm guessing that the "os" prefix, as well as the separate
"services_linux.c" file, were meant to allow different implementations
of certain functions depending on the OS. GLib does similar things.
However, the commit that introduced it all (cc2110d) sheds no light on
this question.
Signed-off-by: Reid Wahl <[email protected]>
* namelist was not being freed when entries == 0, indicating an empty directory. * A namelist entry was not being freed if stat() failed. Signed-off-by: Reid Wahl <[email protected]>
Signed-off-by: Reid Wahl <[email protected]>
For readability Signed-off-by: Reid Wahl <[email protected]>
The name is overly long and uses the public API prefix ("services_" with
a single underscore).
Use bool instead of gboolean, and rename the "root" argument to "dirs".
"dirs" is a colon-delimited list of directories.
I'm guessing that the "os" prefix, as well as the separate
"services_linux.c" file, were meant to allow different implementations
of certain functions depending on the OS. GLib does similar things.
However, the commit that introduced it all (cc2110d) sheds no light on
this question.
Signed-off-by: Reid Wahl <[email protected]>
For readability versus strtok(). Signed-off-by: Reid Wahl <[email protected]>
Previously, the libcrmservice code supported initdir being a colon-delimited list of directories. However, this was not documented in the configure help text or comments. All of the documentation referred to initdir being a single directory. The default behavior, if not building with "--with-initdir=<dir>", was to check a list of directories and set INITDIR to the first one that existed. Signed-off-by: Reid Wahl <[email protected]>
Its sole remaining caller did nothing except call it. Move the code there. Signed-off-by: Reid Wahl <[email protected]>
The resources_os_list_ocf_providers() is listing top-level subdirectories of each directory in PCMK__OCF_RA_PATH. Signed-off-by: Reid Wahl <[email protected]>
Signed-off-by: Reid Wahl <[email protected]>
...up. * Give it a shorter and clearer name that doesn't look like public API. * Drop arguments that always take the same value. (There is only a single caller.) * Avoid a fixed-size buffer where we don't control the input. * Use g_strsplit() for readability instead of strtok(). Signed-off-by: Reid Wahl <[email protected]>
Signed-off-by: Reid Wahl <[email protected]>
...in get_directory_list(). To do this, copy the current services__list_dir() function into a static helper called gdl_helper(). This is the only call to services__list_dir() that may take a false value for the executable argument. Signed-off-by: Reid Wahl <[email protected]>
It's always true Signed-off-by: Reid Wahl <[email protected]>
Previously, files of type other than directory and regular file would be listed unconditionally. Now, list only executable regular files if exec_files is true, and list only directories if exec_files is false. Signed-off-by: Reid Wahl <[email protected]>
Signed-off-by: Reid Wahl <[email protected]>
To services__list_ocf_providers(), in order to follow the library's naming conventions. Signed-off-by: Reid Wahl <[email protected]>
To services__list_ocf_agents(), in order to follow the library's naming conventions. Also defunctionize list_provider_agents(). It seems more straightforward to have it inline. Signed-off-by: Reid Wahl <[email protected]>
And return bool instead of gboolean. More to come. Signed-off-by: Reid Wahl <[email protected]>
For readability Signed-off-by: Reid Wahl <[email protected]>
And assert on memory errors Signed-off-by: Reid Wahl <[email protected]>
Nothing uses this yet Signed-off-by: Reid Wahl <[email protected]>
Signed-off-by: Reid Wahl <[email protected]>
And compare strchr() return value against NULL instead of 0. Signed-off-by: Reid Wahl <[email protected]>
Signed-off-by: Reid Wahl <[email protected]>
Signed-off-by: Reid Wahl <[email protected]>
It's still confusing, but hopefully a little bit less so. When delay_base_s is set, it means we've found our mapping. So we use that as a loop exit condition. We no longer need valptr, since we never change what value points to. Now that value isn't changing, it will be easier to feel confident about using g_strsplit_set() in place of strtok(). Note the similarity to pcmk__scan_nvpair(). Signed-off-by: Reid Wahl <[email protected]>
Instead of strtok(). This seems clearer. Skip empty mappings so that we don't log a "malformed" error. strtok() skipped contiguous sequences of delimiters, so we wouldn't try to parse an empty mapping. g_strsplit_set() returns an empty string between contiguous delimiters. Strip leading and trailing whitespace before parsing the param value, and don't try to parse as a sequence of mappings (delimited by ';', ' ', and '\t') if there are no more delimiters after stripping. This avoids logging an error when the value is formatted like a single interval: no colon, no semicolon, no spaces or tabs except optionally at the beginning or end of the string. It seems reasonable not to strip leading or trailing semicolons. We can ignore leading or trailing whitespace, but a leading or trailing semicolon seems like a clear indicator that this is supposed to be a mapping. This will result an error for values like ";3s " or " 3s;" but not for " 3s ". Signed-off-by: Reid Wahl <[email protected]>
To reduce some nesting Signed-off-by: Reid Wahl <[email protected]>
Signed-off-by: Reid Wahl <[email protected]>
Seems cleaner since it isolates the "mappings" variable and g_strsplit_set() call. Signed-off-by: Reid Wahl <[email protected]>
An ISO 8601 interval can include a colon character. I can't think of any valid reason why a user would specify an interval this way for a fencing delay, but there doesn't seem to be any reason to exclude mapping values that have a colon. The "make sure there's no colon" check seemed to be an artifact of the way the parsing code worked before we started using g_strsplit(), etc. Signed-off-by: Reid Wahl <[email protected]>
Commit f057778 fixed an off-by-one error that a test relied on to avoid randomness in the output. Also add a new test to ensure that pcmk_delay_base is capped by pcmk_delay_max. Signed-off-by: Reid Wahl <[email protected]>
...to i. Also rename max to len. Signed-off-by: Reid Wahl <[email protected]>
This is a static function with only one caller, and that caller always passes the address of a struct member. Signed-off-by: Reid Wahl <[email protected]>
Signed-off-by: Reid Wahl <[email protected]>
It's about to go out of scope, so it doesn't get reused. It also doesn't get freed because the aliases table has taken ownership of it. Signed-off-by: Reid Wahl <[email protected]>
This should make absolutely no difference but uses the correct semantics. Signed-off-by: Reid Wahl <[email protected]>
This reverts e95198f. Backslash escapes are useful in C and on the command line, but it's not clear how they would be useful in a pcmk_host_map value. Backslash escapes are not used in XML, where the configuration is stored. The removed code did the following upon finding a backslash: * If we're not inside the value part of a host-to-port mapping, skip both a backslash and the character after it. * Otherwise, skip the backslash and keep the character after it (unless that character is also a backslash). This doesn't seem to make sense. Any characters that are special in XML attribute values would need to be escaped using XML entities. Other characters, including backslashes, are allowed as literals. * https://www.w3.org/TR/REC-xml/#NT-AttValue Signed-off-by: Reid Wahl <[email protected]>
Previously, if we hit a parsing error in pcmk_host_map, we logged a debug message and continued trying to parse the rest of the value. It's hard to be confident about the resulting behavior. Now, we stop parsing if we encounter an error. Also as part of this commit, we use g_strsplit_set() twice. At the first level, we use it to split the list of node-name-to-port mappings in pcmk_host_map, using ';', ' ', and '\t' as delimiters. At the second level, we use it to split each mapping, using '=' and ':' as delimiters. Support for '=' is likely to be removed in a future release. I greatly struggled to reason about the existing code based on "last = i + 1". It felt as if there were lots of potential edge cases. The behavior may not be identical as of this commit, but it should be more robust and predictable, and the code should be easier to maintain. Signed-off-by: Reid Wahl <[email protected]>
b936817 to
ec51073
Compare
|
Rebased on main, no other changes yet |
Most of this seems fairly straightforward, but the fencer changes related to
pcmk_delay_baseandpcmk_host_mapcould use some testing. The existing code was unintelligible to me at first, so it required a LOT of small, incremental changes.