-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cmd line parser #219
base: main
Are you sure you want to change the base?
Add cmd line parser #219
Conversation
e057db9
to
70fcfad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I really like this PR.
I had a number of comments, but only two actual concerns.
Address comments or ping me on slack and we can discuss.
NIce work :)
dataplane/src/args.rs
Outdated
huge_worker_stack: u32, | ||
#[arg(long, value_name = "socket memory")] | ||
socket_mem: Option<String>, | ||
#[arg(long, value_name = "enable/disable telemetry", default_value_t = true)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should remove this argument as I forcibly disabled DPDK telemetry in a prior commit to dpdk-sys
(I understand that you are just mirroring the prior, also outdated, parameters here)
dataplane/src/args.rs
Outdated
no_telemetry: bool, | ||
#[arg(long, value_name = "iova mode(va|pa)")] | ||
iova_mode: Option<String>, | ||
#[arg(long, value_name = "loglevel for a specific component")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand what this argument is about. Are we plumbing this into the tracing crate in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into DPDK documentation and invoked the dataplane passing --log-level=help and this is what I got.
Log type is a pattern matching items of this list (plugins may be missing):
bus.auxiliary
bus.pci
lib.eal
lib.ethdev
lib.hash
lib.hash.crc
lib.hash.fbk
lib.hash.gfni
lib.hash.thash
lib.hash.thash_poly
lib.mbuf
lib.mempool
lib.net
lib.rcu
lib.ring
lib.telemetry
pmd.common.mlx5
pmd.net.mlx5
user1
user2
user3
user4
user5
user6
user7
user8
Syntax using globbing pattern: --log-level pattern:level
Syntax using regular expression: --log-level regexp,level
Syntax for the global level: --log-level level
Logs are emitted if allowed by both global and specific levels.
Log level can be a number or the first letters of its name:
1 emergency
2 alert
3 critical
4 error
5 warning
6 notice
7 info
8 debug
I understood that with this command you can set the log levels specifically for certain modules.
So, I thought that this could be interesting. However, you're right, will the binding actually honor this ?
I'd say they should ...so, I just added the command as a direct pass-through to the EAL.
dataplane/src/args.rs
Outdated
main_lcore: u8, | ||
#[arg(long, value_name = "map lcore set to cpu set")] | ||
lcores: Option<String>, | ||
#[arg(long, value_name = "in-memory flag", default_value_t = true)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that we should "hard code" the --in-memory
option.
Otherwise we risk leaking truly massive amounts of hugepage memory if DPDK gets a kill -9
or some other event which prevents it from cleaning up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with that. Mind it defaults to true. But if you think that it should always be there, we can remove the option.
dataplane/src/args.rs
Outdated
pub(crate) struct CmdArgs { | ||
#[arg(long, value_name = "core-id used as main", default_value_t = 2)] | ||
main_lcore: u8, | ||
#[arg(long, value_name = "map lcore set to cpu set")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with allowing this as a command line flag for the moment.
That said, in the long run, the set of lcores used for workers (and symmetrically [and inversely] the set of cores used for everything else) needs to be a function of linux kernel command line parameters like isolcpus, or tickless configurations like no_hz=full
.
There is also NUMA to consider.
Given that these parameters are either hardware specific or boot time configuration, I'm not sure we can assume the command line is a great place to get this data.
That said, we could self-inspect for these parameters at launch (ideally in a parent process which invokes execs the dataplane). That process could hand us this information so maybe command line is the best place to get it.
dataplane/src/args.rs
Outdated
// To be removed | ||
if self.allow.len() == 0 { | ||
out.push("--allow".to_string()); | ||
out.push("0000:01:00.0,dv_flow_en=1".to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good default for the moment, but we need to be careful here. 0000:01:00.0
is the PCI address of the CX7 card in the dev machine. We can't count on it in general and we should let our deployment process depend on it.
That said, this is still a step in the right direction since the parameter is, at minimum, configurable now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I just hardcoded it so that we'd get the same behavior without needing to pass it. The alternative would have been to add a bash (or just) script so that we don't have to write --allow .... each time we run it at this stage.
Note that the hardcoded value will only be used if no --allow is specified.
dataplane/src/args.rs
Outdated
|
||
/* IOVA mode */ | ||
out.push(format!( | ||
"--iova-mode={}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is another place where we need to be careful. The CX7 wants virtual memory, but other cards will require physical memory (virtual memory bypass). Ideally we will set this as a function of the driver of the network card we are going to poll.
But again, this is as good as we are going to do for the moment
dataplane/Cargo.toml
Outdated
@@ -7,6 +7,7 @@ license = "Apache-2.0" | |||
|
|||
[dependencies] | |||
|
|||
clap = { version = "4.5.27", features = ["derive"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move clap
to the workspace level Cargo.toml
(with default-features
disabled and features = []
).
Then change this line to read
clap = { workspace = true, features = ["derive"] }
This strategy has the advantage of centralizing version numbers and enforcing that all crates in this worspace are on a common version and using common compile options (features).
cargo deny
will bounce us if we fail this check, but only on a per-crate basis.
I find that debugging can become enormously confusing if you end up in a situation where you ship or depend on multiple versions of the same package.
It is also far easier on our (future) operations and sec-ops teams when we can give them a single, unambiguous, and exact answer to what version of any given thing is in production at any given time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Thanks for the explanation @daniel-noland .
This is a good question. On thinking about it, I suggest that we convert the I am a big believer in thin applications and fat libraries. It makes testing easier, and who knows when we might want to recycle dataplane logic for some other purpose. |
Thanks ... Fine to discuss if you're around! |
.. so that the GW dataplane can accept also non-EAL related params. Signed-off-by: Fredi Raspall <[email protected]>
This commit is to be reverted, but allows starting the dataplane without any arg, as it was before. Also, printed the config, which should be replaced by logs. Signed-off-by: Fredi Raspall <[email protected]>
Produce a binary called "gateway" instead of dataplane. Signed-off-by: Fredi Raspall <[email protected]>
70fcfad
to
1ffb54d
Compare
@daniel-noland I've pushed some changes and also renamed the crate to "gateway". |
Signed-off-by: Fredi Raspall <[email protected]>
1ffb54d
to
cd5313a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation for renaming dataplane
to gateway
?
[EDIT: Sorry I had missed part of the conversation. I like Daniel's suggestion of a dataplane library + gateway binary, for what it's worth.]
[[bin]] | ||
name = "gateway" | ||
path = "src/main.rs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will require some additional changes in the Dockerfile
, as well as in the justfile
for targets sterile-build
and build-container
where we reference binaries or Rust packages called dataplane
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! Will make the changes if everyone is fine with the change.
goal: allow dataplane process to get other params, non-EAL related.
The crate used, clap, allows deriving parser functions automatically and defining default values.
However, that does not seem to work for certain types. We should also decide which args we can default and which ones are mandatory. We could default ALL (and then it would behave as it was, with stuff hardcoded).
We should also decide if we want to call the main proces "dataplane" or gateway?