Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial / Pre Parsing a CLI #1880

Closed
2 tasks done
kbknapp opened this issue Apr 29, 2020 · 16 comments
Closed
2 tasks done

Partial / Pre Parsing a CLI #1880

kbknapp opened this issue Apr 29, 2020 · 16 comments
Labels
A-parsing Area: Parser's logic and needs it changed somehow. C-enhancement Category: Raise on the bar on expectations S-wont-fix Status: Closed as there is no plan to fix this

Comments

@kbknapp
Copy link
Member

kbknapp commented Apr 29, 2020

Comment on closing:

In the short term, we now have IgnoreErrors.

Longer term, I feel like #1404 is a more general solution and has existing precedence (Python's argparse at least) compared to a get_partial_matches(arg_name) which only works in a narrow set of cases.


Make sure you completed the following tasks

Describe your use case

A way to declare a subset of arguments to be parsed, then return from parsing early. With the expectation that parsing will be called again, and run to full completion.

This most often comes up when you want some argument to handle some aspect of the CLI itself (such as --color flags to control error messages during parsing, or in the above --arg-file which needs to determine a file to load prior to parsing the rest of the CLI).

Describe the solution you'd like

From a user stand point, nothing changes. The run the CLI and things just work. From the a clap consumer stand point, it essentially allows you to pull out some values from the CLI (ignoring everything else), change something about the CLI and continue parsing.

// .. already have some App struct
let pre_matches = app.get_partial_matches("color"); // Only parse the `--color` option, ignore all else
app = if let Some("never") = pre_matches.value_of("color") {
    app.setting(AppSettings::ColorNever)
} else {
    app
};

let matches = app.get_matches();

Maybe it's biting off more than we want to chew for now, but this issue seems to come up time and time again where we want to parse some particular subset of arguments, and use those values to inform how the CLI itself behaves.

Alternatives, if applicable

You can already reparse your CLI twice.

let matches = app.get_matches();
app = if let Some("never") = matches.value_of("color") {
    app.setting(AppSettings::ColorNever)
} else {
    app
};

let matches = app.get_matches();

But this doesn't help if there are errors in the initial parse, or options included included in the initial parse that the initial app isn't aware of. For example, an iptables style CLI where -m <foo> enables many other arguments and options.

Additional context

This could potentially be used in #1693 #1089 #1232 #568

@kbknapp kbknapp added T: new feature A-parsing Area: Parser's logic and needs it changed somehow. labels Apr 29, 2020
@alerque
Copy link
Contributor

alerque commented May 5, 2020

This feature would also be immensely useful for implementing #380, localizing a CLI interface.

You can already reparse your CLI twice.

This is not quite true. You already mentioned some exceptions such as in the case of parse errors, but there is one more road block that makes that technique unworkable. A number of things happen in get_matches() that have the potential to short circuit the rest of the application life-cycle. For example having a help subcommand. If this is detected it immediately outputs stuff to the terminal and quits. This means you don't get a chance to reliably parse the CLI a second time.

This makes the technique useless for localization because by the time we get information about a potential language flag the App has already spit out strings that may have needed localization.

@CreepySkeleton
Copy link
Contributor

As a minimal viable solution, we could simply allow ignoring unknown or missing positionals and options.

// first "probe" app
let preprocess = App::new("app")
    .settings(IgnoreErrors) // also disables --help, version, and others
    .arg("--color <color> 'coloring policy: never/always'")
    .get_matches(); // will succeed no matter what

let mut app = App::new("main_app");

match preprocess.value_of("color") {
    Some("always") => app = app.setting(ColorAlways),
    Some("never") => app = app.setting(ColorNever),
    _ => {}
}

// real parsing starts here
app.get_matches()

This kind or partial parsing would allow for most, if not all, cases and would be fairly easy to implement.

@alerque
Copy link
Contributor

alerque commented Jun 2, 2020

Yes that does indeed sound like a MVS, and something along the lines of what I expected to be able to hack together already and failed. It's not the most elegant solution maybe, but it would get me by for my use case of localization.

@wabain
Copy link
Contributor

wabain commented Jun 7, 2020

@alerque It sounds like get_matches_safe could be what you're looking for? That would address the immediate problem of clap prematurely exiting with help or an error.

@CreepySkeleton
Copy link
Contributor

@wabain But it wouldn't give you the partial mach, see the color example above

@acheronfail
Copy link

I am building an app which spawns another process, and needs to "sniff" if certain flags were passed to the subprocess. Having an AppSettings::IgnoreErrors would be invaluable, since I only want to match some specific args, and leave the rest alone.

Right now I believe this is impossible with clap, and I have to parse the args manually.


For some more detail, my app is used like this:

app <args sent to subprocess>

The subprocess supports hundreds of arguments, and I don't want to have to write code to match all of them if I'm just interested in 2 or 3 flags.

The get_matches_safe doesn't work here for me though, since that bails out and completely fails.

@kenoss
Copy link

kenoss commented Jul 15, 2020

Let me have some examples because my English is poor and I might miss points.

The original problem is "lax position of arguments."

# Good.  `--project` option belongs to `Arg("gcloud")` and `--zone` to `Arg("ssh")`.
$ gcloud --project=hoge compute ssh --zone=asia-northeast1-a instance-name

# Good.  `--project` option can descend to subcommands.
$ gcloud compute --project=hoge ssh --zone=asia-northeast1-a instance-name

# Also good.
$ gcloud compute ssh --project=hoge --zone=asia-northeast1-a instance-name
 
# Bad.  `--zone` option belongs to `ssh` subcommand.
$ gcloud --project=hoge compute --zone=asia-northeast1-a ssh instance-name

I think this is solved by adding option to Arg. No tricks.
There's possibility the implementation of parsing become little complicated, but developer using clap can use intuitively.
This is also glad with the viewpoint of #1232 . Tricks like alternative App and get_matches_safe make the systematic completion harder.

This approach will be match with the @acheronfail 's case. I guess there're two types of app:

time othercommand ...
cargo run -- othercommand-or-options ...

The former is very limited. time do not have subcommands because subcommands conflict with othercommand. It can have only root App and parse will terminate with othercommand. Hence the above approach has no ambiguity.

The later is similar. Parsing will terminate with first "--".

@CreepySkeleton
Copy link
Contributor

@kenoss Thanks for looking into this, we appreciate that. I didn't quite get your comment, so let's try to get on the same page first.

The original problem is "lax position of arguments."

Could you please elaborate? I think this is where the misunderstanding comes from. The original problem in my understanding is to "loosely parse flags, options, and subcommands but ignore positional arguments entirely". The -- thing is another topic; I'm not sure why you brought it up.

Good. --project option belongs to Arg("gcloud") and --zone to Arg("ssh").

You have introduced "belongs to" relationship between two arguments. I can't wrap my head around it - arguments can't belong to each other. An argument can "belong" to a subcommand or the top-level app.

Maybe you meant App?


The complexity of implementation... Guys, I think we've all been missing the elephant in the room. The current parsing algorithm can't just "ignore" arguments, especially positional ones. Well, It can ignore, but it can't recover. I think this issue belongs to "postponed until after I finally rewrite the parser algorithm" category.

@kenoss
Copy link

kenoss commented Aug 5, 2020

Sorry for late reply.

The original problem in my understanding is to "loosely parse flags, options, and subcommands but ignore positional arguments entirely".

I read the issue again and I've got the point. Thanks to catch me up the argument.

The -- thing is another topic; I'm not sure why you brought it up.

Sorry. I misunderstood by get_matches_safe and app <args sent to subprocess>.

Maybe you meant App?

Yes.

I've found that this is a critical blocker of #1232 and I should parsing implementation.

kolloch added a commit to kolloch/clap that referenced this issue May 13, 2021
Implemented as AppSetting::Ignore errors as suggested by @CreepySkeleton
in clap-rs#1880 (comment).

This is not a complete implementation but it works already in
surprisingly many situations.

clap-rs#1880
kolloch added a commit to kolloch/clap that referenced this issue May 13, 2021
Implemented as AppSetting::Ignore errors as suggested by
@CreepySkeleton in
clap-rs#1880 (comment).

This is not a complete implementation but it works already in
surprisingly many situations.

clap-rs#1880
kolloch added a commit to kolloch/clap that referenced this issue Jun 12, 2021
Implemented as AppSetting::Ignore errors as suggested by
@CreepySkeleton in
clap-rs#1880 (comment).

This is not a complete implementation but it works already in
surprisingly many situations.

clap-rs#1880
kolloch added a commit to kolloch/clap that referenced this issue Jun 14, 2021
Implemented as AppSetting::Ignore errors as suggested by
@CreepySkeleton in
clap-rs#1880 (comment).

This is not a complete implementation but it works already in
surprisingly many situations.

clap-rs#1880
@epage
Copy link
Member

epage commented Dec 9, 2021

In the short term, we now have IgnoreErrors.

Longer term, I feel like #1404 is a more general solution and has existing precedence (Python's argparse at least) compared to a get_partial_matches(arg_name) which only works in a narrow set of cases.

If there is any concern with this, feel free to comment and we can re-open this.

@epage epage closed this as completed Dec 9, 2021
@alerque
Copy link
Contributor

alerque commented Dec 14, 2021

I don't see anything in #1404 covering the use cases I commented on above. An option would need to be added to stop the default short-circuit behavior where the app prints something and exits when encountering some flags, e.g. --help.

@epage
Copy link
Member

epage commented Dec 14, 2021

I don't see anything in #1404 covering the use cases I commented on above. An option would need to be added to stop the default short-circuit behavior where the app prints something and exits when encountering some flags, e.g. --help.

So if I understand, you would define one App with the localization flag(s), parse that to know how to handle it, and then do a full parse.

For the partial parse, couldn't you either set NoAutoHelp / NoAutoVersion or DisableHelpFlag / DisableVersionFlag / DisableHelpSubcommand?

@alerque
Copy link
Contributor

alerque commented Dec 14, 2021

For the partial parse, couldn't you either set NoAutoHelp / NoAutoVersion or DisableHelpFlag / DisableVersionFlag / DisableHelpSubcommand?

Honestly, I hadn't considered that — but it seems like quite a song and dance for what I want. First I'd have to set both of those (since they aren't the default), do one parsing pass, then unset both of them again for the second pass. I would suggest that should be an option on the parsing function (or an alternate function) to disable all current & future Auto____ functionality during a parse.

Should I open a new issue for that?

@epage
Copy link
Member

epage commented Dec 14, 2021

For the partial parse, couldn't you either set NoAutoHelp / NoAutoVersion or DisableHelpFlag / DisableVersionFlag / DisableHelpSubcommand?

Honestly, I hadn't considered that — but it seems like quite a song and dance for what I want. First I'd have to set both of those (since they aren't the default), do one parsing pass, then unset both of them again for the second pass. I would suggest that should be an option on the parsing function (or an alternate function) to disable all current & future Auto____ functionality during a parse.

Should I open a new issue for that?

You can, it would be a better place to continue to discuss it. With that said, I am concerned about the number of parsing functions we already have today and would be preferring we find ways to reduce the number, rather than increase it. As long as we provide the building blocks, that is my main concern. How nice we make things past that is dependent on how prevalent the workflow is.

Already, you either have to be setting IgnoreErrors or calling the partial parsing proposed in #1404 with subset of arguments you want verified. This seems like something that is already going to be a specialized parse pass, so the level of burden of setting a couple more flags seems low.

@epage epage added the S-wont-fix Status: Closed as there is no plan to fix this label Jan 11, 2022
@scottidler
Copy link

fn extract(item: (ContextKind, &ContextValue)) -> Option<&ContextValue> {                                                                                         
    let (k, v) = item;                                                                                                                                            
    if k == ContextKind::InvalidArg {                                                                                                                             
        return Some(v);                                                                                                                                           
    }                                                                                                                                                             
    None                                                                                                                                                          
}                                                                                                                                                                 
                                                                                                                                                                  
fn parse_known_args(cmd: &Command) -> Result<(ArgMatches, Vec<String>), Error> {                                                                                  
    let mut rem: Vec<String> = vec![];                                                                                                                            
    let mut args: Vec<String> = env::args().collect();                                                                                                            
    loop {                                                                                                                                                        
        match cmd.clone().try_get_matches_from(&args) {                                                                                                           
            Ok(matches) => {                                                                                                                                      
                return Ok((matches, rem));                                                                                                                        
            },                                                                                                                                                    
            Err(error) => {                                                                                                                                       
                match error.kind() {                                                                                                                              
                    ErrorKind::UnknownArgument => {                                                                                                               
                        let items = error.context().find_map(extract);                                                                                            
                        match items {                                                                                                                             
                            Some(x) => {                                                                                                                          
                                match x {                                                                                                                         
                                    ContextValue::String(s) => {                                                                                                  
                                        rem.push(s.to_owned());                                                                                                   
                                        args.retain(|a| a != s);                                                                                                  
                                    },                                                                                                                            
                                    _ => {                                                                                                                        
                                        return Err(error);                                                                                                        
                                    },                                                                                                                            
                                }                                                                                                                                 
                            },                                                                                                                                    
                            None => {                                                                                                                             
                                return Err(error);                                                                                                                
                            },                                                                                                                                    
                        }                                                                                                                                         
                    },                                                                                                                                            
                    _ => {                                                                                                                                        
                        return Err(error);                                                                                                                        
                    },                                                                                                                                            
                }                                                                                                                                                 
            },                                                                                                                                                    
        }                                                                                                                                                         
    }                                                                                                                                                             
}   
fn main() {                                                                                                                                                       
    let cmd = Command::new("cmd")                                                                                                                                 
        .allow_hyphen_values(true)                                                                                                                                
        .trailing_var_arg(true)                                                                                                                                   
        .disable_help_flag(true)                                                                                                                                  
        .disable_version_flag(true)                                                                                                                               
        .arg(                                                                                                                                                     
            Arg::new("config")                                                                                                                                    
                .takes_value(true)                                                                                                                                
                .short('c')                                                                                                                                       
                .long("config")                                                                                                                                   
                .help("path to config"),                                                                                                                          
        );                                                                                                                                                        
    let result = parse_known_args(&cmd);                                                                                                                          
    let (matches, rem) = result.unwrap();                                                                                                                         
    println!("matches={:#?}", matches);                                                                                                                           
    println!("rem={:#?}", rem);   
    let matches = cmd.get_matches_from(&rem);
}

For anyone who comes across this and is looking for an equivalent to Python Argparse's parse_known_args, I hacked this code together. I am sure it could be improved as I am new to Rust. It would be nice to have some ergonomics in the Clap code base for this use case, but until then maybe this provides some usefulness to someone.

For anyone who doesn't understand why you would want this, it is a VERY common pattern for me to supply the config file in the first pass, that will either effect the default values if not alter the program's options entirely.

chooglen added a commit to chooglen/jj that referenced this issue Dec 1, 2022
Clap bails parsing when an "error" is encountered, e.g. a subcommand is missing,
"--help" is passed, or the "help" subcommand is invoked. This means that the
current approach of parsing args does not handle flags like `--no-pager` or
`--color` when an error is encountered.

Fix this by separating early args into their own struct and preprocessing them
using `ignore_errors` (per clap-rs/clap#1880).

The early args are in a new `EarlyArgs` struct because of a known bug where
`ignore_errors` causes default values not to be respected
(clap-rs/clap#4391 specifically calls out bool, but
strings may also be ignored), so when `ignore_errors` is given, the default
values will be missing and parsing will fail unless the right arg types are used
(e.g`. Option`). By parsing only early args (using the new struct) we only need
to adjust `no_pager`, instead of adjusting all args with a default value.
chooglen added a commit to chooglen/jj that referenced this issue Dec 12, 2022
Clap bails parsing when an "error" is encountered, e.g. a subcommand is missing,
"--help" is passed, or the "help" subcommand is invoked. This means that the
current approach of parsing args does not handle flags like `--no-pager` or
`--color` when an error is encountered.

Fix this by separating early args into their own struct and preprocessing them
using `ignore_errors` (per clap-rs/clap#1880).

The early args are in a new `EarlyArgs` struct because of a known bug where
`ignore_errors` causes default values not to be respected
(clap-rs/clap#4391 specifically calls out bool, but
strings may also be ignored), so when `ignore_errors` is given, the default
values will be missing and parsing will fail unless the right arg types are used
(e.g`. Option`). By parsing only early args (using the new struct) we only need
to adjust `no_pager`, instead of adjusting all args with a default value.
chooglen added a commit to chooglen/jj that referenced this issue Dec 13, 2022
Clap bails parsing when an "error" is encountered, e.g. a subcommand is missing,
"--help" is passed, or the "help" subcommand is invoked. This means that the
current approach of parsing args does not handle flags like `--no-pager` or
`--color` when an error is encountered.

Fix this by separating early args into their own struct and preprocessing them
using `ignore_errors` (per clap-rs/clap#1880).

The early args are in a new `EarlyArgs` struct because of a known bug where
`ignore_errors` causes default values not to be respected
(clap-rs/clap#4391 specifically calls out bool, but
strings may also be ignored), so when `ignore_errors` is given, the default
values will be missing and parsing will fail unless the right arg types are used
(e.g`. Option`). By parsing only early args (using the new struct) we only need
to adjust `no_pager`, instead of adjusting all args with a default value.
chooglen added a commit to chooglen/jj that referenced this issue Dec 13, 2022
Clap bails parsing when an "error" is encountered, e.g. a subcommand is missing,
"--help" is passed, or the "help" subcommand is invoked. This means that the
current approach of parsing args does not handle flags like `--no-pager` or
`--color` when an error is encountered.

Fix this by separating early args into their own struct and preprocessing them
using `ignore_errors` (per clap-rs/clap#1880).

The early args are in a new `EarlyArgs` struct because of a known bug where
`ignore_errors` causes default values not to be respected
(clap-rs/clap#4391 specifically calls out bool, but
strings may also be ignored), so when `ignore_errors` is given, the default
values will be missing and parsing will fail unless the right arg types are used
(e.g`. Option`). By parsing only early args (using the new struct) we only need
to adjust `no_pager`, instead of adjusting all args with a default value.
chooglen added a commit to chooglen/jj that referenced this issue Dec 13, 2022
Clap bails parsing when an "error" is encountered, e.g. a subcommand is missing,
"--help" is passed, or the "help" subcommand is invoked. This means that the
current approach of parsing args does not handle flags like `--no-pager` or
`--color` when an error is encountered.

Fix this by separating early args into their own struct and preprocessing them
using `ignore_errors` (per clap-rs/clap#1880).

The early args are in a new `EarlyArgs` struct because of a known bug where
`ignore_errors` causes default values not to be respected
(clap-rs/clap#4391 specifically calls out bool, but
strings may also be ignored), so when `ignore_errors` is given, the default
values will be missing and parsing will fail unless the right arg types are used
(e.g`. Option`). By parsing only early args (using the new struct) we only need
to adjust `no_pager`, instead of adjusting all args with a default value.
chooglen added a commit to jj-vcs/jj that referenced this issue Dec 13, 2022
Clap bails parsing when an "error" is encountered, e.g. a subcommand is missing,
"--help" is passed, or the "help" subcommand is invoked. This means that the
current approach of parsing args does not handle flags like `--no-pager` or
`--color` when an error is encountered.

Fix this by separating early args into their own struct and preprocessing them
using `ignore_errors` (per clap-rs/clap#1880).

The early args are in a new `EarlyArgs` struct because of a known bug where
`ignore_errors` causes default values not to be respected
(clap-rs/clap#4391 specifically calls out bool, but
strings may also be ignored), so when `ignore_errors` is given, the default
values will be missing and parsing will fail unless the right arg types are used
(e.g`. Option`). By parsing only early args (using the new struct) we only need
to adjust `no_pager`, instead of adjusting all args with a default value.
@novafacing
Copy link

novafacing commented Sep 9, 2023

Sorry for the long time bump, I liked @scottidler's answer here so I've made it a little more generic. Slap this at the top of your cli code and you're good to go. My use case is for wrapping clang for LibAFL use -- I want to extract some arguments and pass everything else down to the clang command I'm wrapping.

use anyhow::{anyhow, bail, Result};
use clap::{
    error::{ContextKind, ContextValue, ErrorKind},
    ArgMatches, Command, CommandFactory, FromArgMatches, Parser,
};
use std::env::args;

#[derive(Debug, Clone)]
struct Known<T>
where
    T: FromArgMatches,
{
    matches: T,
    rest: Vec<String>,
}

impl<T> Known<T>
where
    T: FromArgMatches,
{
    pub fn new<I, S>(matches: ArgMatches, rest: I) -> Result<Self>
    where
        I: IntoIterator<Item = S>,
        S: AsRef<str>,
    {
        Ok(Self {
            matches: T::from_arg_matches(&matches)?,
            rest: rest.into_iter().map(|a| a.as_ref().to_string()).collect(),
        })
    }
}

trait ParseKnown: FromArgMatches {
    fn parse_known() -> Result<Known<Self>>;
}

impl<T> ParseKnown for T
where
    T: CommandFactory + FromArgMatches,
{
    fn parse_known() -> Result<Known<T>> {
        let command = Self::command();
        let mut rest = Vec::new();
        let mut args = args().collect::<Vec<_>>();

        loop {
            match command.clone().try_get_matches_from(&args) {
                Ok(matches) => {
                    return Known::new(matches, rest);
                }
                Err(e) if matches!(e.kind(), ErrorKind::UnknownArgument) => {
                    if let Some(ContextValue::String(v)) = e
                        .context()
                        .find_map(|(k, v)| matches!(k, ContextKind::InvalidArg).then_some(v))
                    {
                        let unknown = args
                            .iter()
                            .find(|a| a.starts_with(v))
                            .cloned()
                            .ok_or_else(|| anyhow!("No argument starts with {}", v))?;

                        if args.contains(&unknown) {
                            args.retain(|a| a != &unknown);
                            rest.push(unknown);
                        } else {
                            bail!("Got unknown argument {} but it is not in args at all.", v);
                        }
                    } else {
                        bail!("No string value found for unknown argument: {}", e);
                    }
                }
                Err(e) => bail!("Error getting matches from args '{:?}': {}", args, e),
            }
        }
    }
}

#[derive(Parser, Debug, Clone)]
#[clap(
    allow_hyphen_values = true,
    trailing_var_arg = true,
    disable_help_flag = true,
    disable_version_flag = true
)]
struct Args {
    #[arg(long)]
    libafl_no_link: bool,
    #[arg(long)]
    libafl: bool,
    #[arg(long)]
    libafl_ignore_configurations: bool,
    #[arg(long, value_delimiter = ',')]
    libafl_configurations: Vec<String>,
}

fn main() -> Result<()> {
    let args = Args::parse_known()?;
    println!("{:#?}", args);
    Ok(())
}

Handles e.g. cargo run -- -x --libafl -y -zexecstack -fsanitize=fuzzer-no-link -fno-common --libafl-ignore-configurations (the above example will fail with single-hyphen commands)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parsing Area: Parser's logic and needs it changed somehow. C-enhancement Category: Raise on the bar on expectations S-wont-fix Status: Closed as there is no plan to fix this
Projects
None yet
Development

No branches or pull requests

10 participants