Skip to content

Conversation

@Sable-20
Copy link

@Sable-20 Sable-20 commented Mar 28, 2024

Complete rewrite of wordlistctl

This is a completely rust based rewrite of wordlistctl.

Original authors left in Cargo.toml as "blackarch" with me (the main contributor) written in separately.

The goal of this rewrite is to improve the speed of wordlistctl and implement any missing features or fix existing issues.

This will also improve code readability through Rust's strict type system along with strict formatting rules enforced by rustfmt and the linters being run. Code security is also paramount and has a workflow specifically for managing code security.

Manpage generation will also because automatic so that it can be automated and consistent.

Roadmap:

  • Document existing structs and enums that involve cli options
  • Eventually remove repo.json after ensuring all data is correct
  • Document all functions
  • Write contribution convention and instruction file
  • Integrate cargo doc function and manpage (using clap_mangen ) generation into artifact generation OR pre-commit hooks
  • Finish adding all commands
  • Implement command matching
  • Add functions
  • Implement multithreading and retrieving files as chunked data maybe via rayon and its par_iter style, considering using Tokio instead of rayon
  • Implement custom error types REMOVED FROM GOALS
  • Remove current bash completions and have them auto generated but a clap subproject (probably clap_complete)
  • Implement unit tests for all relevant modules
  • More....

Sable-20 added 26 commits March 20, 2024 14:22
Creating config file for rust rewrite of wordlistctl
Repo file replaced with more rust like toml file format, more human readable and easier to edit
Replace current manpage with manpage generated by `clap_mangen` -- https://crates.io/crates/clap_mangen

Mangen automatically generates ROFF files according to attributes written in code
Unneeded in Rust projects
repo.json moved to config/repo.toml
Begin rewrite in rust using clap builder as opposed to derive, allows more fine grained control over cli
Improper gitignore fixed
Unneeded `use` statement made to have access to enum `Units`
Going to test arg! vs Arg builder. Also fixed floating point number issues in units.rs
added repo structs auto-generated by `https://transform.tools/json-to-rust-serde` 

further edits should be done by hand by adding to the `repo.toml` and then added the needed structs in `repo.rs`

follow same naming scheme as autogenerated
unneeded after further testing
Worked on reading config but gave up after too many lifetime issues, did a bunch of work on parsing data and getting stuff back from file, can now read file, made minor updates to units. Finished up most of the args for `fetch`

removed mod which created redundant references such as `units::units::Units` it now just uses `units::Units`
Add code scanning GitHub action
no longer scans reference file `wordlistctl.py`
No longer uses builder format, instead uses derive from `clap` with multiple subcommands. Does compile, syntactically correct, however tests not yet written to test functions. Unit tests need to be implemented. Also need to finish implementing the rest of the arguments before incrementing minor version from 0.10.0-alpha.1.4.0 to 0.11.0-alpha.1.0.0

Haven't been consistent with updating semver each push, will be maintained from now on.
Merge dev into master to mark progress
@Sable-20
Copy link
Author

NOTE: in case it wasn't clear in the original comment, don't merge this yet it is a breaking change and the tool will not work until everything is fully implemented, this is merely to track the progress in reference to the main repository, there is much to be done and there will be likely be breaking changes.

We are still only on version 0.10.0-alpha.1.4.0

There just so much crap that I did but still needs to be done. I fixed and simplified a lot of stuff in `repo.rs` but theres still more to be done.

`repo.toml` file modified via regex find of `/^(size = )"(\d*.\d*)"/g` and replace regex of `/$1$2/`

Get back to me in a few days after I've slept
@Sable-20
Copy link
Author

Sable-20 commented Apr 9, 2024

A low of work has gone into this and we're almost there, we have to review the search and list options because while rewriting it feels almost redundant and it's possible to either merge them or do away with one of them.

Like I stated in my commit I'm going to do a proper code review because it is late and I may just be a bit crazy right now!

Sable-20 added 15 commits April 10, 2024 17:56
Started movement from Docker and local environments to nix so that development environments are reproducible as well as build environments. Cargo.lock was also added back so that crate versions can be locked.
Realized I was trying to fix the devShell which was already working when it was the build command failing
@Sable-20
Copy link
Author

As of the most recent commit (Sable-20@7520352) my rewrite has begun it's migration away from Docker towards Nix for reproducible dev environments as well as build environments

Backtracking, moving away from derive feature so that mangen and complete can easily be generated, god knows Derive makes it a pain in the ass to work with but it just does, and unless I want a 600 line command file I have to switch to buider
I can't do anymore professional commit messages tonight
replaced completions with better ones and added a man page!
Did some refactoring and got stuff working, have to solve the issue of where to put files if you don't have root permissions, probably just warn user on where they will go and tell them to use the intended directory of `/usr/share/wordlists/` you have to be root
makes things look prettier and messed around with some functions again
@Sable-20
Copy link
Author

It essentially works

I'd like to ask for a review and for feedback on the idea from a maintainer, sorry for the highlight but @dualfade you're the only maintainer I happen to have noticed from the matrix

@ikstream
Copy link
Member

Hey @Sable-20 I will try to have a look at it. This will take a bit of time, as it is quite a big one, but what I have seen so far looks very good.
Thanks a lot for the work you already put into this rewrite.

@Sable-20
Copy link
Author

Hey @Sable-20 I will try to have a look at it. This will take a bit of time, as it is quite a big one, but what I have seen so far looks very good. Thanks a lot for the work you already put into this rewrite.

I am aware it's a lot of code, I tried to rewrite in a more modular way that allows for pulling and changing, while it is essentially working there will likely in the future (hopefully near) be minor refactoring to allow separation of modules in a more efficient and organized fashion than currently.

After a few more changes I will update the version to 1.0.0, these changes will be cleaning up commented out code that is unneeded, optimizations of iterators, documentation of the builder and cleaning up the Nix flakes file. All of which are fairly minor and shouldn't require too extensive of a review after this one!

A contributing file and clippy linters will also be implemented, again, fairly minor changes that should require little to no review.

Thank you for taking the time to read my code, I apologize for how it's a tad bit messy.

Copy link

@pidone pidone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. First of all, thanks again for your PR and the rewrite in Rust and sorry that it took a while to review. It's pretty cool that the shell completions and manpage are generated automatically. Also that you directly considered docker and nixOS. I'm not really familiar with nixOS so I can't really comment on that part.

I have noted everything that I have noticed. But I would see the following things as essential:

  • we should not rename the tool, it should rather function as a drop-in replacment
  • The optional parameters are not really optional, because they are directly unwrapped.
  • The file structure of the wordlists directory is different in the implementation than in the python version. This could lead to all possible wordlists having to be downloaded again.
  • The config is currently being compiled in. This should be loaded at runtime.
  • The cli interface should differ as little as possible from the python version.

Comment on lines +6 to +12
# sed command
# -i to change it in the file
# s for substitution
# \(.*\) detects the version and saves it to capture group 1
# features = \[\(.*\)\] detects features and saves to capture group 2
# replacement just saves the version, disables default features, and adds in features http2, and rustls-tls so that this can be performed in docker
RUN sed -i 's/reqwest = { version = "\(.*\)", features = \[\(.*\)\] }/reqwest = { version = "\1", default-features = false, features = \["http2", "rustls-tls", \2\] }/' Cargo.toml
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice workaround. But you can simple add openssl and ca-certificates and it should work without the native rusttls

Comment on lines +1 to +7
[[wordlists]]
[wordlists.deepmagic]
name = "deepmagic"
url = "https://raw.githubusercontent.com/danielmiessler/SecLists/master/Discovery/DNS/deepmagic.com-prefixes-top50000.txt"
size = 605.75
unit = "kb"
group = "discovery"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you choice a list of an large table here? (in rust words a vec of a hashmap).
In my opinion simply a list of tables would be better:

[[wordlists]]
name = "..."
...

[[wordlists]]
name = "..."

and so on.

This is a little bit more inefficient if you only search for a single list by name, but it won't matter here and the data format would be a bit nicer in my opinion.

}

pub fn load_config() -> Result<Config> {
let config = toml::from_str::<Config>(include_str!("../config/config.toml"))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This includes the config at compile-time. This means that if you change the config after compiling, it would have no effect, since it has already been baked into the binary. In my opinion, this misses the point of a config. Here the config should be loaded at runtime. Preferably from XDG_CONFIG_HOME. There is also the dirs crate that selects the correct config for you depending on the platform.

Comment on lines +64 to +68
assert!(
!std::path::Path::new(&format!("/{}/{}", base_dir, list.get_name()))
.try_exists()
.expect("File already exists")
);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it necessary to specify base-dir as an absolute path. Furthermore, the entire download aborts because the error is not handled in main.

Suggested change
assert!(
!std::path::Path::new(&format!("/{}/{}", base_dir, list.get_name()))
.try_exists()
.expect("File already exists")
);
let file = Path::new(base_dir).join(list.get_name());
if file.exists() {
info!("{} already exists -- skipping", list.get_name());
return Ok(());
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the python version created subdirectories for the groups. We should keep the file structure as it was.

Comment on lines +110 to +127
for range in PartialRangeIter::new(0, (length - 1).try_into().unwrap(), chunk_size as usize)? {
trace!("Downloading range: {:?}", range);
let res = client
.get(list.get_url())
.header(RANGE, range)
.send()
.await?;

let status = res.status();
if !(status == StatusCode::OK || status == StatusCode::PARTIAL_CONTENT) {
return Err(eyre!(
"Failed to download file-- server respnse: {}",
status
));
}
let bytes = res.bytes().await?;
std::io::copy(&mut bytes.as_ref(), &mut output_file)?;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that you even split the download of a file so that it could be distributed to the workers. However, you have to be careful with the order. That's why I would only let the workers download one whole file at a time. And distribute the individual wordlists to the workers.

Comment on lines +389 to +390
"/usr/share/wordlists",
"rwordlistctl/0.1.0",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also use base_dir and user_agent here.

Comment on lines +407 to +408
"/usr/share/wordlists",
"rwordlistctl/0.1.0",
Copy link

@pidone pidone Mar 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Comment on lines +23 to +41
pub fn get_size(&self) -> f64 {
self.size
}

pub fn get_unit(&self) -> &str {
&self.unit
}

pub fn get_url(&self) -> &str {
&self.url
}

pub fn get_group(&self) -> &str {
&self.group
}

pub fn get_name(&self) -> &str {
&self.name
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason why you are defining getter here and not making the fields public?

}

fn load_repo() -> Result<Repo> {
let repo = toml::from_str::<Repo>(include_str!("../config/repo.toml"))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same applies here as for the config. The repos are read in once during compilation and are then fixed in the binary. I don't see it as critical with the repo and could also understand the argument that we can only extend the repos using PRs. But we should be aware of this.

(size, unit)
}

pub async fn convert_size(size: u64, unit: Units) -> u64 {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must this function be async?

@ikstream
Copy link
Member

Thanks @pidone for taking the time to review the pull request, much appreciated.

@Sable-20 if you’re still up for it (sorry for the long time I haven’t responded) could you please have a look at the comments?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants