UrlSource treats any HTTP response as valid content, including 404s and 500s. If the server returns an error page, shaha happily builds a database from the HTML error body.
// url.rs — no status check after fetch
let response = reqwest::blocking::get(&url)?;
let content = response.text()?;
This produces confusing results: the database ends up holding hashes of <html>404 Not Found</html> instead of hashes of actual words. There's even a test (test_url_source_http_500_succeeds) that documents this behavior as intentional, but it looks more like an oversight.
Expected: non-2xx responses should return an error (or at least a warning). Something like response.error_for_status()? would handle the common cases, since it turns 4xx and 5xx responses into an Err before the body is ever read.
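To make the expected behavior concrete, here is a rough, standard-library-only sketch of the status gate I have in mind. The names `HttpStatusError` and `body_if_success` are made up for illustration and are not part of shaha. In url.rs itself the same effect could probably be had by calling `error_for_status()` on the reqwest response before `.text()`.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; shaha's real error type may differ.
#[derive(Debug, PartialEq)]
pub struct HttpStatusError {
    pub status: u16,
}

impl fmt::Display for HttpStatusError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "server returned HTTP status {}", self.status)
    }
}

impl std::error::Error for HttpStatusError {}

/// Hypothetical helper: hand back the body only for 2xx statuses.
/// Any other status becomes an error, so an HTML error page is never
/// passed on to the database builder as if it were a word list.
pub fn body_if_success(status: u16, body: String) -> Result<String, HttpStatusError> {
    if (200..300).contains(&status) {
        Ok(body)
    } else {
        Err(HttpStatusError { status })
    }
}
```

With a gate like this, a 404 or 500 would surface as an error to the caller instead of producing a database, and test_url_source_http_500_succeeds would need to be flipped to expect a failure.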