Also use query and fragment when matching URIs #13

Merged 1 commit on Jan 27, 2025
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [1.1.2] - 2025-01-27
+### Fixed
+- When matching URIs against allow/disallow rules, the library previously used explicitly only the path part of the URI. Fixed it to use path, query and fragment.
+
 ## [1.1.1] - 2022-11-08
 ### Fixed
 - The `Parser` now also trims hidden whitespace characters that aren't covered by PHP's `trim()` function by default. Such characters at the beginning of a line can cause parsing to fail, because it's important that user-agent and rule lines actually start with the corresponding keywords.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
-Copyright (c) 2024 Christian Olear
+Copyright (c) 2025 Christian Olear

 Permission is hereby granted, free of charge, to any person obtaining
 a copy of this software and associated documentation files (the
10 changes: 3 additions & 7 deletions src/RulePattern.php
@@ -25,15 +25,11 @@ public function pattern(): string
      */
     public function matches(string|Url $uri): bool
     {
-        $path = $uri instanceof Url ? $uri->path() : Url::parse($uri)->path();
+        $pathQueryFragment = $uri instanceof Url ? $uri->relative() : Url::parse($uri)->relative();

-        if (!is_string($path)) {
-            return false;
-        }
-
-        $path = Encoding::decodePercentEncodedAsciiCharactersInPath($path);
+        $pathQueryFragment = Encoding::decodePercentEncodedAsciiCharactersInPath($pathQueryFragment);

-        return preg_match($this->preparedRegexPattern(), $path) === 1;
+        return preg_match($this->preparedRegexPattern(), $pathQueryFragment) === 1;
     }

     private function preparedRegexPattern(): string
16 changes: 16 additions & 0 deletions tests/ParserTest.php
@@ -357,6 +357,22 @@ public function test_parse_sitemap_lines(): void
         ], $robotsTxt->sitemaps());
     }

+    public function test_it_uses_not_only_the_path_but_also_the_query_when_matching(): void
+    {
+        $robotsTxtContent = <<<ROBOTSTXT
+            User-agent: *
+            Disallow: /?foo
+            ROBOTSTXT;
+
+        $robotsTxt = (new Parser())->parse($robotsTxtContent);
+
+        $this->assertFalse($robotsTxt->isAllowed('/?foo', 'MyBot'));
+
+        $this->assertFalse($robotsTxt->isAllowed('/?foo=bar', 'MyBot'));
+
+        $this->assertTrue($robotsTxt->isAllowed('/yo?foo=bar', 'MyBot'));
+    }
+
     /**
      * @param string[] $expected
      * @param RulePattern[] $actual
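
Note: the behaviour change in `matches()` can be illustrated without the library. The following is only a minimal sketch, assuming PHP 8.0+ and using PHP's built-in `parse_url()`; the helper names `relativeReference()` and `isDisallowedBy()` are hypothetical and are not part of this package. It also swaps the library's regex-based rule matching for a plain prefix comparison, so it ignores `*` and `$` in rules and does no percent-decoding.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): returns path, query and
 * fragment of a URI joined into one string, e.g. "/yo?foo=bar#baz".
 */
function relativeReference(string $uri): string
{
    $parts = parse_url($uri);

    // parse_url() returns false for seriously malformed URIs.
    if ($parts === false) {
        return '';
    }

    // Assumption for this sketch: a missing path is treated as "/".
    $relative = $parts['path'] ?? '/';

    if (isset($parts['query'])) {
        $relative .= '?' . $parts['query'];
    }

    if (isset($parts['fragment'])) {
        $relative .= '#' . $parts['fragment'];
    }

    return $relative;
}

/**
 * Simplified stand-in for rule matching: a plain prefix comparison against
 * path + query + fragment. Wildcards (*) and end anchors ($) are not handled.
 */
function isDisallowedBy(string $rule, string $uri): bool
{
    return str_starts_with(relativeReference($uri), $rule);
}

// Same cases as the new test in tests/ParserTest.php, with the rule "Disallow: /?foo".
foreach (['/?foo', '/?foo=bar', '/yo?foo=bar'] as $uri) {
    echo $uri, ' => ', isDisallowedBy('/?foo', $uri) ? 'disallowed' : 'allowed', PHP_EOL;
}
```

If only the path were compared, as before this change, the rule `/?foo` could not match the bare path `/`, so the first two URIs would not be reported as disallowed.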