Conflicting results between methods 'canCrawl', 'check' and 'parse' #77

Open · sanderheilbron opened this issue Jan 1, 2017 · 10 comments

@sanderheilbron

The output of the canCrawl and check methods does not match the result of parsing (parse) the contents of a robots.txt file.

Example:

User-Agent: *
Disallow: /page-a

User-Agent: *
Disallow: /page-b

User-Agent: Googlebot
Crawl-Delay: 20

When checking whether the user agent Googlebot is allowed to crawl /page-a, Robotto gives the following results:

  • isAllowed: false
  • check: false

Results of parse:

"*": {
  "allow": [],
  "disallow": [
    "/page-a",
    "/page-b"
  ]
},
"Googlebot": {
  "allow": [],
  "disallow": []
}

Given how Googlebot handles robots.txt files, both isAllowed and check should return true here.
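
Google documents that a crawler obeys only the single most specific matching user-agent group; * is a fallback for crawlers without a group of their own, not something merged into every group. A rough sketch of that selection rule (illustrative only, with simplified prefix matching, not robotto's code; the rules object mirrors the parse output above):

function selectGroup(rules, userAgent) {
  // The most specific matching group wins; '*' applies only when the
  // crawler has no group of its own.
  return rules[userAgent] || rules['*'] || { allow: [], disallow: [] };
}

function isAllowed(rules, userAgent, path) {
  const group = selectGroup(rules, userAgent);
  // Simplified: treat each Disallow rule as a path prefix.
  return !group.disallow.some((rule) => path.startsWith(rule));
}

const rules = {
  '*': { allow: [], disallow: ['/page-a', '/page-b'] },
  Googlebot: { allow: [], disallow: [] },
};

isAllowed(rules, 'Googlebot', '/page-a');    // true: Googlebot's own (empty) group wins
isAllowed(rules, 'SomeOtherBot', '/page-a'); // false: falls back to '*'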

@lucasfcosta
Member

Hi @sanderheilbron, thanks for your issue!
This issue has the same root cause as #75.

Merging #76 will fix this. I'll merge that and release a new version.
Let me know if the problem persists.

@sanderheilbron
Author

Hi @lucasfcosta, thanks for the update and your effort in fixing these issues!

Yesterday I did some local tests with the fix for #76 and noticed this issue. Are you sure it will be fixed by merging #76?

@lucasfcosta
Member

@sanderheilbron Yup! This was happening because whenever robotto found a user-agent line it would create a new object to hold that user agent's rules, wiping out any rules already collected for the same user agent.
Now we check whether a rules object already exists when we find a user-agent line, instead of always creating a new one.
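
In sketch form, the change looks roughly like this (illustrative names, not robotto's actual source):

function parse(robotsTxt) {
  const rules = {};
  let current = null;

  robotsTxt.split('\n').forEach((line) => {
    const [field, ...rest] = line.split(':');
    const value = rest.join(':').trim();

    switch (field.trim().toLowerCase()) {
      case 'user-agent':
        // Before the fix this assignment ran unconditionally, so a repeated
        // "User-Agent: *" wiped out the rules collected for the first group.
        rules[value] = rules[value] || { allow: [], disallow: [] };
        current = rules[value];
        break;
      case 'allow':
        if (current) current.allow.push(value);
        break;
      case 'disallow':
        if (current) current.disallow.push(value);
        break;
    }
  });

  return rules;
}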

I'll be releasing a fix in a few minutes.

@lucasfcosta
Member

@sanderheilbron Done!
A new version of Robotto has just been released. The version with this fix is 1.0.15.
Let me know if you need anything else.
I'm always happy to be able to help 😄

@sanderheilbron
Author

@lucasfcosta Thanks!

@sanderheilbron
Author

Hi @lucasfcosta, just did some tests with v1.0.15, and unfortunately got the same results.

@lucasfcosta
Member

lucasfcosta commented Jan 3, 2017

Hi @sanderheilbron, thanks for getting in touch.
Are you using the exact same input for your tests? I'll do some further investigation.

EDIT: Actually, I think this behavior is correct.
Given what you've just posted, I think that * should apply to every user agent, so Googlebot really should not be allowed to access that page, since it has been disallowed for every user agent. Am I right?
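
The two readings can be compared directly against the parse output posted earlier (a small runnable check; both helpers are illustrative, not robotto's API):

const assert = require('assert');

const rules = {
  '*': { allow: [], disallow: ['/page-a', '/page-b'] },
  Googlebot: { allow: [], disallow: [] },
};

// Reading 1: '*' rules apply to every crawler, on top of its own group.
const cumulative = (ua, path) =>
  !(rules[ua] ? rules[ua].disallow : [])
    .concat(rules['*'].disallow)
    .some((rule) => path.startsWith(rule));

// Reading 2: only the most specific matching group applies; '*' is a fallback.
const mostSpecific = (ua, path) =>
  !(rules[ua] || rules['*']).disallow.some((rule) => path.startsWith(rule));

// The readings disagree exactly on the case in question:
assert.strictEqual(cumulative('Googlebot', '/page-a'), false);  // blocked
assert.strictEqual(mostSpecific('Googlebot', '/page-a'), true); // allowed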

@sanderheilbron
Author

Hi @lucasfcosta, you can test Googlebot's behaviour with the robots.txt Tester in Google Search Console (https://www.google.com/webmasters/tools/robots-testing-tool).

You can also use some other tools which follow how Googlebot handles robots.txt files:

@lucasfcosta
Member

lucasfcosta commented Jan 4, 2017

@sanderheilbron thank you very much!
That's great info! Hopefully I'll have some time to refactor this module entirely by the end of the month. I'll let you know when this issue is fixed.

For now I will reopen it.

Thanks for your help and sorry for not being able to solve it right now. However, I promise I'll work on this whenever I have some spare time.

@lucasfcosta lucasfcosta reopened this Jan 4, 2017
@sanderheilbron
Author

Thanks @lucasfcosta, I appreciate your time and effort.
