Autoclick Support #729

Merged
merged 15 commits into from
Jan 16, 2025

Member

@ikreymer ikreymer commented Dec 3, 2024

Adds support for autoclick behavior:

  • Adds a new autoclick behavior option to --behaviors; it is not enabled by default.
  • Adds a new exposed function, __bx_addSet, which lets the autoclick behavior persist state about links that have already been clicked, to avoid duplicate clicks. It is only used if the link has an href.
  • Adds a new pageFinished flag on the worker state.
  • Adds an on('dialog') handler that rejects onbeforeunload page navigations while behaviors are running (page not finished) and accepts them once the page is finished, so navigating away is allowed only when behaviors are done.
  • Updates to browsertrix-behaviors 0.7.0, which supports autoclick.
  • Adds a --clickSelector option to customize which elements are clicked, defaulting to a.
  • Adds --linkSelector as an alias for --selectLinks, for consistency.
  • Unknown options for --behaviors are now printed as warnings instead of causing a hard exit, for forward compatibility with new behavior types.
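
To illustrate the duplicate-avoidance idea behind __bx_addSet, here is a minimal sketch. It is not the crawler's implementation: `ClickedLinkSet` and `shouldClick` are hypothetical names, the in-memory `Set` stands in for whatever persistent state the crawler actually uses, and treating elements without an href as always clickable is an assumption based on the "only used if link has an href" note.

```typescript
// Hypothetical in-memory stand-in for the state behind __bx_addSet.
class ClickedLinkSet {
  private seen = new Set<string>();

  // Returns true if the URL was newly added, false if it was already present.
  add(url: string): boolean {
    if (this.seen.has(url)) {
      return false;
    }
    this.seen.add(url);
    return true;
  }
}

// Decide whether an element should be clicked.
// Assumption: elements without an href are never deduplicated,
// because there is no URL to key the set on.
function shouldClick(href: string | null, clicked: ClickedLinkSet): boolean {
  if (!href) {
    return true;
  }
  return clicked.add(href);
}
```

With this shape, the first call for a given href returns true and any repeat returns false, so the same link is clicked at most once across pages that share the set.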

Fixes #728, also #216, #665, #31

@ikreymer ikreymer requested a review from tw4l December 3, 2024 23:21
Member Author

ikreymer commented Dec 3, 2024

Some examples of sites where the autoclick behavior is helpful:

Member

tw4l commented Dec 4, 2024

Thanks for the sites to test out! I can confirm that it's working in the ways you describe for each, but I'm also noticing a whole lot of these messages in the logs for all of the autoclick-enabled crawls:

{"timestamp":"2024-12-04T21:08:45.377Z","logLevel":"warn","context":"behavior","message":"Behavior run partially failed","details":{"reason":{"type":"exception","message":"Protocol error (Runtime.evaluate): Execution context was destroyed.","stack":"ProtocolError: Protocol error (Runtime.evaluate): Execution context was destroyed.\n    at <instance_members_initializer> (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:90:14)\n    at new Callback (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:94:16)\n    at CallbackRegistry.create (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/common/CallbackRegistry.js:20:26)\n    at Connection._rawSend (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/Connection.js:86:26)\n    at CdpCDPSession.send (file:///app/node_modules/puppeteer-core/lib/esm/puppeteer/cdp/CDPSession.js:63:33)\n    at Browser.evaluateWithCLI (file:///app/dist/util/browser.js:208:56)\n    at file:///app/dist/crawler.js:747:89\n    at Array.map (<anonymous>)\n    at Crawler.runBehaviors (file:///app/dist/crawler.js:747:61)\n    at Crawler.doPostLoadActions (file:///app/dist/crawler.js:669:49)"},"page":"https://www.bs.ch/themen","workerid":0}}

I haven't caught exactly why yet, but worth looking into.

For selectors I like the idea of a --clickLinks arg!

don't duplicate WorkerOpts, expect to be updated!
Member Author

ikreymer commented Dec 5, 2024

Good catch! It turns out that blocking the page unload was not actually working, and the WorkerOpts were being duplicated unnecessarily; see 8a4e4db. Should be fixed now!
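
For context, the unload-blocking rule from the PR summary (reject beforeunload while behaviors run, accept once the page is finished) can be sketched roughly as below. This is not the change in 8a4e4db. `DialogLike` is a hypothetical stand-in that mirrors the `type()`, `accept()`, and `dismiss()` methods of a Puppeteer `Dialog`, and leaving other dialog types untouched is an assumption.

```typescript
// Hypothetical minimal interface mirroring the Puppeteer Dialog methods used here.
interface DialogLike {
  type(): string;
  accept(): Promise<void>;
  dismiss(): Promise<void>;
}

type UnloadDecision = "accept" | "dismiss" | "ignore";

// Pure decision: only beforeunload dialogs are handled.
// While behaviors are running (pageFinished === false), navigation is rejected.
function decideDialog(dialogType: string, pageFinished: boolean): UnloadDecision {
  if (dialogType !== "beforeunload") {
    return "ignore"; // assumption: other dialog types are handled elsewhere
  }
  return pageFinished ? "accept" : "dismiss";
}

// Apply the decision to a dialog object.
async function handleDialog(
  dialog: DialogLike,
  pageFinished: boolean,
): Promise<UnloadDecision> {
  const decision = decideDialog(dialog.type(), pageFinished);
  if (decision === "accept") {
    await dialog.accept(); // allow the page to navigate away
  } else if (decision === "dismiss") {
    await dialog.dismiss(); // stay on the page so behaviors can continue
  }
  return decision;
}
```

Keeping the decision in a pure function makes the rule easy to check without a browser; a handler registered via `page.on('dialog', ...)` would read the current pageFinished value and call `handleDialog`.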

Member

tw4l commented Dec 5, 2024

Thanks for that change! :) Not sure whether to actually approve this PR since we won't be merging this as-is with everything hardcoded and still need to add the selector flag, but based on testing I think this behavior is working as intended and good to add!

- intercept targetcreated to avoid new windows opened from crawler page
- also intercept window.pageOpen to add as new URLs to crawl instead
- profiles: better interception of new window openings, also close additional tabs
- bump version to 1.5.0-beta.0
… 2.2.5, update puppeteer-core to 24.1.0

support autoclick behavior as option for --behaviors
ignore unknown behaviors passed to --behaviors instead of exiting to improve forward compatibility
…default to 'a'

add --linkSelector as alias for --selectLinks for consistency
update cli docs
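
The commit notes above mention ignoring unknown --behaviors values instead of exiting. A minimal sketch of that idea follows; `parseBehaviors` and `KNOWN_BEHAVIORS` are hypothetical names, the list of known behaviors is an assumption for illustration, and the warning text is not the crawler's actual log message.

```typescript
// Assumed list of recognized behavior names, for illustration only.
const KNOWN_BEHAVIORS = new Set<string>([
  "autoplay",
  "autofetch",
  "autoscroll",
  "autoclick",
  "siteSpecific",
]);

// Parse a comma-separated --behaviors value.
// Unknown names produce a warning and are skipped rather than aborting.
function parseBehaviors(value: string, warn: (msg: string) => void): string[] {
  const enabled: string[] = [];
  const names = value
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  for (const name of names) {
    if (KNOWN_BEHAVIORS.has(name)) {
      enabled.push(name);
    } else {
      warn("Unknown behavior ignored: " + name);
    }
  }
  return enabled;
}
```

Passing the warning sink as a parameter keeps the function independent of any particular logger.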
@ikreymer ikreymer marked this pull request as ready for review January 16, 2025 04:34
@ikreymer
Member Author

Ready for final review, not enabling autoclick by default for now.

@ikreymer ikreymer merged commit b7150f1 into main Jan 16, 2025
4 checks passed
@ikreymer ikreymer deleted the autoclick-work branch January 16, 2025 17:38
Development

Successfully merging this pull request may close these issues.

[Feature] Support for clicking on links / other elements.
2 participants