Commit c12e56b

elacuesta and sanzenwin authored

Windows support (#276)

* Supporting for Windows
* Supporting for Windows
* Supporting for Windows
* Upload coverage report
* Upload coverage report
* up
* No need for error details
* Revert "No need for error details" (reverts commit a6b9f6e)
* Restore original __all__
* Use platform.system(), remove Python version check
* Make black happy
* Black & typing adjustments
* _WindowsAdapter class
* Remove test markers
* Decorator to adapt tests for Windows
* Move _WindowsAdapter to _utils module
* Adapt all tests for Windows
* Update readme about Windows
* Placeholder changelog entry for upcoming release
* Rename coverage report CI step
* Add pull request id to changelog
* CI: add CODECOV_TOKEN to env (Windows)
* Run twisted test on Windows too
* Readme adjustments
* Remove unused check for Deferred
* asyncio reactor is not a requirement on Windows

Co-authored-by: sanzenwin <[email protected]>

1 parent ff06d5c

15 files changed: +210 −44 lines

.github/workflows/tests.yml (+11)

```diff
@@ -13,6 +13,8 @@ jobs:
       include:
         - os: macos-latest
           python-version: "3.12"
+        - os: windows-latest
+          python-version: "3.12"

     steps:
     - uses: actions/checkout@v4
@@ -48,3 +50,12 @@ jobs:
         curl -Os https://uploader.codecov.io/latest/macos/codecov
         chmod +x codecov
         ./codecov
+
+    - name: Upload coverage report (Windows)
+      if: runner.os == 'Windows'
+      env:
+        CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
+      run: |
+        $ProgressPreference = 'SilentlyContinue'
+        Invoke-WebRequest -Uri https://uploader.codecov.io/latest/windows/codecov.exe -Outfile codecov.exe
+        .\codecov.exe
```

README.md (+31 −18)

````diff
@@ -56,10 +56,13 @@ See the [changelog](docs/changelog.md) document.
 
 ## Activation
 
+### Download handler
+
 Replace the default `http` and/or `https` Download Handlers through
 [`DOWNLOAD_HANDLERS`](https://docs.scrapy.org/en/latest/topics/settings.html):
 
 ```python
+# settings.py
 DOWNLOAD_HANDLERS = {
     "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
     "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
@@ -70,12 +73,19 @@ Note that the `ScrapyPlaywrightDownloadHandler` class inherits from the default
 `http/https` handler. Unless explicitly marked (see [Basic usage](#basic-usage)),
 requests will be processed by the regular Scrapy download handler.
 
-Also, be sure to [install the `asyncio`-based Twisted reactor](https://docs.scrapy.org/en/latest/topics/asyncio.html#installing-the-asyncio-reactor):
+
+### Twisted reactor
+
+When running on GNU/Linux or macOS you'll need to
+[install the `asyncio`-based Twisted reactor](https://docs.scrapy.org/en/latest/topics/asyncio.html#installing-the-asyncio-reactor):
 
 ```python
+# settings.py
 TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
 ```
 
+This is not a requirement on Windows (see [Windows support](#windows-support))
+
 
 ## Basic usage
 
@@ -112,6 +122,20 @@ does not match the running Browser. If you prefer the `User-Agent` sent by
 default by the specific browser you're using, set the Scrapy user agent to `None`.
 
 
+## Windows support
+
+Windows support is possible by running Playwright in a `ProactorEventLoop` in a separate thread.
+This is necessary because it's not possible to run Playwright in the same
+asyncio event loop as the Scrapy crawler:
+* Playwright runs the driver in a subprocess. Source:
+  [Playwright repository](https://github.com/microsoft/playwright-python/blob/v1.44.0/playwright/_impl/_transport.py#L120-L130).
+* "On Windows, the default event loop `ProactorEventLoop` supports subprocesses,
+  whereas `SelectorEventLoop` does not". Source:
+  [Python docs](https://docs.python.org/3/library/asyncio-platforms.html#asyncio-windows-subprocess).
+* Twisted's `asyncio` reactor requires the `SelectorEventLoop`. Source:
+  [Twisted repository](https://github.com/twisted/twisted/blob/twisted-24.3.0/src/twisted/internet/asyncioreactor.py#L31)
+
 ## Supported [settings](https://docs.scrapy.org/en/latest/topics/settings.html)
 
 ### `PLAYWRIGHT_BROWSER_TYPE`
@@ -851,6 +875,12 @@ Refer to the
 [upstream docs](https://docs.scrapy.org/en/latest/topics/extensions.html#module-scrapy.extensions.memusage)
 for more information about supported settings.
 
+### Windows support
+
+Just like the [upstream Scrapy extension](https://docs.scrapy.org/en/latest/topics/extensions.html#module-scrapy.extensions.memusage), this custom memory extension does not work
+on Windows. This is because the stdlib [`resource`](https://docs.python.org/3/library/resource.html)
+module is not available.
+
 
 ## Examples
 
@@ -912,23 +942,6 @@ See the [examples](examples) directory for more.
 
 ## Known issues
 
-### Lack of native support for Windows
-
-This package does not work natively on Windows. This is because:
-
-* Playwright runs the driver in a subprocess. Source:
-  [Playwright repository](https://github.com/microsoft/playwright-python/blob/v1.28.0/playwright/_impl/_transport.py#L120-L129).
-* "On Windows, the default event loop `ProactorEventLoop` supports subprocesses,
-  whereas `SelectorEventLoop` does not". Source:
-  [Python docs](https://docs.python.org/3/library/asyncio-platforms.html#asyncio-windows-subprocess).
-* Twisted's `asyncio` reactor requires the `SelectorEventLoop`. Source:
-  [Twisted repository](https://github.com/twisted/twisted/blob/twisted-22.4.0/src/twisted/internet/asyncioreactor.py#L31).
-
-Some users have reported having success
-[running under WSL](https://github.com/scrapy-plugins/scrapy-playwright/issues/7#issuecomment-817394494).
-See also [#78](https://github.com/scrapy-plugins/scrapy-playwright/issues/78)
-for information about working in headful mode under WSL.
-
 ### No per-request proxy support
 Specifying a proxy via the `proxy` Request meta key is not supported.
 Refer to the [Proxy support](#proxy-support) section for more information.
````
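The workaround described in the new "Windows support" section can be illustrated with a minimal, cross-platform sketch: run an event loop in a separate daemon thread and submit coroutines to it from outside. On Windows, scrapy-playwright builds that loop from a proactor policy so Playwright can spawn its driver subprocess; the sketch below uses a default loop and a hypothetical `double` coroutine so it runs anywhere.

```python
import asyncio
import threading

# Event loop living in a separate daemon thread, as in the approach above.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def double(x: int) -> int:
    """Hypothetical stand-in for an async Playwright call."""
    await asyncio.sleep(0)
    return x * 2

# run_coroutine_threadsafe returns a concurrent.futures.Future;
# .result() blocks the calling thread until the coroutine finishes
# on the loop in the other thread.
result = asyncio.run_coroutine_threadsafe(double(21), loop).result()

loop.call_soon_threadsafe(loop.stop)
```

This is the same mechanism `_WindowsAdapter` (added in `scrapy_playwright/_utils.py` in this commit) relies on, minus the Windows-specific event loop policy.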

docs/changelog.md (+5)

```diff
@@ -1,5 +1,10 @@
 # scrapy-playwright changelog
 
+### [v0.0.36](https://github.com/scrapy-plugins/scrapy-playwright/releases/tag/v0.0.36) (2024-MM-DD)
+
+* Windows support (#276)
+
+
 ### [v0.0.35](https://github.com/scrapy-plugins/scrapy-playwright/releases/tag/v0.0.35) (2024-06-01)
 
 * Update exception message check
```

scrapy_playwright/_utils.py (+42 −2)

```diff
@@ -1,10 +1,15 @@
+import asyncio
+import concurrent
 import logging
+import platform
+import threading
 from typing import Awaitable, Iterator, Optional, Tuple, Union
 
+import scrapy
 from playwright.async_api import Error, Page, Request, Response
-from scrapy import Spider
 from scrapy.http.headers import Headers
 from scrapy.utils.python import to_unicode
+from twisted.internet.defer import Deferred
 from w3lib.encoding import html_body_declared_encoding, http_content_type_encoding
 
 
@@ -53,7 +58,7 @@ def _is_safe_close_error(error: Error) -> bool:
 
 async def _get_page_content(
     page: Page,
-    spider: Spider,
+    spider: scrapy.Spider,
     context_name: str,
     scrapy_request_url: str,
     scrapy_request_method: str,
@@ -89,3 +94,38 @@ async def _get_header_value(
         return await resource.header_value(header_name)
     except Exception:
         return None
+
+
+if platform.system() == "Windows":
+
+    class _WindowsAdapter:
+        """Utility class to redirect coroutines to an asyncio event loop running
+        in a different thread. This allows to use a ProactorEventLoop, which is
+        supported by Playwright on Windows.
+        """
+
+        loop = None
+        thread = None
+
+        @classmethod
+        def get_event_loop(cls) -> asyncio.AbstractEventLoop:
+            if cls.thread is None:
+                if cls.loop is None:
+                    policy = asyncio.WindowsProactorEventLoopPolicy()  # type: ignore
+                    cls.loop = policy.new_event_loop()
+                    asyncio.set_event_loop(cls.loop)
+                if not cls.loop.is_running():
+                    cls.thread = threading.Thread(target=cls.loop.run_forever, daemon=True)
+                    cls.thread.start()
+                    logger.info("Started loop on separate thread: %s", cls.loop)
+            return cls.loop
+
+        @classmethod
+        async def get_result(cls, coro) -> concurrent.futures.Future:
+            return asyncio.run_coroutine_threadsafe(coro=coro, loop=cls.get_event_loop()).result()
+
+    def _deferred_from_coro(coro) -> Deferred:
+        return scrapy.utils.defer.deferred_from_coro(_WindowsAdapter.get_result(coro))
+
+else:
+    _deferred_from_coro = scrapy.utils.defer.deferred_from_coro
```
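The key design point in `_WindowsAdapter` above is the class-level state: the loop and thread are created lazily on first use and then shared by every caller, so all redirected coroutines run on the same background loop. A cross-platform sketch of that pattern (with a hypothetical `LoopAdapter` class and `greet` coroutine, using a default event loop instead of the Windows proactor policy so it stays runnable anywhere):

```python
import asyncio
import threading

class LoopAdapter:
    """Sketch of the _WindowsAdapter pattern: one lazily created event loop,
    run forever in a daemon thread, shared via class-level attributes."""

    loop = None
    thread = None

    @classmethod
    def get_event_loop(cls) -> asyncio.AbstractEventLoop:
        if cls.thread is None:
            if cls.loop is None:
                # _WindowsAdapter builds this from WindowsProactorEventLoopPolicy;
                # a default loop keeps this sketch portable.
                cls.loop = asyncio.new_event_loop()
            if not cls.loop.is_running():
                cls.thread = threading.Thread(target=cls.loop.run_forever, daemon=True)
                cls.thread.start()
        return cls.loop

    @classmethod
    def get_result(cls, coro):
        # Blocks the calling thread until the coroutine completes
        # on the background loop.
        return asyncio.run_coroutine_threadsafe(coro, loop=cls.get_event_loop()).result()

async def greet(name: str) -> str:
    await asyncio.sleep(0)
    return f"hello {name}"

first = LoopAdapter.get_result(greet("windows"))
second = LoopAdapter.get_result(greet("again"))  # reuses the same loop/thread

LoopAdapter.loop.call_soon_threadsafe(LoopAdapter.loop.stop)
```

Note that in the real `_WindowsAdapter`, `get_result` is itself `async` and is handed to `scrapy.utils.defer.deferred_from_coro`, which bridges it into a Twisted `Deferred` on the reactor side.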

scrapy_playwright/handler.py (+7 −5)

```diff
@@ -1,5 +1,6 @@
 import asyncio
 import logging
+import platform
 from contextlib import suppress
 from dataclasses import dataclass
 from ipaddress import ip_address
@@ -25,7 +26,6 @@
 from scrapy.http.headers import Headers
 from scrapy.responsetypes import responsetypes
 from scrapy.settings import Settings
-from scrapy.utils.defer import deferred_from_coro
 from scrapy.utils.misc import load_object
 from scrapy.utils.reactor import verify_installed_reactor
 from twisted.internet.defer import Deferred, inlineCallbacks
@@ -38,6 +38,7 @@
     _get_page_content,
     _is_safe_close_error,
     _maybe_await,
+    _deferred_from_coro,
 )
 
 
@@ -101,7 +102,8 @@ class ScrapyPlaywrightDownloadHandler(HTTPDownloadHandler):
 
     def __init__(self, crawler: Crawler) -> None:
         super().__init__(settings=crawler.settings, crawler=crawler)
-        verify_installed_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")
+        if platform.system() != "Windows":
+            verify_installed_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")
         crawler.signals.connect(self._engine_started, signals.engine_started)
         self.stats = crawler.stats
 
@@ -134,7 +136,7 @@ def from_crawler(cls: Type[PlaywrightHandler], crawler: Crawler) -> PlaywrightHa
 
     def _engine_started(self) -> Deferred:
         """Launch the browser. Use the engine_started signal as it supports returning deferreds."""
-        return deferred_from_coro(self._launch())
+        return _deferred_from_coro(self._launch())
 
     async def _launch(self) -> None:
         """Launch Playwright manager and configured startup context(s)."""
@@ -290,7 +292,7 @@ def _set_max_concurrent_context_count(self):
     def close(self) -> Deferred:
         logger.info("Closing download handler")
         yield super().close()
-        yield deferred_from_coro(self._close())
+        yield _deferred_from_coro(self._close())
 
     async def _close(self) -> None:
         await asyncio.gather(*[ctx.context.close() for ctx in self.context_wrappers.values()])
@@ -305,7 +307,7 @@ async def _close(self) -> None:
 
     def download_request(self, request: Request, spider: Spider) -> Deferred:
         if request.meta.get("playwright"):
-            return _deferred_from_coro(self._download_request(request, spider))
+            return _deferred_from_coro(self._download_request(request, spider))
         return super().download_request(request, spider)
 
     async def _download_request(self, request: Request, spider: Spider) -> Response:
```

tests/__init__.py (+28)

```diff
@@ -1,10 +1,38 @@
+import inspect
+import logging
+import platform
 from contextlib import asynccontextmanager
+from functools import wraps
 
 from scrapy import Request
 from scrapy.http.response.html import HtmlResponse
 from scrapy.utils.test import get_crawler
 
 
+logger = logging.getLogger("scrapy-playwright-tests")
+
+
+if platform.system() == "Windows":
+    from scrapy_playwright._utils import _WindowsAdapter
+
+    def allow_windows(test_method):
+        """Wrap tests with the _WindowsAdapter class on Windows."""
+        if not inspect.iscoroutinefunction(test_method):
+            raise RuntimeError(f"{test_method} must be an async def method")
+
+        @wraps(test_method)
+        async def wrapped(self, *args, **kwargs):
+            logger.debug("Calling _WindowsAdapter.get_result for %r", self)
+            await _WindowsAdapter.get_result(test_method(self, *args, **kwargs))
+
+        return wrapped
+
+else:
+
+    def allow_windows(test_method):
+        return test_method
+
+
 @asynccontextmanager
 async def make_handler(settings_dict: dict):
     """Convenience function to obtain an initialized handler and close it gracefully"""
```
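The `allow_windows` decorator above wraps each async test method so its coroutine is redirected to the adapter loop on Windows, and is a no-op elsewhere. A self-contained sketch of the same wrapping technique, with hypothetical names (`run_sync`, `FakeTestCase`) and `asyncio.run` standing in for `_WindowsAdapter.get_result` so it has no scrapy-playwright dependency; unlike the real decorator, this variant returns a synchronous method:

```python
import asyncio
import inspect
from functools import wraps

def run_sync(test_method):
    """Decorate an async test method so calling it drives the coroutine
    to completion (the real allow_windows redirects it to a background
    event loop instead)."""
    # Same guard as allow_windows: refuse non-coroutine functions early.
    if not inspect.iscoroutinefunction(test_method):
        raise RuntimeError(f"{test_method} must be an async def method")

    @wraps(test_method)  # preserve the test's name for the test runner
    def wrapped(self, *args, **kwargs):
        return asyncio.run(test_method(self, *args, **kwargs))

    return wrapped

class FakeTestCase:
    @run_sync
    async def test_addition(self):
        await asyncio.sleep(0)
        return 1 + 1

result = FakeTestCase().test_addition()
```

The `inspect.iscoroutinefunction` check matters because silently wrapping a plain `def` would hide the bug: the wrapper would await something that is not a coroutine.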

tests/conftest.py (+17)

```diff
@@ -1,3 +1,20 @@
+import platform
+
+import pytest
+
+
+@pytest.hookimpl(tryfirst=True)
+def pytest_configure(config):
+    # https://twistedmatrix.com/trac/ticket/9766
+    # https://github.com/pytest-dev/pytest-twisted/issues/80
+
+    if config.getoption("reactor", "default") == "asyncio" and platform.system() == "Windows":
+        import asyncio
+
+        selector_policy = asyncio.WindowsSelectorEventLoopPolicy()
+        asyncio.set_event_loop_policy(selector_policy)
+
+
 def pytest_sessionstart(session):  # pylint: disable=unused-argument
     """
     Called after the Session object has been created and before performing
```
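The hook above exists because on Windows (Python 3.8+) the default asyncio policy produces a proactor loop, while Twisted's asyncio reactor needs a selector loop; installing `WindowsSelectorEventLoopPolicy` before the reactor starts resolves that. A sketch of the effect, guarded so it is a no-op on platforms where the selector loop is already the default:

```python
import asyncio
import platform

# On Windows, switch to the selector policy (as the conftest hook does);
# elsewhere the default policy already yields a selector-based loop.
if platform.system() == "Windows":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

loop = asyncio.new_event_loop()
is_selector = isinstance(loop, asyncio.SelectorEventLoop)
loop.close()
```

Note the tension this commit navigates: the test reactor wants a selector loop, while Playwright's subprocess transport wants a proactor loop, which is why the latter gets its own loop in a separate thread.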

tests/tests_asyncio/test_browser_contexts.py (+8 −1)

```diff
@@ -10,11 +10,12 @@
 from scrapy import Spider, Request
 from scrapy_playwright.page import PageMethod
 
-from tests import make_handler
+from tests import allow_windows, make_handler
 from tests.mockserver import StaticMockServer
 
 
 class MixinTestCaseMultipleContexts:
+    @allow_windows
     async def test_context_kwargs(self):
         settings_dict = {
             "PLAYWRIGHT_BROWSER_TYPE": self.browser_type,
@@ -37,6 +38,7 @@ async def test_context_kwargs(self):
         with pytest.raises(PlaywrightTimeoutError):
             await handler._download_request(req, Spider("foo"))
 
+    @allow_windows
     async def test_contexts_max_pages(self):
         settings = {
             "PLAYWRIGHT_BROWSER_TYPE": self.browser_type,
@@ -71,6 +73,7 @@ async def test_contexts_max_pages(self):
 
         assert handler.stats.get_value("playwright/page_count/max_concurrent") == 4
 
+    @allow_windows
     async def test_max_contexts(self):
         def cb_close_context(task):
             response = task.result()
@@ -105,6 +108,7 @@ def cb_close_context(task):
 
         assert handler.stats.get_value("playwright/context_count/max_concurrent") == 4
 
+    @allow_windows
     async def test_contexts_startup(self):
         settings = {
             "PLAYWRIGHT_BROWSER_TYPE": self.browser_type,
@@ -143,6 +147,7 @@ async def test_contexts_startup(self):
         assert cookie["value"] == "bar"
         assert cookie["domain"] == "example.org"
 
+    @allow_windows
     async def test_persistent_context(self):
         temp_dir = f"{tempfile.gettempdir()}/{uuid4()}"
         settings = {
@@ -161,6 +166,7 @@ async def test_persistent_context(self):
         assert handler.context_wrappers["persistent"].persistent
         assert not hasattr(handler, "browser")
 
+    @allow_windows
     async def test_mixed_persistent_contexts(self):
         temp_dir = f"{tempfile.gettempdir()}/{uuid4()}"
         settings = {
@@ -183,6 +189,7 @@ async def test_mixed_persistent_contexts(self):
         assert not handler.context_wrappers["non-persistent"].persistent
         assert isinstance(handler.browser, Browser)
 
+    @allow_windows
     async def test_contexts_dynamic(self):
         async with make_handler({"PLAYWRIGHT_BROWSER_TYPE": self.browser_type}) as handler:
             assert len(handler.context_wrappers) == 0
```

tests/tests_asyncio/test_extensions.py (+5)

```diff
@@ -1,3 +1,4 @@
+import platform
 from asyncio.subprocess import Process as AsyncioProcess
 from unittest import IsolatedAsyncioTestCase
 from unittest.mock import MagicMock, patch
@@ -34,6 +35,10 @@ class MockMemoryInfo:
     rss = 999
 
 
+@pytest.mark.skipif(
+    platform.system() == "Windows",
+    reason="resource stdlib module is not available on Windows",
+)
 @patch("scrapy.extensions.memusage.MailSender")
 class TestMemoryUsageExtension(IsolatedAsyncioTestCase):
     async def test_process_availability(self, _MailSender):
```
