| 1 | +# Coding guidelines |
| 2 | + |
| 3 | +This file provides guidance to programming agents when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +The Apify SDK for Python (`apify` package on PyPI) is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides Actor lifecycle management, storage access (datasets, key-value stores, request queues), event handling, proxy configuration, and pay-per-event charging. It builds on top of the [Crawlee](https://crawlee.dev/python) web scraping framework and the [Apify API Client](https://docs.apify.com/api/client/python). It supports Python 3.10–3.14 and uses hatchling as its build system. |
| 8 | + |
| 9 | +## Common Commands |
| 10 | + |
| 11 | +```bash |
| 12 | +# Install dependencies (including dev) |
| 13 | +uv sync --all-extras |
| 14 | + |
| 15 | +# Install dev dependencies + pre-commit hooks |
| 16 | +uv run poe install-dev |
| 17 | + |
| 18 | +# Format code (also auto-fixes lint issues via ruff check --fix) |
| 19 | +uv run poe format |
| 20 | + |
| 21 | +# Lint (format check + ruff check) |
| 22 | +uv run poe lint |
| 23 | + |
| 24 | +# Type check |
| 25 | +uv run poe type-check |
| 26 | + |
| 27 | +# Run all checks (lint + type-check + unit tests) |
| 28 | +uv run poe check-code |
| 29 | + |
| 30 | +# Unit tests (no API token needed) |
| 31 | +uv run poe unit-tests |
| 32 | + |
| 33 | +# Run a single test file |
| 34 | +uv run pytest tests/unit/actor/test_actor_lifecycle.py |
| 35 | + |
| 36 | +# Run a single test by name |
| 37 | +uv run pytest tests/unit/actor/test_actor_lifecycle.py -k "test_name" |
| 38 | + |
| 39 | +# Integration tests (needs APIFY_TEST_USER_API_TOKEN) |
| 40 | +uv run poe integration-tests |
| 41 | + |
| 42 | +# E2E tests (needs APIFY_TEST_USER_API_TOKEN, builds/deploys Actors on platform) |
| 43 | +uv run poe e2e-tests |
| 44 | +``` |
| 45 | + |
| 46 | +## Code Style |
| 47 | + |
| 48 | +- **Formatter/Linter**: Ruff (line length 120, single quotes for inline strings, double quotes for docstrings) |
| 49 | +- **Type checker**: ty (targets Python 3.10) |
| 50 | +- **All ruff rules enabled** with specific ignores — see `pyproject.toml` `[tool.ruff.lint]` for the full ignore list |
| 51 | +- Tests are exempt from docstring rules (`D`), assert warnings (`S101`), and private member access (`SLF001`) |
| 52 | +- Unused imports are allowed in `__init__.py` files (re-exports) |
| 53 | +- **Pre-commit hooks**: lint check + type check run automatically on commit |
| 54 | + |
| 55 | +## Architecture |
| 56 | + |
| 57 | +### Core (`src/apify/`) |
| 58 | + |
| 59 | +- **`_actor.py`** — The `_ActorType` class is the central API. `Actor` is a lazy object proxy (`lazy_object_proxy.Proxy`, from the `lazy-object-proxy` package) wrapping `_ActorType`. It acts as both a class (e.g. `Actor.is_at_home()`) and an instance-like context manager (`async with Actor:`). On `__aenter__`, the proxy's `__wrapped__` is replaced with the active `_ActorType` instance. `_ActorType` manages the full Actor lifecycle (`init`, `exit`, `fail`), provides access to storages (`open_dataset`, `open_key_value_store`, `open_request_queue`), and handles events, proxy configuration, charging, and platform API operations (`start`, `call`, `metamorph`, `reboot`). |
| 60 | + |
| 61 | +- **`_configuration.py`** — `Configuration` extends Crawlee's `Configuration` with Apify-specific settings (API URL, token, Actor run metadata, proxy settings, charging config). Configuration is populated from environment variables (`APIFY_*`). |
| 62 | + |
| 63 | +- **`_charging.py`** — Pay-per-event billing system. `ChargingManager` / `ChargingManagerImplementation` handle charging events against pricing info fetched from the API. |
| 64 | + |
| 65 | +- **`_proxy_configuration.py`** — `ProxyConfiguration` manages Apify proxy setup (residential, datacenter, groups, country targeting). |
| 66 | + |
| 67 | +- **`_models.py`** — Pydantic models for API data structures (Actor runs, webhooks, pricing info, etc.). |
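
The proxy behavior described for `_actor.py` can be hard to picture, so here is a minimal, stdlib-only sketch of the idea. It is not the SDK's implementation: `_Proxy` is a hypothetical toy stand-in for `lazy_object_proxy.Proxy`, and the `_ActorType` below is a stripped-down placeholder with a hard-coded `is_at_home()`.

```python
import asyncio


class _ActorType:
    """Simplified placeholder for the SDK's central class (assumption, not the real code)."""

    def __init__(self) -> None:
        self.initialized = False

    def is_at_home(self) -> bool:
        # The real method inspects the platform environment; hard-coded here.
        return False

    async def __aenter__(self) -> "_ActorType":
        self.initialized = True
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.initialized = False


class _Proxy:
    """Toy proxy: forwards attribute access to whatever instance it currently wraps."""

    def __init__(self, factory) -> None:
        self._factory = factory
        self._wrapped = None

    def _target(self):
        # Create the wrapped object lazily, on first attribute access.
        if self._wrapped is None:
            self._wrapped = self._factory()
        return self._wrapped

    def __getattr__(self, name):
        # Only called for attributes the proxy itself does not define.
        return getattr(self._target(), name)

    async def __aenter__(self):
        # Swap in a fresh "active" instance, then delegate to it.
        self._wrapped = self._factory()
        return await self._wrapped.__aenter__()

    async def __aexit__(self, *exc_info) -> None:
        await self._wrapped.__aexit__(*exc_info)


Actor = _Proxy(_ActorType)


async def main() -> bool:
    async with Actor as actor:
        # Both the yielded instance and the proxy see the same active state.
        return actor.initialized and Actor.initialized


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The point of the pattern is that callers can use one module-level name both for class-like calls (`Actor.is_at_home()`) and as a context manager, while the object behind it changes per run.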
| 68 | + |
| 69 | +### Storage Clients (`src/apify/storage_clients/`) |
| 70 | + |
| 71 | +Four storage client implementations, all implementing Crawlee's abstract storage client interface: |
| 72 | + |
| 73 | +- **`_apify/`** — `ApifyStorageClient`: talks to the Apify API for dataset, key-value store, and request queue operations (separate sub-clients for single vs. shared request queues). Used when running on the Apify platform. |
| 74 | +- **`_file_system/`** — `FileSystemStorageClient` (alias `ApifyFileSystemStorageClient`): extends Crawlee's file system client with Apify-specific key-value store behavior. |
| 75 | +- **`_smart_apify/`** — `SmartApifyStorageClient`: hybrid client that writes to both API and local file system for resilience. |
| 76 | +- **`MemoryStorageClient`** — re-exported from Crawlee for in-memory storage. |
| 77 | + |
| 78 | +### Storages (`src/apify/storages/`) |
| 79 | + |
| 80 | +Re-exports Crawlee's `Dataset`, `KeyValueStore`, and `RequestQueue` classes. |
| 81 | + |
| 82 | +### Events (`src/apify/events/`) |
| 83 | + |
| 84 | +- **`_apify_event_manager.py`** — `ApifyEventManager` extends Crawlee's event system with platform-specific events received via WebSocket connection. |
| 85 | + |
| 86 | +### Request Loaders (`src/apify/request_loaders/`) |
| 87 | + |
| 88 | +- **`_apify_request_list.py`** — `ApifyRequestList` creates request lists from Actor input URLs (supports both direct URLs and "requests from URL" sources). |
| 89 | + |
| 90 | +### Scrapy Integration (`src/apify/scrapy/`) |
| 91 | + |
| 92 | +Optional integration (`apify[scrapy]` extra) providing Scrapy scheduler, middlewares, pipelines, and extensions for running Scrapy spiders as Apify Actors. |
| 93 | + |
| 94 | +### Key Dependencies |
| 95 | + |
| 96 | +- **`crawlee`** — Base framework providing storage abstractions, event system, configuration, service locator pattern |
| 97 | +- **`apify-client`** — HTTP client for the Apify API (`ApifyClientAsync`) |
| 98 | +- **`apify-shared`** — Shared constants and utilities (`ApifyEnvVars`, `ActorEnvVars`, etc.) |
| 99 | + |
| 100 | +## Testing |
| 101 | + |
| 102 | +Three test levels in `tests/`: |
| 103 | + |
| 104 | +- **`unit/`** — Fast tests with no external dependencies. Use mocked API clients (`ApifyClientAsyncPatcher` fixture). Run with `uv run poe unit-tests`. |
| 105 | +- **`integration/`** — Tests making real Apify API calls but not deploying Actors. Requires `APIFY_TEST_USER_API_TOKEN`. Run with `uv run poe integration-tests`. |
| 106 | +- **`e2e/`** — Full end-to-end tests that build and deploy Actors on the platform. Slowest. Requires `APIFY_TEST_USER_API_TOKEN`. Use `make_actor` and `run_actor` fixtures. Run with `uv run poe e2e-tests`. |
| 107 | + |
| 108 | +All test levels use `pytest-asyncio` with `asyncio_mode = "auto"` (no need for `@pytest.mark.asyncio`). Tests run in parallel via `pytest-xdist` (`--numprocesses`). Each test gets isolated state via the autouse `_isolate_test_environment` fixture, which resets `Actor`, `service_locator`, and `AliasResolver` state. Conftest files live in each subdirectory (`tests/unit/conftest.py`, etc.); there is no top-level `tests/conftest.py`. |
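
As a rough illustration of what auto mode means in practice, a test can be written as a plain `async def` with no marker. The `fetch_status` coroutine below is a hypothetical placeholder for any async call under test, not a function from this repository.

```python
import asyncio


async def fetch_status() -> str:
    """Hypothetical coroutine under test; stands in for any async SDK call."""
    await asyncio.sleep(0)
    return 'ok'


# With asyncio_mode = "auto", pytest-asyncio collects this coroutine as a test
# without an explicit @pytest.mark.asyncio decorator.
async def test_fetch_status() -> None:
    assert await fetch_status() == 'ok'
```

Outside pytest, the same coroutine can be driven with `asyncio.run(test_fetch_status())`, which is handy for a quick sanity check.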
| 109 | + |
| 110 | +### Key Test Fixtures |
| 111 | + |
| 112 | +- **`apify_client_async_patcher`** (unit) — `ApifyClientAsyncPatcher` instance for mocking `ApifyClientAsync` methods. It patches by `method`/`submethod` and tracks call history in `.calls`. |
| 113 | +- **`make_httpserver`/`httpserver`** (unit) — session-scoped `HTTPServer` via `pytest-httpserver` for HTTP interception. |
| 114 | +- **`apify_client_async`** (integration/e2e) — real `ApifyClientAsync` using `APIFY_TEST_USER_API_TOKEN`. |
| 115 | +- **`make_actor`** (e2e) — creates a temporary Actor on the platform from a function, `main_py` string, or source files dict; cleans up after the session. |
| 116 | +- **`run_actor`** (e2e) — calls an Actor and waits up to 10 minutes for completion. |
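
To make the "patch by `method`/`submethod`, track calls in `.calls`" idea concrete, here is a toy recorder built only on the standard library. It is a sketch of the general technique, not the repository's `ApifyClientAsyncPatcher`: the class name `ToyPatcher`, the `make_stub` helper, and the layout of `calls` are all assumptions made for illustration. Check `tests/unit/conftest.py` for the real interface.

```python
import asyncio
from collections import defaultdict
from typing import Any


class ToyPatcher:
    """Hypothetical call recorder keyed by method and submethod (illustration only)."""

    def __init__(self) -> None:
        # calls[method][submethod] is a list of (args, kwargs) tuples.
        self.calls: defaultdict = defaultdict(lambda: defaultdict(list))

    def make_stub(self, method: str, submethod: str, return_value: Any = None):
        """Return an async stub that records each call and returns a canned value."""

        async def stub(*args: Any, **kwargs: Any) -> Any:
            self.calls[method][submethod].append((args, kwargs))
            return return_value

        return stub


async def demo() -> ToyPatcher:
    patcher = ToyPatcher()
    get_record = patcher.make_stub('key_value_store', 'get_record', return_value={'value': 1})
    await get_record('INPUT')
    return patcher
```

A test built this way can assert both on the stubbed return value and on the recorded arguments, which is the same style of assertion the `.calls` history is meant to support.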