diff --git a/.gitattributes b/.gitattributes index 6fc521e5d6b78..dac878f73670d 100644 --- a/.gitattributes +++ b/.gitattributes @@ -14,13 +14,10 @@ chart/charts/** export-ignore Dockerfile.ci export-ignore ISSUE_TRIAGE_PROCESS.rst export-ignore -STATIC_CODE_CHECKS.rst export-ignore -TESTING.rst export-ignore -LOCAL_VIRTUALENV.rst export-ignore CONTRIBUTING.rst export-ignore CI.rst export-ignore CI_DIAGRAMS.md export-ignore -CONTRIBUTORS_QUICK_START.rst export-ignore +contributing-docs/ export-ignore .devcontainer export-ignore .github export-ignore diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 47c477b2ab514..89670c88a9b49 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -38,7 +38,7 @@ http://chris.beams.io/posts/git-commit/ --- **^ Add meaningful description above** -Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information. +Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#pull-request-guidelines)** for more information. In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments). diff --git a/.github/SECURITY.md b/.github/SECURITY.md index 6b526c3a6ec11..4372b4528b477 100644 --- a/.github/SECURITY.md +++ b/.github/SECURITY.md @@ -97,7 +97,7 @@ do not apply to Airflow, or have a different severity than some generic scoring ### What happens after you report the issue ? -The [Airflow Security Team](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#security-team) will get back to you after assessing the report. You will usually get +The Airflow Security Team will get back to you after assessing the report. You will usually get confirmation that the issue is being worked (or that we quickly assessed it as invalid) within several business days. Note that this is an Open-Source projects and members of the security team are volunteers so please make sure to be patient. If you do not get a response within a week or so, please send a @@ -112,7 +112,8 @@ and the severity of the issue once the issue is fixed and release is public. Not appear there, so `users@airflow.apache.org` is the best place to monitor for the announcement. Security issues in Airflow are handled by the Airflow Security Team. Details about the Airflow Security -Team and how members of it are chosen can be found in the [Contributing documentation](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#security-team). +Team and how members of it are chosen can be found in the +[Contributing documentation](https://github.com/apache/airflow/blob/main/contributing-docs/01_roles_in_airflow_project.rst#security-team). ### Does CVE in Airflow Providers impact Airflow core package ?
diff --git a/.github/boring-cyborg.yml b/.github/boring-cyborg.yml index f6798b54dbd08..4fbcfcf85e0ff 100644 --- a/.github/boring-cyborg.yml +++ b/.github/boring-cyborg.yml @@ -536,10 +536,8 @@ labelPRBasedOnFilePath: - dev/**/* - .github/**/* - Dockerfile.ci - - CONTRIBUTING.* - - LOCAL_VIRTUALENV.rst - - STATIC_CODE_CHECKS.rst - - TESTING.rst + - CONTRIBUTING.rst + - contributing-docs/**/* - yamllint-config.yml - .asf.yaml - .bash_completion @@ -670,12 +668,12 @@ labelerFlags: firstPRWelcomeComment: > Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our - Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst) + Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst) Here are some useful points: - Pay attention to the quality of your code (ruff, mypy and type annotations). Our [pre-commits]( - https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) + https://github.com/apache/airflow/blob/main/contributing-docs/08_static_code_checks.rst#prerequisites-for-pre-commit-hooks) will help you with that. - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). @@ -683,7 +681,7 @@ firstPRWelcomeComment: > [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it. - - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/dev/breeze/doc/breeze.rst) + - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/dev/breeze/doc/README.rst) for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations. - Be patient and persistent. It might take some time to get a review or get the final approval from @@ -693,7 +691,7 @@ firstPRWelcomeComment: > communication including (but not limited to) comments on Pull Requests, Mailing list and Slack. - Be sure to read the [Airflow Coding style]( - https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices). + https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#coding-style-and-best-practices). - Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits. 
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 2024caa647eb5..0ead7f7359999 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -33,7 +33,7 @@ repos: - id: doctoc name: Add TOC for Markdown and RST files files: - ^CONTRIBUTING\.md$|^README\.md$|^UPDATING.*\.md$|^chart/UPDATING.*\.md$|^dev/.*\.md$|^dev/.*\.rst$|^.github/.*\.md + ^README\.md$|^UPDATING.*\.md$|^chart/UPDATING.*\.md$|^dev/.*\.md$|^dev/.*\.rst$|^.github/.*\.md|^tests/system/README.md$ exclude: ^.*/.*_vendor/ args: - "--maxlevel" @@ -439,7 +439,7 @@ repos: name: Update extras in documentation entry: ./scripts/ci/pre_commit/pre_commit_insert_extras.py language: python - files: ^setup\.py$|^CONTRIBUTING\.rst$|^INSTALL$|^airflow/providers/.*/provider\.yaml$ + files: ^setup\.py$|^contributing-docs/12_airflow_dependencies_and_extras.rst$|^INSTALL$|^airflow/providers/.*/provider\.yaml$ pass_filenames: false additional_dependencies: ['rich>=12.4.4', 'tomli'] - id: check-extras-order @@ -612,7 +612,7 @@ repos: ^.pre-commit-config\.yaml$| ^.*CHANGELOG\.(rst|txt)$| ^.*RELEASE_NOTES\.rst$| - ^CONTRIBUTORS_QUICK_START.rst$| + ^contributing-docs/03_contributors_quick_start.rst$| ^.*\.(png|gif|jp[e]?g|tgz|lock)$| git - id: check-base-operator-partial-arguments @@ -818,7 +818,7 @@ repos: name: Sync integrations list with docs entry: ./scripts/ci/pre_commit/pre_commit_check_integrations_list.py language: python - files: ^scripts/ci/docker-compose/integration-.*\.yml$|^TESTING.rst$ + files: ^scripts/ci/docker-compose/integration-.*\.yml$|^contributing-docs/testing/integration_tests.rst$ additional_dependencies: ['black==23.10.0', 'tabulate', 'rich>=12.4.4', 'pyyaml'] require_serial: true pass_filenames: false diff --git a/BREEZE.rst b/BREEZE.rst index 14e2ee073d603..249ae3a8cb2cb 100644 --- a/BREEZE.rst +++ b/BREEZE.rst @@ -15,4 +15,4 @@ specific language governing permissions and limitations under the License. -Content of this guide has been moved to `Breeze docs internal folder `_ +Content of this guide has been moved to `Breeze docs internal folder `_ diff --git a/CI.rst b/CI.rst index 5800ba138cfeb..dcba9820887ca 100644 --- a/CI.rst +++ b/CI.rst @@ -499,7 +499,7 @@ infrastructure, as a developer you should be able to reproduce and re-run any of locally. One part of it are pre-commit checks, that allow you to run the same static checks in CI and locally, but another part is the CI environment which is replicated locally with Breeze. -You can read more about Breeze in `breeze.rst `_ but in essence it is a script that allows +You can read more about Breeze in `README.rst `_ but in essence it is a script that allows you to re-create CI environment in your local development instance and interact with it. In its basic form, when you do development you can run all the same tests that will be run in CI - but locally, before you submit them as PR. Another use case where Breeze is useful is when tests fail on CI. You can @@ -519,7 +519,7 @@ by running the sequence of corresponding ``breeze`` command. Make sure however t In the output of the CI jobs, you will find both - the flags passed and environment variables set. -You can read more about it in `Breeze `_ and `Testing `_ +You can read more about it in `Breeze `_ and `Testing `_ Since we store images from every CI run, you should be able easily reproduce any of the CI tests problems locally. 
You can do it by pulling and using the right image and running it with the right docker command, @@ -533,7 +533,7 @@ For example knowing that the CI job was for commit ``cd27124534b46c9688a1d89e75f But you usually need to pass more variables and complex setup if you want to connect to a database or -enable some integrations. Therefore it is easiest to use `Breeze `_ for that. +enable some integrations. Therefore it is easiest to use `Breeze `_ for that. For example if you need to reproduce a MySQL environment in python 3.8 environment you can run: .. code-block:: bash @@ -546,7 +546,7 @@ this case, you do not need to checkout the sources that were used for that run - the image - but remember that any changes you make in those sources are lost when you leave the image as the sources are not mapped from your host machine. -Depending whether the scripts are run locally via `Breeze `_ or whether they +Depending on whether the scripts are run locally via `Breeze `_ or whether they are run in ``Build Images`` or ``Tests`` workflows they can take different values. You can use those variables when you try to reproduce the build locally (alternatively you can pass diff --git a/COMMITTERS.rst b/COMMITTERS.rst index 625019c21e80e..da215d8baf55e 100644 --- a/COMMITTERS.rst +++ b/COMMITTERS.rst @@ -15,14 +15,37 @@ specific language governing permissions and limitations under the License. +Committers and PMC members +========================== + +Before reading this document, you should be familiar with `Contributors' guide `__. +This document assumes that you are a bit familiar with how Airflow's community works, and that you would like to learn more +about the rules by which we add new members. + .. contents:: :local: -Committers and PMC's -==================== +Committers vs. Maintainers +-------------------------- + +Often you can hear two different terms used for people who have write access to the Airflow repository - +"committers" and "maintainers". This is because those two terms are used in different contexts. + +* "Maintainers" is a term used in GitHub documentation and configuration and is a generic term referring to + people who have write access to the repository. They can merge PRs, push to the repository, etc. +* "Committers" is a term used in the Apache Software Foundation (ASF) and refers to people who have + write access to the code repository and have a signed + `Contributor License Agreement (CLA) <https://www.apache.org/licenses/#clas>`_ on file. They have an + apache.org mail address. This is an official `role <https://www.apache.org/foundation/how-it-works/#roles>`_ + defined and governed by the Apache Software Foundation. -This document assumes that you know how Airflow's community work, but you would like to learn more about the rules by which we add new members. +For all practical purposes, both terms are interchangeable because the Apache Software Foundation rule is +that only Committers can have write access to the repositories managed by the PMC (Project Management Committee) +and that all Committers get write access to the repository. -Before reading this document, you should be familiar with `Contributor's guide `__. +You will see both terms used in different documentation; therefore, our goal is not to use only one of the +terms - it is unavoidable to see both terms anyway.
As a rule, we use the term "committer" in the context +of the official rules of the Apache Software Foundation and the term "maintainer" in the context where +technical GitHub access and permissions to the project are important. Guidelines to become an Airflow Committer ------------------------------------------ @@ -48,8 +71,7 @@ General prerequisites that we look for in all candidates: 1. Consistent contribution over last few months 2. Visibility on discussions on the dev mailing list, Slack channels or GitHub issues/discussions 3. Contributions to community health and project's sustainability for the long-term -4. Understands contributor/committer guidelines: - `Contributors' Guide `__ +4. Understands contributor/committer guidelines: `Contributors' Guide `__ Code contribution @@ -194,7 +216,7 @@ To be able to merge PRs, committers have to integrate their GitHub ID with Apach 3. Wait at least 30 minutes for an email inviting you to Apache GitHub Organization and accept invitation. 4. After accepting the GitHub Invitation verify that you are a member of the `Airflow committers team on GitHub `__. 5. Ask in ``#internal-airflow-ci-cd`` channel to be `configured in self-hosted runners `_ - by the CI maintainers. Wait for confirmation that this is done and some helpful tips from the CI maintainer + by the CI team. Wait for confirmation that this is done and some helpful tips from the CI team. 6. After confirming that step 5 is done, open a PR to include your GitHub ID in: * ``dev/breeze/src/airflow_breeze/global_constants.py`` (COMMITTERS variable) diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index 1245fd4ab585d..1fbaf36f18063 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -15,1800 +15,11 @@ specific language governing permissions and limitations under the License. -.. contents:: :local: - -Contributions -============= - -Contributions are welcome and are greatly appreciated! Every little bit helps, -and credit will always be given. - -This document aims to explain the subject of contributions if you have not contributed to -any Open Source project, but it will also help people who have contributed to other projects learn about the -rules of that community. - -New Contributor ---------------- -If you are a new contributor, please follow the `Contributors Quick Start `__ guide to get a gentle step-by-step introduction to setting up the development -environment and making your first contribution. - -Get Mentoring Support ---------------------- - -If you are new to the project, you might need some help in understanding how the dynamics -of the community works and you might need to get some mentorship from other members of the -community - mostly Airflow committers (maintainers). Mentoring new members of the community is part of -maintainers job so do not be afraid of asking them to help you. You can do it -via comments in your PR, asking on a devlist or via Slack. For your convenience, -we have a dedicated #development-first-pr-support Slack channel where you can ask any questions -about making your first Pull Request (PR) contribution to the Airflow codebase - it's a safe space -where it is expected that people asking questions do not know a lot Airflow (yet!). -If you need help with Airflow see the Slack channel #troubleshooting. - -To check on how mentoring works for the projects under Apache Software Foundation's -`Apache Community Development - Mentoring `_. - -Report Bugs ------------ - -Report bugs through `GitHub `__.
- -Please report relevant information and preferably code that exhibits the -problem. - -.. note:: - If you want to report a security finding, please consider - https://github.com/apache/airflow/security/policy - -Fix Bugs --------- - -Look through the GitHub issues for bugs. Anything is open to whoever wants to -implement it. - -Issue reporting and resolution process --------------------------------------- - -An unusual element of the Apache Airflow project is that you can open a PR to -fix an issue or make an enhancement, without needing to open an issue first. -This is intended to make it as easy as possible to contribute to the project. - -If you however feel the need to open an issue (usually a bug or feature request) -consider starting with a `GitHub Discussion `_ instead. -In the vast majority of cases discussions are better than issues - you should only open -issues if you are sure you found a bug and have a reproducible case, -or when you want to raise a feature request that will not require a lot of discussion. -If you have a very important topic to discuss, start a discussion on the -`Devlist `_ instead. - -The Apache Airflow project uses a set of labels for tracking and triaging issues, as -well as a set of priorities and milestones to track how and when the enhancements and bug -fixes make it into an Airflow release. This is documented as part of -the `Issue reporting and resolution process `_, - -Implement Features ------------------- - -Look through the `GitHub issues labeled "kind:feature" -`__ for features. - -Any unassigned feature request issue is open to whoever wants to implement it. - -We've created the operators, hooks, macros and executors we needed, but we've -made sure that this part of Airflow is extensible. New operators, hooks, macros -and executors are very welcomed! - -Improve Documentation ---------------------- - -Airflow could always use better documentation, whether as part of the official -Airflow docs, in docstrings, ``docs/*.rst`` or even on the web as blog posts or -articles. - -See the `Docs README `__ for more information about contributing to Airflow docs. - -Submit Feedback ---------------- - -The best way to send feedback is to `open an issue on GitHub `__. - -If you are proposing a new feature: - -- Explain in detail how it would work. -- Keep the scope as narrow as possible to make it easier to implement. -- Remember that this is a volunteer-driven project, and that contributions are - welcome :) - - -Roles -============= - -There are several roles within the Airflow Open-Source community. - -For detailed information for each role, see: `Committers and PMC's <./COMMITTERS.rst>`__. - -PMC Member ------------ - -The PMC (Project Management Committee) is a group of maintainers that drives changes in the way that -Airflow is managed as a project. - -Considering Apache, the role of the PMC is primarily to ensure that Airflow conforms to Apache's processes -and guidelines. - -Committers/Maintainers ----------------------- - -You will often see the term "committer" or "maintainer" in the context of the Airflow project. This is a person -who has write access to the Airflow repository and can merge pull requests. Committers (also known as maintainers) -are also responsible for reviewing pull requests and guiding contributors to make their first contribution. -They are also responsible for making sure that the project is moving forward and that the quality of the -code is maintained. - -The term "committer" and "maintainer" is used interchangeably. 
The term "committer" is the official term used by the -Apache Software Foundation, while "maintainer" is more commonly used in the Open Source community and is used -in context of GitHub in a number of guidelines and documentation, so this document will mostly use "maintainer", -when speaking about Github, Pull Request, Github Issues and Discussions. On the other hand, "committer" is more -often used in devlist discussions, official communications, Airflow website and every time when we formally -refer to the role. - -The official list of committers can be found `here `__. - -Additionally, committers are listed in a few other places (some of these may only be visible to existing committers): - -* https://whimsy.apache.org/roster/committee/airflow -* https://github.com/orgs/apache/teams/airflow-committers/members - -Committers are responsible for: - -* Championing one or more items on the `Roadmap `__ -* Reviewing & Merging Pull-Requests -* Scanning and responding to GitHub issues -* Responding to questions on the dev mailing list (dev@airflow.apache.org) - -Contributors ------------- - -A contributor is anyone who wants to contribute code, documentation, tests, ideas, or anything to the -Apache Airflow project. - -Contributors are responsible for: - -* Fixing bugs -* Adding features -* Championing one or more items on the `Roadmap `__. - -Security Team -------------- - -Security issues in Airflow are handled by the Airflow Security Team. The team consists -of selected PMC members that are interested in looking at, discussing and fixing -security issues, but it can also include committers and non-committer contributors that are -not PMC members yet and have been approved by the PMC members in a vote. You can request to -be added to the team by sending a message to private@airflow.apache.org. However, the team -should be small and focused on solving security issues, so the requests will be evaluated -on a case-by-case basis and the team size will be kept relatively small, limited to only actively -security-focused contributors. - -There are certain expectations from the members of the security team: - -* They are supposed to be active in assessing, discussing, fixing and releasing the - security issues in Airflow. While it is perfectly understood that as volunteers, we might have - periods of lower activity, prolonged lack of activity and participation will result in removal - from the team, pending PMC decision (the decision on removal can be taken by `LAZY CONSENSUS `_ among - all the PMC members on private@airflow.apache.org mailing list). - -* They are not supposed to reveal the information about pending and unfixed security issues to anyone - (including their employers) unless specifically authorised by the security team members, specifically - if diagnosing and solving the issue might involve the need of external experts - for example security - experts that are available through Airflow stakeholders. The intent about involving 3rd parties has - to be discussed and agreed upon at security@airflow.apache.org. - -* They have to have an `ICLA `_ signed with - Apache Software Foundation. - -* The security team members might inform 3rd parties about fixes, for example in order to assess if the fix - is solving the problem or in order to assess its applicability to be applied by 3rd parties, as soon - as a PR solving the issue is opened in the public airflow repository. 
- -* In case of critical security issues, the members of the security team might iterate on a fix in a - private repository and only open the PR in the public repository once the fix is ready to be released, - with the intent of minimizing the time between the fix being available and the fix being released. In this - case the PR might be sent to review and comment to the PMC members on private list, in order to request - an expedited voting on the release. The voting for such release might be done on the - private@airflow.apache.org mailing list and should be made public at the dev@apache.airflow.org - mailing list as soon as the release is ready to be announced. - -* The security team members working on the fix might be mentioned as remediation developers in the CVE - including their job affiliation if they want to. - -* Community members acting as release managers are by default members of the security team and unless they - want to, they do not have to be involved in discussing and solving the issues. They are responsible for - releasing the CVE information (announcement and publishing to security indexes) as part of the - release process. This is facilitated by the security tool provided by the Apache Software Foundation. - -* Severity of the issue is determined based on the criteria described in the - `Severity Rating blog post `_ by the Apache Software - Foundation Security team. - -Periodic Security team rotation -------------------------------- - -Handling security issues is something of a chore, it takes vigilance, requires quick reaction and responses -and often requires to act outside of the regular "day" job. This means that not everyone can keep up with -being part of the security team for long while being engaged and active. While we do not expect all the -security team members to be active all the time, and - since we are volunteers, it's perfectly understandable -that work, personal life, family and generally life might not help with being active. And this is not a -considered as being failure, it's more stating the fact of life. - -Also prolonged time of being exposed to handling "other's" problems and discussing similar kinds of problem -and responses might be tiring and might lead to burnout. - -However, for those who have never done that before, participation in the security team might be an interesting -experience and a way to learn a lot about security and security issue handling. We have a lot of -established processes and tools that make the work of the security team members easier, so this can be -treated as a great learning experience for some community members. And knowing that this is not -a "lifetime" assignment, but rather a temporary engagement might make it easier for people to decide to -join the security team. - -That's why introduced rotation of the security team members. - -Periodically - every 3-4 months (depending on actual churn of the security issues that are reported to us), -we re-evaluate the engagement and activity of the security team members, and we ask them if they want to -continue being part of the security team, taking into account their engagement since the last team refinement. -Generally speaking if the engagement during the last period was marginal, the person is considered as a -candidate for removing from the team and it requires a deliberate confirmation of re-engagement to take -the person off-the-list. 
- -At the same time we open up the possibility to other people in the community to join the team and make -a "call for new security team members" where community members can volunteer to join the security team. -Such volunteering should happen on the private@ list. The current members of the security team as well -as PMC members can also nominate other community members to join the team and those new team members -have to be well recognized and trusted by the community and accepted by the PMC. - -The proposal of team refinement is passed to the PMC as LAZY CONSENSUS (or VOTE if consensus cannot -be reached). In case the consensus cannot be reached for the whole list, we can split it and ask for -lazy consensus for each person separately. - -Contribution Workflow -===================== - -Typically, you start your first contribution by reviewing open tickets -at `GitHub issues `__. - -If you create pull-request, you don't have to create an issue first, but if you want, you can do it. -Creating an issue will allow you to collect feedback or share plans with other people. - -For example, you want to have the following sample ticket assigned to you: -`#7782: Add extra CC: to the emails sent by Airflow `_. - -In general, your contribution includes the following stages: - -.. image:: images/workflow.png - :align: center - :alt: Contribution Workflow - -1. Make your own `fork `__ of - the Apache Airflow `main repository `__. - -2. Create a `local virtualenv `_, - initialize the `Breeze environment `__, and - install `pre-commit framework `__. - If you want to add more changes in the future, set up your fork and enable GitHub Actions. - -3. Join `devlist `__ - and set up a `Slack account `__. - -4. Make the change and create a `Pull Request (PR) from your fork `__. - -5. Ping @ #development slack, comment @people. Be annoying. Be considerate. - -Step 1: Fork the Apache Airflow Repo ------------------------------------- -From the `apache/airflow `_ repo, -`create a fork `_: - -.. image:: images/fork.png - :align: center - :alt: Creating a fork - - -Step 2: Configure Your Environment ----------------------------------- - -You can use several development environments for Airflow. If you prefer to have development environments -on your local machine, you might choose Local Virtualenv, or dockerized Breeze environment, however we -also have support for popular remote development environments: GitHub Codespaces and GitPodify. -You can see the differences between the various environments -`here `__. - -The local env instructions can be found in full in the `LOCAL_VIRTUALENV.rst `_ file. - -The Breeze Docker Compose env is to maintain a consistent and common development environment so that you -can replicate CI failures locally and work on solving them locally rather by pushing to CI. - -The Breeze instructions can be found in full in the -`Breeze `_ file. - -You can configure the Docker-based Breeze development environment as follows: - -1. Install the latest versions of the `Docker Community Edition `_ and `Docker Compose `_ and add them to the PATH. - -2. Install `jq`_ on your machine. The exact command depends on the operating system (or Linux distribution) you use. - -.. _jq: https://stedolan.github.io/jq/ - -For example, on Ubuntu: - -.. code-block:: bash - - sudo apt install jq - -or on macOS with `Homebrew `_ - -.. code-block:: bash - - brew install jq - -3. Enter Breeze, and run the following in the Airflow source code directory: - -.. 
code-block:: bash - - breeze - -Breeze starts with downloading the Airflow CI image from -the Docker Hub and installing all required dependencies. - -This will enter the Docker environment and mount your local sources -to make them immediately visible in the environment. - -4. Create a local virtualenv, for example: - -.. code-block:: bash - - mkvirtualenv myenv --python=python3.9 - -5. Initialize the created environment: - -.. code-block:: bash - - ./scripts/tools/initialize_virtualenv.py - - -6. Open your IDE (for example, PyCharm) and select the virtualenv you created - as the project's default virtualenv in your IDE. - -Step 3: Connect with People ---------------------------- - -For effective collaboration, make sure to join the following Airflow groups: - -- Mailing lists: - - - Developer's mailing list ``_ - (quite substantial traffic on this list) - - - All commits mailing list: ``_ - (very high traffic on this list) - - - Airflow users mailing list: ``_ - (reasonably small traffic on this list) - -- `Issues on GitHub `__ - -- `Slack (chat) `__ - -Step 4: Prepare PR ------------------- - -1. Update the local sources to address the issue. - - For example, to address this example issue, do the following: - - * Read about `email configuration in Airflow `__. - - * Find the class you should modify. For the example GitHub issue, - this is `email.py `__. - - * Find the test class where you should add tests. For the example ticket, - this is `test_email.py `__. - - * Make sure your fork's main is synced with Apache Airflow's main before you create a branch. See - `How to sync your fork <#how-to-sync-your-fork>`_ for details. - - * Create a local branch for your development. Make sure to use latest - ``apache/main`` as base for the branch. See `How to Rebase PR <#how-to-rebase-pr>`_ for some details - on setting up the ``apache`` remote. Note, some people develop their changes directly in their own - ``main`` branches - this is OK and you can make PR from your main to ``apache/main`` but we - recommend to always create a local branch for your development. This allows you to easily compare - changes, have several changes that you work on at the same time and many more. - If you have ``apache`` set as remote then you can make sure that you have latest changes in your main - by ``git pull apache main`` when you are in the local ``main`` branch. If you have conflicts and - want to override your locally changed main you can override your local changes with - ``git fetch apache; git reset --hard apache/main``. - - * Modify the class and add necessary code and unit tests. - - * Run the unit tests from the `IDE `__ - or `local virtualenv `__ as you see fit. - - * Run the tests in `Breeze `__. - - * Run and fix all the `static checks `__. If you have - `pre-commits installed `__, - this step is automatically run while you are committing your code. If not, you can do it manually - via ``git add`` and then ``pre-commit run``. - - * Consider adding a newsfragment to your PR so you can add an entry in the release notes. - The following newsfragment types are supported: - - * `significant` - * `feature` - * `improvement` - * `bugfix` - * `doc` - * `misc` - - To add a newsfragment, create an ``rst`` file named ``{pr_number}.{type}.rst`` (e.g. ``1234.bugfix.rst``) - and place in either `newsfragments `__ for core newsfragments, - or `chart/newsfragments `__ for helm chart newsfragments. - - In general newsfragments must be one line. 
For newsfragment type ``significant``, you may include summary and body separated by a blank line, similar to ``git`` commit messages. - -2. Rebase your fork, squash commits, and resolve all conflicts. See `How to rebase PR <#how-to-rebase-pr>`_ - if you need help with rebasing your change. Remember to rebase often if your PR takes a lot of time to - review/fix. This will make rebase process much easier and less painful and the more often you do it, - the more comfortable you will feel doing it. - -3. Re-run static code checks again. - -4. Make sure your commit has a good title and description of the context of your change, enough - for maintainers reviewing it to understand why you are proposing a change. Make sure to follow other - PR guidelines described in `Pull Request guidelines <#pull-request-guidelines>`_. - Create Pull Request! Make yourself ready for the discussion! - -5. The ``static checks`` and ``tests`` in your PR serve as a first-line-of-check, whether the PR - passes the quality bar for Airflow. It basically means that until you get your PR green, it is not - likely to get reviewed by maintainers unless you specifically ask for it and explain that you would like - to get first pass of reviews and explain why achieving ``green`` status for it is not easy/feasible/desired. - Similarly if your PR contains ``[WIP]`` in the title or it is marked as ``Draft`` it is not likely to get - reviewed by maintainers unless you specifically ask for it and explain why and what specifically you want - to get reviewed before it reaches ``Ready for review`` status. This might happen if you want to get initial - feedback on the direction of your PR or if you want to get feedback on the design of your PR. - -6. Avoid @-mentioning individual maintainers in your PR, unless you have good reason to believe that they are - available, have time and/or interest in your PR. Generally speaking there are no "exclusive" reviewers for - different parts of the code. Reviewers review PRs and respond when they have some free time to spare and - when they feel they can provide some valuable feedback. If you want to get attention of maintainers, you can just - follow-up on your PR and ask for review in general, however be considerate and do not expect "immediate" - reviews. People review when they have time, most of the maintainers do such reviews in their - free time, which is taken away from their families and other interests, so allow sufficient time before you - follow-up - but if you see no reaction in several days, do follow-up, as with the number of PRs we have - daily, some of them might simply fall through the cracks, and following up shows your interest in completing - the PR as well as puts it at the top of "Recently commented" PRs. However, be considerate and mindful of - the time zones, holidays, busy periods, and expect that some discussions and conversation might take time - and get stalled occasionally. Generally speaking it's the author's responsibility to follow-up on the PR when - they want to get it reviewed and merged. - - -Step 5: Pass PR Review ----------------------- - -.. image:: images/review.png - :align: center - :alt: PR Review - -Note that maintainers will use **Squash and Merge** instead of **Rebase and Merge** -when merging PRs and your commit will be squashed to single commit. - -When a reviewer starts a conversation it is expected that you respond to questions, suggestions, doubts, -and generally it's great if all such conversations seem to converge to a common understanding. 
You do not -necessarily have to apply all the suggestions (often they are just opinions and suggestions even if they are -coming from seasoned maintainers) - it's perfectly ok that you respond to it with your own opinions and -understanding of the problem and your approach and if you have good arguments, presenting them is a good idea. - -The reviewers might leave several types of responses: - -* ``General PR comment`` - which usually means that there is a question/opinion/suggestion on how the PR can be - improved, or it's an ask to explain how you understand the PR. You can usually quote some parts of such - general comment and respond to it in your comments. Often comments that are raising questions in general - might lead to different discussions, even a request to move the discussion to the devlist or even lead to - completely new PRs created as a spin-off of the discussion. - -* ``Comment/Conversation around specific lines of code`` - such conversation usually flags a potential - improvement, or a potential problem with the code. It's a good idea to respond to such comments and explain - your approach and understanding of the problem. The whole idea of a conversation is try to reach a consensus - on a good way to address the problem. As an author you can resolve the conversation if you think the - problem raised in the comment is resolved or ask the reviewer to re-review, confirm If you do not understand - the comment, you can ask for clarifications. Generally assume good intention of the person who is reviewing - your code and resolve conversations also having good intentions. Understand that it's not a person that - is criticised or argued with, but rather the code and the approach. The important thing is to take care - about quality of the the code and the project and want to make sure that the code is good. - - It's ok to mark the conversation resolved by anyone who can do it - it could be the author, who thinks - the arguments are changes implemented make the conversation resolved, or the maintainer/person who - started the conversation or it can be even marked as resolved by the maintainer who attempts to merge the - PR and thinks that all conversations are resolved. However if you want to make sure attention and decision - on merging the PR is given by maintainer, make sure you monitor, follow-up and close the conversations when - you think they are resolved (ideally explaining why you think the conversation is resolved). - -* ``Request changes`` - this is where maintainer is pretty sure that you should make a change to your PR - because it contains serious flaw, design misconception, or a bug or it is just not in-line with the common - approach Airflow community took on the issue. Usually you should respond to such request and either fix - the problem or convince the maintainer that they were wrong (it happens more often than you think). - Sometimes even if you do not agree with the request, it's a good idea to make the change anyway, because - it might be a good idea to follow the common approach in the project. Sometimes it might even happen that - two maintainers will have completely different opinions on the same issue and you will have to lead the - discussion to try to achieve consensus. If you cannot achieve consensus and you think it's an important - issue, you can ask for a vote on the issue by raising a devlist discussion - where you explain your case - and follow up the discussion with a vote when you cannot achieve consensus there. 
The ``Request changes`` - status can be withdrawn by the maintainer, but if they don't - such PR cannot be merged - maintainers have - the right to veto any code modification according to the `Apache Software Foundation rules `_. - -* ``Approval`` - this is given by a maintainer after the code has been reviewed and the maintainer agrees that - it is a good idea to merge it. There might still be some unresolved conversations, requests and questions on - such PR and you are expected to resolve them before the PR is merged. But the ``Approval`` status is a sign - of trust from the maintainer who gave the approval that they think the PR is good enough as long as their - comments will be resolved and they put the trust in the hands of the author and - possibly - other - maintainers who will merge the request that they can do that without follow-up re-review and verification. - - -You need to have ``Approval`` of at least one maintainer (if you are maintainer yourself, it has to be -another maintainer). Ideally you should have 2 or more maintainers reviewing the code that touches -the core of Airflow - we do not have enforcement about ``2+`` reviewers required for Core of Airflow, -but maintainers will generally ask in the PR if they think second review is needed. - -Your PR can be merged by a maintainer who will see that the PR is approved, all conversations are resolved -and the code looks good. The criteria for PR being merge-able are: - -* ``green status for static checks and tests`` -* ``conversations resolved`` -* ``approval from 1 (or more for core changes) maintainers`` -* no unresolved ``Request changes`` - -Once you reach the status, you do not need to do anything to get the PR merged. One of the maintainers -will merge such PRs. However if you see that for a few days such a PR is not merged, do not hesitate to comment -on your PR and mention that you think it is ready to be merged. Also, it's a good practice to rebase your PR -to latest ``main``, because there could be other changes merged in the meantime that might cause conflicts or -fail tests or static checks, so by rebasing a PR that has been build few days ago you make sure that it -still passes the tests and static checks today. - - -.. note:: |experimental| - - In December 2023 we enabled - experimentally - the requirement to resolve all the open conversations in a - PR in order to make it merge-able. You will see in the status of the PR that it needs to have all the - conversations resolved before it can be merged. - - This is an experiment and we will evaluate by the end of January 2024. If it turns out to be a good idea, - we will keep it enabled in the future. - - The goal of this experiment is to make it easier to see when there are some conversations that are not - resolved for everyone involved in the PR - author, reviewers and maintainers who try to figure out if - the PR is ready to merge and - eventually - merge it. The goal is also to use conversations more as a "soft" way - to request changes and limit the use of ``Request changes`` status to only those cases when the maintainer - is sure that the PR should not be merged in the current state. That should lead to faster review/merge - cycle and less problems with stalled PRs that have ``Request changes`` status but all the issues are - already solved (assuming that maintainers will start treating the conversations this way). 
- - -Pull Request guidelines -======================= - -Before you submit a Pull Request (PR) from your forked repo, check that it meets -these guidelines: - -- Include tests, either as doctests, unit tests, or both, to your pull request. - - The airflow repo uses `GitHub Actions `__ to - run the tests and `codecov `__ to track - coverage. You can set up both for free on your fork. It will help you make sure you do not - break the build with your PR and that you help increase coverage. - Also we advise to install locally `pre-commit hooks `__ to - apply various checks, code generation and formatting at the time you make a local commit - which - gives you near-immediate feedback on things you need to fix before you push your code to the PR, or in - many case it will even fix it for you locally so that you can add and commit it straight away. - -- Follow our project's `Coding style and best practices`_. Usually we attempt to enforce the practices by - having appropriate pre-commits. There are checks amongst them that aren't currently enforced - programmatically (either because they are too hard or just not yet done). - -- We prefer that you ``rebase`` your PR (and do it quite often) rather than merge. It leads to - easier reviews and cleaner changes where you know exactly what changes you've done. You can learn more - about rebase vs. merge workflow in `Rebase and merge your pull request `__ - and `Rebase your fork `__. Make sure to resolve all conflicts - during rebase. - -- When merging PRs, Maintainer will use **Squash and Merge** which means then your PR will be merged as one - commit, regardless of the number of commits in your PR. During the review cycle, you can keep a commit - history for easier review, but if you need to, you can also squash all commits to reduce the - maintenance burden during rebase. - -- Add an `Apache License `__ header to all new files. If you - have ``pre-commit`` installed, pre-commit will do it automatically for you. If you hesitate to install - pre-commit for your local repository - for example because it takes a few seconds to commit your changes, - this one thing might be a good reason to convince anyone to install pre-commit. - -- If your PR adds functionality, make sure to update the docs as part of the same PR, not only - code and tests. Docstring is often sufficient. Make sure to follow the Sphinx compatible standards. - -- Make sure your code fulfills all the - `static code checks `__ we have in our code. The easiest way - to make sure of that is - again - to install `pre-commit hooks `__ - -- Make sure your PR is small and focused on one change only - avoid adding unrelated changes, mixing - adding features and refactoring. Keeping to that rule will make it easier to review your PR and will make - it easier for release managers if they decide that your change should be cherry-picked to release it in a - bug-fix release of Airflow. If you want to add a new feature and refactor the code, it's better to split the - PR to several smaller PRs. It's also quite a good and common idea to keep a big ``Draft`` PR if you have - a bigger change that you want to make and then create smaller PRs from it that are easier to review and - merge and cherry-pick. It takes a long time (and a lot of attention and focus of a reviewer to review - big PRs so by splitting it to smaller PRs you actually speed up the review process and make it easier - for your change to be eventually merged. - -- Run relevant tests locally before opening PR. 
Often tests are placed in the files that are corresponding - to the changed code (for example for ``airflow/cli/cli_parser.py`` changes you have tests in - ``tests/cli/test_cli_parser.py``). However there are a number of cases where the tests that should run - are placed elsewhere - you can either run tests for the whole ``TEST_TYPE`` that is relevant (see - ``breeze testing tests --help`` output for available test types) or you can run all tests, or eventually - you can push your code to PR and see results of the tests in the CI. - -- You can use any supported python version to run the tests, but the best is to check - if it works for the oldest supported version (Python 3.8 currently). In rare cases - tests might fail with the oldest version when you use features that are available in newer Python - versions. For that purpose we have ``airflow.compat`` package where we keep back-ported - useful features from newer versions. - -- Adhere to guidelines for commit messages described in this `article `__. - This makes the lives of those who come after you (and your future self) a lot easier. - -Airflow Git Branches -==================== - -All new development in Airflow happens in the ``main`` branch. All PRs should target that branch. - -We also have a ``v2-*-test`` branches that are used to test ``2.*.x`` series of Airflow and where maintainers -cherry-pick selected commits from the main branch. - -Cherry-picking is done with the ``-x`` flag. - -The ``v2-*-test`` branch might be broken at times during testing. Expect force-pushes there so -maintainers should coordinate between themselves on who is working on the ``v2-*-test`` branch - -usually these are developers with the release manager permissions. - -The ``v2-*-stable`` branch is rather stable - there are minimum changes coming from approved PRs that -passed the tests. This means that the branch is rather, well, "stable". - -Once the ``v2-*-test`` branch stabilises, the ``v2-*-stable`` branch is synchronized with ``v2-*-test``. -The ``v2-*-stable`` branches are used to release ``2.*.x`` releases. - -The general approach is that cherry-picking a commit that has already had a PR and unit tests run -against main is done to ``v2-*-test`` branches, but PRs from contributors towards 2.0 should target -``v2-*-stable`` branches. - -The ``v2-*-test`` branches and ``v2-*-stable`` ones are merged just before the release and that's the -time when they converge. - -The production images are released in DockerHub from: - -* main branch for development -* ``2.*.*``, ``2.*.*rc*`` releases from the ``v2-*-stable`` branch when we prepare release candidates and - final releases. - -Development Environments -======================== - -There are two environments, available on Linux and macOS, that you can use to -develop Apache Airflow: - -- `Local virtualenv development environment `_ - that supports running unit tests and can be used in your IDE. - -- `Breeze Docker-based development environment `_ that provides - an end-to-end CI solution with all software dependencies covered. 
- -The table below summarizes differences between the environments: - - -========================= ================================ ===================================== ======================================== -**Property** **Local virtualenv** **Breeze environment** **GitHub Codespaces** -========================= ================================ ===================================== ======================================== -Dev machine needed - (-) You need a dev PC - (-) You need a dev PC (+) Works with remote setup -------------------------- -------------------------------- ------------------------------------- ---------------------------------------- -Test coverage - (-) unit tests only - (+) integration and unit tests (*/-) integration tests (extra config) -------------------------- -------------------------------- ------------------------------------- ---------------------------------------- -Setup - (+) automated with breeze cmd - (+) automated with breeze cmd (+) automated with VSCode -------------------------- -------------------------------- ------------------------------------- ---------------------------------------- -Installation difficulty - (-) depends on the OS setup - (+) works whenever Docker works (+) works in a modern browser/VSCode -------------------------- -------------------------------- ------------------------------------- ---------------------------------------- -Team synchronization - (-) difficult to achieve - (+) reproducible within team (+) reproducible within team -------------------------- -------------------------------- ------------------------------------- ---------------------------------------- -Reproducing CI failures - (-) not possible in many cases - (+) fully reproducible (+) reproduce CI failures -------------------------- -------------------------------- ------------------------------------- ---------------------------------------- -Ability to update - (-) requires manual updates - (+) automated update via breeze cmd (+/-) can be rebuild on demand -------------------------- -------------------------------- ------------------------------------- ---------------------------------------- -Disk space and CPU usage - (+) relatively lightweight - (-) uses GBs of disk and many CPUs (-) integration tests (extra config) -------------------------- -------------------------------- ------------------------------------- ---------------------------------------- -IDE integration - (+) straightforward - (-) via remote debugging only (-) integration tests (extra config) -========================= ================================ ===================================== ---------------------------------------- - - -Typically, you are recommended to use both of these environments depending on your needs. - -Local virtualenv Development Environment ----------------------------------------- - -All details about using and running local virtualenv environment for Airflow can be found -in `LOCAL_VIRTUALENV.rst `__. - -Benefits: - -- Packages are installed locally. No container environment is required. - -- You can benefit from local debugging within your IDE. - -- With the virtualenv in your IDE, you can benefit from autocompletion and running tests directly from the IDE. - -Limitations: - -- You have to maintain your dependencies and local environment consistent with - other development environments that you have on your local machine. - -- You cannot run tests that require external components, such as mysql, - postgres database, hadoop, mongo, cassandra, redis, etc. 
- - The tests in Airflow are a mixture of unit and integration tests and some of - them require these components to be set up. Local virtualenv supports only - real unit tests. Technically, to run integration tests, you can configure - and install the dependencies on your own, but it is usually complex. - Instead, you are recommended to use - `Breeze development environment `__ with all required packages - pre-installed. - -- You need to make sure that your local environment is consistent with other - developer environments. This often leads to a "works for me" syndrome. The - Breeze container-based solution provides a reproducible environment that is - consistent with other developers. - -- You are **STRONGLY** encouraged to also install and use `pre-commit hooks `_ - for your local virtualenv development environment. - Pre-commit hooks can speed up your development cycle a lot. - -Breeze Development Environment ------------------------------- - -All details about using and running Airflow Breeze can be found in -`Breeze `__. - -The Airflow Breeze solution is intended to ease your local development as "*It's -a Breeze to develop Airflow*". - -Benefits: - -- Breeze is a complete environment that includes external components, such as - mysql database, hadoop, mongo, cassandra, redis, etc., required by some of - Airflow tests. Breeze provides a preconfigured Docker Compose environment - where all these services are available and can be used by tests - automatically. - -- Breeze environment is almost the same as used in the CI automated builds. - So, if the tests run in your Breeze environment, they will work in the CI as well. - See ``_ for details about Airflow CI. - -Limitations: - -- Breeze environment takes significant space in your local Docker cache. There - are separate environments for different Python and Airflow versions, and - each of the images takes around 3GB in total. - -- Though Airflow Breeze setup is automated, it takes time. The Breeze - environment uses pre-built images from DockerHub and it takes time to - download and extract those images. Building the environment for a particular - Python version takes less than 10 minutes. - -- Breeze environment runs in the background taking precious resources, such as - disk space and CPU. You can stop the environment manually after you use it - or even use a ``bare`` environment to decrease resource usage. - - - -.. note:: - - Breeze CI images are not supposed to be used in production environments. - They are optimized for repeatability of tests, maintainability and speed of building rather - than production performance. The production images are not yet officially published. - - - -Airflow dependencies -==================== - -.. note:: - - Only ``pip`` installation is currently officially supported. - - While there are some successes with using other tools like `poetry `_ or - `pip-tools `_, they do not share the same workflow as - ``pip`` - especially when it comes to constraint vs. requirements management. - Installing via ``Poetry`` or ``pip-tools`` is not currently supported. - - There are known issues with ``bazel`` that might lead to circular dependencies when using it to install - Airflow. Please switch to ``pip`` if you encounter such problems. ``Bazel`` community works on fixing - the problem in `this PR `_ so it might be that - newer versions of ``bazel`` will handle it. 
-
-  If you wish to install airflow using those tools you should use the constraint files and convert
-  them to the appropriate format and workflow that your tool requires.
-
-
-Extras
-------
-
-There are a number of extras that can be specified when installing Airflow. Those
-extras can be specified after the usual pip install - for example ``pip install -e ".[ssh]"`` for an editable
-installation. Note that there are two kinds of those extras - ``regular`` extras (used when you install
-airflow as a user), but in ``editable`` mode you can also install ``devel`` extras that are necessary if
-you want to run airflow locally for testing, and ``doc`` extras that install tools needed to build
-the documentation.
-
-This is the full list of those extras:
-
-Devel extras
-.............
-
-The ``devel`` extras are not available in the released packages. They are only available when you install
-Airflow from sources in ``editable`` installation - i.e. one that you are usually using to contribute to
-Airflow. They provide tools such as ``pytest`` and ``mypy`` for general purpose development and testing; also,
-some providers have their own development-related extras that allow installing the tools necessary to run tests,
-where the tools are specific for the provider.
-
-
- .. START DEVEL EXTRAS HERE
-devel, devel-all, devel-all-dbs, devel-ci, devel-debuggers, devel-devscripts, devel-duckdb, devel-
-hadoop, devel-mypy, devel-sentry, devel-static-checks, devel-tests
- .. END DEVEL EXTRAS HERE
-
-Doc extras
-...........
-
-The ``doc`` extras are not available in the released packages. They are only available when you install
-Airflow from sources in ``editable`` installation - i.e. one that you are usually using to contribute to
-Airflow. They provide tools needed when you want to build Airflow documentation (note that you also need
-``devel`` extras installed for airflow and providers in order to build documentation for airflow and
-provider packages respectively). The ``doc`` extra is enough to build regular documentation, while
-``doc-gen`` is needed to generate the ER diagram describing our database.
-
- .. START DOC EXTRAS HERE
-doc, doc-gen
- .. END DOC EXTRAS HERE
-
-
-Regular extras
-..............
-
-Those extras are available as regular Airflow extras and are targeted to be used by Airflow users and
-contributors to select the features of Airflow they want to use. They might install additional providers or
-just install dependencies that are necessary to enable the feature.
-
- ..
START REGULAR EXTRAS HERE -aiobotocore, airbyte, alibaba, all, all-core, all-dbs, amazon, apache-atlas, apache-beam, apache- -cassandra, apache-drill, apache-druid, apache-flink, apache-hdfs, apache-hive, apache-impala, -apache-kafka, apache-kylin, apache-livy, apache-pig, apache-pinot, apache-spark, apache-webhdfs, -apprise, arangodb, asana, async, atlas, atlassian-jira, aws, azure, cassandra, celery, cgroups, -cloudant, cncf-kubernetes, cohere, common-io, common-sql, crypto, databricks, datadog, dbt-cloud, -deprecated-api, dingding, discord, docker, druid, elasticsearch, exasol, fab, facebook, ftp, gcp, -gcp_api, github, github-enterprise, google, google-auth, graphviz, grpc, hashicorp, hdfs, hive, -http, imap, influxdb, jdbc, jenkins, kerberos, kubernetes, ldap, leveldb, microsoft-azure, -microsoft-mssql, microsoft-psrp, microsoft-winrm, mongo, mssql, mysql, neo4j, odbc, openai, -openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty, pandas, papermill, password, -pgvector, pinecone, pinot, postgres, presto, rabbitmq, redis, s3, s3fs, salesforce, samba, saml, -segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake, spark, sqlite, ssh, statsd, -tableau, tabular, telegram, trino, vertica, virtualenv, weaviate, webhdfs, winrm, yandex, zendesk - .. END REGULAR EXTRAS HERE - -Provider packages ------------------ - -Airflow 2.0 is split into core and providers. They are delivered as separate packages: - -* ``apache-airflow`` - core of Apache Airflow -* ``apache-airflow-providers-*`` - More than 70 provider packages to communicate with external services - -The information/meta-data about the providers is kept in ``provider.yaml`` file in the right sub-directory -of ``airflow\providers``. This file contains: - -* package name (``apache-airflow-provider-*``) -* user-facing name of the provider package -* description of the package that is available in the documentation -* list of versions of package that have been released so far -* list of dependencies of the provider package -* list of additional-extras that the provider package provides (together with dependencies of those extras) -* list of integrations, operators, hooks, sensors, transfers provided by the provider (useful for documentation generation) -* list of connection types, extra-links, secret backends, auth backends, and logging handlers (useful to both - register them as they are needed by Airflow and to include them in documentation automatically). -* and more ... - -If you want to add dependencies to the provider, you should add them to the corresponding ``provider.yaml`` -and Airflow pre-commits and package generation commands will use them when preparing package information. - -In Airflow 1.10 all those providers were installed together within one single package and when you installed -airflow locally, from sources, they were also installed. In Airflow 2.0, providers are separated out, -and not packaged together with the core when you build "apache-airflow" package, however when you install -airflow project locally with ``pip install -e ".[devel]"`` they are available on the same -environment as Airflow. - -You should only update dependencies for the provider in the corresponding ``provider.yaml`` which is the -source of truth for all information about the provider. - -Some of the packages have cross-dependencies with other providers packages. This typically happens for -transfer operators where operators use hooks from the other providers in case they are transferring -data between the providers. 
The list of dependencies is maintained (automatically with the -``update-providers-dependencies`` pre-commit) in the ``generated/provider_dependencies.json``. -Same pre-commit also updates generate dependencies in ``pyproject.toml``. - -Cross-dependencies between provider packages are converted into extras - if you need functionality from -the other provider package you can install it adding [extra] after the -``apache-airflow-providers-PROVIDER`` for example: -``pip install apache-airflow-providers-google[amazon]`` in case you want to use GCP -transfer operators from Amazon ECS. - -If you add a new dependency between different providers packages, it will be detected automatically during -and pre-commit will generate new entry in ``generated/provider_dependencies.json`` and update -``pyproject.toml`` so that the package extra dependencies are properly handled when package -might be installed when breeze is restarted or by your IDE or by running ``pip install -e ".[devel]"``. - -Developing community managed provider packages ----------------------------------------------- - -While you can develop your own providers, Apache Airflow has 60+ providers that are managed by the community. -They are part of the same repository as Apache Airflow (we use ``monorepo`` approach where different -parts of the system are developed in the same repository but then they are packaged and released separately). -All the community-managed providers are in 'airflow/providers' folder and they are all sub-packages of -'airflow.providers' package. All the providers are available as ``apache-airflow-providers-`` -packages when installed by users, but when you contribute to providers you can work on airflow main -and install provider dependencies via ``editable`` extras - without having to manage and install providers -separately, you can easily run tests for the providers and when you run airflow from the ``main`` -sources, all community providers are automatically available for you. - -The capabilities of the community-managed providers are the same as the third-party ones. When -the providers are installed from PyPI, they provide the entry-point containing the metadata as described -in the previous chapter. However when they are locally developed, together with Airflow, the mechanism -of discovery of the providers is based on ``provider.yaml`` file that is placed in the top-folder of -the provider. The ``provider.yaml`` is the single source of truth for the provider metadata and it is -there where you should add and remove dependencies for providers (following by running -``update-providers-dependencies`` pre-commit to synchronize the dependencies with ``pyproject.toml`` -of Airflow). - -The ``provider.yaml`` file is compliant with the schema that is available in -`json-schema specification `_. - -Thanks to that mechanism, you can develop community managed providers in a seamless way directly from -Airflow sources, without preparing and releasing them as packages separately, which would be rather -complicated. - -Regardless if you plan to contribute your provider, when you are developing your own, custom providers, -you can use the above functionality to make your development easier. You can add your provider -as a sub-folder of the ``airflow.providers`` package, add the ``provider.yaml`` file and install airflow -in development mode - then capabilities of your provider will be discovered by airflow and you will see -the provider among other providers in ``airflow providers`` command output. 
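-
-For example - a minimal sketch of that workflow, assuming you work from a cloned Airflow source tree
-(the constraint files described later in this document may also be needed for a clean install):
-
-.. code-block:: bash
-
-    # install airflow from sources in editable mode, together with the development extras
-    pip install -e ".[devel]"
-
-    # community providers (and any provider.yaml you added) are then discovered automatically
-    airflow providers list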
- - -Documentation for the community managed providers -------------------------------------------------- - -When you are developing a community-managed provider, you are supposed to make sure it is well tested -and documented. Part of the documentation is ``provider.yaml`` file ``integration`` information and -``version`` information. This information is stripped-out from provider info available at runtime, -however it is used to automatically generate documentation for the provider. - -If you have pre-commits installed, pre-commit will warn you and let you know what changes need to be -done in the ``provider.yaml`` file when you add a new Operator, Hooks, Sensor or Transfer. You can -also take a look at the other ``provider.yaml`` files as examples. +Contributing +============ -Well documented provider contains those: - -* index.rst with references to packages, API used and example dags -* configuration reference -* class documentation generated from PyDoc in the code -* example dags -* how-to guides - -You can see for example ``google`` provider which has very comprehensive documentation: - -* `Documentation `_ -* `Example DAGs `_ - -Part of the documentation are example dags (placed in the ``tests/system`` folder). The reason why -they are in ``tests/system`` is because we are using the example dags for various purposes: - -* showing real examples of how your provider classes (Operators/Sensors/Transfers) can be used -* snippets of the examples are embedded in the documentation via ``exampleinclude::`` directive -* examples are executable as system tests and some of our stakeholders run them regularly to - check if ``system`` level instagration is still working, before releasing a new version of the provider. - -Testing the community managed providers ---------------------------------------- - -We have high requirements when it comes to testing the community managed providers. We have to be sure -that we have enough coverage and ways to tests for regressions before the community accepts such -providers. - -* Unit tests have to be comprehensive and they should tests for possible regressions and edge cases - not only "green path" - -* Integration tests where 'local' integration with a component is possible (for example tests with - MySQL/Postgres DB/Trino/Kerberos all have integration tests which run with real, dockerized components - -* System Tests which provide end-to-end testing, usually testing together several operators, sensors, - transfers connecting to a real external system - -You can read more about out approach for tests in `TESTING.rst `_ but here -are some highlights. - -Dependency management -===================== - -Airflow is not a standard python project. Most of the python projects fall into one of two types - -application or library. As described in -`this StackOverflow question `_, -the decision whether to pin (freeze) dependency versions for a python project depends on the type. For -applications, dependencies should be pinned, but for libraries, they should be open. - -For application, pinning the dependencies makes it more stable to install in the future - because new -(even transitive) dependencies might cause installation to fail. For libraries - the dependencies should -be open to allow several different libraries with the same requirements to be installed at the same time. - -The problem is that Apache Airflow is a bit of both - application to install and library to be used when -you are developing your own operators and DAGs. 
- -This - seemingly unsolvable - puzzle is solved by having pinned constraints files. - -Pinned constraint files -======================= - -.. note:: - - Only ``pip`` installation is officially supported. - - While it is possible to install Airflow with tools like `poetry `_ or - `pip-tools `_, they do not share the same workflow as - ``pip`` - especially when it comes to constraint vs. requirements management. - Installing via ``Poetry`` or ``pip-tools`` is not currently supported. - - There are known issues with ``bazel`` that might lead to circular dependencies when using it to install - Airflow. Please switch to ``pip`` if you encounter such problems. ``Bazel`` community works on fixing - the problem in `this PR `_ so it might be that - newer versions of ``bazel`` will handle it. - - If you wish to install airflow using those tools you should use the constraint files and convert - them to appropriate format and workflow that your tool requires. - - -By default when you install ``apache-airflow`` package - the dependencies are as open as possible while -still allowing the apache-airflow package to install. This means that ``apache-airflow`` package might fail to -install in case a direct or transitive dependency is released that breaks the installation. In such case -when installing ``apache-airflow``, you might need to provide additional constraints (for -example ``pip install apache-airflow==1.10.2 Werkzeug<1.0.0``) - -There are several sets of constraints we keep: - -* 'constraints' - those are constraints generated by matching the current airflow version from sources - and providers that are installed from PyPI. Those are constraints used by the users who want to - install airflow with pip, they are named ``constraints-.txt``. - -* "constraints-source-providers" - those are constraints generated by using providers installed from - current sources. While adding new providers their dependencies might change, so this set of providers - is the current set of the constraints for airflow and providers from the current main sources. - Those providers are used by CI system to keep "stable" set of constraints. They are named - ``constraints-source-providers-.txt`` - -* "constraints-no-providers" - those are constraints generated from only Apache Airflow, without any - providers. If you want to manage airflow separately and then add providers individually, you can - use those. Those constraints are named ``constraints-no-providers-.txt``. - -The first two can be used as constraints file when installing Apache Airflow in a repeatable way. -It can be done from the sources: - -from the PyPI package: - -.. code-block:: bash - - pip install apache-airflow[google,amazon,async]==2.2.5 \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.5/constraints-3.8.txt" - -The last one can be used to install Airflow in "minimal" mode - i.e when bare Airflow is installed without -extras. - -When you install airflow from sources (in editable mode) you should use "constraints-source-providers" -instead (this accounts for the case when some providers have not yet been released and have conflicting -requirements). - -.. code-block:: bash - - pip install -e ".[devel]" \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-3.8.txt" - - -This works also with extras - for example: - -.. 
code-block:: bash - - pip install ".[ssh]" \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-3.8.txt" - - -There are different set of fixed constraint files for different python major/minor versions and you should -use the right file for the right python version. - -If you want to update just airflow dependencies, without paying attention to providers, you can do it using -``constraints-no-providers`` constraint files as well. - -.. code-block:: bash - - pip install . --upgrade \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-no-providers-3.8.txt" - - -The ``constraints-.txt`` and ``constraints-no-providers-.txt`` -will be automatically regenerated by CI job every time after the ``pyproject.toml`` is updated and pushed -if the tests are successful. - - -Documentation -============= - -Documentation for ``apache-airflow`` package and other packages that are closely related to it ie. -providers packages are in ``/docs/`` directory. For detailed information on documentation development, -see: `docs/README.rst `_ - -Diagrams -======== - -We started to use (and gradually convert old diagrams to use it) `Diagrams `_ -as our tool of choice to generate diagrams. The diagrams are generated from Python code and can be -automatically updated when the code changes. The diagrams are generated using pre-commit hooks (See -static checks below) but they can also be generated manually by running the corresponding Python code. - -To run the code you need to install the dependencies in the virtualenv you use to run it: -* ``pip install diagrams rich``. You need to have graphviz installed in your -system (``brew install graphviz`` on macOS for example). - -The source code of the diagrams are next to the generated diagram, the difference is that the source -code has ``.py`` extension and the generated diagram has ``.png`` extension. The pre-commit hook - ``generate-airflow-diagrams`` will look for ``diagram_*.py`` files in the ``docs`` subdirectories -to find them and runs them when the sources changed and the diagrams are not up to date (the -pre-commit will automatically generate an .md5sum hash of the sources and store it next to the diagram -file). - -In order to generate the diagram manually you can run the following command: - -.. code-block:: bash - - python .py - -You can also generate all diagrams by: - -.. code-block:: bash - - pre-commit run generate-airflow-diagrams - -or with Breeze: - -.. code-block:: bash - - breeze static-checks --type generate-airflow-diagrams --all-files - -When you iterate over a diagram, you can also setup a "save" action in your IDE to run the python -file automatically when you save the diagram file. - -Once you've done iteration and you are happy with the diagram, you can commit the diagram, the source -code and the .md5sum file. The pre-commit hook will then not run the diagram generation until the -source code for it changes. - - -Static code checks -================== - -We check our code quality via static code checks. See -`STATIC_CODE_CHECKS.rst `_ for details. - -Your code must pass all the static code checks in the CI in order to be eligible for Code Review. -The easiest way to make sure your code is good before pushing is to use pre-commit checks locally -as described in the static code checks documentation. - -.. 
_coding_style: - -Coding style and best practices -=============================== - -Most of our coding style rules are enforced programmatically by ruff and mypy, which are run automatically -with static checks and on every Pull Request (PR), but there are some rules that are not yet automated and -are more Airflow specific or semantic than style. - -Don't Use Asserts Outside Tests -------------------------------- - -Our community agreed that to various reasons we do not use ``assert`` in production code of Apache Airflow. -For details check the relevant `mailing list thread `_. - -In other words instead of doing: - -.. code-block:: python - - assert some_predicate() - -you should do: - -.. code-block:: python - - if not some_predicate(): - handle_the_case() - -The one exception to this is if you need to make an assert for typechecking (which should be almost a last resort) you can do this: - -.. code-block:: python - - if TYPE_CHECKING: - assert isinstance(x, MyClass) - - -Database Session Handling -------------------------- - -**Explicit is better than implicit.** If a function accepts a ``session`` parameter it should not commit the -transaction itself. Session management is up to the caller. - -To make this easier, there is the ``create_session`` helper: - -.. code-block:: python - - from sqlalchemy.orm import Session - - from airflow.utils.session import create_session - - - def my_call(x, y, *, session: Session): - ... - # You MUST not commit the session here. - - - with create_session() as session: - my_call(x, y, session=session) - -.. warning:: - **DO NOT** add a default to the ``session`` argument **unless** ``@provide_session`` is used. - -If this function is designed to be called by "end-users" (i.e. DAG authors) then using the ``@provide_session`` wrapper is okay: - -.. code-block:: python - - from sqlalchemy.orm import Session - - from airflow.utils.session import NEW_SESSION, provide_session - - - @provide_session - def my_method(arg, *, session: Session = NEW_SESSION): - ... - # You SHOULD not commit the session here. The wrapper will take care of commit()/rollback() if exception - -In both cases, the ``session`` argument is a `keyword-only argument`_. This is the most preferred form if -possible, although there are some exceptions in the code base where this cannot be used, due to backward -compatibility considerations. In most cases, ``session`` argument should be last in the argument list. - -.. _`keyword-only argument`: https://www.python.org/dev/peps/pep-3102/ - - -Don't use time() for duration calculations ------------------------------------------ - -If you wish to compute the time difference between two events with in the same process, use -``time.monotonic()``, not ``time.time()`` nor ``timezone.utcnow()``. - -If you are measuring duration for performance reasons, then ``time.perf_counter()`` should be used. (On many -platforms, this uses the same underlying clock mechanism as monotonic, but ``perf_counter`` is guaranteed to be -the highest accuracy clock on the system, monotonic is simply "guaranteed" to not go backwards.) - -If you wish to time how long a block of code takes, use ``Stats.timer()`` -- either with a metric name, which -will be timed and submitted automatically: - -.. code-block:: python - - from airflow.stats import Stats - - ... - - with Stats.timer("my_timer_metric"): - ... - -or to time but not send a metric: - -.. code-block:: python - - from airflow.stats import Stats - - ... - - with Stats.timer() as timer: - ... 
- - log.info("Code took %.3f seconds", timer.duration) - -For full docs on ``timer()`` check out `airflow/stats.py`_. - -If the start_date of a duration calculation needs to be stored in a database, then this has to be done using -datetime objects. In all other cases, using datetime for duration calculation MUST be avoided as creating and -diffing datetime operations are (comparatively) slow. - -Naming Conventions for provider packages ----------------------------------------- - -In Airflow 2.0 we standardized and enforced naming for provider packages, modules and classes. -those rules (introduced as AIP-21) were not only introduced but enforced using automated checks -that verify if the naming conventions are followed. Here is a brief summary of the rules, for -detailed discussion you can go to `AIP-21 Changes in import paths `_ - -The rules are as follows: - -* Provider packages are all placed in 'airflow.providers' - -* Providers are usually direct sub-packages of the 'airflow.providers' package but in some cases they can be - further split into sub-packages (for example 'apache' package has 'cassandra', 'druid' ... providers ) out - of which several different provider packages are produced (apache.cassandra, apache.druid). This is - case when the providers are connected under common umbrella but very loosely coupled on the code level. - -* In some cases the package can have sub-packages but they are all delivered as single provider - package (for example 'google' package contains 'ads', 'cloud' etc. sub-packages). This is in case - the providers are connected under common umbrella and they are also tightly coupled on the code level. - -* Typical structure of provider package: - * example_dags -> example DAGs are stored here (used for documentation and System Tests) - * hooks -> hooks are stored here - * operators -> operators are stored here - * sensors -> sensors are stored here - * secrets -> secret backends are stored here - * transfers -> transfer operators are stored here - -* Module names do not contain word "hooks", "operators" etc. The right type comes from - the package. For example 'hooks.datastore' module contains DataStore hook and 'operators.datastore' - contains DataStore operators. - -* Class names contain 'Operator', 'Hook', 'Sensor' - for example DataStoreHook, DataStoreExportOperator - -* Operator name usually follows the convention: ``Operator`` - (BigQueryExecuteQueryOperator) is a good example - -* Transfer Operators are those that actively push data from one service/provider and send it to another - service (might be for the same or another provider). This usually involves two hooks. The convention - for those ``ToOperator``. They are not named *TransferOperator nor *Transfer. - -* Operators that use external service to perform transfer (for example CloudDataTransferService operators - are not placed in "transfers" package and do not have to follow the naming convention for - transfer operators. - -* It is often debatable where to put transfer operators but we agreed to the following criteria: - - * We use "maintainability" of the operators as the main criteria - so the transfer operator - should be kept at the provider which has highest "interest" in the transfer operator - - * For Cloud Providers or Service providers that usually means that the transfer operators - should land at the "target" side of the transfer - -* Secret Backend name follows the convention: ``Backend``. - -* Tests are grouped in parallel packages under "tests.providers" top level package. 
Module name is usually - ``test_.py``, - -* System tests (not yet fully automated but allowing to run e2e testing of particular provider) are - named with _system.py suffix. - -Test Infrastructure -=================== - -We support the following types of tests: - -* **Unit tests** are Python tests launched with ``pytest``. - Unit tests are available both in the `Breeze environment `_ - and `local virtualenv `_. - -* **Integration tests** are available in the Breeze development environment - that is also used for Airflow's CI tests. Integration test are special tests that require - additional services running, such as Postgres, Mysql, Kerberos, etc. - -* **System tests** are automatic tests that use external systems like - Google Cloud. These tests are intended for an end-to-end DAG execution. - -For details on running different types of Airflow tests, see `TESTING.rst `_. - -Metadata Database Updates -========================= - -When developing features, you may need to persist information to the metadata -database. Airflow has `Alembic `__ built-in -module to handle all schema changes. Alembic must be installed on your -development machine before continuing with migration. - - -.. code-block:: bash - - # starting at the root of the project - $ pwd - ~/airflow - # change to the airflow directory - $ cd airflow - $ alembic revision -m "add new field to db" - Generating - ~/airflow/airflow/migrations/versions/a1e23c41f123_add_new_field_to_db.py - -Note that migration file names are standardized by pre-commit hook ``update-migration-references``, so that they sort alphabetically and indicate -the Airflow version in which they first appear (the alembic revision ID is removed). As a result you should expect to see a pre-commit failure -on the first attempt. Just stage the modified file and commit again -(or run the hook manually before committing). - -After your new migration file is run through pre-commit it will look like this: - -.. code-block:: - - 1234_A_B_C_add_new_field_to_db.py - -This represents that your migration is the 1234th migration and expected for release in Airflow version A.B.C. - -Node.js Environment Setup -========================= - -``airflow/www/`` contains all yarn-managed, front-end assets. Flask-Appbuilder -itself comes bundled with jQuery and bootstrap. While they may be phased out -over time, these packages are currently not managed with yarn. - -Make sure you are using recent versions of node and yarn. No problems have been -found with node\>=8.11.3 and yarn\>=1.19.1. The pre-commit framework of ours install -node and yarn automatically when installed - if you use ``breeze`` you do not need to install -neither node nor yarn. - -Installing yarn and its packages manually ------------------------------------------ - -To install yarn on macOS: - -1. Run the following commands (taken from `this source `__): - -.. code-block:: bash - - brew install node - brew install yarn - yarn config set prefix ~/.yarn - - -2. Add ``~/.yarn/bin`` to your ``PATH`` so that commands you are installing - could be used globally. - -3. Set up your ``.bashrc`` file and then ``source ~/.bashrc`` to reflect the - change. - -.. code-block:: bash - - export PATH="$HOME/.yarn/bin:$PATH" - -4. Install third-party libraries defined in ``package.json`` by running the following command - -.. code-block:: bash - - yarn install - -Generate Bundled Files with yarn --------------------------------- - -To parse and generate bundled files for Airflow, run either of the following -commands: - -.. 
code-block:: bash - - # Compiles the production / optimized js & css - yarn run prod - - # Starts a web server that manages and updates your assets as you modify them - # You'll need to run the webserver in debug mode too: ``airflow webserver -d`` - yarn run dev - - -Follow Style Guide ------------------- - -We try to enforce a more consistent style and follow the Javascript/Typescript community -guidelines. - -Once you add or modify any JS/TS code in the project, please make sure it -follows the guidelines defined in `Airbnb -JavaScript Style Guide `__. - -Apache Airflow uses `ESLint `__ as a tool for identifying and -reporting issues in JS/TS, and `Prettier `__ for code formatting. -Most IDE directly integrate with these tools, you can also manually run them with any of the following commands: - -.. code-block:: bash - - # Format code in .js, .jsx, .ts, .tsx, .json, .css, .html files - yarn format - - # Check JS/TS code in .js, .jsx, .ts, .tsx, .html files and report any errors/warnings - yarn run lint - - # Check JS/TS code in .js, .jsx, .ts, .tsx, .html files and report any errors/warnings and fix them if possible - yarn run lint:fix - - # Run tests for all .test.js, .test.jsx, .test.ts, test.tsx files - yarn test - -React, JSX and Chakra ------------------------------ - -In order to create a more modern UI, we have started to include `React `__ in the ``airflow/www/`` project. -If you are unfamiliar with React then it is recommended to check out their documentation to understand components and jsx syntax. - -We are using `Chakra UI `__ as a component and styling library. Notably, all styling is done in a theme file or -inline when defining a component. There are a few shorthand style props like ``px`` instead of ``padding-right, padding-left``. -To make this work, all Chakra styling and css styling are completely separate. It is best to think of the React components as a separate app -that lives inside of the main app. - -How to sync your fork -===================== - -When you have your fork, you should periodically synchronize the main of your fork with the -Apache Airflow main. In order to do that you can ``git pull --rebase`` to your local git repository from -apache remote and push the main (often with ``--force`` to your fork). There is also an easy -way to sync your fork in GitHub's web UI with the `Fetch upstream feature -`_. - -This will force-push the ``main`` branch from ``apache/airflow`` to the ``main`` branch -in your fork. Note that in case you modified the main in your fork, you might loose those changes. - - -How to rebase PR -================ - -A lot of people are unfamiliar with the rebase workflow in Git, but we think it is an excellent workflow, -providing a better alternative to the merge workflow. We've therefore written a short guide for those who -would like to learn it. - - -As of February 2022, GitHub introduced the capability of "Update with Rebase" which make it easy to perform -rebase straight in the GitHub UI, so in cases when there are no conflicts, rebasing to latest version -of ``main`` can be done very easily following the instructions -`in the GitHub blog `_ - -.. image:: images/rebase.png - :align: center - :alt: Update PR with rebase - -However, when you have conflicts, sometimes you will have to perform rebase manually, and resolve the -conflicts, and remainder of the section describes how to approach it. - -As opposed to the merge workflow, the rebase workflow allows us to clearly separate your changes from the -changes of others. 
It puts the responsibility of rebasing on the -author of the change. It also produces a "single-line" series of commits on the main branch. This -makes it easier to understand what was going on and to find reasons for problems (it is especially -useful for "bisecting" when looking for a commit that introduced some bugs). - -First of all, we suggest you read about the rebase workflow here: -`Merging vs. rebasing `_. This is an -excellent article that describes all the ins/outs of the rebase workflow. I recommend keeping it for future reference. - -The goal of rebasing your PR on top of ``apache/main`` is to "transplant" your change on top of -the latest changes that are merged by others. It also allows you to fix all the conflicts -that arise as a result of other people changing the same files as you and merging the changes to ``apache/main``. - -Here is how rebase looks in practice (you can find a summary below these detailed steps): - -1. You first need to add the Apache project remote to your git repository. This is only necessary once, -so if it's not the first time you are following this tutorial you can skip this step. In this example, -we will be adding the remote -as "apache" so you can refer to it easily: - -* If you use ssh: ``git remote add apache git@github.com:apache/airflow.git`` -* If you use https: ``git remote add apache https://github.com/apache/airflow.git`` - -2. You then need to make sure that you have the latest main fetched from the ``apache`` repository. You can do this - via: - - ``git fetch apache`` (to fetch apache remote) - - ``git fetch --all`` (to fetch all remotes) - -3. Assuming that your feature is in a branch in your repository called ``my-branch`` you can easily check - what is the base commit you should rebase from by: - - ``git merge-base my-branch apache/main`` - - This will print the HASH of the base commit which you should use to rebase your feature from. - For example: ``5abce471e0690c6b8d06ca25685b0845c5fd270f``. Copy that HASH and go to the next step. - - Optionally, if you want better control you can also find this commit hash manually. - - Run: - - ``git log`` - - And find the first commit that you DO NOT want to "transplant". - - Performing: - - ``git rebase HASH`` - - Will "transplant" all commits after the commit with the HASH. - -4. Providing that you weren't already working on your branch, check out your feature branch locally via: - - ``git checkout my-branch`` - -5. Rebase: - - ``git rebase HASH --onto apache/main`` - - For example: - - ``git rebase 5abce471e0690c6b8d06ca25685b0845c5fd270f --onto apache/main`` - -6. If you have no conflicts - that's cool. You rebased. You can now run ``git push --force-with-lease`` to - push your changes to your repository. That should trigger the build in our CI if you have a - Pull Request (PR) opened already. - -7. While rebasing you might have conflicts. Read carefully what git tells you when it prints information - about the conflicts. You need to solve the conflicts manually. This is sometimes the most difficult - part and requires deliberately correcting your code and looking at what has changed since you developed your - changes. - - There are various tools that can help you with this. You can use: - - ``git mergetool`` - - You can configure different merge tools with it. You can also use IntelliJ/PyCharm's excellent merge tool. - When you open a project in PyCharm which has conflicts, you can go to VCS > Git > Resolve Conflicts and there - you have a very intuitive and helpful merge tool. 
For more information, see - `Resolve conflicts `_. - -8. After you've solved your conflict run: - - ``git rebase --continue`` - - And go either to point 6. or 7, depending on whether you have more commits that cause conflicts in your PR (rebasing applies each - commit from your PR one-by-one). - -Summary -------------- - -Useful when you understand the flow but don't remember the steps and want a quick reference. - -``git fetch --all`` -``git merge-base my-branch apache/main`` -``git checkout my-branch`` -``git rebase HASH --onto apache/main`` -``git push --force-with-lease`` - -How to communicate -================== - -Apache Airflow is a Community within Apache Software Foundation. As the motto of -the Apache Software Foundation states "Community over Code" - people in the -community are far more important than their contribution. - -This means that communication plays a big role in it, and this chapter is all about it. - -In our communication, everyone is expected to follow the `ASF Code of Conduct `_. - -We have various channels of communication - starting from the official devlist, comments -in the PR, Slack, wiki. - -All those channels can be used for different purposes. -You can join the channels via links at the `Airflow Community page `_ - -* The `Apache Airflow devlist `_ for: - * official communication - * general issues, asking community for opinion - * discussing proposals - * voting -* The `Airflow CWiki `_ for: - * detailed discussions on big proposals (Airflow Improvement Proposals also name AIPs) - * helpful, shared resources (for example Apache Airflow logos - * information that can be reused by others (for example instructions on preparing workshops) -* GitHub `Pull Requests (PRs) `_ for: - * discussing implementation details of PRs - * not for architectural discussions (use the devlist for that) -* The deprecated `JIRA issues `_ for: - * checking out old but still valuable issues that are not on GitHub yet - * mentioning the JIRA issue number in the title of the related PR you would like to open on GitHub - -**IMPORTANT** -We don't create new issues on JIRA anymore. The reason we still look at JIRA issues is that there are valuable tickets inside of it. However, each new PR should be created on `GitHub issues `_ as stated in `Contribution Workflow Example `_ - -* The `Apache Airflow Slack `_ for: - * ad-hoc questions related to development (#development channel) - * asking for review (#development channel) - * asking for help with first contribution PRs (#development-first-pr-support channel) - * troubleshooting (#troubleshooting channel) - * group talks (including SIG - special interest groups) (#sig-* channels) - * notifications (#announcements channel) - * random queries (#random channel) - * regional announcements (#users-* channels) - * occasional discussions (wherever appropriate including group and 1-1 discussions) - -Please exercise caution against posting same questions across multiple channels. Doing so not only prevents -redundancy but also promotes more efficient and effective communication for everyone involved. - -The devlist is the most important and official communication channel. Often at Apache project you can -hear "if it is not in the devlist - it did not happen". If you discuss and agree with someone from the -community on something important for the community (including if it is with maintainer or PMC member) the -discussion must be captured and reshared on devlist in order to give other members of the community to -participate in it. 
-
-We are using certain prefixes for email subjects for different purposes. Start your email with one of those:
- * ``[DISCUSS]`` - if you want to discuss something but you have no concrete proposal yet
- * ``[PROPOSAL]`` - if, usually after a "[DISCUSS]" thread discussion, you want to propose something and see
-   what other members of the community think about it.
- * ``[AIP-NN]`` - if the mail is about one of the Airflow Improvement Proposals
- * ``[VOTE]`` - if you would like to start voting on a proposal discussed before in a "[PROPOSAL]" thread
-
-Voting is governed by the rules described in `Voting `_
-
-We are all devoting our time to the community as individuals who, apart from being active in Apache Airflow, have
-families, daily jobs, and a right to vacation. Sometimes we are in different timezones or simply
-busy with day-to-day duties, so our response time might be delayed. For us it's crucial
-to remember to respect each other in a project with no formal structure.
-There are no managers or departments; most of us are autonomous in our opinions and decisions.
-All of it makes the Apache Airflow community a great space for open discussion and mutual respect
-for various opinions.
-
-Disagreements are expected, and discussions might include strong opinions and contradicting statements.
-Sometimes you might get two maintainers asking you to do things differently. This has all happened in the past
-and will continue to happen. As a community we have some mechanisms to facilitate discussion and come to
-a consensus or conclusions, or we end up voting to make important decisions. It is important that these
-decisions are not treated as personal wins or losses. In the end it's the community that we all care about,
-and what's good for the community should be accepted even if you have a different opinion. There is a nice
-motto that you should follow in case you disagree with a community decision: "Disagree but engage". Even
-if you do not agree with a community decision, you should follow and embrace it (but you are free to
-express your opinion that you don't agree with it).
-
-As a community we have high requirements for code quality. This is mainly because we are a distributed
-and loosely organised team. We have both contributors who make a single commit and people who add
-many more. It happens that some people assume informal "stewardship" over parts of the code for some time,
-but at any time we should make sure that the code can be taken over by others without excessive communication.
-Setting high requirements for the code (fairly strict code review, static code checks, requirements of
-automated tests, pre-commit checks) is the best way to achieve that - by only accepting good quality
-code. Thanks to full test coverage we can make sure that we will be able to work with the code in the future.
-So do not be surprised if you are asked to add more tests or make the code cleaner -
-this is for the sake of maintainability.
- -Here are a few rules that are important to keep in mind when you enter our community: - -* Do not be afraid to ask questions -* The communication is asynchronous - do not expect immediate answers, ping others on slack - (#development channel) if blocked -* There is a #newbie-questions channel in slack as a safe place to ask questions -* You can ask one of the maintainers to be a mentor for you, maintainers can guide you within the community -* You can apply to more structured `Apache Mentoring Programme `_ -* It's your responsibility as an author to take your PR from start-to-end including leading communication - in the PR -* It's your responsibility as an author to ping maintainers to review your PR - be mildly annoying sometimes, - it's OK to be slightly annoying with your change - it is also a sign for maintainers that you care -* Be considerate to the high code quality/test coverage requirements for Apache Airflow -* If in doubt - ask the community for their opinion or propose to vote at the devlist -* Discussions should concern subject matters - judge or criticise the merit but never criticise people -* It's OK to express your own emotions while communicating - it helps other people to understand you -* Be considerate for feelings of others. Tell about how you feel not what you think of others - -Commit Policy -============= - -The following commit policy passed by a vote 8(binding FOR) to 0 against on May 27, 2016 on the dev list -and slightly modified and consensus reached in October 2020: - -* Commits need a +1 vote from a committer who is not the author -* Do not merge a PR that regresses linting or does not pass CI tests (unless we have - justification such as clearly transient error). -* When we do AIP voting, both PMC and committer +1s are considered as binding vote. +.. contents:: :local: -Resources & Links -================= -- `Airflow's official documentation `__ +Contributions are welcome and are greatly appreciated! Every little bit helps, and credit will always be given. -- `More resources and links to Airflow related content on the Wiki `__ +Go to `Contributors' guide <./contributing-docs/README.rst>`__. diff --git a/IMAGES.rst b/IMAGES.rst index 9fea6dea08882..89a014f15de0b 100644 --- a/IMAGES.rst +++ b/IMAGES.rst @@ -51,7 +51,7 @@ You can read more details about building, extending and customizing the PROD ima CI image -------- -The CI image is used by `Breeze `_ as the shell image but it is also used during CI tests. +The CI image is used by `Breeze `_ as the shell image but it is also used during CI tests. The image is single segment image that contains Airflow installation with "all" dependencies installed. It is optimised for rebuild speed. It installs PIP dependencies from the current branch first - so that any changes in ``pyproject.toml`` do not trigger reinstalling of all dependencies. @@ -61,7 +61,7 @@ from the latest sources so that we are sure that latest dependencies are install Building docker images from current sources =========================================== -The easy way to build the CI/PROD images is to use ``_. It uses a number of +The easy way to build the CI/PROD images is to use ``_. It uses a number of optimization and caches to build it efficiently and fast when you are developing Airflow and need to update to latest version. @@ -181,11 +181,11 @@ Default mechanism used in Breeze for building CI images uses images pulled from GitHub Container Registry. 
This is done to speed up local builds and building images for CI runs - instead of > 12 minutes for rebuild of CI images, it takes usually about 1 minute when cache is used. For CI images this is usually the best strategy - to use default "pull" cache. This is default strategy when -``_ builds are performed. +``_ builds are performed. For Production Image - which is far smaller and faster to build, it's better to use local build cache (the standard mechanism that docker uses. This is the default strategy for production images when -``_ builds are performed. The first time you run it, it will take considerably longer time than +``_ builds are performed. The first time you run it, it will take considerably longer time than if you use the pull mechanism, but then when you do small, incremental changes to local sources, Dockerfile image and scripts, further rebuilds with local build cache will be considerably faster. @@ -275,7 +275,7 @@ GitHub Container Registry docker login ghcr.io Since there are different naming conventions used for Airflow images and there are multiple images used, -`Breeze `_ provides easy to use management interface for the images. The +`Breeze `_ provides easy to use management interface for the images. The `CI system of ours `_ is designed in the way that it should automatically refresh caches, rebuild the images periodically and update them whenever new version of base Python is released. However, occasionally, you might need to rebuild images locally and push them directly to the registries @@ -295,7 +295,7 @@ For example this command will run the same Python 3.8 image as was used in build breeze --image-tag 9a621eaa394c0a0a336f8e1b31b35eff4e4ee86e --python 3.8 --integration rabbitmq -You can see more details and examples in `Breeze `_ +You can see more details and examples in `Breeze `_ Customizing the CI image ======================== diff --git a/INSTALL b/INSTALL index ab0ff03ef5178..173442e4857b3 100644 --- a/INSTALL +++ b/INSTALL @@ -242,6 +242,7 @@ The list of available extras is below. Regular extras that are available for users in the Airflow package. # START REGULAR EXTRAS HERE + aiobotocore, airbyte, alibaba, all, all-core, all-dbs, amazon, apache-atlas, apache-beam, apache- cassandra, apache-drill, apache-druid, apache-flink, apache-hdfs, apache-hive, apache-impala, apache-kafka, apache-kylin, apache-livy, apache-pig, apache-pinot, apache-spark, apache-webhdfs, @@ -255,20 +256,25 @@ openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty, pandas, pa pgvector, pinecone, pinot, postgres, presto, rabbitmq, redis, s3, s3fs, salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake, spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv, weaviate, webhdfs, winrm, yandex, zendesk + # END REGULAR EXTRAS HERE Devel extras - used to install development-related tools. Only available during editable install. # START DEVEL EXTRAS HERE + devel, devel-all, devel-all-dbs, devel-ci, devel-debuggers, devel-devscripts, devel-duckdb, devel- hadoop, devel-mypy, devel-sentry, devel-static-checks, devel-tests + # END DEVEL EXTRAS HERE Doc extras - used to install dependencies that are needed to build documentation. Only available during editable install. 
# START DOC EXTRAS HERE + doc, doc-gen + # END DOC EXTRAS HERE ## Compiling front end assets diff --git a/README.md b/README.md index 6fbbc5b607b11..56effc061bc84 100644 --- a/README.md +++ b/README.md @@ -67,6 +67,7 @@ Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The - [Base OS support for reference Airflow images](#base-os-support-for-reference-airflow-images) - [Approach to dependencies of Airflow](#approach-to-dependencies-of-airflow) - [Contributing](#contributing) +- [Voting Policy](#voting-policy) - [Who uses Apache Airflow?](#who-uses-apache-airflow) - [Who maintains Apache Airflow?](#who-maintains-apache-airflow) - [What goes into the next release?](#what-goes-into-the-next-release) @@ -426,13 +427,18 @@ might decide to add additional limits (and justify them with comment). ## Contributing -Want to help build Apache Airflow? Check out our [contributing documentation](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst). +Want to help build Apache Airflow? Check out our [contributing documentation](https://github.com/apache/airflow/blob/main/contributing-docs/README.rst). Official Docker (container) images for Apache Airflow are described in [IMAGES.rst](https://github.com/apache/airflow/blob/main/IMAGES.rst). +## Voting Policy + +* Commits need a +1 vote from a committer who is not the author +* When we do AIP voting, both PMC member's and committer's `+1s` are considered a binding vote. + ## Who uses Apache Airflow? We know about around 500 organizations that are using Apache Airflow (but there are likely many more) diff --git a/TESTING.rst b/TESTING.rst deleted file mode 100644 index b4c9dbcef0047..0000000000000 --- a/TESTING.rst +++ /dev/null @@ -1,2603 +0,0 @@ - .. Licensed to the Apache Software Foundation (ASF) under one - or more contributor license agreements. See the NOTICE file - distributed with this work for additional information - regarding copyright ownership. The ASF licenses this file - to you under the Apache License, Version 2.0 (the - "License"); you may not use this file except in compliance - with the License. You may obtain a copy of the License at - - .. http://www.apache.org/licenses/LICENSE-2.0 - - .. Unless required by applicable law or agreed to in writing, - software distributed under the License is distributed on an - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - KIND, either express or implied. See the License for the - specific language governing permissions and limitations - under the License. - -.. contents:: :local: - -Airflow Test Infrastructure -=========================== - -* **Unit tests** are Python tests that do not require any additional integrations. - Unit tests are available both in the `Breeze environment `__ - and local virtualenv. - -* **Integration tests** are available in the Breeze development environment - that is also used for Airflow CI tests. Integration tests are special tests that require - additional services running, such as Postgres, MySQL, Kerberos, etc. - -* **System tests** are automatic tests that use external systems like - Google Cloud. These tests are intended for an end-to-end DAG execution. - The tests can be executed on both the current version of Apache Airflow and any older - versions from 1.10.* series. - -This document is about running Python tests. Before the tests are run, use -`static code checks `__ that enable catching typical errors in the code. - -Airflow Unit Tests -================== - -All tests for Apache Airflow are run using `pytest `_ . 
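-
-For example - assuming you have the test dependencies installed (via the ``devel`` extras or a Breeze shell),
-a single module or test can be invoked directly with ``pytest``; the paths and keywords below are only illustrative:
-
-.. code-block:: bash
-
-    # run a single test module
-    pytest tests/models/test_dag.py
-
-    # select individual tests within it by keyword, with verbose output
-    pytest tests/models/test_dag.py -k "timetable" -v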
-
-Writing Unit Tests
-------------------
-
-Follow these guidelines when writing unit tests:
-
-* For standard unit tests that do not require integrations with external systems, make sure to simulate all communications.
-* All Airflow tests are run with ``pytest``. Make sure to set your IDE/runners (see below) to use ``pytest`` by default.
-* For new tests, use standard "asserts" of Python and ``pytest`` decorators/context managers for testing
-  rather than ``unittest`` ones. See `pytest docs `_ for details.
-* Use a parameterized framework for tests that have variations in parameters.
-* Use ``pytest.warns`` to capture warnings rather than the ``recwarn`` fixture. We are aiming for zero warnings in our
-  tests, so we run Pytest with ``--disable-warnings``; instead, we use the ``pytest-capture-warnings`` plugin, which
-  overrides the ``recwarn`` fixture behaviour.
-
-
-Airflow configuration for unit tests
-------------------------------------
-
-Some of the unit tests require a special configuration set as the default. This is done automatically by
-adding ``AIRFLOW__CORE__UNIT_TEST_MODE=True`` to the environment variables in a Pytest auto-used
-fixture. This in turn makes Airflow load the test configuration from the file
-``airflow/config_templates/unit_tests.cfg``. Test configuration from there replaces the original
-defaults from ``airflow/config_templates/config.yml``. If you want to add some test-only configuration
-as a default for all tests, you should add the value to this file.
-
-You can also - of course - override the values in individual tests by patching environment variables following
-the usual ``AIRFLOW__SECTION__KEY`` pattern or by using the ``conf_vars`` context manager.
-
-Airflow test types
-------------------
-
-Airflow tests in the CI environment are split into several test types. You can narrow down which
-test types you want to use in various ``breeze testing`` sub-commands in two ways:
-
-* by specifying ``--test-type`` when you run a single test type in the ``breeze testing tests`` command
-* by specifying a space-separated list of test types via the ``--parallel-test-types`` or
-  ``--exclude-parallel-test-types`` options when you run tests in parallel (in several testing commands)
-
-The following test types are defined:
-
-* ``Always`` - tests that should always be executed (always sub-folder)
-* ``API`` - Tests for the Airflow API (api, api_connexion, api_experimental and api_internal sub-folders)
-* ``CLI`` - Tests for the Airflow CLI (cli folder)
-* ``Core`` - for the core Airflow functionality (core, executors, jobs, models, ti_deps, utils sub-folders)
-* ``Operators`` - tests for the operators (operators folder with the exception of Virtualenv Operator tests and
-  External Python Operator tests that have their own test types). Tests marked with the
-  ``virtualenv_operator`` and ``external_python_operator`` markers are skipped in this test type.
-* ``WWW`` - Tests for the Airflow webserver (www folder)
-* ``Providers`` - Tests for all Providers of Airflow (providers folder)
-* ``PlainAsserts`` - tests that require disabling the ``assert-rewrite`` feature of Pytest (usually because of
-  a buggy/complex implementation of an imported library) (``plain_asserts`` marker)
-* ``Other`` - all other tests remaining after the above tests are selected
-
-There are also Virtualenv/ExternalPython operator test types that are excluded from the ``Operators`` test type
-and run as separate test types. Those are:
-
-* ``PythonVenv`` - tests for PythonVirtualenvOperator - selected directly as TestPythonVirtualenvOperator
-* ``BranchPythonVenv`` - tests for BranchPythonVirtualenvOperator - selected directly as TestBranchPythonVirtualenvOperator
-* ``ExternalPython`` - tests for ExternalPythonOperator - selected directly as TestExternalPythonOperator
-* ``BranchExternalPython`` - tests for BranchExternalPythonOperator - selected directly as TestBranchExternalPythonOperator
-
-We also have test types that run "all" tests (they do not look at the folder, but at the ``pytest`` markers
-the tests are marked with) with some filters applied:
-
-* ``All-Postgres`` - tests that require a Postgres database. They are only run when the backend is Postgres (``backend("postgres")`` marker)
-* ``All-MySQL`` - tests that require a MySQL database. They are only run when the backend is MySQL (``backend("mysql")`` marker)
-* ``All-Quarantined`` - tests that are flaky and need to be fixed (``quarantined`` marker)
-* ``All`` - all tests are run (this is the default)
-
-
-We also have an ``Integration`` test type that runs integration tests with external software, enabled
-via the ``--integration`` flag in the ``breeze`` environment - via ``breeze testing integration-tests``.
-
-* ``Integration`` - tests that require external integration images running in docker-compose
-
-Splitting the tests into types is done for two reasons:
-
-1. in order to selectively run only a subset of the test types for some PRs
-2. in order to allow efficient parallel test execution of the tests on Self-Hosted runners
-
-For case 2, we can utilize the memory and CPUs available on both CI and local development machines to run
-tests in parallel, but we cannot use the pytest-xdist plugin for that - we need to split the tests into test
-types and run each test type with its own instance of the database in a separate container, so that the tests
-in each type have exclusive access to their database and each test within a test type runs sequentially.
-By their nature those tests rely on shared databases - and they update/reset/clean up data in the
-databases while they are executing.
-
-
-DB and non-DB tests
--------------------
-
-There are two kinds of unit tests in Airflow - DB and non-DB tests.
-
-Some of the tests of Airflow (around 7000 of them as of October 2023)
-require a database to connect to in order to run. Those tests store and read data from the Airflow DB using
-Airflow's core code, and it's crucial to run the tests against all real databases that Airflow supports in order
-to check if the SQLAlchemy queries are correct and if the database schema is correct.
-
-Those tests should be marked with the ``@pytest.mark.db_test`` decorator on one of these levels:
-
-* a test method can be marked with the ``@pytest.mark.db_test`` decorator
-* a test class can be marked with the ``@pytest.mark.db_test`` decorator
-* a test module can be marked with ``pytestmark = pytest.mark.db_test`` at the top level of the module
-
-Airflow's CI runs different test kinds separately.
-
-The DB tests are run against the multiple databases Airflow supports, multiple versions of those databases,
-and multiple Python versions Airflow supports. In order to save testing time, not all combinations are
-tested, but enough varied combinations are tested to detect potential problems.
-
-As of October 2023, Airflow has ~9000 Non-DB tests and around 7000 DB tests.
-
-Airflow non-DB tests
---------------------
-
-The Non-DB tests are run once for each tested Python version with the ``none`` database backend (which
-causes any database access to fail). Those tests are run in parallel with the ``pytest-xdist`` plugin, which
-means that we can efficiently utilize multi-processor machines (including the ``self-hosted`` runners with
-8 CPUs that we use to run the tests with maximum parallelism).
-
-It's usually straightforward to run those tests in a local virtualenv because they do not require any
-setup or a running database. They also run much faster than DB tests. You can run them with the ``pytest`` command
-or with ``breeze``, which has all the dependencies needed to run all tests automatically installed. Of course,
-you can also select just a specific test, folder or module for Pytest to collect/run tests from.
-The example below shows how to run all tests, parallelizing them with ``pytest-xdist``
-(by specifying the ``tests`` folder):
-
-.. code-block:: bash
-
-    pytest tests --skip-db-tests -n auto
-
-
-The ``--skip-db-tests`` flag will only run tests that are not marked as DB tests.
-
-
-You can also use the ``breeze`` command to run all the tests (they will run in a separate container,
-with the selected Python version and without access to any database). Adding the ``--use-xdist`` flag will run all
-tests in parallel using the ``pytest-xdist`` plugin.
-
-We also have a dedicated, opinionated ``breeze testing non-db-tests`` command that runs non-DB tests
-(it is also used in CI to run the non-DB tests). With it, you do not have to specify extra flags for
-parallel running, and you can run all the Non-DB tests
-(or just a subset of them with ``--parallel-test-types`` or ``--exclude-parallel-test-types``) in parallel:
-
-.. code-block:: bash
-
-    breeze testing non-db-tests
-
-You can pass ``--parallel-test-types`` with the list of test types to execute, or ``--exclude-parallel-test-types``
-to exclude them from the default set:
-
-.. code-block:: bash
-
-    breeze testing non-db-tests --parallel-test-types "Providers API CLI"
-
-
-.. code-block:: bash
-
-    breeze testing non-db-tests --exclude-parallel-test-types "Providers API CLI"
-
-You can also run the same commands via ``breeze testing tests`` - by adding the necessary flags manually:
-
-.. code-block:: bash
-
-    breeze testing tests --skip-db-tests --backend none --use-xdist
-
-You can also enter the interactive shell with ``breeze`` and run tests from there if you want to iterate
-on the tests. Source files in ``breeze`` are mounted as volumes, so you can modify them locally and
-rerun them in Breeze as you wish (``-n auto`` will parallelize tests using the ``pytest-xdist`` plugin):
-
-.. code-block:: bash
-
-    breeze shell --backend none --python 3.8
-    > pytest tests --skip-db-tests -n auto
-
-
-Airflow DB tests
-----------------
-
-Airflow DB tests require a database to run. It can be any of the databases supported by Airflow, and the
-tests can be run either using a local virtualenv or Breeze.
-
-By default, the DB tests will use sqlite and the "airflow.db" database created and populated in the
-``${AIRFLOW_HOME}`` folder. You do not need to do anything to get the database created and initialized,
-but if you need to clean and restart the db, you can run tests with the ``--with-db-init`` flag - then the
-database will be re-initialized. You can also set the ``AIRFLOW__DATABASE__SQL_ALCHEMY_CONN`` environment
-variable to point to a supported database (Postgres, MySQL, etc.) and the tests will use that database. You
-might need to run ``airflow db reset`` to initialize the database in that case.
-
-The "non-DB" tests are perfectly fine to run when you have a database around, but if you want to run just the
-DB tests (as happens in our CI for the ``Database`` runs) you can use the ``--run-db-tests-only`` flag to filter
-out non-DB tests (and you can do that not only for the whole ``tests`` directory but for any
-folders/files/tests selection that ``pytest`` supports).
-
-.. code-block:: bash
-
-    pytest tests/ --run-db-tests-only
-
-You can also run DB tests with the ``breeze`` dockerized environment. You can choose the backend to use with
-the ``--backend`` flag. The default is ``sqlite`` but you can also use others such as ``postgres`` or ``mysql``.
-You can also select the backend version and the Python version to use. You can specify the ``test-type`` to run -
-breeze will list the test types you can run with ``--help`` and provide auto-complete for them.
-
-We also have a dedicated, opinionated ``breeze testing db-tests`` command that runs DB tests
-(it is also used in CI to run the DB tests). With it, you do not have to specify extra flags for
-parallel running, and you can run all the DB tests
-(or just a subset of them with ``--parallel-test-types`` or ``--exclude-parallel-test-types``) in parallel:
-
-.. code-block:: bash
-
-    breeze testing db-tests --backend postgres
-
-You can pass ``--parallel-test-types`` with the list of test types to execute, or ``--exclude-parallel-test-types``
-to exclude them from the default set:
-
-.. code-block:: bash
-
-    breeze testing db-tests --parallel-test-types "Providers API CLI"
-
-
-.. code-block:: bash
-
-    breeze testing db-tests --exclude-parallel-test-types "Providers API CLI"
-
-You can also run the same commands via ``breeze testing tests`` - by adding the necessary flags manually:
-
-.. code-block:: bash
-
-    breeze testing tests --run-db-tests-only --backend postgres --run-tests-in-parallel
-
-
-Also, if you want to iterate on the tests, you can enter the interactive shell and run the tests iteratively -
-either by package/module/test or by test type - whatever ``pytest`` supports.
-
-.. code-block:: bash
-
-    breeze shell --backend postgres --python 3.8
-    > pytest tests --run-db-tests-only
-
-As explained before, you cannot run DB tests in parallel using the ``pytest-xdist`` plugin, but ``breeze`` has
-support for splitting all the tests into test types that run in separate containers with separate databases,
-and you can run the tests using the ``--run-tests-in-parallel`` flag (which is automatically enabled when
-you use the ``breeze testing db-tests`` command):
-
-.. code-block:: bash
-
-    breeze testing tests --run-db-tests-only --backend postgres --python 3.8 --run-tests-in-parallel
-
-
-Best practices for DB tests
-===========================
-
-Usually when you add new tests you add tests "similar" to the ones that are already there. In most cases,
-therefore, you do not have to worry about the test type - it will be automatically selected for you by the
-fact that the test class you add the tests to, or the whole module, will be marked with the ``db_test`` marker.
-
-You should strive to write "pure" non-DB unit tests (i.e. tests that do not need a database), but sometimes
-it's just better to plug into the existing framework of DagRuns, Dags, Connections and Variables and use the
-database directly rather than having to mock the DB access. It's up to you to decide.
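-
-As an illustration - a hypothetical sketch rather than code from the Airflow repository - the same helper
-could be covered either by mocking the session (keeping the test a non-DB one) or by using the database
-directly and marking the test accordingly (the ``session`` argument of the DB test is assumed here to be
-provided by a fixture that yields a real SQLAlchemy session):
-
-.. code-block:: python
-
-    from unittest import mock
-
-    import pytest
-
-    from airflow.models import DagModel
-
-
-    def count_dags(session):
-        """Hypothetical helper that needs a SQLAlchemy session."""
-        return session.query(DagModel).count()
-
-
-    def test_count_dags_without_db():
-        # "Pure" unit test: the session is mocked, so no database access happens.
-        session = mock.MagicMock()
-        session.query.return_value.count.return_value = 3
-        assert count_dags(session) == 3
-
-
-    @pytest.mark.db_test
-    def test_count_dags_with_db(session):
-        # DB test: a real session is used, so the test is marked with ``db_test``.
-        assert count_dags(session) >= 0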
- -However, if you choose to write DB tests you have to make sure you add the ``db_test`` marker - either to -the test method, class (with decorator) or whole module (with pytestmark at the top level of the module). - -In most cases when you add tests to existing modules or classes, you follow similar tests so you do not -have to do anything, but in some cases you need to decide if your test should be marked as DB test or -whether it should be changed to not use the database at all. - -If your test accesses the database but is not marked properly the Non-DB test in CI will fail with this message: - -.. code :: - - "Your test accessed the DB but `_AIRFLOW_SKIP_DB_TESTS` is set. - Either make sure your test does not use database or mark your test with `@pytest.mark.db_test`. - -Marking test as DB test ------------------------ - -You can apply the marker on method/function/class level with ``@pytest.mark.db_test`` decorator or -at the module level with ``pytestmark = pytest.mark.db_test`` at the top level of the module. - -It's up to the author to decide whether to mark the test, class, or module as "DB-test" - generally the -less DB tests - the better and if we can clearly separate the parts that are DB from non-DB, we should, -but also it's ok if few tests are marked as DB tests when they are not but they are part of the class -or module that is "mostly-DB". - -Sometimes, when your class can be clearly split to DB and non-DB parts, it's better to split the class -into two separate classes and mark only the DB class as DB test. - -Method level: - -.. code-block:: python - - import pytest - - - @pytest.mark.db_test - def test_add_tagging(self, sentry, task_instance): - ... - -Class level: - - -.. code-block:: python - - import pytest - - - @pytest.mark.db_test - class TestDatabricksHookAsyncAadTokenSpOutside: - ... - -Module level (at the top of the module): - -.. code-block:: python - - import pytest - - from airflow.models.baseoperator import BaseOperator - from airflow.models.dag import DAG - from airflow.ti_deps.dep_context import DepContext - from airflow.ti_deps.deps.task_concurrency_dep import TaskConcurrencyDep - - pytestmark = pytest.mark.db_test - - -How to verify if DB test is correctly classified ------------------------------------------------- - -When you add if you want to see if your DB test is correctly classified, you can run the test or group -of tests with ``--skip-db-tests`` flag. - -You can run the all (or subset of) test types if you want to make sure all ot the problems are fixed - - .. code-block:: bash - - breeze testing tests --skip-db-tests tests/your_test.py - -For the whole test suite you can run: - - .. code-block:: bash - - breeze testing non-db-tests - -For selected test types (example - the tests will run for Providers/API/CLI code only: - - .. code-block:: bash - - breeze testing non-db-tests --parallel-test-types "Providers API CLI" - - -How to make your test not depend on DB --------------------------------------- - -This is tricky and there is no single solution. Sometimes we can mock-out the methods that require -DB access or objects that normally require database. Sometimes we can decide to test just sinle method -of class rather than more complex set of steps. Generally speaking it's good to have as many "pure" -unit tests that require no DB as possible comparing to DB tests. They are usually faster an more -reliable as well. - - -Special cases -------------- - -There are some tricky test cases that require special handling. 
Here are some of them: - - -Parameterized tests stability -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The parameterized tests require stable order of parameters if they are run via xdist - because the parameterized -tests are distributed among multiple processes and handled separately. In some cases the parameterized tests -have undefined / random order (or parameters are not hashable - for example set of enums). In such cases -the xdist execution of the tests will fail and you will get an error mentioning "Known Limitations of xdist". -You can see details about the limitation `here `_ - -The error in this case will look similar to: - - .. code-block:: - - Different tests were collected between gw0 and gw7. The difference is: - - -The fix for that is to sort the parameters in ``parametrize``. For example instead of this: - - .. code-block:: python - - @pytest.mark.parametrize("status", ALL_STATES) - def test_method(): - ... - - -do that: - - - .. code-block:: python - - @pytest.mark.parametrize("status", sorted(ALL_STATES)) - def test_method(): - ... - -Similarly if your parameters are defined as result of utcnow() or other dynamic method - you should -avoid that, or assign unique IDs for those parametrized tests. Instead of this: - - .. code-block:: python - - @pytest.mark.parametrize( - "url, expected_dag_run_ids", - [ - ( - f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_gte=" - f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}", - [], - ), - ( - f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_lte=" - f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}", - ["TEST_DAG_RUN_ID_1", "TEST_DAG_RUN_ID_2"], - ), - ], - ) - def test_end_date_gte_lte(url, expected_dag_run_ids): - ... - -Do this: - - .. code-block:: python - - @pytest.mark.parametrize( - "url, expected_dag_run_ids", - [ - pytest.param( - f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_gte=" - f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}", - [], - id="end_date_gte", - ), - pytest.param( - f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_lte=" - f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}", - ["TEST_DAG_RUN_ID_1", "TEST_DAG_RUN_ID_2"], - id="end_date_lte", - ), - ], - ) - def test_end_date_gte_lte(url, expected_dag_run_ids): - ... - - - -Problems with Non-DB test collection -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Sometimes, even if whole module is marked as ``@pytest.mark.db_test`` even parsing the file and collecting -tests will fail when ``--skip-db-tests`` is used because some of the imports od objects created in the -module will read the database. - -Usually what helps is to move such initialization code to inside the tests or pytest fixtures (and pass -objects needed by tests as fixtures rather than importing them from the module). Similarly you might -use DB - bound objects (like Connection) in your ``parametrize`` specification - this will also fail pytest -collection. Move creation of such objects to inside the tests: - -Moving object creation from top-level to inside tests. This code will break collection of tests even if -the test is marked as DB test: - - - .. 
code-block:: python - - pytestmark = pytest.mark.db_test - - TI = TaskInstance( - task=BashOperator(task_id="test", bash_command="true", dag=DAG(dag_id="id"), start_date=datetime.now()), - run_id="fake_run", - state=State.RUNNING, - ) - - - class TestCallbackRequest: - @pytest.mark.parametrize( - "input,request_class", - [ - (CallbackRequest(full_filepath="filepath", msg="task_failure"), CallbackRequest), - ( - TaskCallbackRequest( - full_filepath="filepath", - simple_task_instance=SimpleTaskInstance.from_ti(ti=TI), - processor_subdir="/test_dir", - is_failure_callback=True, - ), - TaskCallbackRequest, - ), - ( - DagCallbackRequest( - full_filepath="filepath", - dag_id="fake_dag", - run_id="fake_run", - processor_subdir="/test_dir", - is_failure_callback=False, - ), - DagCallbackRequest, - ), - ( - SlaCallbackRequest( - full_filepath="filepath", - dag_id="fake_dag", - processor_subdir="/test_dir", - ), - SlaCallbackRequest, - ), - ], - ) - def test_from_json(self, input, request_class): - ... - - -Instead - this will not break collection. The TaskInstance is not initialized when the module is parsed, -it will only be initialized when the test gets executed because we moved initialization of it from -top level / parametrize to inside the test: - - .. code-block:: python - - pytestmark = pytest.mark.db_test - - - class TestCallbackRequest: - @pytest.mark.parametrize( - "input,request_class", - [ - (CallbackRequest(full_filepath="filepath", msg="task_failure"), CallbackRequest), - ( - None, # to be generated when test is run - TaskCallbackRequest, - ), - ( - DagCallbackRequest( - full_filepath="filepath", - dag_id="fake_dag", - run_id="fake_run", - processor_subdir="/test_dir", - is_failure_callback=False, - ), - DagCallbackRequest, - ), - ( - SlaCallbackRequest( - full_filepath="filepath", - dag_id="fake_dag", - processor_subdir="/test_dir", - ), - SlaCallbackRequest, - ), - ], - ) - def test_from_json(self, input, request_class): - if input is None: - ti = TaskInstance( - task=BashOperator( - task_id="test", bash_command="true", dag=DAG(dag_id="id"), start_date=datetime.now() - ), - run_id="fake_run", - state=State.RUNNING, - ) - - input = TaskCallbackRequest( - full_filepath="filepath", - simple_task_instance=SimpleTaskInstance.from_ti(ti=ti), - processor_subdir="/test_dir", - is_failure_callback=True, - ) - - -Sometimes it is difficult to rewrite the tests, so you might add conditional handling and mock out some -database-bound methods or objects to avoid hitting the database during test collection. The code below -will hit the Database while parsing the tests, because this is what Variable.setdefault does when -parametrize specification is being parsed - even if test is marked as DB test. - - - .. 
code-block:: python - - from airflow.models.variable import Variable - - pytestmark = pytest.mark.db_test - - initial_db_init() - - - @pytest.mark.parametrize( - "env, expected", - [ - pytest.param( - {"plain_key": "plain_value"}, - "{'plain_key': 'plain_value'}", - id="env-plain-key-val", - ), - pytest.param( - {"plain_key": Variable.setdefault("plain_var", "banana")}, - "{'plain_key': 'banana'}", - id="env-plain-key-plain-var", - ), - pytest.param( - {"plain_key": Variable.setdefault("secret_var", "monkey")}, - "{'plain_key': '***'}", - id="env-plain-key-sensitive-var", - ), - pytest.param( - {"plain_key": "{{ var.value.plain_var }}"}, - "{'plain_key': '{{ var.value.plain_var }}'}", - id="env-plain-key-plain-tpld-var", - ), - ], - ) - def test_rendered_task_detail_env_secret(patch_app, admin_client, request, env, expected): - ... - - -You can make the code conditional and mock out the Variable to avoid hitting the database. - - - .. code-block:: python - - from airflow.models.variable import Variable - - pytestmark = pytest.mark.db_test - - - if os.environ.get("_AIRFLOW_SKIP_DB_TESTS") == "true": - # Handle collection of the test by non-db case - Variable = mock.MagicMock() # type: ignore[misc] # noqa: F811 - else: - initial_db_init() - - - @pytest.mark.parametrize( - "env, expected", - [ - pytest.param( - {"plain_key": "plain_value"}, - "{'plain_key': 'plain_value'}", - id="env-plain-key-val", - ), - pytest.param( - {"plain_key": Variable.setdefault("plain_var", "banana")}, - "{'plain_key': 'banana'}", - id="env-plain-key-plain-var", - ), - pytest.param( - {"plain_key": Variable.setdefault("secret_var", "monkey")}, - "{'plain_key': '***'}", - id="env-plain-key-sensitive-var", - ), - pytest.param( - {"plain_key": "{{ var.value.plain_var }}"}, - "{'plain_key': '{{ var.value.plain_var }}'}", - id="env-plain-key-plain-tpld-var", - ), - ], - ) - def test_rendered_task_detail_env_secret(patch_app, admin_client, request, env, expected): - ... - -You can also use fixture to create object that needs database just like this. - - - .. code-block:: python - - from airflow.models import Connection - - pytestmark = pytest.mark.db_test - - - @pytest.fixture() - def get_connection1(): - return Connection() - - - @pytest.fixture() - def get_connection2(): - return Connection(host="apache.org", extra={}) - - - @pytest.mark.parametrize( - "conn", - [ - "get_connection1", - "get_connection2", - ], - ) - def test_as_json_from_connection(self, conn: Connection): - conn = request.getfixturevalue(conn) - ... - - -Running Unit tests -================== - -Running Unit Tests from PyCharm IDE ------------------------------------ - -To run unit tests from the PyCharm IDE, create the `local virtualenv `_, -select it as the default project's environment, then configure your test runner: - -.. image:: images/pycharm/configure_test_runner.png - :align: center - :alt: Configuring test runner - -and run unit tests as follows: - -.. image:: images/pycharm/running_unittests.png - :align: center - :alt: Running unit tests - -**NOTE:** You can run the unit tests in the standalone local virtualenv -(with no Breeze installed) if they do not have dependencies such as -Postgres/MySQL/Hadoop/etc. - -Running Unit Tests from PyCharm IDE using Breeze ------------------------------------------------- - -Ideally, all unit tests should be run using the standardized Breeze environment. While not -as convenient as the one-click "play button" in PyCharm, the IDE can be configured to do -this in two clicks. - -1. 
Add Breeze as an "External Tool": - - a. From the settings menu, navigate to Tools > External Tools - b. Click the little plus symbol to open the "Create Tool" popup and fill it out: - -.. image:: images/pycharm/pycharm_create_tool.png - :align: center - :alt: Installing Python extension - -2. Add the tool to the context menu: - - a. From the settings menu, navigate to Appearance & Behavior > Menus & Toolbars > Project View Popup Menu - b. Click on the list of entries where you would like it to be added. Right above or below "Project View Popup Menu Run Group" may be a good choice, you can drag and drop this list to rearrange the placement later as desired. - c. Click the little plus at the top of the popup window - d. Find your "External Tool" in the new "Choose Actions to Add" popup and click OK. If you followed the image above, it will be at External Tools > External Tools > Breeze - -**Note:** That only adds the option to that one menu. If you would like to add it to the context menu -when right-clicking on a tab at the top of the editor, for example, follow the steps above again -and place it in the "Editor Tab Popup Menu" - -.. image:: images/pycharm/pycharm_add_to_context.png - :align: center - :alt: Installing Python extension - -3. To run tests in Breeze, right click on the file or directory in the Project View and click Breeze. - - -Running Unit Tests from Visual Studio Code ------------------------------------------- - -To run unit tests from the Visual Studio Code: - -1. Using the ``Extensions`` view install Python extension, reload if required - -.. image:: images/vscode_install_python_extension.png - :align: center - :alt: Installing Python extension - -2. Using the ``Testing`` view click on ``Configure Python Tests`` and select ``pytest`` framework - -.. image:: images/vscode_configure_python_tests.png - :align: center - :alt: Configuring Python tests - -.. image:: images/vscode_select_pytest_framework.png - :align: center - :alt: Selecting pytest framework - -3. Open ``/.vscode/settings.json`` and add ``"python.testing.pytestArgs": ["tests"]`` to enable tests discovery - -.. image:: images/vscode_add_pytest_settings.png - :align: center - :alt: Enabling tests discovery - -4. Now you are able to run and debug tests from both the ``Testing`` view and test files - -.. image:: images/vscode_run_tests.png - :align: center - :alt: Running tests - -Running Unit Tests in local virtualenv --------------------------------------- - -To run unit, integration, and system tests from the Breeze and your -virtualenv, you can use the `pytest `_ framework. - -Custom ``pytest`` plugin runs ``airflow db init`` and ``airflow db reset`` the first -time you launch them. So, you can count on the database being initialized. Currently, -when you run tests not supported **in the local virtualenv, they may either fail -or provide an error message**. - -There are many available options for selecting a specific test in ``pytest``. Details can be found -in the official documentation, but here are a few basic examples: - -.. code-block:: bash - - pytest tests/core -k "TestCore and not check" - -This runs the ``TestCore`` class but skips tests of this class that include 'check' in their names. -For better performance (due to a test collection), run: - -.. code-block:: bash - - pytest tests/core/test_core.py -k "TestCore and not bash" - -This flag is useful when used to run a single test like this: - -.. 
code-block:: bash - - pytest tests/core/test_core.py -k "test_check_operators" - -This can also be done by specifying a full path to the test: - -.. code-block:: bash - - pytest tests/core/test_core.py::TestCore::test_dag_params_and_task_params - -To run the whole test class, enter: - -.. code-block:: bash - - pytest tests/core/test_core.py::TestCore - -You can use all available ``pytest`` flags. For example, to increase a log level -for debugging purposes, enter: - -.. code-block:: bash - - pytest --log-cli-level=DEBUG tests/core/test_core.py::TestCore - - -Running Tests using Breeze interactive shell --------------------------------------------- - -You can run tests interactively using regular pytest commands inside the Breeze shell. This has the -advantage, that Breeze container has all the dependencies installed that are needed to run the tests -and it will ask you to rebuild the image if it is needed and some new dependencies should be installed. - -By using interactive shell and iterating over the tests, you can iterate and re-run tests one-by-one -or group by group right after you modified them. - -Entering the shell is as easy as: - -.. code-block:: bash - - breeze - -This should drop you into the container. - -You can also use other switches (like ``--backend`` for example) to configure the environment for your -tests (and for example to switch to different database backend - see ``--help`` for more details). - -Once you enter the container, you might run regular pytest commands. For example: - -.. code-block:: bash - - pytest --log-cli-level=DEBUG tests/core/test_core.py::TestCore - - -Running Tests using Breeze from the Host ----------------------------------------- - -If you wish to only run tests and not to drop into the shell, apply the -``tests`` command. You can add extra targets and pytest flags after the ``--`` command. Note that -often you want to run the tests with a clean/reset db, so usually you want to add ``--db-reset`` flag -to breeze command. The Breeze image usually will have all the dependencies needed and it -will ask you to rebuild the image if it is needed and some new dependencies should be installed. - -.. code-block:: bash - - breeze testing tests tests/providers/http/hooks/test_http.py tests/core/test_core.py --db-reset --log-cli-level=DEBUG - -You can run the whole test suite without adding the test target: - -.. code-block:: bash - - breeze testing tests --db-reset - -You can also specify individual tests or a group of tests: - -.. code-block:: bash - - breeze testing tests --db-reset tests/core/test_core.py::TestCore - -You can also limit the tests to execute to specific group of tests - -.. code-block:: bash - - breeze testing tests --test-type Core - -In case of Providers tests, you can run tests for all providers - -.. code-block:: bash - - breeze testing tests --test-type Providers - -You can limit the set of providers you would like to run tests of - -.. code-block:: bash - - breeze testing tests --test-type "Providers[airbyte,http]" - -You can also run all providers but exclude the providers you would like to skip - -.. code-block:: bash - - breeze testing tests --test-type "Providers[-amazon,google]" - - -Inspecting docker compose after test commands ---------------------------------------------- - -Sometimes you need to inspect docker compose after tests command complete, -for example when test environment could not be properly set due to -failed healthchecks. This can be achieved with ``--skip-docker-compose-down`` -flag: - -.. 
code-block:: bash

-    breeze testing tests --skip-docker-compose-down
-
-
-Running full Airflow unit test suite in parallel
-------------------------------------------------
-
-If you run ``breeze testing tests --run-in-parallel``, tests run in parallel
-on your development machine - maxing out the number of parallel runs at the number of cores you
-have available in your Docker engine.
-
-In case you do not have enough memory available to your Docker (8 GB), the ``Integration``, ``Providers``
-and ``Core`` test types are executed sequentially, with the docker setup cleaned up in-between.
-
-This allows for massive speedup in full test execution. On a machine with 8 CPUs/16 cores, 64 GB of memory
-and a fast SSD disk, the whole suite of tests completes in about 5 minutes (!). The same suite of tests takes
-more than 30 minutes on the same machine when the tests are run sequentially.
-
-.. note::
-
-  On macOS you might have fewer CPUs and less memory available for running the tests than you have on the host,
-  simply because your Docker engine runs in a Linux Virtual Machine under-the-hood. If you want to make
-  use of the parallelism and memory for the CI tests, you might want to increase the resources available
-  to your Docker engine. See the `Resources `_ chapter
-  in the ``Docker for Mac`` documentation on how to do it.
-
-You can also limit the parallelism by specifying the maximum number of parallel jobs via the
-MAX_PARALLEL_TEST_JOBS variable. If you set it to "1", all the test types will be run sequentially.
-
-.. code-block:: bash
-
-    MAX_PARALLEL_TEST_JOBS="1" ./scripts/ci/testing/ci_run_airflow_testing.sh
-
-.. note::
-
-  In case you would like to clean up after execution of such tests, you might have to clean up
-  some of the running docker containers if you used ctrl-c to stop the execution. You can easily do it by
-  running this command (it will kill all running docker containers, so do not use it if you want to keep some
-  docker containers running):
-
-  .. code-block:: bash
-
-      docker kill $(docker ps -q)
-
-Running Backend-Specific Tests
-------------------------------
-
-Tests that use a specific backend are marked with a custom pytest marker ``pytest.mark.backend``.
-The marker has a single parameter - the name of a backend. It corresponds to the ``--backend`` switch of
-the Breeze environment (one of ``mysql``, ``sqlite``, or ``postgres``). Backend-specific tests only run when
-the Breeze environment is running with the right backend. If you specify more than one backend
-in the marker, the test runs for all specified backends.
-
-Example of a ``postgres``-only test:
-
-.. code-block:: python
-
-    @pytest.mark.backend("postgres")
-    def test_copy_expert(self):
-        ...
-
-
-Example of a ``postgres,mysql`` test (it is skipped with the ``sqlite`` backend):
-
-.. code-block:: python
-
-    @pytest.mark.backend("postgres", "mysql")
-    def test_celery_executor(self):
-        ...
-
-
-You can use the custom ``--backend`` switch in pytest to run only tests specific to that backend.
-Here is an example of running only postgres-specific backend tests:
-
-.. code-block:: bash
-
-    pytest --backend postgres
-
-Running Long-running tests
---------------------------
-
-Some of the tests run for a long time. Such tests are marked with the ``@pytest.mark.long_running`` annotation.
-Those tests are skipped by default. You can enable them with the ``--include-long-running`` flag. You
-can also decide to run only those tests by using the ``-m long_running`` flag.
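-
-For example, a long-running test could be marked like this (a minimal, hypothetical sketch - the sleep
-merely stands in for genuinely slow logic):
-
-.. code-block:: python
-
-    import time
-
-    import pytest
-
-
-    @pytest.mark.long_running
-    def test_slow_end_to_end_flow():
-        # Skipped by default; included with ``--include-long-running`` or selected
-        # explicitly with ``-m long_running``.
-        time.sleep(5)  # stands in for genuinely slow work
-        assert sum(range(1_000)) == 499_500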
- -Running Quarantined tests -------------------------- - -Some of our tests are quarantined. This means that this test will be run in isolation and that it will be -re-run several times. Also when quarantined tests fail, the whole test suite will not fail. The quarantined -tests are usually flaky tests that need some attention and fix. - -Those tests are marked with ``@pytest.mark.quarantined`` annotation. -Those tests are skipped by default. You can enable them with ``--include-quarantined`` flag. You -can also decide to only run tests with ``-m quarantined`` flag to run only those tests. - -Running Tests with provider packages ------------------------------------- - -Airflow 2.0 introduced the concept of splitting the monolithic Airflow package into separate -providers packages. The main "apache-airflow" package contains the bare Airflow implementation, -and additionally we have 70+ providers that we can install additionally to get integrations with -external services. Those providers live in the same monorepo as Airflow, but we build separate -packages for them and the main "apache-airflow" package does not contain the providers. - -Most of the development in Breeze happens by iterating on sources and when you run -your tests during development, you usually do not want to build packages and install them separately. -Therefore by default, when you enter Breeze airflow and all providers are available directly from -sources rather than installed from packages. This is for example to test the "provider discovery" -mechanism available that reads provider information from the package meta-data. - -When Airflow is run from sources, the metadata is read from provider.yaml -files, but when Airflow is installed from packages, it is read via the package entrypoint -``apache_airflow_provider``. - -By default, all packages are prepared in wheel format. To install Airflow from packages you -need to run the following steps: - -1. Prepare provider packages - -.. code-block:: bash - - breeze release-management prepare-provider-packages [PACKAGE ...] - -If you run this command without packages, you will prepare all packages. However, You can specify -providers that you would like to build if you just want to build few provider packages. -The packages are prepared in ``dist`` folder. Note that this command cleans up the ``dist`` folder -before running, so you should run it before generating ``apache-airflow`` package. - -2. Prepare airflow packages - -.. code-block:: bash - - breeze release-management prepare-airflow-package - -This prepares airflow .whl package in the dist folder. - -3. Enter breeze installing both airflow and providers from the dist packages - -.. code-block:: bash - - breeze --use-airflow-version wheel --use-packages-from-dist --mount-sources skip - -Airflow Docker Compose Tests -============================ - -Running Docker Compose Tests with Breeze ----------------------------------------- - -We also test in CI whether the Docker Compose that we expose in our documentation via -`Running Airflow in Docker `_ -works as expected. Those tests are run in CI ("Test docker-compose quick start") -and you can run them locally as well. - -The way the tests work: - -1. They first build the Airflow production image -2. Then they take the Docker Compose file of ours and use the image to start it -3. Then they perform some simple DAG trigger tests which checks whether Airflow is up and can process - an example DAG - -This is done in a local environment, not in the Breeze CI image. 
It uses ``COMPOSE_PROJECT_NAME`` set to -``quick-start`` to avoid conflicts with other docker compose deployments you might have. - -The complete test can be performed using Breeze. The prerequisite to that -is to have ``docker-compose`` (Docker Compose v1) or ``docker compose`` plugin (Docker Compose v2) -available on the path. - -Running complete test with breeze: - -.. code-block:: bash - - breeze prod-image build --python 3.8 - breeze testing docker-compose-tests - -In case the test fails, it will dump the logs from the running containers to the console and it -will shutdown the Docker Compose deployment. In case you want to debug the Docker Compose deployment -created for the test, you can pass ``--skip-docker-compose-deletion`` flag to Breeze or -export ``SKIP_DOCKER_COMPOSE_DELETION`` set to "true" variable and the deployment -will not be deleted after the test. - -You can also specify maximum timeout for the containers with ``--wait-for-containers-timeout`` flag. -You can also add ``-s`` option to the command pass it to underlying pytest command -to see the output of the test as it happens (it can be also set via -``WAIT_FOR_CONTAINERS_TIMEOUT`` environment variable) - -The test can be also run manually with ``pytest docker_tests/test_docker_compose_quick_start.py`` -command, provided that you have a local airflow venv with ``dev`` extra set and the -``DOCKER_IMAGE`` environment variable is set to the image you want to test. The variable defaults -to ``ghcr.io/apache/airflow/main/prod/python3.8:latest`` which is built by default -when you run ``breeze prod-image build --python 3.8``. also the switches ``--skip-docker-compose-deletion`` -and ``--wait-for-containers-timeout`` can only be passed via environment variables. - -If you want to debug the deployment using ``docker compose`` commands after ``SKIP_DOCKER_COMPOSE_DELETION`` -was used, you should set ``COMPOSE_PROJECT_NAME`` to ``quick-start`` because this is what the test uses: - -.. code-block:: bash - - export COMPOSE_PROJECT_NAME=quick-start - -You can also add ``--project-name quick-start`` to the ``docker compose`` commands you run. -When the test will be re-run it will automatically stop previous deployment and start a new one. - -Running Docker Compose deployment manually ------------------------------------------- - -You can also (independently of Pytest test) run docker-compose deployment manually with the image you built using -the prod image build command above. - -.. code-block:: bash - - export AIRFLOW_IMAGE_NAME=ghcr.io/apache/airflow/main/prod/python3.8:latest - -and follow the instructions in the -`Running Airflow in Docker `_ -but make sure to use the docker-compose file from the sources in -``docs/apache-airflow/stable/howto/docker-compose/`` folder. - -Then, the usual ``docker compose`` and ``docker`` commands can be used to debug such running instances. -The test performs a simple API call to trigger a DAG and wait for it, but you can follow our -documentation to connect to such running docker compose instances and test it manually. - -Airflow Integration Tests -========================= - -Some of the tests in Airflow are integration tests. These tests require ``airflow`` Docker -image and extra images with integrations (such as ``celery``, ``mongodb``, etc.). -The integration tests are all stored in the ``tests/integration`` folder. - -Enabling Integrations ---------------------- - -Airflow integration tests cannot be run in the local virtualenv. 
They can only run in the Breeze
-environment with enabled integrations and in the CI. See `CI `_ for details about Airflow CI.
-
-When you are in the Breeze environment, by default all integrations are disabled. This enables only true unit tests
-to be executed in Breeze. You can enable an integration by passing the ``--integration``
-switch when starting Breeze. You can specify multiple integrations by repeating the ``--integration`` switch,
-or use the ``--integration all-testable`` switch that enables all testable integrations, or the
-``--integration all`` switch that enables all integrations.
-
-NOTE: Every integration requires a separate container with the corresponding integration image.
-These containers take up precious resources on your PC, mainly memory. The started integrations are not stopped
-until you stop the Breeze environment with the ``stop`` command, and they are started again with the ``start`` command.
-
-The following integrations are available:
-
-.. BEGIN AUTO-GENERATED INTEGRATION LIST
-
-+--------------+----------------------------------------------------+
-| Identifier   | Description                                        |
-+==============+====================================================+
-| cassandra    | Integration required for Cassandra hooks.          |
-+--------------+----------------------------------------------------+
-| celery       | Integration required for Celery executor tests.    |
-+--------------+----------------------------------------------------+
-| kafka        | Integration required for Kafka hooks.              |
-+--------------+----------------------------------------------------+
-| kerberos     | Integration that provides Kerberos authentication. |
-+--------------+----------------------------------------------------+
-| mongo        | Integration required for MongoDB hooks.            |
-+--------------+----------------------------------------------------+
-| openlineage  | Integration required for Openlineage hooks.        |
-+--------------+----------------------------------------------------+
-| otel         | Integration required for OTEL/opentelemetry hooks. |
-+--------------+----------------------------------------------------+
-| pinot        | Integration required for Apache Pinot hooks.       |
-+--------------+----------------------------------------------------+
-| statsd       | Integration required for Statsd hooks.             |
-+--------------+----------------------------------------------------+
-| trino        | Integration required for Trino hooks.              |
-+--------------+----------------------------------------------------+
-
-.. END AUTO-GENERATED INTEGRATION LIST
-
-To start the ``mongo`` integration only, enter:
-
-.. code-block:: bash
-
-    breeze --integration mongo
-
-To start the ``mongo`` and ``cassandra`` integrations, enter:
-
-.. code-block:: bash
-
-    breeze --integration mongo --integration cassandra
-
-To start all testable integrations, enter:
-
-.. code-block:: bash
-
-    breeze --integration all-testable
-
-To start all integrations, enter:
-
-.. code-block:: bash
-
-    breeze --integration all
-
-Note that Kerberos is a special kind of integration. Some tests run differently when the
-Kerberos integration is enabled (they retrieve and use a Kerberos authentication token) and differently when the
-Kerberos integration is disabled (they neither retrieve nor use the token). Therefore, one of the test jobs
-for the CI system should run all tests with the Kerberos integration enabled to test both scenarios.
-
-Running Integration Tests
--------------------------
-
-All tests using an integration are marked with a custom pytest marker ``pytest.mark.integration``.
-The marker has a single parameter - the name of integration. - -Example of the ``celery`` integration test: - -.. code-block:: python - - @pytest.mark.integration("celery") - def test_real_ping(self): - hook = RedisHook(redis_conn_id="redis_default") - redis = hook.get_conn() - - assert redis.ping(), "Connection to Redis with PING works." - -The markers can be specified at the test level or the class level (then all tests in this class -require an integration). You can add multiple markers with different integrations for tests that -require more than one integration. - -If such a marked test does not have a required integration enabled, it is skipped. -The skip message clearly says what is needed to use the test. - -To run all tests with a certain integration, use the custom pytest flag ``--integration``. -You can pass several integration flags if you want to enable several integrations at once. - -**NOTE:** If an integration is not enabled in Breeze or CI, -the affected test will be skipped. - -To run only ``mongo`` integration tests: - -.. code-block:: bash - - pytest --integration mongo tests/integration - -To run integration tests for ``mongo`` and ``celery``: - -.. code-block:: bash - - pytest --integration mongo --integration celery tests/integration - - -Here is an example of the collection limited to the ``providers/apache`` sub-directory: - -.. code-block:: bash - - pytest --integration cassandra tests/integrations/providers/apache - -Running Integration Tests from the Host ---------------------------------------- - -You can also run integration tests using Breeze from the host. - -Runs all integration tests: - - .. code-block:: bash - - breeze testing integration-tests --db-reset --integration all-testable - -Runs all mongo DB tests: - - .. code-block:: bash - - breeze testing integration-tests --db-reset --integration mongo - -Helm Unit Tests -=============== - -On the Airflow Project, we have decided to stick with pythonic testing for our Helm chart. This makes our chart -easier to test, easier to modify, and able to run with the same testing infrastructure. To add Helm unit tests -add them in ``helm_tests``. - -.. code-block:: python - - class TestBaseChartTest: - ... - -To render the chart create a YAML string with the nested dictionary of options you wish to test. You can then -use our ``render_chart`` function to render the object of interest into a testable Python dictionary. Once the chart -has been rendered, you can use the ``render_k8s_object`` function to create a k8s model object. It simultaneously -ensures that the object created properly conforms to the expected resource spec and allows you to use object values -instead of nested dictionaries. - -Example test here: - -.. code-block:: python - - from tests.charts.common.helm_template_generator import render_chart, render_k8s_object - - git_sync_basic = """ - dags: - gitSync: - enabled: true - """ - - - class TestGitSyncScheduler: - def test_basic(self): - helm_settings = yaml.safe_load(git_sync_basic) - res = render_chart( - "GIT-SYNC", - helm_settings, - show_only=["templates/scheduler/scheduler-deployment.yaml"], - ) - dep: k8s.V1Deployment = render_k8s_object(res[0], k8s.V1Deployment) - assert "dags" == dep.spec.template.spec.volumes[1].name - - -To execute all Helm tests using breeze command and utilize parallel pytest tests, you can run the -following command (but it takes quite a long time even in a multi-processor machine). - -.. 
code-block:: bash - - breeze testing helm-tests - -You can also execute tests from a selected package only. Tests in ``tests/chart`` are grouped by packages -so rather than running all tests, you can run only tests from a selected package. For example: - -.. code-block:: bash - - breeze testing helm-tests --helm-test-package basic - -Will run all tests from ``tests-charts/basic`` package. - - -You can also run Helm tests individually via the usual ``breeze`` command. Just enter breeze and run the -tests with pytest as you would do with regular unit tests (you can add ``-n auto`` command to run Helm -tests in parallel - unlike most of the regular unit tests of ours that require a database, the Helm tests are -perfectly safe to be run in parallel (and if you have multiple processors, you can gain significant -speedups when using parallel runs): - -.. code-block:: bash - - breeze - -This enters breeze container. - -.. code-block:: bash - - pytest helm_tests -n auto - -This runs all chart tests using all processors you have available. - -.. code-block:: bash - - pytest helm_tests/test_airflow_common.py -n auto - -This will run all tests from ``tests_airflow_common.py`` file using all processors you have available. - -.. code-block:: bash - - pytest helm_tests/test_airflow_common.py - -This will run all tests from ``tests_airflow_common.py`` file sequentially. - - -Kubernetes tests -================ - -Airflow has tests that are run against real Kubernetes cluster. We are using -`Kind `_ to create and run the cluster. We integrated the tools to start/stop/ -deploy and run the cluster tests in our repository and into Breeze development environment. - -KinD has a really nice ``kind`` tool that you can use to interact with the cluster. Run ``kind --help`` to -learn more. - -K8S test environment ------------------------- - -Before running ``breeze k8s`` cluster commands you need to setup the environment. This is done -by ``breeze k8s setup-env`` command. Breeze in this command makes sure to download tools that -are needed to run k8s tests: Helm, Kind, Kubectl in the right versions and sets up a -Python virtualenv that is needed to run the tests. All those tools and env are setup in -``.build/.k8s-env`` folder. You can activate this environment yourselves as usual by sourcing -``bin/activate`` script, but since we are supporting multiple clusters in the same installation -it is best if you use ``breeze k8s shell`` with the right parameters specifying which cluster -to use. - -Multiple cluster support ------------------------- - -The main feature of ``breeze k8s`` command is that it allows you to manage multiple KinD clusters - one -per each combination of Python and Kubernetes version. This is used during CI where we can run same -tests against those different clusters - even in parallel. - -The cluster name follows the pattern ``airflow-python-X.Y-vA.B.C`` where X.Y is a major/minor Python version -and A.B.C is Kubernetes version. Example cluster name: ``airflow-python-3.8-v1.24.0`` - -Most of the commands can be executed in parallel for multiple images/clusters by adding ``--run-in-parallel`` -to create clusters or deploy airflow. Similarly checking for status, dumping logs and deleting clusters -can be run with ``--all`` flag and they will be executed sequentially for all locally created clusters. 
-
-Per-cluster configuration files
--------------------------------
-
-Once you start the cluster, the configuration for it is stored in a dynamically created folder - a separate
-folder for each python/kubernetes_version combination. The folder is ``.build/.k8s-clusters/``.
-
-There are two files there:
-
-* the kubectl config file stored in the .kubeconfig file - our scripts set the ``KUBECONFIG`` variable to it
-* the KinD cluster configuration in the .kindconfig.yml file - our scripts set the ``KINDCONFIG`` variable to it
-
-The ``KUBECONFIG`` file is automatically used when you enter any of the ``breeze k8s`` commands that use
-``kubectl`` or when you run ``kubectl`` in the k8s shell. The ``KINDCONFIG`` file is used when the cluster is
-started, but you and the ``k8s`` commands can also inspect it to find out, for example, which port is
-forwarded to the webserver running in the cluster.
-
-The files are deleted by the ``breeze k8s delete-cluster`` command.
-
-Managing Kubernetes Cluster
----------------------------
-
-For your testing, you manage the KinD cluster with the ``k8s`` breeze command group. The available
-commands are shown below:
-
-.. image:: ./images/breeze/output_k8s.svg
-  :width: 100%
-  :alt: Breeze k8s
-
-The command group allows you to set up the environment, start/stop/recreate/check the status of the KinD Kubernetes
-cluster, and configure the cluster (via the ``create-cluster`` and ``configure-cluster`` commands). Those commands
-can be run with the ``--run-in-parallel`` flag for all/selected clusters and they will be executed in parallel.
-
-In order to deploy Airflow, the PROD image of Airflow needs to be extended, and example dags and POD
-template files should be added to the image. This is done via the ``build-k8s-image`` and ``upload-k8s-image`` commands.
-This can also be done for all/selected images/clusters in parallel via the ``--run-in-parallel`` flag.
-
-Then Airflow (by using the Helm Chart) can be deployed to the cluster via the ``deploy-airflow`` command.
-This can also be done for all/selected images/clusters in parallel via the ``--run-in-parallel`` flag. You can
-pass extra options when deploying Airflow to configure your deployment.
-
-You can check the status, dump logs and finally delete the cluster via the ``status``, ``logs`` and ``delete-cluster``
-commands. This can also be done for all created clusters in parallel via the ``--all`` flag.
-
-You can interact with the cluster (via the ``shell`` and ``k9s`` commands).
-
-You can run a set of k8s tests via the ``tests`` command. You can also run tests in parallel on all/selected
-clusters with the ``--run-in-parallel`` flag.
-
-
-Running tests with Kubernetes Cluster
--------------------------------------
-
-You can either run all tests or select which tests to run. You can also enter an interactive virtualenv
-to run the tests manually one by one.
-
-
-Running Kubernetes tests via breeze:
-
-.. code-block:: bash
-
-      breeze k8s tests
-      breeze k8s tests TEST TEST [TEST ...]
-
-Optionally add ``--executor``:
-
-.. code-block:: bash
-
-      breeze k8s tests --executor CeleryExecutor
-      breeze k8s tests --executor CeleryExecutor TEST TEST [TEST ...]
-
-Entering shell with Kubernetes Cluster
---------------------------------------
-
-This shell is prepared for running Kubernetes tests interactively. It has the ``kubectl`` and ``kind`` CLI tools
-available on the path, and it also has an activated virtualenv that allows you to run tests via pytest.
-
-The virtualenv is available in ``.build/.k8s-env/`` and
-the binaries are available in the ``.build/.k8s-env/bin`` path.
-
-.. code-block:: bash
-
-    breeze k8s shell
-
-Optionally add ``--executor``:
-
-.. 
code-block:: bash - - breeze k8s shell --executor CeleryExecutor - - -K9s CLI - debug Kubernetes in style! ------------------------------------- - -Breeze has built-in integration with fantastic k9s CLI tool, that allows you to debug the Kubernetes -installation effortlessly and in style. K9S provides terminal (but windowed) CLI that helps you to: - -- easily observe what's going on in the Kubernetes cluster -- observe the resources defined (pods, secrets, custom resource definitions) -- enter shell for the Pods/Containers running, -- see the log files and more. - -You can read more about k9s at `https://k9scli.io/ `_ - -Here is the screenshot of k9s tools in operation: - -.. image:: images/testing/k9s.png - :align: center - :alt: K9S tool - - -You can enter the k9s tool via breeze (after you deployed Airflow): - -.. code-block:: bash - - breeze k8s k9s - -You can exit k9s by pressing Ctrl-C. - -Typical testing pattern for Kubernetes tests --------------------------------------------- - -The typical session for tests with Kubernetes looks like follows: - - -1. Prepare the environment: - -.. code-block:: bash - - breeze k8s setup-env - -The first time you run it, it should result in creating the virtualenv and installing good versions -of kind, kubectl and helm. All of them are installed in ``./build/.k8s-env`` (binaries available in ``bin`` -sub-folder of it). - -.. code-block:: text - - Initializing K8S virtualenv in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env - Reinstalling PIP version in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env - Installing necessary packages in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env - The ``kind`` tool is not downloaded yet. Downloading 0.14.0 version. - Downloading from: https://github.com/kubernetes-sigs/kind/releases/download/v0.14.0/kind-darwin-arm64 - The ``kubectl`` tool is not downloaded yet. Downloading 1.24.3 version. - Downloading from: https://storage.googleapis.com/kubernetes-release/release/v1.24.3/bin/darwin/arm64/kubectl - The ``helm`` tool is not downloaded yet. Downloading 3.9.2 version. - Downloading from: https://get.helm.sh/helm-v3.9.2-darwin-arm64.tar.gz - Extracting the darwin-arm64/helm to /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin - Moving the helm to /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin/helm - - -This prepares the virtual environment for tests and downloads the right versions of the tools -to ``./build/.k8s-env`` - -2. Create the KinD cluster: - -.. code-block:: bash - - breeze k8s create-cluster - -Should result in KinD creating the K8S cluster. - -.. code-block:: text - - Config created in /Users/jarek/IdeaProjects/airflow/.build/.k8s-clusters/airflow-python-3.8-v1.24.2/.kindconfig.yaml: - - # Licensed to the Apache Software Foundation (ASF) under one - # or more contributor license agreements. See the NOTICE file - # distributed with this work for additional information - # regarding copyright ownership. The ASF licenses this file - # to you under the Apache License, Version 2.0 (the - # "License"); you may not use this file except in compliance - # with the License. You may obtain a copy of the License at - # - # http://www.apache.org/licenses/LICENSE-2.0 - # - # Unless required by applicable law or agreed to in writing, - # software distributed under the License is distributed on an - # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - # KIND, either express or implied. See the License for the - # specific language governing permissions and limitations - # under the License. 
- --- - kind: Cluster - apiVersion: kind.x-k8s.io/v1alpha4 - networking: - ipFamily: ipv4 - apiServerAddress: "127.0.0.1" - apiServerPort: 48366 - nodes: - - role: control-plane - - role: worker - extraPortMappings: - - containerPort: 30007 - hostPort: 18150 - listenAddress: "127.0.0.1" - protocol: TCP - - - - Creating cluster "airflow-python-3.8-v1.24.2" ... - ✓ Ensuring node image (kindest/node:v1.24.2) 🖼 - ✓ Preparing nodes 📦 📦 - ✓ Writing configuration 📜 - ✓ Starting control-plane 🕹️ - ✓ Installing CNI 🔌 - ✓ Installing StorageClass 💾 - ✓ Joining worker nodes 🚜 - Set kubectl context to "kind-airflow-python-3.8-v1.24.2" - You can now use your cluster with: - - kubectl cluster-info --context kind-airflow-python-3.8-v1.24.2 - - Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/ - - KinD Cluster API server URL: http://localhost:48366 - Connecting to localhost:18150. Num try: 1 - Error when connecting to localhost:18150 : ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) - - Airflow webserver is not available at port 18150. Run `breeze k8s deploy-airflow --python 3.8 --kubernetes-version v1.24.2` to (re)deploy airflow - - KinD cluster airflow-python-3.8-v1.24.2 created! - - NEXT STEP: You might now configure your cluster by: - - breeze k8s configure-cluster - -3. Configure cluster for Airflow - this will recreate namespace and upload test resources for Airflow. - -.. code-block:: bash - - breeze k8s configure-cluster - -.. code-block:: text - - Configuring airflow-python-3.8-v1.24.2 to be ready for Airflow deployment - Deleting K8S namespaces for kind-airflow-python-3.8-v1.24.2 - Error from server (NotFound): namespaces "airflow" not found - Error from server (NotFound): namespaces "test-namespace" not found - Creating namespaces - namespace/airflow created - namespace/test-namespace created - Created K8S namespaces for cluster kind-airflow-python-3.8-v1.24.2 - - Deploying test resources for cluster kind-airflow-python-3.8-v1.24.2 - persistentvolume/test-volume created - persistentvolumeclaim/test-volume created - service/airflow-webserver-node-port created - Deployed test resources for cluster kind-airflow-python-3.8-v1.24.2 - - - NEXT STEP: You might now build your k8s image by: - - breeze k8s build-k8s-image - -4. Check the status of the cluster - -.. code-block:: bash - - breeze k8s status - -Should show the status of current KinD cluster. - -.. code-block:: text - - ======================================================================================================================== - Cluster: airflow-python-3.8-v1.24.2 - - * KUBECONFIG=/Users/jarek/IdeaProjects/airflow/.build/.k8s-clusters/airflow-python-3.8-v1.24.2/.kubeconfig - * KINDCONFIG=/Users/jarek/IdeaProjects/airflow/.build/.k8s-clusters/airflow-python-3.8-v1.24.2/.kindconfig.yaml - - Cluster info: airflow-python-3.8-v1.24.2 - - Kubernetes control plane is running at https://127.0.0.1:48366 - CoreDNS is running at https://127.0.0.1:48366/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy - - To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'. 
- - Storage class for airflow-python-3.8-v1.24.2 - - NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE - standard (default) rancher.io/local-path Delete WaitForFirstConsumer false 83s - - Running pods for airflow-python-3.8-v1.24.2 - - NAME READY STATUS RESTARTS AGE - coredns-6d4b75cb6d-rwp9d 1/1 Running 0 71s - coredns-6d4b75cb6d-vqnrc 1/1 Running 0 71s - etcd-airflow-python-3.8-v1.24.2-control-plane 1/1 Running 0 84s - kindnet-ckc8l 1/1 Running 0 69s - kindnet-qqt8k 1/1 Running 0 71s - kube-apiserver-airflow-python-3.8-v1.24.2-control-plane 1/1 Running 0 84s - kube-controller-manager-airflow-python-3.8-v1.24.2-control-plane 1/1 Running 0 84s - kube-proxy-6g7hn 1/1 Running 0 69s - kube-proxy-dwfvp 1/1 Running 0 71s - kube-scheduler-airflow-python-3.8-v1.24.2-control-plane 1/1 Running 0 84s - - KinD Cluster API server URL: http://localhost:48366 - Connecting to localhost:18150. Num try: 1 - Error when connecting to localhost:18150 : ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) - - Airflow webserver is not available at port 18150. Run `breeze k8s deploy-airflow --python 3.8 --kubernetes-version v1.24.2` to (re)deploy airflow - - - Cluster healthy: airflow-python-3.8-v1.24.2 - -5. Build the image based on the PROD Airflow image. You need to build the PROD image first (the command will - guide you if you did not - either by running the build separately or passing the ``--rebuild-base-image`` flag). - -.. code-block:: bash - - breeze k8s build-k8s-image - -.. code-block:: text - - Building the K8S image for Python 3.8 using airflow base image: ghcr.io/apache/airflow/main/prod/python3.8:latest - - [+] Building 0.1s (8/8) FINISHED - => [internal] load build definition from Dockerfile 0.0s - => => transferring dockerfile: 301B 0.0s - => [internal] load .dockerignore 0.0s - => => transferring context: 35B 0.0s - => [internal] load metadata for ghcr.io/apache/airflow/main/prod/python3.8:latest 0.0s - => [1/3] FROM ghcr.io/apache/airflow/main/prod/python3.8:latest 0.0s - => [internal] load build context 0.0s - => => transferring context: 3.00kB 0.0s - => CACHED [2/3] COPY airflow/example_dags/ /opt/airflow/dags/ 0.0s - => CACHED [3/3] COPY airflow/kubernetes_executor_templates/ /opt/airflow/pod_templates/ 0.0s - => exporting to image 0.0s - => => exporting layers 0.0s - => => writing image sha256:c0bdd363c549c3b0731b8e8ce34153d081f239ee2b582355b7b3ffd5394c40bb 0.0s - => => naming to ghcr.io/apache/airflow/main/prod/python3.8-kubernetes:latest - - NEXT STEP: You might now upload your k8s image by: - - breeze k8s upload-k8s-image - - -6. Upload the image to the KinD cluster - this uploads your image to make it available for the KinD cluster. - -.. code-block:: bash - - breeze k8s upload-k8s-image - -.. 
code-block:: text - - K8S Virtualenv is initialized in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env - Good version of kind installed: 0.14.0 in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin - Good version of kubectl installed: 1.25.0 in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin - Good version of helm installed: 3.9.2 in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin - Stable repo is already added - Uploading Airflow image ghcr.io/apache/airflow/main/prod/python3.8-kubernetes to cluster airflow-python-3.8-v1.24.2 - Image: "ghcr.io/apache/airflow/main/prod/python3.8-kubernetes" with ID "sha256:fb6195f7c2c2ad97788a563a3fe9420bf3576c85575378d642cd7985aff97412" not yet present on node "airflow-python-3.8-v1.24.2-worker", loading... - Image: "ghcr.io/apache/airflow/main/prod/python3.8-kubernetes" with ID "sha256:fb6195f7c2c2ad97788a563a3fe9420bf3576c85575378d642cd7985aff97412" not yet present on node "airflow-python-3.8-v1.24.2-control-plane", loading... - - NEXT STEP: You might now deploy airflow by: - - breeze k8s deploy-airflow - - -7. Deploy Airflow to the cluster - this will use Airflow Helm Chart to deploy Airflow to the cluster. - -.. code-block:: bash - - breeze k8s deploy-airflow - -.. code-block:: text - - Deploying Airflow for cluster airflow-python-3.8-v1.24.2 - Deploying kind-airflow-python-3.8-v1.24.2 with airflow Helm Chart. - Copied chart sources to /private/var/folders/v3/gvj4_mw152q556w2rrh7m46w0000gn/T/chart_edu__kir/chart - Deploying Airflow from /private/var/folders/v3/gvj4_mw152q556w2rrh7m46w0000gn/T/chart_edu__kir/chart - NAME: airflow - LAST DEPLOYED: Tue Aug 30 22:57:54 2022 - NAMESPACE: airflow - STATUS: deployed - REVISION: 1 - TEST SUITE: None - NOTES: - Thank you for installing Apache Airflow 2.3.4! - - Your release is named airflow. - You can now access your dashboard(s) by executing the following command(s) and visiting the corresponding port at localhost in your browser: - - Airflow Webserver: kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow - Default Webserver (Airflow UI) Login credentials: - username: admin - password: admin - Default Postgres connection credentials: - username: postgres - password: postgres - port: 5432 - - You can get Fernet Key value by running the following: - - echo Fernet Key: $(kubectl get secret --namespace airflow airflow-fernet-key -o jsonpath="{.data.fernet-key}" | base64 --decode) - - WARNING: - Kubernetes workers task logs may not persist unless you configure log persistence or remote logging! - Logging options can be found at: https://airflow.apache.org/docs/helm-chart/stable/manage-logs.html - (This warning can be ignored if logging is configured with environment variables or secrets backend) - - ########################################################### - # WARNING: You should set a static webserver secret key # - ########################################################### - - You are using a dynamically generated webserver secret key, which can lead to - unnecessary restarts of your Airflow components. - - Information on how to set a static webserver secret key can be found here: - https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#webserver-secret-key - Deployed kind-airflow-python-3.8-v1.24.2 with airflow Helm Chart. - - Airflow for Python 3.8 and K8S version v1.24.2 has been successfully deployed. - - The KinD cluster name: airflow-python-3.8-v1.24.2 - The kubectl cluster name: kind-airflow-python-3.8-v1.24.2. 
- - - KinD Cluster API server URL: http://localhost:48366 - Connecting to localhost:18150. Num try: 1 - Established connection to webserver at http://localhost:18150/health and it is healthy. - Airflow Web server URL: http://localhost:18150 (admin/admin) - - NEXT STEP: You might now run tests or interact with airflow via shell (kubectl, pytest etc.) or k9s commands: - - - breeze k8s tests - - breeze k8s shell - - breeze k8s k9s - - -8. Run Kubernetes tests - -Note that the tests are executed in production container not in the CI container. -There is no need for the tests to run inside the Airflow CI container image as they only -communicate with the Kubernetes-run Airflow deployed via the production image. -Those Kubernetes tests require virtualenv to be created locally with airflow installed. -The virtualenv required will be created automatically when the scripts are run. - -8a) You can run all the tests - -.. code-block:: bash - - breeze k8s tests - -.. code-block:: text - - Running tests with kind-airflow-python-3.8-v1.24.2 cluster. - Command to run: pytest kubernetes_tests - ========================================================================================= test session starts ========================================================================================== - platform darwin -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin/python - cachedir: .pytest_cache - rootdir: /Users/jarek/IdeaProjects/airflow/kubernetes_tests - plugins: anyio-3.6.1, instafail-0.4.2, xdist-2.5.0, forked-1.4.0, timeouts-1.2.1, cov-3.0.0 - setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s - collected 55 items - - test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag PASSED [ 1%] - test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag_with_scheduler_failure PASSED [ 3%] - test_kubernetes_pod_operator.py::TestKubernetesPodOperatorSystem::test_already_checked_on_failure PASSED [ 5%] - test_kubernetes_pod_operator.py::TestKubernetesPodOperatorSystem::test_already_checked_on_success ... - -8b) You can enter an interactive shell to run tests one-by-one - -This enters the virtualenv in ``.build/.k8s-env`` folder: - -.. code-block:: bash - - breeze k8s shell - -Once you enter the environment, you receive this information: - -.. code-block:: text - - Entering interactive k8s shell. - - (kind-airflow-python-3.8-v1.24.2:KubernetesExecutor)> - -In a separate terminal you can open the k9s CLI: - -.. code-block:: bash - - breeze k8s k9s - -Use it to observe what's going on in your cluster. - -9. Debugging in IntelliJ/PyCharm - -It is very easy to running/debug Kubernetes tests with IntelliJ/PyCharm. Unlike the regular tests they are -in ``kubernetes_tests`` folder and if you followed the previous steps and entered the shell using -``breeze k8s shell`` command, you can setup your IDE very easy to run (and debug) your -tests using the standard IntelliJ Run/Debug feature. You just need a few steps: - -9a) Add the virtualenv as interpreter for the project: - -.. image:: images/testing/kubernetes-virtualenv.png - :align: center - :alt: Kubernetes testing virtualenv - -The virtualenv is created in your "Airflow" source directory in the -``.build/.k8s-env`` folder and you have to find ``python`` binary and choose -it when selecting interpreter. - -9b) Choose pytest as test runner: - -.. 
image:: images/testing/pytest-runner.png - :align: center - :alt: Pytest runner - -9c) Run/Debug tests using the standard "Run/Debug" feature of IntelliJ - -.. image:: images/testing/run-test.png - :align: center - :alt: Run/Debug tests - - -NOTE! The first time you run it, it will likely fail with -``kubernetes.config.config_exception.ConfigException``: -``Invalid kube-config file. Expected key current-context in kube-config``. You need to add the KUBECONFIG -environment variable, copying it from the result of "breeze k8s tests": - -.. code-block:: bash - - echo ${KUBECONFIG} - - /home/jarek/code/airflow/.build/.kube/config - -.. image:: images/testing/kubeconfig-env.png - :align: center - :alt: Run/Debug tests - - -The configuration for Kubernetes is stored in your "Airflow" source directory in the ".build/.kube/config" file -and this is where the KUBECONFIG env variable should point to. - -You can iterate with tests while you are in the virtualenv. All the tests requiring a Kubernetes cluster -are in the "kubernetes_tests" folder. You can then add extra ``pytest`` parameters (for example ``-s`` will -immediately print the test logs and print statements generated by the tests to the terminal). You should have -kubernetes_tests as your working directory. - -.. code-block:: bash - - pytest test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag_with_scheduler_failure -s - -You can modify the tests or KubernetesPodOperator and re-run them without re-deploying -Airflow to the KinD cluster. - -10. Dumping logs - -Sometimes you want to see the logs of the cluster. This can be done with ``breeze k8s logs``. - -.. code-block:: bash - - breeze k8s logs - -11. Redeploying airflow - -Sometimes there are side effects from running tests. You can run ``breeze k8s deploy-airflow --upgrade`` -without recreating the whole cluster. - -.. code-block:: bash - - breeze k8s deploy-airflow --upgrade - -If needed you can also delete the cluster manually (within the virtualenv activated by -``breeze k8s shell``): - -.. code-block:: bash - - kind get clusters - kind delete clusters - -Kind also has useful commands to inspect your running cluster: - -.. code-block:: text - - kind --help - -12. Stop the KinD cluster when you are done - -.. code-block:: bash - - breeze k8s delete-cluster - -.. code-block:: text - - Deleting KinD cluster airflow-python-3.8-v1.24.2! - Deleting cluster "airflow-python-3.8-v1.24.2" ... - KinD cluster airflow-python-3.8-v1.24.2 deleted! - - -Running complete k8s tests -------------------------- - -You can also run complete k8s tests with - -.. code-block:: bash - - breeze k8s run-complete-tests - -This will create the cluster, build images, deploy airflow, run tests and finally delete the clusters as a single -command. This is the way it is run in our CI; you can also run such complete tests in parallel. - -Manually testing release candidate packages =========================================== - -Breeze can be used to test new release candidates of packages - both Airflow and providers. You can easily -use the CI image of Breeze to install and start Airflow for both Airflow and provider packages - both -packages that are built from sources and packages that are downloaded from PyPI when they are released -there as release candidates. - -The way to test it is rather straightforward: - -1) Make sure that the packages - both ``airflow`` and ``providers`` are placed in the ``dist`` folder - of your Airflow source tree.
You can either build them there or download from PyPI (see the next chapter) - -2) You can run the ``breeze shell`` or ``breeze start-airflow`` commands, adding the following flags - - ``--mount-sources remove`` and ``--use-packages-from-dist``. The first one removes the ``airflow`` - source tree from the container when starting it, the second one installs ``airflow`` and ``providers`` - packages from the ``dist`` folder when entering breeze. - -Testing pre-release packages ---------------------------- - -There are two ways you can get Airflow packages into the ``dist`` folder - by building them from sources or -downloading them from PyPI. - -.. note :: - - Make sure you run ``rm dist/*`` before you start building packages or downloading them from PyPI because - the packages already present there are not removed automatically. - -In order to build apache-airflow from sources, you need to run the following command: - -.. code-block:: bash - - breeze release-management prepare-airflow-package - -In order to build providers from sources, you need to run the following command: - -.. code-block:: bash - - breeze release-management prepare-provider-packages ... - -The packages are built in the ``dist`` folder and the command will summarise what packages are available in the -``dist`` folder after it finishes. - -If you want to download the packages from PyPI, you need to run the following command: - -.. code-block:: bash - - pip download apache-airflow-providers-==X.Y.Zrc1 --dest dist --no-deps - -You can use it for both release and pre-release packages. - -Examples of testing pre-release packages ---------------------------------------- - -A few examples below explain how you can test pre-release packages, and combine them with locally built -and released packages. - -The following example downloads the ``apache-airflow`` package and the ``celery`` and ``kubernetes`` provider packages from PyPI and -eventually starts Airflow with the Celery Executor. It also loads example dags and default connections: - -.. code:: bash - - rm dist/* - pip download apache-airflow==2.7.0rc1 --dest dist --no-deps - pip download apache-airflow-providers-cncf-kubernetes==7.4.0rc1 --dest dist --no-deps - pip download apache-airflow-providers-celery==3.3.0rc1 --dest dist --no-deps - breeze start-airflow --mount-sources remove --use-packages-from-dist --executor CeleryExecutor --load-default-connections --load-example-dags - - -The following example downloads the ``celery`` and ``kubernetes`` provider packages from PyPI, builds the -``apache-airflow`` package from the main sources and eventually starts Airflow with the Celery Executor. -It also loads example dags and default connections: - -.. code:: bash - - rm dist/* - breeze release-management prepare-airflow-package - pip download apache-airflow-providers-cncf-kubernetes==7.4.0rc1 --dest dist --no-deps - pip download apache-airflow-providers-celery==3.3.0rc1 --dest dist --no-deps - breeze start-airflow --mount-sources remove --use-packages-from-dist --executor CeleryExecutor --load-default-connections --load-example-dags - -The following example builds the ``celery`` and ``kubernetes`` provider packages from sources, downloads the 2.6.3 version -of the ``apache-airflow`` package from PyPI and eventually starts Airflow using the default executor -for the backend chosen (no example dags, no default connections): - -.. 
code:: bash - - rm dist/* - pip download apache-airflow==2.6.3 --dest dist --no-deps - breeze release-management prepare-provider-packages celery cncf.kubernetes - breeze start-airflow --mount-sources remove --use-packages-from-dist - -You can mix and match packages from PyPI (final or pre-release candidates) with locally built packages. You -can also choose which providers to install this way since the ``--mount-sources remove`` flag makes sure that the Airflow -installed does not contain all the providers - only those that you explicitly downloaded or built in the -``dist`` folder. This way you can test all the combinations of Airflow + Providers you might need. - - -Airflow System Tests ==================== - -System tests need to communicate with external services/systems that are available -if you have appropriate credentials configured for your tests. -The system tests derive from the ``tests.test_utils.system_tests_class.SystemTest`` class. They should also -be marked with ``@pytest.mark.system(SYSTEM)`` where ``system`` designates the system -to be tested (for example, ``google.cloud``). These tests are skipped by default. - -You can execute the system tests by providing the ``--system SYSTEM`` flag to ``pytest``. You can -specify several --system flags if you want to execute tests for several systems. - -The system tests execute a specified example DAG file that runs the DAG end-to-end. - -See more details about adding new system tests below. - -Environment for System Tests ---------------------------- - -**Prerequisites:** You may need to set some variables to run system tests. If you need to -add some initialization of environment variables to Breeze, you can add a -``variables.env`` file at ``files/airflow-breeze-config/variables.env``. It will be automatically -sourced when entering the Breeze environment. You can also add some additional -initialization commands in this file if you want to execute something -always at the time of entering Breeze. - -There are several typical operations you might want to perform such as: - -* generating a file with the random value used across the whole Breeze session (this is useful if - you want to use this random number in names of resources that you create in your service) -* generating variables that will be used as the name of your resources -* decrypting any variables and resources you keep encrypted in your configuration files -* installing additional packages that are needed in case you are doing tests with the 1.10.* Airflow series - (see below) - -An example variables.env file is shown here (this is part of the variables.env file that is used to -run Google Cloud system tests). - -.. code-block:: bash - - # Build variables. This file is sourced by Breeze. - # Also it is sourced during continuous integration build in Cloud Build - - # Auto-export all variables - set -a - - echo - echo "Reading variables" - echo - - # Generate random number that will be used across your session - RANDOM_FILE="/random.txt" - - if [[ ! -f "${RANDOM_FILE}" ]]; then - echo "${RANDOM}" > "${RANDOM_FILE}" - fi - - RANDOM_POSTFIX=$(cat "${RANDOM_FILE}") - - -To execute system tests, specify the ``--system SYSTEM`` -flag where ``SYSTEM`` is a system to run the system tests for. It can be repeated. - - -Forwarding Authentication from the Host ----------------------------------------------------- - -For system tests, you can also forward authentication from the host to your Breeze container. You can specify -the ``--forward-credentials`` flag when starting Breeze.
Then, it will also forward the most commonly used -credentials stored in your ``home`` directory. Use this feature with care as it makes your personal credentials -visible to anything that you have installed inside the Docker container. - -Currently forwarded credentials are: - * credentials stored in ``${HOME}/.aws`` for ``aws`` - Amazon Web Services client - * credentials stored in ``${HOME}/.azure`` for ``az`` - Microsoft Azure client - * credentials stored in ``${HOME}/.config`` for ``gcloud`` - Google Cloud client (among others) - * credentials stored in ``${HOME}/.docker`` for ``docker`` client - * credentials stored in ``${HOME}/.snowsql`` for ``snowsql`` - SnowSQL (Snowflake CLI client) - -Adding a New System Test -------------------------- - -We are working on automating system tests execution (AIP-4) but for now, system tests are skipped when -tests are run in our CI system. However, to enable the test automation, we encourage you to add system -tests whenever an operator/hook/sensor is added/modified in a given system. - -* To add your own system tests, derive them from the - ``tests.test_utils.system_tests_class.SystemTest`` class and mark them with the - ``@pytest.mark.system(SYSTEM_NAME)`` marker. The system name should follow the path defined in - the ``providers`` package (for example, the system tests from the ``tests.providers.google.cloud`` - package should be marked with ``@pytest.mark.system("google.cloud")``). - -* If your system tests need some credential files to be available for - authentication with external systems, make sure to keep these credentials in the - ``files/airflow-breeze-config/keys`` directory. Mark your tests with - ``@pytest.mark.credential_file()`` so that they are skipped if such a credential file is not there. - The tests should read the right credentials and authenticate them on their own. The credentials are read - in Breeze from the ``/files`` directory. The local "files" folder is mounted to the "/files" folder in Breeze. - -* If your system tests are long-running ones (i.e., require more than 20-30 minutes - to complete), mark them with the ``@pytest.mark.long_running`` marker. - Such tests are skipped by default unless you specify the ``--long-running`` flag to pytest. - -* The system test itself (python class) does not have any logic. Such a test runs - the DAG specified by its ID. This DAG should contain the actual DAG logic - to execute. Make sure to define the DAG in ``providers//example_dags``. These example DAGs - are also used to take some snippets of code out of them when documentation is generated. So, having these - DAGs runnable is a great way to make sure the documentation is describing a working example. Inside - your test class/test method, simply use ``self.run_dag(,)`` to run the DAG. Then, - the system class will take care of running the DAG. Note that the DAG_FOLDER should be - a subdirectory of the ``tests.test_utils.AIRFLOW_MAIN_FOLDER`` + ``providers//example_dags``. - - -A simple example of a system test is available in: - -``tests/providers/google/cloud/operators/test_compute_system.py``. - -It runs two DAGs defined in ``airflow.providers.google.cloud.example_dags.example_compute.py``. - - -The typical system test session ------------------------------- - -Here is the typical session that you need to do to run system tests: - -1. Enter breeze - -.. code-block:: bash - - breeze down - breeze --python 3.8 --db-reset --forward-credentials - -This will: - -* stop the whole environment (i.e. 
recreate the metadata database from scratch) -* run Breeze with: - * python 3.8 version - * resetting the Airflow database - * forwarding your local credentials to Breeze - -2. Run the tests: - -.. code-block:: bash - - pytest -o faulthandler_timeout=2400 \ - --system=google tests/providers/google/cloud/operators/test_compute_system.py - -Iteration with System Tests if your resources are slow to create ---------------------------------------------------------------- - -When you want to iterate on system tests, you might want to create slow resources first. - -If you need to set up some external resources for your tests (for example compute instances in Google Cloud) -you should set them up and tear them down in the setUp/tearDown methods of your tests. -Since those resources might be slow to create, you might want to add some helpers that -set them up and tear them down separately via manual operations. This way you can iterate on -the tests without waiting for setUp and tearDown with every test. - -In this case, you should build in a mechanism to skip setUp and tearDown in case you manually -created the resources. A somewhat complex example of that can be found in -``tests.providers.google.cloud.operators.test_cloud_sql_system.py`` and the helper is -available in ``tests.providers.google.cloud.operators.test_cloud_sql_system_helper.py``. - -Run the helper with ``--action create`` to create the Cloud SQL instances, which are very slow -to create and set up, so that you can iterate on running the system tests without -losing time creating them every time. A temporary file is created to prevent the instances from being -set up and torn down when running the test. - -This example also shows how you can use the random number generated at the entry of Breeze if you -have it in your variables.env (see the previous chapter). In the case of Cloud SQL, you cannot reuse the -same instance name for a week so we generate a random number that is used across the whole session -and store it in the ``/random.txt`` file so that the names are unique during tests. - - -!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Important !!!!!!!!!!!!!!!!!!!!!!!!!!!! - -Do not forget to delete manually created resources before leaving the -Breeze session. They are usually expensive to run. - -!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Important !!!!!!!!!!!!!!!!!!!!!!!!!!!! - -1. Enter breeze - -.. code-block:: bash - - breeze down - breeze --python 3.8 --db-reset --forward-credentials - -2. Run the create action in the helper (to create the resources that are slow to create): - -.. code-block:: bash - - python tests/providers/google/cloud/operators/test_cloud_sql_system_helper.py --action create - -3. Run the tests: - -.. code-block:: bash - - pytest -o faulthandler_timeout=2400 \ - --system=google tests/providers/google/cloud/operators/test_compute_system.py - -4. Run the delete action in the helper: - -.. code-block:: bash - - python tests/providers/google/cloud/operators/test_cloud_sql_system_helper.py --action delete - - -Local and Remote Debugging in IDE ================================= - -One of the great benefits of using the local virtualenv and Breeze is the option to run -local debugging in your IDE graphical interface. - -When you run example DAGs, even if you run them using unit tests within the IDE, they are run in a separate -container. This makes it a little harder to use with IDE built-in debuggers. -Fortunately, IntelliJ/PyCharm provides an effective remote debugging feature (but only in paid versions). -See additional details on -`remote debugging `_.
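For reference, the snippet below is a minimal sketch (not taken from the Airflow sources) of how code running inside the container can attach to a PyCharm "Python Debug Server" using the ``pydevd-pycharm`` package; the host address and port are assumptions and must match your own Run/Debug configuration (on macOS use the host's real IP address rather than ``localhost``).

.. code-block:: python

    # Minimal sketch: attach to a PyCharm debug server listening on the host.
    # Requires the pydevd-pycharm package matching your PyCharm version.
    import pydevd_pycharm

    pydevd_pycharm.settrace(
        "192.168.1.10",       # host where the IDE's debug server listens (assumed value)
        port=5555,            # must match the port configured in the IDE
        stdoutToServer=True,  # mirror stdout back to the IDE console
        stderrToServer=True,  # mirror stderr back to the IDE console
    )

Placing such a call at the top of the DAG or test you want to debug is one way to hit breakpoints in the IDE; remember to remove it again before committing.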
- -You can set up your remote debugging session as follows: - -.. image:: images/setup_remote_debugging.png - :align: center - :alt: Setup remote debugging - -Note that on macOS, you have to use a real IP address of your host rather than the default -localhost because on macOS the container runs in a virtual machine with a different IP address. - -Make sure to configure source code mapping in the remote debugging configuration to map -your local sources to the ``/opt/airflow`` location of the sources within the container: - -.. image:: images/source_code_mapping_ide.png - :align: center - :alt: Source code mapping - -Setup VM on GCP with SSH forwarding ------------------------------------ - -Below are the steps you need to take to set up your virtual machine in the Google Cloud. - -1. The next steps will assume that you have configured environment variables with the name of the network and - a virtual machine, project ID and the zone where the virtual machine will be created - - .. code-block:: bash - - PROJECT_ID="" - GCP_ZONE="europe-west3-a" - GCP_NETWORK_NAME="airflow-debugging" - GCP_INSTANCE_NAME="airflow-debugging-ci" - -2. It is necessary to configure the network and firewall for your machine. - The firewall must have unblocked access to port 22 for SSH traffic and any other port for the debugger. - In the example for the debugger, we will use port 5555. - - .. code-block:: bash - - gcloud compute --project="${PROJECT_ID}" networks create "${GCP_NETWORK_NAME}" \ - --subnet-mode=auto - - gcloud compute --project="${PROJECT_ID}" firewall-rules create "${GCP_NETWORK_NAME}-allow-ssh" \ - --network "${GCP_NETWORK_NAME}" \ - --allow tcp:22 \ - --source-ranges 0.0.0.0/0 - - gcloud compute --project="${PROJECT_ID}" firewall-rules create "${GCP_NETWORK_NAME}-allow-debugger" \ - --network "${GCP_NETWORK_NAME}" \ - --allow tcp:5555 \ - --source-ranges 0.0.0.0/0 - -3. If you have a network, you can create a virtual machine. To save costs, you can create a `Preemptible - virtual machine ` that is automatically deleted for up - to 24 hours. - - .. code-block:: bash - - gcloud beta compute --project="${PROJECT_ID}" instances create "${GCP_INSTANCE_NAME}" \ - --zone="${GCP_ZONE}" \ - --machine-type=f1-micro \ - --subnet="${GCP_NETWORK_NAME}" \ - --image=debian-11-bullseye-v20220120 \ - --image-project=debian-cloud \ - --preemptible - - To check the public IP address of the machine, you can run the command - - .. code-block:: bash - - gcloud compute --project="${PROJECT_ID}" instances describe "${GCP_INSTANCE_NAME}" \ - --zone="${GCP_ZONE}" \ - --format='value(networkInterfaces[].accessConfigs[0].natIP.notnull().list())' - -4. The SSH Daemon's default configuration does not allow traffic forwarding to public addresses. - To change it, modify the ``GatewayPorts`` options in the ``/etc/ssh/sshd_config`` file to ``Yes`` - and restart the SSH daemon. - - .. code-block:: bash - - gcloud beta compute --project="${PROJECT_ID}" ssh "${GCP_INSTANCE_NAME}" \ - --zone="${GCP_ZONE}" -- \ - sudo sed -i "s/#\?\s*GatewayPorts no/GatewayPorts Yes/" /etc/ssh/sshd_config - - gcloud beta compute --project="${PROJECT_ID}" ssh "${GCP_INSTANCE_NAME}" \ - --zone="${GCP_ZONE}" -- \ - sudo service sshd restart - -5. To start port forwarding, run the following command: - - .. code-block:: bash - - gcloud beta compute --project="${PROJECT_ID}" ssh "${GCP_INSTANCE_NAME}" \ - --zone="${GCP_ZONE}" -- \ - -N \ - -R 0.0.0.0:5555:localhost:5555 \ - -v - -If you have finished using the virtual machine, remember to delete it. - - .. 
code-block:: bash - - gcloud beta compute --project="${PROJECT_ID}" instances delete "${GCP_INSTANCE_NAME}" \ - --zone="${GCP_ZONE}" - -You can use the GCP service for free if you use the `Free Tier `__. - -DAG Testing -=========== - -To ease and speed up the process of developing DAGs, you can use -py:class:`~airflow.executors.debug_executor.DebugExecutor`, which is a single process executor -for debugging purposes. Using this executor, you can run and debug DAGs from your IDE. - -To set up the IDE: - -1. Add ``main`` block at the end of your DAG file to make it runnable. -It will run a backfill job: - -.. code-block:: python - - if __name__ == "__main__": - dag.clear() - dag.run() - - -2. Set up ``AIRFLOW__CORE__EXECUTOR=DebugExecutor`` in the run configuration of your IDE. - Make sure to also set up all environment variables required by your DAG. - -3. Run and debug the DAG file. - -Additionally, ``DebugExecutor`` can be used in a fail-fast mode that will make -all other running or scheduled tasks fail immediately. To enable this option, set -``AIRFLOW__DEBUG__FAIL_FAST=True`` or adjust ``fail_fast`` option in your ``airflow.cfg``. - -Also, with the Airflow CLI command ``airflow dags test``, you can execute one complete run of a DAG: - -.. code-block:: bash - - # airflow dags test [dag_id] [execution_date] - airflow dags test example_branch_operator 2018-01-01 - -By default ``/files/dags`` folder is mounted from your local ``/files/dags`` and this is -the directory used by airflow scheduler and webserver to scan dags for. You can place your dags there -to test them. - -The DAGs can be run in the main version of Airflow but they also work -with older versions. - - -Tracking SQL statements -======================= - -You can run tests with SQL statements tracking. To do this, use the ``--trace-sql`` option and pass the -columns to be displayed as an argument. Each query will be displayed on a separate line. -Supported values: - -* ``num`` - displays the query number; -* ``time`` - displays the query execution time; -* ``trace`` - displays the simplified (one-line) stack trace; -* ``sql`` - displays the SQL statements; -* ``parameters`` - display SQL statement parameters. - -If you only provide ``num``, then only the final number of queries will be displayed. - -By default, pytest does not display output for successful tests, if you still want to see them, you must -pass the ``--capture=no`` option. - -If you run the following command: - -.. code-block:: bash - - pytest --trace-sql=num,sql,parameters --capture=no \ - tests/jobs/test_scheduler_job.py -k test_process_dags_queries_count_05 - -On the screen you will see database queries for the given test. - -SQL query tracking does not work properly if your test runs subprocesses. Only queries from the main process -are tracked. - -Code Coverage -============= - -Airflow's CI process automatically uploads the code coverage report to codecov.io. - -Viewing the Coverage Report Online: ------------------------------------ -For the most recent coverage report of the main branch, visit: https://codecov.io/gh/apache/airflow. - -Generating Local Coverage Reports: ----------------------------------- -If you wish to obtain coverage reports for specific areas of the codebase on your local machine, follow these steps: - -a. Initiate a breeze shell. - -b. 
Execute one of the commands below based on the desired coverage area: - - - **Core:** ``python scripts/cov/core_coverage.py`` - - **REST API:** ``python scripts/cov/restapi_coverage.py`` - - **CLI:** ``python scripts/cov/cli_coverage.py`` - - **Webserver:** ``python scripts/cov/www_coverage.py`` - -c. After execution, the coverage report will be available at: http://localhost:28000/dev/coverage/index.html. - - .. note:: - - In order to see the coverage report, you must first start the webserver in the breeze environment via `airflow webserver`. - Once you enter `breeze`, you can start `tmux` (terminal multiplexer) and split the terminal (by pressing `ctrl-B "` for example) - to continue testing and run the webserver in one terminal and run tests in the second one (you can switch between - the terminals with `ctrl-B `). - -Modules Not Fully Covered: -------------------------- - -Each coverage command provides a list of modules that aren't fully covered. If you wish to enhance coverage for a particular module: - -a. Work on the module to improve its coverage. - -b. Once coverage reaches 100%, you can safely remove the module from the list of modules that are not fully covered. - This list is inside each command's source code. diff --git a/airflow/providers/MANAGING_PROVIDERS_LIFECYCLE.rst b/airflow/providers/MANAGING_PROVIDERS_LIFECYCLE.rst index 04cd81d2fefd3..da41b446211b6 100644 --- a/airflow/providers/MANAGING_PROVIDERS_LIFECYCLE.rst +++ b/airflow/providers/MANAGING_PROVIDERS_LIFECYCLE.rst @@ -29,7 +29,8 @@ new provider. Another recommendation that will help you is to look for a provider that works similar to yours. That way it will help you to set up tests and other dependencies. -First, you need to set up your local development environment. See `Contribution Quick Start `_ +First, you need to set up your local development environment. See +`Contributors Quick Start <../../contributing-docs/03_contributors_quick_start.rst>`_ if you did not set up your local environment yet. We recommend using ``breeze`` to develop locally. This way you easily be able to have an environment more similar to the one executed by GitHub CI workflow. @@ -186,14 +187,15 @@ by ``pip``). Integration tests ----------------- -See `Airflow Integration Tests `_ +See `Airflow Integration Tests <../../contributing-docs/testing/integration-tests.rst>`_ Documentation ------------- An important part of building a new provider is the documentation. -Some steps for documentation occurs automatically by ``pre-commit`` see `Installing pre-commit guide `_ +Some steps for documentation occurs automatically by ``pre-commit`` see +`Installing pre-commit guide <../../contributing-docs/03_contributors_quick_start.rst#pre-commit>`_ Those are important files in the airflow source tree that affect providers. The ``pyproject.toml`` in root Airflow folder is automatically generated based on content of ``provider.yaml`` file in each provider diff --git a/airflow/providers/apache/beam/README.md b/airflow/providers/apache/beam/README.md index b542cda098698..9400546390e11 100644 --- a/airflow/providers/apache/beam/README.md +++ b/airflow/providers/apache/beam/README.md @@ -62,7 +62,7 @@ pip install apache-airflow-providers-apache-beam[google] In Airflow 2.0, all operators, transfers, hooks, sensors, secrets for the `apache.beam` provider are in the `airflow.providers.apache.beam` package. 
You can read more about the naming conventions used -in [Naming conventions for provider packages](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#naming-conventions-for-provider-packages) +in [Naming conventions for provider packages](https://github.com/apache/airflow/blob/main/contributing-docs/11_provider_packages.rst#naming-conventions-for-provider-packages) ## Operators diff --git a/airflow/reproducible_build.yaml b/airflow/reproducible_build.yaml index 3d777fc30bd45..392eddc44e621 100644 --- a/airflow/reproducible_build.yaml +++ b/airflow/reproducible_build.yaml @@ -1,2 +1,2 @@ release-notes-hash: c1deec1f2ca47e6db62309b185e0598c -source-date-epoch: 1705387201 +source-date-epoch: 1706215197 diff --git a/airflow/settings.py b/airflow/settings.py index 19874d1d219d4..e533ae97eef64 100644 --- a/airflow/settings.py +++ b/airflow/settings.py @@ -218,7 +218,8 @@ def __init__(self): raise RuntimeError( "Your test accessed the DB but `_AIRFLOW_SKIP_DB_TESTS` is set.\n" "Either make sure your test does not use database or mark the test with `@pytest.mark.db_test`\n" - "See https://github.com/apache/airflow/blob/main/TESTING.rst#best-practices-for-db-tests on how " + "See https://github.com/apache/airflow/blob/main/contributing-docs/testing/unit_tests.rst#" + "best-practices-for-db-tests on how " "to deal with it and consult examples." ) diff --git a/chart/README.md b/chart/README.md index 615225b27afa8..c5a927c14c3ce 100644 --- a/chart/README.md +++ b/chart/README.md @@ -60,4 +60,4 @@ Full documentation for Helm Chart (latest **stable** release) lives [on the webs ## Contributing -Want to help build Apache Airflow? Check out our [contributing documentation](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst). +Want to help build Apache Airflow? Check out our [contributing documentation](https://github.com/apache/airflow/blob/main/contributing-docs/README.rst). diff --git a/contributing-docs/01_roles_in_airflow_project.rst b/contributing-docs/01_roles_in_airflow_project.rst new file mode 100644 index 0000000000000..73c84c75b89db --- /dev/null +++ b/contributing-docs/01_roles_in_airflow_project.rst @@ -0,0 +1,177 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Roles in Airflow project +======================== + +There are several roles within the Airflow Open-Source community. + +For detailed information for each role, see: `Committers and PMC members <../COMMITTERS.rst>`__. + +.. contents:: :local: + +PMC Member +---------- + +The PMC (Project Management Committee) is a group of maintainers that drives changes in the way that +Airflow is managed as a project. + +Considering Apache, the role of the PMC is primarily to ensure that Airflow conforms to Apache's processes +and guidelines. 
+ +Committers/Maintainers +---------------------- + +You will often see the term "committer" or "maintainer" in the context of the Airflow project. This is a person +who has write access to the Airflow repository and can merge pull requests. Committers (also known as maintainers) +are also responsible for reviewing pull requests and guiding contributors to make their first contribution. +They are also responsible for making sure that the project is moving forward and that the quality of the +code is maintained. + +The term "committer" and "maintainer" is used interchangeably. The term "committer" is the official term used by the +Apache Software Foundation, while "maintainer" is more commonly used in the Open Source community and is used +in context of GitHub in a number of guidelines and documentation, so this document will mostly use "maintainer", +when speaking about Github, Pull Request, Github Issues and Discussions. On the other hand, "committer" is more +often used in devlist discussions, official communications, Airflow website and every time when we formally +refer to the role. + +The official list of committers can be found `here `__. + +Additionally, committers are listed in a few other places (some of these may only be visible to existing committers): + +* https://whimsy.apache.org/roster/committee/airflow +* https://github.com/orgs/apache/teams/airflow-committers/members + +Committers are responsible for: + +* Championing one or more items on the `Roadmap `__ +* Reviewing & Merging Pull-Requests +* Scanning and responding to GitHub issues +* Responding to questions on the dev mailing list (dev@airflow.apache.org) + +Release managers +---------------- + +The task of release managers is to prepare and release Airflow artifacts (airflow, providers, Helm Chart, Python client. +The release managers are usually PMC members and the process of releasing is described in the `dev `__ +documentation where we keep information and tools used for releasing. + +Contributors +------------ + +A contributor is anyone who wants to contribute code, documentation, tests, ideas, or anything to the +Apache Airflow project. + +Contributors are responsible for: + +* Fixing bugs +* Adding features +* Championing one or more items on the `Roadmap `__. + +Security Team +------------- + +Security issues in Airflow are handled by the Airflow Security Team. The team consists +of selected PMC members that are interested in looking at, discussing and fixing +security issues, but it can also include committers and non-committer contributors that are +not PMC members yet and have been approved by the PMC members in a vote. You can request to +be added to the team by sending a message to private@airflow.apache.org. However, the team +should be small and focused on solving security issues, so the requests will be evaluated +on a case-by-case basis and the team size will be kept relatively small, limited to only actively +security-focused contributors. + +There are certain expectations from the members of the security team: + +* They are supposed to be active in assessing, discussing, fixing and releasing the + security issues in Airflow. While it is perfectly understood that as volunteers, we might have + periods of lower activity, prolonged lack of activity and participation will result in removal + from the team, pending PMC decision (the decision on removal can be taken by `LAZY CONSENSUS `_ among + all the PMC members on private@airflow.apache.org mailing list). 
+ +* They are not supposed to reveal the information about pending and unfixed security issues to anyone + (including their employers) unless specifically authorized by the security team members, specifically + if diagnosing and solving the issue might involve the need of external experts - for example security + experts that are available through Airflow stakeholders. The intent about involving 3rd parties has + to be discussed and agreed upon at security@airflow.apache.org. + +* They have to have an `ICLA `_ signed with + Apache Software Foundation. + +* The security team members might inform 3rd parties about fixes, for example in order to assess if the fix + is solving the problem or in order to assess its applicability to be applied by 3rd parties, as soon + as a PR solving the issue is opened in the public airflow repository. + +* In case of critical security issues, the members of the security team might iterate on a fix in a + private repository and only open the PR in the public repository once the fix is ready to be released, + with the intent of minimizing the time between the fix being available and the fix being released. In this + case the PR might be sent to review and comment to the PMC members on private list, in order to request + an expedited voting on the release. The voting for such release might be done on the + private@airflow.apache.org mailing list and should be made public at the dev@apache.airflow.org + mailing list as soon as the release is ready to be announced. + +* The security team members working on the fix might be mentioned as remediation developers in the CVE + including their job affiliation if they want to. + +* Community members acting as release managers are by default members of the security team and unless they + want to, they do not have to be involved in discussing and solving the issues. They are responsible for + releasing the CVE information (announcement and publishing to security indexes) as part of the + release process. This is facilitated by the security tool provided by the Apache Software Foundation. + +* Severity of the issue is determined based on the criteria described in the + `Severity Rating blog post `_ by the Apache Software + Foundation Security team. + +Handling security issues is something of a chore, it takes vigilance, requires quick reaction and responses +and often requires to act outside of the regular "day" job. This means that not everyone can keep up with +being part of the security team for long while being engaged and active. While we do not expect all the +security team members to be active all the time, and - since we are volunteers, it's perfectly understandable +that work, personal life, family and generally life might not help with being active. And this is not a +considered as being failure, it's more stating the fact of life. + +Also prolonged time of being exposed to handling "other's" problems and discussing similar kinds of problem +and responses might be tiring and might lead to burnout. + +However, for those who have never done that before, participation in the security team might be an interesting +experience and a way to learn a lot about security and security issue handling. We have a lot of +established processes and tools that make the work of the security team members easier, so this can be +treated as a great learning experience for some community members. 
And knowing that this is not +a "lifetime" assignment, but rather a temporary engagement might make it easier for people to decide to +join the security team. + +That's why we've introduced rotation of the security team members. + +Periodically - every 3-4 months (depending on actual churn of the security issues that are reported to us), +we re-evaluate the engagement and activity of the security team members, and we ask them if they want to +continue being part of the security team, taking into account their engagement since the last team refinement. +Generally speaking if the engagement during the last period was marginal, the person is considered as a +candidate for removing from the team and it requires a deliberate confirmation of re-engagement to take +the person off-the-list. + +At the same time we open up the possibility to other people in the community to join the team and make +a "call for new security team members" where community members can volunteer to join the security team. +Such volunteering should happen on the private@ list. The current members of the security team as well +as PMC members can also nominate other community members to join the team and those new team members +have to be well recognized and trusted by the community and accepted by the PMC. + +The proposal of team refinement is passed to the PMC as LAZY CONSENSUS (or VOTE if consensus cannot +be reached). In case the consensus cannot be reached for the whole list, we can split it and ask for +lazy consensus for each person separately. + +------------- + +You can follow this with the `How to communicate <02_how_to_communicate.rst>`__ document to learn more how +to communicate with the Airflow community members. diff --git a/contributing-docs/02_how_to_communicate.rst b/contributing-docs/02_how_to_communicate.rst new file mode 100644 index 0000000000000..e264014d67fce --- /dev/null +++ b/contributing-docs/02_how_to_communicate.rst @@ -0,0 +1,152 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +How to communicate +================== + +Apache Airflow is a Community within Apache Software Foundation. As the motto of +the Apache Software Foundation states "Community over Code" - people in the +community are far more important than their contribution. + +This means that communication plays a big role in it, and this chapter is all about it. + +In our communication, everyone is expected to follow the `ASF Code of Conduct `_. + +.. contents:: :local: + +Various Communication channels +------------------------------ + +We have various channels of communication - starting from the official devlist, comments +in the PR, Slack, wiki. + +All those channels can be used for different purposes. 
+You can join the channels via links at the `Airflow Community page `_ + +* The `Apache Airflow devlist `_ for: + * official communication + * general issues, asking community for opinion + * discussing proposals + * voting +* The `Airflow CWiki `_ for: + * detailed discussions on big proposals (Airflow Improvement Proposals also name AIPs) +* GitHub `Pull Requests (PRs) `_ for: + * discussing implementation details of PRs + * not for architectural discussions (use the devlist for that) +* The deprecated `JIRA issues `_ for: + * checking out old but still valuable issues that are not on GitHub yet + * mentioning the JIRA issue number in the title of the related PR you would like to open on GitHub + +**IMPORTANT** +We don't create new issues on JIRA anymore. The reason we still look at JIRA issues is that there are valuable +tickets inside of it. However, each new PR should be created on `GitHub issues `_ +as stated in `Contribution Workflow Example `_ + +Slack details +------------- + +* The `Apache Airflow Slack `_ for: + * ad-hoc questions related to development and asking for review (#development channel) + * asking for help with first contribution PRs (#development-first-pr-support channel) + * troubleshooting (#troubleshooting channel) + * using Breeze (#airflow-breeze channel) + * improving and maintaining documentation (#documentation channel) + * group talks (including SIG - special interest groups) (#sig-* channels) + * notifications (#announcements channel) + * random queries (#random channel) + * regional announcements (#users-* channels) + * occasional discussions (wherever appropriate including group and 1-1 discussions) + +Please exercise caution against posting same questions across multiple channels. Doing so not only prevents +redundancy but also promotes more efficient and effective communication for everyone involved. + +Devlist details +--------------- + +The devlist is the most important and official communication channel. Often at Apache project you can +hear "if it is not in the devlist - it did not happen". If you discuss and agree with someone from the +community on something important for the community (including if it is with maintainer or PMC member) the +discussion must be captured and re-shared on devlist in order to give other members of the community to +participate in it. + +We are using certain prefixes for email subjects for different purposes. Start your email with one of those: + * ``[DISCUSS]`` - if you want to discuss something but you have no concrete proposal yet + * ``[PROPOSAL]`` - if usually after "[DISCUSS]" thread discussion you want to propose something and see + what other members of the community think about it. + * ``[AIP-NN]`` - if the mail is about one of the `Airflow Improvement Proposals `_ + * ``[VOTE]`` - if you would like to start voting on a proposal discussed before in a "[PROPOSAL]" thread + * ``[ANNOUNCE]`` - only used by PMC members to announce important things to the community such as + releases or big changes in the project + +Voting is governed by the rules described in `Voting `_ + +What to expect from the community +--------------------------------- + +We are all devoting our time for community as individuals who except for being active in Apache Airflow have +families, daily jobs, right for vacation. Sometimes we are in different timezones or simply are +busy with day-to-day duties that our response time might be delayed. For us it's crucial +to remember to respect each other in the project with no formal structure. 
+There are no managers or departments, and most of us are autonomous in our opinions and decisions.
+All of this makes the Apache Airflow community a great space for open discussion and mutual respect
+for various opinions.
+
+Disagreements are expected; discussions might include strong opinions and contradicting statements.
+Sometimes you might get two maintainers asking you to do things differently. This has all happened in the past
+and will continue to happen. As a community we have some mechanisms to facilitate discussion and come to
+a consensus or conclusions, or we end up voting to make important decisions. It is important that these
+decisions are not treated as personal wins or losses. In the end it's the community that we all care about,
+and what's good for the community should be accepted even if you have a different opinion. There is a nice
+motto that you should follow in case you disagree with a community decision: "Disagree but engage". Even
+if you do not agree with a community decision, you should follow and embrace it (but you are free to
+express your opinion that you don't agree with it).
+
+As a community we have high requirements for code quality. This is mainly because we are a distributed
+and loosely organized team. We have both contributors who make a single commit and people who add
+many more. It happens that some people assume informal "stewardship" over parts of the code for some time -
+but at any time we should make sure that the code can be taken over by others, without excessive communication.
+Setting high requirements for the code (fairly strict code review, static code checks, requirements for
+automated tests, pre-commit checks) is the best way to achieve that - by only accepting good quality
+code. Thanks to full test coverage we can make sure that we will be able to work with the code in the future.
+So do not be surprised if you are asked to add more tests or make the code cleaner -
+this is for the sake of maintainability.
+
+Rules for new contributors
+--------------------------
+
+Here are a few rules that are important to keep in mind when you enter our community:
+
+* Do not be afraid to ask questions
+* The communication is asynchronous - do not expect immediate answers, ping others on Slack
+  (#development channel) if blocked
+* There is a #newbie-questions channel in Slack as a safe place to ask questions
+* You can ask one of the maintainers to be a mentor for you; maintainers can guide you within the community
+* You can apply to the more structured `Apache Mentoring Programme `_
+* It's your responsibility as an author to take your PR from start to end, including leading communication
+  in the PR
+* It's your responsibility as an author to ping maintainers to review your PR - be mildly annoying sometimes,
+  it's OK to be slightly annoying with your change - it is also a sign for maintainers that you care
+* Be considerate of the high code quality/test coverage requirements for Apache Airflow
+* If in doubt - ask the community for their opinion or propose to vote on the devlist
+* Discussions should concern subject matters - judge or criticize the merit but never criticize people
+* It's OK to express your own emotions while communicating - it helps other people to understand you
+* Be considerate of the feelings of others.
Tell about how you feel not what you think of others + +--------------- + +If you want to quick start your contribution, you can follow with +`Contributors Quick Start <03_contributors_quick_start.rst>`__ diff --git a/CONTRIBUTORS_QUICK_START.rst b/contributing-docs/03_contributors_quick_start.rst similarity index 83% rename from CONTRIBUTORS_QUICK_START.rst rename to contributing-docs/03_contributors_quick_start.rst index 4a115af39644a..bed3928d9c43e 100644 --- a/CONTRIBUTORS_QUICK_START.rst +++ b/contributing-docs/03_contributors_quick_start.rst @@ -30,7 +30,7 @@ you follow the guide. There are three ways you can run the Airflow dev env: 1. With a Docker Containers and Docker Compose (on your local machine). This environment is managed - with `Breeze `_ tool written in Python that makes the environment + with `Breeze `_ tool written in Python that makes the environment management, yeah you guessed it - a breeze. 2. With a local virtual environment (on your local machine). 3. With a remote, managed environment (via remote development environment) @@ -41,8 +41,7 @@ Before deciding which method to choose, there are a couple of factors to conside and allows integration tests with a number of integrations (cassandra, mongo, mysql, etc.). However, it also requires **4GB RAM, 40GB disk space and at least 2 cores**. * If you are working on a basic feature, installing Airflow on a local environment might be sufficient. - For a comprehensive venv tutorial - visit - `Virtual Env guide `_ + For a comprehensive venv tutorial - visit `Local virtualenv <07_local_virtualenv.rst>`_ * You need to have usually a paid account to access managed, remote virtual environment. Local machine development @@ -52,7 +51,7 @@ If you do not work in remote development environment, you need those prerequisit 1. Docker Community Edition (you can also use Colima, see instructions below) 2. Docker Compose -3. pyenv (you can also use pyenv-virtualenv or virtualenvwrapper) +3. Hatch (you can also use pyenv, pyenv-virtualenv or virtualenvwrapper) The below setup describes `Ubuntu installation `_. It might be slightly different on different machines. @@ -141,13 +140,16 @@ Docker Compose docker-compose --version -Pyenv and setting up virtual-env --------------------------------- - Note: You might have issues with pyenv if you have a Mac with an M1 chip. Consider using virtualenv as an alternative. +Setting up virtual-env +---------------------- -1. Install pyenv and configure your shell's environment for Pyenv as suggested in Pyenv `README `_ +1. While you can use any virtualenv manager, we recommend using `Hatch `__ + as your build and integration frontend, and we already use ``hatchling`` build backend for Airflow. + You can read more about Hatch and it's use in Airflow in `Local virtualenv <07_local_virtualenv.rst>`_. + See [PEP-517](https://peps.python.org/pep-0517/#terminology-and-goals) for explanation of what the + frontend and backend meaning is. -2. After installing pyenv, you need to install a few more required packages for Airflow. The below command adds +2. After creating, you need to install a few more required packages for Airflow. The below command adds basic system-level dependencies on Debian/Ubuntu-like system. 
You will have to adapt it to install similar packages if your operating system is MacOS or another flavour of Linux @@ -167,56 +169,7 @@ like system, this command will install all necessary dependencies that should be libssl-dev locales lsb-release openssh-client sasl2-bin \ software-properties-common sqlite3 sudo unixodbc unixodbc-dev -3. Restart your shell so the path changes take effect and verifying installation - -.. code-block:: bash - - exec $SHELL - pyenv --version - -4. Checking available version, installing required Python version to pyenv and verifying it - -.. code-block:: bash - -For Architectures other than MacOS/ARM - -.. code-block:: bash - - pyenv install --list - pyenv install 3.8.5 - pyenv versions - -For MacOS/Arm (3.9.1 is the first version of Python to support MacOS/ARM, but 3.8.10 works too) - -.. code-block:: bash - - pyenv install --list - pyenv install 3.8.10 - pyenv versions - -5. Creating new virtual environment named ``airflow-env`` for installed version python. In next chapter virtual - environment ``airflow-env`` will be used for installing airflow. - -.. code-block:: bash - -For Architectures other than MacOS/ARM - -.. code-block:: bash - - pyenv virtualenv 3.8.5 airflow-env - -For MacOS/Arm (3.9.1 is the first version of Python to support MacOS/ARM, but 3.8.10 works too) - -.. code-block:: bash - - pyenv virtualenv 3.8.10 airflow-env - -6. Entering virtual environment ``airflow-env`` - -.. code-block:: bash - - pyenv activate airflow-env - +3. With Hatch you can enter virtual environment with ``hatch env shell`` command: Forking and cloning Project --------------------------- @@ -477,7 +430,7 @@ For more information visit : |Breeze documentation| .. |Breeze documentation| raw:: html - Breeze documentation + Breeze documentation Following are some of important topics of Breeze documentation: @@ -631,27 +584,27 @@ on macOS, install via pre-commit uninstall -- For more information on visit |STATIC_CODE_CHECKS.rst| +- For more information on visit |08_static_code_checks.rst| -.. |STATIC_CODE_CHECKS.rst| raw:: html +.. |08_static_code_checks.rst| raw:: html - - STATIC_CODE_CHECKS.rst + + 08_static_code_checks.rst -- Following are some of the important links of STATIC_CODE_CHECKS.rst +- Following are some of the important links of 08_static_code_checks.rst - |Pre-commit Hooks| .. |Pre-commit Hooks| raw:: html - + Pre-commit Hooks - |Running Static Code Checks via Breeze| .. |Running Static Code Checks via Breeze| raw:: html - Running Static Code Checks via Breeze @@ -737,94 +690,42 @@ All Tests are inside ./tests directory. breeze --backend postgres --postgres-version 15 --python 3.8 --db-reset testing tests --test-type All --integration mongo +- For more information on Testing visit : |09_testing.rst| -- For more information on Testing visit : |TESTING.rst| - -.. |TESTING.rst| raw:: html - - TESTING.rst - -- Following are the some of important topics of TESTING.rst - - - |Airflow Test Infrastructure| - - .. |Airflow Test Infrastructure| raw:: html - - - Airflow Test Infrastructure - - - - |Airflow Unit Tests| - - .. |Airflow Unit Tests| raw:: html - - Airflow Unit - Tests - - - - |Helm Unit Tests| - - .. |Helm Unit Tests| raw:: html - - Helm Unit Tests - - - - - |Airflow Integration Tests| - - .. |Airflow Integration Tests| raw:: html - - - Airflow Integration Tests - - - - |Running Tests with Kubernetes| - - .. |Running Tests with Kubernetes| raw:: html - - - Running Tests with Kubernetes - - - - |Airflow System Tests| - - .. 
|Airflow System Tests| raw:: html - - Airflow - System Tests + .. |09_testing.rst| raw:: html + 09_testing.rst - |Local and Remote Debugging in IDE| .. |Local and Remote Debugging in IDE| raw:: html - Local and Remote Debugging in IDE Contribution guide ################## -- To know how to contribute to the project visit |CONTRIBUTING.rst| +- To know how to contribute to the project visit |README.rst| -.. |CONTRIBUTING.rst| raw:: html +.. |README.rst| raw:: html - CONTRIBUTING.rst + README.rst -- Following are some of important links of CONTRIBUTING.rst +- Following are some of important links of Contribution documentation - |Types of contributions| .. |Types of contributions| raw:: html - + Types of contributions - - |Roles of contributor| .. |Roles of contributor| raw:: html - Roles of + Roles of contributor @@ -832,7 +733,7 @@ Contribution guide .. |Workflow for a contribution| raw:: html - + Workflow for a contribution @@ -881,7 +782,7 @@ describes how to do it. .. |Syncing fork| raw:: html - + Update new changes made to apache:airflow project to your fork @@ -889,7 +790,7 @@ describes how to do it. .. |Rebasing pull request| raw:: html - + Rebasing pull request Using your IDE @@ -899,8 +800,8 @@ If you are familiar with Python development and use your favourite editors, Airf similarly to other projects of yours. However, if you need specific instructions for your IDE you will find more detailed instructions here: -* `Pycharm/IntelliJ `_ -* `Visual Studio Code `_ +* `Pycharm/IntelliJ `_ +* `Visual Studio Code `_ Using Remote development environments @@ -909,5 +810,11 @@ Using Remote development environments In order to use remote development environment, you usually need a paid account, but you do not have to setup local machine for development. -* `GitPod `_ -* `GitHub Codespaces `_ +* `GitPod `_ +* `GitHub Codespaces `_ + + +---------------- + +Once you have your environment set up, you can start contributing to Airflow. You can find more +about ways you can contribute in the `How to contribute <04_how_to_contribute.rst>`_ document. diff --git a/contributing-docs/04_how_to_contribute.rst b/contributing-docs/04_how_to_contribute.rst new file mode 100644 index 0000000000000..e62c071d5d4b3 --- /dev/null +++ b/contributing-docs/04_how_to_contribute.rst @@ -0,0 +1,102 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +How to contribute +================= + +There are various ways how you can contribute to Apache Airflow. Here is a short overview of +some of those ways that involve creating issues and pull requests on GitHub. + +.. contents:: :local: + +Report Bugs +----------- + +Report bugs through `GitHub `__. + +Please report relevant information and preferably code that exhibits the problem. 
+ +Report security issues +---------------------- + +If you want to report a security finding, please follow the +`Security policy `_ + + +Fix Bugs +-------- + +Look through the GitHub issues for bugs. Anything is open to whoever wants to implement it. + + +Issue reporting and resolution process +-------------------------------------- + +An unusual element of the Apache Airflow project is that you can open a PR to +fix an issue or make an enhancement, without needing to open an issue first. +This is intended to make it as easy as possible to contribute to the project. + +If you however feel the need to open an issue (usually a bug or feature request) +consider starting with a `GitHub Discussion `_ instead. +In the vast majority of cases discussions are better than issues - you should only open +issues if you are sure you found a bug and have a reproducible case, +or when you want to raise a feature request that will not require a lot of discussion. +If you have a very important topic to discuss, start a discussion on the +`Devlist `_ instead. + +The Apache Airflow project uses a set of labels for tracking and triaging issues, as +well as a set of priorities and milestones to track how and when the enhancements and bug +fixes make it into an Airflow release. This is documented as part of +the `Issue reporting and resolution process `_, + +Implement Features +------------------ + +Look through the `GitHub issues labeled "kind:feature" +`__ for features. + +Any unassigned feature request issue is open to whoever wants to implement it. + +We've created the operators, hooks, macros and executors we needed, but we've +made sure that this part of Airflow is extensible. New operators, hooks, macros +and executors are very welcomed! + +Improve Documentation +--------------------- + +Airflow could always use better documentation, whether as part of the official +Airflow docs, in docstrings, ``docs/*.rst`` or even on the web as blog posts or +articles. + +See the `Docs README `__ for more information about contributing to Airflow docs. + +Submit Feedback +--------------- + +The best way to send feedback is to `open an issue on GitHub `__. + +If you are proposing a new feature: + +- Explain in detail how it would work. +- Keep the scope as narrow as possible to make it easier to implement. +- Remember that this is a volunteer-driven project, and that contributions are + welcome :) + +------------------------------- + +If you want to know more about creating Pull Requests (PRs), reading pull request guidelines +and learn about coding standards we have, follow to the `Pull Request <05_pull_requests.rst>`_ document. diff --git a/contributing-docs/05_pull_requests.rst b/contributing-docs/05_pull_requests.rst new file mode 100644 index 0000000000000..c2d20670a12c2 --- /dev/null +++ b/contributing-docs/05_pull_requests.rst @@ -0,0 +1,248 @@ + + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. 
Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Pull Requests
+=============
+
+This document describes how you can create Pull Requests and the coding standards we use when
+implementing them.
+
+.. contents:: :local:
+
+Pull Request guidelines
+=======================
+
+Before you submit a Pull Request (PR) from your forked repo, check that it meets
+these guidelines:
+
+- Include tests, either as doctests, unit tests, or both, in your pull request.
+
+  The airflow repo uses `GitHub Actions `__ to
+  run the tests and `codecov `__ to track
+  coverage. You can set up both for free on your fork. It will help you make sure you do not
+  break the build with your PR and that you help increase coverage.
+  Also we advise installing locally `pre-commit hooks <08_static_code_checks.rst#pre-commit-hooks>`__ to
+  apply various checks, code generation and formatting at the time you make a local commit - which
+  gives you near-immediate feedback on things you need to fix before you push your code to the PR, or in
+  many cases it will even fix it for you locally so that you can add and commit it straight away.
+
+- Follow our project's `Coding style and best practices`_. Usually we attempt to enforce the practices by
+  having appropriate pre-commits. There are checks amongst them that aren't currently enforced
+  programmatically (either because they are too hard or just not yet done).
+
+- Maintainers will not merge a PR that regresses linting or does not pass CI tests (unless you have a good
+  justification that it is a transient error or something that is being fixed in another PR).
+
+- Maintainers will not merge PRs that have unresolved conversations. Note! This is experimental - to be
+  assessed at the end of January 2024 if we want to continue it.
+
+- We prefer that you ``rebase`` your PR (and do it quite often) rather than merge. It leads to
+  easier reviews and cleaner changes where you know exactly what changes you've done. You can learn more
+  about rebase vs. merge workflow in `Rebase and merge your pull request `__
+  and `Rebase your fork `__. Make sure to resolve all conflicts
+  during rebase.
+
+- When merging PRs, the Maintainer will use **Squash and Merge**, which means that your PR will be merged as one
+  commit, regardless of the number of commits in your PR. During the review cycle, you can keep a commit
+  history for easier review, but if you need to, you can also squash all commits to reduce the
+  maintenance burden during rebase.
+
+- Add an `Apache License `__ header to all new files. If you
+  have ``pre-commit`` installed, pre-commit will do it automatically for you. If you hesitate to install
+  pre-commit for your local repository - for example because it takes a few seconds to commit your changes,
+  this one thing might be a good reason to convince anyone to install pre-commit.
+
+- If your PR adds functionality, make sure to update the docs as part of the same PR, not only
+  code and tests. A docstring is often sufficient. Make sure to follow the Sphinx compatible standards.
+
+- Make sure your code fulfills all the
+  `static code checks <08_static_code_checks.rst#static-code-checks>`__ we have in our code.
The easiest way
+  to make sure of that is - again - to install `pre-commit hooks <08_static_code_checks.rst#pre-commit-hooks>`__
+
+- Make sure your PR is small and focused on one change only - avoid adding unrelated changes or mixing
+  feature additions with refactoring. Keeping to that rule will make it easier to review your PR and will make
+  it easier for release managers if they decide that your change should be cherry-picked to release it in a
+  bug-fix release of Airflow. If you want to add a new feature and refactor the code, it's better to split the
+  PR into several smaller PRs. It's also quite a good and common idea to keep a big ``Draft`` PR if you have
+  a bigger change that you want to make and then create smaller PRs from it that are easier to review and
+  merge and cherry-pick. It takes a long time (and a lot of attention and focus of a reviewer) to review
+  big PRs, so by splitting it into smaller PRs you actually speed up the review process and make it easier
+  for your change to be eventually merged.
+
+- Run relevant tests locally before opening a PR. Often tests are placed in the files corresponding
+  to the changed code (for example for ``airflow/cli/cli_parser.py`` changes you have tests in
+  ``tests/cli/test_cli_parser.py``). However there are a number of cases where the tests that should run
+  are placed elsewhere - you can either run tests for the whole ``TEST_TYPE`` that is relevant (see
+  ``breeze testing tests --help`` output for available test types) or you can run all tests, or eventually
+  you can push your code to the PR and see the results of the tests in the CI.
+
+- You can use any supported Python version to run the tests, but the best is to check
+  if it works for the oldest supported version (Python 3.8 currently). In rare cases
+  tests might fail with the oldest version when you use features that are available in newer Python
+  versions. For that purpose we have the ``airflow.compat`` package where we keep back-ported
+  useful features from newer versions.
+
+- Adhere to guidelines for commit messages described in this `article `__.
+  This makes the lives of those who come after you (and your future self) a lot easier.
+
+Experimental Requirement to resolve all conversations
+=====================================================
+
+In December 2023 we enabled - experimentally - the requirement to resolve all the open conversations in a
+PR in order to make it mergeable. You will see in the status of the PR that it needs to have all the
+conversations resolved before it can be merged.
+
+This is an experiment and we will evaluate it by the end of January 2024. If it turns out to be a good idea,
+we will keep it enabled in the future.
+
+The goal of this experiment is to make it easier to see when there are some conversations that are not
+resolved for everyone involved in the PR - author, reviewers and maintainers who try to figure out if
+the PR is ready to merge and - eventually - merge it. The goal is also to use conversations more as a "soft" way
+to request changes and limit the use of ``Request changes`` status to only those cases when the maintainer
+is sure that the PR should not be merged in the current state. That should lead to a faster review/merge
+cycle and fewer problems with stalled PRs that have ``Request changes`` status but all the issues are
+already solved (assuming that maintainers will start treating the conversations this way).
+
+.. 
_coding_style: + +Coding style and best practices +=============================== + +Most of our coding style rules are enforced programmatically by ruff and mypy, which are run automatically +with static checks and on every Pull Request (PR), but there are some rules that are not yet automated and +are more Airflow specific or semantic than style. + +Don't Use Asserts Outside Tests +------------------------------- + +Our community agreed that to various reasons we do not use ``assert`` in production code of Apache Airflow. +For details check the relevant `mailing list thread `_. + +In other words instead of doing: + +.. code-block:: python + + assert some_predicate() + +you should do: + +.. code-block:: python + + if not some_predicate(): + handle_the_case() + +The one exception to this is if you need to make an assert for type checking (which should be almost a last resort) you can do this: + +.. code-block:: python + + if TYPE_CHECKING: + assert isinstance(x, MyClass) + + +Database Session Handling +------------------------- + +**Explicit is better than implicit.** If a function accepts a ``session`` parameter it should not commit the +transaction itself. Session management is up to the caller. + +To make this easier, there is the ``create_session`` helper: + +.. code-block:: python + + from sqlalchemy.orm import Session + + from airflow.utils.session import create_session + + + def my_call(x, y, *, session: Session): + ... + # You MUST not commit the session here. + + + with create_session() as session: + my_call(x, y, session=session) + +.. warning:: + **DO NOT** add a default to the ``session`` argument **unless** ``@provide_session`` is used. + +If this function is designed to be called by "end-users" (i.e. DAG authors) then using the ``@provide_session`` wrapper is okay: + +.. code-block:: python + + from sqlalchemy.orm import Session + + from airflow.utils.session import NEW_SESSION, provide_session + + + @provide_session + def my_method(arg, *, session: Session = NEW_SESSION): + ... + # You SHOULD not commit the session here. The wrapper will take care of commit()/rollback() if exception + +In both cases, the ``session`` argument is a `keyword-only argument`_. This is the most preferred form if +possible, although there are some exceptions in the code base where this cannot be used, due to backward +compatibility considerations. In most cases, ``session`` argument should be last in the argument list. + +.. _`keyword-only argument`: https://www.python.org/dev/peps/pep-3102/ + + +Don't use time() for duration calculations +----------------------------------------- + +If you wish to compute the time difference between two events with in the same process, use +``time.monotonic()``, not ``time.time()`` nor ``timezone.utcnow()``. + +If you are measuring duration for performance reasons, then ``time.perf_counter()`` should be used. (On many +platforms, this uses the same underlying clock mechanism as monotonic, but ``perf_counter`` is guaranteed to be +the highest accuracy clock on the system, monotonic is simply "guaranteed" to not go backwards.) + +If you wish to time how long a block of code takes, use ``Stats.timer()`` -- either with a metric name, which +will be timed and submitted automatically: + +.. code-block:: python + + from airflow.stats import Stats + + ... + + with Stats.timer("my_timer_metric"): + ... + +or to time but not send a metric: + +.. code-block:: python + + from airflow.stats import Stats + + ... + + with Stats.timer() as timer: + ... 
+ + log.info("Code took %.3f seconds", timer.duration) + +For full docs on ``timer()`` check out `airflow/stats.py`_. + +If the start_date of a duration calculation needs to be stored in a database, then this has to be done using +datetime objects. In all other cases, using datetime for duration calculation MUST be avoided as creating and +diffing datetime operations are (comparatively) slow. + +----------- + +If you want to learn what are the options for your development environment, follow to the +`Development environments <06_development_environments.rst>`__ document. diff --git a/contributing-docs/06_development_environments.rst b/contributing-docs/06_development_environments.rst new file mode 100644 index 0000000000000..a99cffabae5da --- /dev/null +++ b/contributing-docs/06_development_environments.rst @@ -0,0 +1,160 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Development Environments +======================== + +There are two environments, available on Linux and macOS, that you can use to +develop Apache Airflow. + +.. contents:: :local: + +Local virtualenv Development Environment +---------------------------------------- + +All details about using and running local virtualenv environment for Airflow can be found +in `07_local_virtualenv.rst <07_local_virtualenv.rst>`__. + +Benefits: + +- Packages are installed locally. No container environment is required. +- You can benefit from local debugging within your IDE. You can follow the `Contributors quick start `__ + to set up your local virtualenv and connect your IDE with the environment. +- With the virtualenv in your IDE, you can benefit from auto completion and running tests directly from the IDE. + +Limitations: + +- You have to maintain your dependencies and local environment consistent with + other development environments that you have on your local machine. + +- You cannot run tests that require external components, such as mysql, + postgres database, hadoop, mongo, cassandra, redis, etc. + + The tests in Airflow are a mixture of unit and integration tests and some of + them require these components to be set up. Local virtualenv supports only + real unit tests. Technically, to run integration tests, you can configure + and install the dependencies on your own, but it is usually complex. + Instead, you are recommended to use + `Breeze development environment `__ with all required packages + pre-installed. + +- You need to make sure that your local environment is consistent with other + developer environments. This often leads to a "works for me" syndrome. The + Breeze container-based solution provides a reproducible environment that is + consistent with other developers. 
+ +- You are **STRONGLY** encouraged to also install and use `pre-commit hooks <08_static_code_checks.rst#pre-commit-hooks>`_ + for your local virtualenv development environment. + Pre-commit hooks can speed up your development cycle a lot. + +Typically you can connect your local virtualenv environments easily with your IDE +and use it for development: + +- `PyCharm/IntelliJ `__ quick start instructions +- `VSCode `__ quick start instructions + +Breeze Development Environment +------------------------------ + +All details about using and running Airflow Breeze can be found in +`Breeze `__. + +The Airflow Breeze solution is intended to ease your local development as "*It's +a Breeze to develop Airflow*". + +Benefits: + +- Breeze is a complete environment that includes external components, such as + mysql database, hadoop, mongo, cassandra, redis, etc., required by some of + Airflow tests. Breeze provides a pre-configured Docker Compose environment + where all these services are available and can be used by tests + automatically. + +- Breeze environment is almost the same as used in the CI automated builds. + So, if the tests run in your Breeze environment, they will work in the CI as well. + See `<../../CI.rst>`_ for details about Airflow CI. + +Limitations: + +- Breeze environment takes significant space in your local Docker cache. There + are separate environments for different Python and Airflow versions, and + each of the images takes around 3GB in total. + +- Though Airflow Breeze setup is automated, it takes time. The Breeze + environment uses pre-built images from DockerHub and it takes time to + download and extract those images. Building the environment for a particular + Python version takes less than 10 minutes. + +- Breeze environment runs in the background taking resources, such as disk space, CPU and memory. + You can stop the environment manually after you use it + or even use a ``bare`` environment to decrease resource usage. + +.. note:: + + Breeze CI images are not supposed to be used in production environments. + They are optimized for repeatability of tests, maintainability and speed of building rather + than production performance. For production purposes you should use DockerHub published + `PROD images `__ and customize/extend them as needed. + +Remote development environments +------------------------------- + +There are also remote development environments that you can use to develop Airflow: + +- `CodeSpaces `_ - a browser-based development + environment that you can use to develop Airflow in a browser. It is based on GitHub CodeSpaces and + is available for all GitHub users (free version has number of hours/month limitations). + +- `GitPod `_ - a browser-based development + environment that you can use to develop Airflow in a browser. It is based on GitPod and + is a paid service. 
+ + +When to use which environment +----------------------------- + +The table below summarizes differences between the environments: + ++--------------------------+----------------------------------+---------------------------------------+----------------------------------------+ +| **Property** | **Local virtualenv** | **Breeze environment** | **Remote environments** | ++==========================+==================================+=======================================+========================================+ +| Dev machine needed | - (-) You need a dev PC | - (-) You need a dev PC | (+) Works with remote setup | ++--------------------------+----------------------------------+---------------------------------------+----------------------------------------+ +| Test coverage | - (-) unit tests only | - (+) integration and unit tests | (*/-) integration tests (extra config) | ++--------------------------+----------------------------------+---------------------------------------+----------------------------------------+ +| Setup | - (+) automated with breeze cmd | - (+) automated with breeze cmd | (+) automated with CodeSpaces/GitPod | ++--------------------------+----------------------------------+---------------------------------------+----------------------------------------+ +| Installation difficulty | - (-) depends on the OS setup | - (+) works whenever Docker works | (+) works in a modern browser/VSCode | ++--------------------------+----------------------------------+---------------------------------------+----------------------------------------+ +| Team synchronization | - (-) difficult to achieve | - (+) reproducible within team | (+) reproducible within team | ++--------------------------+----------------------------------+---------------------------------------+----------------------------------------+ +| Reproducing CI failures | - (-) not possible in many cases | - (+) fully reproducible | (+) reproduce CI failures | ++--------------------------+----------------------------------+---------------------------------------+----------------------------------------+ +| Ability to update | - (-) requires manual updates | - (+) automated update via breeze cmd | (+/-) can be rebuild on demand | ++--------------------------+----------------------------------+---------------------------------------+----------------------------------------+ +| Disk space and CPU usage | - (+) relatively lightweight | - (-) uses GBs of disk and many CPUs | (-) integration tests (extra config) | ++--------------------------+----------------------------------+---------------------------------------+----------------------------------------+ +| IDE integration | - (+) straightforward | - (-) via remote debugging only | (-) integration tests (extra config) | ++--------------------------+----------------------------------+---------------------------------------+----------------------------------------+ + +Typically, you are recommended to use multiple of these environments depending on your needs. + + +----------- + +If you want to learn more details about setting up your local virtualenv, follow to the +`Local virtualenv <07_local_virtualenv.rst>`__ document. 
diff --git a/LOCAL_VIRTUALENV.rst b/contributing-docs/07_local_virtualenv.rst similarity index 88% rename from LOCAL_VIRTUALENV.rst rename to contributing-docs/07_local_virtualenv.rst index e80c0b95bfb1b..33e3428a810e7 100644 --- a/LOCAL_VIRTUALENV.rst +++ b/contributing-docs/07_local_virtualenv.rst @@ -16,8 +16,6 @@ specific language governing permissions and limitations under the License. -.. contents:: :local: - Local Virtual Environment (virtualenv) ====================================== @@ -28,28 +26,13 @@ harder to debug the tests and to use your IDE to run them. That's why we recommend using local virtualenv for development and testing. -The simplest way to install Airflow in local virtualenv is to use ``pip``: - -.. code:: bash - - pip install -e ".[devel,]" # for example: pip install -e ".[devel,google,postgres]" - -This will install Airflow in 'editable' mode - where sources of Airflow are taken directly from the source -code rather than moved to the installation directory. You need to run this command in the virtualenv you -want to install Airflow in - and you need to have the virtualenv activated. - -While you can use any virtualenv manager, we recommend using `Hatch `__ -as your development environment front-end, and we already use Hatch backend ``hatchling`` for Airflow. - -Hatchling is automatically installed when you build Airflow but since airflow build system uses -``PEP`` compliant ``pyproject.toml`` file, you can use any front-end build system that supports -``PEP 517`` and ``PEP 518``. You can also use ``pip`` to install Airflow in editable mode. +.. contents:: :local: -Prerequisites -============= +Installation in local virtualenv +-------------------------------- Required Software Packages --------------------------- +.......................... Use system-level package managers like yum, apt-get for Linux, or Homebrew for macOS to install required software packages: @@ -68,27 +51,28 @@ of required packages. - MacOs with ARM architectures require graphviz for venv setup, refer `here `_ to install graphviz - The helm chart tests need helm to be installed as a pre requisite. Refer `here `_ to install and setup helm -Extra Packages --------------- +Installing Airflow +.................. -.. note:: +The simplest way to install Airflow in local virtualenv is to use ``pip``: - Only ``pip`` installation is currently officially supported. - Make sure you have the latest pip installed, reference `version `_ +.. code:: bash - While there are some successes with using other tools like `poetry `_ or - `pip-tools `_, they do not share the same workflow as - ``pip`` - especially when it comes to constraint vs. requirements management. - Installing via ``Poetry`` or ``pip-tools`` is not currently supported. + pip install -e ".[devel,]" # for example: pip install -e ".[devel,google,postgres]" - There are known issues with ``bazel`` that might lead to circular dependencies when using it to install - Airflow. Please switch to ``pip`` if you encounter such problems. ``Bazel`` community works on fixing - the problem in `this PR `_ so it might be that - newer versions of ``bazel`` will handle it. +This will install Airflow in 'editable' mode - where sources of Airflow are taken directly from the source +code rather than moved to the installation directory. You need to run this command in the virtualenv you +want to install Airflow in - and you need to have the virtualenv activated. 
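+
+As an illustration, here is a minimal sketch of that flow - assuming you manage the virtualenv yourself with
+the standard ``venv`` module and run the commands from the root of your Airflow source checkout (the extras
+used here are only examples):
+
+.. code-block:: bash
+
+    # create and activate a virtualenv next to the sources
+    python -m venv .venv
+    source .venv/bin/activate
+
+    # install Airflow in editable mode with the devel extra
+    pip install -e ".[devel]"
+
+    # check that the import resolves to your checkout rather than site-packages
+    python -c "import airflow; print(airflow.__file__)"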
- If you wish to install airflow using those tools you should use the constraint files and convert - them to appropriate format and workflow that your tool requires. +While you can use any virtualenv manager, we recommend using `Hatch `__ +as your development environment front-end, and we already use Hatch backend ``hatchling`` for Airflow. +Hatchling is automatically installed when you build Airflow but since airflow build system uses +``PEP`` compliant ``pyproject.toml`` file, you can use any front-end build system that supports +``PEP 517`` and ``PEP 518``. You can also use ``pip`` to install Airflow in editable mode. + +Extras (optional dependencies) +.............................. You can also install extra packages (like ``[ssh]``, etc) via ``pip install -e [devel,EXTRA1,EXTRA2 ...]``. However, some of them may @@ -107,15 +91,33 @@ you should set LIBRARY\_PATH before running ``pip install``: export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/opt/openssl/lib/ -You are STRONGLY encouraged to also install and use `pre-commit hooks `_ +You are STRONGLY encouraged to also install and use `pre-commit hooks <08_static_code_checks.rst#pre-commit-hooks>`_ for your local virtualenv development environment. Pre-commit hooks can speed up your development cycle a lot. The full list of extras is available in ``_ and can be easily retrieved using hatch via +.. note:: + + Only ``pip`` installation is currently officially supported. + Make sure you have the latest pip installed, reference `version `_ + + While there are some successes with using other tools like `poetry `_ or + `pip-tools `_, they do not share the same workflow as + ``pip`` - especially when it comes to constraint vs. requirements management. + Installing via ``Poetry`` or ``pip-tools`` is not currently supported. + + There are known issues with ``bazel`` that might lead to circular dependencies when using it to install + Airflow. Please switch to ``pip`` if you encounter such problems. ``Bazel`` community works on fixing + the problem in `this PR `_ so it might be that + newer versions of ``bazel`` will handle it. + + If you wish to install airflow using those tools you should use the constraint files and convert + them to appropriate format and workflow that your tool requires. + Using Hatch -=========== +----------- Airflow uses `hatch `_ as a build and development tool of choice. It is one of popular build tools and environment managers for Python, maintained by the Python Packaging Authority. @@ -127,7 +129,7 @@ easily used by hatch to create your local venvs. This is not necessary for you t Airflow, but it is a convenient way to manage your local Python versions and virtualenvs. Installing Hatch ----------------- +................ You can install hat using various other ways (including Gui installers). @@ -160,7 +162,7 @@ or install all Python versions that are used in Airflow: hatch python install all Manage your virtualenvs with Hatch ----------------------------------- +.................................. Airflow has some pre-defined virtualenvs that you can use to develop and test airflow. You can see the list of available envs with: @@ -271,6 +273,34 @@ You can also build only ``wheel`` or ``sdist`` packages: hatch build -t wheel hatch build -t sdist +Local and Remote Debugging in IDE +--------------------------------- + +One of the great benefits of using the local virtualenv and Breeze is an option to run +local debugging in your IDE graphical interface. 
+ +When you run example DAGs, even if you run them using unit tests within IDE, they are run in a separate +container. This makes it a little harder to use with IDE built-in debuggers. +Fortunately, IntelliJ/PyCharm provides an effective remote debugging feature (but only in paid versions). +See additional details on +`remote debugging `_. + +You can set up your remote debugging session as follows: + +.. image:: images/setup_remote_debugging.png + :align: center + :alt: Setup remote debugging + +Note that on macOS, you have to use a real IP address of your host rather than the default +localhost because on macOS the container runs in a virtual machine with a different IP address. + +Make sure to configure source code mapping in the remote debugging configuration to map +your local sources to the ``/opt/airflow`` location of the sources within the container: + +.. image:: images/source_code_mapping_ide.png + :align: center + :alt: Source code mapping + Developing Providers -------------------- @@ -323,7 +353,7 @@ install the dependencies automatically when you create or switch to a developmen Installing recommended version of dependencies -============================================== +---------------------------------------------- Whatever virtualenv solution you use, when you want to make sure you are using the same version of dependencies as in main, you can install recommended version of the dependencies by using @@ -358,18 +388,17 @@ These are examples of the development options available with the local virtualen This document describes minimum requirements and instructions for using a standalone version of the local virtualenv. - Running Tests ------------- -Running tests is described in `TESTING.rst `_. +Running tests is described in `Testing documentation <09_testing.rst>`_. While most of the tests are typical unit tests that do not require external components, there are a number of Integration tests. You can technically use local virtualenv to run those tests, but it requires to set up all necessary dependencies for all the providers you are going to tests and also setup databases - and sometimes other external components (for integration test). -So, generally it should be easier to use the `Breeze `__ development environment +So, generally it should be easier to use the `Breeze `__ development environment (especially for Integration tests). @@ -384,3 +413,9 @@ the built-in Airflow command (however you needs a CLI client tool for each datab airflow db shell The command will explain what CLI tool is needed for the database you have configured. + + +----------- + +As the next step, it is important to learn about `Static code checks <08_static_code_checks.rst>`__.that are +used to automate code quality checks. Your code must pass the static code checks to get merged. diff --git a/STATIC_CODE_CHECKS.rst b/contributing-docs/08_static_code_checks.rst similarity index 91% rename from STATIC_CODE_CHECKS.rst rename to contributing-docs/08_static_code_checks.rst index e221b4afa1b8a..c49eea44302bf 100644 --- a/STATIC_CODE_CHECKS.rst +++ b/contributing-docs/08_static_code_checks.rst @@ -15,8 +15,6 @@ specific language governing permissions and limitations under the License. -.. contents:: :local: - Static code checks ================== @@ -26,8 +24,9 @@ All the static code checks can be run through pre-commit hooks. The pre-commit hooks perform all the necessary installation when you run them for the first time. 
See the table below to identify which pre-commit checks require the Breeze Docker images. -You can also run some `static code check `_ via -`Breeze `_ environment. +You can also run the checks via `Breeze `_ environment. + +.. contents:: :local: Pre-commit hooks ---------------- @@ -44,7 +43,7 @@ We have integrated the fantastic `pre-commit `__ framewo in our development workflow. To install and use it, you need at least Python 3.8 locally. Installing pre-commit hooks -........................... +--------------------------- It is the best to use pre-commit hooks when you have your local virtualenv for Airflow activated since then pre-commit hooks and other dependencies are @@ -62,12 +61,10 @@ temporarily when you commit your code with ``--no-verify`` switch or skip certai to much disturbing your local workflow. See `Available pre-commit checks <#available-pre-commit-checks>`_ and `Using pre-commit <#using-pre-commit>`_ -.. note:: Additional prerequisites might be needed - - The pre-commit hooks use several external linters that need to be installed before pre-commit is run. - Each of the checks installs its own environment, so you do not need to install those, but there are some - checks that require locally installed binaries. On Linux, you typically install - them with ``sudo apt install``, on macOS - with ``brew install``. +The pre-commit hooks use several external linters that need to be installed before pre-commit is run. +Each of the checks installs its own environment, so you do not need to install those, but there are some +checks that require locally installed binaries. On Linux, you typically install +them with ``sudo apt install``, on macOS - with ``brew install``. The current list of prerequisites is limited to ``xmllint``: @@ -77,7 +74,7 @@ The current list of prerequisites is limited to ``xmllint``: Some pre-commit hooks also require the Docker Engine to be configured as the static checks are executed in the Docker environment (See table in the `Available pre-commit checks <#available-pre-commit-checks>`_ . You should build the images -locally before installing pre-commit checks as described in `Breeze docs `__. +locally before installing pre-commit checks as described in `Breeze docs `__. Sometimes your image is outdated and needs to be rebuilt because some dependencies have been changed. In such cases, the Docker-based pre-commit will inform you that you should rebuild the image. @@ -86,7 +83,7 @@ In case you do not have your local images built, the pre-commit hooks fail and p instructions on what needs to be done. Enabling pre-commit hooks -......................... +------------------------- To turn on pre-commit checks for ``commit`` operations in git, enter: @@ -109,65 +106,11 @@ For details on advanced usage of the install method, use: pre-commit install --help Available pre-commit checks -........................... +--------------------------- This table lists pre-commit hooks used by Airflow. The ``Image`` column indicates which hooks require Breeze Docker image to be built locally. -.. note:: Manual pre-commits - - Most of the checks we run are configured to run automatically when you commit the code. However, - there are some checks that are not run automatically and you need to run them manually. Those - checks are marked with ``manual`` in the ``Description`` column in the table below. You can run - them manually by running ``pre-commit run --hook-stage manual ``. - -.. 
note:: Disabling particular checks - - In case you have a problem with running particular ``pre-commit`` check you can still continue using the - benefits of having ``pre-commit`` installed, with some of the checks disabled. In order to disable - checks you might need to set ``SKIP`` environment variable to coma-separated list of checks to skip. For example, - when you want to skip some checks (ruff/mypy for example), you should be able to do it by setting - ``export SKIP=ruff,mypy-core,``. You can also add this to your ``.bashrc`` or ``.zshrc`` if you - do not want to set it manually every time you enter the terminal. - - In case you do not have breeze image configured locally, you can also disable all checks that require breeze - the image by setting ``SKIP_BREEZE_PRE_COMMITS`` to "true". This will mark the tests as "green" automatically - when run locally (note that those checks will anyway run in CI). - -.. note:: Mypy checks - - When we run mypy checks locally when committing a change, one of the ``mypy-*`` checks is run, ``mypy-airflow``, - ``mypy-dev``, ``mypy-providers``, ``mypy-docs``, depending on the files you are changing. The mypy checks - are run by passing those changed files to mypy. This is way faster than running checks for all files (even - if mypy cache is used - especially when you change a file in airflow core that is imported and used by many - files). However, in some cases, it produces different results than when running checks for the whole set - of files, because ``mypy`` does not even know that some types are defined in other files and it might not - be able to follow imports properly if they are dynamic. Therefore in CI we run ``mypy`` check for whole - directories (``airflow`` - excluding providers, ``providers``, ``dev`` and ``docs``) to make sure - that we catch all ``mypy`` errors - so you can experience different results when running mypy locally and - in CI. If you want to run mypy checks for all files locally, you can do it by running the following - command (example for ``airflow`` files): - - .. code-block:: bash - - pre-commit run --hook-stage manual mypy- --all-files - - For example: - - .. code-block:: bash - - pre-commit run --hook-stage manual mypy-airflow --all-files - - -.. note:: Mypy volume cache - - MyPy uses a separate docker-volume (called ``mypy-cache-volume``) that keeps the cache of last MyPy - execution in order to speed MyPy checks up (sometimes by order of magnitude). While in most cases MyPy - will handle refreshing the cache when and if needed, there are some cases when it won't (cache invalidation - is the hard problem in computer science). This might happen for example when we upgrade MyPY. In such - cases you might need to manually remove the cache volume by running ``breeze down --cleanup-mypy-cache``. - - .. BEGIN AUTO-GENERATED STATIC CHECK LIST +-----------------------------------------------------------+--------------------------------------------------------------+---------+ @@ -422,7 +365,7 @@ require Breeze Docker image to be built locally. .. END AUTO-GENERATED STATIC CHECK LIST Using pre-commit -................ +---------------- After installation, pre-commit hooks are run automatically when you commit the code. But you can run pre-commit hooks manually as needed. @@ -479,6 +422,59 @@ You can always skip running the tests by providing ``--no-verify`` flag to the To check other usage types of the pre-commit framework, see `Pre-commit website `__. 
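+
+As an illustration, here are a couple of typical manual invocations (the hook ids and file paths below are
+only examples - check ``.pre-commit-config.yaml`` for the exact hook names):
+
+.. code-block:: bash
+
+    # run a single hook against all files in the repository
+    pre-commit run ruff --all-files
+
+    # run all applicable hooks only against the files you changed
+    pre-commit run --files airflow/models/dag.py tests/models/test_dag.py
+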
+Disabling particular checks
+---------------------------
+
+In case you have a problem with running a particular ``pre-commit`` check, you can still continue using the
+benefits of having ``pre-commit`` installed, with some of the checks disabled. In order to disable
+checks you might need to set the ``SKIP`` environment variable to a comma-separated list of checks to skip. For example,
+when you want to skip some checks (ruff/mypy for example), you should be able to do it by setting
+``export SKIP=ruff,mypy-core,``. You can also add this to your ``.bashrc`` or ``.zshrc`` if you
+do not want to set it manually every time you enter the terminal.
+
+In case you do not have the breeze image configured locally, you can also disable all checks that require
+the breeze image by setting ``SKIP_BREEZE_PRE_COMMITS`` to "true". This will mark the tests as "green" automatically
+when run locally (note that those checks will anyway run in CI).
+
+Manual pre-commits
+------------------
+
+Most of the checks we run are configured to run automatically when you commit the code. However,
+there are some checks that are not run automatically and you need to run them manually. Those
+checks are marked with ``manual`` in the ``Description`` column in the table above. You can run
+them manually by running ``pre-commit run --hook-stage manual <hook-id>``.
+
+Mypy checks
+-----------
+
+When we run mypy checks locally when committing a change, one of the ``mypy-*`` checks is run - ``mypy-airflow``,
+``mypy-dev``, ``mypy-providers`` or ``mypy-docs`` - depending on the files you are changing. The mypy checks
+are run by passing those changed files to mypy. This is way faster than running checks for all files (even
+if the mypy cache is used - especially when you change a file in the airflow core that is imported and used by many
+files). However, in some cases, it produces different results than when running checks for the whole set
+of files, because ``mypy`` does not even know that some types are defined in other files and it might not
+be able to follow imports properly if they are dynamic. Therefore in CI we run ``mypy`` checks for whole
+directories (``airflow`` - excluding providers, ``providers``, ``dev`` and ``docs``) to make sure
+that we catch all ``mypy`` errors - so you can experience different results when running mypy locally and
+in CI. If you want to run mypy checks for all files locally, you can do it by running the following
+command (example for ``airflow`` files):
+
+.. code-block:: bash
+
+    pre-commit run --hook-stage manual mypy-<folder> --all-files
+
+For example:
+
+.. code-block:: bash
+
+    pre-commit run --hook-stage manual mypy-airflow --all-files
+
+MyPy uses a separate docker-volume (called ``mypy-cache-volume``) that keeps the cache of the last MyPy
+execution in order to speed MyPy checks up (sometimes by an order of magnitude). While in most cases MyPy
+will handle refreshing the cache when and if needed, there are some cases when it won't (cache invalidation
+is the hard problem in computer science). This might happen for example when we upgrade MyPy. In such
+cases you might need to manually remove the cache volume by running ``breeze down --cleanup-mypy-cache``.
+
 Running static code checks via Breeze
 -------------------------------------
@@ -571,3 +567,8 @@ Just performing dry run:
 .. code-block:: bash

     DRY_RUN="true" pre-commit run --verbose ruff
+
+-----------
+
+Once your code passes all the static code checks, you should take a look at `Testing documentation <09_testing.rst>`__
+to learn about various ways to test the code.
diff --git a/contributing-docs/09_testing.rst b/contributing-docs/09_testing.rst new file mode 100644 index 0000000000000..8500ded3d5b76 --- /dev/null +++ b/contributing-docs/09_testing.rst @@ -0,0 +1,56 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Airflow Test Infrastructure +=========================== + +* `Unit tests `__ are Python tests that do not require any additional integrations. + Unit tests are available both in the `Breeze environment <../dev/breeze/doc/README.rst>`__ + and `local virtualenv <07_local_virtualenv.rst>`__. More about unit tests. + +* `Integration tests `__ are available in the + `Breeze environment <../dev/breeze/doc/README.rst>`__ that is also used for Airflow CI tests. + Integration tests are special tests that require additional services running, such as Postgres, + MySQL, Kerberos, etc. + +* `Docker Compose tests `__ are tests we run to check if our quick + start docker-compose works. + +* `Kubernetes tests `__ are tests we run to check if our Kubernetes + deployment and Kubernetes Pod Operator works. + +* `Helm unit tests `__ are tests we run to verify if Helm Chart is + rendered correctly for various configuration parameters. + +* `System tests `__ are automatic tests that use external systems like + Google Cloud and AWS. These tests are intended for an end-to-end DAG execution. + +You can also run other kind of tests when you are developing airflow packages: + +* `Testing packages `__ is a document that describes how to + manually build and test pre-release candidate packages of airflow and providers. + +* `DAG testing `__ is a document that describes how to test DAGs in a local environment + with ``DebugExecutor``. Note, that this is a legacy method - you can now use dag.test() method to test DAGs. + +------ + +You can learn how to `build documentation <../docs/README.rst>`__ as you will likely need to update +documentation as part of your PR. + +You can also learn about `working with git <10_working_with_git.rst>`__ as you will need to understand how +git branching works and how to rebase your PR. diff --git a/contributing-docs/10_working_with_git.rst b/contributing-docs/10_working_with_git.rst new file mode 100644 index 0000000000000..4d0e4b9bd4250 --- /dev/null +++ b/contributing-docs/10_working_with_git.rst @@ -0,0 +1,199 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. 
Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + + +Working with Git +================ + +In this document you can learn the basics of how you should use Git in the Airflow project. It explains the branching model and stresses +that we are using the rebase workflow. It also explains how to sync your fork with the main repository. + +.. contents:: :local: + +Airflow Git Branches +==================== + +All new development in Airflow happens in the ``main`` branch. All PRs should target that branch. + +We also have ``v2-*-test`` branches that are used to test the ``2.*.x`` series of Airflow and where maintainers +cherry-pick selected commits from the main branch. + +Cherry-picking is done with the ``-x`` flag. + +The ``v2-*-test`` branch might be broken at times during testing. Expect force-pushes there, so +maintainers should coordinate between themselves on who is working on the ``v2-*-test`` branch - +usually these are developers with release manager permissions. + +The ``v2-*-stable`` branch is rather stable - there are minimal changes coming from approved PRs that +passed the tests. This means that the branch is rather, well, "stable". + +Once the ``v2-*-test`` branch stabilizes, the ``v2-*-stable`` branch is synchronized with ``v2-*-test``. +The ``v2-*-stable`` branches are used to release ``2.*.x`` releases. + +The general approach is that cherry-picking a commit that has already had a PR and unit tests run +against main is done to ``v2-*-test`` branches, but PRs from contributors towards 2.0 should target +``v2-*-stable`` branches. + +The ``v2-*-test`` branches and ``v2-*-stable`` ones are merged just before the release and that's the +time when they converge. + +The production images are released in DockerHub from: + +* main branch for development +* ``2.*.*``, ``2.*.*rc*`` releases from the ``v2-*-stable`` branch when we prepare release candidates and + final releases. + +How to sync your fork +===================== + +When you have your fork, you should periodically synchronize the main of your fork with the +Apache Airflow main. In order to do that you can ``git pull --rebase`` to your local git repository from +the apache remote and push the main (often with ``--force``) to your fork. There is also an easy +way to sync your fork in GitHub's web UI with the `Fetch upstream feature +`_. + +This will force-push the ``main`` branch from ``apache/airflow`` to the ``main`` branch +in your fork. Note that in case you modified the main in your fork, you might lose those changes. + + +How to rebase PR +================ + +A lot of people are unfamiliar with the rebase workflow in Git, but we think it is an excellent workflow, +providing a better alternative to the merge workflow. We've therefore written a short guide for those who +would like to learn it. + + +As of February 2022, GitHub introduced the capability of "Update with Rebase" which makes it easy to perform +a rebase straight in the GitHub UI, so in cases when there are no conflicts, rebasing to the latest version +of ``main`` can be done very easily following the instructions +`in the GitHub blog `_ + ..
image:: images/rebase.png + :align: center + :alt: Update PR with rebase + +However, when you have conflicts, sometimes you will have to perform rebase manually, and resolve the +conflicts, and remainder of the section describes how to approach it. + +As opposed to the merge workflow, the rebase workflow allows us to clearly separate your changes from the +changes of others. It puts the responsibility of rebasing on the +author of the change. It also produces a "single-line" series of commits on the main branch. This +makes it easier to understand what was going on and to find reasons for problems (it is especially +useful for "bisecting" when looking for a commit that introduced some bugs). + +First of all, we suggest you read about the rebase workflow here: +`Merging vs. rebasing `_. This is an +excellent article that describes all the ins/outs of the rebase workflow. I recommend keeping it for future reference. + +The goal of rebasing your PR on top of ``apache/main`` is to "transplant" your change on top of +the latest changes that are merged by others. It also allows you to fix all the conflicts +that arise as a result of other people changing the same files as you and merging the changes to ``apache/main``. + +Here is how rebase looks in practice (you can find a summary below these detailed steps): + +1. You first need to add the Apache project remote to your git repository. This is only necessary once, +so if it's not the first time you are following this tutorial you can skip this step. In this example, +we will be adding the remote +as "apache" so you can refer to it easily: + +* If you use ssh: ``git remote add apache git@github.com:apache/airflow.git`` +* If you use https: ``git remote add apache https://github.com/apache/airflow.git`` + +2. You then need to make sure that you have the latest main fetched from the ``apache`` repository. You can do this + via: + + ``git fetch apache`` (to fetch apache remote) + + ``git fetch --all`` (to fetch all remotes) + +3. Assuming that your feature is in a branch in your repository called ``my-branch`` you can easily check + what is the base commit you should rebase from by: + + ``git merge-base my-branch apache/main`` + + This will print the HASH of the base commit which you should use to rebase your feature from. + For example: ``5abce471e0690c6b8d06ca25685b0845c5fd270f``. Copy that HASH and go to the next step. + + Optionally, if you want better control you can also find this commit hash manually. + + Run: + + ``git log`` + + And find the first commit that you DO NOT want to "transplant". + + Performing: + + ``git rebase HASH`` + + Will "transplant" all commits after the commit with the HASH. + +4. Providing that you weren't already working on your branch, check out your feature branch locally via: + + ``git checkout my-branch`` + +5. Rebase: + + ``git rebase HASH --onto apache/main`` + + For example: + + ``git rebase 5abce471e0690c6b8d06ca25685b0845c5fd270f --onto apache/main`` + +6. If you have no conflicts - that's cool. You rebased. You can now run ``git push --force-with-lease`` to + push your changes to your repository. That should trigger the build in our CI if you have a + Pull Request (PR) opened already. + +7. While rebasing you might have conflicts. Read carefully what git tells you when it prints information + about the conflicts. You need to solve the conflicts manually. 
This is sometimes the most difficult + part and requires deliberately correcting your code and looking at what has changed since you developed your + changes. + + There are various tools that can help you with this. You can use: + + ``git mergetool`` + + You can configure different merge tools with it. You can also use IntelliJ/PyCharm's excellent merge tool. + When you open a project in PyCharm which has conflicts, you can go to VCS > Git > Resolve Conflicts and there + you have a very intuitive and helpful merge tool. For more information, see + `Resolve conflicts `_. + +8. After you've solved your conflict run: + + ``git rebase --continue`` + + And go either to point 6. or 7, depending on whether you have more commits that cause conflicts in your PR (rebasing applies each + commit from your PR one-by-one). + + + +Summary +------------- + +Useful when you understand the flow but don't remember the steps and want a quick reference. + +``git fetch --all`` +``git merge-base my-branch apache/main`` +``git checkout my-branch`` +``git rebase HASH --onto apache/main`` +``git push --force-with-lease`` + +------- + +Now, once you know it all you can read more about how Airflow repository is a monorepo containing both airflow package and +more than 80 `provider packages <11_provider_packages.rst>`__ and how to develop providers. diff --git a/contributing-docs/11_provider_packages.rst b/contributing-docs/11_provider_packages.rst new file mode 100644 index 0000000000000..ffc6d40a3e78f --- /dev/null +++ b/contributing-docs/11_provider_packages.rst @@ -0,0 +1,233 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Provider packages +================= + +Airflow 2.0 is split into core and providers. They are delivered as separate packages: + +* ``apache-airflow`` - core of Apache Airflow +* ``apache-airflow-providers-*`` - More than 70 provider packages to communicate with external services + +.. contents:: :local: + +Where providers are kept in our repository +------------------------------------------ + +Airflow Providers are stored in the same source tree as Airflow Core (under ``airflow.providers``) package. This +means that Airflow's repository is a monorepo, that keeps multiple packages in a single repository. This has a number +of advantages, because code and CI infrastructure and tests can be shared. Also contributions are happening to a +single repository - so no matter if you contribute to Airflow or Providers, you are contributing to the same +repository and project. + +It has also some disadvantages as this introduces some coupling between those - so contributing to providers might +interfere with contributing to Airflow. 
The Python ecosystem does not yet have proper monorepo support for keeping +several packages in one repository and being able to work on multiple of them at the same time, but we have +high hopes that the Hatch project, which we use as our recommended packaging frontend, +will `solve this problem in the future `__ + +Therefore, until we can introduce multiple ``pyproject.toml`` files for providers, information/meta-data about the providers +is kept in the ``provider.yaml`` file in the right sub-directory of ``airflow/providers``. This file contains: + +* package name (``apache-airflow-provider-*``) +* user-facing name of the provider package +* description of the package that is available in the documentation +* list of versions of the package that have been released so far +* list of dependencies of the provider package +* list of additional-extras that the provider package provides (together with dependencies of those extras) +* list of integrations, operators, hooks, sensors, transfers provided by the provider (useful for documentation generation) +* list of connection types, extra-links, secret backends, auth backends, and logging handlers (useful to both + register them as they are needed by Airflow and to include them in documentation automatically). +* and more ... + +If you want to add dependencies to the provider, you should add them to the corresponding ``provider.yaml`` +and Airflow pre-commits and package generation commands will use them when preparing package information. + +In Airflow 2.0, providers are separated out, and not packaged together with the core when +you build the "apache-airflow" package; however, when you install the airflow project in editable +mode with ``pip install -e ".[devel]"`` they are available in the same environment as Airflow. + +You should only update dependencies for the provider in the corresponding ``provider.yaml`` which is the +source of truth for all information about the provider. + +Some of the packages have cross-dependencies with other provider packages. This typically happens for +transfer operators where operators use hooks from the other providers in case they are transferring +data between the providers. The list of dependencies is maintained (automatically with the +``update-providers-dependencies`` pre-commit) in the ``generated/provider_dependencies.json``. +The same pre-commit also updates the generated dependencies in ``pyproject.toml``. + +Cross-dependencies between provider packages are converted into extras - if you need functionality from +the other provider package you can install it by adding [extra] after the +``apache-airflow-providers-PROVIDER`` for example: +``pip install apache-airflow-providers-google[amazon]`` in case you want to use GCP +transfer operators from Amazon ECS. + +If you add a new dependency between different provider packages, it will be detected automatically, +and pre-commit will generate a new entry in ``generated/provider_dependencies.json`` and update +``pyproject.toml`` so that the package extra dependencies are properly handled when the package +is installed - whether when breeze is restarted, by your IDE, or by running ``pip install -e ".[devel]"``. + + +Developing community managed provider packages +---------------------------------------------- + +While you can develop your own providers, Apache Airflow has 60+ providers that are managed by the community.
+They are part of the same repository as Apache Airflow (we use a ``monorepo`` approach where different +parts of the system are developed in the same repository but then they are packaged and released separately). +All the community-managed providers are in the 'airflow/providers' folder and they are all sub-packages of +the 'airflow.providers' package. All the providers are available as ``apache-airflow-providers-`` +packages when installed by users, but when you contribute to providers you can work on airflow main +and install provider dependencies via ``editable`` extras - without having to manage and install providers +separately, you can easily run tests for the providers and when you run airflow from the ``main`` +sources, all community providers are automatically available for you. + +The capabilities of the community-managed providers are the same as the third-party ones. When +the providers are installed from PyPI, they provide the entry-point containing the metadata as described +in the previous chapter. However, when they are locally developed, together with Airflow, the mechanism +of discovery of the providers is based on the ``provider.yaml`` file that is placed in the top-folder of +the provider. The ``provider.yaml`` is the single source of truth for the provider metadata and it is +there where you should add and remove dependencies for providers (followed by running the +``update-providers-dependencies`` pre-commit to synchronize the dependencies with ``pyproject.toml`` +of Airflow). + +The ``provider.yaml`` file is compliant with the schema that is available in +`json-schema specification `_. + +Thanks to that mechanism, you can develop community managed providers in a seamless way directly from +Airflow sources, without preparing and releasing them as packages separately, which would be rather +complicated. + +Regardless of whether you plan to contribute your provider, when you are developing your own, custom providers, +you can use the above functionality to make your development easier. You can add your provider +as a sub-folder of the ``airflow.providers`` package, add the ``provider.yaml`` file and install airflow +in development mode - then the capabilities of your provider will be discovered by airflow and you will see +the provider among other providers in the ``airflow providers`` command output. + +Naming Conventions for provider packages +---------------------------------------- + +In Airflow 2.0 we standardized and enforced naming for provider packages, modules and classes. +Those rules (introduced as AIP-21) are enforced using automated checks +that verify if the naming conventions are followed. Here is a brief summary of the rules; for a +detailed discussion you can go to `AIP-21 Changes in import paths `_ + +The rules are as follows: + +* Provider packages are all placed in 'airflow.providers' + +* Providers are usually direct sub-packages of the 'airflow.providers' package but in some cases they can be + further split into sub-packages (for example the 'apache' package has 'cassandra', 'druid' ... providers) out + of which several different provider packages are produced (apache.cassandra, apache.druid). This is the + case when the providers are connected under a common umbrella but very loosely coupled on the code level. + +* In some cases the package can have sub-packages but they are all delivered as a single provider + package (for example the 'google' package contains 'ads', 'cloud' etc. sub-packages).
This is in case + the providers are connected under common umbrella and they are also tightly coupled on the code level. + +* Typical structure of provider package: + * example_dags -> example DAGs are stored here (used for documentation and System Tests) + * hooks -> hooks are stored here + * operators -> operators are stored here + * sensors -> sensors are stored here + * secrets -> secret backends are stored here + * transfers -> transfer operators are stored here + +* Module names do not contain word "hooks", "operators" etc. The right type comes from + the package. For example 'hooks.datastore' module contains DataStore hook and 'operators.datastore' + contains DataStore operators. + +* Class names contain 'Operator', 'Hook', 'Sensor' - for example DataStoreHook, DataStoreExportOperator + +* Operator name usually follows the convention: ``Operator`` + (BigQueryExecuteQueryOperator) is a good example + +* Transfer Operators are those that actively push data from one service/provider and send it to another + service (might be for the same or another provider). This usually involves two hooks. The convention + for those ``ToOperator``. They are not named *TransferOperator nor *Transfer. + +* Operators that use external service to perform transfer (for example CloudDataTransferService operators + are not placed in "transfers" package and do not have to follow the naming convention for + transfer operators. + +* It is often debatable where to put transfer operators but we agreed to the following criteria: + + * We use "maintainability" of the operators as the main criteria - so the transfer operator + should be kept at the provider which has highest "interest" in the transfer operator + + * For Cloud Providers or Service providers that usually means that the transfer operators + should land at the "target" side of the transfer + +* Secret Backend name follows the convention: ``Backend``. + +* Tests are grouped in parallel packages under "tests.providers" top level package. Module name is usually + ``test_.py``, + +* System tests (not yet fully automated but allowing to run e2e testing of particular provider) are + named with _system.py suffix. + +Documentation for the community managed providers +------------------------------------------------- + +When you are developing a community-managed provider, you are supposed to make sure it is well tested +and documented. Part of the documentation is ``provider.yaml`` file ``integration`` information and +``version`` information. This information is stripped-out from provider info available at runtime, +however it is used to automatically generate documentation for the provider. + +If you have pre-commits installed, pre-commit will warn you and let you know what changes need to be +done in the ``provider.yaml`` file when you add a new Operator, Hooks, Sensor or Transfer. You can +also take a look at the other ``provider.yaml`` files as examples. + +Well documented provider contains those: + +* index.rst with references to packages, API used and example dags +* configuration reference +* class documentation generated from PyDoc in the code +* example dags +* how-to guides + +You can see for example ``google`` provider which has very comprehensive documentation: + +* `Documentation <../docs/apache-airflow-providers-google>`_ +* `System tests/Example DAGs <../tests/system/providers>`_ + +Part of the documentation are example dags (placed in the ``tests/system`` folder). 
The reason why +they are in ``tests/system`` is that we are using the example dags for various purposes: + +* showing real examples of how your provider classes (Operators/Sensors/Transfers) can be used +* snippets of the examples are embedded in the documentation via the ``exampleinclude::`` directive +* examples are executable as system tests and some of our stakeholders run them regularly to + check if ``system`` level integration is still working, before releasing a new version of the provider. + +Testing the community managed providers +--------------------------------------- + +We have high requirements when it comes to testing the community managed providers. We have to be sure +that we have enough coverage and ways to test for regressions before the community accepts such +providers. + +* Unit tests have to be comprehensive and they should test for possible regressions and edge cases, + not only the "green path" + +* Integration tests where 'local' integration with a component is possible (for example tests with + MySQL/Postgres DB/Trino/Kerberos all have integration tests which run with real, dockerized components) + +* System Tests which provide end-to-end testing, usually testing together several operators, sensors, + transfers connecting to a real external system + +------ + +You can read about airflow `dependencies and extras <12_airflow_dependencies_and_extras.rst>`_. diff --git a/contributing-docs/12_airflow_dependencies_and_extras.rst b/contributing-docs/12_airflow_dependencies_and_extras.rst new file mode 100644 index 0000000000000..c9124d1ccbd6f --- /dev/null +++ b/contributing-docs/12_airflow_dependencies_and_extras.rst @@ -0,0 +1,222 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Airflow dependencies +==================== + +Airflow is not a standard python project. Most python projects fall into one of two types - +application or library. As described in +`this StackOverflow question `_, +the decision whether to pin (freeze) dependency versions for a python project depends on the type. For +applications, dependencies should be pinned, but for libraries, they should be open. + +For applications, pinning the dependencies makes it more stable to install in the future - because new +(even transitive) dependencies might cause installation to fail. For libraries - the dependencies should +be open to allow several different libraries with the same requirements to be installed at the same time. + +The problem is that Apache Airflow is a bit of both - an application to install and a library to be used when +you are developing your own operators and DAGs. + +This - seemingly unsolvable - puzzle is solved by having pinned constraints files. + +.. contents:: :local: + +Pinned constraint files +----------------------- + +..
note:: + + Only ``pip`` installation is officially supported. + + While it is possible to install Airflow with tools like `poetry `_ or + `pip-tools `_, they do not share the same workflow as + ``pip`` - especially when it comes to constraint vs. requirements management. + Installing via ``Poetry`` or ``pip-tools`` is not currently supported. + + There are known issues with ``bazel`` that might lead to circular dependencies when using it to install + Airflow. Please switch to ``pip`` if you encounter such problems. The ``Bazel`` community added support + for cycles in `this PR `_ so it might be that + newer versions of ``bazel`` will handle it. + + If you wish to install airflow using these tools you should use the constraint files and convert + them to appropriate format and workflow that your tool requires. + + +By default when you install ``apache-airflow`` package - the dependencies are as open as possible while +still allowing the ``apache-airflow`` package to install. This means that the ``apache-airflow`` package +might fail to install when a direct or transitive dependency is released that breaks the installation. +In that case, when installing ``apache-airflow``, you might need to provide additional constraints (for +example ``pip install apache-airflow==1.10.2 Werkzeug<1.0.0``) + +There are several sets of constraints we keep: + +* 'constraints' - these are constraints generated by matching the current airflow version from sources + and providers that are installed from PyPI. Those are constraints used by the users who want to + install airflow with pip, they are named ``constraints-.txt``. + +* "constraints-source-providers" - these are constraints generated by using providers installed from + current sources. While adding new providers their dependencies might change, so this set of providers + is the current set of the constraints for airflow and providers from the current main sources. + Those providers are used by CI system to keep "stable" set of constraints. They are named + ``constraints-source-providers-.txt`` + +* "constraints-no-providers" - these are constraints generated from only Apache Airflow, without any + providers. If you want to manage airflow separately and then add providers individually, you can + use them. Those constraints are named ``constraints-no-providers-.txt``. + +The first two can be used as constraints file when installing Apache Airflow in a repeatable way. +It can be done from the sources: + +from the PyPI package: + +.. code-block:: bash + + pip install apache-airflow[google,amazon,async]==2.2.5 \ + --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.5/constraints-3.8.txt" + +The last one can be used to install Airflow in "minimal" mode - i.e when bare Airflow is installed without +extras. + +When you install airflow from sources (in editable mode) you should use "constraints-source-providers" +instead (this accounts for the case when some providers have not yet been released and have conflicting +requirements). + +.. code-block:: bash + + pip install -e ".[devel]" \ + --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-3.8.txt" + + +This also works with extras - for example: + +.. 
code-block:: bash + + pip install ".[ssh]" \ + --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-source-providers-3.8.txt" + + +There are different sets of fixed constraint files for different python major/minor versions and you should +use the right file for the right python version. + +If you want to update just the Airflow dependencies, without paying attention to providers, you can do it +using ``constraints-no-providers`` constraint files as well. + +.. code-block:: bash + + pip install . --upgrade \ + --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-no-providers-3.8.txt" + + +The ``constraints-.txt`` and ``constraints-no-providers-.txt`` +will be automatically regenerated by the CI job every time after the ``pyproject.toml`` is updated and pushed +if the tests are successful. + + +.. note:: + + Only ``pip`` installation is currently officially supported. + + While there are some successes with using other tools like `poetry `_ or + `pip-tools `_, they do not share the same workflow as + ``pip`` - especially when it comes to constraint vs. requirements management. + Installing via ``Poetry`` or ``pip-tools`` is not currently supported. + + There are known issues with ``bazel`` that might lead to circular dependencies when using it to install + Airflow. Please switch to ``pip`` if you encounter such problems. The ``Bazel`` community works on fixing + the problem in `this PR `_ so it might be that + newer versions of ``bazel`` will handle it. + + If you wish to install airflow using these tools you should use the constraint files and convert + them to the appropriate format and workflow that your tool requires. + + +Optional dependencies (extras) +------------------------------ + +There are a number of extras that can be specified when installing Airflow. Those +extras can be specified after the usual pip install - for example ``pip install -e .[ssh]`` for editable +installation. Note that there are two kinds of extras - ``regular`` extras (used when you install +airflow as a user), but in ``editable`` mode you can also install ``devel`` extras that are necessary if +you want to run airflow locally for testing and ``doc`` extras that install tools needed to build +the documentation. + +This is the full list of these extras: + +Devel extras +............. + +The ``devel`` extras are not available in the released packages. They are only available when you install +Airflow from sources in ``editable`` installation - i.e. one that you are usually using to contribute to +Airflow. They provide tools such as ``pytest`` and ``mypy`` for general purpose development and testing; also, +some providers have their own development-related extras that allow you to install tools necessary to run tests, +where the tools are specific to the provider. + + + .. START DEVEL EXTRAS HERE + +devel, devel-all, devel-all-dbs, devel-ci, devel-debuggers, devel-devscripts, devel-duckdb, devel- +hadoop, devel-mypy, devel-sentry, devel-static-checks, devel-tests + + .. END DEVEL EXTRAS HERE + +Doc extras +........... + +The ``doc`` extras are not available in the released packages. They are only available when you install +Airflow from sources in ``editable`` installation - i.e. one that you are usually using to contribute to +Airflow.
They provide tools needed when you want to build Airflow documentation (note that you also need +``devel`` extras installed for airflow and providers in order to build documentation for airflow and +provider packages respectively). The ``doc`` package is enough to build regular documentation, where +``doc_gen`` is needed to generate ER diagram we have describing our database. + + .. START DOC EXTRAS HERE + +doc, doc-gen + + .. END DOC EXTRAS HERE + + +Regular extras +.............. + +Those extras are available as regular Airflow extras and are targeted to be used by Airflow users and +contributors to select features of Airflow they want to use They might install additional providers or +just install dependencies that are necessary to enable the feature. + + .. START REGULAR EXTRAS HERE + +aiobotocore, airbyte, alibaba, all, all-core, all-dbs, amazon, apache-atlas, apache-beam, apache- +cassandra, apache-drill, apache-druid, apache-flink, apache-hdfs, apache-hive, apache-impala, +apache-kafka, apache-kylin, apache-livy, apache-pig, apache-pinot, apache-spark, apache-webhdfs, +apprise, arangodb, asana, async, atlas, atlassian-jira, aws, azure, cassandra, celery, cgroups, +cloudant, cncf-kubernetes, cohere, common-io, common-sql, crypto, databricks, datadog, dbt-cloud, +deprecated-api, dingding, discord, docker, druid, elasticsearch, exasol, fab, facebook, ftp, gcp, +gcp_api, github, github-enterprise, google, google-auth, graphviz, grpc, hashicorp, hdfs, hive, +http, imap, influxdb, jdbc, jenkins, kerberos, kubernetes, ldap, leveldb, microsoft-azure, +microsoft-mssql, microsoft-psrp, microsoft-winrm, mongo, mssql, mysql, neo4j, odbc, openai, +openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty, pandas, papermill, password, +pgvector, pinecone, pinot, postgres, presto, rabbitmq, redis, s3, s3fs, salesforce, samba, saml, +segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake, spark, sqlite, ssh, statsd, +tableau, tabular, telegram, trino, vertica, virtualenv, weaviate, webhdfs, winrm, yandex, zendesk + + .. END REGULAR EXTRAS HERE + + +----- + +You can now check how to update Airflow's `metadata database <13_metadata_database_updates.rst>`__ if you need +to update structure of the DB. diff --git a/contributing-docs/13_metadata_database_updates.rst b/contributing-docs/13_metadata_database_updates.rst new file mode 100644 index 0000000000000..fc82e0d8bde8e --- /dev/null +++ b/contributing-docs/13_metadata_database_updates.rst @@ -0,0 +1,53 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Metadata Database Updates +========================= + +When developing features, you may need to persist information to the metadata +database. Airflow has `Alembic `__ built-in +module to handle all schema changes. 
Alembic must be installed on your +development machine before continuing with a migration. + + +.. code-block:: bash + + # starting at the root of the project + $ pwd + ~/airflow + # change to the airflow directory + $ cd airflow + $ alembic revision -m "add new field to db" + Generating + ~/airflow/airflow/migrations/versions/a1e23c41f123_add_new_field_to_db.py + +Note that migration file names are standardized by the pre-commit hook ``update-migration-references``, so that they sort alphabetically and indicate +the Airflow version in which they first appear (the alembic revision ID is removed). As a result you should expect to see a pre-commit failure +on the first attempt. Just stage the modified file and commit again +(or run the hook manually before committing). + +After your new migration file is run through pre-commit it will look like this: + +.. code-block:: + + 1234_A_B_C_add_new_field_to_db.py + +This represents that your migration is the 1234th migration and is expected to be released in Airflow version A.B.C. + +-------- + +You can also learn how to set up your `Node environment <14_node_environment_setup.rst>`__ if you want to develop the Airflow UI. diff --git a/contributing-docs/14_node_environment_setup.rst b/contributing-docs/14_node_environment_setup.rst new file mode 100644 index 0000000000000..15c450ebbd2ea --- /dev/null +++ b/contributing-docs/14_node_environment_setup.rst @@ -0,0 +1,118 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Node.js Environment Setup +========================= + +``airflow/www/`` contains all yarn-managed, front-end assets. Flask-Appbuilder +itself comes bundled with jQuery and bootstrap. While they may be phased out +over time, these packages are currently not managed with yarn. + +Make sure you are using recent versions of node and yarn. No problems have been +found with node\>=8.11.3 and yarn\>=1.19.1. Our pre-commit framework installs +node and yarn automatically when it is installed - if you use ``breeze`` you do not need to install +either node or yarn. + +.. contents:: :local: + +Installing yarn and its packages manually +----------------------------------------- + +To install yarn on macOS: + +1. Run the following commands (taken from `this source `__): + +.. code-block:: bash + + brew install node + brew install yarn + yarn config set prefix ~/.yarn + + +2. Add ``~/.yarn/bin`` to your ``PATH`` so that commands you are installing + can be used globally. + +3. Set up your ``.bashrc`` file and then ``source ~/.bashrc`` to reflect the + change. + +.. code-block:: bash + + export PATH="$HOME/.yarn/bin:$PATH" + +4. Install third-party libraries defined in ``package.json`` by running the following command: + +..
code-block:: bash + + yarn install + +Generate Bundled Files with yarn +-------------------------------- + +To parse and generate bundled files for Airflow, run either of the following +commands: + +.. code-block:: bash + + # Compiles the production / optimized js & css + yarn run prod + + # Starts a web server that manages and updates your assets as you modify them + # You'll need to run the webserver in debug mode too: ``airflow webserver -d`` + yarn run dev + +Follow Style Guide +------------------ + +We try to enforce a more consistent style and follow the Javascript/Typescript community +guidelines. + +Once you add or modify any JS/TS code in the project, please make sure it +follows the guidelines defined in `Airbnb +JavaScript Style Guide `__. + +Apache Airflow uses `ESLint `__ as a tool for identifying and +reporting issues in JS/TS, and `Prettier `__ for code formatting. +Most IDE directly integrate with these tools, you can also manually run them with any of the following commands: + +.. code-block:: bash + + # Format code in .js, .jsx, .ts, .tsx, .json, .css, .html files + yarn format + + # Check JS/TS code in .js, .jsx, .ts, .tsx, .html files and report any errors/warnings + yarn run lint + + # Check JS/TS code in .js, .jsx, .ts, .tsx, .html files and report any errors/warnings and fix them if possible + yarn run lint:fix + + # Run tests for all .test.js, .test.jsx, .test.ts, test.tsx files + yarn test + +React, JSX and Chakra +----------------------------- + +In order to create a more modern UI, we have started to include `React `__ in the ``airflow/www/`` project. +If you are unfamiliar with React then it is recommended to check out their documentation to understand components and jsx syntax. + +We are using `Chakra UI `__ as a component and styling library. Notably, all styling is done in a theme file or +inline when defining a component. There are a few shorthand style props like ``px`` instead of ``padding-right, padding-left``. +To make this work, all Chakra styling and css styling are completely separate. It is best to think of the React components as a separate app +that lives inside of the main app. + +------ + +If you happen to change architecture of Airflow, you can learn how we create our `Architecture diagrams <15_architecture_diagrams.rst>`__. diff --git a/contributing-docs/15_architecture_diagrams.rst b/contributing-docs/15_architecture_diagrams.rst new file mode 100644 index 0000000000000..1a190544a1c12 --- /dev/null +++ b/contributing-docs/15_architecture_diagrams.rst @@ -0,0 +1,64 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Architecture Diagrams +===================== + +We started to use (and gradually convert old diagrams to use it) `Diagrams `_ +as our tool of choice to generate diagrams. 
The diagrams are generated from Python code and can be +automatically updated when the code changes. The diagrams are generated using pre-commit hooks (See +static checks below) but they can also be generated manually by running the corresponding Python code. + +To run the code you need to install the dependencies in the virtualenv you use to run it: +* ``pip install diagrams rich``. You need to have graphviz installed in your +system (``brew install graphviz`` on macOS for example). + +The source code of the diagrams are next to the generated diagram, the difference is that the source +code has ``.py`` extension and the generated diagram has ``.png`` extension. The pre-commit hook ``generate-airflow-diagrams`` +will look for ``diagram_*.py`` files in the ``docs`` subdirectories +to find them and runs them when the sources changed and the diagrams are not up to date (the +pre-commit will automatically generate an .md5sum hash of the sources and store it next to the diagram +file). + +In order to generate the diagram manually you can run the following command: + +.. code-block:: bash + + python .py + +You can also generate all diagrams by: + +.. code-block:: bash + + pre-commit run generate-airflow-diagrams + +or with Breeze: + +.. code-block:: bash + + breeze static-checks --type generate-airflow-diagrams --all-files + +When you iterate over a diagram, you can also setup a "save" action in your IDE to run the python +file automatically when you save the diagram file. + +Once you've done iteration and you are happy with the diagram, you can commit the diagram, the source +code and the .md5sum file. The pre-commit hook will then not run the diagram generation until the +source code for it changes. + +---- + +You can now see an overview of the whole `contribution workflow <16_contribution_workflow.rst>`__ diff --git a/contributing-docs/16_contribution_workflow.rst b/contributing-docs/16_contribution_workflow.rst new file mode 100644 index 0000000000000..0ee07c955894e --- /dev/null +++ b/contributing-docs/16_contribution_workflow.rst @@ -0,0 +1,317 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Contribution Workflow +===================== + +.. contents:: :local: + +Typically, you start your first contribution by reviewing open tickets +at `GitHub issues `__. + +If you create pull-request, you don't have to create an issue first, but if you want, you can do it. +Creating an issue will allow you to collect feedback or share plans with other people. + +For example, you want to have the following sample ticket assigned to you: +`#7782: Add extra CC: to the emails sent by Airflow `_. + +In general, your contribution includes the following stages: + +.. image:: images/workflow.png + :align: center + :alt: Contribution Workflow + +1. 
Make your own `fork `__ of + the Apache Airflow `main repository `__. + +2. Create a `local virtualenv <07_local_virtualenv.rst>`_, + initialize the `Breeze environment `__, and + install `pre-commit framework <08_static_code_checks.rst#pre-commit-hooks>`__. + If you want to add more changes in the future, set up your fork and enable GitHub Actions. + +3. Join `devlist `__ + and set up a `Slack account `__. + +4. Make the change and create a `Pull Request (PR) from your fork `__. + +5. Ping @ #development slack, comment @people. Be annoying. Be considerate. + +Step 1: Fork the Apache Airflow Repo +------------------------------------ +From the `apache/airflow `_ repo, +`create a fork `_: + +.. image:: images/fork.png + :align: center + :alt: Creating a fork + + +Step 2: Configure Your Environment +---------------------------------- + +You can use several development environments for Airflow. If you prefer to have development environments +on your local machine, you might choose Local Virtualenv, or dockerized Breeze environment, however we +also have support for popular remote development environments: GitHub Codespaces and GitPodify. +You can see the differences between the various environments in `Development environments `__. + +The local env instructions can be found in full in the `Local virtualenv <07_local_virtualenv.rst>`_ file. + +The Breeze Docker Compose env is to maintain a consistent and common development environment so that you +can replicate CI failures locally and work on solving them locally rather by pushing to CI. + +The Breeze instructions can be found in full in the `Breeze documentation <../dev/breeze/doc/README.rst>`_ file. + +You can configure the Docker-based Breeze development environment as follows: + +1. Install the latest versions of the `Docker Community Edition `_ and +`Docker Compose `_ and add them to the PATH. + +2. Install `jq`_ on your machine. The exact command depends on the operating system (or Linux distribution) you use. + +.. _jq: https://stedolan.github.io/jq/ + +For example, on Ubuntu: + +.. code-block:: bash + + sudo apt install jq + +or on macOS with `Homebrew `_ + +.. code-block:: bash + + brew install jq + +3. Enter Breeze, and run the following in the Airflow source code directory: + +.. code-block:: bash + + breeze + +Breeze starts with downloading the Airflow CI image from +the Docker Hub and installing all required dependencies. + +This will enter the Docker environment and mount your local sources +to make them immediately visible in the environment. + +4. Create a local virtualenv, for example: + +.. code-block:: bash + + mkvirtualenv myenv --python=python3.9 + +5. Initialize the created environment: + +.. code-block:: bash + + ./scripts/tools/initialize_virtualenv.py + + +6. Open your IDE (for example, PyCharm) and select the virtualenv you created + as the project's default virtualenv in your IDE. + +Step 3: Connect with People +--------------------------- + +For effective collaboration, make sure to join the following Airflow groups: + +- Mailing lists: + + - Developer's mailing list ``_ + (quite substantial traffic on this list) + + - All commits mailing list: ``_ + (very high traffic on this list) + + - Airflow users mailing list: ``_ + (reasonably small traffic on this list) + +- `Issues on GitHub `__ + +- `Slack (chat) `__ + +Step 4: Prepare PR +------------------ + +1. Update the local sources to address the issue. 
+ + For example, to address this example issue, do the following: + + * Read about `email configuration in Airflow `__. + + * Find the class you should modify. For the example GitHub issue, + this is `email.py `__. + + * Find the test class where you should add tests. For the example ticket, + this is `test_email.py `__. + + * Make sure your fork's main is synced with Apache Airflow's main before you create a branch. See + `How to sync your fork <#how-to-sync-your-fork>`_ for details. + + * Create a local branch for your development. Make sure to use latest + ``apache/main`` as base for the branch. See `How to Rebase PR <#how-to-rebase-pr>`_ for some details + on setting up the ``apache`` remote. Note, some people develop their changes directly in their own + ``main`` branches - this is OK and you can make PR from your main to ``apache/main`` but we + recommend to always create a local branch for your development. This allows you to easily compare + changes, have several changes that you work on at the same time and many more. + If you have ``apache`` set as remote then you can make sure that you have latest changes in your main + by ``git pull apache main`` when you are in the local ``main`` branch. If you have conflicts and + want to override your locally changed main you can override your local changes with + ``git fetch apache; git reset --hard apache/main``. + + * Modify the class and add necessary code and unit tests. + + * Run and fix all the `static checks <08_static_code_checks.rst>`__. If you have + `pre-commits installed <08_static_code_checks.rst#pre-commit-hooks>`__, + this step is automatically run while you are committing your code. If not, you can do it manually + via ``git add`` and then ``pre-commit run``. + + * Run the appropriate tests as described in `Testing documentation <09_testing.rst>`__. + + * Consider adding a newsfragment to your PR so you can add an entry in the release notes. + The following newsfragment types are supported: + + * `significant` + * `feature` + * `improvement` + * `bugfix` + * `doc` + * `misc` + + To add a newsfragment, create an ``rst`` file named ``{pr_number}.{type}.rst`` (e.g. ``1234.bugfix.rst``) + and place in either `newsfragments `__ for core newsfragments, + or `chart/newsfragments `__ for helm chart newsfragments. + + In general newsfragments must be one line. For newsfragment type ``significant``, you may include summary and body separated by a blank line, similar to ``git`` commit messages. + +2. Rebase your fork, squash commits, and resolve all conflicts. See `How to rebase PR <#how-to-rebase-pr>`_ + if you need help with rebasing your change. Remember to rebase often if your PR takes a lot of time to + review/fix. This will make rebase process much easier and less painful and the more often you do it, + the more comfortable you will feel doing it. + +3. Re-run static code checks again. + +4. Make sure your commit has a good title and description of the context of your change, enough + for maintainers reviewing it to understand why you are proposing a change. Make sure to follow other + PR guidelines described in `Pull Request guidelines <#pull-request-guidelines>`_. + Create Pull Request! Make yourself ready for the discussion! + +5. The ``static checks`` and ``tests`` in your PR serve as a first-line-of-check, whether the PR + passes the quality bar for Airflow. 
It basically means that until you get your PR green, it is not + likely to get reviewed by maintainers unless you specifically ask for it and explain that you would like + to get first pass of reviews and explain why achieving ``green`` status for it is not easy/feasible/desired. + Similarly if your PR contains ``[WIP]`` in the title or it is marked as ``Draft`` it is not likely to get + reviewed by maintainers unless you specifically ask for it and explain why and what specifically you want + to get reviewed before it reaches ``Ready for review`` status. This might happen if you want to get initial + feedback on the direction of your PR or if you want to get feedback on the design of your PR. + +6. Avoid @-mentioning individual maintainers in your PR, unless you have good reason to believe that they are + available, have time and/or interest in your PR. Generally speaking there are no "exclusive" reviewers for + different parts of the code. Reviewers review PRs and respond when they have some free time to spare and + when they feel they can provide some valuable feedback. If you want to get attention of maintainers, you can just + follow-up on your PR and ask for review in general, however be considerate and do not expect "immediate" + reviews. People review when they have time, most of the maintainers do such reviews in their + free time, which is taken away from their families and other interests, so allow sufficient time before you + follow-up - but if you see no reaction in several days, do follow-up, as with the number of PRs we have + daily, some of them might simply fall through the cracks, and following up shows your interest in completing + the PR as well as puts it at the top of "Recently commented" PRs. However, be considerate and mindful of + the time zones, holidays, busy periods, and expect that some discussions and conversation might take time + and get stalled occasionally. Generally speaking it's the author's responsibility to follow-up on the PR when + they want to get it reviewed and merged. + + +Step 5: Pass PR Review +---------------------- + +.. image:: images/review.png + :align: center + :alt: PR Review + +Note that maintainers will use **Squash and Merge** instead of **Rebase and Merge** +when merging PRs and your commit will be squashed to single commit. + +When a reviewer starts a conversation it is expected that you respond to questions, suggestions, doubts, +and generally it's great if all such conversations seem to converge to a common understanding. You do not +necessarily have to apply all the suggestions (often they are just opinions and suggestions even if they are +coming from seasoned maintainers) - it's perfectly ok that you respond to it with your own opinions and +understanding of the problem and your approach and if you have good arguments, presenting them is a good idea. + +The reviewers might leave several types of responses: + +* ``General PR comment`` - which usually means that there is a question/opinion/suggestion on how the PR can be + improved, or it's an ask to explain how you understand the PR. You can usually quote some parts of such + general comment and respond to it in your comments. Often comments that are raising questions in general + might lead to different discussions, even a request to move the discussion to the devlist or even lead to + completely new PRs created as a spin-off of the discussion. 
+
+
+Step 5: Pass PR Review
+----------------------
+
+.. image:: images/review.png
+    :align: center
+    :alt: PR Review
+
+Note that maintainers will use **Squash and Merge** instead of **Rebase and Merge**
+when merging PRs, and your commits will be squashed into a single commit.
+
+When a reviewer starts a conversation, it is expected that you respond to questions, suggestions and doubts,
+and generally it's great if all such conversations converge to a common understanding. You do not
+necessarily have to apply all the suggestions (often they are just opinions and suggestions, even if they
+are coming from seasoned maintainers) - it's perfectly OK to respond with your own opinions and
+understanding of the problem and your approach, and if you have good arguments, presenting them is a good idea.
+
+The reviewers might leave several types of responses:
+
+* ``General PR comment`` - which usually means that there is a question/opinion/suggestion on how the PR can be
+  improved, or it's an ask to explain how you understand the PR. You can usually quote some parts of such
+  a general comment and respond to it in your comments. Often comments that raise general questions
+  might lead to different discussions, even a request to move the discussion to the devlist, or even lead to
+  completely new PRs created as a spin-off of the discussion.
+
+* ``Comment/Conversation around specific lines of code`` - such a conversation usually flags a potential
+  improvement, or a potential problem with the code. It's a good idea to respond to such comments and explain
+  your approach and understanding of the problem. The whole idea of a conversation is to try to reach a consensus
+  on a good way to address the problem. As an author, you can resolve the conversation if you think the
+  problem raised in the comment is resolved, or ask the reviewer to re-review and confirm. If you do not
+  understand the comment, you can ask for clarification. Generally, assume good intentions of the person who
+  is reviewing your code and resolve conversations with good intentions as well. Understand that it's not a
+  person that is criticised or argued with, but rather the code and the approach. The important thing is to
+  take care of the quality of the code and the project and to make sure that the code is good.
+
+  It's OK for a conversation to be marked as resolved by anyone who can do it - it could be the author, who
+  thinks the arguments or the implemented changes resolve the conversation, the maintainer/person who
+  started the conversation, or even the maintainer who attempts to merge the PR and thinks that all
+  conversations are resolved. However, if you want to make sure that a maintainer pays attention to, and
+  decides on, merging the PR, make sure you monitor, follow up and close the conversations when
+  you think they are resolved (ideally explaining why you think the conversation is resolved).
+
+* ``Request changes`` - this is where a maintainer is pretty sure that you should make a change to your PR
+  because it contains a serious flaw, a design misconception, or a bug, or it is just not in line with the common
+  approach the Airflow community took on the issue. Usually you should respond to such a request and either fix
+  the problem or convince the maintainer that they were wrong (it happens more often than you think).
+  Sometimes, even if you do not agree with the request, it's a good idea to make the change anyway, because
+  it might be better to follow the common approach in the project. Sometimes it might even happen that
+  two maintainers will have completely different opinions on the same issue and you will have to lead the
+  discussion to try to achieve consensus. If you cannot achieve consensus and you think it's an important
+  issue, you can ask for a vote on the issue by raising a devlist discussion - where you explain your case -
+  and follow up the discussion with a vote if you cannot achieve consensus there. The ``Request changes``
+  status can be withdrawn by the maintainer, but if they don't withdraw it, such a PR cannot be merged -
+  maintainers have the right to veto any code modification according to the `Apache Software Foundation rules `_.
+
+* ``Approval`` - this is given by a maintainer after the code has been reviewed and the maintainer agrees that
+  it is a good idea to merge it. There might still be some unresolved conversations, requests and questions on
+  such a PR, and you are expected to resolve them before the PR is merged. But the ``Approval`` status is a sign
+  of trust from the maintainer who gave the approval that they think the PR is good enough as long as their
+  comments are resolved. They put their trust in the hands of the author and - possibly - other
+  maintainers who will merge the request, trusting that they can do so without a follow-up re-review and verification.
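+
+If addressing review comments made your PR go red again, it usually pays off to make it green locally before
+asking for another pass. A minimal sketch, assuming you have pre-commit hooks installed and are inside the
+Airflow source tree (which tests to re-run depends on your change - see the `Testing documentation <09_testing.rst>`__):
+
+.. code-block:: bash
+
+    # Run the configured static checks against the files staged for commit
+    git add <changed files>
+    pre-commit run
+
+    # Or run them against the whole repository (slower, but closer to what CI does)
+    pre-commit run --all-files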
+
+
+You need to have the ``Approval`` of at least one maintainer (if you are a maintainer yourself, it has to be
+another maintainer). Ideally, you should have 2 or more maintainers reviewing code that touches
+the core of Airflow - we do not enforce a ``2+`` reviewers requirement for the core of Airflow,
+but maintainers will generally ask in the PR if they think a second review is needed.
+
+Your PR can be merged by a maintainer who sees that the PR is approved, all conversations are resolved
+and the code looks good. The criteria for a PR being merge-able are:
+
+* ``green status for static checks and tests``
+* ``conversations resolved``
+* ``approval from 1 (or more for core changes) maintainers``
+* no unresolved ``Request changes``
+
+Once you reach that status, you do not need to do anything to get the PR merged. One of the maintainers
+will merge such PRs. However, if you see that such a PR is not merged for a few days, do not hesitate to comment
+on your PR and mention that you think it is ready to be merged. Also, it's a good practice to rebase your PR
+onto the latest ``main``, because there could be other changes merged in the meantime that might cause conflicts or
+fail tests or static checks, so by rebasing a PR that was built a few days ago you make sure that it
+still passes the tests and static checks today.
diff --git a/contributing-docs/README.rst b/contributing-docs/README.rst
new file mode 100644
index 0000000000000..d2fc5c7c7e859
--- /dev/null
+++ b/contributing-docs/README.rst
@@ -0,0 +1,118 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Contributors' guide
+===================
+
+Contributions are welcome and are greatly appreciated! Every little bit helps,
+and credit will always be given.
+
+This index of linked documents aims to explain the subject of contributing if you have not contributed to
+any Open Source project before, but it will also help people who have contributed to other projects learn
+about the rules of this community.
+
+.. contents:: :local:
+
+New Contributor
+---------------
+
+If you are a new contributor, please follow the `Contributors Quick Start <03_contributors_quick_start.rst>`__
+guide to get a gentle step-by-step introduction to setting up the development environment and making your
+first contribution.
+
+If you are new to the project, you might need some help in understanding how the dynamics
+of the community work, and you might need to get some mentorship from other members of the
+community - mostly Airflow committers (maintainers). Mentoring new members of the community is part of the
+maintainers' job, so do not be afraid to ask them to help you. You can do it
+via comments in your PR, by asking on the devlist or via Slack.
+For your convenience, we have a dedicated ``#development-first-pr-support`` Slack channel where you can ask
+any questions about making your first Pull Request (PR) contribution to the Airflow codebase - it's a safe
+space where it is expected that people asking questions do not know a lot about Airflow (yet!).
+If you need help with using Airflow itself, see the ``#troubleshooting`` Slack channel.
+
+To check how mentoring works for projects under the Apache Software Foundation, see
+`Apache Community Development - Mentoring `_.
+
+Basic contributing tasks
+------------------------
+
+You can learn about the various roles and communication channels in the Airflow project:
+
+* `Roles in Airflow Project <01_roles_in_airflow_project.rst>`__ describes
+  the roles in the Airflow project and how they relate to each other.
+
+* `How to communicate <02_how_to_communicate.rst>`__
+  describes how to communicate with the community and how to get help.
+
+You can learn how to set up your environment for development and how to develop and test code:
+
+* `Contributors quick start <03_contributors_quick_start.rst>`__ describes
+  how to set up your development environment and make your first contribution. There are also more
+  detailed documents describing how to set up your development environment for a specific IDE or environment.
+
+* `How to contribute <04_how_to_contribute.rst>`__ describes the various ways in which you can contribute to Airflow.
+
+* `Pull requests <05_pull_requests.rst>`__ describes how you can create pull requests, and explains
+  the pull request guidelines and coding standards.
+
+* `Development environment <06_development_environments.rst>`__ describes the development environments
+  used in Airflow.
+
+  * `Local virtualenv <07_local_virtualenv.rst>`__ describes the setup and details of the local virtualenv
+    development environment.
+
+  * `Breeze <../dev/breeze/doc/README.rst>`__ describes the setup and details of the Breeze development environment.
+
+* `Static code checks <08_static_code_checks.rst>`__ describes the static code checks used in Airflow.
+
+* `Testing <09_testing.rst>`__ describes what kinds of tests we have and how to run them.
+
+* `Building documentation <../docs/README.rst>`__ describes how to build the documentation locally.
+
+* `Working with Git <10_working_with_git.rst>`__ describes the Git branches used in Airflow,
+  how to sync your fork and how to rebase your PR.
+
+Developing providers
+--------------------
+
+You can learn how the Airflow repository is a monorepo, split into the airflow and provider packages,
+and how to contribute to the providers:
+
+* `Provider packages <11_provider_packages.rst>`__ describes the provider packages and how they
+  are used in Airflow.
+
+
+Deep dive into specific topics
+------------------------------
+
+You can also dive deeper into specific areas that are important for contributing to Airflow:
+
+* `Airflow dependencies and extras <12_airflow_dependencies_and_extras.rst>`__ describes
+  the dependencies - both required and optional (extras) - used in Airflow.
+
+* `Metadata database updates <13_metadata_database_updates.rst>`__ describes
+  how to make changes in the metadata database.
+
+* `Node environment setup <14_node_environment_setup.rst>`__ describes how to set up
+  the node environment for the Airflow UI.
+
+* `Architecture diagrams <15_architecture_diagrams.rst>`__ describes how to create and
+  update the architecture diagrams embedded in the Airflow documentation.
+ +Finally there is an overview of the overall contribution workflow that you should follow + +* `Contribution workflow <16_contribution_workflow.rst>`__ describes the workflow of contributing to Airflow. diff --git a/images/AirflowBreeze_logo.png b/contributing-docs/images/AirflowBreeze_logo.png similarity index 100% rename from images/AirflowBreeze_logo.png rename to contributing-docs/images/AirflowBreeze_logo.png diff --git a/images/airflow_unit_test_mode.png b/contributing-docs/images/airflow_unit_test_mode.png similarity index 100% rename from images/airflow_unit_test_mode.png rename to contributing-docs/images/airflow_unit_test_mode.png diff --git a/images/candidates_for_backtrack_triggers.png b/contributing-docs/images/candidates_for_backtrack_triggers.png similarity index 100% rename from images/candidates_for_backtrack_triggers.png rename to contributing-docs/images/candidates_for_backtrack_triggers.png diff --git a/images/database_view.png b/contributing-docs/images/database_view.png similarity index 100% rename from images/database_view.png rename to contributing-docs/images/database_view.png diff --git a/images/disk_space_osx.png b/contributing-docs/images/disk_space_osx.png similarity index 100% rename from images/disk_space_osx.png rename to contributing-docs/images/disk_space_osx.png diff --git a/images/docker_socket.png b/contributing-docs/images/docker_socket.png similarity index 100% rename from images/docker_socket.png rename to contributing-docs/images/docker_socket.png diff --git a/images/docker_wsl_integration.png b/contributing-docs/images/docker_wsl_integration.png similarity index 100% rename from images/docker_wsl_integration.png rename to contributing-docs/images/docker_wsl_integration.png diff --git a/images/fork.png b/contributing-docs/images/fork.png similarity index 100% rename from images/fork.png rename to contributing-docs/images/fork.png diff --git a/images/pycharm_debug_breeze.png b/contributing-docs/images/pycharm_debug_breeze.png similarity index 100% rename from images/pycharm_debug_breeze.png rename to contributing-docs/images/pycharm_debug_breeze.png diff --git a/images/quick_start/add Interpreter.png b/contributing-docs/images/quick_start/add Interpreter.png similarity index 100% rename from images/quick_start/add Interpreter.png rename to contributing-docs/images/quick_start/add Interpreter.png diff --git a/images/quick_start/add_configuration.png b/contributing-docs/images/quick_start/add_configuration.png similarity index 100% rename from images/quick_start/add_configuration.png rename to contributing-docs/images/quick_start/add_configuration.png diff --git a/images/quick_start/add_env_variable.png b/contributing-docs/images/quick_start/add_env_variable.png similarity index 100% rename from images/quick_start/add_env_variable.png rename to contributing-docs/images/quick_start/add_env_variable.png diff --git a/images/quick_start/airflow_clone.png b/contributing-docs/images/quick_start/airflow_clone.png similarity index 100% rename from images/quick_start/airflow_clone.png rename to contributing-docs/images/quick_start/airflow_clone.png diff --git a/images/quick_start/airflow_fork.png b/contributing-docs/images/quick_start/airflow_fork.png similarity index 100% rename from images/quick_start/airflow_fork.png rename to contributing-docs/images/quick_start/airflow_fork.png diff --git a/images/quick_start/airflow_gitpod_open_ports.png b/contributing-docs/images/quick_start/airflow_gitpod_open_ports.png similarity index 100% rename from 
images/quick_start/airflow_gitpod_open_ports.png rename to contributing-docs/images/quick_start/airflow_gitpod_open_ports.png diff --git a/images/quick_start/airflow_gitpod_url.png b/contributing-docs/images/quick_start/airflow_gitpod_url.png similarity index 100% rename from images/quick_start/airflow_gitpod_url.png rename to contributing-docs/images/quick_start/airflow_gitpod_url.png diff --git a/images/quick_start/ci_tests.png b/contributing-docs/images/quick_start/ci_tests.png similarity index 100% rename from images/quick_start/ci_tests.png rename to contributing-docs/images/quick_start/ci_tests.png diff --git a/images/quick_start/click_on_clone.png b/contributing-docs/images/quick_start/click_on_clone.png similarity index 100% rename from images/quick_start/click_on_clone.png rename to contributing-docs/images/quick_start/click_on_clone.png diff --git a/images/quick_start/creating_branch_1.png b/contributing-docs/images/quick_start/creating_branch_1.png similarity index 100% rename from images/quick_start/creating_branch_1.png rename to contributing-docs/images/quick_start/creating_branch_1.png diff --git a/images/quick_start/creating_branch_2.png b/contributing-docs/images/quick_start/creating_branch_2.png similarity index 100% rename from images/quick_start/creating_branch_2.png rename to contributing-docs/images/quick_start/creating_branch_2.png diff --git a/images/quick_start/local_airflow.png b/contributing-docs/images/quick_start/local_airflow.png similarity index 100% rename from images/quick_start/local_airflow.png rename to contributing-docs/images/quick_start/local_airflow.png diff --git a/images/quick_start/postgresql_connection.png b/contributing-docs/images/quick_start/postgresql_connection.png similarity index 100% rename from images/quick_start/postgresql_connection.png rename to contributing-docs/images/quick_start/postgresql_connection.png diff --git a/images/quick_start/pr1.png b/contributing-docs/images/quick_start/pr1.png similarity index 100% rename from images/quick_start/pr1.png rename to contributing-docs/images/quick_start/pr1.png diff --git a/images/quick_start/pr2.png b/contributing-docs/images/quick_start/pr2.png similarity index 100% rename from images/quick_start/pr2.png rename to contributing-docs/images/quick_start/pr2.png diff --git a/images/quick_start/pr3.png b/contributing-docs/images/quick_start/pr3.png similarity index 100% rename from images/quick_start/pr3.png rename to contributing-docs/images/quick_start/pr3.png diff --git a/images/quick_start/pycharm_clone.png b/contributing-docs/images/quick_start/pycharm_clone.png similarity index 100% rename from images/quick_start/pycharm_clone.png rename to contributing-docs/images/quick_start/pycharm_clone.png diff --git a/images/quick_start/start_airflow_tmux.png b/contributing-docs/images/quick_start/start_airflow_tmux.png similarity index 100% rename from images/quick_start/start_airflow_tmux.png rename to contributing-docs/images/quick_start/start_airflow_tmux.png diff --git a/images/quick_start/start_airflow_tmux_gitpod.png b/contributing-docs/images/quick_start/start_airflow_tmux_gitpod.png similarity index 100% rename from images/quick_start/start_airflow_tmux_gitpod.png rename to contributing-docs/images/quick_start/start_airflow_tmux_gitpod.png diff --git a/images/quick_start/vscode_add_configuration_1.png b/contributing-docs/images/quick_start/vscode_add_configuration_1.png similarity index 100% rename from images/quick_start/vscode_add_configuration_1.png rename to 
contributing-docs/images/quick_start/vscode_add_configuration_1.png diff --git a/images/quick_start/vscode_add_configuration_2.png b/contributing-docs/images/quick_start/vscode_add_configuration_2.png similarity index 100% rename from images/quick_start/vscode_add_configuration_2.png rename to contributing-docs/images/quick_start/vscode_add_configuration_2.png diff --git a/images/quick_start/vscode_add_configuration_3.png b/contributing-docs/images/quick_start/vscode_add_configuration_3.png similarity index 100% rename from images/quick_start/vscode_add_configuration_3.png rename to contributing-docs/images/quick_start/vscode_add_configuration_3.png diff --git a/images/quick_start/vscode_add_env_variable.png b/contributing-docs/images/quick_start/vscode_add_env_variable.png similarity index 100% rename from images/quick_start/vscode_add_env_variable.png rename to contributing-docs/images/quick_start/vscode_add_env_variable.png diff --git a/images/quick_start/vscode_click_on_clone.png b/contributing-docs/images/quick_start/vscode_click_on_clone.png similarity index 100% rename from images/quick_start/vscode_click_on_clone.png rename to contributing-docs/images/quick_start/vscode_click_on_clone.png diff --git a/images/quick_start/vscode_clone.png b/contributing-docs/images/quick_start/vscode_clone.png similarity index 100% rename from images/quick_start/vscode_clone.png rename to contributing-docs/images/quick_start/vscode_clone.png diff --git a/images/quick_start/vscode_creating_branch_1.png b/contributing-docs/images/quick_start/vscode_creating_branch_1.png similarity index 100% rename from images/quick_start/vscode_creating_branch_1.png rename to contributing-docs/images/quick_start/vscode_creating_branch_1.png diff --git a/images/quick_start/vscode_creating_branch_2.png b/contributing-docs/images/quick_start/vscode_creating_branch_2.png similarity index 100% rename from images/quick_start/vscode_creating_branch_2.png rename to contributing-docs/images/quick_start/vscode_creating_branch_2.png diff --git a/images/rebase.png b/contributing-docs/images/rebase.png similarity index 100% rename from images/rebase.png rename to contributing-docs/images/rebase.png diff --git a/images/review.png b/contributing-docs/images/review.png similarity index 100% rename from images/review.png rename to contributing-docs/images/review.png diff --git a/images/run_unittests.png b/contributing-docs/images/run_unittests.png similarity index 100% rename from images/run_unittests.png rename to contributing-docs/images/run_unittests.png diff --git a/images/setup_remote_debugging.png b/contributing-docs/images/setup_remote_debugging.png similarity index 100% rename from images/setup_remote_debugging.png rename to contributing-docs/images/setup_remote_debugging.png diff --git a/images/source_code_mapping_ide.png b/contributing-docs/images/source_code_mapping_ide.png similarity index 100% rename from images/source_code_mapping_ide.png rename to contributing-docs/images/source_code_mapping_ide.png diff --git a/images/workflow.png b/contributing-docs/images/workflow.png similarity index 100% rename from images/workflow.png rename to contributing-docs/images/workflow.png diff --git a/contributing-docs/quick-start-ide/README.rst b/contributing-docs/quick-start-ide/README.rst new file mode 100644 index 0000000000000..2aed423c2c05a --- /dev/null +++ b/contributing-docs/quick-start-ide/README.rst @@ -0,0 +1,42 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. 
See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +IDE integration +=============== + +This document describes how to set up your IDE to work with Airflow. + + +Local development environments +------------------------------ + +- `PyCharm/IntelliJ `__ quick start instructions +- `VSCode `__ quick start instructions + + +Remote development environments +------------------------------- + +There are also remote development environments that you can use to develop Airflow: + +- `CodeSpaces `_ - a browser-based development + environment that you can use to develop Airflow in a browser. It is based on GitHub CodeSpaces and + is available for all GitHub users (free version has number of hours/month limitations). + +- `GitPod `_ - a browser-based development + environment that you can use to develop Airflow in a browser. It is based on GitPod and + is a paid service. diff --git a/CONTRIBUTORS_QUICK_START_CODESPACES.rst b/contributing-docs/quick-start-ide/contributors_quick_start_codespaces.rst similarity index 89% rename from CONTRIBUTORS_QUICK_START_CODESPACES.rst rename to contributing-docs/quick-start-ide/contributors_quick_start_codespaces.rst index ac20a35aa0b13..cc25114a65e59 100644 --- a/CONTRIBUTORS_QUICK_START_CODESPACES.rst +++ b/contributing-docs/quick-start-ide/contributors_quick_start_codespaces.rst @@ -38,8 +38,8 @@ Setup and develop using GitHub Codespaces 3. Once the codespace starts your terminal should be already in ``Breeze`` environment and you should be able to edit and run the tests in VS Code interface. -4. You can use `Quick start quide for Visual Studio Code `_ for details +4. You can use `Quick start quide for Visual Studio Code `_ for details as Codespaces use Visual Studio Code as interface. -Follow the `Quick start `_ for typical development tasks. +Follow the `Quick start <03_contributors_quick_start.rst>`_ for typical development tasks. diff --git a/CONTRIBUTORS_QUICK_START_GITPOD.rst b/contributing-docs/quick-start-ide/contributors_quick_start_gitpod.rst similarity index 97% rename from CONTRIBUTORS_QUICK_START_GITPOD.rst rename to contributing-docs/quick-start-ide/contributors_quick_start_gitpod.rst index 21b603ae61b8f..27405c338d092 100644 --- a/CONTRIBUTORS_QUICK_START_GITPOD.rst +++ b/contributing-docs/quick-start-ide/contributors_quick_start_gitpod.rst @@ -93,4 +93,4 @@ the first time you run tests. root@b76fcb399bb6:/opt/airflow# airflow users create --role Admin --username admin --password admin \ --email admin@example.com --firstname foo --lastname bar -Follow the `Quick start `_ for typical development tasks. +Follow the `Quick start <03_contributors_quick_start.rst>`_ for typical development tasks. 
diff --git a/CONTRIBUTORS_QUICK_START_PYCHARM.rst b/contributing-docs/quick-start-ide/contributors_quick_start_pycharm.rst similarity index 97% rename from CONTRIBUTORS_QUICK_START_PYCHARM.rst rename to contributing-docs/quick-start-ide/contributors_quick_start_pycharm.rst index 88c04c5545036..1bf8585b7cc7a 100644 --- a/CONTRIBUTORS_QUICK_START_PYCHARM.rst +++ b/contributing-docs/quick-start-ide/contributors_quick_start_pycharm.rst @@ -129,4 +129,4 @@ Creating a branch alt="Giving a name to a branch"> -Follow the `Quick start `_ for typical development tasks. +Follow the `Quick start <03_contributors_quick_start.rst>`_ for typical development tasks. diff --git a/CONTRIBUTORS_QUICK_START_VSCODE.rst b/contributing-docs/quick-start-ide/contributors_quick_start_vscode.rst similarity index 98% rename from CONTRIBUTORS_QUICK_START_VSCODE.rst rename to contributing-docs/quick-start-ide/contributors_quick_start_vscode.rst index 4042e1f216eb6..03c684fc6f90f 100644 --- a/CONTRIBUTORS_QUICK_START_VSCODE.rst +++ b/contributing-docs/quick-start-ide/contributors_quick_start_vscode.rst @@ -135,4 +135,4 @@ Creating a branch alt="Giving a name to a branch"> -Follow the `Quick start `_ for typical development tasks. +Follow the `Quick start <03_contributors_quick_start.rst>`_ for typical development tasks. diff --git a/contributing-docs/testing/README.rst b/contributing-docs/testing/README.rst new file mode 100644 index 0000000000000..829a9fbbfcf97 --- /dev/null +++ b/contributing-docs/testing/README.rst @@ -0,0 +1,22 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Those files describe various kind of tests that are available in Airflow. + +----- + +For all kinds of tests look at `Testing document <../09_testing.rst>`__ diff --git a/contributing-docs/testing/dag_testing.rst b/contributing-docs/testing/dag_testing.rst new file mode 100644 index 0000000000000..7e311171ce019 --- /dev/null +++ b/contributing-docs/testing/dag_testing.rst @@ -0,0 +1,63 @@ + + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. 
+
+DAG Testing
+===========
+
+To ease and speed up the process of developing DAGs, you can use
+:py:class:`~airflow.executors.debug_executor.DebugExecutor`, which is a single-process executor
+for debugging purposes. Using this executor, you can run and debug DAGs from your IDE.
+
+To set up the IDE:
+
+1. Add a ``main`` block at the end of your DAG file to make it runnable.
+   It will run a backfill job:
+
+.. code-block:: python
+
+  if __name__ == "__main__":
+      dag.clear()
+      dag.run()
+
+
+2. Set up ``AIRFLOW__CORE__EXECUTOR=DebugExecutor`` in the run configuration of your IDE.
+   Make sure to also set up all environment variables required by your DAG.
+
+3. Run and debug the DAG file.
+
+Additionally, ``DebugExecutor`` can be used in a fail-fast mode that will make
+all other running or scheduled tasks fail immediately. To enable this option, set
+``AIRFLOW__DEBUG__FAIL_FAST=True`` or adjust the ``fail_fast`` option in your ``airflow.cfg``.
+
+Also, with the Airflow CLI command ``airflow dags test``, you can execute one complete run of a DAG:
+
+.. code-block:: bash
+
+    # airflow dags test [dag_id] [execution_date]
+    airflow dags test example_branch_operator 2018-01-01
+
+By default, the ``/files/dags`` folder is mounted from your local ``files/dags`` folder, and this is
+the directory used by the Airflow scheduler and webserver to scan for DAGs. You can place your DAGs there
+to test them.
+
+The DAGs can be run in the main version of Airflow, but they also work
+with older versions.
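+
+Putting the pieces together, here is a minimal sketch of running such a DAG file from a terminal rather
+than from an IDE run configuration (the file name ``files/dags/my_example_dag.py`` is hypothetical and
+used only for illustration):
+
+.. code-block:: bash
+
+    # Use the single-process DebugExecutor and, optionally, the fail-fast mode
+    export AIRFLOW__CORE__EXECUTOR=DebugExecutor
+    export AIRFLOW__DEBUG__FAIL_FAST=True
+
+    # Running the file executes the ``main`` block added in step 1
+    python files/dags/my_example_dag.py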
+
+-----
+
+For other kinds of tests look at `Testing document <../09_testing.rst>`__
diff --git a/contributing-docs/testing/docker_compose_tests.rst b/contributing-docs/testing/docker_compose_tests.rst
new file mode 100644
index 0000000000000..63c1ab404b101
--- /dev/null
+++ b/contributing-docs/testing/docker_compose_tests.rst
@@ -0,0 +1,103 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Airflow Docker Compose Tests
+============================
+
+This document describes how to run tests for the Airflow Docker Compose deployment.
+
+.. contents:: :local:
+
+Running Docker Compose Tests with Breeze
+----------------------------------------
+
+We also test in CI whether the Docker Compose that we expose in our documentation via
+`Running Airflow in Docker `_
+works as expected. Those tests are run in CI ("Test docker-compose quick start")
+and you can run them locally as well.
+
+The way the tests work:
+
+1. They first build the Airflow production image
+2. Then they take our Docker Compose file and use the image to start it
+3. Then they perform some simple DAG trigger tests which check whether Airflow is up and can process
+   an example DAG
+
+This is done in a local environment, not in the Breeze CI image. It uses ``COMPOSE_PROJECT_NAME`` set to
+``quick-start`` to avoid conflicts with other docker compose deployments you might have.
+
+The complete test can be performed using Breeze. The prerequisite is to have ``docker-compose``
+(Docker Compose v1) or the ``docker compose`` plugin (Docker Compose v2) available on the path.
+
+Running the complete test with Breeze:
+
+.. code-block:: bash
+
+    breeze prod-image build --python 3.8
+    breeze testing docker-compose-tests
+
+In case the test fails, it will dump the logs from the running containers to the console and it
+will shut down the Docker Compose deployment. In case you want to debug the Docker Compose deployment
+created for the test, you can pass the ``--skip-docker-compose-deletion`` flag to Breeze or
+export the ``SKIP_DOCKER_COMPOSE_DELETION`` variable set to "true", and the deployment
+will not be deleted after the test.
+
+You can also specify a maximum timeout for the containers with the ``--wait-for-containers-timeout`` flag
+(it can also be set via the ``WAIT_FOR_CONTAINERS_TIMEOUT`` environment variable). You can also add the
+``-s`` option to the command to pass it to the underlying pytest command and see the output of the test
+as it happens.
+
+The test can also be run manually with the ``pytest docker_tests/test_docker_compose_quick_start.py``
+command, provided that you have a local Airflow venv with the ``dev`` extra installed and the
+``DOCKER_IMAGE`` environment variable set to the image you want to test. The variable defaults
+to ``ghcr.io/apache/airflow/main/prod/python3.8:latest``, which is built by default
+when you run ``breeze prod-image build --python 3.8``. Also, in this case the switches
+``--skip-docker-compose-deletion`` and ``--wait-for-containers-timeout`` can only be set via the
+corresponding environment variables.
+
+If you want to debug the deployment using ``docker compose`` commands after ``SKIP_DOCKER_COMPOSE_DELETION``
+was used, you should set ``COMPOSE_PROJECT_NAME`` to ``quick-start``, because this is what the test uses:
+
+.. code-block:: bash
+
+    export COMPOSE_PROJECT_NAME=quick-start
+
+You can also add ``--project-name quick-start`` to the ``docker compose`` commands you run.
+When the test is re-run, it will automatically stop the previous deployment and start a new one.
+
+Running Docker Compose deployment manually
+------------------------------------------
+
+You can also run the docker-compose deployment manually (independently of the pytest test), using the image
+you built with the prod image build command above.
+
+.. code-block:: bash
+
+    export AIRFLOW_IMAGE_NAME=ghcr.io/apache/airflow/main/prod/python3.8:latest
+
+and follow the instructions in
+`Running Airflow in Docker `_,
+but make sure to use the docker-compose file from the sources in the
+``docs/apache-airflow/stable/howto/docker-compose/`` folder.
+
+Then, the usual ``docker compose`` and ``docker`` commands can be used to debug such running instances.
+The test performs a simple API call to trigger a DAG and waits for it, but you can follow our
+documentation to connect to such running docker compose instances and test them manually.
+
+-----
+
+For other kinds of tests look at `Testing document <../09_testing.rst>`__
diff --git a/contributing-docs/testing/helm_unit_tests.rst b/contributing-docs/testing/helm_unit_tests.rst
new file mode 100644
index 0000000000000..266be65d81db0
--- /dev/null
+++ b/contributing-docs/testing/helm_unit_tests.rst
@@ -0,0 +1,111 @@
+
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Helm Unit Tests
+---------------
+
+In the Airflow project, we have decided to stick with pythonic testing for our Helm chart. This makes our chart
+easier to test, easier to modify, and able to run with the same testing infrastructure. To add Helm unit tests,
+add them in the ``helm_tests`` directory.
+
+.. code-block:: python
+
+    class TestBaseChartTest:
+        ...
+
+To render the chart, create a YAML string with the nested dictionary of options you wish to test. You can then
+use our ``render_chart`` function to render the object of interest into a testable Python dictionary. Once the chart
+has been rendered, you can use the ``render_k8s_object`` function to create a k8s model object. It simultaneously
+ensures that the object created properly conforms to the expected resource spec and allows you to use object values
+instead of nested dictionaries.
+
+Here is an example test:
+
+.. code-block:: python
+
+    import yaml
+    from kubernetes.client import models as k8s
+
+    from tests.charts.common.helm_template_generator import render_chart, render_k8s_object
+
+    git_sync_basic = """
+    dags:
+      gitSync:
+        enabled: true
+    """
+
+
+    class TestGitSyncScheduler:
+        def test_basic(self):
+            helm_settings = yaml.safe_load(git_sync_basic)
+            res = render_chart(
+                "GIT-SYNC",
+                helm_settings,
+                show_only=["templates/scheduler/scheduler-deployment.yaml"],
+            )
+            dep: k8s.V1Deployment = render_k8s_object(res[0], k8s.V1Deployment)
+            assert "dags" == dep.spec.template.spec.volumes[1].name
+
+
+To execute all Helm tests using the breeze command and utilize parallel pytest tests, you can run the
+following command (but it takes quite a long time even on a multi-processor machine).
+
+.. code-block:: bash
+
+    breeze testing helm-tests
+
+You can also execute tests from a selected package only. The Helm tests are grouped by packages,
+so rather than running all tests, you can run only the tests from a selected package. For example:
+
+.. code-block:: bash
+
+    breeze testing helm-tests --helm-test-package basic
+
+will run all tests from the ``basic`` test package.
+
+
+You can also run Helm tests individually via the usual ``breeze`` command. Just enter breeze and run the
+tests with pytest as you would do with regular unit tests. You can add the ``-n auto`` option to run Helm
+tests in parallel - unlike most of our regular unit tests, which require a database, the Helm tests are
+perfectly safe to run in parallel, and if you have multiple processors, you can gain significant
+speedups when using parallel runs:
+
+.. code-block:: bash
+
+    breeze
+
+This enters the breeze container.
+
+.. code-block:: bash
+
+    pytest helm_tests -n auto
+
+This runs all chart tests using all processors you have available.
+
+.. code-block:: bash
+
+    pytest helm_tests/test_airflow_common.py -n auto
+
+This will run all tests from the ``test_airflow_common.py`` file using all processors you have available.
+
+..
code-block:: bash + + pytest helm_tests/test_airflow_common.py + +This will run all tests from ``tests_airflow_common.py`` file sequentially. + +----- + +For other kinds of tests look at `Testing document <../09_testing.rst>`__ diff --git a/images/testing/k9s.png b/contributing-docs/testing/images/k9s.png similarity index 100% rename from images/testing/k9s.png rename to contributing-docs/testing/images/k9s.png diff --git a/images/testing/kubeconfig-env.png b/contributing-docs/testing/images/kubeconfig-env.png similarity index 100% rename from images/testing/kubeconfig-env.png rename to contributing-docs/testing/images/kubeconfig-env.png diff --git a/images/testing/kubernetes-virtualenv.png b/contributing-docs/testing/images/kubernetes-virtualenv.png similarity index 100% rename from images/testing/kubernetes-virtualenv.png rename to contributing-docs/testing/images/kubernetes-virtualenv.png diff --git a/images/pycharm/configure_test_runner.png b/contributing-docs/testing/images/pycharm/configure_test_runner.png similarity index 100% rename from images/pycharm/configure_test_runner.png rename to contributing-docs/testing/images/pycharm/configure_test_runner.png diff --git a/images/pycharm/pycharm_add_to_context.png b/contributing-docs/testing/images/pycharm/pycharm_add_to_context.png similarity index 100% rename from images/pycharm/pycharm_add_to_context.png rename to contributing-docs/testing/images/pycharm/pycharm_add_to_context.png diff --git a/images/pycharm/pycharm_create_tool.png b/contributing-docs/testing/images/pycharm/pycharm_create_tool.png similarity index 100% rename from images/pycharm/pycharm_create_tool.png rename to contributing-docs/testing/images/pycharm/pycharm_create_tool.png diff --git a/images/pycharm/running_unittests.png b/contributing-docs/testing/images/pycharm/running_unittests.png similarity index 100% rename from images/pycharm/running_unittests.png rename to contributing-docs/testing/images/pycharm/running_unittests.png diff --git a/images/testing/pytest-runner.png b/contributing-docs/testing/images/pytest-runner.png similarity index 100% rename from images/testing/pytest-runner.png rename to contributing-docs/testing/images/pytest-runner.png diff --git a/images/testing/run-test.png b/contributing-docs/testing/images/run-test.png similarity index 100% rename from images/testing/run-test.png rename to contributing-docs/testing/images/run-test.png diff --git a/images/vscode_add_pytest_settings.png b/contributing-docs/testing/images/vscode_add_pytest_settings.png similarity index 100% rename from images/vscode_add_pytest_settings.png rename to contributing-docs/testing/images/vscode_add_pytest_settings.png diff --git a/images/vscode_configure_python_tests.png b/contributing-docs/testing/images/vscode_configure_python_tests.png similarity index 100% rename from images/vscode_configure_python_tests.png rename to contributing-docs/testing/images/vscode_configure_python_tests.png diff --git a/images/vscode_install_python_extension.png b/contributing-docs/testing/images/vscode_install_python_extension.png similarity index 100% rename from images/vscode_install_python_extension.png rename to contributing-docs/testing/images/vscode_install_python_extension.png diff --git a/images/vscode_run_tests.png b/contributing-docs/testing/images/vscode_run_tests.png similarity index 100% rename from images/vscode_run_tests.png rename to contributing-docs/testing/images/vscode_run_tests.png diff --git a/images/vscode_select_pytest_framework.png 
b/contributing-docs/testing/images/vscode_select_pytest_framework.png similarity index 100% rename from images/vscode_select_pytest_framework.png rename to contributing-docs/testing/images/vscode_select_pytest_framework.png diff --git a/contributing-docs/testing/integration_tests.rst b/contributing-docs/testing/integration_tests.rst new file mode 100644 index 0000000000000..2d9766b00744a --- /dev/null +++ b/contributing-docs/testing/integration_tests.rst @@ -0,0 +1,170 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Airflow Integration Tests +========================= + +Some of the tests in Airflow are integration tests. These tests require ``airflow`` Docker +image and extra images with integrations (such as ``celery``, ``mongodb``, etc.). +The integration tests are all stored in the ``tests/integration`` folder. + +.. contents:: :local: + +Enabling Integrations +--------------------- + +Airflow integration tests cannot be run in the local virtualenv. They can only run in the Breeze +environment with enabled integrations and in the CI. See `CI `_ for details about Airflow CI. + +When you are in the Breeze environment, by default, all integrations are disabled. This enables only true unit tests +to be executed in Breeze. You can enable the integration by passing the ``--integration `` +switch when starting Breeze. You can specify multiple integrations by repeating the ``--integration`` switch +or using the ``--integration all-testable`` switch that enables all testable integrations and +``--integration all`` switch that enables all integrations. + +NOTE: Every integration requires a separate container with the corresponding integration image. +These containers take precious resources on your PC, mainly the memory. The started integrations are not stopped +until you stop the Breeze environment with the ``stop`` command and started with the ``start`` command. + +The following integrations are available: + +.. BEGIN AUTO-GENERATED INTEGRATION LIST + ++--------------+----------------------------------------------------+ +| Identifier | Description | ++==============+====================================================+ +| cassandra | Integration required for Cassandra hooks. | ++--------------+----------------------------------------------------+ +| celery | Integration required for Celery executor tests. | ++--------------+----------------------------------------------------+ +| kafka | Integration required for Kafka hooks. | ++--------------+----------------------------------------------------+ +| kerberos | Integration that provides Kerberos authentication. | ++--------------+----------------------------------------------------+ +| mongo | Integration required for MongoDB hooks. 
|
++--------------+----------------------------------------------------+
+| openlineage  | Integration required for Openlineage hooks.        |
++--------------+----------------------------------------------------+
+| otel         | Integration required for OTEL/opentelemetry hooks. |
++--------------+----------------------------------------------------+
+| pinot        | Integration required for Apache Pinot hooks.       |
++--------------+----------------------------------------------------+
+| statsd       | Integration required for Statsd hooks.             |
++--------------+----------------------------------------------------+
+| trino        | Integration required for Trino hooks.              |
++--------------+----------------------------------------------------+
+
+.. END AUTO-GENERATED INTEGRATION LIST'
+
+To start the ``mongo`` integration only, enter:
+
+.. code-block:: bash
+
+    breeze --integration mongo
+
+To start ``mongo`` and ``cassandra`` integrations, enter:
+
+.. code-block:: bash
+
+    breeze --integration mongo --integration cassandra
+
+To start all testable integrations, enter:
+
+.. code-block:: bash
+
+    breeze --integration all-testable
+
+To start all integrations, enter:
+
+.. code-block:: bash
+
+    breeze --integration all
+
+Note that Kerberos is a special kind of integration. Some tests run differently when the
+Kerberos integration is enabled (they retrieve and use a Kerberos authentication token) and differently when the
+Kerberos integration is disabled (they neither retrieve nor use the token). Therefore, one of the test jobs
+for the CI system should run all tests with the Kerberos integration enabled to test both scenarios.
+
+Running Integration Tests
+-------------------------
+
+All tests using an integration are marked with a custom pytest marker ``pytest.mark.integration``.
+The marker has a single parameter - the name of the integration.
+
+An example of a ``celery`` integration test:
+
+.. code-block:: python
+
+    @pytest.mark.integration("celery")
+    def test_real_ping(self):
+        hook = RedisHook(redis_conn_id="redis_default")
+        redis = hook.get_conn()
+
+        assert redis.ping(), "Connection to Redis with PING works."
+
+The markers can be specified at the test level or the class level (then all tests in this class
+require an integration). You can add multiple markers with different integrations for tests that
+require more than one integration.
+
+If such a marked test does not have a required integration enabled, it is skipped.
+The skip message clearly says what is needed to use the test.
+
+To run all tests with a certain integration, use the custom pytest flag ``--integration``.
+You can pass several integration flags if you want to enable several integrations at once.
+
+**NOTE:** If an integration is not enabled in Breeze or CI,
+the affected test will be skipped.
+
+To run only ``mongo`` integration tests:
+
+.. code-block:: bash
+
+    pytest --integration mongo tests/integration
+
+To run integration tests for ``mongo`` and ``celery``:
+
+.. code-block:: bash
+
+    pytest --integration mongo --integration celery tests/integration
+
+
+Here is an example of the test collection limited to the ``providers/apache`` sub-directory:
+
+.. code-block:: bash
+
+    pytest --integration cassandra tests/integration/providers/apache
+
+Running Integration Tests from the Host
+---------------------------------------
+
+You can also run integration tests using Breeze from the host.
+
+Runs all integration tests:
+
+  ..
code-block:: bash + + breeze testing integration-tests --db-reset --integration all-testable + +Runs all mongo DB tests: + + .. code-block:: bash + + breeze testing integration-tests --db-reset --integration mongo + +----- + +For other kinds of tests look at `Testing document <../09_testing.rst>`__ diff --git a/contributing-docs/testing/k8s_tests.rst b/contributing-docs/testing/k8s_tests.rst new file mode 100644 index 0000000000000..463cc75d36f94 --- /dev/null +++ b/contributing-docs/testing/k8s_tests.rst @@ -0,0 +1,662 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Kubernetes tests +================ + +Airflow has tests that are run against real Kubernetes cluster. We are using +`Kind `_ to create and run the cluster. We integrated the tools to start/stop/ +deploy and run the cluster tests in our repository and into Breeze development environment. + +KinD has a really nice ``kind`` tool that you can use to interact with the cluster. Run ``kind --help`` to +learn more. + +.. contents:: :local: + +K8S test environment +-------------------- + +Before running ``breeze k8s`` cluster commands you need to setup the environment. This is done +by ``breeze k8s setup-env`` command. Breeze in this command makes sure to download tools that +are needed to run k8s tests: Helm, Kind, Kubectl in the right versions and sets up a +Python virtualenv that is needed to run the tests. All those tools and env are setup in +``.build/.k8s-env`` folder. You can activate this environment yourselves as usual by sourcing +``bin/activate`` script, but since we are supporting multiple clusters in the same installation +it is best if you use ``breeze k8s shell`` with the right parameters specifying which cluster +to use. + +Multiple cluster support +------------------------ + +The main feature of ``breeze k8s`` command is that it allows you to manage multiple KinD clusters - one +per each combination of Python and Kubernetes version. This is used during CI where we can run same +tests against those different clusters - even in parallel. + +The cluster name follows the pattern ``airflow-python-X.Y-vA.B.C`` where X.Y is a major/minor Python version +and A.B.C is Kubernetes version. Example cluster name: ``airflow-python-3.8-v1.24.0`` + +Most of the commands can be executed in parallel for multiple images/clusters by adding ``--run-in-parallel`` +to create clusters or deploy airflow. Similarly checking for status, dumping logs and deleting clusters +can be run with ``--all`` flag and they will be executed sequentially for all locally created clusters. + +Per-cluster configuration files +------------------------------- + +Once you start the cluster, the configuration for it is stored in a dynamically created folder - separate +folder for each python/kubernetes_version combination. 
The folder is ``./.build/.k8s-clusters/``
+
+There are two files there:
+
+* kubectl config stored in the ``.kubeconfig`` file - our scripts set the ``KUBECONFIG`` variable to it
+* KinD cluster configuration stored in the ``.kindconfig.yml`` file - our scripts set the ``KINDCONFIG`` variable to it
+
+The ``KUBECONFIG`` file is automatically used when you enter any of the ``breeze k8s`` commands that use
+``kubectl`` or when you run ``kubectl`` in the k8s shell. The ``KINDCONFIG`` file is used when the cluster is
+started, but you and the ``k8s`` commands can also inspect it to find out, for example, what port is forwarded
+to the webserver running in the cluster.
+
+The files are deleted by the ``breeze k8s delete-cluster`` command.
+
+Managing Kubernetes Cluster
+---------------------------
+
+For your testing, you manage KinD clusters with the ``k8s`` breeze command group. The available commands are:
+
+.. image:: ../dev/breeze/doc/images/output_k8s.svg
+  :width: 100%
+  :alt: Breeze k8s
+
+The command group allows you to set up the environment, start/stop/recreate/check the status of the KinD
+Kubernetes cluster, and configure the cluster (via the ``create-cluster`` and ``configure-cluster`` commands).
+Those commands can be run with the ``--run-in-parallel`` flag for all/selected clusters, and they will be
+executed in parallel.
+
+In order to deploy Airflow, the PROD image of Airflow needs to be extended, and example DAGs and POD
+template files should be added to the image. This is done via the ``build-k8s-image`` and ``upload-k8s-image`` commands.
+This can also be done for all/selected images/clusters in parallel via the ``--run-in-parallel`` flag.
+
+Then Airflow (using the Helm Chart) can be deployed to the cluster via the ``deploy-airflow`` command.
+This can also be done for all/selected images/clusters in parallel via the ``--run-in-parallel`` flag. You can
+pass extra options when deploying Airflow to configure your deployment.
+
+You can check the status, dump logs and finally delete the cluster via the ``status``, ``logs`` and ``delete-cluster``
+commands. This can also be done for all created clusters in parallel via the ``--all`` flag.
+
+You can interact with the cluster (via the ``shell`` and ``k9s`` commands).
+
+You can run a set of k8s tests via the ``tests`` command. You can also run tests in parallel on all/selected
+clusters with the ``--run-in-parallel`` flag.
+
+
+Running tests with Kubernetes Cluster
+-------------------------------------
+
+You can either run all tests or select which tests to run. You can also enter the interactive virtualenv
+to run the tests manually one by one.
+
+
+Running Kubernetes tests via breeze:
+
+.. code-block:: bash
+
+      breeze k8s tests
+      breeze k8s tests TEST TEST [TEST ...]
+
+Optionally add ``--executor``:
+
+.. code-block:: bash
+
+      breeze k8s tests --executor CeleryExecutor
+      breeze k8s tests --executor CeleryExecutor TEST TEST [TEST ...]
+
+Entering shell with Kubernetes Cluster
+--------------------------------------
+
+This shell is prepared to run Kubernetes tests interactively. It has the ``kubectl`` and ``kind`` CLI tools
+available in the path, and it also has an activated virtualenv environment that allows you to run tests via pytest.
+
+The virtualenv is available in ``./.build/.k8s-env/`` and the binaries are available in the ``.build/.k8s-env/bin`` path.
+
+.. code-block:: bash
+
+      breeze k8s shell
+
+Optionally add ``--executor``:
+
+.. code-block:: bash
+
+      breeze k8s shell --executor CeleryExecutor
+
+
+K9s CLI - debug Kubernetes in style!
+------------------------------------ + +Breeze has built-in integration with fantastic k9s CLI tool, that allows you to debug the Kubernetes +installation effortlessly and in style. K9S provides terminal (but windowed) CLI that helps you to: + +- easily observe what's going on in the Kubernetes cluster +- observe the resources defined (pods, secrets, custom resource definitions) +- enter shell for the Pods/Containers running, +- see the log files and more. + +You can read more about k9s at `https://k9scli.io/ `_ + +Here is the screenshot of k9s tools in operation: + +.. image:: images/k9s.png + :align: center + :alt: K9S tool + + +You can enter the k9s tool via breeze (after you deployed Airflow): + +.. code-block:: bash + + breeze k8s k9s + +You can exit k9s by pressing Ctrl-C. + +Typical testing pattern for Kubernetes tests +-------------------------------------------- + +The typical session for tests with Kubernetes looks like follows: + + +1. Prepare the environment: + +.. code-block:: bash + + breeze k8s setup-env + +The first time you run it, it should result in creating the virtualenv and installing good versions +of kind, kubectl and helm. All of them are installed in ``./build/.k8s-env`` (binaries available in ``bin`` +sub-folder of it). + +.. code-block:: text + + Initializing K8S virtualenv in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env + Reinstalling PIP version in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env + Installing necessary packages in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env + The ``kind`` tool is not downloaded yet. Downloading 0.14.0 version. + Downloading from: https://github.com/kubernetes-sigs/kind/releases/download/v0.14.0/kind-darwin-arm64 + The ``kubectl`` tool is not downloaded yet. Downloading 1.24.3 version. + Downloading from: https://storage.googleapis.com/kubernetes-release/release/v1.24.3/bin/darwin/arm64/kubectl + The ``helm`` tool is not downloaded yet. Downloading 3.9.2 version. + Downloading from: https://get.helm.sh/helm-v3.9.2-darwin-arm64.tar.gz + Extracting the darwin-arm64/helm to /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin + Moving the helm to /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin/helm + + +This prepares the virtual environment for tests and downloads the right versions of the tools +to ``./build/.k8s-env`` + +2. Create the KinD cluster: + +.. code-block:: bash + + breeze k8s create-cluster + +Should result in KinD creating the K8S cluster. + +.. code-block:: text + + Config created in /Users/jarek/IdeaProjects/airflow/.build/.k8s-clusters/airflow-python-3.8-v1.24.2/.kindconfig.yaml: + + # Licensed to the Apache Software Foundation (ASF) under one + # or more contributor license agreements. See the NOTICE file + # distributed with this work for additional information + # regarding copyright ownership. The ASF licenses this file + # to you under the Apache License, Version 2.0 (the + # "License"); you may not use this file except in compliance + # with the License. You may obtain a copy of the License at + # + # http://www.apache.org/licenses/LICENSE-2.0 + # + # Unless required by applicable law or agreed to in writing, + # software distributed under the License is distributed on an + # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + # KIND, either express or implied. See the License for the + # specific language governing permissions and limitations + # under the License. 
+ --- + kind: Cluster + apiVersion: kind.x-k8s.io/v1alpha4 + networking: + ipFamily: ipv4 + apiServerAddress: "127.0.0.1" + apiServerPort: 48366 + nodes: + - role: control-plane + - role: worker + extraPortMappings: + - containerPort: 30007 + hostPort: 18150 + listenAddress: "127.0.0.1" + protocol: TCP + + + + Creating cluster "airflow-python-3.8-v1.24.2" ... + ✓ Ensuring node image (kindest/node:v1.24.2) 🖼 + ✓ Preparing nodes 📦 📦 + ✓ Writing configuration 📜 + ✓ Starting control-plane 🕹️ + ✓ Installing CNI 🔌 + ✓ Installing StorageClass 💾 + ✓ Joining worker nodes 🚜 + Set kubectl context to "kind-airflow-python-3.8-v1.24.2" + You can now use your cluster with: + + kubectl cluster-info --context kind-airflow-python-3.8-v1.24.2 + + Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/ + + KinD Cluster API server URL: http://localhost:48366 + Connecting to localhost:18150. Num try: 1 + Error when connecting to localhost:18150 : ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) + + Airflow webserver is not available at port 18150. Run `breeze k8s deploy-airflow --python 3.8 --kubernetes-version v1.24.2` to (re)deploy airflow + + KinD cluster airflow-python-3.8-v1.24.2 created! + + NEXT STEP: You might now configure your cluster by: + + breeze k8s configure-cluster + +3. Configure cluster for Airflow - this will recreate namespace and upload test resources for Airflow. + +.. code-block:: bash + + breeze k8s configure-cluster + +.. code-block:: text + + Configuring airflow-python-3.8-v1.24.2 to be ready for Airflow deployment + Deleting K8S namespaces for kind-airflow-python-3.8-v1.24.2 + Error from server (NotFound): namespaces "airflow" not found + Error from server (NotFound): namespaces "test-namespace" not found + Creating namespaces + namespace/airflow created + namespace/test-namespace created + Created K8S namespaces for cluster kind-airflow-python-3.8-v1.24.2 + + Deploying test resources for cluster kind-airflow-python-3.8-v1.24.2 + persistentvolume/test-volume created + persistentvolumeclaim/test-volume created + service/airflow-webserver-node-port created + Deployed test resources for cluster kind-airflow-python-3.8-v1.24.2 + + + NEXT STEP: You might now build your k8s image by: + + breeze k8s build-k8s-image + +4. Check the status of the cluster + +.. code-block:: bash + + breeze k8s status + +Should show the status of current KinD cluster. + +.. code-block:: text + + ======================================================================================================================== + Cluster: airflow-python-3.8-v1.24.2 + + * KUBECONFIG=/Users/jarek/IdeaProjects/airflow/.build/.k8s-clusters/airflow-python-3.8-v1.24.2/.kubeconfig + * KINDCONFIG=/Users/jarek/IdeaProjects/airflow/.build/.k8s-clusters/airflow-python-3.8-v1.24.2/.kindconfig.yaml + + Cluster info: airflow-python-3.8-v1.24.2 + + Kubernetes control plane is running at https://127.0.0.1:48366 + CoreDNS is running at https://127.0.0.1:48366/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy + + To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'. 
+ + Storage class for airflow-python-3.8-v1.24.2
+ + NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
+ standard (default) rancher.io/local-path Delete WaitForFirstConsumer false 83s
+ + Running pods for airflow-python-3.8-v1.24.2
+ + NAME READY STATUS RESTARTS AGE
+ coredns-6d4b75cb6d-rwp9d 1/1 Running 0 71s
+ coredns-6d4b75cb6d-vqnrc 1/1 Running 0 71s
+ etcd-airflow-python-3.8-v1.24.2-control-plane 1/1 Running 0 84s
+ kindnet-ckc8l 1/1 Running 0 69s
+ kindnet-qqt8k 1/1 Running 0 71s
+ kube-apiserver-airflow-python-3.8-v1.24.2-control-plane 1/1 Running 0 84s
+ kube-controller-manager-airflow-python-3.8-v1.24.2-control-plane 1/1 Running 0 84s
+ kube-proxy-6g7hn 1/1 Running 0 69s
+ kube-proxy-dwfvp 1/1 Running 0 71s
+ kube-scheduler-airflow-python-3.8-v1.24.2-control-plane 1/1 Running 0 84s
+ + KinD Cluster API server URL: http://localhost:48366
+ Connecting to localhost:18150. Num try: 1
+ Error when connecting to localhost:18150 : ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
+ + Airflow webserver is not available at port 18150. Run `breeze k8s deploy-airflow --python 3.8 --kubernetes-version v1.24.2` to (re)deploy airflow
+ + + Cluster healthy: airflow-python-3.8-v1.24.2
+ +5. Build the image based on the PROD Airflow image. You need to build the PROD image first (the command will
+ guide you if you did not - either by running the build separately or by passing the ``--rebuild-base-image`` flag).
+ +.. code-block:: bash
+ + breeze k8s build-k8s-image
+ +.. code-block:: text
+ + Building the K8S image for Python 3.8 using airflow base image: ghcr.io/apache/airflow/main/prod/python3.8:latest
+ + [+] Building 0.1s (8/8) FINISHED
+ => [internal] load build definition from Dockerfile 0.0s
+ => => transferring dockerfile: 301B 0.0s
+ => [internal] load .dockerignore 0.0s
+ => => transferring context: 35B 0.0s
+ => [internal] load metadata for ghcr.io/apache/airflow/main/prod/python3.8:latest 0.0s
+ => [1/3] FROM ghcr.io/apache/airflow/main/prod/python3.8:latest 0.0s
+ => [internal] load build context 0.0s
+ => => transferring context: 3.00kB 0.0s
+ => CACHED [2/3] COPY airflow/example_dags/ /opt/airflow/dags/ 0.0s
+ => CACHED [3/3] COPY airflow/kubernetes_executor_templates/ /opt/airflow/pod_templates/ 0.0s
+ => exporting to image 0.0s
+ => => exporting layers 0.0s
+ => => writing image sha256:c0bdd363c549c3b0731b8e8ce34153d081f239ee2b582355b7b3ffd5394c40bb 0.0s
+ => => naming to ghcr.io/apache/airflow/main/prod/python3.8-kubernetes:latest
+ + NEXT STEP: You might now upload your k8s image by:
+ + breeze k8s upload-k8s-image
+ + +6. Upload the image to the KinD cluster - this uploads your image to make it available for the KinD cluster.
+ +.. code-block:: bash
+ + breeze k8s upload-k8s-image
+ +.. 
code-block:: text + + K8S Virtualenv is initialized in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env + Good version of kind installed: 0.14.0 in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin + Good version of kubectl installed: 1.25.0 in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin + Good version of helm installed: 3.9.2 in /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin + Stable repo is already added + Uploading Airflow image ghcr.io/apache/airflow/main/prod/python3.8-kubernetes to cluster airflow-python-3.8-v1.24.2 + Image: "ghcr.io/apache/airflow/main/prod/python3.8-kubernetes" with ID "sha256:fb6195f7c2c2ad97788a563a3fe9420bf3576c85575378d642cd7985aff97412" not yet present on node "airflow-python-3.8-v1.24.2-worker", loading... + Image: "ghcr.io/apache/airflow/main/prod/python3.8-kubernetes" with ID "sha256:fb6195f7c2c2ad97788a563a3fe9420bf3576c85575378d642cd7985aff97412" not yet present on node "airflow-python-3.8-v1.24.2-control-plane", loading... + + NEXT STEP: You might now deploy airflow by: + + breeze k8s deploy-airflow + + +7. Deploy Airflow to the cluster - this will use Airflow Helm Chart to deploy Airflow to the cluster. + +.. code-block:: bash + + breeze k8s deploy-airflow + +.. code-block:: text + + Deploying Airflow for cluster airflow-python-3.8-v1.24.2 + Deploying kind-airflow-python-3.8-v1.24.2 with airflow Helm Chart. + Copied chart sources to /private/var/folders/v3/gvj4_mw152q556w2rrh7m46w0000gn/T/chart_edu__kir/chart + Deploying Airflow from /private/var/folders/v3/gvj4_mw152q556w2rrh7m46w0000gn/T/chart_edu__kir/chart + NAME: airflow + LAST DEPLOYED: Tue Aug 30 22:57:54 2022 + NAMESPACE: airflow + STATUS: deployed + REVISION: 1 + TEST SUITE: None + NOTES: + Thank you for installing Apache Airflow 2.3.4! + + Your release is named airflow. + You can now access your dashboard(s) by executing the following command(s) and visiting the corresponding port at localhost in your browser: + + Airflow Webserver: kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow + Default Webserver (Airflow UI) Login credentials: + username: admin + password: admin + Default Postgres connection credentials: + username: postgres + password: postgres + port: 5432 + + You can get Fernet Key value by running the following: + + echo Fernet Key: $(kubectl get secret --namespace airflow airflow-fernet-key -o jsonpath="{.data.fernet-key}" | base64 --decode) + + WARNING: + Kubernetes workers task logs may not persist unless you configure log persistence or remote logging! + Logging options can be found at: https://airflow.apache.org/docs/helm-chart/stable/manage-logs.html + (This warning can be ignored if logging is configured with environment variables or secrets backend) + + ########################################################### + # WARNING: You should set a static webserver secret key # + ########################################################### + + You are using a dynamically generated webserver secret key, which can lead to + unnecessary restarts of your Airflow components. + + Information on how to set a static webserver secret key can be found here: + https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#webserver-secret-key + Deployed kind-airflow-python-3.8-v1.24.2 with airflow Helm Chart. + + Airflow for Python 3.8 and K8S version v1.24.2 has been successfully deployed. + + The KinD cluster name: airflow-python-3.8-v1.24.2 + The kubectl cluster name: kind-airflow-python-3.8-v1.24.2. 
+ + + KinD Cluster API server URL: http://localhost:48366 + Connecting to localhost:18150. Num try: 1 + Established connection to webserver at http://localhost:18150/health and it is healthy. + Airflow Web server URL: http://localhost:18150 (admin/admin) + + NEXT STEP: You might now run tests or interact with airflow via shell (kubectl, pytest etc.) or k9s commands: + + + breeze k8s tests + + breeze k8s shell + + breeze k8s k9s + + +8. Run Kubernetes tests + +Note that the tests are executed in production container not in the CI container. +There is no need for the tests to run inside the Airflow CI container image as they only +communicate with the Kubernetes-run Airflow deployed via the production image. +Those Kubernetes tests require virtualenv to be created locally with airflow installed. +The virtualenv required will be created automatically when the scripts are run. + +8a) You can run all the tests + +.. code-block:: bash + + breeze k8s tests + +.. code-block:: text + + Running tests with kind-airflow-python-3.8-v1.24.2 cluster. + Command to run: pytest kubernetes_tests + ========================================================================================= test session starts ========================================================================================== + platform darwin -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /Users/jarek/IdeaProjects/airflow/.build/.k8s-env/bin/python + cachedir: .pytest_cache + rootdir: /Users/jarek/IdeaProjects/airflow/kubernetes_tests + plugins: anyio-3.6.1, instafail-0.4.2, xdist-2.5.0, forked-1.4.0, timeouts-1.2.1, cov-3.0.0 + setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s + collected 55 items + + test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag PASSED [ 1%] + test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag_with_scheduler_failure PASSED [ 3%] + test_kubernetes_pod_operator.py::TestKubernetesPodOperatorSystem::test_already_checked_on_failure PASSED [ 5%] + test_kubernetes_pod_operator.py::TestKubernetesPodOperatorSystem::test_already_checked_on_success ... + +8b) You can enter an interactive shell to run tests one-by-one + +This enters the virtualenv in ``.build/.k8s-env`` folder: + +.. code-block:: bash + + breeze k8s shell + +Once you enter the environment, you receive this information: + +.. code-block:: text + + Entering interactive k8s shell. + + (kind-airflow-python-3.8-v1.24.2:KubernetesExecutor)> + +In a separate terminal you can open the k9s CLI: + +.. code-block:: bash + + breeze k8s k9s + +Use it to observe what's going on in your cluster. + +9. Debugging in IntelliJ/PyCharm + +It is very easy to running/debug Kubernetes tests with IntelliJ/PyCharm. Unlike the regular tests they are +in ``kubernetes_tests`` folder and if you followed the previous steps and entered the shell using +``breeze k8s shell`` command, you can setup your IDE very easy to run (and debug) your +tests using the standard IntelliJ Run/Debug feature. You just need a few steps: + +9a) Add the virtualenv as interpreter for the project: + +.. image:: images/kubernetes-virtualenv.png + :align: center + :alt: Kubernetes testing virtualenv + +The virtualenv is created in your "Airflow" source directory in the +``.build/.k8s-env`` folder and you have to find ``python`` binary and choose +it when selecting interpreter. + +9b) Choose pytest as test runner: + +.. 
image:: images/pytest-runner.png + :align: center + :alt: Pytest runner
+ +9c) Run/Debug tests using standard "Run/Debug" feature of IntelliJ
+ +.. image:: images/run-test.png + :align: center + :alt: Run/Debug tests
+ + +NOTE! The first time you run it, it will likely fail with
+``kubernetes.config.config_exception.ConfigException``:
+``Invalid kube-config file. Expected key current-context in kube-config``. You need to add the KUBECONFIG
+environment variable, copying it from the result of "breeze k8s tests":
+ +.. code-block:: bash
+ + echo ${KUBECONFIG}
+ + /home/jarek/code/airflow/.build/.kube/config
+ +.. image:: images/kubeconfig-env.png + :align: center + :alt: Run/Debug tests
+ + +The configuration for Kubernetes is stored in your "Airflow" source directory in the ".build/.kube/config" file
+and this is where the KUBECONFIG env variable should point to.
+ +You can iterate with the tests while you are in the virtualenv. All the tests requiring a Kubernetes cluster
+are in the "kubernetes_tests" folder. You can add extra ``pytest`` parameters then (for example, ``-s`` will
+print the output generated by the tests and print statements to the terminal immediately). You should have
+kubernetes_tests as your working directory.
+ +.. code-block:: bash
+ + pytest test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag_with_scheduler_failure -s
+ +You can modify the tests or KubernetesPodOperator and re-run them without re-deploying
+Airflow to the KinD cluster.
+ +10. Dumping logs
+ +Sometimes you want to see the logs of the cluster. This can be done with ``breeze k8s logs``.
+ +.. code-block:: bash
+ + breeze k8s logs
+ +11. Redeploying Airflow
+ +Sometimes there are side effects from running tests. You can run ``breeze k8s deploy-airflow --upgrade``
+without recreating the whole cluster.
+ +.. code-block:: bash
+ + breeze k8s deploy-airflow --upgrade
+ +If needed, you can also delete the cluster manually (within the virtualenv activated by
+``breeze k8s shell``):
+ +.. code-block:: bash
+ + kind get clusters + kind delete clusters
+ +KinD also has useful commands to inspect your running cluster:
+ +.. code-block:: text
+ + kind --help
+ +12. Stop the KinD cluster when you are done
+ +.. code-block:: bash
+ + breeze k8s delete-cluster
+ +.. code-block:: text
+ + Deleting KinD cluster airflow-python-3.8-v1.24.2!
+ Deleting cluster "airflow-python-3.8-v1.24.2" ...
+ KinD cluster airflow-python-3.8-v1.24.2 deleted!
+ + +Running complete k8s tests +--------------------------
+ +You can also run complete k8s tests with:
+ +.. code-block:: bash
+ + breeze k8s run-complete-tests
+ +This will create the cluster, build images, deploy Airflow, run tests and finally delete the clusters, as a single
+command. This is the way it is run in our CI; you can also run such complete tests in parallel.
+ +-----
+ +For other kinds of tests look at `Testing document <../09_testing.rst>`__ diff --git a/contributing-docs/testing/system_tests.rst b/contributing-docs/testing/system_tests.rst new file mode 100644 index 0000000000000..a8a1a9fc85660 --- /dev/null +++ b/contributing-docs/testing/system_tests.rst @@ -0,0 +1,168 @@ + .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ + .. 
http://www.apache.org/licenses/LICENSE-2.0
+ + .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ +Airflow System Tests +====================
+ +System tests need to communicate with external services/systems that are available
+if you have appropriate credentials configured for your tests.
+The system tests derive from the ``tests.test_utils.system_tests_class.SystemTest`` class. They should also
+be marked with ``@pytest.mark.system(SYSTEM)`` where ``SYSTEM`` designates the system
+to be tested (for example, ``google.cloud``). These tests are skipped by default.
+ +You can execute the system tests by providing the ``--system SYSTEM`` flag to ``pytest``. You can
+specify several ``--system`` flags if you want to execute tests for several systems.
+ +The system tests execute a specified example DAG file that runs the DAG end-to-end.
+ +.. contents:: :local:
+ +Environment for System Tests +----------------------------
+ +**Prerequisites:** You may need to set some variables to run system tests. If you need to
+add some initialization of environment variables to Breeze, you can add a
+``variables.env`` file at ``files/airflow-breeze-config/variables.env``. It will be automatically
+sourced when entering the Breeze environment. You can also add some additional
+initialization commands in this file if you want to execute something
+every time you enter Breeze.
+ +There are several typical operations you might want to perform, such as:
+ +* generating a file with the random value used across the whole Breeze session (this is useful if
+ you want to use this random number in the names of resources that you create in your service)
+* generating variables that will be used as the names of your resources
+* decrypting any variables and resources you keep encrypted in your configuration files
+* installing additional packages that are needed in case you are doing tests with the 1.10.* Airflow series
+ (see below)
+ +An example variables.env file is shown here (this is part of the variables.env file that is used to
+run Google Cloud system tests).
+ +.. code-block:: bash
+ + # Build variables. This file is sourced by Breeze.
+ # Also it is sourced during continuous integration build in Cloud Build
+ + # Auto-export all variables
+ set -a
+ + echo
+ echo "Reading variables"
+ echo
+ + # Generate random number that will be used across your session
+ RANDOM_FILE="/random.txt"
+ + if [[ ! -f "${RANDOM_FILE}" ]]; then
+ echo "${RANDOM}" > "${RANDOM_FILE}"
+ fi
+ + RANDOM_POSTFIX=$(cat "${RANDOM_FILE}")
+ + +To execute system tests, specify the ``--system SYSTEM``
+flag where ``SYSTEM`` is the system to run the system tests for. The flag can be repeated.
+ + +Forwarding Authentication from the Host +----------------------------------------------------
+ +For system tests, you can also forward authentication from the host to your Breeze container. You can specify
+the ``--forward-credentials`` flag when starting Breeze. Then, it will also forward the most commonly used
+credentials stored in your ``home`` directory. Use this feature with care as it makes your personal credentials
+visible to anything that you have installed inside the Docker container.
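+
+For example, entering Breeze with credential forwarding enabled is as simple as this (a minimal sketch - you
+can combine the flag with any other Breeze options you normally use):
+
+.. code-block:: bash
+
+    breeze --forward-credentials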
+ +Currently forwarded credentials are:
+ * credentials stored in ``${HOME}/.aws`` for ``aws`` - Amazon Web Services client
+ * credentials stored in ``${HOME}/.azure`` for ``az`` - Microsoft Azure client
+ * credentials stored in ``${HOME}/.config`` for ``gcloud`` - Google Cloud client (among others)
+ * credentials stored in ``${HOME}/.docker`` for ``docker`` client
+ * credentials stored in ``${HOME}/.snowsql`` for ``snowsql`` - SnowSQL (Snowflake CLI client)
+ +Adding a New System Test +--------------------------
+ +We are working on automating system test execution (AIP-4) but for now, system tests are skipped when
+tests are run in our CI system. To enable the test automation, we encourage you to add system
+tests whenever an operator/hook/sensor is added/modified in a given system.
+ +* To add your own system tests, derive them from the
+ ``tests.test_utils.system_tests_class.SystemTest`` class and mark with the
+ ``@pytest.mark.system(SYSTEM_NAME)`` marker. The system name should follow the path defined in
+ the ``providers`` package (for example, the system tests from the ``tests.providers.google.cloud``
+ package should be marked with ``@pytest.mark.system("google.cloud")``).
+ +* If your system tests need some credential files to be available for
+ authentication with external systems, make sure to keep these credentials in the
+ ``files/airflow-breeze-config/keys`` directory. Mark your tests with
+ ``@pytest.mark.credential_file(<FILE>)`` so that they are skipped if such a credential file is not there.
+ The tests should read the right credentials and authenticate them on their own. The credentials are read
+ in Breeze from the ``/files`` directory. The local "files" folder is mounted to the "/files" folder in Breeze.
+ +* If your system tests are long-running ones (i.e., require more than 20-30 minutes
+ to complete), mark them with the ``@pytest.mark.long_running`` marker.
+ Such tests are skipped by default unless you specify the ``--long-running`` flag to pytest.
+ +* The system test itself (python class) does not have any logic. Such a test runs
+ the DAG specified by its ID. This DAG should contain the actual DAG logic
+ to execute. Make sure to define the DAG in ``providers/<SYSTEM_NAME>/example_dags``. These example DAGs
+ are also used to take some snippets of code out of them when documentation is generated. So, having these
+ DAGs runnable is a great way to make sure the documentation is describing a working example. Inside
+ your test class/test method, simply use ``self.run_dag(<DAG_ID>, <DAG_FOLDER>)`` to run the DAG. Then,
+ the system class will take care of running the DAG. Note that the DAG_FOLDER should be
+ a subdirectory of the ``tests.test_utils.AIRFLOW_MAIN_FOLDER``
+ ``providers/<SYSTEM_NAME>/example_dags``.
+ + +A simple example of a system test is available in:
+ +``tests/providers/google/cloud/operators/test_compute_system.py``.
+ +It runs two DAGs defined in ``airflow.providers.google.cloud.example_dags.example_compute.py``.
+ + +The typical system test session +-------------------------------
+ +Here is a typical session that you need to run the system tests:
+ +1. Enter breeze
+ +.. code-block:: bash
+ + breeze down
+ breeze --python 3.8 --db-reset --forward-credentials
+ +This will:
+ +* stop the whole environment (i.e. recreate the metadata database from scratch)
+* run Breeze with:
+ + * python 3.8 version
+ * resetting the Airflow database
+ * forward your local credentials to Breeze
+ +2. Run the tests:
+ +.. 
code-block:: bash
+ + pytest -o faulthandler_timeout=2400 \
+ --system=google tests/providers/google/cloud/operators/test_compute_system.py
+ +-----
+ +For other kinds of tests look at `Testing document <../09_testing.rst>`__ diff --git a/contributing-docs/testing/testing_packages.rst b/contributing-docs/testing/testing_packages.rst new file mode 100644 index 0000000000000..f24da54de1c26 --- /dev/null +++ b/contributing-docs/testing/testing_packages.rst @@ -0,0 +1,123 @@ + .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ + .. http://www.apache.org/licenses/LICENSE-2.0
+ + .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ +Manually building and testing release candidate packages +========================================================
+ +Breeze can be used to test new release candidates of packages - both Airflow and providers. You can easily
+use the CI image of Breeze to install and start Airflow for both Airflow and provider packages - both
+packages that are built from sources and packages that are downloaded from PyPI when they are released
+there as release candidates.
+ +.. contents:: :local:
+ +Prerequisites +-------------
+ +The way to test it is rather straightforward:
+ +1) Make sure that the packages - both ``airflow`` and ``providers`` - are placed in the ``dist`` folder
+ of your Airflow source tree. You can either build them there or download them from PyPI (see the next chapter)
+ +2) Run the ``breeze shell`` or ``breeze start-airflow`` commands adding the following flags:
+ ``--mount-sources remove`` and ``--use-packages-from-dist``. The first one removes the ``airflow``
+ source tree from the container when starting it, the second one installs the ``airflow`` and ``providers``
+ packages from the ``dist`` folder when entering breeze.
+ +Testing pre-release packages +----------------------------
+ +There are two ways to get Airflow packages into the ``dist`` folder - building them from sources or
+downloading them from PyPI.
+ +.. note::
+ + Make sure you run ``rm dist/*`` before you start building packages or downloading them from PyPI, because
+ the packages already present there are not removed automatically.
+ +In order to build apache-airflow from sources, you need to run the following command:
+ +.. code-block:: bash
+ + breeze release-management prepare-airflow-package
+ +In order to build providers from sources, you need to run the following command:
+ +.. code-block:: bash
+ + breeze release-management prepare-provider-packages ...
+ +The packages are built in the ``dist`` folder and the command will summarise what packages are available in the
+``dist`` folder after it finishes.
+ +If you want to download the packages from PyPI, you need to run the following command:
+ +.. code-block:: bash
+ + pip download apache-airflow-providers-<PROVIDER>==X.Y.Zrc1 --dest dist --no-deps
+ +You can use it for both release and pre-release packages.
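+
+For example, once the packages you want to test are present in ``dist``, entering Breeze with them installed
+boils down to something like this (a minimal sketch - ``breeze start-airflow`` accepts the same two flags):
+
+.. code-block:: bash
+
+    breeze shell --mount-sources remove --use-packages-from-dist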
+ +Examples of testing pre-release packages +----------------------------------------
+ +A few examples below explain how you can test pre-release packages, and combine them with locally built
+and released packages.
+ +The following example downloads the ``apache-airflow`` package and the ``celery`` and ``kubernetes`` provider packages from PyPI and
+eventually starts Airflow with the Celery Executor. It also loads example dags and default connections:
+ +.. code:: bash
+ + rm dist/*
+ pip download apache-airflow==2.7.0rc1 --dest dist --no-deps
+ pip download apache-airflow-providers-cncf-kubernetes==7.4.0rc1 --dest dist --no-deps
+ pip download apache-airflow-providers-celery==3.3.0rc1 --dest dist --no-deps
+ breeze start-airflow --mount-sources remove --use-packages-from-dist --executor CeleryExecutor --load-default-connections --load-example-dags
+ + +The following example downloads the ``celery`` and ``kubernetes`` provider packages from PyPI, builds the
+``apache-airflow`` package from the main sources and eventually starts Airflow with the Celery Executor.
+It also loads example dags and default connections:
+ +.. code:: bash
+ + rm dist/*
+ breeze release-management prepare-airflow-package
+ pip download apache-airflow-providers-cncf-kubernetes==7.4.0rc1 --dest dist --no-deps
+ pip download apache-airflow-providers-celery==3.3.0rc1 --dest dist --no-deps
+ breeze start-airflow --mount-sources remove --use-packages-from-dist --executor CeleryExecutor --load-default-connections --load-example-dags
+ +The following example builds the ``celery`` and ``kubernetes`` provider packages from the main sources, downloads the 2.6.3 version
+of the ``apache-airflow`` package from PyPI and eventually starts Airflow using the default executor
+for the backend chosen (no example dags, no default connections):
+ +.. code:: bash
+ + rm dist/*
+ pip download apache-airflow==2.6.3 --dest dist --no-deps
+ breeze release-management prepare-provider-packages celery cncf.kubernetes
+ breeze start-airflow --mount-sources remove --use-packages-from-dist
+ +You can mix and match packages from PyPI (final or pre-release candidates) with locally built packages. You
+can also choose which providers to install this way, since the ``--mount-sources remove`` flag makes sure that the installed
+Airflow does not contain all the providers - only those that you explicitly downloaded or built in the
+``dist`` folder. This way you can test all the combinations of Airflow + Providers you might need.
+ +-----
+ +For other kinds of tests look at `Testing document <../09_testing.rst>`__ diff --git a/contributing-docs/testing/unit_tests.rst b/contributing-docs/testing/unit_tests.rst new file mode 100644 index 0000000000000..4b84cf92c783b --- /dev/null +++ b/contributing-docs/testing/unit_tests.rst @@ -0,0 +1,1152 @@ + .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ + .. http://www.apache.org/licenses/LICENSE-2.0
+ + .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. 
See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ +Airflow Unit Tests +==================
+ +All unit tests for Apache Airflow are run using `pytest <https://docs.pytest.org/en/stable/>`_.
+ +.. contents:: :local:
+ +Writing Unit Tests +------------------
+ +Follow these guidelines when writing unit tests:
+ +* For standard unit tests that do not require integrations with external systems, make sure to simulate all communications.
+* All Airflow tests are run with ``pytest``. Make sure to set your IDE/runners (see below) to use ``pytest`` by default.
+* For new tests, use standard "asserts" of Python and ``pytest`` decorators/context managers for testing
+ rather than ``unittest`` ones. See the `pytest docs <https://docs.pytest.org/en/stable/>`_ for details.
+* Use a parameterized framework for tests that have variations in parameters.
+* Use the ``pytest.warns`` context manager to capture warnings rather than the ``recwarn`` fixture. We are aiming for 0 warnings in our
+ tests, so we run Pytest with ``--disable-warnings`` but instead we have the ``pytest-capture-warnings`` plugin that
+ overrides the ``recwarn`` fixture behaviour.
+ + +Airflow configuration for unit tests +------------------------------------
+ +Some of the unit tests require a special configuration set as the ``default``. This is done automatically by
+adding ``AIRFLOW__CORE__UNIT_TEST_MODE=True`` to the environment variables in a Pytest auto-used
+fixture. This in turn makes Airflow load the test configuration from the file
+``airflow/config_templates/unit_tests.cfg``. Test configuration from there replaces the original
+defaults from ``airflow/config_templates/config.yml``. If you want to add some test-only configuration
+as a default for all tests, you should add the value to this file.
+ +You can also - of course - override the values in individual tests by patching environment variables following
+the usual ``AIRFLOW__SECTION__KEY`` pattern or by using the ``conf_vars`` context manager.
+ +Airflow unit test types +-----------------------
+ +Airflow tests in the CI environment are split into several test types. You can narrow down which
+test types you want to use in various ``breeze testing`` sub-commands in the following ways:
+ +* by specifying the ``--test-type`` option when you run a single test type in the ``breeze testing tests`` command
+* by specifying a space-separated list of test types via the ``--parallel-test-types`` or
+ ``--exclude-parallel-test-types`` options when you run tests in parallel (in several testing commands)
+ +The following test types are defined:
+ +* ``Always`` - those are tests that should always be executed (always sub-folder)
+* ``API`` - Tests for the Airflow API (api, api_connexion, api_experimental and api_internal sub-folders)
+* ``CLI`` - Tests for the Airflow CLI (cli folder)
+* ``Core`` - for the core Airflow functionality (core, executors, jobs, models, ti_deps, utils sub-folders)
+* ``Operators`` - tests for the operators (operators folder with the exception of Virtualenv Operator tests and
+ External Python Operator tests that have their own test types). They are excluded via the
+``virtualenv_operator`` and ``external_python_operator`` test markers that those tests are marked with.
+* ``WWW`` - Tests for the Airflow webserver (www folder)
+* ``Providers`` - Tests for all Providers of Airflow (providers folder)
+* ``PlainAsserts`` - tests that require disabling the ``assert-rewrite`` feature of Pytest (usually because
+ of a buggy/complex implementation of an imported library) (``plain_asserts`` marker)
+* ``Other`` - all other tests remaining after the above tests are selected
+ +There are also Virtualenv/ExternalPython operator test types that are excluded from the ``Operators`` test type
+and run as separate test types. Those are:
+ +* ``PythonVenv`` - tests for PythonVirtualenvOperator - selected directly as TestPythonVirtualenvOperator
+* ``BranchPythonVenv`` - tests for BranchPythonVirtualenvOperator - selected directly as TestBranchPythonVirtualenvOperator
+* ``ExternalPython`` - tests for ExternalPythonOperator - selected directly as TestExternalPythonOperator
+* ``BranchExternalPython`` - tests for BranchExternalPythonOperator - selected directly as TestBranchExternalPythonOperator
+ +We also have test types that run "all" tests (so they do not look at the folder, but at the ``pytest`` markers
+the tests are marked with) with some filters applied:
+ +* ``All-Postgres`` - tests that require a Postgres database. They are only run when the backend is Postgres (``backend("postgres")`` marker)
+* ``All-MySQL`` - tests that require a MySQL database. They are only run when the backend is MySQL (``backend("mysql")`` marker)
+* ``All-Quarantined`` - tests that are flaky and need to be fixed (``quarantined`` marker)
+* ``All`` - all tests are run (this is the default)
+ + +We also have an ``Integration`` test type that runs integration tests with external software, enabled
+via the ``--integration`` flag in the ``breeze`` environment - via ``breeze testing integration-tests``.
+ +* ``Integration`` - tests that require external integration images running in docker-compose
+ +This split is done for two reasons:
+ +1. in order to selectively run only a subset of the test types for some PRs
+2. in order to allow efficient parallel execution of the tests on Self-Hosted runners
+ +For case 2, we can utilize the memory and CPUs available on both CI and local development machines to run
+tests in parallel, but we cannot use the pytest-xdist plugin for that - we need to split the tests into test
+types and run each test type with its own database instance in a separate container, where the tests
+in each type have exclusive access to their database and each test within a test type runs sequentially.
+By their nature, those tests rely on shared databases and they update/reset/clean up data in the
+databases while they are executing.
+ + +DB and non-DB tests +-------------------
+ +There are two kinds of unit tests in Airflow - DB and non-DB tests. This chapter describes the differences
+between those two types.
+ +Airflow non-DB tests +....................
+ +The non-DB tests are run once for each tested Python version with the ``none`` database backend (which
+causes any database access to fail). Those tests are run in parallel with the ``pytest-xdist`` plugin, which
+means that we can efficiently utilise multi-processor machines (including the ``self-hosted`` runners with
+8 CPUs that we have) to run the tests with maximum parallelism.
+ +It's usually straightforward to run those tests in a local virtualenv because they do not require any
+setup or a running database. They also run much faster than DB tests.
You can run them with ``pytest`` command +or with ``breeze`` that has all the dependencies needed to run all tests automatically installed. Of course +you can also select just specific test or folder or module for the Pytest to collect/run tests from there, +the example below shows how to run all tests, parallelizing them with ``pytest-xdist`` +(by specifying ``tests`` folder): + +.. code-block:: bash + + pytest tests --skip-db-tests -n auto + + +The ``--skip-db-tests`` flag will only run tests that are not marked as DB tests. + + +You can also run ``breeze`` command to run all the tests (they will run in a separate container, +the selected python version and without access to any database). Adding ``--use-xdist`` flag will run all +tests in parallel using ``pytest-xdist`` plugin. + +We have a dedicated, opinionated ``breeze testing non-db-tests`` command as well that runs non-DB tests +(it is also used in CI to run the non-DB tests, where you do not have to specify extra flags for +parallel running and you can run all the Non-DB tests +(or just a subset of them with ``--parallel-test-types`` or ``--exclude-parallel-test-types``) in parallel: + +.. code-block:: bash + + breeze testing non-db-tests + +You can pass ``--parallel-test-type`` list of test types to execute or ``--exclude--parallel-test-types`` +to exclude them from the default set:. + +.. code-block:: bash + + breeze testing non-db-tests --parallel-test-types "Providers API CLI" + + +.. code-block:: bash + + breeze testing non-db-tests --exclude-parallel-test-types "Providers API CLI" + +You can also run the same commands via ``breeze testing tests`` - by adding the necessary flags manually: + +.. code-block:: bash + + breeze testing tests --skip-db-tests --backend none --use-xdist + +Also you can enter interactive shell with ``breeze`` and run tests from there if you want to iterate +with the tests. Source files in ``breeze`` are mounted as volumes so you can modify them locally and +rerun in Breeze as you will (``-n auto`` will parallelize tests using ``pytest-xdist`` plugin): + +.. code-block:: bash + + breeze shell --backend none --python 3.8 + > pytest tests --skip-db-tests -n auto + + +Airflow DB tests +................ + +Some of the tests of Airflow require a database to connect to in order to run. Those tests store and read data +from Airflow DB using Airflow's core code and it's crucial to run the tests against all real databases +that Airflow supports in order to check if the SQLAlchemy queries are correct and if the database + schema is correct. + +Those tests should be marked with ``@pytest.mark.db`` decorator on one of the levels: + +* test method can be marked with ``@pytest.mark.db`` decorator +* test class can be marked with ``@pytest.mark.db`` decorator +* test module can be marked with ``pytestmark = pytest.mark.db`` at the top level of the module + +For the DB tests, they are run against the multiple databases Airflow support, multiple versions of those +and multiple Python versions it supports. In order to save time for testing not all combinations are +tested but enough various combinations are tested to detect potential problems. + +By default, the DB tests will use sqlite and the "airflow.db" database created and populated in the +``${AIRFLOW_HOME}`` folder. You do not need to do anything to get the database created and initialized, +but if you need to clean and restart the db, you can run tests with ``-with-db-init`` flag - then the +database will be re-initialized. 
You can also set the ``AIRFLOW__DATABASE__SQL_ALCHEMY_CONN`` environment
+variable to point to a supported database (Postgres, MySQL, etc.) and the tests will use that database. You
+might need to run ``airflow db reset`` to initialize the database in that case.
+ +The "non-DB" tests are perfectly fine to run when you have a database around, but if you want to run just the
+DB tests (as happens in our CI for the ``Database`` runs) you can use the ``--run-db-tests-only`` flag to filter
+out non-DB tests (and obviously you can run it not only on the whole ``tests`` directory but on any
+folders/files/tests selection that ``pytest`` supports).
+ +.. code-block:: bash
+ + pytest tests/ --run-db-tests-only
+ +You can also run DB tests with the ``breeze`` dockerized environment. You can choose the backend to use with the
+``--backend`` flag. The default is ``sqlite`` but you can also use others such as ``postgres`` or ``mysql``.
+You can also select the backend version and the Python version to use. You can specify the ``test-type`` to run -
+breeze will list the test types you can run with ``--help`` and provide auto-complete for them. The example
+below runs the ``Core`` tests with the ``postgres`` backend and Python ``3.8``:
+ +We have a dedicated, opinionated ``breeze testing db-tests`` command as well that runs DB tests
+(it is also used in CI to run the DB tests). There you do not have to specify extra flags for
+parallel running and you can run all the DB tests
+(or just a subset of them with ``--parallel-test-types`` or ``--exclude-parallel-test-types``) in parallel:
+ +.. code-block:: bash
+ + breeze testing db-tests --backend postgres
+ +You can pass a ``--parallel-test-types`` list of test types to execute or ``--exclude-parallel-test-types``
+to exclude them from the default set:
+ +.. code-block:: bash
+ + breeze testing db-tests --parallel-test-types "Providers API CLI"
+ + +.. code-block:: bash
+ + breeze testing db-tests --exclude-parallel-test-types "Providers API CLI"
+ +You can also run the same commands via ``breeze testing tests`` - by adding the necessary flags manually:
+ +.. code-block:: bash
+ + breeze testing tests --run-db-tests-only --backend postgres --run-tests-in-parallel
+ + +Also - if you want to iterate with the tests - you can enter an interactive shell and run the tests iteratively -
+either by package/module/test or by test type - whatever ``pytest`` supports.
+ +.. code-block:: bash
+ + breeze shell --backend postgres --python 3.8
+ > pytest tests --run-db-tests-only
+ +As explained before, you cannot run DB tests in parallel using the ``pytest-xdist`` plugin, but ``breeze`` has
+support to split all the tests into test types to run in separate containers and with separate databases
+and you can run the tests using the ``--run-tests-in-parallel`` flag (which is automatically enabled when
+you use the ``breeze testing db-tests`` command):
+ +.. code-block:: bash
+ + breeze testing tests --run-db-tests-only --backend postgres --python 3.8 --run-tests-in-parallel
+ +Examples of marking test as DB test +...................................
+ +You can apply the marker at the method/function/class level with the ``@pytest.mark.db_test`` decorator or
+at the module level with ``pytestmark = pytest.mark.db_test`` at the top level of the module.
+ +It's up to the author to decide whether to mark the test, class, or module as a "DB-test" - generally, the
+fewer DB tests, the better, and if we can clearly separate the parts that are DB from non-DB, we should.
+But it's also OK if a few tests are marked as DB tests when they are not, as long as they are part of a class
+or module that is "mostly-DB".
+ +Sometimes, when your class can be clearly split into DB and non-DB parts, it's better to split the class
+into two separate classes and mark only the DB class as a DB test.
+ +Method level:
+ +.. code-block:: python
+ + import pytest
+ + + @pytest.mark.db_test
+ def test_add_tagging(self, sentry, task_instance):
+ ...
+ +Class level:
+ + +.. code-block:: python
+ + import pytest
+ + + @pytest.mark.db_test
+ class TestDatabricksHookAsyncAadTokenSpOutside:
+ ...
+ +Module level (at the top of the module):
+ +.. code-block:: python
+ + import pytest
+ + from airflow.models.baseoperator import BaseOperator
+ from airflow.models.dag import DAG
+ from airflow.ti_deps.dep_context import DepContext
+ from airflow.ti_deps.deps.task_concurrency_dep import TaskConcurrencyDep
+ + pytestmark = pytest.mark.db_test
+ + +Best practices for DB tests +...........................
+ +Usually when you add new tests, you add tests "similar" to the ones that are already there. In most cases,
+therefore, you do not have to worry about the test type - it will be automatically selected for you by the
+fact that the test class that you add the tests to, or the whole module, is already marked with the ``db_test`` marker.
+ +You should strive to write "pure" non-DB unit tests (i.e. tests that do not need a database), but sometimes
+it's just better to plug in the existing framework of DagRuns, Dags, Connections and Variables and use the
+database directly rather than having to mock the DB access, for example. It's up to you to decide.
+ +However, if you choose to write DB tests you have to make sure you add the ``db_test`` marker - either to
+the test method, class (with a decorator) or the whole module (with pytestmark at the top level of the module).
+ +In most cases when you add tests to existing modules or classes, you follow similar tests so you do not
+have to do anything, but in some cases you need to decide if your test should be marked as a DB test or
+whether it should be changed to not use the database at all.
+ +If your test accesses the database but is not marked properly, the non-DB test run in CI will fail with this message:
+ +.. code::
+ + "Your test accessed the DB but `_AIRFLOW_SKIP_DB_TESTS` is set.
+ Either make sure your test does not use database or mark your test with `@pytest.mark.db_test`.
+ + +How to verify if DB test is correctly classified +................................................
+ +When you add a new test, if you want to see whether your DB test is correctly classified, you can run the test or group
+of tests with the ``--skip-db-tests`` flag.
+ +You can run all (or a subset of) the test types if you want to make sure all of the problems are fixed:
+ + .. code-block:: bash
+ + breeze testing tests --skip-db-tests tests/your_test.py
+ +For the whole test suite you can run:
+ + .. code-block:: bash
+ + breeze testing non-db-tests
+ +For selected test types (in this example the tests will run for Providers/API/CLI code only):
+ + .. code-block:: bash
+ + breeze testing non-db-tests --parallel-test-types "Providers API CLI"
+ + +How to make your test not depend on DB +......................................
+ +This is tricky and there is no single solution. 
Sometimes we can mock out the methods that require
+DB access or the objects that normally require a database. Sometimes we can decide to test just a single method
+of a class rather than a more complex set of steps. Generally speaking, it's good to have as many "pure"
+unit tests that require no DB as possible compared to DB tests. They are usually faster and more
+reliable as well.
+ + +Special cases +.............
+ +There are some tricky test cases that require special handling. Here are some of them:
+ + +Parameterized tests stability +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ +The parameterized tests require a stable order of parameters if they are run via xdist - because the parameterized
+tests are distributed among multiple processes and handled separately. In some cases the parameterized tests
+have an undefined / random order (or the parameters are not hashable - for example a set of enums). In such cases
+the xdist execution of the tests will fail and you will get an error mentioning "Known Limitations of xdist".
+You can see details about the limitation in the "Known limitations" chapter of the ``pytest-xdist`` documentation.
+ +The error in this case will look similar to:
+ + .. code-block::
+ + Different tests were collected between gw0 and gw7. The difference is:
+ + +The fix for that is to sort the parameters in ``parametrize``. For example, instead of this:
+ + .. code-block:: python
+ + @pytest.mark.parametrize("status", ALL_STATES)
+ def test_method():
+ ...
+ + +do this:
+ + + .. code-block:: python
+ + @pytest.mark.parametrize("status", sorted(ALL_STATES))
+ def test_method():
+ ...
+ +Similarly, if your parameters are defined as a result of ``utcnow()`` or another dynamic method - you should
+avoid that, or assign unique IDs to those parametrized tests. Instead of this:
+ + .. code-block:: python
+ + @pytest.mark.parametrize(
+ "url, expected_dag_run_ids",
+ [
+ (
+ f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_gte="
+ f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}",
+ [],
+ ),
+ (
+ f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_lte="
+ f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}",
+ ["TEST_DAG_RUN_ID_1", "TEST_DAG_RUN_ID_2"],
+ ),
+ ],
+ )
+ def test_end_date_gte_lte(url, expected_dag_run_ids):
+ ...
+ +Do this:
+ + .. code-block:: python
+ + @pytest.mark.parametrize(
+ "url, expected_dag_run_ids",
+ [
+ pytest.param(
+ f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_gte="
+ f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}",
+ [],
+ id="end_date_gte",
+ ),
+ pytest.param(
+ f"api/v1/dags/TEST_DAG_ID/dagRuns?end_date_lte="
+ f"{urllib.parse.quote((timezone.utcnow() + timedelta(days=1)).isoformat())}",
+ ["TEST_DAG_RUN_ID_1", "TEST_DAG_RUN_ID_2"],
+ id="end_date_lte",
+ ),
+ ],
+ )
+ def test_end_date_gte_lte(url, expected_dag_run_ids):
+ ...
+ + + +Problems with Non-DB test collection +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ +Sometimes, even if the whole module is marked with ``@pytest.mark.db_test``, parsing the file and collecting
+tests will fail when ``--skip-db-tests`` is used because some of the imports or objects created in the
+module will read the database.
+ +Usually what helps is to move such initialization code to inside the tests or pytest fixtures (and pass
+objects needed by tests as fixtures rather than importing them from the module). Similarly, you might
+use DB-bound objects (like Connection) in your ``parametrize`` specification - this will also fail pytest
+collection. Move the creation of such objects to inside the tests:
+ +Moving object creation from the top level to inside the tests. 
This code will break collection of tests even if +the test is marked as DB test: + + + .. code-block:: python + + pytestmark = pytest.mark.db_test + + TI = TaskInstance( + task=BashOperator(task_id="test", bash_command="true", dag=DAG(dag_id="id"), start_date=datetime.now()), + run_id="fake_run", + state=State.RUNNING, + ) + + + class TestCallbackRequest: + @pytest.mark.parametrize( + "input,request_class", + [ + (CallbackRequest(full_filepath="filepath", msg="task_failure"), CallbackRequest), + ( + TaskCallbackRequest( + full_filepath="filepath", + simple_task_instance=SimpleTaskInstance.from_ti(ti=TI), + processor_subdir="/test_dir", + is_failure_callback=True, + ), + TaskCallbackRequest, + ), + ( + DagCallbackRequest( + full_filepath="filepath", + dag_id="fake_dag", + run_id="fake_run", + processor_subdir="/test_dir", + is_failure_callback=False, + ), + DagCallbackRequest, + ), + ( + SlaCallbackRequest( + full_filepath="filepath", + dag_id="fake_dag", + processor_subdir="/test_dir", + ), + SlaCallbackRequest, + ), + ], + ) + def test_from_json(self, input, request_class): + ... + + +Instead - this will not break collection. The TaskInstance is not initialized when the module is parsed, +it will only be initialized when the test gets executed because we moved initialization of it from +top level / parametrize to inside the test: + + .. code-block:: python + + pytestmark = pytest.mark.db_test + + + class TestCallbackRequest: + @pytest.mark.parametrize( + "input,request_class", + [ + (CallbackRequest(full_filepath="filepath", msg="task_failure"), CallbackRequest), + ( + None, # to be generated when test is run + TaskCallbackRequest, + ), + ( + DagCallbackRequest( + full_filepath="filepath", + dag_id="fake_dag", + run_id="fake_run", + processor_subdir="/test_dir", + is_failure_callback=False, + ), + DagCallbackRequest, + ), + ( + SlaCallbackRequest( + full_filepath="filepath", + dag_id="fake_dag", + processor_subdir="/test_dir", + ), + SlaCallbackRequest, + ), + ], + ) + def test_from_json(self, input, request_class): + if input is None: + ti = TaskInstance( + task=BashOperator( + task_id="test", bash_command="true", dag=DAG(dag_id="id"), start_date=datetime.now() + ), + run_id="fake_run", + state=State.RUNNING, + ) + + input = TaskCallbackRequest( + full_filepath="filepath", + simple_task_instance=SimpleTaskInstance.from_ti(ti=ti), + processor_subdir="/test_dir", + is_failure_callback=True, + ) + + +Sometimes it is difficult to rewrite the tests, so you might add conditional handling and mock out some +database-bound methods or objects to avoid hitting the database during test collection. The code below +will hit the Database while parsing the tests, because this is what Variable.setdefault does when +parametrize specification is being parsed - even if test is marked as DB test. + + + .. 
code-block:: python + + from airflow.models.variable import Variable + + pytestmark = pytest.mark.db_test + + initial_db_init() + + + @pytest.mark.parametrize( + "env, expected", + [ + pytest.param( + {"plain_key": "plain_value"}, + "{'plain_key': 'plain_value'}", + id="env-plain-key-val", + ), + pytest.param( + {"plain_key": Variable.setdefault("plain_var", "banana")}, + "{'plain_key': 'banana'}", + id="env-plain-key-plain-var", + ), + pytest.param( + {"plain_key": Variable.setdefault("secret_var", "monkey")}, + "{'plain_key': '***'}", + id="env-plain-key-sensitive-var", + ), + pytest.param( + {"plain_key": "{{ var.value.plain_var }}"}, + "{'plain_key': '{{ var.value.plain_var }}'}", + id="env-plain-key-plain-tpld-var", + ), + ], + ) + def test_rendered_task_detail_env_secret(patch_app, admin_client, request, env, expected): + ... + + +You can make the code conditional and mock out the Variable to avoid hitting the database. + + + .. code-block:: python + + from airflow.models.variable import Variable + + pytestmark = pytest.mark.db_test + + + if os.environ.get("_AIRFLOW_SKIP_DB_TESTS") == "true": + # Handle collection of the test by non-db case + Variable = mock.MagicMock() # type: ignore[misc] # noqa: F811 + else: + initial_db_init() + + + @pytest.mark.parametrize( + "env, expected", + [ + pytest.param( + {"plain_key": "plain_value"}, + "{'plain_key': 'plain_value'}", + id="env-plain-key-val", + ), + pytest.param( + {"plain_key": Variable.setdefault("plain_var", "banana")}, + "{'plain_key': 'banana'}", + id="env-plain-key-plain-var", + ), + pytest.param( + {"plain_key": Variable.setdefault("secret_var", "monkey")}, + "{'plain_key': '***'}", + id="env-plain-key-sensitive-var", + ), + pytest.param( + {"plain_key": "{{ var.value.plain_var }}"}, + "{'plain_key': '{{ var.value.plain_var }}'}", + id="env-plain-key-plain-tpld-var", + ), + ], + ) + def test_rendered_task_detail_env_secret(patch_app, admin_client, request, env, expected): + ... + +You can also use fixture to create object that needs database just like this. + + + .. code-block:: python + + from airflow.models import Connection + + pytestmark = pytest.mark.db_test + + + @pytest.fixture() + def get_connection1(): + return Connection() + + + @pytest.fixture() + def get_connection2(): + return Connection(host="apache.org", extra={}) + + + @pytest.mark.parametrize( + "conn", + [ + "get_connection1", + "get_connection2", + ], + ) + def test_as_json_from_connection(self, conn: Connection): + conn = request.getfixturevalue(conn) + ... + + +Running Unit tests +------------------ + +Running Unit Tests from PyCharm IDE +................................... + +To run unit tests from the PyCharm IDE, create the `local virtualenv <07_local_virtualenv.rst>`_, +select it as the default project's environment, then configure your test runner: + +.. image:: images/pycharm/configure_test_runner.png + :align: center + :alt: Configuring test runner + +and run unit tests as follows: + +.. image:: images/pycharm/running_unittests.png + :align: center + :alt: Running unit tests + +**NOTE:** You can run the unit tests in the standalone local virtualenv +(with no Breeze installed) if they do not have dependencies such as +Postgres/MySQL/Hadoop/etc. + +Running Unit Tests from PyCharm IDE using Breeze +................................................ + +Ideally, all unit tests should be run using the standardized Breeze environment. While not +as convenient as the one-click "play button" in PyCharm, the IDE can be configured to do +this in two clicks. + +1. 
Add Breeze as an "External Tool": + + a. From the settings menu, navigate to Tools > External Tools + b. Click the little plus symbol to open the "Create Tool" popup and fill it out: + +.. image:: images/pycharm/pycharm_create_tool.png + :align: center + :alt: Installing Python extension + +2. Add the tool to the context menu: + + a. From the settings menu, navigate to Appearance & Behavior > Menus & Toolbars > Project View Popup Menu + b. Click on the list of entries where you would like it to be added. Right above or below "Project View Popup Menu Run Group" may be a good choice, you can drag and drop this list to rearrange the placement later as desired. + c. Click the little plus at the top of the popup window + d. Find your "External Tool" in the new "Choose Actions to Add" popup and click OK. If you followed the image above, it will be at External Tools > External Tools > Breeze + +**Note:** That only adds the option to that one menu. If you would like to add it to the context menu +when right-clicking on a tab at the top of the editor, for example, follow the steps above again +and place it in the "Editor Tab Popup Menu" + +.. image:: images/pycharm/pycharm_add_to_context.png + :align: center + :alt: Installing Python extension + +3. To run tests in Breeze, right click on the file or directory in the Project View and click Breeze. + + +Running Unit Tests from Visual Studio Code +.......................................... + +To run unit tests from the Visual Studio Code: + +1. Using the ``Extensions`` view install Python extension, reload if required + +.. image:: images/vscode_install_python_extension.png + :align: center + :alt: Installing Python extension + +2. Using the ``Testing`` view click on ``Configure Python Tests`` and select ``pytest`` framework + +.. image:: images/vscode_configure_python_tests.png + :align: center + :alt: Configuring Python tests + +.. image:: images/vscode_select_pytest_framework.png + :align: center + :alt: Selecting pytest framework + +3. Open ``/.vscode/settings.json`` and add ``"python.testing.pytestArgs": ["tests"]`` to enable tests discovery + +.. image:: images/vscode_add_pytest_settings.png + :align: center + :alt: Enabling tests discovery + +4. Now you are able to run and debug tests from both the ``Testing`` view and test files + +.. image:: images/vscode_run_tests.png + :align: center + :alt: Running tests + +Running Unit Tests in local virtualenv +...................................... + +To run unit, integration, and system tests from the Breeze and your +virtualenv, you can use the `pytest `_ framework. + +Custom ``pytest`` plugin runs ``airflow db init`` and ``airflow db reset`` the first +time you launch them. So, you can count on the database being initialized. Currently, +when you run tests not supported **in the local virtualenv, they may either fail +or provide an error message**. + +There are many available options for selecting a specific test in ``pytest``. Details can be found +in the official documentation, but here are a few basic examples: + +.. code-block:: bash + + pytest tests/core -k "TestCore and not check" + +This runs the ``TestCore`` class but skips tests of this class that include 'check' in their names. +For better performance (due to a test collection), run: + +.. code-block:: bash + + pytest tests/core/test_core.py -k "TestCore and not bash" + +This flag is useful when used to run a single test like this: + +.. 
code-block:: bash + + pytest tests/core/test_core.py -k "test_check_operators" + +This can also be done by specifying a full path to the test: + +.. code-block:: bash + + pytest tests/core/test_core.py::TestCore::test_dag_params_and_task_params + +To run the whole test class, enter: + +.. code-block:: bash + + pytest tests/core/test_core.py::TestCore + +You can use all available ``pytest`` flags. For example, to increase a log level +for debugging purposes, enter: + +.. code-block:: bash + + pytest --log-cli-level=DEBUG tests/core/test_core.py::TestCore + + +Running Tests using Breeze interactive shell +............................................ + +You can run tests interactively using regular pytest commands inside the Breeze shell. This has the +advantage, that Breeze container has all the dependencies installed that are needed to run the tests +and it will ask you to rebuild the image if it is needed and some new dependencies should be installed. + +By using interactive shell and iterating over the tests, you can iterate and re-run tests one-by-one +or group by group right after you modified them. + +Entering the shell is as easy as: + +.. code-block:: bash + + breeze + +This should drop you into the container. + +You can also use other switches (like ``--backend`` for example) to configure the environment for your +tests (and for example to switch to different database backend - see ``--help`` for more details). + +Once you enter the container, you might run regular pytest commands. For example: + +.. code-block:: bash + + pytest --log-cli-level=DEBUG tests/core/test_core.py::TestCore + + +Running Tests using Breeze from the Host +........................................ + +If you wish to only run tests and not to drop into the shell, apply the +``tests`` command. You can add extra targets and pytest flags after the ``--`` command. Note that +often you want to run the tests with a clean/reset db, so usually you want to add ``--db-reset`` flag +to breeze command. The Breeze image usually will have all the dependencies needed and it +will ask you to rebuild the image if it is needed and some new dependencies should be installed. + +.. code-block:: bash + + breeze testing tests tests/providers/http/hooks/test_http.py tests/core/test_core.py --db-reset --log-cli-level=DEBUG + +You can run the whole test suite without adding the test target: + +.. code-block:: bash + + breeze testing tests --db-reset + +You can also specify individual tests or a group of tests: + +.. code-block:: bash + + breeze testing tests --db-reset tests/core/test_core.py::TestCore + +You can also limit the tests to execute to specific group of tests + +.. code-block:: bash + + breeze testing tests --test-type Core + +In case of Providers tests, you can run tests for all providers + +.. code-block:: bash + + breeze testing tests --test-type Providers + +You can limit the set of providers you would like to run tests of + +.. code-block:: bash + + breeze testing tests --test-type "Providers[airbyte,http]" + +You can also run all providers but exclude the providers you would like to skip + +.. code-block:: bash + + breeze testing tests --test-type "Providers[-amazon,google]" + + +Sometimes you need to inspect docker compose after tests command complete, +for example when test environment could not be properly set due to +failed healthchecks. This can be achieved with ``--skip-docker-compose-down`` +flag: + +.. 
code-block:: bash
+
+    breeze testing tests --skip-docker-compose-down
+
+
+Running full Airflow unit test suite in parallel
+................................................
+
+If you run ``breeze testing tests --run-in-parallel``, tests run in parallel
+on your development machine - maxing out the number of parallel runs at the number of cores you
+have available in your Docker engine.
+
+In case you do not have enough memory available to your Docker engine (8 GB), the ``Integration``, ``Provider``
+and ``Core`` test types are executed sequentially, with the Docker setup cleaned up in between.
+
+Running tests in parallel allows for a massive speedup in full test execution. On an 8-CPU machine with
+16 cores, 64 GB of memory and a fast SSD disk, the whole suite of tests completes in about 5 minutes (!).
+The same suite of tests takes more than 30 minutes on the same machine when tests are run sequentially.
+
+.. note::
+
+    On MacOS you might have fewer CPUs and less memory available to run the tests than you have in the host,
+    simply because your Docker engine runs in a Linux Virtual Machine under-the-hood. If you want to make
+    use of the parallelism for the CI tests, you might want to increase the resources available
+    to your Docker engine. See the `Resources `_ chapter
+    in the ``Docker for Mac`` documentation on how to do it.
+
+You can also limit the parallelism by specifying the maximum number of parallel jobs via the
+``MAX_PARALLEL_TEST_JOBS`` variable. If you set it to "1", all the test types will be run sequentially.
+
+.. code-block:: bash
+
+    MAX_PARALLEL_TEST_JOBS="1" ./scripts/ci/testing/ci_run_airflow_testing.sh
+
+.. note::
+
+    If you use ctrl-c to stop the execution of such tests, you might have to clean up some of the Docker
+    containers that are still running. You can easily do it by running this command (it will kill all
+    running Docker containers, so do not use it if you want to keep some of them running):
+
+    .. code-block:: bash
+
+        docker kill $(docker ps -q)
+
+Running Backend-Specific Tests
+..............................
+
+Tests that use a specific backend are marked with a custom pytest marker ``pytest.mark.backend``.
+The marker has a single parameter - the name of a backend. It corresponds to the ``--backend`` switch of
+the Breeze environment (one of ``mysql``, ``sqlite``, or ``postgres``). Backend-specific tests only run when
+the Breeze environment is running with the right backend. If you specify more than one backend
+in the marker, the test runs for all specified backends.
+
+Example of a ``postgres``-only test:
+
+.. code-block:: python
+
+    @pytest.mark.backend("postgres")
+    def test_copy_expert(self):
+        ...
+
+
+Example of a ``postgres,mysql`` test (skipped with the ``sqlite`` backend):
+
+.. code-block:: python
+
+    @pytest.mark.backend("postgres", "mysql")
+    def test_celery_executor(self):
+        ...
+
+
+You can use the custom ``--backend`` switch in pytest to run only tests specific to that backend.
+Here is an example of running only postgres-specific backend tests:
+
+.. code-block:: bash
+
+    pytest --backend postgres
+
+Running Long-running tests
+..........................
+
+Some of the tests run for a long time. Such tests are marked with the ``@pytest.mark.long_running`` annotation.
+Those tests are skipped by default. You can enable them with the ``--include-long-running`` flag. You
+can also use the ``-m long_running`` flag to run only those tests.
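+
+For illustration, combining the two flags described above should select and run only the long-running
+tests (this assumes you are inside Breeze or a local virtualenv with the test dependencies available):
+
+.. code-block:: bash
+
+    # enable long-running tests and select only tests carrying the marker
+    pytest -m long_running --include-long-running tests/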
+
+Running Quarantined tests
+.........................
+
+Some of our tests are quarantined. This means that these tests are run in isolation and re-run
+several times. Also, when quarantined tests fail, the whole test suite does not fail. The quarantined
+tests are usually flaky tests that need some attention and fixing.
+
+Those tests are marked with the ``@pytest.mark.quarantined`` annotation.
+Those tests are skipped by default. You can enable them with the ``--include-quarantined`` flag. You
+can also use the ``-m quarantined`` flag to run only those tests.
+
+Running Tests with provider packages
+....................................
+
+Airflow 2.0 introduced the concept of splitting the monolithic Airflow package into separate
+provider packages. The main "apache-airflow" package contains the bare Airflow implementation,
+and there are 70+ providers that we can install additionally to get integrations with
+external services. Those providers live in the same monorepo as Airflow, but we build separate
+packages for them, and the main "apache-airflow" package does not contain the providers.
+
+Most of the development in Breeze happens by iterating on sources, and when you run
+your tests during development, you usually do not want to build packages and install them separately.
+Therefore, by default, when you enter Breeze, airflow and all providers are available directly from
+sources rather than installed from packages. This is, for example, to test the "provider discovery"
+mechanism that reads provider information from the package metadata.
+
+When Airflow is run from sources, the metadata is read from ``provider.yaml``
+files, but when Airflow is installed from packages, it is read via the package entrypoint
+``apache_airflow_provider``.
+
+By default, all packages are prepared in wheel format. To install Airflow from packages you
+need to run the following steps:
+
+1. Prepare provider packages
+
+.. code-block:: bash
+
+    breeze release-management prepare-provider-packages [PACKAGE ...]
+
+If you run this command without packages, you will prepare all packages. However, you can specify
+the providers that you would like to build if you only want to build a few provider packages.
+The packages are prepared in the ``dist`` folder. Note that this command cleans up the ``dist`` folder
+before running, so you should run it before generating the ``apache-airflow`` package.
+
+2. Prepare airflow packages
+
+.. code-block:: bash
+
+    breeze release-management prepare-airflow-package
+
+This prepares the airflow ``.whl`` package in the ``dist`` folder.
+
+3. Enter Breeze, installing both airflow and providers from the ``dist`` packages
+
+.. code-block:: bash
+
+    breeze --use-airflow-version wheel --use-packages-from-dist --mount-sources skip
+
+
+Code Coverage
+-------------
+
+Airflow's CI process automatically uploads the code coverage report to codecov.io.
+For the most recent coverage report of the main branch, visit: https://codecov.io/gh/apache/airflow.
+
+Generating Local Coverage Reports:
+..................................
+
+If you wish to obtain coverage reports for specific areas of the codebase on your local machine, follow these steps:
+
+a. Initiate a breeze shell.
+
+b. 
Execute one of the commands below based on the desired coverage area: + + - **Core:** ``python scripts/cov/core_coverage.py`` + - **REST API:** ``python scripts/cov/restapi_coverage.py`` + - **CLI:** ``python scripts/cov/cli_coverage.py`` + - **Webserver:** ``python scripts/cov/www_coverage.py`` + +c. After execution, the coverage report will be available at: http://localhost:28000/dev/coverage/index.html. + + .. note:: + + In order to see the coverage report, you must start webserver first in breeze environment via the + `airflow webserver`. Once you enter `breeze`, you can start `tmux` (terminal multiplexer) and + split the terminal (by pressing `ctrl-B "` for example) to continue testing and run the webserver + in one terminal and run tests in the second one (you can switch between the terminals with `ctrl-B `). + +Modules Not Fully Covered: +.......................... + +Each coverage command provides a list of modules that aren't fully covered. If you wish to enhance coverage for a particular module: + +a. Work on the module to improve its coverage. + +b. Once coverage reaches 100%, you can safely remove the module from the list of modules that are not fully covered. + This list is inside each command's source code. + +Tracking SQL statements +----------------------- + +You can run tests with SQL statements tracking. To do this, use the ``--trace-sql`` option and pass the +columns to be displayed as an argument. Each query will be displayed on a separate line. +Supported values: + +* ``num`` - displays the query number; +* ``time`` - displays the query execution time; +* ``trace`` - displays the simplified (one-line) stack trace; +* ``sql`` - displays the SQL statements; +* ``parameters`` - display SQL statement parameters. + +If you only provide ``num``, then only the final number of queries will be displayed. + +By default, pytest does not display output for successful tests, if you still want to see them, you must +pass the ``--capture=no`` option. + +If you run the following command: + +.. code-block:: bash + + pytest --trace-sql=num,sql,parameters --capture=no \ + tests/jobs/test_scheduler_job.py -k test_process_dags_queries_count_05 + +On the screen you will see database queries for the given test. + +SQL query tracking does not work properly if your test runs subprocesses. Only queries from the main process +are tracked. + +----- + +For other kinds of tests look at `Testing document <../09_testing.rst>`__ diff --git a/dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md b/dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md index d9dc02b4b0e82..9b981acd31377 100644 --- a/dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md +++ b/dev/MANUALLY_GENERATING_IMAGE_CACHE_AND_CONSTRAINTS.md @@ -67,9 +67,9 @@ cases where we need to update them manually. This document describes how to do i Our [CI system](../CI.rst) is build in the way that it self-maintains. Regular scheduled builds and merges to `main` branch builds (also known as `canary` builds) have separate maintenance step that take care about refreshing the cache that is used to speed up our builds and to speed up -rebuilding of [Breeze](./breeze/doc/breeze.rst) images for development purpose. This is all happening automatically, usually: +rebuilding of [Breeze](./breeze/doc/README.rst) images for development purpose. 
This is all happening automatically, usually: -* The latest [constraints](../CONTRIBUTING.rst#pinned-constraint-files) are pushed to appropriate branch after all tests succeed in the +* The latest [constraints](../contributing-docs/12_airflow_dependencies_and_extras.rst#pinned-constraint-files) are pushed to appropriate branch after all tests succeed in the `canary` build. * The [images](../IMAGES.rst) in `ghcr.io` registry are refreshed early at the beginning of the `canary` build. This diff --git a/dev/README_RELEASE_AIRFLOW.md b/dev/README_RELEASE_AIRFLOW.md index 16147f5ecb64a..0504d94714bcd 100644 --- a/dev/README_RELEASE_AIRFLOW.md +++ b/dev/README_RELEASE_AIRFLOW.md @@ -693,7 +693,7 @@ that the Airflow works as you expected. Breeze also allows you to easily build and install pre-release candidates including providers by following simple instructions described in -[Manually testing release candidate packages](https://github.com/apache/airflow/blob/main/TESTING.rst#manually-testing-release-candidate-packages) +[Manually testing release candidate packages](https://github.com/apache/airflow/blob/main/contributing-docs/testing/testing-packages.rst) # Publish the final Apache Airflow release diff --git a/dev/README_RELEASE_PROVIDER_PACKAGES.md b/dev/README_RELEASE_PROVIDER_PACKAGES.md index c71142225a05a..b99498b741042 100644 --- a/dev/README_RELEASE_PROVIDER_PACKAGES.md +++ b/dev/README_RELEASE_PROVIDER_PACKAGES.md @@ -936,7 +936,7 @@ the release candidate version. Breeze allows you to easily install and run pre-release candidates by following simple instructions described in -[Manually testing release candidate packages](https://github.com/apache/airflow/blob/main/TESTING.rst#manually-testing-release-candidate-packages) +[Manually testing release candidate packages](https://github.com/apache/airflow/blob/main/contributing-docs/testing/testing_packages.rst) But you can use any of the installation methods you prefer (you can even install it via the binary wheels downloaded from the SVN). diff --git a/dev/airflow-github b/dev/airflow-github index 91aa56835538d..97b04c6ecde83 100755 --- a/dev/airflow-github +++ b/dev/airflow-github @@ -141,18 +141,14 @@ def is_core_commit(files: list[str]) -> bool: "clients", # non-released docs "COMMITTERS.rst", - "CONTRIBUTING.rst", - "CONTRIBUTORS_QUICK_START.rst", + "contributing_docs/", "IMAGES.rst", - "LOCAL_VIRTUALENV.rst", "INTHEWILD.md", "INSTALL", "README.md", "CI.rst", "CI_DIAGRAMS.md", - "STATIC_CODE_CHECKS.rst", "images/", - "TESTING.rst", "codecov.yml", "kubernetes_tests/", ".github/", diff --git a/dev/breeze/README.md b/dev/breeze/README.md index 52bbaaa4aec14..6d235e56b971f 100644 --- a/dev/breeze/README.md +++ b/dev/breeze/README.md @@ -55,7 +55,7 @@ pipx install -e ./dev/breeze --force ``` -You can read more about Breeze in the [documentation](https://github.com/apache/airflow/blob/main/dev/breeze/doc/breeze.rst) +You can read more about Breeze in the [documentation](https://github.com/apache/airflow/blob/main/dev/breeze/doc/README.rst) This README file contains automatically generated hash of the `pyproject.toml` files that were available when the package was installed. 
Since this file becomes part of the installed package, it helps diff --git a/dev/breeze/SELECTIVE_CHECKS.md b/dev/breeze/SELECTIVE_CHECKS.md index b35dd9903b6eb..e9f1f7547ef73 100644 --- a/dev/breeze/SELECTIVE_CHECKS.md +++ b/dev/breeze/SELECTIVE_CHECKS.md @@ -67,7 +67,7 @@ We have the following Groups of files for CI that determine which tests are run: We have a number of `TEST_TYPES` that can be selectively disabled/enabled based on the content of the incoming PR. Usually they are limited to a sub-folder of the "tests" folder but there -are some exceptions. You can read more about those in `TESTING.rst `. Those types +are some exceptions. You can read more about those in `testing.rst `. Those types are determined by selective checks and are used to run `DB` and `Non-DB` tests. The `DB` tests inside each `TEST_TYPE` are run sequentially (because they use DB as state) while `TEST_TYPES` diff --git a/dev/breeze/doc/01_installation.rst b/dev/breeze/doc/01_installation.rst index 2f0e49765cdbd..5f8201ecc2997 100644 --- a/dev/breeze/doc/01_installation.rst +++ b/dev/breeze/doc/01_installation.rst @@ -85,8 +85,8 @@ Here is an example configuration with more than 200GB disk space for Docker: alt="Docker socket used"> -Note: If you use Colima, please follow instructions at: `Contributors Quick Start Guide `__ +Note: If you use Colima, please follow instructions at: +`Contributors Quick Start Guide <../../../contributing-docs/03_contributors_quick_start.rst>`__ Docker Compose -------------- diff --git a/dev/breeze/doc/02_customizing.rst b/dev/breeze/doc/02_customizing.rst index 277674f29b1cc..ddb1875bac93b 100644 --- a/dev/breeze/doc/02_customizing.rst +++ b/dev/breeze/doc/02_customizing.rst @@ -95,7 +95,7 @@ Launching Breeze integrations When Breeze starts, it can start additional integrations. Those are additional docker containers that are started in the same docker-compose command. Those are required by some of the tests -as described in ``_. +as described in `<../../../contributing-docs/testing/integration-tests.rst>`_. By default Breeze starts only airflow container without any integration enabled. If you selected ``postgres`` or ``mysql`` backend, the container for the selected backend is also started (but only the one diff --git a/dev/breeze/doc/03_developer_tasks.rst b/dev/breeze/doc/03_developer_tasks.rst index 8ef92c89fd47f..a0e983a18444c 100644 --- a/dev/breeze/doc/03_developer_tasks.rst +++ b/dev/breeze/doc/03_developer_tasks.rst @@ -552,6 +552,7 @@ To use your host IDE with Breeze: Note that you can also use the local virtualenv for Airflow development without Breeze. This is a lightweight solution that has its own limitations. -More details on using the local virtualenv are available in the `Local Virtualenv <../../../LOCAL_VIRTUALENV.rst>`_. +More details on using the local virtualenv are available in the +`Local Virtualenv <../../../contributing-docs/07_local_virtualenv.rst>`_. Next step: Follow the `Troubleshooting <04_troubleshooting.rst>`_ guide to troubleshoot your Breeze environment. diff --git a/dev/breeze/doc/05_test_commands.rst b/dev/breeze/doc/05_test_commands.rst index 3e91af9aa5781..afb68b460b6ea 100644 --- a/dev/breeze/doc/05_test_commands.rst +++ b/dev/breeze/doc/05_test_commands.rst @@ -68,7 +68,7 @@ To run the whole test class: You can re-run the tests interactively, add extra parameters to pytest and modify the files before re-running the test to iterate over the tests. 
You can also add more flags when starting the ``breeze shell`` command when you run integration tests or system tests. Read more details about it -in the `testing doc `_ where all the test types and information on how to run them are explained. +in the `testing doc <../../../contributing-docs/testing.rst>`_ where all the test types and information on how to run them are explained. This applies to all kind of tests - all our tests can be run using pytest. @@ -288,7 +288,7 @@ You can: ``breeze k8s delete-all-clusters`` commands as well as running complete tests in parallel via ``breeze k8s dump-logs`` command -This is described in detail in `Testing Kubernetes `_. +This is described in detail in `Testing Kubernetes <../../../contributing-docs/testing/k8s_tests.rst>`_. You can read more about KinD that we use in `The documentation `_ diff --git a/dev/breeze/doc/09_release_management_tasks.rst b/dev/breeze/doc/09_release_management_tasks.rst index 0691eed31fa43..4c9ee55638204 100644 --- a/dev/breeze/doc/09_release_management_tasks.rst +++ b/dev/breeze/doc/09_release_management_tasks.rst @@ -385,7 +385,7 @@ Whenever ``pyproject.toml`` gets modified, the CI main job will re-generate cons files are stored in separated orphan branches: ``constraints-main``, ``constraints-2-0``. Those are constraint files as described in detail in the -``_ contributing documentation. +`<../../../contributing-docs/12_airflow_dependencies_and_extras.rst#pinned-constraint-files>`_ contributing documentation. You can use ``breeze release-management generate-constraints`` command to manually generate constraints for diff --git a/dev/breeze/doc/breeze.rst b/dev/breeze/doc/README.rst similarity index 96% rename from dev/breeze/doc/breeze.rst rename to dev/breeze/doc/README.rst index a6c4cb9341e6f..0c6c621ceaac1 100644 --- a/dev/breeze/doc/breeze.rst +++ b/dev/breeze/doc/README.rst @@ -32,7 +32,8 @@ The environment is available for local use and is also used in Airflow's CI test We call it *Airflow Breeze* as **It's a Breeze to contribute to Airflow**. The advantages and disadvantages of using the Breeze environment vs. other ways of testing Airflow -are described in `CONTRIBUTING.rst `_. +are described in +`Integration test development environments <../../../contributing-docs/06_development_environments.rst>`_. You can use the Breeze environment to run Airflow's tests locally and reproduce CI failures. 
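+
+For example, an illustrative invocation (any test path can be substituted) for reproducing a
+failing core test locally could look like this:
+
+.. code-block:: bash
+
+    breeze testing tests --db-reset tests/core/test_core.py::TestCore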
diff --git a/dev/breeze/doc/adr/0002-implement-standalone-python-command.md b/dev/breeze/doc/adr/0002-implement-standalone-python-command.md index 1d9991054a58e..37eebcf3e15d1 100644 --- a/dev/breeze/doc/adr/0002-implement-standalone-python-command.md +++ b/dev/breeze/doc/adr/0002-implement-standalone-python-command.md @@ -39,7 +39,7 @@ Accepted ## Context -The [Breeze](https://github.com/apache/airflow/blob/main/dev/breeze/doc/breeze.rst) is +The [Breeze](https://github.com/apache/airflow/blob/main/dev/breeze/doc/README.rst) is a command line development environment for Apache Airflow that makes it easy to setup Airflow development and test environment easily (< 10 minutes is the goal) and enable contributors to run any subset diff --git a/dev/breeze/src/airflow_breeze/commands/ci_image_commands.py b/dev/breeze/src/airflow_breeze/commands/ci_image_commands.py index f3f4ca222a06e..7433e99249a98 100644 --- a/dev/breeze/src/airflow_breeze/commands/ci_image_commands.py +++ b/dev/breeze/src/airflow_breeze/commands/ci_image_commands.py @@ -709,7 +709,7 @@ def should_we_run_the_build(build_ci_params: BuildCiParams) -> bool: get_console().print( f"[info]Please rebase your code to latest {build_ci_params.airflow_branch} " "before continuing.[/]\nCheck this link to find out how " - "https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#id15\n" + "https://github.com/apache/airflow/blob/main/contributing-docs/11_working_with_git.rst\n" ) get_console().print("[error]Exiting the process[/]\n") sys.exit(1) diff --git a/dev/breeze/src/airflow_breeze/utils/docker_command_utils.py b/dev/breeze/src/airflow_breeze/utils/docker_command_utils.py index 15dab295a74f1..5a4e569e42a7b 100644 --- a/dev/breeze/src/airflow_breeze/utils/docker_command_utils.py +++ b/dev/breeze/src/airflow_breeze/utils/docker_command_utils.py @@ -89,7 +89,6 @@ ("docs", "/opt/airflow/docs"), ("generated", "/opt/airflow/generated"), ("hooks", "/opt/airflow/hooks"), - ("images", "/opt/airflow/images"), ("logs", "/root/airflow/logs"), ("pyproject.toml", "/opt/airflow/pyproject.toml"), ("scripts", "/opt/airflow/scripts"), diff --git a/docs/README.rst b/docs/README.rst index 477385e8bf4b5..e16f22499e894 100644 --- a/docs/README.rst +++ b/docs/README.rst @@ -88,7 +88,7 @@ To make an edit to an autogenerated doc, you need to make changes to a string in Building documentation ====================== -To generate a local version of the docs you can use `<../dev/breeze/doc/breeze.rst>`_. +To generate a local version of the docs you can use `<../dev/breeze/doc/README.rst>`_. The documentation build consists of verifying consistency of documentation and two steps: diff --git a/docs/apache-airflow-providers/howto/create-custom-providers.rst b/docs/apache-airflow-providers/howto/create-custom-providers.rst index 0e868ac253bd2..dd7eecb697f46 100644 --- a/docs/apache-airflow-providers/howto/create-custom-providers.rst +++ b/docs/apache-airflow-providers/howto/create-custom-providers.rst @@ -109,7 +109,7 @@ they define the extensions properly. See :doc:`apache-airflow:cli-and-env-variab sub-commands. When you write your own provider, consider following the -`Naming conventions for provider packages `_ +`Naming conventions for provider packages `_ Special considerations '''''''''''''''''''''' diff --git a/docs/docker-stack/README.md b/docs/docker-stack/README.md index 99862c1f38a87..9b58311ad122f 100644 --- a/docs/docker-stack/README.md +++ b/docs/docker-stack/README.md @@ -63,7 +63,7 @@ packages or even custom providers. 
You can learn how to do it in [Building the i The production images are build in DockerHub from released version and release candidates. There are also images published from branches but they are used mainly for development and testing purpose. -See [Airflow Git Branching](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#airflow-git-branches) +See [Airflow Git Branching](https://github.com/apache/airflow/blob/main/contributing-docs/working-with-git#airflow-git-branches) for details. ## Usage diff --git a/docs/docker-stack/index.rst b/docs/docker-stack/index.rst index 1cf264d97b531..07d845bf5dd2d 100644 --- a/docs/docker-stack/index.rst +++ b/docs/docker-stack/index.rst @@ -80,7 +80,7 @@ packages or even custom providers. You can learn how to do it in :ref:`Building The production images are build in DockerHub from released version and release candidates. There are also images published from branches but they are used mainly for development and testing purpose. -See `Airflow Git Branching `_ +See `Airflow Git Branching `_ for details. Fixing images at release time diff --git a/generated/PYPI_README.md b/generated/PYPI_README.md index cd55a24db2666..27a4b546c2941 100644 --- a/generated/PYPI_README.md +++ b/generated/PYPI_README.md @@ -163,11 +163,16 @@ release provided they have access to the appropriate platform and tools. ## Contributing -Want to help build Apache Airflow? Check out our [contributing documentation](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst). +Want to help build Apache Airflow? Check out our [contributing documentation](https://github.com/apache/airflow/blob/main/contributing-docs/README.rst). Official Docker (container) images for Apache Airflow are described in [IMAGES.rst](https://github.com/apache/airflow/blob/main/IMAGES.rst). +## Voting Policy + +* Commits need a +1 vote from a committer who is not the author +* When we do AIP voting, both PMC member's and committer's `+1s` are considered a binding vote. + ## Who uses Apache Airflow? We know about around 500 organizations that are using Apache Airflow (but there are likely many more) diff --git a/generated/README.md b/generated/README.md index 070fecf89e550..f3648060140ae 100644 --- a/generated/README.md +++ b/generated/README.md @@ -20,7 +20,7 @@ NOTE! The files in this folder are generated by pre-commit based on airflow sources. They are not supposed to be manually modified. -You can read more about pre-commit hooks [here](../STATIC_CODE_CHECKS.rst#pre-commit-hooks). +You can read more about pre-commit hooks [here](../contributing-docs/08_static_code_checks.rst#pre-commit-hooks). * `provider_dependencies.json` - is generated based on `provider.yaml` files in `airflow/providers` and based on the imports in the provider code. 
If you want to add new dependency to a provider, you diff --git a/scripts/ci/docker-compose/local.yml b/scripts/ci/docker-compose/local.yml index 6a86a203d9a1c..07ec8a89c620c 100644 --- a/scripts/ci/docker-compose/local.yml +++ b/scripts/ci/docker-compose/local.yml @@ -82,9 +82,6 @@ services: - type: bind source: ../../../hooks target: /opt/airflow/hooks - - type: bind - source: ../../../images - target: /opt/airflow/images - type: bind source: ../../../logs target: /root/airflow/logs diff --git a/scripts/ci/pre_commit/pre_commit_check_integrations_list.py b/scripts/ci/pre_commit/pre_commit_check_integrations_list.py index f965ec3ea494f..5ad9974d4cbbd 100755 --- a/scripts/ci/pre_commit/pre_commit_check_integrations_list.py +++ b/scripts/ci/pre_commit/pre_commit_check_integrations_list.py @@ -40,7 +40,7 @@ ) from tabulate import tabulate -DOCUMENTATION_PATH = AIRFLOW_SOURCES_ROOT_PATH / "TESTING.rst" +DOCUMENTATION_PATH = AIRFLOW_SOURCES_ROOT_PATH / "contributing-docs" / "testing" / "integration_tests.rst" INTEGRATION_TESTS_PATH = AIRFLOW_SOURCES_ROOT_PATH / "scripts" / "ci" / "docker-compose" INTEGRATION_TEST_PREFIX = "integration-*.yml" DOCS_MARKER_START = ".. BEGIN AUTO-GENERATED INTEGRATION LIST" @@ -129,7 +129,7 @@ def update_integration_tests_array(contents: dict[str, list[str]]): rows.append((integration, formatted_hook_description)) formatted_table = "\n" + tabulate(rows, tablefmt="grid", headers=("Identifier", "Description")) + "\n\n" insert_documentation( - file_path=AIRFLOW_SOURCES_ROOT_PATH / "TESTING.rst", + file_path=AIRFLOW_SOURCES_ROOT_PATH / "contributing-docs" / "testing" / "integration_tests.rst", content=formatted_table.splitlines(keepends=True), header=DOCS_MARKER_START, footer=DOCS_MARKER_END, diff --git a/scripts/ci/pre_commit/pre_commit_check_pre_commit_hooks.py b/scripts/ci/pre_commit/pre_commit_check_pre_commit_hooks.py index dd7c151412a74..8929440703cf5 100755 --- a/scripts/ci/pre_commit/pre_commit_check_pre_commit_hooks.py +++ b/scripts/ci/pre_commit/pre_commit_check_pre_commit_hooks.py @@ -133,7 +133,7 @@ def update_static_checks_array(hooks: dict[str, list[str]], image_hooks: list[st rows.append((hook_id, formatted_hook_description, " * " if hook_id in image_hooks else " ")) formatted_table = "\n" + tabulate(rows, tablefmt="grid", headers=("ID", "Description", "Image")) + "\n\n" insert_documentation( - file_path=AIRFLOW_SOURCES_ROOT_PATH / "STATIC_CODE_CHECKS.rst", + file_path=AIRFLOW_SOURCES_ROOT_PATH / "contributing-docs" / "08_static_code_checks.rst", content=formatted_table.splitlines(keepends=True), header=" .. BEGIN AUTO-GENERATED STATIC CHECK LIST", footer=" .. 
END AUTO-GENERATED STATIC CHECK LIST", diff --git a/scripts/ci/pre_commit/pre_commit_insert_extras.py b/scripts/ci/pre_commit/pre_commit_insert_extras.py index b8a8ff6f838b4..d64cd6cd5589a 100755 --- a/scripts/ci/pre_commit/pre_commit_insert_extras.py +++ b/scripts/ci/pre_commit/pre_commit_insert_extras.py @@ -51,7 +51,10 @@ def get_header_and_footer(extra_type: ExtraType, file_format: str) -> tuple[str, def get_wrapped_list(extras_set: set[str]) -> list[str]: - return [line + "\n" for line in textwrap.wrap(", ".join(sorted(extras_set)), 100)] + array = [line + "\n" for line in textwrap.wrap(", ".join(sorted(extras_set)), 100)] + array.insert(0, "\n") + array.append("\n") + return array def get_extra_types_dict(extras: dict[str, list[str]]) -> dict[ExtraType, tuple[set[str], list[str]]]: @@ -84,7 +87,10 @@ def get_extras_from_pyproject_toml() -> dict[str, list[str]]: return pyproject_toml_content["project"]["optional-dependencies"] -FILES_TO_UPDATE = [(AIRFLOW_ROOT_PATH / "INSTALL", "txt"), (AIRFLOW_ROOT_PATH / "CONTRIBUTING.rst", "rst")] +FILES_TO_UPDATE = [ + (AIRFLOW_ROOT_PATH / "INSTALL", "txt"), + (AIRFLOW_ROOT_PATH / "contributing-docs" / "12_airflow_dependencies_and_extras.rst", "rst"), +] def process_documentation_files(): diff --git a/scripts/ci/pre_commit/pre_commit_new_session_in_provide_session.py b/scripts/ci/pre_commit/pre_commit_new_session_in_provide_session.py index 47b782fe123a7..b6ba24cccff0f 100755 --- a/scripts/ci/pre_commit/pre_commit_new_session_in_provide_session.py +++ b/scripts/ci/pre_commit/pre_commit_new_session_in_provide_session.py @@ -117,7 +117,10 @@ def main(argv: list[str]) -> int: print(f"{path}:{error.lineno}") print(f"\tdef {error.name}(...", end="\n\n") print("Only function decorated with @provide_session should use 'session: Session = NEW_SESSION'.") - print("See: https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#database-session-handling") + print( + "See: https://github.com/apache/airflow/blob/main/" + "contributing-docs/creating_issues_and_pull_requests#database-session-handling" + ) return len(errors) diff --git a/scripts/in_container/_in_container_utils.sh b/scripts/in_container/_in_container_utils.sh index 9b8ed1679b8f1..b8d11c61e2662 100644 --- a/scripts/in_container/_in_container_utils.sh +++ b/scripts/in_container/_in_container_utils.sh @@ -44,7 +44,7 @@ function assert_in_container() { echo echo "You should only run this script in the Airflow docker container as it may override your files." echo "Learn more about how we develop and test airflow in:" - echo "https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst" + echo "https://github.com/apache/airflow/blob/main/contribuging-docs/README.rst" echo exit 1 fi diff --git a/tests/system/README.md b/tests/system/README.md index c1452bfe6e5df..6e5217e95be6c 100644 --- a/tests/system/README.md +++ b/tests/system/README.md @@ -19,11 +19,14 @@ # Airflow System Tests -- [How to run system tests](#how_to_run) - - [Running via Airflow](#run_via_airflow) - - [Running via Pytest](#run_via_pytest) - - [Running via Airflow CLI](#run_via_airflow_cli) -- [How to write system tests](#how_to_write) + + + +- [How to run system tests](#how-to-run-system-tests) + - [Running via Airflow](#running-via-airflow) + - [Running via Pytest](#running-via-pytest) + + System tests verify the correctness of Airflow Operators by running them in DAGs and allowing to communicate with external services. 
A system test tries to look as close to a regular DAG as possible, and it generally checks the @@ -37,12 +40,7 @@ The purpose of these tests is to: - provide runnable example DAGs with use cases for different Operators, - serve both as examples and test files. -> This is the new design of system tests which temporarily exists along with the old one documented at -> [TESTING.rst](../../TESTING.rst) and soon will completely replace it. The new design is based on the -> [AIP-47](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests). -> Please use it and write any new system tests according to this documentation. - -## How to run system tests +# How to run system tests There are multiple ways of running system tests. Each system test is a self-contained DAG, so it can be run as any other DAG. Some tests may require access to external services, enabled APIs or specific permissions. Make sure to @@ -50,7 +48,7 @@ prepare your environment correctly, depending on the system tests you want to r configuration which should be documented by the relevant providers in their subdirectory `tests/system/providers//README.md`. -### Running via Airflow +## Running via Airflow If you have a working Airflow environment with a scheduler and a webserver, you can import system test files into your Airflow instance and they will be automatically triggered. If the setup of the environment is correct @@ -58,7 +56,7 @@ your Airflow instance and they will be automatically triggered. If the setup of how to set up the environment is documented in each provider's system tests directory. Make sure that all resource required by the tests are also imported. -### Running via Pytest +## Running via Pytest Running system tests with pytest is the easiest with Breeze. Thanks to it, you don't need to bother about setting up the correct environment, that is able to execute the tests. @@ -76,7 +74,7 @@ You can specify several `--system` flags if you want to execute tests for severa pytest --system google --system aws tests/system ``` -### Running via Airflow CLI +### Running via Airflow CLI It is possible to run system tests using Airflow CLI. To execute a specific system test, you need to provide `dag_id` of the test to be run, `execution_date` (preferably the one from the past) and a `-S/--subdir` option @@ -89,16 +87,3 @@ airflow dags test -S tests/system bigquery_dataset 2022-01-01 > Some additional setup may be required to use Airflow CLI. Please refer > [here](https://airflow.apache.org/docs/apache-airflow/stable/usage-cli.html) for a documentation. - - -## How to write system tests - -If you are going to implement new system tests, it is recommended to familiarize with the content of the -[AIP-47](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests). There are -many changes in comparison to the old design documented at [TESTING.rst](../../TESTING.rst), so you need to be -aware of them and be compliant with the new design. - -To make it easier to migrate old system tests or write new ones, we -documented the whole **process of migration in details** (which can be found -[here](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests#AIP47NewdesignofAirflowSystemTests-Processofmigrationindetails)) -and also prepared an example of a test (located just below the migration details).