From 10bc4b6e9aa46c54d2485aca523b62ebcc3a0ebc Mon Sep 17 00:00:00 2001
From: Bruno Oliveira
Date: Tue, 8 Dec 2015 20:08:46 -0200
Subject: [PATCH 1/5] First version of architecture overview doc

---
 OVERVIEW.md | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
 create mode 100644 OVERVIEW.md

diff --git a/OVERVIEW.md b/OVERVIEW.md
new file mode 100644
index 00000000..552fb332
--- /dev/null
+++ b/OVERVIEW.md
@@ -0,0 +1,66 @@
+# Overview #
+
+Here it is described a brief overview of xdist's internal architecture.
+
+
+`xdist` works by spawning one or more **worker nodes**, which are controlled
+by the **master node**. Each **worker node** is responsible for performing
+a full test collection and afterwards running tests as dictated by the **master node**.
+
+The execution flow is:
+
+1. **master node** spawns one or more **worker nodes** at the begginning of
+   the test session. The communication between **master** and **worker** nodes makes use of
+   [execnet](http://codespeak.net/execnet/) and its [gateways](http://codespeak.net/execnet/basics.html#gateways-bootstrapping-python-interpreters).
+   The actual interpreters executing the code for the **worker nodes** might
+   be remote or local.
+
+1. Each **worker node** itself is a mini pytest runner. **workers** at this
+   point perform a full test collection, sending the collected
+   test-ids back to the **master node** which does not
+   perform any collection itself.
+
+1. The **master node** receives the result of the collection from all nodes.
+   At this point the **master node** performs a sanity check to ensure that
+   all **worker nodes** collected the same tests (including order), bailing out otherwise.
+   If all is well, it converts the list of test-ids into a list of simple
+   indexes, where each index corresponds to the position of that test in the
+   original collection list. This works because all nodes have the same
+   collection list, and saves bandwidth because the **master** can now tell
+   one of the workers to just *execute test index 3* instead of passing the
+   full test id.
+
+1. If **dist-mode** is **each**: the **master node** just sends the full list
+   of test indexes to each node at this moment.
+
+1. If **dist-mode** is **load**: the **master node** takes around 25% of the
+   tests and sends them one by one to each **worker node** in a round robin
+   fashion. The rest of the tests will be distributed later as **worker nodes**
+   finish tests (see below).
+
+1. **worker nodes** re-implement `pytest_runtestloop`: pytest's default implementation
+   basically loops over all collected items in the `session` object and executes
+   the `pytest_runtest_protocol` for each test item, but in xdist **workers** sit idly
+   waiting for **master node** to send tests for execution. As tests are
+   received by **workers**, `pytest_runtest_protocol` is executed for each test.
+   Here it is worth noting an implementation detail: at least one
+   test is kept always in **worker nodes** must they comply with
+   `pytest_runtest_protocol` in that it needs to know which will be the
+   `nextitem` in the hook call: either a new test in case the **master node** sends
+   a new test, or `None` if the **worker** receives a "shutdown" request.
+
+1. As tests are started and completed at the **workers**, the results are sent
+   back to the **master node**, which then just forwards the results to
+   the appropriate pytest hooks: `pytest_runtest_logstart` and
+   `pytest_runtest_logreport`. This way other plugins (for example `junitxml`)
+   can work normally. The **master node** (when in dist-mode **load**)
+   decides to send more tests to a node when a test completes, using
+   some heuristics such as test durations and how many tests each **worker node**
+   still has to run.
+
+1. When the **master node** has no more pending tests it will
+   send a "shutdown" signal to all **workers**, which will then run their
+   remaining tests to completion and shut down. At this point the
+   **master node** will sit waiting for **workers** to shut down, still
+   processing events such as `pytest_runtest_logreport`.
+

From b29ff737811a93f62b3d1e3f7bdf0af5818d47b5 Mon Sep 17 00:00:00 2001
From: Bruno Oliveira
Date: Wed, 9 Dec 2015 19:00:20 -0200
Subject: [PATCH 2/5] Apply small review requests

* Fixed typo
* Removed superfluous introduction
---
 OVERVIEW.md | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/OVERVIEW.md b/OVERVIEW.md
index 552fb332..25626959 100644
--- a/OVERVIEW.md
+++ b/OVERVIEW.md
@@ -1,15 +1,12 @@
 # Overview #
 
-Here it is described a brief overview of xdist's internal architecture.
-
-
 `xdist` works by spawning one or more **worker nodes**, which are controlled
 by the **master node**. Each **worker node** is responsible for performing
 a full test collection and afterwards running tests as dictated by the **master node**.
 
 The execution flow is:
 
-1. **master node** spawns one or more **worker nodes** at the begginning of
+1. **master node** spawns one or more **worker nodes** at the beginning of
    the test session. The communication between **master** and **worker** nodes makes use of
    [execnet](http://codespeak.net/execnet/) and its [gateways](http://codespeak.net/execnet/basics.html#gateways-bootstrapping-python-interpreters).
    The actual interpreters executing the code for the **worker nodes** might

From c62effdd793133afa5b12109677701ae60f7928f Mon Sep 17 00:00:00 2001
From: Bruno Oliveira
Date: Thu, 10 Dec 2015 19:55:31 -0200
Subject: [PATCH 3/5] Reword reason why workers must keep a single test on queue always

---
 OVERVIEW.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/OVERVIEW.md b/OVERVIEW.md
index 25626959..62cb4d1c 100644
--- a/OVERVIEW.md
+++ b/OVERVIEW.md
@@ -40,11 +40,12 @@ The execution flow is:
    the `pytest_runtest_protocol` for each test item, but in xdist **workers** sit idly
    waiting for **master node** to send tests for execution. As tests are
    received by **workers**, `pytest_runtest_protocol` is executed for each test.
-   Here it is worth noting an implementation detail: at least one
-   test is kept always in **worker nodes** must they comply with
-   `pytest_runtest_protocol` in that it needs to know which will be the
-   `nextitem` in the hook call: either a new test in case the **master node** sends
-   a new test, or `None` if the **worker** receives a "shutdown" request.
+   Here it is worth noting an implementation detail: **workers** must always keep at
+   least one test item on their queue due to how the `pytest_runtest_protocol(item, nextitem)`
+   hook is defined: in order to pass the `nextitem` to the hook, the worker must wait for more
+   instructions from the master before executing that remaining test. If it receives more tests,
+   then it can safely call `pytest_runtest_protocol` because it knows what the `nextitem` parameter will be.
+   If it receives a "shutdown" signal, then it can execute the hook passing `nextitem` as `None`.
 
 1. As tests are started and completed at the **workers**, the results are sent
    back to the **master node**, which then just forwards the results to

From 1716767a1c49dc12e1c42cfdc49ecf5d04aca3b0 Mon Sep 17 00:00:00 2001
From: Bruno Oliveira
Date: Thu, 10 Dec 2015 20:00:43 -0200
Subject: [PATCH 4/5] Drop "node" from "workers" and "master"

---
 OVERVIEW.md | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/OVERVIEW.md b/OVERVIEW.md
index 62cb4d1c..15fc1f68 100644
--- a/OVERVIEW.md
+++ b/OVERVIEW.md
@@ -1,25 +1,25 @@
 # Overview #
 
-`xdist` works by spawning one or more **worker nodes**, which are controlled
-by the **master node**. Each **worker node** is responsible for performing
-a full test collection and afterwards running tests as dictated by the **master node**.
+`xdist` works by spawning one or more **workers**, which are controlled
+by the **master**. Each **worker** is responsible for performing
+a full test collection and afterwards running tests as dictated by the **master**.
 
 The execution flow is:
 
-1. **master node** spawns one or more **worker nodes** at the beginning of
+1. **master** spawns one or more **workers** at the beginning of
    the test session. The communication between **master** and **worker** nodes makes use of
    [execnet](http://codespeak.net/execnet/) and its [gateways](http://codespeak.net/execnet/basics.html#gateways-bootstrapping-python-interpreters).
-   The actual interpreters executing the code for the **worker nodes** might
+   The actual interpreters executing the code for the **workers** might
    be remote or local.
 
-1. Each **worker node** itself is a mini pytest runner. **workers** at this
+1. Each **worker** itself is a mini pytest runner. **workers** at this
    point perform a full test collection, sending the collected
-   test-ids back to the **master node** which does not
+   test-ids back to the **master** which does not
    perform any collection itself.
 
-1. The **master node** receives the result of the collection from all nodes.
-   At this point the **master node** performs a sanity check to ensure that
-   all **worker nodes** collected the same tests (including order), bailing out otherwise.
+1. The **master** receives the result of the collection from all nodes.
+   At this point the **master** performs a sanity check to ensure that
+   all **workers** collected the same tests (including order), bailing out otherwise.
    If all is well, it converts the list of test-ids into a list of simple
    indexes, where each index corresponds to the position of that test in the
    original collection list. This works because all nodes have the same
@@ -27,18 +27,18 @@ The execution flow is:
    one of the workers to just *execute test index 3* instead of passing the
    full test id.
 
-1. If **dist-mode** is **each**: the **master node** just sends the full list
+1. If **dist-mode** is **each**: the **master** just sends the full list
    of test indexes to each node at this moment.
 
-1. If **dist-mode** is **load**: the **master node** takes around 25% of the
-   tests and sends them one by one to each **worker node** in a round robin
-   fashion. The rest of the tests will be distributed later as **worker nodes**
+1. If **dist-mode** is **load**: the **master** takes around 25% of the
+   tests and sends them one by one to each **worker** in a round robin
+   fashion. The rest of the tests will be distributed later as **workers**
    finish tests (see below).
 
-1. **worker nodes** re-implement `pytest_runtestloop`: pytest's default implementation
+1. **workers** re-implement `pytest_runtestloop`: pytest's default implementation
    basically loops over all collected items in the `session` object and executes
    the `pytest_runtest_protocol` for each test item, but in xdist **workers** sit idly
-   waiting for **master node** to send tests for execution. As tests are
+   waiting for **master** to send tests for execution. As tests are
    received by **workers**, `pytest_runtest_protocol` is executed for each test.
    Here it is worth noting an implementation detail: **workers** must always keep at
    least one test item on their queue due to how the `pytest_runtest_protocol(item, nextitem)`
@@ -48,17 +48,17 @@ The execution flow is:
    If it receives a "shutdown" signal, then it can execute the hook passing `nextitem` as `None`.
 
 1. As tests are started and completed at the **workers**, the results are sent
-   back to the **master node**, which then just forwards the results to
+   back to the **master**, which then just forwards the results to
    the appropriate pytest hooks: `pytest_runtest_logstart` and
    `pytest_runtest_logreport`. This way other plugins (for example `junitxml`)
-   can work normally. The **master node** (when in dist-mode **load**)
+   can work normally. The **master** (when in dist-mode **load**)
    decides to send more tests to a node when a test completes, using
-   some heuristics such as test durations and how many tests each **worker node**
+   some heuristics such as test durations and how many tests each **worker**
    still has to run.
 
-1. When the **master node** has no more pending tests it will
+1. When the **master** has no more pending tests it will
    send a "shutdown" signal to all **workers**, which will then run their
    remaining tests to completion and shut down. At this point the
-   **master node** will sit waiting for **workers** to shut down, still
+   **master** will sit waiting for **workers** to shut down, still
    processing events such as `pytest_runtest_logreport`.
 
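
The execution flow described in OVERVIEW.md can be condensed into a small, self-contained Python sketch. It is not xdist's code: the message tuples `("runtests", [...])` and `("shutdown",)`, the helper names, and `run_protocol` (a stand-in for `pytest_runtest_protocol(item, nextitem)`) are all hypothetical, test items are modeled as plain test-id strings, and execnet channels are replaced by an in-memory list of messages. It shows the master's collection check, index-based scheduling for the **each** and **load** dist-modes, and the worker's rule of keeping one test pending until its `nextitem` is known.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

# Hypothetical master -> worker messages: ("runtests", [index, ...]) or ("shutdown",).
Message = Tuple


def check_collections(collections: Sequence[List[str]]) -> List[str]:
    """Master: every worker must report the same test ids, in the same order."""
    reference = collections[0]
    for other in collections[1:]:
        if other != reference:
            raise RuntimeError("workers collected different tests")
    return reference


def each_schedule(num_tests: int, num_workers: int) -> List[List[int]]:
    """Master, dist-mode 'each': every worker receives every test index."""
    return [list(range(num_tests)) for _ in range(num_workers)]


def initial_load_schedule(
    num_tests: int, num_workers: int, fraction: float = 0.25
) -> List[List[int]]:
    """Master, dist-mode 'load': deal roughly `fraction` of the indexes round robin.

    The remaining indexes would be sent later, as workers finish tests.
    """
    count = min(num_tests, max(1, int(num_tests * fraction)))
    schedule: List[List[int]] = [[] for _ in range(num_workers)]
    for index in range(count):
        schedule[index % num_workers].append(index)
    return schedule


def worker_loop(
    items: Sequence[str],
    messages: Iterable[Message],
    run_protocol: Callable[[str, Optional[str]], None],
) -> None:
    """Worker: keep one index pending until its `nextitem` is known.

    `run_protocol(item, nextitem)` stands in for pytest_runtest_protocol.
    """
    pending = deque()
    for message in messages:
        if message[0] == "runtests":
            pending.extend(message[1])
            # Run all but the last pending test: its successor is still unknown.
            while len(pending) > 1:
                index = pending.popleft()
                run_protocol(items[index], items[pending[0]])
        elif message[0] == "shutdown":
            # No more tests will arrive, so the last one runs with nextitem=None.
            while pending:
                index = pending.popleft()
                run_protocol(items[index], items[pending[0]] if pending else None)
            return


# Example run with two workers that collected the same three tests.
ids = check_collections([["test_a", "test_b", "test_c"]] * 2)
print(initial_load_schedule(12, num_workers=3))  # [[0], [1], [2]]

calls = []
worker_loop(
    ids,
    [("runtests", [0, 1]), ("runtests", [2]), ("shutdown",)],
    lambda item, nextitem: calls.append((item, nextitem)),
)
print(calls)  # [('test_a', 'test_b'), ('test_b', 'test_c'), ('test_c', None)]
```

Holding back the last index is what lets the worker always call the hook with a correct `nextitem`. The heuristics the master uses to hand out the remaining tests in **load** mode (test durations, per-worker backlog) are deliberately not modeled here.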
From 0a5bdfcda52b04cd4a8fa9cbaa3f685b7832124d Mon Sep 17 00:00:00 2001
From: Bruno Oliveira
Date: Thu, 10 Dec 2015 20:01:06 -0200
Subject: [PATCH 5/5] Add a FAQ section

---
 OVERVIEW.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/OVERVIEW.md b/OVERVIEW.md
index 15fc1f68..25bb76f4 100644
--- a/OVERVIEW.md
+++ b/OVERVIEW.md
@@ -62,3 +62,15 @@ The execution flow is:
    **master** will sit waiting for **workers** to shut down, still
    processing events such as `pytest_runtest_logreport`.
 
+## FAQ ##
+
+> Why does each worker do its own collection, as opposed to having
+the master collect once and distribute from that collection to the workers?
+
+If collection were performed by the master, it would have to
+serialize collected items to send them over the wire, as workers live in other processes.
+The problem is that test items are hard (perhaps impossible) to serialize, as they contain references to
+the test functions, fixture managers, config objects, etc. Even if one managed to serialize them,
+getting this right would be very hard, and any small change in pytest could easily break it.
+
+
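
To make the FAQ answer concrete, here is a toy Python illustration using the standard `pickle` module. `FakeItem` and `FakeConfig` are hypothetical stand-ins, not pytest's `Item` or `Config` classes, and a `threading.Lock` plays the role of the live session state that a real item references. The test id pickles trivially, while the object graph behind the item does not, which mirrors the reasoning above for sending ids (or indexes) and letting each worker collect on its own.

```python
import pickle
import threading


class FakeConfig:
    """Hypothetical stand-in for session-wide state such as pytest's config object."""

    def __init__(self):
        # A live resource: objects like this cannot be pickled.
        self.lock = threading.Lock()


class FakeItem:
    """Hypothetical stand-in for a collected test item (not pytest's real Item)."""

    def __init__(self, nodeid, function, config):
        self.nodeid = nodeid        # plain string: trivial to send over the wire
        self.function = function    # reference to the test function
        self.config = config        # reference to shared, live session state


def test_addition():
    assert 1 + 1 == 2


def can_pickle(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True


item = FakeItem("test_math.py::test_addition", test_addition, FakeConfig())
print(can_pickle(item.nodeid))  # True: the test id is just a string
print(can_pickle(item))         # False: the item drags the live config along
```

Real pytest items reference far more state than this (fixture managers, plugin hooks, parent collectors), so the toy case understates the problem rather than overstating it.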