Commit 504e5de (1 parent: 8007071)

update docs + readme

File tree: 5 files changed (+22, -25 lines)


README.md

Lines changed: 10 additions & 8 deletions
@@ -51,11 +51,12 @@ Let's simulate a pipeline that performs a series of transformations on some data
 ```python
 import asyncio
 import time
+import typing

 from pyper import task


-def step1(limit):
+def step1(limit: int):
     """Generate some data."""
     for i in range(limit):
         yield i
@@ -75,7 +76,7 @@ def step3(data: int):
     return 2 * data - 1


-async def print_sum(data):
+async def print_sum(data: typing.AsyncGenerator[int]):
     """Print the sum of values from a data stream."""
     total = 0
     async for output in data:
@@ -117,7 +118,7 @@ Having defined the logical operations we want to perform on our data as function
 ```python
 # Analogous to:
 # pipeline = task(step1) | task(step2) | task(step3)
-async def pipeline(limit):
+async def pipeline(limit: int):
     for data in step1(limit):
         data = await step2(data)
         data = step3(data)
@@ -126,7 +127,7 @@ async def pipeline(limit):

 # Analogous to:
 # run = pipeline > print_sum
-async def run(limit):
+async def run(limit: int):
     await print_sum(pipeline(limit))

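For readers skimming the diff: the annotated README example from the hunks above can be run end to end without Pyper. A minimal sketch follows, where the body of `step2` is an assumption (it lies outside this diff), so the exact transformation is hypothetical:

```python
import asyncio
import typing


def step1(limit: int):
    """Generate some data."""
    for i in range(limit):
        yield i


async def step2(data: int) -> int:
    """Hypothetical async transform; the real body sits outside this diff."""
    await asyncio.sleep(0)
    return data + 1


def step3(data: int) -> int:
    """As shown in the diff: 2 * data - 1."""
    return 2 * data - 1


async def print_sum(data: typing.AsyncGenerator[int, None]):
    """Print the sum of values from a data stream."""
    total = 0
    async for output in data:
        total += output
    print("Total ==", total)


# Analogous to the composed pipeline in the diff:
async def main():
    async def stream():
        for x in step1(5):
            yield step3(await step2(x))
    await print_sum(stream())


asyncio.run(main())
```

Note the sketch uses `typing.AsyncGenerator[int, None]` (yield type plus send type); the single-parameter form shown in the diff only subscripts cleanly on newer Python versions.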
@@ -152,7 +153,7 @@ Concurrent programming in Python is notoriously difficult to get right. In a con
 The basic approach to doing this is by using queues-- a simplified and very unabstracted implementation could be:

 ```python
-async def pipeline(limit):
+async def pipeline(limit: int):
     q1 = asyncio.Queue()
     q2 = asyncio.Queue()
     q3 = asyncio.Queue()
@@ -210,7 +211,7 @@ async def pipeline(limit):
        yield data


-async def run(limit):
+async def run(limit: int):
    await print_sum(pipeline(limit))

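The diff elides the middle of the queue-based `pipeline` (lines 156-210 of the README). The pattern it describes, a producer feeding a queue, a consumer draining it, and a sentinel to mark the end of the stream, can be sketched standalone; the fused transformation here is an assumption made for brevity:

```python
import asyncio


async def queue_pipeline(limit: int):
    """Queue-based stream, in the spirit of the README's hand-rolled version."""
    done = object()  # sentinel marking end of stream
    q = asyncio.Queue()

    async def producer():
        for i in range(limit):
            # step2 then step3, fused into one expression for brevity
            q.put_nowait(2 * (i + 1) - 1)
        q.put_nowait(done)

    worker = asyncio.create_task(producer())
    while True:
        data = await q.get()
        if data is done:
            break
        yield data
    await worker


async def main():
    total = 0
    async for value in queue_pipeline(5):
        total += value
    print("Total ==", total)


asyncio.run(main())
```

The sentinel is the key design choice: without it the consumer would block forever on `q.get()` once the producer finishes.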
@@ -233,11 +234,12 @@ No-- not every program is asynchronous, so Pyper pipelines are by default synchr

 ```python
 import time
+import typing

 from pyper import task


-def step1(limit):
+def step1(limit: int):
     for i in range(limit):
         yield i

@@ -252,7 +254,7 @@ def step3(data: int):
     return 2 * data - 1


-def print_sum(data):
+def print_sum(data: typing.Generator[int]):
     total = 0
     for output in data:
         total += output
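The synchronous variant annotated above is also runnable without Pyper. A minimal sketch, again with a hypothetical `step2` body (not shown in this diff), and using the fully parameterized `typing.Generator` form:

```python
import typing


def step1(limit: int):
    for i in range(limit):
        yield i


def step2(data: int) -> int:
    # hypothetical stand-in: the real (slow) body sits outside this diff
    return data + 1


def step3(data: int) -> int:
    return 2 * data - 1


def print_sum(data: typing.Generator[int, None, None]) -> None:
    total = 0
    for output in data:
        total += output
    print("Total ==", total)


print_sum(step3(step2(x)) for x in step1(5))  # prints: Total == 25
```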

docs/src/docs/UserGuide/CombiningPipelines.md

Lines changed: 5 additions & 10 deletions
@@ -12,7 +12,6 @@ permalink: /docs/UserGuide/CombiningPipelines
 1. TOC
 {:toc}

-
 ## Piping and the `|` Operator

 The `|` operator (inspired by UNIX syntax) is used to pipe one pipeline into another. This is syntactic sugar for the `Pipeline.pipe` method.
@@ -46,7 +45,6 @@ new_new_pipeline = p0 | new_pipeline | p4
 new_new_new_pipeline = new_pipeline | new_new_pipeline
 ```

-
 ## Consumer Functions and the `>` Operator

 It is often useful to define reusable functions that process the results of a pipeline, which we'll call a 'consumer'. For example:
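The hunk below references a `JsonFileWriter` consumer. Its real definition is outside this diff, but a plausible standalone sketch of such a consumer (an assumption, not Pyper's code) is just a callable that drains an iterable of results:

```python
import json
import typing


class JsonFileWriter:
    """Hypothetical consumer: drains a stream of results into a JSON file."""

    def __init__(self, filepath: str):
        self.filepath = filepath

    def __call__(self, data: typing.Iterable) -> None:
        # materialize the stream and write it out in one go
        with open(self.filepath, "w", encoding="utf-8") as f:
            json.dump(list(data), f, indent=4)
```

Any callable with this shape, taking the pipeline's output stream as its argument, can sit on the right-hand side of `>`.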
@@ -89,11 +87,8 @@ run = step1.pipe(step2).consume(JsonFileWriter("data.json"))
 run(limit=10)
 ```

-
-The operator `>` is obviously not to be taken to mean 'greater than' when used in these contexts.
-
 {: .info}
-Pyper comes with fantastic IDE intellisense support which understands these operators, and will always show you what the resulting type of a variable is (including the input and output type specs for pipelines)
+Pyper comes with fantastic IDE intellisense support which understands these operators, and will always show you which variables are `Pipeline` or `AsyncPipeline` objects; this also preserves type hints from your own functions, showing you the parameter and return type specs for each pipeline or consumer

 ## Asynchronous Code

@@ -111,10 +106,10 @@ assert isinstance(task(func), AsyncPipeline)

 When combining pipelines, the following rule applies:

-* `Pipeline` > `Pipeline` = `Pipeline`
-* `Pipeline` > `AsyncPipeline` = `AsyncPipeline`
-* `AsyncPipeline` > `Pipeline` = `AsyncPipeline`
-* `AsyncPipeline` > `AsyncPipeline` = `AsyncPipeline`
+* `Pipeline` + `Pipeline` = `Pipeline`
+* `Pipeline` + `AsyncPipeline` = `AsyncPipeline`
+* `AsyncPipeline` + `Pipeline` = `AsyncPipeline`
+* `AsyncPipeline` + `AsyncPipeline` = `AsyncPipeline`

 In other words:

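The combination rule in this hunk behaves like ordinary type promotion: anything async in the chain makes the whole result async. A toy sketch (these are stand-in classes, not Pyper's real implementation) reproduces the table via `__or__`:

```python
class Pipeline:
    """Toy stand-in for Pyper's Pipeline (not the real implementation)."""

    def __or__(self, other: "Pipeline") -> "Pipeline":
        # anything async on either side promotes the combined result
        if isinstance(self, AsyncPipeline) or isinstance(other, AsyncPipeline):
            return AsyncPipeline()
        return Pipeline()


class AsyncPipeline(Pipeline):
    """Toy stand-in for Pyper's AsyncPipeline."""


print(type(Pipeline() | AsyncPipeline()).__name__)  # prints: AsyncPipeline
```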
docs/src/docs/UserGuide/Considerations.md

Lines changed: 6 additions & 5 deletions
@@ -74,12 +74,13 @@ The advantage of using `daemon` threads is that they do not prevent the main pro
 Therefore, there is a simple consideration that determines whether to set `daemon=True` on a particular task:

 {: .info}
-Tasks can be created with `daemon=True` when they do NOT reach out to external resources.
+Tasks can be created with `daemon=True` when they do NOT reach out to external resources

-This includes:
-* Pure functions, which simply take an input and generate an output
-* Functions that depend on or modify some external Python state, like an `Object` or a `Class`
+This includes all **pure functions** (functions which simply take an input and generate an output, without mutating external state).

 Functions that should _not_ use `daemon` threads include:
 * Writing to a database
-* Reading from a file
+* Processing a file
+* Making a network request
+
+Recall that only synchronous tasks can be created with `daemon=True`.
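The distinction this hunk draws maps directly onto plain `threading` (this is standard-library behaviour, not Pyper's API): daemon threads are killed abruptly when the interpreter exits, which is harmless for pure computation but can cut off an in-flight write or request. A small illustration:

```python
import threading
import time


def pure_work():
    # pure computation: safe to interrupt, so a daemon thread is fine
    _ = sum(i * i for i in range(1000))


def external_work():
    # stands in for I/O (database write, file, network request): should NOT
    # be daemonized, since abrupt exit could cut it off mid-operation
    time.sleep(0.01)


safe = threading.Thread(target=pure_work, daemon=True)
unsafe_to_daemonize = threading.Thread(target=external_work, daemon=False)
safe.start()
unsafe_to_daemonize.start()
safe.join()
unsafe_to_daemonize.join()
print(safe.daemon, unsafe_to_daemonize.daemon)  # prints: True False
```

A non-daemon thread keeps the process alive until it finishes, which is exactly the guarantee external-resource work needs.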

docs/src/docs/UserGuide/CreatingPipelines.md

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ In addition to functions, anything `callable` in Python can be wrapped in `task`
 from pyper import task

 class Doubler:
-    def __call__(self, x):
+    def __call__(self, x: int):
         return 2 * x

 pipeline1 = task(Doubler())
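Independent of Pyper, the reason this works is Python's callable protocol: an instance with `__call__` can be invoked like a plain function, which is all `task` needs. The callable itself, in isolation:

```python
class Doubler:
    def __call__(self, x: int) -> int:
        return 2 * x


double = Doubler()
print(callable(double), double(3))  # prints: True 6
```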

docs/src/docs/UserGuide/TaskParameters.md

Lines changed: 0 additions & 1 deletion
@@ -12,7 +12,6 @@ permalink: /docs/UserGuide/TaskParameters
 1. TOC
 {:toc}

-
 > For convenience, we will use the following terminology on this page:
 > * **Producer**: The _first_ task within a pipeline
 > * **Producer-consumer**: Any task after the first task within a pipeline

0 commit comments

Comments
 (0)