Commit 94adb09

Revisiting nxp algorithms (#63)

* added chunking and `get_chunks` to all algorithms
* added smoke tests for `get_chunks` for all functions
* enhanced functions in tournament.py
* updated test_get_chunks
* made the docs more aligned with sphinx guidelines
* renamed and updated the Dispatcher class, updated test.yml
* minor maintenance
* removed a try-except

1 parent 09014a7 · commit 94adb09

26 files changed: +365 −189 lines

.github/workflows/test.yml

2 additions & 2 deletions

```diff
@@ -28,8 +28,8 @@ jobs:
           python-version: ${{ matrix.python-version }}
       - name: Install dependencies
         run: |
-          python -m pip install scipy pandas pytest-cov pytest-randomly
-          # matplotlib lxml pygraphviz pydot sympy  # Extra networkx deps we don't need yet
+          python -m pip install scipy numpy pytest-randomly
+          # pandas pytest-cov matplotlib lxml pygraphviz pydot sympy  # Extra networkx deps we don't need yet
           python -m pip install git+https://github.com/networkx/networkx.git@main
           python -m pip install .
           echo "Done with installing"
```

.gitignore

4 additions & 0 deletions

```diff
@@ -109,6 +109,7 @@ venv/
 ENV/
 env.bak/
 venv.bak/
+nxp-dev/

 # Spyder project settings
 .spyderproject
@@ -131,3 +132,6 @@ dmypy.json
 # asv
 results/
 html/
+
+# get_info update script
+temp__init__.py
```

CONTRIBUTING.md

19 additions & 22 deletions

````diff
@@ -82,52 +82,49 @@ To add any additional tests, **specific to nx_parallel**, you can follow the way

 ## Documentation syntax

-For displaying a small note about nx-parallel's implementation at the end of the main NetworkX documentation, we use the `backend_info` [entry_point](https://packaging.python.org/en/latest/specifications/entry-points/#entry-points) (in the `pyproject.toml` file). The [`get_info` function](https://github.com/networkx/nx-parallel/blob/main/_nx_parallel/__init__.py) is used to parse the docstrings of all the algorithms in nx-parallel and display the nx-parallel specific documentation on the NetworkX's main docs, in the "Additional Backend implementations" box, as shown in the screenshot below.
+For displaying a small note about nx-parallel's implementation at the end of the main NetworkX documentation, we use the `backend_info` [entry_point](https://packaging.python.org/en/latest/specifications/entry-points/#entry-points) (in the `pyproject.toml` file). The [`get_info` function](./_nx_parallel/__init__.py) is used to parse the docstrings of all the algorithms in nx-parallel and display the nx-parallel specific documentation on the NetworkX's main docs, in the "Additional Backend implementations" box, as shown in the screenshot below.

-![backend_box_ss](https://github.com/networkx/nx-parallel/blob/main/assets/images/backend_box_ss.png)
+![backend_box_ss](./assets/images/backend_box_ss.png)

-Here is how the docstring should be formatted in nx-parallel:
+nx-parallel follows [sphinx docstring guidelines](https://the-ultimate-sphinx-tutorial.readthedocs.io/en/latest/_guide/_styleguides/docstrings-guidelines.html) for writing docstrings. But, while extracting the docstring to display on the main networkx docs, only the first paragraph of the function's description and the first paragraph of each parameter's description is extracted and displayed. So, make sure to include all the necessary information in the first paragraphs itself. And you only need to include the additional **backend** parameters in the `Parameters` section and not all the parameters. Also, it is recommended to include a link to the networkx function's documentation page in the docstring, at the end of the function description.
+
+Here is an example of how the docstrings should be formatted in nx-parallel:

 ```.py
-def betweenness_centrality(
-    G, k=None, normalized=True, weight=None, endpoints=False, seed=None, get_chunks="chunks"
-):
-    """[FIRST PARA DISPLAYED ON MAIN NETWORKX DOCS AS FUNC DESC]
-    The parallel computation is implemented by dividing the
-    nodes into chunks and computing betweenness centrality for each chunk concurrently.
+def parallel_func(G, nx_arg, additional_backend_arg_1, additional_backend_arg_2=None):
+    """The parallel computation is implemented by dividing the
+    nodes into chunks and ..... [ONLY THIS PARAGRAPH WILL BE DISPLAYED ON THE MAIN NETWORKX DOCS]
+
+    Some more additional information about the function.
+
+    networkx.func : <link to the function's networkx docs page>

     Parameters
-    ------------ [EVERYTHING BELOW THIS LINE AND BEFORE THE NETWORKX LINK WILL BE DISPLAYED IN ADDITIONAL PARAMETER'S SECTION ON NETWORKX MAIN DOCS]
-    get_chunks : function (default = "chunks")
-        A function that takes in nodes as input and returns node_chunks...[YOU CAN MULTIPLE PARAGRAPHS FOR EACH PARAMETER, IF NEEDED, SEPARATED BY 1 BLANK LINE]
+    ----------
+    additional_backend_arg_1 : int or float
+        [YOU CAN HAVE MULTIPLE PARAGRAPHS BUT ONLY THE FIRST PARAGRAPH WILL BE DISPLAYED ON THE MAIN NETWORKX DOCS]

-        [LEAVE 2 BLANK LINES BETWEEN EACH PARAMETER]
-    parameter 2 : int
+    additional_backend_arg_2 : None or str (default=None)
         ....
-        .
-        .
-        .
-    [LEAVE 1 BLANK LINE BETWEEN THE PARAMETERS SECTION AND THE LINK]
-    networkx.betweenness_centrality : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html
     """
 ```

 ## Chunking

 In parallel computing, "chunking" refers to dividing a large task into smaller, more manageable chunks that can be processed simultaneously by multiple computing units, such as CPU cores or distributed computing nodes. It's like breaking down a big task into smaller pieces so that multiple workers can work on different pieces at the same time, and in the case of nx-parallel, this usually speeds up the overall process.

-The default chunking in nx-parallel is done by first determining the number of available CPU cores and then allocating the nodes (or edges or any other iterator) per chunk by dividing the total number of nodes by the total CPU cores available. (ref. [chunk.py](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/utils/chunk.py)). This default chunking can be overridden by the user by passing a custom `get_chunks` function to the algorithm as a kwarg. While adding a new algorithm, you can change this default chunking, if necessary (ref. [PR](https://github.com/networkx/nx-parallel/pull/33)). Also, when [the `config` PR](https://github.com/networkx/networkx/pull/7225) is merged in networkx, and the `config` will be added to nx-parallel, then the user would be able to control the number of CPU cores they would want to use and then the chunking would be done accordingly.
+The default chunking in nx-parallel is done by first determining the number of available CPU cores and then allocating the nodes (or edges or any other iterator) per chunk by dividing the total number of nodes by the total CPU cores available. (ref. [chunk.py](./nx_parallel/utils/chunk.py)). This default chunking can be overridden by the user by passing a custom `get_chunks` function to the algorithm as a kwarg. While adding a new algorithm, you can change this default chunking, if necessary (ref. [PR](https://github.com/networkx/nx-parallel/pull/33)). Also, when [the `config` PR](https://github.com/networkx/networkx/pull/7225) is merged in networkx, and the `config` will be added to nx-parallel, then the user would be able to control the number of CPU cores they would want to use and then the chunking would be done accordingly.

 ## General guidelines on adding a new algorithm

 - To get started with adding a new algorithm, you can refer to the existing implementations in nx-parallel and also refer to the [joblib's documentation on embarrassingly parallel `for` loops](https://joblib.readthedocs.io/en/latest/parallel.html).
 - The algorithm that you are considering to add to nx-parallel should be in the main networkx repository and it should have the `_dispatchable` decorator. If not, you can consider adding a sequential implementation in networkx first.
 - check-list for adding a new function:
   - [ ] Add the parallel implementation (make sure API doesn't break), the file structure should be the same as that in networkx.
-  - [ ] add the function to the `Dispatcher` class in [interface.py](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/interface.py) (take care of the `name` parameter in `_dispatchable` (ref. [docs](https://networkx.org/documentation/latest/reference/backends.html)))
+  - [ ] add the function to the `BackendInterface` class in [interface.py](./nx_parallel/interface.py) (take care of the `name` parameter in `_dispatchable` (ref. [docs](https://networkx.org/documentation/latest/reference/backends.html)))
   - [ ] update the `__init__.py` files accordingly
   - [ ] docstring following the above format
-  - [ ] run the [timing script](https://github.com/networkx/nx-parallel/blob/main/timing/timing_individual_function.py) to get the performance heatmap
+  - [ ] run the [timing script](./timing/timing_individual_function.py) to get the performance heatmap
   - [ ] add additional test (if any)
   - [ ] add benchmark(s) for the new function (ref. the README in benchmarks folder for more details)
````
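The default chunking described in the Chunking section above (divide the nodes evenly over the available CPU cores) can be sketched as follows. This is a simplified stand-in for the helper in `nx_parallel/utils/chunk.py`, not the committed implementation, which may differ in detail:

```python
import os


def chunks(iterable, n):
    """Split `iterable` into `n` nearly equal-sized chunks.

    Sketch of nx-parallel's default chunking: sizes differ by at most one,
    with the remainder spread over the first chunks.
    """
    items = list(iterable)
    q, r = divmod(len(items), n)  # base chunk size and remainder
    out, start = [], 0
    for i in range(n):
        size = q + (1 if i < r else 0)
        if size:
            out.append(items[start:start + size])
        start += size
    return out


# By default one chunk per available CPU core, e.g. 10 nodes over 4 "cores":
n_cores = os.cpu_count() or 1
node_chunks = chunks(range(10), 4)
```

A user-supplied `get_chunks` kwarg with the same shape (iterable in, iterable of chunks out) replaces this default, which is how the custom-chunking override mentioned above works.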

README.md

1 addition & 1 deletion

```diff
@@ -123,7 +123,7 @@ nxp.betweenness_centrality(H)

 2. Right now there isn't much difference between `nx.Graph` and `nxp.ParallelGraph` so `method 3` would work fine but it is not recommended because in the future that might not be the case.

-Feel free to contribute to nx-parallel. You can find the contributing guidelines [here](https://github.com/networkx/nx-parallel/blob/main/CONTRIBUTING.md). If you'd like to implement a feature or fix a bug, we'd be happy to review a pull request. Please make sure to explain the changes you made in the pull request description. And feel free to open issues for any problems you face, or for new features you'd like to see implemented.
+Feel free to contribute to nx-parallel. You can find the contributing guidelines [here](./CONTRIBUTING.md). If you'd like to implement a feature or fix a bug, we'd be happy to review a pull request. Please make sure to explain the changes you made in the pull request description. And feel free to open issues for any problems you face, or for new features you'd like to see implemented.

 This project is managed under the NetworkX organisation, so the [code of conduct of NetworkX](https://github.com/networkx/networkx/blob/main/CODE_OF_CONDUCT.rst) applies here as well.
```

_nx_parallel/__init__.py

18 additions & 8 deletions

```diff
@@ -13,7 +13,9 @@ def get_info():
         "number_of_isolates": {
             "url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/isolate.py#L8",
             "additional_docs": "The parallel computation is implemented by dividing the list of isolated nodes into chunks and then finding the length of each chunk in parallel and then adding all the lengths at the end.",
-            "additional_parameters": None,
+            "additional_parameters": {
+                'get_chunks : str, function (default = "chunks")': "A function that takes in a list of all the isolated nodes as input and returns an iterable `isolate_chunks`. The default chunking is done by slicing the `isolates` into `n` chunks, where `n` is the total number of CPU cores available."
+            },
         },
         "square_clustering": {
             "url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/cluster.py#L10",
@@ -25,22 +27,30 @@ def get_info():
         "local_efficiency": {
             "url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/efficiency_measures.py#L9",
             "additional_docs": "The parallel computation is implemented by dividing the nodes into chunks and then computing and adding global efficiencies of all node in all chunks, in parallel, and then adding all these sums and dividing by the total number of nodes at the end.",
-            "additional_parameters": None,
+            "additional_parameters": {
+                'get_chunks : str, function (default = "chunks")': "A function that takes in a list of all the nodes as input and returns an iterable `node_chunks`. The default chunking is done by slicing the `nodes` into `n` chunks, where `n` is the total number of CPU cores available."
+            },
         },
         "closeness_vitality": {
             "url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/vitality.py#L9",
             "additional_docs": "The parallel computation is implemented only when the node is not specified. The closeness vitality for each node is computed concurrently.",
-            "additional_parameters": None,
+            "additional_parameters": {
+                'get_chunks : str, function (default = "chunks")': "A function that takes in a list of all the nodes as input and returns an iterable `node_chunks`. The default chunking is done by slicing the `nodes` into `n` chunks, where `n` is the total number of CPU cores."
+            },
         },
         "is_reachable": {
-            "url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L10",
+            "url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L12",
             "additional_docs": "The function parallelizes the calculation of two neighborhoods of vertices in `G` and checks closure conditions for each neighborhood subset in parallel.",
-            "additional_parameters": None,
+            "additional_parameters": {
+                'get_chunks : str, function (default = "chunks")': "A function that takes in a list of all the nodes as input and returns an iterable `node_chunks`. The default chunking is done by slicing the `nodes` into `n` chunks, where `n` is the total number of CPU cores available."
+            },
         },
         "tournament_is_strongly_connected": {
-            "url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L54",
+            "url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L59",
             "additional_docs": "The parallel computation is implemented by dividing the nodes into chunks and then checking whether each node is reachable from each other node in parallel.",
-            "additional_parameters": None,
+            "additional_parameters": {
+                'get_chunks : str, function (default = "chunks")': "A function that takes in a list of all the nodes as input and returns an iterable `node_chunks`. The default chunking is done by slicing the `nodes` into `n` chunks, where `n` is the total number of CPU cores available."
+            },
         },
         "all_pairs_node_connectivity": {
             "url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/connectivity/connectivity.py#L17",
@@ -127,7 +137,7 @@ def get_info():
             },
         },
         "all_pairs_shortest_path": {
-            "url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L62",
+            "url": "https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/unweighted.py#L63",
             "additional_docs": "The parallel implementation first divides the nodes into chunks and then creates a generator to lazily compute shortest paths for each `node_chunk`, and then employs joblib's `Parallel` function to execute these computations in parallel across all available CPU cores.",
             "additional_parameters": {
                 'get_chunks : str, function (default = "chunks")': "A function that takes in an iterable of all the nodes as input and returns an iterable `node_chunks`. The default chunking is done by slicing the `G.nodes` into `n` chunks, where `n` is the number of CPU cores."
```
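For context on how this `get_info` dictionary reaches NetworkX: it is registered through the `backend_info` entry-point group referenced in CONTRIBUTING.md. A sketch of what that declaration looks like in `pyproject.toml` follows — the group name and object path here are assumptions based on the NetworkX backend docs, so check them against the repository's actual `pyproject.toml`:

```toml
# Assumed entry-point declaration; NetworkX scans this group at docs-build
# time to populate the "Additional backend implementations" box.
[project.entry-points."networkx.backend_info"]
parallel = "_nx_parallel:get_info"
```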

_nx_parallel/update_get_info.py

33 additions & 20 deletions

```diff
@@ -1,7 +1,13 @@
 import os
 import ast

-__all__ = ["get_funcs_info", "extract_docstrings_from_file", "extract_from_docs"]
+__all__ = [
+    "get_funcs_info",
+    "extract_docstrings_from_file",
+    "extract_add_docs",
+    "extract_add_params",
+    "get_url",
+]

 # Helper functions for get_info

@@ -21,11 +27,10 @@ def get_funcs_info():
                 path = os.path.join(root, file)
                 d = extract_docstrings_from_file(path)
                 for func in d:
-                    par_docs, par_params = extract_from_docs(d[func])
                     funcs[func] = {
                         "url": get_url(path, func),
-                        "additional_docs": par_docs,
-                        "additional_parameters": par_params,
+                        "additional_docs": extract_add_docs(d[func]),
+                        "additional_parameters": extract_add_params(d[func]),
                     }
     return funcs

@@ -60,8 +65,8 @@ def extract_docstrings_from_file(file_path):
     return docstrings


-def extract_from_docs(docstring):
-    """Extract the parallel documentation and parallel parameter description from the given doctring."""
+def extract_add_docs(docstring):
+    """Extract the parallel documentation description from the given doctring."""
     try:
         # Extracting Parallel Computation description
         # Assuming that the first para in docstring is the function's PC desc
@@ -76,30 +81,38 @@ def extract_from_docs(docstring):
     except Exception as e:
         print(e)
         par_docs = None
+    return par_docs

+
+def extract_add_params(docstring):
+    """Extract the parallel parameter description from the given docstring."""
     try:
         # Extracting extra parameters
         # Assuming that the last para in docstring is the function's extra params
         par_params = {}
-        par_params_ = docstring.split("------------\n")[1]
-
-        par_params_ = par_params_.split("\n\n\n")
-        for i in par_params_:
-            j = i.split("\n")
-            par_params[j[0]] = "\n".join(
-                [line.strip() for line in j[1:] if line.strip()]
-            )
-            if i == par_params_[-1]:
-                par_params[j[0]] = " ".join(
-                    [line.strip() for line in j[1:-1] if line.strip()]
-                )
-        par_docs = par_docs.replace("\n", " ")
+        par_params_ = docstring.split("----------\n")[1]
+        par_params_ = par_params_.split("\n")
+
+        i = 0
+        while i < len(par_params_):
+            line = par_params_[i]
+            if " : " in line:
+                key = line.strip()
+                n = par_params_.index(key) + 1
+                par_desc = ""
+                while n < len(par_params_) and par_params_[n] != "":
+                    par_desc += par_params_[n].strip() + " "
+                    n += 1
+                par_params[key] = par_desc.strip()
+                i = n + 1
+            else:
+                i += 1
     except IndexError:
         par_params = None
     except Exception as e:
         print(e)
         par_params = None
-    return par_docs, par_params
+    return par_params


 def get_url(file_path, function_name):
```
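The new parsing logic in `extract_add_params` can be exercised standalone. The sketch below reproduces the committed loop (each line containing `" : "` in the numpydoc `Parameters` section becomes a key, the following non-blank lines its description) with two small hardenings that are my own, not the commit's: it advances with the loop index `i` instead of `list.index` (which would miss indented lines, since the key is stripped before lookup), and it treats whitespace-only lines as blank:

```python
def extract_add_params(docstring):
    """Collect {param header: description} pairs from a numpydoc
    Parameters section; returns None if no section underline is found."""
    try:
        lines = docstring.split("----------\n")[1].split("\n")
    except IndexError:
        return None  # no "----------" underline in the docstring
    params = {}
    i = 0
    while i < len(lines):
        if " : " in lines[i]:
            key = lines[i].strip()
            n = i + 1
            desc = ""
            # Accumulate the description until the next blank line.
            while n < len(lines) and lines[n].strip() != "":
                desc += lines[n].strip() + " "
                n += 1
            params[key] = desc.strip()
            i = n + 1
        else:
            i += 1
    return params


doc = (
    "Compute something in parallel.\n\n"
    "Parameters\n"
    "----------\n"
    'get_chunks : str, function (default = "chunks")\n'
    "    A function that takes in nodes and\n"
    "    returns node_chunks.\n"
)
```

Running it on `doc` yields a dictionary with the same shape as the `additional_parameters` entries written into `_nx_parallel/__init__.py` in this commit.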

assets/images/backend_box_ss.png

−437 KB (binary)

nx_parallel/algorithms/approximation/connectivity.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -24,16 +24,16 @@ def approximate_all_pairs_node_connectivity(
2424
will run the parallel implementation of `all_pairs_node_connectivity` present in the
2525
`connectivity/connectivity`. Use `nxp.approximate_all_pairs_node_connectivity` instead.
2626
27+
networkx.all_pairs_node_connectivity : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.approximation.connectivity.all_pairs_node_connectivity.html
28+
2729
Parameters
28-
------------
30+
----------
2931
get_chunks : str, function (default = "chunks")
3032
A function that takes in `list(iter_func(nbunch, 2))` as input and returns
3133
an iterable `pairs_chunks`, here `iter_func` is `permutations` in case of
3234
directed graphs and `combinations` in case of undirected graphs. The default
3335
is to create chunks by slicing the list into `n` chunks, where `n` is the
3436
number of CPU cores, such that size of each chunk is atmost 10, and at least 1.
35-
36-
networkx.all_pairs_node_connectivity : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.approximation.connectivity.all_pairs_node_connectivity.html
3737
"""
3838

3939
if hasattr(G, "graph_object"):
