Support Clusters for Internal Structure? #40

I don't know if this is the same as #19, but I thought it might be useful to explain my use case.

Given a large codebase, some 350k LOC, (dis)organized into piles of packages, generating a graph sometimes fails - I'm guessing because the dot file gets too complex? I ended up trying max-bacon=1, and also tried excluding some paths in the codebase; however, that ends up removing a significant number of nodes that are imported into the excluded files. I saw the new cluster feature, and wondered if clustering might be a viable option for internal packages as well - so that I can "collapse" significant chunks of the codebase when there are more than N nodes inside a package.

Does that make sense?
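For reference, as I understand the current feature, --cluster groups external dependencies, with --min-cluster-size and --max-cluster-size controlling when a dependency is drawn as a cluster or collapsed to a single node (mypackage below is just a placeholder):

    pydeps mypackage --cluster --min-cluster-size 3 --max-cluster-size 15

The ask here is essentially the same collapsing behaviour applied to the target package's own sub-packages.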
It also looks like #39 may be similar - sorry for any duplication or noise!
That does make sense, and it might not be the same as #39, which is about limiting the graph by cutting the depth of the directory tree, while this seems (iiuc) to be more about ways to globally simplify a ginormous graph. Both seem useful; #39 is probably easier to implement... When you say "piles of packages", are those the kind of packages that have setup.py files (and are installed in develop mode)? We have some internal tools that look at inter-package dependencies; they're a bit rough but should probably become part of pydeps (we have 130+ packages and probably 300 kLOC -- it's been a little while since I counted).
No - I'm using that as a concept to describe subdirectories of modules within the codebase that are imported by other modules, which currently does lead to a ginormous graph. 😄 If I'm misusing the term, happy to adapt.
No, you're using it 100% correctly. Pydeps started as a tool for helping us move from one repo/package containing all our code to many smaller repos/packages (it's an improvement, but brings its own set of challenges). I don't think there is any easy solution to representing the relationships in a 300+ kLOC code base in 2D, but manually reducing the graph can still give you useful insights...

I usually follow a workflow that looks something like this (using django-cms, ~50 kLOC, as an example): I generate the full graph, then look at the top and the bottom of it and remove nodes with high in/out degrees (since they significantly constrain the graph layout algorithm that dot uses). A first pass gives a smaller graph, and I keep repeating until I arrive at a graph that's readable and interesting.

We were having increasing issues with developers unintentionally introducing circular imports, so we look for arrows going the "wrong" way (i.e. up). During the extract-to-individual-projects phase we'd use nodes that clustered together in the graph as an indication that they should probably move as a unit.
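In command-line terms, that first pass looks something like this (pydeps' -x takes fnmatch patterns over module names; the specific patterns below are made up for illustration):

    pydeps cms --noshow                                    # full graph, written to cms.svg
    pydeps cms --noshow -x 'cms.utils.*' 'cms.models.*'    # prune high-degree nodes

Repeating with more patterns narrows the graph step by step.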
Just as a data point... If we take the first django-cms graph and replace all nodes with their prefix (i.e. collapse each dotted module name to its first component), the nodes here would be similar to package clusters (without the internal arrows). The code I used (this might be what a solution to #39 would look like):
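(The snippet itself wasn't preserved in this thread; below is a minimal sketch of the idea, not the original code. It assumes pydeps' --show-dot output, where node names are module paths with '.' replaced by '_', and that top-level package names contain no underscores.)

    #!/usr/bin/env python
    # Minimal sketch: collapse a pydeps graph to first-component prefixes.
    import re
    import subprocess

    def prefix_edges(package: str) -> set:
        """Return collapsed (src, dst) edges from pydeps' dot output."""
        dot = subprocess.run(
            ['pydeps', package, '--show-dot', '--no-output'],
            stdout=subprocess.PIPE, text=True, check=True).stdout
        edges = set()
        for a, b in re.findall(r'(\w+) -> (\w+)', dot):
            # 'cms_admin_forms' -> 'cms'; 'django_forms' -> 'django'
            pa, pb = a.split('_')[0], b.split('_')[0]
            if pa != pb:                  # drop intra-package arrows
                edges.add((pa, pb))
        return edges

    if __name__ == '__main__':
        for a, b in sorted(prefix_edges('cms')):
            print(f'{a} -> {b}')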
That seems like it might be a step in the right direction - although controlling the depth might be useful.
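For instance, a depth parameter could control how many leading components are kept (a hypothetical helper, not part of any script in this thread):

    def collapse(module_name: str, depth: int = 1) -> str:
        """collapse('cms.admin.forms', 2) -> 'cms.admin'"""
        return '.'.join(module_name.split('.')[:depth])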
This would be really useful for me. To prevent circular imports, and to keep a logical program flow, it's nice to ensure that internal packages import each other as a DAG - even though, strictly speaking, only the module imports need to form a DAG. So I'd like a way to collect modules into their respective packages and to make a graph of the package dependencies. @thebjorn Your last graph looks really nice and close to what I'd like to have, I think.
Here's my modified script to support graphing packages rather than modules:

    #!/usr/bin/env python
    """Collapse a pydeps module graph into a package-level graph."""
    import subprocess
    from pathlib import Path
    from typing import Dict, List, Mapping, Tuple

    import typer


    def find_modules(path: Path,
                     path_components: List[str]) -> Mapping[str, List[str]]:
        """Map pydeps node names ('a_b_c') to shortened package components."""
        modules: Dict[str, List[str]] = {}
        # pydeps names dot nodes by joining module path components with '_'.
        ruined_module = '_'.join(path_components)
        if path.is_dir():
            modules[ruined_module] = (path_components[1:]
                                      if len(path_components) > 1
                                      else path_components)
            for sub_path in path.iterdir():
                modules.update(find_modules(sub_path,
                                            path_components + [sub_path.stem]))
        elif path.is_file() and path.suffix == '.py':
            if path.stem != '__init__':
                # Collapse a module to its containing package.
                modules[ruined_module] = (path_components[1:-1]
                                          if len(path_components) > 2
                                          else path_components[:-1])
        return modules


    def shorten(name: str, modules: Mapping[str, List[str]]) -> str:
        retval = '•'.join(modules[name.strip()])
        # 'graph' and 'edge' are dot keywords, so they can't be node names.
        if retval in ['graph', 'edge']:
            return f'{retval}_'
        return retval


    def attrs(fmt: str) -> Dict[str, str]:
        """Parse the 'k=v,k=v];' attribute tail of a dot statement."""
        return dict(kv.split('=') for kv in fmt.strip()[:-2].split(','))  # type: ignore


    def attrs2fmt(attr_map: Mapping[str, str]) -> str:
        return '[%s];' % ','.join('%s=%s' % (k, v) for k, v in attr_map.items())


    def main(base_name: str) -> None:
        modules = find_modules(Path(base_name), [base_name])
        cp = subprocess.run(
            ['pydeps', base_name, '--max-bacon=1', '--show-dot', '--no-output'],
            stdout=subprocess.PIPE, text=True, check=True)
        lines = cp.stdout.splitlines()
        header = [line
                  for line in lines[:6]
                  if 'concentrate' not in line and line != '']
        body = lines[6:-3]

        # Nodes: dot statements without an arrow; keep one entry per package.
        nodes = [line for line in body if '->' not in line if line]
        node_mapping: Dict[str, Dict[str, str]] = {}
        for node in nodes:
            name, fmt = node.split('[')
            sname = shorten(name, modules)
            if sname in node_mapping:
                continue
            node_mapping[sname] = attrs(fmt)

        # Edges: drop duplicates, self-loops, and arrows back into the root.
        rules = [line for line in body if '->' in line]
        rule_mapping: Dict[Tuple[str, str], Dict[str, str]] = {}
        used_nodes = set()
        for rule in rules:
            arrow, fmt = rule.split('[')
            a, _, b = arrow.split()
            a = shorten(a, modules)
            b = shorten(b, modules)
            if (a, b) in rule_mapping:
                continue
            if a == b:
                continue
            if b == base_name:
                continue
            rule_mapping[(a, b)] = attrs(fmt)
            used_nodes.add(a)
            used_nodes.add(b)

        # Note: assumes an existing uml/ directory.
        with open(f'uml/{base_name}.dot', 'w') as fp:
            # Trailing newline so the first node doesn't share the header's line.
            fp.write('\n'.join(header) + '\n')
            for n in used_nodes:
                some_dict: Dict[str, str] = node_mapping[n]
                some_dict['label'] = '"%s"' % n
                print(' {} {}'.format(n, attrs2fmt(some_dict)), file=fp)
            for (a, b), fmt_dict in rule_mapping.items():
                print(' {} -> {} {}'.format(a, b, attrs2fmt(fmt_dict)), file=fp)
            print('}', file=fp)
        subprocess.run(['dot', '-Tpng', f'uml/{base_name}.dot',
                        '-o', f'uml/{base_name}.png'], check=True)


    if __name__ == "__main__":
        typer.run(main)
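If the script is saved as, say, graph_packages.py (the name is arbitrary) and run from the directory containing the package, with a uml/ directory already created since the script assumes it exists, usage would look like:

    pip install pydeps typer
    python graph_packages.py mypackage

which writes uml/mypackage.dot and renders uml/mypackage.png via dot.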