-
Notifications
You must be signed in to change notification settings - Fork 0
Logging and Graph Filters #53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…addition to dirs `Galaxy.collapse()` supports new args for this `geodesics/stellar_curvature_distance` supports new args for this
…g the collapse sequence
…dpoints are imported directly from `thema` level __init___, so I opt for minimal files here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Logging works well! I moved the utility functions into utils.py to keep the thema module cleaner.
For users, the main entry points are:
import thema
thema.enable_logging("INFO")
thema.disable_logging()These are defined in the __init__.py of the thema module, so there’s no need for additional files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sgathrid can you add more detailed logging also for outer system and universe objects? When I put on the "DEBUG" tag I only get detailed logs for inner system objects.
#NOTE much of the DEBUG logging for Galaxy was in the collapse methods which wasn't being called (see requested changes below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sgathrid The filter functions you introduced take arguments:
def component_count_filter(target_components: int) -> Callable:
def component_count_range_filter(min_components: int, max_components: int) -> Callable:
def minimum_nodes_filter(min_nodes: int) -> Callable:
def minimum_edges_filter(min_edges: int) -> Callable:
def minimum_unique_items_filter(min_unique_items: int) -> Callable:These weren’t previously being used in the Thema.genesis method, since there was no call to Galaxy.collapse– I've added that in my last commit. Now running Thema().genesis will break when using filter functions that require arguments.
I’m assuming you’d like this configurable through the YAML driver — in that case, you’ll likely need to set up corresponding dataclasses in thema.config that handle the arguments for each function, including sensible defaults when not specified by the user (similar to how it’s done in projectiles).
Addressed in commit 5c0a672
Addressed in commit 1f3a36c. See ee359b2 for resolution of the 2nd part of your comment re data classes + Now, Galaxy can take the following in the yaml to control filtering: Galaxy:
filter: component_count
filter_params:
target_components: 9or (example using a filter with 2 args) Galaxy:
filter: component_count_range
filter_params:
min_components: 7
max_components: 10Note: If you want to do more advanced filtering i.e. run curvature on every group filtered by the num components, you will need to write your own loop + use |
Reflow long lines, normalize whitespace/quotes, and tidy docstrings to match Black formatting. No logic changes.
Cast outDir to str for os.makedirs and joins to support Path-like inputs. Add timing, worker count, and created-file delta logs in Planet.fit. No algorithmic changes; improves runtime observability and safety.
Add duration measurement, pre/post .pkl counts, and clearer worker logs. More explicit projector setup logs. Behavior unchanged—observability only.
…llapse diagnostics Integrate YAML-driven filter config via config.filter_configs; default to nofilterfunction with display labels. Type-safe casts and consistent path handling for out/clean/proj dirs. In collapse(): robust file selection (dir/list), timing for distance/clustering, off-diagonal distance stats, cluster size distribution, and nReps vs distance_threshold handling. Improves resilience and transparency; star computation logic unchanged.
…d tidy formatting Switch default curvature measure from "forman_curvature" to "ollivier_ricci_curvature". Also apply minor formatting cleanups (multiline function signature and file-list comprehension).
…amples and imports - Ensure self.selected_model_files is initialized in __init__ - Populate selected_model_files after galaxy.collapse() - Reorder/add networkx import near other stdlib imports - Remove stray trailing spaces in docstring examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces comprehensive logging infrastructure and improved graph filtering capabilities to the Thema package. The changes enable better debugging and observability through controlled logging at DEBUG, INFO, and WARNING levels, while adding flexible pre-curvature filtering options for graph selection.
Key changes:
- Added notebook-friendly logging controls (
thema.enable_logging(),thema.disable_logging()) with multiprocessing support - Implemented parameterized graph filters (component count, nodes, edges, unique items) with proper instantiation patterns
- Enhanced Galaxy collapse workflow with detailed logging, filter validation, and improved error handling
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| thema/init.py | Added logging enable/disable functions and sklearn deprecation warning suppression |
| thema/utils.py | Added logging config utilities for multiprocessing and tqdm notebook detection |
| thema/thema.py | Added selected_model_files tracking and galaxy selection integration |
| thema/config.py | Added filter_configs mapping for YAML-based filter configuration |
| thema/multiverse/universe/utils/starFilters.py | Implemented parameterized filter functions (component_count, nodes, edges, etc.) |
| thema/multiverse/universe/geodesics.py | Enhanced stellar_curvature_distance to accept file lists and improved type hints |
| thema/multiverse/universe/galaxy.py | Major refactor: added comprehensive logging, filter setup, validation, and clustering metrics |
| thema/multiverse/universe/stars/jmapStar.py | Added debug logging for empty complexes and graph creation failures |
| thema/multiverse/universe/star.py | Added error logging in save method |
| thema/multiverse/system/inner/planet.py | Added logging throughout Moon instantiation and fixed directory handling |
| thema/multiverse/system/inner/moon.py | Added detailed debug logging for imputation, encoding, and scaling steps |
| thema/multiverse/system/outer/oort.py | Added logging for projection orchestration with timing and success metrics |
| thema/multiverse/system/outer/comet.py | Added error logging in save method |
| thema/multiverse/system/outer/projectiles/*.py | Added logger initialization to projection modules |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| def nofilterfunction(graphobject) -> int: | ||
| """Default filter that accepts all graph objects.""" | ||
| return 1 |
Copilot
AI
Oct 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type annotation -> int suggests this function returns an integer, but it's used as a filter (returning 1 for pass, 0 for fail). Consider using -> bool with True/False returns for clearer intent, or document why integer returns are preferred.
| def nofilterfunction(graphobject) -> int: | |
| """Default filter that accepts all graph objects.""" | |
| return 1 | |
| def nofilterfunction(graphobject) -> bool: | |
| """Default filter that accepts all graph objects.""" | |
| return True |
| files: str | list, | ||
| filterfunction: Callable | None = None, | ||
| curvature="forman_curvature", | ||
| curvature="ollivier_ricci_curvature", |
Copilot
AI
Oct 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default curvature changed from 'forman_curvature' to 'ollivier_ricci_curvature', but the docstring on line 34 still states the default is 'forman_curvature'. Update the docstring to match the new default.
| component_counts.append( | ||
| nx.number_connected_components(star_obj.starGraph.graph) | ||
| ) | ||
| except: |
Copilot
AI
Oct 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bare except clause catches all exceptions. Use except Exception: to avoid catching system-exiting exceptions like KeyboardInterrupt and SystemExit.
| except: | |
| except Exception: |
| # Tag the callable with a human-friendly name for logging | ||
| try: | ||
| setattr(func, "_display_name", str(filter_type)) | ||
| except Exception: |
Copilot
AI
Oct 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting attributes on functions can fail if the function object doesn't support it (e.g., built-in functions, some callables). Consider using a wrapper class or decorator to attach metadata instead of relying on try/except to silently suppress failures.
| # Keep logging resilient | ||
| pass |
Copilot
AI
Oct 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silent exception suppression in logging code makes debugging difficult. Consider logging the exception at DEBUG level or adding a comment explaining what specific exceptions are expected and why they're safe to ignore.
| # Keep logging resilient | |
| pass | |
| # Keep logging resilient, but log exception details at DEBUG level for troubleshooting | |
| logger.debug("Exception occurred while logging distance matrix statistics", exc_info=True) |
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
…t.py Co-authored-by: Copilot <[email protected]>
…ns and supported types fix kwargs support to work with `stellar_curvature_distance` params
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging for the win 🚀
TL;DR:
Moon,Planet, andGalaxyclasses.utils/starFilters.py. Filters likecomponent_count_filtermust be instantiated with parameters before use; direct argument passing viagalaxy.collapseis not yet supported (I don't know if this is even possible?)Summary:
This PR introduces additions to support Thema usage in the Retire package.
Logging Infrastructure
thema.enable_logging()andthema.disable_logging()for interactive use.Galaxy Filter Improvements
Galaxy.collapse()that checkedfilter_fntwice.filter_fnparameter to Galaxy constructor for non-YAML usage.Code Quality & Bug Fixes
logging_utils.py._instantiate_moonmethods).force_all_finite) from dependencies (fixes Deprecation Warning:force_all_finitefrom scikit-learn #50) [NOTE: the issue is from Kepler mapper and Hdbscan's scikit learn versioning, cannot actually fix until they update their packages].