Skip to content

Conversation

@sgathrid
Copy link
Member

@sgathrid sgathrid commented Oct 15, 2025

TL;DR:

  1. Logging support in Moon, Planet, and Galaxy classes.
import thema
thema.enable_logging('INFO') # --> 'DEBUG' / 'WARNING' also supported
  1. Pre-curvature filters added in utils/starFilters.py. Filters like component_count_filter must be instantiated with parameters before use; direct argument passing via galaxy.collapse is not yet supported (I don't know if this is even possible?)
from thema.multiverse.universe.utils.starFilters import component_count_filter

filter = component_count_filter(9)

selected_stars = galaxy.collapse(
    metric="stellar_curvature_distance",
    filter_fn=filter
)

Summary:

This PR introduces additions to support Thema usage in the Retire package.

Logging Infrastructure

  • Added notebook-friendly logging controls: thema.enable_logging() and thema.disable_logging() for interactive use.
  • Implemented multiprocessing logging: Moon objects spawned by Planet now inherit logging configuration.
  • Added logging throughout the pipeline:
    • Moon: DEBUG logging for data processing steps (imputation, encoding, scaling).
    • Planet: INFO/DEBUG logging for orchestration and Moon creation.
    • Galaxy: INFO logging for filter results, clustering progress, and representative selection.

Galaxy Filter Improvements

  • Fixed duplicate filter logic in Galaxy.collapse() that checked filter_fn twice.
  • Added proper filter_fn parameter to Galaxy constructor for non-YAML usage.
  • Enhanced filter reporting: logs how many graphs pass/fail filters before curvature computation.
  • Clarified order of operations: Filter → Curvature Distance → Clustering → Selection.

Code Quality & Bug Fixes

  • Resolved circular imports by creating logging_utils.py.
  • Cleaned up duplicate code in Planet class (_instantiate_moon methods).
  • Suppressed sklearn deprecation warnings (force_all_finite) from dependencies (fixes Deprecation Warning: force_all_finite from scikit-learn #50) [NOTE: the issue is from Kepler mapper and Hdbscan's scikit learn versioning, cannot actually fix until they update their packages].
  • Improved error handling in logging utilities with specific exception types.
  • Updated documentation to reflect new logging capabilities.

@sgathrid sgathrid added the enhancement New feature or request label Oct 15, 2025
@sgathrid sgathrid changed the title Logging and filters Logging and Graph Filters Oct 15, 2025
sgathrid and others added 7 commits October 14, 2025 19:21
Copy link
Member

@jeremy-wayland jeremy-wayland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Logging works well! I moved the utility functions into utils.py to keep the thema module cleaner.

For users, the main entry points are:

import thema

thema.enable_logging("INFO")
thema.disable_logging()

These are defined in the __init__.py of the thema module, so there’s no need for additional files.

Copy link
Member

@jeremy-wayland jeremy-wayland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgathrid can you add more detailed logging also for outer system and universe objects? When I put on the "DEBUG" tag I only get detailed logs for inner system objects.

#NOTE much of the DEBUG logging for Galaxy was in the collapse methods which wasn't being called (see requested changes below)

Copy link
Member

@jeremy-wayland jeremy-wayland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgathrid The filter functions you introduced take arguments:

def component_count_filter(target_components: int) -> Callable:

def component_count_range_filter(min_components: int, max_components: int) -> Callable:


def minimum_nodes_filter(min_nodes: int) -> Callable:

def minimum_edges_filter(min_edges: int) -> Callable:

def minimum_unique_items_filter(min_unique_items: int) -> Callable:

These weren’t previously being used in the Thema.genesis method, since there was no call to Galaxy.collapse– I've added that in my last commit. Now running Thema().genesis will break when using filter functions that require arguments.

I’m assuming you’d like this configurable through the YAML driver — in that case, you’ll likely need to set up corresponding dataclasses in thema.config that handle the arguments for each function, including sensible defaults when not specified by the user (similar to how it’s done in projectiles).

@sgathrid
Copy link
Member Author

sgathrid commented Oct 20, 2025

@jeremy-wayland

@sgathrid can you add more detailed logging also for outer system and universe objects? When I put on the "DEBUG" tag I only get detailed logs for inner system objects.

#NOTE much of the DEBUG logging for Galaxy was in the collapse methods which wasn't being called (see requested changes below)

Addressed in commit 5c0a672

@sgathrid The filter functions you introduced take arguments:

def component_count_filter(target_components: int) -> Callable:

def component_count_range_filter(min_components: int, max_components: int) -> Callable:


def minimum_nodes_filter(min_nodes: int) -> Callable:

def minimum_edges_filter(min_edges: int) -> Callable:

def minimum_unique_items_filter(min_unique_items: int) -> Callable:

These weren’t previously being used in the Thema.genesis method, since there was no call to Galaxy.collapse– I've added that in my last commit. Now running Thema().genesis will break when using filter functions that require arguments.

I’m assuming you’d like this configurable through the YAML driver — in that case, you’ll likely need to set up corresponding dataclasses in thema.config that handle the arguments for each function, including sensible defaults when not specified by the user (similar to how it’s done in projectiles).

Addressed in commit 1f3a36c.

See ee359b2 for resolution of the 2nd part of your comment re data classes + config.py. Filters are simple key-value parameter sets, unlike complex projectiles or stars, so using a dictionary keeps the code minimal, easy to maintain, and avoids the unnecessary complexity of dataclasses.

Now, Galaxy can take the following in the yaml to control filtering:

Galaxy:
  filter: component_count
  filter_params:
    target_components: 9

or (example using a filter with 2 args)

Galaxy:
  filter: component_count_range
  filter_params:
    min_components: 7
    max_components: 10

Note: If you want to do more advanced filtering i.e. run curvature on every group filtered by the num components, you will need to write your own loop + use .collapse() as is done in the Retire demo. Running filters with the yaml & thema.genesis() function only supports single filters.

sgathrid and others added 6 commits October 20, 2025 13:37
Reflow long lines, normalize whitespace/quotes, and tidy docstrings to match Black formatting. No logic changes.
Cast outDir to str for os.makedirs and joins to support Path-like inputs.
Add timing, worker count, and created-file delta logs in Planet.fit.
No algorithmic changes; improves runtime observability and safety.
Add duration measurement, pre/post .pkl counts, and clearer worker logs.
More explicit projector setup logs. Behavior unchanged—observability only.
…llapse diagnostics

Integrate YAML-driven filter config via config.filter_configs; default to nofilterfunction with display labels.
Type-safe casts and consistent path handling for out/clean/proj dirs.
In collapse(): robust file selection (dir/list), timing for distance/clustering, off-diagonal distance stats, cluster size distribution, and nReps vs distance_threshold handling.
Improves resilience and transparency; star computation logic unchanged.
…d tidy formatting

Switch default curvature measure from "forman_curvature" to "ollivier_ricci_curvature".
Also apply minor formatting cleanups (multiline function signature and file-list comprehension).
…amples and imports

- Ensure self.selected_model_files is initialized in __init__
- Populate selected_model_files after galaxy.collapse()
- Reorder/add networkx import near other stdlib imports
- Remove stray trailing spaces in docstring examples
@jeremy-wayland jeremy-wayland requested review from Copilot and stuartwayland and removed request for jeremy-wayland and stuartwayland October 21, 2025 14:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces comprehensive logging infrastructure and improved graph filtering capabilities to the Thema package. The changes enable better debugging and observability through controlled logging at DEBUG, INFO, and WARNING levels, while adding flexible pre-curvature filtering options for graph selection.

Key changes:

  • Added notebook-friendly logging controls (thema.enable_logging(), thema.disable_logging()) with multiprocessing support
  • Implemented parameterized graph filters (component count, nodes, edges, unique items) with proper instantiation patterns
  • Enhanced Galaxy collapse workflow with detailed logging, filter validation, and improved error handling

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
thema/init.py Added logging enable/disable functions and sklearn deprecation warning suppression
thema/utils.py Added logging config utilities for multiprocessing and tqdm notebook detection
thema/thema.py Added selected_model_files tracking and galaxy selection integration
thema/config.py Added filter_configs mapping for YAML-based filter configuration
thema/multiverse/universe/utils/starFilters.py Implemented parameterized filter functions (component_count, nodes, edges, etc.)
thema/multiverse/universe/geodesics.py Enhanced stellar_curvature_distance to accept file lists and improved type hints
thema/multiverse/universe/galaxy.py Major refactor: added comprehensive logging, filter setup, validation, and clustering metrics
thema/multiverse/universe/stars/jmapStar.py Added debug logging for empty complexes and graph creation failures
thema/multiverse/universe/star.py Added error logging in save method
thema/multiverse/system/inner/planet.py Added logging throughout Moon instantiation and fixed directory handling
thema/multiverse/system/inner/moon.py Added detailed debug logging for imputation, encoding, and scaling steps
thema/multiverse/system/outer/oort.py Added logging for projection orchestration with timing and success metrics
thema/multiverse/system/outer/comet.py Added error logging in save method
thema/multiverse/system/outer/projectiles/*.py Added logger initialization to projection modules

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +5 to 7
def nofilterfunction(graphobject) -> int:
"""Default filter that accepts all graph objects."""
return 1
Copy link

Copilot AI Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type annotation -> int suggests this function returns an integer, but it's used as a filter (returning 1 for pass, 0 for fail). Consider using -> bool with True/False returns for clearer intent, or document why integer returns are preferred.

Suggested change
def nofilterfunction(graphobject) -> int:
"""Default filter that accepts all graph objects."""
return 1
def nofilterfunction(graphobject) -> bool:
"""Default filter that accepts all graph objects."""
return True

Copilot uses AI. Check for mistakes.
files: str | list,
filterfunction: Callable | None = None,
curvature="forman_curvature",
curvature="ollivier_ricci_curvature",
Copy link

Copilot AI Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default curvature changed from 'forman_curvature' to 'ollivier_ricci_curvature', but the docstring on line 34 still states the default is 'forman_curvature'. Update the docstring to match the new default.

Copilot uses AI. Check for mistakes.
component_counts.append(
nx.number_connected_components(star_obj.starGraph.graph)
)
except:
Copy link

Copilot AI Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bare except clause catches all exceptions. Use except Exception: to avoid catching system-exiting exceptions like KeyboardInterrupt and SystemExit.

Suggested change
except:
except Exception:

Copilot uses AI. Check for mistakes.
# Tag the callable with a human-friendly name for logging
try:
setattr(func, "_display_name", str(filter_type))
except Exception:
Copy link

Copilot AI Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting attributes on functions can fail if the function object doesn't support it (e.g., built-in functions, some callables). Consider using a wrapper class or decorator to attach metadata instead of relying on try/except to silently suppress failures.

Copilot uses AI. Check for mistakes.
Comment on lines +619 to +620
# Keep logging resilient
pass
Copy link

Copilot AI Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent exception suppression in logging code makes debugging difficult. Consider logging the exception at DEBUG level or adding a comment explaining what specific exceptions are expected and why they're safe to ignore.

Suggested change
# Keep logging resilient
pass
# Keep logging resilient, but log exception details at DEBUG level for troubleshooting
logger.debug("Exception occurred while logging distance matrix statistics", exc_info=True)

Copilot uses AI. Check for mistakes.
@jeremy-wayland jeremy-wayland self-requested a review October 21, 2025 17:13
Copy link
Member

@jeremy-wayland jeremy-wayland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging for the win 🚀

@jeremy-wayland jeremy-wayland merged commit fd45e05 into main Oct 21, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deprecation Warning: force_all_finite from scikit-learn

3 participants