feat: testing centralized lazy imports #168
base: main
Conversation
Pull Request Overview
This PR introduces a centralized lazy loading system to eliminate duplication in import handling across scorer modules. The system provides consistent error handling for optional dependencies and consolidates lazy imports for third-party packages used in similarity scoring.
- Created a core lazy loading framework with `LazyImport` and `LazyAttr` classes (a sketch of the pattern follows this list)
- Centralized scorer-specific lazy imports in a dedicated module
- Refactored similarity scorers to use the new centralized system instead of individual import error handling
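For readers without the diff open, here is a minimal sketch of what the `LazyImport`/`LazyAttr` pattern can look like. It is inferred from the call sites in `dreadnode/lazy/scorers.py` shown further down, not copied from `dreadnode/lazy/core.py`, so the actual implementation may differ:

```python
import importlib
import typing as t


class LazyImport:
    """Proxy that defers importing a module until first attribute access."""

    def __init__(self, module_name: str, extra: str, *, package_name: str | None = None) -> None:
        self._module_name = module_name
        self._extra = extra  # optional-dependency group, e.g. "text" or "llm"
        self._package_name = package_name or module_name
        self._module: t.Any = None

    def _load(self) -> t.Any:
        if self._module is None:
            try:
                self._module = importlib.import_module(self._module_name)
            except ImportError as e:
                raise ImportError(
                    f"Missing optional dependency '{self._package_name}'. "
                    f"Install it with: pip install 'dreadnode[{self._extra}]'"
                ) from e
        return self._module

    def __getattr__(self, name: str) -> t.Any:
        return getattr(self._load(), name)


class LazyAttr(LazyImport):
    """Proxy for a single attribute (class, function, or submodule) of a module."""

    def __init__(
        self, module_name: str, attr_name: str, extra: str, *, package_name: str | None = None
    ) -> None:
        super().__init__(module_name, extra, package_name=package_name)
        self._attr_name = attr_name

    def _target(self) -> t.Any:
        return getattr(self._load(), self._attr_name)

    def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
        # Supports proxied callables, e.g. TfidfVectorizer(stop_words="english")
        return self._target()(*args, **kwargs)

    def __getattr__(self, name: str) -> t.Any:
        # Supports proxied modules, e.g. fuzz.ratio(...)
        return getattr(self._target(), name)
```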
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
| --- | --- |
| pyproject.toml | Added lazy module to mypy configuration and disabled import-not-found errors |
| dreadnode/lazy/core.py | Core lazy loading implementation with error-handling classes |
| dreadnode/lazy/scorers.py | Centralized lazy imports for all scorer dependencies |
| dreadnode/scorers/similarity.py | Refactored to use centralized lazy imports, removing duplicate error-handling code |
```python
def evaluate(data: t.Any, *, reference: str = reference) -> Metric:
    vectorizer = TfidfVectorizer(stop_words="english")
    candidate_text = str(data)
    tfidf_matrix = vectorizer.fit_transform([candidate_text, reference])
    sim = sklearn_cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]
```
Creating a new TfidfVectorizer instance on every evaluation call is inefficient. The vectorizer should be created once outside the evaluate function or cached to avoid repeated initialization overhead.
Suggested change:

```python
# Fit the vectorizer once on the reference text
vectorizer = TfidfVectorizer(stop_words="english")
reference_vec = vectorizer.fit_transform([reference])

def evaluate(data: t.Any, *, reference: str = reference) -> Metric:
    candidate_text = str(data)
    candidate_vec = vectorizer.transform([candidate_text])
    sim = sklearn_cosine_similarity(candidate_vec, reference_vec)[0][0]
```
```python
if t.TYPE_CHECKING:
    import litellm as litellm  # type: ignore[import-not-found]
    import nltk as nltk  # type: ignore[import-not-found]
    from nltk.tokenize import (  # type: ignore[import-not-found]
        word_tokenize as word_tokenize,
    )
    from nltk.translate.bleu_score import (  # type: ignore[import-not-found]
        sentence_bleu as sentence_bleu,
    )
    from rapidfuzz import distance as distance  # type: ignore[import-not-found]
    from rapidfuzz import fuzz as fuzz  # type: ignore[import-not-found]
    from rapidfuzz import utils as utils  # type: ignore[import-not-found]
    from sentence_transformers import (  # type: ignore[import-not-found]
        SentenceTransformer as SentenceTransformer,
    )
    from sentence_transformers import (  # type: ignore[import-not-found]
        util as util,
    )
    from sklearn.feature_extraction.text import (  # type: ignore[import-not-found]
        TfidfVectorizer as TfidfVectorizer,
    )
    from sklearn.metrics.pairwise import (  # type: ignore[import-not-found]
        cosine_similarity as cosine_similarity,
    )
else:
    fuzz = LazyAttr("rapidfuzz", "fuzz", "text")
    utils = LazyAttr("rapidfuzz", "utils", "text")
    distance = LazyAttr("rapidfuzz", "distance", "text")
    litellm = LazyImport("litellm", "llm")
    util = LazyAttr("sentence_transformers", "util", "text", package_name="sentence-transformers")
    TfidfVectorizer = LazyAttr(
        "sklearn.feature_extraction.text", "TfidfVectorizer", "text", package_name="scikit-learn"
    )
    SentenceTransformer = LazyAttr(
        "sentence_transformers", "SentenceTransformer", "text", package_name="sentence-transformers"
    )
    cosine_similarity = LazyAttr(
        "sklearn.metrics.pairwise", "cosine_similarity", "text", package_name="scikit-learn"
    )
```
Thanks Brian for tackling this. `importlib.*` is a good pattern for this, but there are a couple of issues I can see here:

1. Dual maintenance problem: every import must be maintained in two places. Example:

```python
if TYPE_CHECKING:
    from sklearn.metrics.pairwise import cosine_similarity as cosine_similarity  # Place 1
    from rapidfuzz import fuzz as fuzz  # Place 1
    ...
else:
    cosine_similarity = LazyAttr("sklearn.metrics.pairwise", "cosine_similarity", "text")  # Place 2
    fuzz = LazyAttr("rapidfuzz", "fuzz", "text")  # Place 2
    ...
```

For instance, if we wanted to add a new sklearn import, we would need to update both blocks. The same applies whenever a method or API changes: both places have to stay in sync, and if they drift apart we might encounter runtime errors.
2. Hidden, small perf overhead on calls: every attribute access goes through a `__getattr__` call. For example:

```python
for text in large_dataset:
    score = fuzz.ratio(ref, text)  # __getattr__ overhead * 10,000 calls if the dataset has that many examples
```

That said, the calls are minimal, not expensive.
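As a rough way to sanity-check that claim, a micro-benchmark along these lines (a hypothetical sketch, not part of the original comment; assumes rapidfuzz is installed) compares direct attribute access against a `__getattr__` proxy:

```python
import timeit

from rapidfuzz import fuzz  # direct import for the baseline


class Proxy:
    """Minimal stand-in for a LazyAttr-style proxy."""

    def __getattr__(self, name):
        return getattr(fuzz, name)


proxy = Proxy()

direct = timeit.timeit(lambda: fuzz.ratio("reference", "candidate"), number=10_000)
lazy = timeit.timeit(lambda: proxy.ratio("reference", "candidate"), number=10_000)
print(f"direct: {direct:.4f}s, proxy: {lazy:.4f}s")
```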
Proposed solution: use a pattern similar to what you have now, pandas-style: ref link

Something like this:
```python
# dreadnode/utils/imports.py
import importlib
import typing as t


def import_optional_dependency(
    name: str,
    extra: str = "",
    package_name: str | None = None,
) -> t.Any:
    """Import an optional dependency with a helpful error message."""
    try:
        return importlib.import_module(name)
    except ImportError:
        pkg = package_name or name
        raise ImportError(
            f"Missing dependency '{name}'. "
            f"Install with: pip install {pkg} or dreadnode[{extra}]"
        ) from None
```
Then, in `scorers/similarity.py`:

```python
def similarity_with_rapidfuzz(
    reference: str,
    method: str = "ratio",
) -> Scorer:
    """RapidFuzz similarity scorer."""

    def evaluate(data: t.Any) -> Metric:
        # Import exactly when needed - no globals, no dual maintenance
        fuzz = import_optional_dependency("rapidfuzz.fuzz", extra="text", package_name="rapidfuzz")
        candidate_text = str(data)
        score = getattr(fuzz, method)(reference, candidate_text)
        return Metric(value=score / 100.0)

    return Scorer(evaluate, name=f"rapidfuzz_{method}")


def similarity_with_sentence_transformers(
    reference: str,
    model_name: str = "all-MiniLM-L6-v2",
) -> Scorer:
    """Sentence Transformers similarity scorer."""

    def evaluate(data: t.Any) -> Metric:
        # Complex package - import both modules cleanly
        st = import_optional_dependency(
            "sentence_transformers", extra="text", package_name="sentence-transformers"
        )
        torch = import_optional_dependency("torch", extra="text")

        model = st.SentenceTransformer(model_name)
        # ... rest of implementation (computes similarity_score)
        return Metric(value=similarity_score)

    return Scorer(evaluate, name="sentence_transformers")
```
I think with this we could also minimize the TYPE_CHECKING/else blocks and the perf overhead.
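Worth noting as an aside: repeated calls to `importlib.import_module` are cheap because the module is cached in `sys.modules` after the first import, so the per-call cost is roughly a dict lookup. If even that matters in hot loops, the helper could be memoized, as in this hypothetical `cached_optional_dependency` sketch:

```python
import functools

# Hypothetical refinement: memoize the helper so hot loops skip even the
# sys.modules lookup. Assumes import_optional_dependency as defined above.
@functools.lru_cache(maxsize=None)
def cached_optional_dependency(name: str, extra: str = "", package_name: str | None = None):
    return import_optional_dependency(name, extra=extra, package_name=package_name)
```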
Thank you. Will try it out!
I was trying to find an existing tidy example. Good find w/ pandas
Central Lazy Loading System
In order to make the lazy loading process more DRY, I've attempted to centralize the lazy imports for scorers.
Key Changes:
- Refactored the `similarity` scorers as PoCs

Notes:
- `catch` kwarg in the `Scorer`

Generated Summary:
- Introduced a lazy loading system in the `dreadnode` package.
- Added `LazyImport` and `LazyAttr` classes that facilitate delayed imports and attribute access.
- Updated `similarity.py` with lazy imports to handle potential ImportErrors gracefully.
- Centralized lazy imports for `rapidfuzz`, `nltk`, `sentence-transformers`, and `scikit-learn`.
- Updated `pyproject.toml` to include new lazy module paths and modified linting rules accordingly.

This summary was generated with ❤️ by rigging