Feature: categorization of notebooks to simplify huge repositories navigation #190

Open
wants to merge 28 commits into
base: master

Conversation

andreytaboola
Contributor

@andreytaboola andreytaboola commented Oct 8, 2024

0.6.4 (2024-10-08)

  • Feature: categorization of notebooks, allowing a special category tag to be set on notebooks for easy grouping
  • This feature addresses problems with huge repos where only a limited number of notebooks are used in the webapp:
    • A very deep navigation tree in the UI for the most deeply nested notebook paths
    • Very long report names on the scheduler and results pages
    • Difficult tile navigation for the reports
  • To enable categorization of notebooks:
    • Add a 'category=..' tag to the metadata of the relevant notebooks
    • Run notebooker with the --categorization flag
  • Important: only categorized notebooks (those with a 'category=..' tag) are shown as selectable options in the webapp
  • Keeps the original navigation by directory structure if the categorization flag is not set
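
As a rough illustration of the grouping described above, the sketch below turns a mapping of template path to category into a one-level category navigation. The helper name `group_by_category` and the sample paths are hypothetical and are not taken from this PR's code.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_category(path_to_category: Dict[str, str]) -> Dict[str, List[str]]:
    """Group template paths under their category name (hypothetical helper)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for path, category in path_to_category.items():
        grouped[category].append(path)
    # Sort categories and paths so the navigation order is stable.
    return {category: sorted(paths) for category, paths in sorted(grouped.items())}


# Made-up template paths, only to show the shape of the result.
templates = {
    "team_a/deep/nested/daily_pnl": "finance",
    "team_b/other/risk_report": "finance",
    "team_c/ops/uptime": "operations",
}
navigation = group_by_category(templates)
```

With this shape, deeply nested paths sit directly under a short category name instead of a long directory chain.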

@andreytaboola
Contributor Author

This doesn't affect the original notebooker navigation if the categorization flag is not set.

@andreytaboola
Contributor Author

The checks pipeline is not running:
https://app.circleci.com/pipelines/github/man-group/notebooker
No follower's GitHub token could fetch config.yml

@andreytaboola
Contributor Author

@jonbannister can you take a look?

@andreytaboola
Contributor Author

@jonbannister Hi! Please review

Collaborator

@jonbannister jonbannister left a comment

I think this needs a different approach. We cannot rely on current_app.config to store the PATH_TO_CATEGORY_DICT, and we can't rely on using the templates to display results - e.g. what if the templates completely change: do we lose all the results?

The categories should be a first-class piece of metadata associated with the report_results and, if activated in the webapp, should essentially act like a top-level folder, possibly with different endpoints to make it clearer/cleaner. We shouldn't have lots of if statements everywhere which slightly change behaviour; it should follow a (minimal) separate code path to do the right thing.

I imagine this could be done as a thin layer on top of the existing folder structure which just adds another layer of depth to the frontpage.
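
A minimal sketch of that idea, assuming the category is persisted on each result and the front page simply wraps the existing folder tree in one extra level. `StoredResult`, `folder_tree` and `frontpage_tree` are hypothetical names for illustration, not notebooker APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StoredResult:
    """Hypothetical stand-in for a stored report result."""
    report_name: str
    category: Optional[str] = None  # saved with the result, not looked up from templates


def folder_tree(report_names: List[str]) -> Dict[str, dict]:
    """Folder-style navigation: nest report names by path component."""
    tree: Dict[str, dict] = {}
    for name in report_names:
        node = tree
        for part in name.split("/"):
            node = node.setdefault(part, {})
    return tree


def frontpage_tree(results: List[StoredResult], categorization: bool) -> Dict[str, dict]:
    """With categorization on, add one top level; otherwise return the plain folder tree."""
    if not categorization:
        return folder_tree([r.report_name for r in results])
    by_category: Dict[str, List[str]] = {}
    for r in results:
        if r.category is not None:  # uncategorized results are hidden in this mode
            by_category.setdefault(r.category, []).append(r.report_name)
    return {category: folder_tree(names) for category, names in by_category.items()}
```

The single branch lives in one function, so the folder logic itself does not need to know whether categorization is enabled.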

Collaborator

Could we please revert the version bump until we have ensured the tests pass on master?

Collaborator

Could we please add some documentation/screenshots of the functionality?

"--categorization",
default=False,
is_flag=True,
help="If selected, discovers only templates with the 'category=example' tags set to any cell and groups notebooks by their category names",
Collaborator

What is the expected behaviour when multiple cells in the same notebook have clashing categories?
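
One possible answer, sketched under assumptions: collect every `category=` tag across the notebook's cells and fail loudly if they disagree. This relies only on the standard notebook JSON layout (`cells[].metadata.tags`); the function name `extract_category` and the choice to raise `ValueError` are illustrative, not what the PR currently does.

```python
from typing import Optional

CATEGORY_PREFIX = "category="


def extract_category(notebook: dict) -> Optional[str]:
    """Return the single category tagged in a notebook, or None if there is none.

    Raises ValueError when different cells declare different categories.
    """
    categories = set()
    for cell in notebook.get("cells", []):
        for tag in cell.get("metadata", {}).get("tags", []):
            if tag.startswith(CATEGORY_PREFIX):
                categories.add(tag[len(CATEGORY_PREFIX):])
    if len(categories) > 1:
        raise ValueError(f"Conflicting category tags: {sorted(categories)}")
    return next(iter(categories), None)
```

Repeating the same category on several cells is accepted here; only genuinely different values are treated as an error.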

@@ -180,6 +188,7 @@ def start_webapp(

@base_notebooker.command()
@click.option("--report-name", help="The name of the template to execute, relative to the template directory.")
@click.option("--category", default="", help="Category of the template.")
Collaborator

Doesn't this come from the cell tags? How come this needs to be passed in?

@@ -351,6 +357,7 @@ def _get_overrides(overrides_as_json: AnyStr, iterate_override_values_of: Option
def execute_notebook_entrypoint(
    config: BaseConfig,
    report_name: str,
    category: str,
Collaborator

This is an Optional[str]

return new_dict

filtered_dict = filter_dict(d)
return strip_extensions(filtered_dict)
Collaborator

How come all of this extra filtering and cleaning is required? It wasn't beforehand, and the categories do not add new files to clutter the output structure as far as I understand.


def get_all_templates():
    if current_app.config["CATEGORIZATION"]:
        return get_all_possible_templates(warn_on_local=False)
Collaborator

This changes the output type - is that intended?

@@ -14,7 +14,7 @@ def test_create_schedule(flask_app, setup_workspace):
rv = client.get("/core/all_possible_templates_flattened")
assert rv.status_code == 200
data = json.loads(rv.data)
- assert data == {"result": ["fake/py_report", "fake/ipynb_report", "fake/report_failing"]}
+ assert sorted(data["result"]) == sorted(["fake/py_report", "fake/ipynb_report", "fake/report_failing"])
Collaborator

Why did the ordering change?
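
The cause is not stated in this thread. If the new code path ends up iterating something unordered (an assumption, for example a set or a rebuilt dict), one way to keep the endpoint's output stable is to sort once at the source so tests can still assert an exact list. `flatten_templates` below is a hypothetical helper, not the function behind the endpoint.

```python
from typing import Iterable, List


def flatten_templates(paths: Iterable[str]) -> List[str]:
    """Return unique template names in a deterministic (sorted) order."""
    return sorted(set(paths))
```

Note that this fixes an alphabetical order, which differs from the order the original test expected, so the expected list in the test would still need a one-off update.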

// title: "Report Name",
// name: "report_name",
// data: "params.report_name",
// },
Collaborator

Why comment out the report name?

@@ -52,6 +52,8 @@ def remove_schedule(job_id):


def get_job_id(report_name: str, report_title: str) -> str:
    if "PATH_TO_CATEGORY_DICT" in current_app.config and report_name in current_app.config["PATH_TO_CATEGORY_DICT"]:
Collaborator

I don't think this way of retrieving the category is tenable - it assumes the webapp still has access to the template files (which is not guaranteed) and that get_directory_structure() has run (which is not guaranteed at this point as far as I know)
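
A sketch of an alternative, assuming the caller already holds the category (for example from stored result or schedule metadata) and passes it in explicitly, so nothing depends on the Flask app config or on the template files being present. The function name `build_job_id` and the ID format are hypothetical; the real `get_job_id` format is not shown in this thread.

```python
from typing import Optional


def build_job_id(report_name: str, report_title: str, category: Optional[str] = None) -> str:
    """Build a job ID from explicit arguments only (hypothetical format)."""
    parts = [report_name, report_title]
    if category:
        # The category is supplied by the caller, never looked up from global state.
        parts.insert(0, category)
    return "_".join(parts)
```

Because the function is pure, it can be called and tested without an application context.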
