Developer docs
These are things which are useful to refer back to. At some point in the future they might make their way into a proper docs page on RTD. These notes can be rough and might not always be up to date. If it's a quick answer then put it inline here; if it's a longer read then just link to it.
Source: https://github.com/mckinsey/vizro/pull/775
When serve_locally=True (the default), Dash serves component library resources (generally CSS and JS) through the Flask server using the _dash-component-suites route.
- For Vizro library components (currently just KPI cards), this should happen when Vizro is imported.
- For Vizro framework components (everything else), this should happen only when `Vizro()` is instantiated.
This makes our footprint as small as possible and reduces the risk of CSS name clashes when someone wants to use Vizro just as a library but doesn't instantiate `Vizro()` (not common at all now, but maybe it will be in the future).
When `serve_locally=False`, Dash switches to serving resources through `external_url`, where it's specified. For Vizro components we use jsDelivr as a CDN for this.
A few complications:
- files that aren't CSS/JS (e.g. fonts, maps) can still be served through the same routes but should not have a `<script>` or `<link>` in the HTML source. This is achieved with `dynamic=True` (see the illustrative sketch below)
- the CDN minifies CSS/JS files automatically, but some we have minified manually ourselves (currently just `vizro-bootstrap.min.css`)
- it's not possible to serve JS as modules this way, which means we can't easily do `import`/`export` in them
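As a rough illustration of what such resource entries look like (the file names, namespace and CDN URL below are placeholders and this is not Vizro's actual registration code), a Dash component library declares its CSS along these lines:

```python
# Illustrative sketch only: paths, namespace and URL are made up, not taken from Vizro.
_css_dist = [
    {
        "namespace": "vizro",
        "relative_package_path": "static/css/vizro-bootstrap.min.css",  # served via _dash-component-suites
        "external_url": "https://cdn.jsdelivr.net/...",  # used instead when serve_locally=False
    },
    {
        "namespace": "vizro",
        "relative_package_path": "static/css/fonts/some-font.woff2",
        "dynamic": True,  # still served through the same route, but no <link> tag in the HTML source
    },
]
```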
In future:
- when we release vizro-bootstrap, nothing changes for the Vizro framework. Pure Dash users would treat it like any other Bootstrap theme, i.e. set it through `external_stylesheets` pointing to the stylesheet on our CDN, or download the file to their own `assets` folder. We'd have a shortcut for this like `vizro.theme` or do it through Bootswatch if that's possible.
- ideally we would webpack our JS and ship the `.min.js` rather than just relying on the CDN to minify it. This would let us write "proper" rather than just in-browser JS and mean we benefit from tree-shaking etc. rather than just the minification that the CDN does. In reality the optimisations would make very little difference to performance, but it's kind of the "right" way to do things. It's more effort than it's worth to set up at the moment, but if we end up maintaining a bigger JS codebase we might do it
- the vizro-bootstrap.css map file + all the SASS should live in the repo so that it can be handled correctly through developer tools in a browser
The order Dash inserts stylesheets is always:
1. `external_stylesheets`
2. library stylesheets as we add on `import vizro`
3. stylesheets added through `append_css` as we do in `Vizro()`
4. user assets (these also go through `append_css` but only when the server gets its first request)
The problem was that figures.css was served in stage 2 and therefore could come before vizro-bootstrap.min.css. I hoped this wouldn't cause any issues but unfortunately it did...
So now what we do is remove the library stylesheets in `Vizro()` and then add them using the framework's `append_css` mechanism. This means that vizro-bootstrap.min.css always comes first for a framework user because we sort the stylesheets added in stage 3 to put it first (the rest are in alphabetical order). For a Dash user, it will be specified using `external_stylesheets` so will always come first anyway.
See https://github.com/mckinsey/vizro/pull/615, then altered slightly by https://github.com/mckinsey/vizro/pull/598#pullrequestreview-2200302196.
Here's the simplified callback flow when the page loads or refreshes:
1. HTTP: `page build`
    - All Vizro models return output from their `build` method.
    - This is fast because `build` doesn't rely on the data frame loading.
    - Dynamic components return a placeholder to be filled later by `on page load`.
    - They must return default original/initial values so persistence can work properly.
2. Client-side persistence is applied.
3. If there are `show_in_url` controls:
    - URL query parameters and control values update each other (URL params take precedence in this process).
4. If control values (like Slider/RangeSlider) come from the URL:
    - `update_slider_values`/`update_range_slider_values` sync the numeric text input fields.
5. Dropdown and checklist "select all" sync after the URL:
    - `update_dropdown_select_all` and `update_checklist_select_all` trigger to sync the `select-all` option in case the control `value` is changed through the URL.
6. HTTP: `on page load`
    - Dynamic placeholder components turn into a loading state.
    - All current control values are sent to the server.
    - All dynamic components get updated and replace the earlier placeholders.
    - Dynamic Vizro models return output from their `__call__` method.
    - This step may take time because it loads and processes the data frame.
    - It updates dynamic content (e.g. charts, cards, or control options).
7. Dropdown and checklist "select all" sync again after the on page load (OPL):
    - `update_dropdown_select_all` and `update_checklist_select_all` trigger to sync the `select-all` option in case the control `options` is changed.
Source: https://github.com/mckinsey/vizro/pull/580
Development
`Vizro().build(dashboard).run()` and then `python app.py`, which is what we do across our docs. This only works while you're developing, but I like recommending it as the first port of call for users because it's simple, quick and easy, like Vizro should be. There's no need to define `app`.
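A minimal sketch of that development pattern (the page content here is illustrative, not taken from the docs):

```python
# app.py
import vizro.models as vm
import vizro.plotly.express as px
from vizro import Vizro

page = vm.Page(
    title="Example page",
    components=[vm.Graph(figure=px.scatter(px.data.iris(), x="sepal_length", y="petal_width"))],
)
dashboard = vm.Dashboard(pages=[page])

# No separate `app` object needed while developing; run the dev server directly.
Vizro().build(dashboard).run()
```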
Deployment
```python
app = Vizro().build(dashboard)

# If you also want it to run during development with `python app.py` you also need this:
if __name__ == "__main__":
    app.run()
```
and then e.g. `gunicorn app:app`. The key change of this PR is that in this context there's no longer any need to define `server = app.dash.server` (although that will still work).
The integration tests in this repo do something a bit different but that's just due to some technicalities of how they run and so don't show a generally recommended pattern.
Source: https://github.com/mckinsey/vizro/pull/151
Here are the rules for how we should write code so that paths are always correctly formed (see the sketch after this list):
- always use `dash.get_relative_path` to link to pages with `href` (see the `_make_page_404_layout` example link)
- always use `dash.get_relative_path(f"/{STATIC_URL_PREFIX}/..")` to refer to built-in assets in the `static` folder (see the `_make_page_404_layout` example `html.Img`)
- always use `dash.get_asset_url` to refer to things in the user `assets` folder, e.g. the logo is done this way
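A minimal sketch of those three rules (the file names are made up, and `STATIC_URL_PREFIX` here is just a stand-in for the real constant):

```python
import dash
from dash import html

STATIC_URL_PREFIX = "vizro"  # assumption: placeholder for Vizro's actual static URL prefix constant

app = dash.Dash(__name__)  # the path helpers read the prefix from the app config, so an app must exist

link = html.A("Home", href=dash.get_relative_path("/"))  # rule 1: link to a page
built_in_img = html.Img(src=dash.get_relative_path(f"/{STATIC_URL_PREFIX}/images/logo.svg"))  # rule 2: built-in static asset
user_logo = html.Img(src=dash.get_asset_url("logo.png"))  # rule 3: file in the user's assets folder
```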
Source: https://github.com/mckinsey/vizro/pull/188
- prefer to use `None` over `html.Div(hidden=True)` in the case that we don't need to refer to the element in any way (basically whenever you don't set an `id`), e.g. `html.P(self.title) if self.title else None` (see the sketch after this list)
- prefer to use `html.Div(hidden=True)` over `None` in the case that we do need to refer to the element (basically when you do set an `id`), e.g. `html.Div(hidden=True, id="nav_panel_outer")`. Generally these can be identified by the fact that `build` return values have a type like `_NavBuildType`
- prefer to use `""` as the default value for optional fields which are `str`. These fields do not need to accept `None` values at all
Source: https://github.com/mckinsey/vizro/pull/367#issuecomment-1994052080
When it comes to using CapturedCallable we should always prefer to use the highest-level interface possible to avoid delving into private/protected things. There are basically three categories of attributes here (see the sketch after this list):
- dunder attributes like `__call__` and `__getitem__`: these are the main point of entry for any callers and should be used wherever possible
- protected attributes like `_function` and `_arguments`: OK to use if needed, but they will be removed or made into proper public things in due course, so put some thought into exactly what you're trying to do and whether you really need to use them or if you can already achieve it just with dunder attributes
- private attributes like `__arguments`: you should never need to use these
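A rough sketch of these preferences (the captured function, its arguments and the `"my_data"` data source name are made up):

```python
import pandas as pd
import plotly.graph_objects as go
from vizro.models.types import capture


@capture("graph")
def bar_chart(data_frame: pd.DataFrame, colour: str = "blue") -> go.Figure:
    return go.Figure(go.Bar(y=data_frame["y"], marker_color=colour))


captured = bar_chart(data_frame="my_data", colour="red")

print(captured["colour"])  # __getitem__: read a bound argument -> "red"
figure = captured(data_frame=pd.DataFrame({"y": [1, 2, 3]}))  # __call__: evaluate with the real data
# Prefer the two calls above to reaching into captured._function or captured._arguments.
```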
Source: https://vizro.readthedocs.io/en/stable/pages/user-guides/data/
This summary can help you quickly decide which Vizro data type and configuration to use in your future examples.
Static data:
- When to use: Use for data that does not need to be reloaded while the dashboard is running.
- Production ready: 🟢
- Performance: 🟢
- Limitations: Data can only be reloaded/refreshed by restarting the dashboard app.
- Use cases: Any time your data does not need to be reloaded while the dashboard is running.
Dynamic data:
- When to use: Use for data that does need to be reloaded while the dashboard is running.
- Production ready: 🟠 The reason is performance, which might degrade your app.
- Performance: 🟠 Your dashboard performance may suffer (use the cache to solve this) if:
    - loading your data is a slow operation, or
    - you have many figures that use dynamic data, or
    - many users use the app at the same time.
- Limitations: Performance
- Use cases: When loading your data is a fast operation and you strictly need the very latest results from the data source. For example:
    - Displaying the results of a just-finished workflow triggered by the user (e.g. some model interaction flow).
    - Repeatedly reading logs from a file.
    - Chat apps.
Dynamic data with cache:
- When to use: Use to improve app performance when dynamic data is used. Use it only when you don't need the very latest data to always be displayed.
- Production ready: SimpleCache: 🔴, FileSystemCache: 🟢, RedisCache: 🟢
- Performance: SimpleCache: 🟢, FileSystemCache: 🟠, RedisCache: 🟢
- Limitations: Loaded/displayed data can be up to the (user-specified) timeout number of seconds old, which means that the real data (at its source) and the displayed data can differ.
- Use cases: When loading your data is a slow operation or you don't need the very latest results from the data source. For example:
    - A forecast app (because you don't need the latest results).
    - Data that presents search engine results.
    - Reducing the number of external API calls (especially if you pay per API call).
Parametrised dynamic data:
- When to use: Use when the entire data source, or its version, or the chunk of it that will be loaded into the app should depend on the user's UI input.
- Production ready: Same as Dynamic data
- Performances: Same as Dynamic data
- Limitations: Same as Dynamic data + filter/parameter options are not natively updated according to the newly loaded data.
- Use cases: For example:
    - Selecting the source from which the data is going to be retrieved (e.g. selecting between linkedin_results, twitter_results and instagram_results options).
    - Displaying a certain version of the model interaction results.
    - Displaying only a chunk of big data (where the concrete chunk depends on the user's input).
P.S. Parametrised data loading is compatible with caching. The cache uses memoization, so the dynamic data function's arguments are included in the cache key. The pros/cons of using parametrised dynamic data with or without the cache are the same as the pros/cons of dynamic data with and without the cache presented above.
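As a rough sketch of how dynamic data, parametrisation and the cache fit together (the data source name, loading function and timeout below are illustrative):

```python
from flask_caching import Cache
import vizro.plotly.express as px
from vizro.managers import data_manager

# Configure the cache backend; SimpleCache and RedisCache are configured the same way.
data_manager.cache = Cache(config={"CACHE_TYPE": "FileSystemCache", "CACHE_DIR": "cache"})


def load_iris(number_of_rows: int = 150):
    # Parametrised dynamic data: the argument becomes part of the memoized cache key.
    return px.data.iris().head(number_of_rows)


data_manager["iris"] = load_iris
data_manager["iris"].timeout = 300  # displayed data may be up to 300 seconds older than the source
```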
See https://github.com/mckinsey/vizro/pull/195.
This question was posed and discussed in a PR that introduces the `vm.Title` model as a way to have an icon next to a title. The example, and the question discussed (and thus reused below), is whether a model like `vm.Dashboard` should have a field `title: Union[str, vm.Title]` or `title: vm.Title`, how before and after validators are allowed to coerce types, and how this would affect the Vizro schema.
- `title = Annotated[vm.Title, BeforeValidator(convert_str, json_schema_input_type=Union[str, vm.Title])]`, with the downside of being less clear to developers
- `title = Annotated[Union[str, vm.Title]]`, and we need to deal with the true consequences, i.e. that `title` can be both `str` and `vm.Title`
Ultimately I think we should follow pydantic, and they say:
In essence, Pydantic's primary goal is to assure that the resulting structure post-processing (termed "validation") precisely conforms to the applied type hints.
We definitely do not always obey this, but I think eventually we should. In the ideal case, our type hint in the above example would be vm.Title and this alone, and the user would need to configure this. This raises 2 important points:
- Should the user be allowed to configure additional types?
In our context this would be `str` - so what if we want to allow `Union[str, vm.Title]` (but ultimately convert to `vm.Title`) as input? I think the answer is given here: https://docs.pydantic.dev/latest/concepts/validators/#json-schema-and-field-validators
While the type hint for value is str, the cast_ints validator also allows integers. To specify the correct input type, the json_schema_input_type argument can be provided
So in our example, what we really should be doing is keeping the type hint at vm.Title, but having json_schema_input_type=Union[str, vm.Title].
Beware this is for before validators; after validators are a different story again. While pydantic recommends the above, I am not sure this is so good in our case. We are (atm) not working so much with JSON and JSON schema, and our models are the main configuration route, so I am not sure what the vm.Title type hint would do with a different JSON schema input type. Would IDEs know about this and the fact that str is also allowed? (Edit: see answer further below.) I don't think so... Vizro-AI would be happy though, and so would a schema-based GUI.
For after validators, there are some strong opinions flying about from the maestro himself - it "breaks type hints": https://github.com/pydantic/pydantic/discussions/3997#discussioncomment-3099169. I think we are doing this precise thing a fair bit lol. I agree with him actually, although of course our models are not really consumed much by anyone but us (and maybe vizro-ai) so it's not so problematic - except for the actions chain (see below)
- What if the final type is not a subset of the input type (think, for the above example, of just `json_schema_input_type=str`), or even worse, the coerced type is not even public (think of the actions chain in the case of `after` validators)?
This is problematic for the reasons described above, but it also concretely affects some of our code: `to_python`. We fixed it for the actions chain, but in essence the model you would serialize contains things that you can't re-instantiate the model with. Not fun.
In the str example, this could be hacked in the before validator by passing through any received vm.Title, but it's very ugly. In the case of the actions chain this would fail unless we hacked serialization (which we did).
So the two problematic patterns are:
- `AfterValidator`s that convert into things that are not public
- `BeforeValidator`s that convert into things that are not in the JSON schema
- It has always been our intention to, wherever practical, follow the pydantic guidance that the type hint guarantees the coerced type. Just to understand: the problem you point out with doing `title: Title` is that IDEs/mypy don't like it if you then specify `title` as a `str`? This is indeed quite annoying and I don't think there's much we can do about it, but let's double check with a search because surely (hopefully) someone must have wondered about this before(?). e.g. would this help? https://docs.pydantic.dev/latest/integrations/mypy/#init_typed
- The other problem I foresee with this is that if and when we generate API docs automatically, it's slightly confusing because the API docs here would give `vm.Title` without mentioning `str`, even though `str` is actually the most common user input. That's not a huge problem though so long as we explain it, especially since most of our narrative docs examples will make it clear that title can be a string, because that's what we'll do 99% of the time.
- It looks like without additional config, Pylance in VS Code at least complains about a `str` when it's not part of the field. Code for the screenshot below:
```python
import json
from typing import Annotated, Any, Optional, Union

from pydantic import BaseModel, BeforeValidator


class Title(BaseModel):
    text: str
    icon: Optional[str] = None


def convert_to_title(v: Any) -> Title:
    if isinstance(v, str):
        return Title(text=v)
    return v


class User(BaseModel):
    title: Annotated[
        Title,
        BeforeValidator(convert_to_title, json_schema_input_type=Union[str, Title]),
    ]


foo = User(title="Foo")
print(foo.model_dump_json())

bar = User(title=Title(text="Bar", icon="👋"))
print(bar.model_dump_json())

print(json.dumps(foo.model_json_schema(), indent=2))
```

Note this does not cause a problem with mypy by default thanks to the `init_typed` option being off by default.
See https://github.com/mckinsey/vizro/pull/1377 for examples.
Code
Any changes must include the word legacy and/or deprecat* (e.g. deprecation, deprecated, deprecate). This will make it easy to keep track of everything and ensure we remove everything we want to in the breaking version.
Warnings
Emit FutureWarning rather than DeprecationWarning to ensure users see the message. Warning message should be:
- short and sweet
- where possible, include one-line strategy for resolution in imperative tense ("replace x with y")
- state version where breaking change will be released
- link to relevant section of docs page on deprecations (unless this doesn't exist, which should only be the case for very insignificant deprecations)
How to emit these warnings:
- for a class/function/model, use `typing_extensions.deprecated`, which unlike `typing.deprecated` supports runtime checks - see PEP 702 (a sketch is shown after this list). If `typing_extensions` stops supporting this then we would instead use `Deprecated`. The message appears automatically in the API docs. We can use Markdown here so the docs version works nicely, so long as the warning is still readable when raised in the console. Use an absolute link to the deprecation docs section.
- for a field, use the `make_deprecated_field_warning` validator (message only shown in console, link to deprecation docs section is absolute). The message doesn't show in API docs so you must write one manually in the `description` and/or docstring using "❗Deprecated: " (message shown in docs so use Markdown, link to deprecation docs section is relative). Don't mark the field as `deprecated` since it only affects the JSON schema and raises unwanted warnings when looking through model attributes.
- for a warning message only shown at runtime and not in API docs, the message should be optimised for the console, and the link to the deprecation docs section is absolute.
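A minimal sketch of the class/model case (the model name, replacement and warning text are illustrative, not a real deprecation):

```python
from typing_extensions import deprecated


@deprecated(
    "`OldCard` is deprecated. Replace `OldCard` with `NewCard`. "
    "This will be removed in version 0.2.0. See the deprecations page in the docs.",
    category=FutureWarning,
)
class OldCard:
    """Legacy card model (illustrative only)."""
```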
Make sure you get the `stacklevel` right for warnings so the message raised in the console points to the right line of code that triggers the warning.
Tests
- Where feasible and worth the effort to do so, make tests of the legacy feature (see the sketch after this list):
    - Copy (or cut, depending on what the feature is) and paste tests that use the old API and label them as `legacy` (e.g. `legacy_layout`). Depending on how important/widespread the change is, this might require copying and pasting a single file or several tests spread throughout different files across the test codebase. Don't `pytest.mark.parametrize` tests to check old and new API, just copy and paste instead. This will make it easier to remove the legacy features in future without needing to rewrite tests.
    - Write new tests to check `FutureWarning` is emitted where expected.
    - Filter out `FutureWarning` to make legacy tests pass as before.
- Update all other tests to use the new API until no `FutureWarning`s are raised. Any ignores still required should be done in a minimal way and not using a global ignore.
- There's never any need to test for the absence of a deprecation warning, since tests will automatically fail if unexpected warnings are raised, as all warnings are elevated to errors by default.
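A minimal sketch of those testing conventions (test names and the warning match string are illustrative):

```python
import pytest


@pytest.mark.filterwarnings("ignore::FutureWarning")
def test_legacy_layout_still_works():
    """Copy of the pre-existing test, exercising the old API with the warning filtered out."""
    ...


def test_legacy_layout_emits_future_warning():
    # Dedicated test asserting the deprecation warning is raised for the old API.
    with pytest.warns(FutureWarning, match="deprecated"):
        ...  # construct the model using the old API here
```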
Examples
All examples should change to use the new API.
Documentation
- We have a single page listing deprecations/breaking changes in the API reference. Descriptions should mirror `FutureWarning` messages, but there's more space to expand if needed.
- Narrative docs can, but do not have to, mention deprecations/breaking changes and the resolution strategy where relevant. They should always link to the deprecations/breaking changes API reference page.
- Maybe in future we'll also have a more "how to" migration guide to ease the upgrade to 0.2.0, but I don't imagine it will be very difficult. Could maybe also have some "what's new in 0.2.0" post.
- Deprecated API remains in the API reference but links to the new one.
- Just like with Examples, all other references to the old API should be updated to use the new one.
Changelog
Entry goes in "Deprecate" category.
We follow GitHub flow. In short:
- `main` is the only long-lived branch and is always in a releasable state
- `main` is a protected branch and cannot be committed to directly
- contributions to `main` must go through a PR process
- contributions must be up to date with `main` before merging
- PRs are merged using Squash and merge
To keep your PR in sync with `main` you can use rebase or merge. There are pros and cons to each method. Rebasing rewrites commit history, which can make it cleaner but complicate things if there are multiple contributors to a branch. Ask yourself "do I understand what update by rebase does and think it's a good idea to use it here?". If yes, then do it. If no, then update by merge instead. So if in doubt, just use merge (the default).
You should try to avoid long-lived PRs. Tips to achieve this:
- keep code changes small. As a very general rule, it's better to have two small PRs rather than one big one. Consider basing one feature off another to break your work down into more manageable chunks
- make reviewers' lives easy, e.g. with a clear PR description, clean commit history (e.g. use rebase if you understand it), instructions on how to review
- reviewers should try to review quickly (e.g. within a day). PR authors should remind reviewers if required
- several long conversations on PRs and multiple rounds of reviewing can be slow and hard to follow. Consider just talking directly to the PR reviewers
- for complex changes, raise a draft PR early for visibility of your work and to get initial comments and feedback. Talk to PR reviewers and other developers before and while you do the work rather than just waiting for a single "big bang" review when it's complete
- consider merging a feature that's work in progress (e.g. code without tests) so long as you keep it undocumented and ideally private (use `_`). This allows an incomplete feature to be present in the codebase without being visible to users. Only do this sparingly or things get confusing though
Sometimes it's impossible to avoid long-lived PRs, e.g. for some big new features, large refactoring work, etc. This is ok. It just shouldn't be the norm.
Ideally, all the following happen on the same merge to main (as above, this doesn't prevent you opening multiple PRs that point to a feature branch):
- source code
- tests
- changelog
- docs
Sometimes it might not be feasible to achieve all of these in one merge to main. How then do we keep main always release-ready? The key is that a feature is publicly available only when it is visible in documentation or changelog. This is ultimately what defines our functionality, rather than the existence of source code or tests in our codebase. This means that it's ok to merge code to main that you are not yet happy for the general public to use, so long as it is not publicly documented and does not break existing functionality. If such code is released then this is fine because the feature isn't yet visible to users. The important thing is to not make documentation/changelog public until you are comfortable that the feature can be used.
- Only add an `id` when it's actually required
- Leave out `targets` if the control/action is supposed to apply to all components on screen anyway
- Don't add controls unless they're necessary to showcase the feature
- Keep component usage minimal: there's no need to include multiple components if they don't contribute to the example, and feel free to just use a Card if the components are not displayed
To enable global theming in the future, we need to take a consistent approach - both in how we use components and how we apply CSS variables.
Component Usage
Whenever possible, use Dash Bootstrap Components (DBC) when adding new models to our library. These components are automatically styled via the vizro-bootstrap stylesheet.
If no suitable DBC component exists, you may use alternatives such as dash-mantine-components. However, in that case, you must include the necessary CSS in our static folder so the components match the Vizro design. When writing custom CSS, always use Bootstrap variables only.
CSS Variables
In the coming weeks, we'll phase out the use of QB design variables and switch entirely to Bootstrap variables. This simplifies theming and ensures all variables are publicly documented and easier to work with. Going forward, only use Bootstrap variables. A mapping table from QB design variables to Bootstrap variables is provided below and will be updated regularly.
| QB Design Variable | Bootstrap Variable |
|---|---|
| --border-disabled | --bs-tertiary-color |
| --border-hover | --bs-primary |
| --border-selected | --bs-primary-text-emphasis |
| --border-subtleAlpha01 | --bs-border-color |
| --border-subtleAlpha02 | --bs-border-color-translucent |
| --dropdown-label-bg | --bs-tertiary-bg |
| --elevation-0 | --bs-box-shadow-sm |
| --elevation-1 | --bs-box-shadow |
| --field-enabled | --bs-primary-bg-subtle |
| --fill-active | --bs-primary |
| --fill-hoverSelected | --bs-primary-text-emphasis |
| --fill-primary | --bs-primary |
| --fill-secondary | --bs-secondary |
| --fill-subtle | --bs-tertiary-color |
| --focus | --bs-focus-ring-color |
| --focus-color | --bs-focus-ring-color |
| --primary-100 | --bs-gray-900 |
| --primary-900 | --bs-gray-100 |
| --stateOverlays-selected | --bs-primary-bg-subtle |
| --stateOverlays-selectedHover | --bs-border-color-translucent |
| --surfaces-bg01 | --bs-secondary-bg |
| --surfaces-bg02 | --bs-tertiary-bg |
| --surfaces-bg03 | --bs-body-bg |
| --surfaces-bg-card | --bs-primary-bg-subtle |