Brainstorm for incremental builds #178

dillonkearns · 2020-10-16T18:31:56Z

dillonkearns
Oct 16, 2020
Maintainer

What are the biggest opportunities for performance improvements from incremental builds in elm-pages? This issue is for exploring those ideas, and what the design might look like under the hood.

Since Elm has only pure functions, we can make a lot of assumptions based on purity. So the question is: can we leverage that information to skip expensive operations when performing a build?

Building up the dependency graph

Here's a walkthrough of what it might look like to build up the dependency graph, which could be used to skip re-building certain pages.

We can rely on the timestamps of data sources like markdown content, and of Elm files on the filesystem. If a file's timestamp hasn't changed, then we can assume its content hasn't changed either.

The more precisely you can determine what each page depends on, the more cache hits you can get.
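A minimal sketch of the timestamp check described above, written in Python for brevity. The function names are hypothetical and nothing here reflects how elm-pages is actually implemented; it only assumes that a snapshot of modification times from the previous build is available to compare against.

```python
import os
import tempfile
from pathlib import Path


def snapshot_mtimes(paths):
    """Record the last-modified time (in nanoseconds) for each path."""
    return {str(p): os.stat(p).st_mtime_ns for p in paths}


def changed_paths(previous, current):
    """Return paths that are new, removed, or have a different timestamp."""
    all_paths = set(previous) | set(current)
    return sorted(p for p in all_paths if previous.get(p) != current.get(p))
```

Keeping one timestamp per file, rather than one per build, is what makes the fine-grained checks possible: a page only needs to be rebuilt if one of the specific paths it depends on appears in `changed_paths`.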

Example

So let's say we can assume the Template Module architecture (#109).

We could build up the dependency graph (what a rendered page depended on in order to be built) like this:

about.html (last built at 1:23pm)

Built with this template module: src/Template/Page.elm

That module imports these modules:

ViewHelpers.elm changed at 12:34pm
The view helpers depend on elm-ui, which is at the same package version as when the page was built (v1.1.8). Packages have no timestamp, so the version serves as the cache buster instead.

Then you need to figure out what the data sources were. Right now, they will only be markdown or other files in the content folder, plus StaticHttp data.

We also need to look at the version of the elm-pages CLI and Elm package being used.
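As a rough illustration of using versions as the cache buster, here is a sketch that reads package versions out of an application-type elm.json (which lists them under "dependencies" as "direct" and "indirect"). The helper names are hypothetical, and the elm-pages CLI version would have to be read separately (for example from the npm side), which this sketch does not cover.

```python
import json


def package_versions(elm_json_text):
    """Flatten direct and indirect dependencies of an application elm.json into one dict."""
    deps = json.loads(elm_json_text)["dependencies"]
    return {**deps.get("indirect", {}), **deps.get("direct", {})}


def versions_changed(cached_versions, elm_json_text):
    """True when any package was added, removed, or moved to a different version."""
    return cached_versions != package_versions(elm_json_text)
```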

If we determine that:

The versions of packages have not changed
The markdown file that page depended on, content/about.md, has not changed
The template module that built that file has not changed (nor any of the code that it imports, recursively)
The StaticData it is pulling in has not changed (we could potentially hash it if there are StaticHttp requests that we need to perform again)

Then we can reuse the previous version of about.html. We can apply the same checks to all the other pages that were generated from that template module, only checking for changes to the markdown files for those pages.
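The four checks above could be combined into a single per-page fingerprint. The sketch below is one hypothetical shape for that record (the class and field names are invented for illustration); a page is reusable only when every part of the fingerprint matches the one stored at the last build.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageFingerprint:
    """Everything a rendered page depended on when it was built (hypothetical shape)."""
    versions: tuple          # ((package or tool name, version), ...)
    content_mtime: int       # timestamp of the page's markdown file
    source_mtimes: tuple     # ((path, mtime), ...) for the template module and its imports
    static_data_hash: str    # hash of the StaticHttp responses the page used


def stale_reasons(cached, current):
    """List which checks failed; an empty list means the cached page can be reused."""
    checks = [
        ("package or CLI version changed", cached.versions != current.versions),
        ("content file changed", cached.content_mtime != current.content_mtime),
        ("template module or one of its imports changed",
         cached.source_mtimes != current.source_mtimes),
        ("StaticData changed", cached.static_data_hash != current.static_data_hash),
    ]
    return [reason for reason, failed in checks if failed]


def can_reuse(cached, current):
    """A page with no cached fingerprint always needs to be built."""
    return cached is not None and not stale_reasons(cached, current)
```

Since the versions and source timestamps are shared by every page built from the same template module, only `content_mtime` (and possibly the StaticData hash) needs to be recomputed per page.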

Summary

If you could completely reuse the cached about.html page and skip building it, that could be a big win. It's an open question how much time that saves if we have to perform the StaticHttp requests anyway, though, because we're already doing a lot of work to set up StaticHttp for that page, run through the decoders, etc. So perhaps the savings will be negligible. It would be good to see how other frameworks avoid doing a lot of work to figure out which data has changed. Perhaps the code for fetching data is treated separately in other JS-based frameworks?

Prior art and design discussions in other frameworks would be helpful here.

dillonkearns · 2020-10-18T19:52:25Z

dillonkearns
Oct 18, 2020
Maintainer Author

If I defined the static programmatic routes within a Template Module, then you could look purely at that template module, and whether it has changed, to know what you need to rebuild.

module Template.Article exposing (template)

template : Template.Template_ TemplateType.TipMetadata ()
template =
    Template.noStaticData { head = head, routes = routes }
        |> Template.buildNoState { view = view }
        |> Template.withPages { routes = routes }


routes =
    StaticData.succeed [ { title = "Hello", tags = [ Elm ] } ]

So you would need to look at:

The import graph, to see if the timestamps of any of the files in that graph have changed, and
The StaticData for that Template Module. For StaticData from local files, you could create a dependency graph that is just the timestamp for each file path it depends on. If any of those has changed, then you bust the cache. For StaticHttp, you would have to hash the HTTP responses. If the hashes are the same, you don't need to bust the cache.

For example:

StaticData cache

GET https://api.github.com/repos/dillonkearns/elm-pages, Empty Body -> Response Hash: abc123
Directory content/articles pattern **/*.md -> articles/decoders.md Timestamp 1:23pm, articles/opaque-types.md Timestamp 2:34pm

Import Graph
Template.Article Last changed 3:45pm
...Other imported files from Template.Article entrypoint

It sounds like a lot, but really it's pretty manageable! Figuring out the import graph should be pretty quick and easy. You can just parse the top of the file until the imports are done (no need to parse any other syntax). Then build up the timestamps. If the code has changed, then bust the whole cache and don't reuse any of it.

For the StaticData, you need to 1) perform HTTP requests to see if the hashes change, and 2) look at any files it touches, see if there are new files in that pattern or files that have changed timestamps. If those things haven't changed, then you don't need to rebuild the pages.

I'd need to think more about how you would get more fine-grained caching, so that you could rebuild only the new pages but retain existing ones 🤔 Not sure if that structure would support that or not.
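The two StaticData checks could look roughly like this in Python. The shape of the `cached` record and all function names are assumptions made for illustration; the glob snapshot is keyed by relative path so that new, deleted, and modified files under the pattern all make the comparison fail.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def response_hash(body: bytes) -> str:
    """Hash a raw HTTP response body."""
    return hashlib.sha256(body).hexdigest()


def glob_snapshot(directory, pattern):
    """Map each file matching the pattern (relative path) to its mtime."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): path.stat().st_mtime_ns
        for path in root.glob(pattern)
        if path.is_file()
    }


def static_data_unchanged(cached, responses, directory, pattern):
    """cached is a hypothetical record: {"hashes": {request: hash}, "files": {path: mtime}}."""
    current_hashes = {key: response_hash(body) for key, body in responses.items()}
    return (
        cached["hashes"] == current_hashes
        and cached["files"] == glob_snapshot(directory, pattern)
    )
```

Comparing the file snapshots entry by entry, instead of as a whole, would be one way to tell "a new page was added" apart from "an existing page changed", which is the more nuanced case raised above.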

tennety · 2020-10-25T02:50:19Z

tennety
Oct 25, 2020

> For the StaticData, you need to 1) perform HTTP requests to see if the hashes change

Would the HTTP status be reliable in this scenario? Ideally the API would send cache headers (e.g. an ETag) and/or a response code (e.g. 304 Not Modified) that we could check. If we know from those that the content wasn't modified, we likely wouldn't even need to compute the hash. Or is that already handled?
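For reference, a conditional request along those lines could be sketched like this with Python's standard library. It assumes the API returns an `ETag` header and honours `If-None-Match`, which not every API does; the function name and the injectable `open_url` parameter are hypothetical. Note that `urllib` reports a 304 response by raising `HTTPError`.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, cached_etag=None, open_url=urllib.request.urlopen):
    """Return (body, etag), or (None, cached_etag) when the server answers 304."""
    request = urllib.request.Request(url)
    if cached_etag:
        # Ask the server to skip the body if our cached copy is still current.
        request.add_header("If-None-Match", cached_etag)
    try:
        with open_url(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return None, cached_etag  # not modified: no body to download or hash
        raise
```

When the server does not support conditional requests, this falls back to a normal download, so hashing the body would still be needed as the general-purpose check.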

dillonkearns · 2020-11-05T16:00:30Z

dillonkearns
Nov 5, 2020
Maintainer Author

What I had in mind for computing whether it's stale or not is just re-fetching it, but not needing to re-compute any of the things that were derived from that data source if it hadn't changed. And then optionally setting a period of time where you assume that it's not stale and don't need to perform the request again (for example, if you don't want to re-perform it every time, you could say it's good for 1 hour, or 1 day).
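A small Python sketch of that policy, under stated assumptions: the cache entry is a hypothetical dict with `fetched_at`, `hash`, and `derived` keys, and `fetch` and `derive` stand in for performing the request and for whatever work is derived from the response (decoding, rendering, and so on).

```python
import hashlib
import time


def refresh(entry, fetch, derive, max_age_seconds=0, now=time.time):
    """Re-fetch a data source, but only re-derive when its content hash changed."""
    if entry and now() - entry["fetched_at"] < max_age_seconds:
        return entry  # still within the assumed-fresh window: skip the request
    body = fetch()
    digest = hashlib.sha256(body).hexdigest()
    if entry and entry["hash"] == digest:
        # Content is unchanged: keep the derived data, just note the new fetch time.
        return {**entry, "fetched_at": now()}
    return {"fetched_at": now(), "hash": digest, "derived": derive(body)}
```

With the default `max_age_seconds=0` the request is performed on every build, and setting it to 3600 or 86400 gives the "good for 1 hour, or 1 day" behaviour.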
