As the maintainer of <model-viewer>, I'm humbled to have Apple referencing it in a web standards proposal. I've had a number of conversations now in various standards bodies about the <model> proposal, as well as various internal conversations at Google about whether we should propose something similar in Chrome going back at least three years. I figured I should summarize those conversations here publicly to stimulate further discussion.
As much as it would have been good for my career to push <model-viewer> into Chrome and the standards process, I have instead argued against it because I think it would hinder innovation in what is currently a rapidly-evolving field. I'll list out some pros and cons below of standardizing a <model> element vs. using a JS library like <model-viewer>, SketchFab, babylon.js, etc. Please add comments with any pros and cons I've missed, as well as discussion of those mentioned.
Pro: I'm just going to quote the only pro given in the explainer:
Do not add a new element. Pass enough data to WebGL to render accurately
As noted above, this would require any site that wants to use an AR experience to request and have the user trust that site enough to allow them access to the camera stream as well as other information. A new element allows this use case without requiring the user to make that decision.
First, this is largely false. AR within the browser today is accomplished via the WebXR standard (which iOS Safari has not implemented), and it was explicitly developed with privacy in mind. WebXR in fact works without giving the website access to the camera feed, hence the distinction between the XR permission and the camera permission. It does give access to the camera pose in order to make canvas rendering possible, but all of this has gone through numerous rounds of privacy and security review. Even the precision of available data is capped to limit fingerprinting.
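To make the pose-versus-camera distinction concrete, here is a minimal sketch of how a page might read the viewer pose from a WebXR AR session. The `*Like` interfaces and the helper names `readViewerPosition` and `startAR` are my own, made up for illustration so the snippet stands alone; the calls they wrap (`isSessionSupported`, `requestSession`, `requestReferenceSpace`, `requestAnimationFrame`, `getViewerPose`) are the standard WebXR ones. Rendering setup is deliberately left out, so treat this as an outline rather than a working viewer.

```typescript
// Minimal structural types so the sketch is self-contained. These are
// illustrative stand-ins; a real project would use the WebXR typings.
interface PoseLike {
  transform: { position: { x: number; y: number; z: number } };
}
interface FrameLike {
  getViewerPose(space: unknown): PoseLike | null | undefined;
}
interface SessionLike {
  requestReferenceSpace(type: string): Promise<unknown>;
  requestAnimationFrame(cb: (time: number, frame: FrameLike) => void): number;
}
interface XRSystemLike {
  isSessionSupported(mode: string): Promise<boolean>;
  requestSession(mode: string): Promise<SessionLike>;
}

type Position = [number, number, number];

// Reads the viewer (camera) position for one frame. Only pose data is
// read here; nothing in this function touches camera pixels.
function readViewerPosition(frame: FrameLike, space: unknown): Position | null {
  const pose = frame.getViewerPose(space);
  if (!pose) return null; // tracking can be lost momentarily
  const { x, y, z } = pose.transform.position;
  return [x, y, z];
}

// Hypothetical helper: starts an AR session and reports the pose each frame.
// Resolves to false when immersive AR is not supported.
async function startAR(
  xr: XRSystemLike,
  onPosition: (p: Position) => void
): Promise<boolean> {
  if (!(await xr.isSessionSupported('immersive-ar'))) return false;
  // Browsers require this call to come from a user gesture.
  const session = await xr.requestSession('immersive-ar');
  const space = await session.requestReferenceSpace('local');
  // Omitted: a real page also has to attach a WebGL layer to the session
  // (session.updateRenderState) before it can draw anything.
  const onFrame = (_time: number, frame: FrameLike): void => {
    const p = readViewerPosition(frame, space);
    if (p) onPosition(p);
    session.requestAnimationFrame(onFrame);
  };
  session.requestAnimationFrame(onFrame);
  return true;
}
```

In a supporting browser this would be called from a click handler with `navigator.xr` as the first argument. The page only ever sees the pose it needs to place its own rendering; it never requests the camera stream.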
Con: Device/browser compatibility & consistency.
The various JS libraries for 3D give uniform rendering and universal support for the file formats and extensions of their choice across devices and browsers today (including Safari). And when they implement a new extension, it is available on all browsers simultaneously. The only exception is AR QuickLook on iOS, which has neither the format support nor the customizability to achieve rendering consistency, a gap our users constantly notice. First, <model> appears excited to follow the debacle of the <audio> tag regarding format support across browsers. However, even if a format were agreed upon, I would love to hear the plan for keeping extension support and rendering quality consistent across browsers over time. This is a rapidly-evolving field; Khronos has been releasing several new PBR extensions per year for some time, and that looks unlikely to slow. There is more competition between JS libraries than between browsers because the cost of switching is so much lower; the last thing we want to do is hand an innovative field over to a duopoly.
Con: Scale of the API to standardize.
The current <model> API proposal is deceptively simple. This may be because it is so focused on the AR use case and proposes to also solve 3D-in-the-DOM as a side-effect. glTF's usage across e-commerce has demonstrated clearly that while AR has some great niche value, 3D-in-the-DOM is actually the dominant use case. And it requires a lot more customization than AR, especially around camera controls, limits, interactions, and prompts. You can get a taste of the critical APIs <model> is currently missing here. Never mind the arbitrary choices like model framing, movement behaviors, etc. that Apple proposes be left up to browsers, which would create totally inconsistent experiences.
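For a sense of what that customization looks like in practice, here is a small sketch that assembles <model-viewer> markup with a few of its camera and prompt attributes (`camera-controls`, `min-camera-orbit`, `max-camera-orbit`, `interaction-prompt`). The `ViewerOptions` type and the `modelViewerTag` helper are hypothetical, written only for this illustration, and the attribute list is a small sample rather than the full surface.

```typescript
// Hypothetical options object covering a handful of <model-viewer> attributes.
interface ViewerOptions {
  src: string;
  cameraControls?: boolean;            // let the user orbit the model
  minCameraOrbit?: string;             // e.g. "auto 30deg auto"
  maxCameraOrbit?: string;             // e.g. "auto 90deg auto"
  interactionPrompt?: 'auto' | 'none'; // whether to show the interaction hint
}

// Escapes a value for use inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;');
}

// Builds the element markup; unset options are simply left off.
function modelViewerTag(o: ViewerOptions): string {
  const attrs: string[] = [`src="${escapeAttr(o.src)}"`];
  if (o.cameraControls) attrs.push('camera-controls');
  if (o.minCameraOrbit) attrs.push(`min-camera-orbit="${escapeAttr(o.minCameraOrbit)}"`);
  if (o.maxCameraOrbit) attrs.push(`max-camera-orbit="${escapeAttr(o.maxCameraOrbit)}"`);
  if (o.interactionPrompt) attrs.push(`interaction-prompt="${escapeAttr(o.interactionPrompt)}"`);
  return `<model-viewer ${attrs.join(' ')}></model-viewer>`;
}

const tag = modelViewerTag({
  src: 'chair.glb',
  cameraControls: true,
  maxCameraOrbit: 'auto 90deg auto',
  interactionPrompt: 'none',
});
```

Each of these is a decision a site author typically wants to make per product page, which is the kind of surface a native element would have to either standardize or leave to each browser.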
The bigger problem I foresee, though, is requirements creep. I know this well from maintaining <model-viewer>; I am constantly pushed to expose more and more of the underlying three.js API. I resist in order to keep my product differentiated at a higher level of abstraction, but it's a very fuzzy line. By natively supporting a 3D model in the browser, I predict no one will be satisfied until a Unity-sized API has been web-standardized around it. This is the same problem VRML ran into decades ago. Standards bodies are powerful, but slow; I fear to think how many years it would take to agree on a standard so complex.
In conclusion, I would say Apple's use case can be well solved with today's JS rendering libraries if they simply add WebXR support to iOS. Even if that privacy barrier is somehow insurmountable, they could also propose a standard way to launch native AR experiences from the browser without the need to standardize a new DOM element, which would keep the proposal much simpler, but sadly without any JS-based customization opportunity.