-
Notifications
You must be signed in to change notification settings - Fork 305
KaTeX (1/n): Initial support for displaying basic KaTeX content #1408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
476f6aa
to
5164619
Compare
c2a9c4a
to
babe5c8
Compare
babe5c8
to
fe49e16
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting!
Generally this structure looks good. Here's a high-level review — so just a handful of comments mostly about things that I think will help make the changes clearer to understand.
Then once these aspects look good (should be quick, I think), we'll do maintainer review as usual.
test/model/settings_test.dart
Outdated
check(globalSettings).getBool(BoolGlobalSetting.renderKatex) | ||
.isFalse(); | ||
assert(!BoolGlobalSetting.placeholderIgnore.default_); | ||
assert(!BoolGlobalSetting.renderKatex.default_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to add these feature flags to the tests (or to remove them from the tests when later taking them out) — adding them can be purely a matter of the one line adding the enum value, plus its dartdoc (and blank line above that).
That also means that adding the feature flag can be squashed into the first commit that uses it.
final spanClass = spanClasses[index]; | ||
switch (spanClass) { | ||
case 'textbf': | ||
// .textbf { font-weight: bold; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are quoting from KaTeX's CSS (or rather the source file for its CSS), and that's an important reference for understanding this code. So let's include a link to that in a comment.
lib/model/content.dart
Outdated
final String? text; | ||
final List<KatexSpanNode> nodes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these two are mutually exclusive: there's always at most one present, and in fact exactly one.
Let's make that invariant explicit in this class, by at least an assertion at the constructor.
It'd be good to also add some dartdoc on these two fields to say how they relate to each other and to the parent node.
lib/model/content.dart
Outdated
|
||
final String texSource; | ||
final List<KatexSpanNode>? nodes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the next PR #1452 changes this to be a list of KatexNode
, a new common base class for KatexSpanNode
and some others.
It's probably cleanest if this says KatexNode
from the start, then. That makes the conceptual relationship to this parent node (MathBlockNode
) somewhat clearer — it's not that a math block necessarily contains only "KaTeX span nodes" as direct children, it can contain "KaTeX nodes" in general, and it's just that at this stage the spans are the only kind of KaTeX node that are yet implemented.
lib/model/katex.dart
Outdated
default: | ||
// TODO handle more CSS properties | ||
assert(debugLog('Unsupported CSS property: $property of type ${expression.runtimeType}')); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this code has three levels of how well it supports a given KaTeX blob:
- The HTML doesn't follow any structure we've yet anticipated. The parser in this file throws KatexHtmlParseError, producing
nodes
of null. - The HTML contains something we know we don't yet support, but there's no exception;
nodes
ends up with some kind of parse result. - The HTML consists only of constructs we believe are fully supported — meaning we believe the current version of the code will produce widgets that make the intended math show up exactly the same as with KaTeX on the web.
This default
case is an example of that middle state.
The middle state is very useful for development: you want to let the parser do its thing as best it currently can, and see how the resulting widgets look, while you work on supporting that case.
But I'd also really like to be able to run the parser in a mode where it will only accept constructs we believe are fully supported. That way we can run it on a corpus of public messages collected by the scripts in tools/content/
, and get a survey of what remains to be implemented. That mode could also be useful for turning KaTeX support on by default, but only for expressions we believe will show up exactly right, and for other expressions falling back to the raw TeX like we do in main today.
I think the most effective way to draw that boundary will be to do so starting in this first PR, so that when we include a given construct on the "fully supported" side of it we do so at the same time as we're implementing support for that construct and thinking about it in detail.
Maybe have two experimental flags?
- One flag enables showing KaTeX at all. Things that are fully supported get rendered; things that are incompletely supported, just like those not supported, get the fallback.
- A second flag enables showing KaTeX even where it's incomplete — so the behavior this revision has when
renderKatex
is true. (When the first flag is false, this one would just do nothing.)
Then when support is mostly complete, we might turn the first flag into always-true, while keeping the second flag as experimental (and default-false).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, added another flag (forceRenderKatex
) to control if the KaTeX content should be rendered even if there were some errors encountered while parsing the HTML. Currently, the "ignorable errors" happen either when the parser encounters an unknown CSS class or an unknown CSS property in the inline styles.
It will be used in later commits to parse inline CSS styles in HTML while parsing KaTeX HTML spans.
fe49e16
to
8f77f13
Compare
Thanks for the review @gnprice. Pushed an update, PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the revision!
This resolves the high-level feedback I had — @PIG208, over to you for maintainer review.
lib/model/katex.dart
Outdated
} | ||
|
||
// Work around the duplicated case statement with a new switch block, | ||
// to preserve the same order and to keep the cases mirroring the CSS | ||
// definitions in katex.scss . | ||
switch (spanClass) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I had wondered about these two separate switch statements 🙂 — this comment is helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! I just went through the entire PR and played with this a bit on my device. It looks pretty good!
With regards to testing, I'm not sure what the current plan is. It doesn't seem quite useful to have tests structured basically a duplicate of all the KaTeX classes we support. Maybe it would be better to test with examples crawled online?
@@ -0,0 +1,21 @@ | |||
The MIT License (MIT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fonts: Add KaTeX custom fonts
These fonts will be used in later commits to show KaTeX
content.
For future reference, perhaps we can also mention where we found these fonts in the commit message with a link.
test/widgets/content_test.dart
Outdated
testWidgets('displays TeX source; experimental flag default', (tester) async { | ||
final globalSettings = testBinding.globalStore.settings; | ||
await globalSettings.setBool(BoolGlobalSetting.renderKatex, null); | ||
check(globalSettings.getBool(BoolGlobalSetting.renderKatex)).isFalse(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can use check(globalSettings).getBool
here
lib/model/content.dart
Outdated
@@ -864,7 +906,10 @@ class GlobalTimeNode extends InlineContentNode { | |||
|
|||
//////////////////////////////////////////////////////////////// | |||
|
|||
String? _parseMath(dom.Element element, {required bool block}) { | |||
({List<KatexNode>? spans, bool debugHasError, String texSource})? _parseMath( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it would be cleaner to make a class for the return value of this, and make factories for classes are constructed from the result?
lib/model/content.dart
Outdated
result.add(MathBlockNode( | ||
texSource: texSource, | ||
texSource: parsed.texSource, | ||
nodes: parsed.spans, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentionally that we leave out debugHasError
for this?
lib/model/katex.dart
Outdated
// | ||
// Each case in the switch blocks below is a separate CSS class definition | ||
// in the same order as in katex.scss : | ||
// https://github.com/KaTeX/KaTeX/blob/2fe1941b7e6c0603680ef6edd799bd8a8b46871a/src/styles/katex.scss |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use just the first 8 characters of the commit hash to shorten the line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(FTR the length I normally use is 9 hex digits: https://github.com/zulip/zulip-mobile/blob/main/docs/style.md#mentioning-commits
Use 9 (or 10) hex digits to identify a commit. To help make this convenient, you can tell Git to routinely print 9-digit abbreviations for you by setting
git config core.abbrev 9
.Rationale: The full 40 hex digits is a lot, and generally doesn't flow well in prose. On the other hand 7 hex digits, which is what GitHub shows in its UI whenever it doesn't show 40, is short enough that there's a material risk of collisions: in zulip.git there are already a handful of commits that are ambiguous when identified by just 7 digits, and in a very large project like the Linux kernel such collisions can become routine.
A 9-digit abbreviation is still short enough to fit well in running text; and it's long enough that it's extremely unlikely any given such abbreviation will ever be ambiguous.
For KaTeX 8 is fine and probably 7 would be fine, because it's a smaller repo; but I just use 9 everywhere to simplify.)
|
||
KatexSpanStyles? _parseSpanInlineStyles(dom.Element element) { | ||
if (element.attributes case {'style': final styleStr}) { | ||
final stylesheet = css_parser.parse('*{$styleStr}'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we need to sanitize styleStr
. I guess not: the html package might have done that, and in the worst case this will just fail to parse.
Regardless, it should be helpful to explain why we want to wrap styleStr
here.
if (expression is css_visitor.EmTerm && expression.value is num) { | ||
return (expression.value as num).toDouble(); | ||
} | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this case, i.e. non-em values, something that we eventually want to add support for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only support em
values currently because, they are the only ones observed in the Katex HTML currently, but if we encounter any other than this will return null
and we handle that later in switch
block. Error-ing if we get an unexpected type (non-em).
lib/model/katex.dart
Outdated
switch (property) { | ||
case 'margin-left': | ||
marginLeftEm = _getEm(expression); | ||
if (marginLeftEm != null) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this continue
and the following ones look like no-ops
lib/model/katex.dart
Outdated
|
||
default: | ||
// TODO handle more CSS properties | ||
assert(debugLog('Unsupported CSS property: $property of type ${expression.runtimeType}')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also use _logError
here?
} | ||
|
||
class KatexSpanStyles { | ||
double? heightEm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like heightEm
was introduced here but not used/handled. Is this intended for the next PR?
Yeah, good question. It looks like the current version of the PR doesn't add tests for most of the functionality. Let's at a minimum have tests that cover the interesting logic like for
I think these are two complementary types of tests which are both useful:
We'll definitely want to be doing the second kind once we're a couple of PRs in to this current effort and feel that we've covered the majority of usage — it'll help us prioritize which remaining areas to do next. |
Downloaded from: https://github.com/KaTeX/KaTeX/tree/2fe1941b/fonts These fonts will be used in later commits to show KaTeX content.
This later avoids a collision for the `TextDirection` type, which is also defined in `dart:ui`.
This will be useful in the later commits to query GlobalSettings while content HTML parsing phase.
5d39bef
to
3764df5
Compare
Thanks for the review @PIG208! Pushed an update, PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the revision! I did another pass and left some small comments.
final String texSource; | ||
} | ||
|
||
MathParserResult? parseMath(dom.Element element, { required bool block }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have a dartdoc for this function, for things like the meaning of null
as the return value and what is expected of element
.
switch (property) { | ||
case 'height': | ||
heightEm = _getEm(expression); | ||
if (heightEm != null) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to #1408 (comment), I'm not sure if we need continue
's here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should've replied to the above comment how I addressed it in the revision.
I had addressed that bug by moving the _logError
call outside of the switch
block. If _getEm
returns null
then the condition here will be false
, in which case it would fall down to _logError
where the error will be logged – reporting we got something other than em
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that makes sense. Thanks.
@@ -33,6 +34,8 @@ import 'model.dart'; | |||
/// * lib/model/content.dart, which implements of the content parser. | |||
/// * tools/content/fetch_messages.dart, which produces the corpora. | |||
void main() async { | |||
TestZulipBinding.ensureInitialized(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a bit different from regular tests, it might be worth it to explain why we need this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove "to" from "retrieves the GlobalSettings
to for the experimental flag".
lib/model/content.dart
Outdated
|
||
/// The text or a single character this KaTeX span contains, generally | ||
/// observed to be the leaf node in the KaTeX HTML tree. | ||
/// It will be null if this span has child nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be talking about a "span", do we expect all KatexNode
's to each correspond to a _KatexSpan
?
lib/widgets/content.dart
Outdated
fontSize: kBaseFontSize * 1.21, | ||
fontFamily: 'KaTeX_Main', | ||
height: 1.2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.21
and 1.2
seem quite mysterious. Let's perhaps create some const
's for them or leave some comments explaining them.
lib/widgets/content.dart
Outdated
@@ -805,7 +805,6 @@ class MathBlock extends StatelessWidget { | |||
@override | |||
Widget build(BuildContext context) { | |||
final contentTheme = ContentTheme.of(context); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this belongs to another commit: 32aea8e
3764df5
to
52ff92c
Compare
With this, if the new experimental flag is enabled, the result will be really basic rendering of each text character in KaTeX spans.
This adds another experimental flag called `forceRenderKatex` which, if enabled, ignores any errors generated by the parser (like when encountering an unsupported CSS class) tries to do a "broken" render of the available span and their styles. Allowing the developer to test the different KaTeX content in the wild easily, while still in development.
52ff92c
to
0a8c846
Compare
Thanks for the review @PIG208! Pushed an update, PTAL. |
Thanks for the revision! I just tried this out locally again. Marking this for Greg's review. Left a question on the font style. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rajveermalviya for building this, and @PIG208 for the previous reviews!
Exciting to have this feature underway. Here's a review of the first 7/8 commits:
f7a4721 deps: Add csslib as a direct dependency
b4b38d7 fonts: Add KaTeX custom fonts
0106b95 content [nfc]: Alias package:intl import as intl
671fa12 content [nfc]: Use Dart patterns for parsing math content
1fca686 binding: Add getGlobalStoreSync method
7609852 content: Add KaTeX spans parser, initial rendering; w/o styles
c273b0a content: Support basic text styles for KaTeX content
so leaving one more for a later round:
0a8c846 content: Support parsing and handling inline styles for KaTeX content
(I see Zixuan also has one open comment just above.)
/// Get the app's singleton [GlobalStore], null if not already loaded. | ||
/// | ||
/// Where possible, use [GlobalStoreWidget.of] to get access to a [GlobalStore]. | ||
/// Use this method only in contexts like notifications where | ||
/// a widget tree may not exist. | ||
GlobalStore? getGlobalStoreSync(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Get the app's singleton [GlobalStore], null if not already loaded. | |
/// | |
/// Where possible, use [GlobalStoreWidget.of] to get access to a [GlobalStore]. | |
/// Use this method only in contexts like notifications where | |
/// a widget tree may not exist. | |
GlobalStore? getGlobalStoreSync(); | |
/// Get the app's singleton [GlobalStore] if already loaded, else null. | |
/// | |
/// Where possible, use [GlobalStoreWidget.of] to get access to a [GlobalStore]. | |
/// Use this method only in contexts where getting access to a [BuildContext] | |
/// is inconvenient. | |
GlobalStore? getGlobalStoreSync(); |
Because this new method doesn't cause the global store to get loaded if it wasn't already, it's really most useful in contexts where a widget tree does presumably exist (because that will have caused the global store to get loaded).
testWidgets('displays TeX source; experimental flag enabled', (tester) async { | ||
final globalSettings = testBinding.globalStore.settings; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A test should always reset the binding state if it manipulates it:
testWidgets('displays TeX source; experimental flag enabled', (tester) async { | |
final globalSettings = testBinding.globalStore.settings; | |
testWidgets('displays TeX source; experimental flag enabled', (tester) async { | |
addTearDown(testBinding.reset); | |
final globalSettings = testBinding.globalStore.settings; |
testWidgets('displays TeX source; experimental flag default', (tester) async { | ||
final globalSettings = testBinding.globalStore.settings; | ||
await globalSettings.setBool(BoolGlobalSetting.renderKatex, null); | ||
check(globalSettings).getBool(BoolGlobalSetting.renderKatex).isFalse(); | ||
|
||
await prepareContent(tester, plainContent(ContentExample.mathBlock.html)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then when the test cases that set the setting properly reset the state, I think these can be simplified:
testWidgets('displays TeX source; experimental flag default', (tester) async { | |
final globalSettings = testBinding.globalStore.settings; | |
await globalSettings.setBool(BoolGlobalSetting.renderKatex, null); | |
check(globalSettings).getBool(BoolGlobalSetting.renderKatex).isFalse(); | |
await prepareContent(tester, plainContent(ContentExample.mathBlock.html)); | |
testWidgets('displays TeX source; experimental flag default', (tester) async { | |
await prepareContent(tester, plainContent(ContentExample.mathBlock.html)); |
because the default is by definition the state the setting starts in when it hasn't been set.
|
||
final String texSource; | ||
final List<KatexNode>? nodes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should get dartdoc in particular to clarify what null means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, in fact I guess it can just copy what MathParserResult.nodes
has 🙂
/// The text or a single character this KaTeX node contains, generally | ||
/// observed to be the leaf node in the KaTeX HTML tree. | ||
/// It will be null if [nodes] is non-null. | ||
final String? text; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "a single character" a different case from "the text"? It sounds like that's just the text when the text happens to be only one character.
KatexSpanFontStyle? fontStyle; | ||
KatexSpanFontWeight? fontWeight; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: put these in same order as the enums are defined
(And I think weight, then style, is the preferred way around: one says "bold italic", e.g. in the names of the fonts this PR adds, not "italic bold")
|
||
@override | ||
String toString() { | ||
if (this == _zero) return '${objectRuntimeType(this, 'KatexSpanStyles')}()'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just an optimization, right?
Can leave it out in favor of simplifying the code — toString
on these should only come up in errors or debugging.
if (styles.fontFamily != null) { | ||
textStyle ??= TextStyle(); | ||
textStyle = textStyle.copyWith(fontFamily: styles.fontFamily); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
if (styles.fontFamily != null) { | |
textStyle ??= TextStyle(); | |
textStyle = textStyle.copyWith(fontFamily: styles.fontFamily); | |
} | |
final fontFamily = styles.fontFamily; |
and then construct textStyle
and textAlign
at the end after defining several locals like that?
I think that would be simpler. It'd also avoid copying a TextStyle
repeatedly when there are a couple of different properties being set.
|
||
@override | ||
Widget build(BuildContext context) { | ||
final em = DefaultTextStyle.of(context).style.fontSize!; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lookup is only used in one condition, so can move inside that condition. That way we don't do the lookup when it doesn't end up being needed.
'<span class="mord mathnormal sizing reset-size6 size1">λ</span></span></span></span></span></p>', | ||
[ | ||
MathBlockNode( | ||
texSource: "\\tiny {\\lambda \\Huge \\lambda}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you write a widget test for this?
I think that'll actually be more pertinent than the parsing test (though we'll naturally have this parsing test as a consequence given the way the ContentExample system is structured, and that's fine): the important thing isn't that one layer says "0.5em" and then a layer inside it says "4.976em" so much as it is that the text winds up at the size that would be 2.488em if found at the root of the math tree.
For the widget test it may be helpful to have this example split into two examples: one for each of the two blocks in this example. That way the widget test can render only the one that it intends to inspect.
This initial implementation include:
Displaying the characters in their corresponding
custom text styles (fonts, font weight, font style).
Character and symbol sizing.
And some subset of inline styles.
This results in support for displaying some simple KaTeX functions.
Related: #46