This page shows the results of the ultrametric tree-based, explainable, solar-powered language model.
These charts update each day.
Training an ultrametric tree by finding the optimal split at each step is computationally prohibitive, so we can only subsample the candidate splits. Each order-of-magnitude increase in carefulness costs roughly three orders of magnitude more compute time. "Sense Annotated 1" is the alias of the first training run of Careful1000, which seems like a reasonable compromise: it requires about 100 times as many nodes to achieve the same result as Careful10000, but it trains 1000 times faster.
Careful100 and Careful10 are much, much faster to train, but somewhere between Careful100 and Careful1000 there is a threshold below which too many bad split choices accumulate. It's an open question where that threshold is, and why a threshold exists at all.
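As a rough sketch of what a Careful-N regime means here (the function below is illustrative; the names and the scoring criterion are my assumptions, not the repository's actual code): at each node, instead of scoring every possible split, score a random subsample of N candidates and keep the best.

    import random

    def choose_split(candidate_splits, carefulness, score):
        """Pick the best split from a random subsample of candidates.

        carefulness: the N in Careful-N -- how many candidate splits
        to evaluate. score(split) returns the post-split loss; lower
        is better.
        """
        n = min(carefulness, len(candidate_splits))
        sampled = random.sample(candidate_splits, n)
        return min(sampled, key=score)

(The three orders of magnitude of extra compute per order of magnitude of carefulness suggests the real search is more involved than this linear-in-N sketch.)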
The key question this work set out to answer was whether sense annotation, and indeed the whole idea of synergistic semantic and statistical models, was worth exploring.
The "Unannotated Model 1" serves as the baseline statistical model: it is equivalent to a decision tree over one-hot encoded tokens. The sense-annotated model's learning generalises, whereas the unannotated model starts overfitting very early.
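To make the contrast concrete: a one-hot token is an opaque identifier, while a sense annotation places the word on a path through the WordNet hypernym hierarchy, giving the tree intermediate levels to split on. A sketch using NLTK's WordNet interface (the actual annotation scheme in wordnetify-tinystories may differ):

    from nltk.corpus import wordnet as wn  # pip install nltk; nltk.download('wordnet')

    # Unannotated: the token is an opaque identifier, so a tree split
    # can only ask "is the word exactly this token?"
    token = "dog"

    # Sense-annotated: the synset carries a hypernym path, so a split
    # can ask about membership at any depth ("an animal?", "a canine?").
    path = wn.synset("dog.n.01").hypernym_paths()[0]
    print([s.name() for s in path])
    # ['entity.n.01', ..., 'canine.n.02', 'dog.n.01']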
Broadly speaking, re-training on the same data yields similar results. Loss on the held-out data goes down roughly linearly with the logarithm of the number of nodes in the model. Note that the models are numbered only by training order (model 1 was trained first); it is just coincidence that model 1 is the best and model 5 the worst.
Even the worst model is doing much better than the unannotated model. If each of the five retrained models had only a 50:50 chance of beating the baseline, the probability of all five doing so would be (1/2)^5 = 1/32, which is equivalent to a p-value of about 0.03.
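That 1/32 is just a one-sided sign test. The same number via SciPy, assuming the null hypothesis that each retrained model beats the baseline with probability one half:

    from scipy.stats import binomtest

    # Five out of five retrained models beat the unannotated baseline.
    print(binomtest(k=5, n=5, p=0.5, alternative="greater").pvalue)
    # 0.03125 == 1/32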
Ensembling works. The ensemble of five Careful1000 models gets results that don't look all that different to an extrapolation of the best of them.
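A minimal sketch of the ensembling step, assuming each model emits a probability distribution over the vocabulary for the same context and the ensemble simply averages them (the averaging rule is my assumption about what the charts show, not necessarily the repository's exact method):

    import numpy as np

    def ensemble_cross_entropy(per_model_probs, true_index):
        """Loss of the averaged predictive distribution.

        per_model_probs: shape (n_models, vocab_size), one predicted
        distribution per model for the same context.
        """
        mean_probs = per_model_probs.mean(axis=0)
        return -np.log(mean_probs[true_index])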
Comparison with a neural baseline shows that the best-trained ultrametric trees need a few orders of magnitude more nodes than a neural network needs trainable parameters. But the ultrametric training regimes themselves span several orders of magnitude, so it is not hard to believe that a better training regime might close the gap.
Weirder still is that sense annotation makes barely any difference to the neural network models.
Rather than looking only at the total loss over all parts of speech, we can break the loss down by part of speech. We would expect nouns to get the most benefit from sense annotation, since it is nouns that the annotation organises into a hierarchy. But the data shows the exact opposite: as we train, the loss on nouns increases, which means that the loss on all the other parts of speech must be dropping even more rapidly.
We do see that the ultratree models soundly outperform the neural network models on nouns, though. The neural networks are behaving as one would expect: larger models show more generalised learning.
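The per-part-of-speech breakdown above amounts to grouping the token-level losses by POS tag; a sketch (the record format is assumed for illustration):

    from collections import defaultdict

    def loss_by_pos(records):
        """Average loss per part of speech.

        records: iterable of (pos_tag, loss) pairs, one per
        predicted token in the validation set.
        """
        totals = defaultdict(float)
        counts = defaultdict(int)
        for pos, loss in records:
            totals[pos] += loss
            counts[pos] += 1
        return {pos: totals[pos] / counts[pos] for pos in totals}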
Theory: the ultrametric models mostly predict nouns, because nouns are the most common part of speech in the corpus, and the models can group parts of speech together into an aggregate. The neural network mostly predicts punctuation, since it has no way of aggregating parts of speech together without internalising rules of grammar. The "." character is the most common "word" in the corpus, so all else being equal, it will get predicted more often.
We can see which contexts get used for node splitting. (This is not the same as asking which nodes get used the most often in inference.)
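Since the trained model is a SQLite database, this is queryable directly. A sketch, with a guessed schema (the real table and column names are in the ultrametric-trees repository):

    import sqlite3

    conn = sqlite3.connect("/ultratree/language-model/tiny.sqlite")
    # Hypothetical schema: one row per tree node, recording which
    # context position (word-before, word-before-that, ...) the node
    # split on.
    query = """
        SELECT context_position, COUNT(*) AS splits
        FROM nodes
        WHERE context_position IS NOT NULL
        GROUP BY context_position
        ORDER BY splits DESC
    """
    for context_position, splits in conn.execute(query):
        print(context_position, splits)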
To reproduce these results:

1. Clone github.com:solresol/wordnetify-tinystories.git and follow the instructions in the README.md there. I stored the sense-annotated training data in /tinystories/wordnetify-tinystories/TinyStories.sqlite and the sense-annotated validation data in /tinystories/wordnetify/w2.sqlite.

2. Clone github.com:solresol/ultrametric-trees and follow the instructions in the README.md there, including running the cronscript.sh to export results. I stored the prepared data (and did training) in /ultratree/language-model/tiny.sqlite and the validation data in /ultratree/language-model/validation.sqlite.

3. Clone github.com:solresol/ultratree-neural-baseline and follow the instructions in the README.md file there.