Literature backing the model architecture

Hi, I really like the model. It has been trained well and generates good results given its size and its somewhat unusual architecture. I know the dataset used here comes from the paper [TinyStories](https://arxiv.org/abs/2305.07759), but is there also any literature backing the model architecture, or are you planning to publish a paper?