Split Article model #415

twm · 2019-06-05T03:38:14Z

Focusing on Article, though the same issues are present in Feed:

Right now the Article model (and corresponding SQL table) contains several categories of fields:

Server-set fields

Field	Size¹	Notes
`guid`	small	immutable after creation
`author`	small	there should really be a `raw_author`
`url`	small²	there is currently no limit on URL size
`date`	tiny	fixed-size
`raw_title`	small²
`raw_content`	large²
`title`	small²
`content`	large²
`content_snippet`	small	up to 500 characters
`content_rev`	tiny

¹ This column represents the typical size. Variable-size fields set by the feed generally accept arbitrary values and are marked ².

² This field is variable size. The size is controlled by the source site.

Client-set fields

Two boolean fields may be set by the user:

Field	Size
`read`	tiny
`fave`	tiny

Problems

Mixed writers

Some field values are taken directly from the feed. Some are post-processed versions of values from the feed. Some are set by the user. In the Feed model (which has all the same issues) this has caused bugs like Bulk marking articles read can produce negative unread count #323. It would be better to model the server-side stuff in a more formal way i.e. fetch feed → postprocess → available for read.
Write amplification of client operations.

Tables are stored in the SQLite record format in b-tree pages. When a page changes, its revised content is written to a WAL frame. Because the read and fave fields are part of the same page as large fields like content, bulk mutation of articles is likely to cause many page rewrites.
Possible read amplification.

SQLite stores the columns of a row together in a page. Large fields which are not needed for a query are still read from disk. The "raw" fields are located before the sanitized content on disk, and may be large enough to force SQLite to chase to a continuation page.

The text was updated successfully, but these errors were encountered:

twm · 2024-07-21T05:57:30Z

Given the altered deployment environment since filing this, I think that problem 1 is the only one worth addressing here, and largely to enable features like #1193.

twm added the enhancement label Jun 5, 2019

twm mentioned this issue Jun 22, 2024

Deduplicate articles #1193

Open

twm added this to Yarrharr fetch pipeline Jun 23, 2024

twm moved this to To do in Yarrharr fetch pipeline Jun 23, 2024

twm changed the title ~~Split Feed and Article models~~ Split Article models Jul 20, 2024

twm changed the title ~~Split Article models~~ Split Article model Jul 20, 2024

twm mentioned this issue Jul 20, 2024

Split Feed model #1207

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split Article model #415

Split Article model #415

twm commented Jun 5, 2019 •

edited

Loading

twm commented Jul 21, 2024

Split Article model #415

Split Article model #415

Comments

twm commented Jun 5, 2019 • edited Loading

Server-set fields

Client-set fields

Problems

twm commented Jul 21, 2024

twm commented Jun 5, 2019 •

edited

Loading