You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Focusing on Article, though the same issues are present in Feed:
Right now the Article model (and corresponding SQL table) contains several categories of fields:
Server-set fields
Field
Size¹
Notes
guid
small
immutable after creation
author
small
there should really be a raw_author
url
small²
there is currently no limit on URL size
date
tiny
fixed-size
raw_title
small²
raw_content
large²
title
small²
content
large²
content_snippet
small
up to 500 characters
content_rev
tiny
¹ This column represents the typical size. Variable-size fields set by the feed generally accept arbitrary values and are marked ².
² This field is variable size. The size is controlled by the source site.
Client-set fields
Two boolean fields may be set by the user:
Field
Size
read
tiny
fave
tiny
Problems
Mixed writers
Some field values are taken directly from the feed. Some are post-processed versions of values from the feed. Some are set by the user. In the Feed model (which has all the same issues) this has caused bugs like Bulk marking articles read can produce negative unread count #323. It would be better to model the server-side stuff in a more formal way i.e. fetch feed → postprocess → available for read.
Write amplification of client operations.
Tables are stored in the SQLite record format in b-tree pages. When a page changes, its revised content is written to a WAL frame. Because the read and fave fields are part of the same page as large fields like content, bulk mutation of articles is likely to cause many page rewrites.
Possible read amplification.
SQLite stores the columns of a row together in a page. Large fields which are not needed for a query are still read from disk. The "raw" fields are located before the sanitized content on disk, and may be large enough to force SQLite to chase to a continuation page.
The text was updated successfully, but these errors were encountered:
Given the altered deployment environment since filing this, I think that problem 1 is the only one worth addressing here, and largely to enable features like #1193.
Focusing on Article, though the same issues are present in Feed:
Right now the Article model (and corresponding SQL table) contains several categories of fields:
Server-set fields
guid
author
raw_author
url
date
raw_title
raw_content
title
content
content_snippet
content_rev
¹ This column represents the typical size. Variable-size fields set by the feed generally accept arbitrary values and are marked ².
² This field is variable size. The size is controlled by the source site.
Client-set fields
Two boolean fields may be set by the user:
read
fave
Problems
Mixed writers
Some field values are taken directly from the feed. Some are post-processed versions of values from the feed. Some are set by the user. In the Feed model (which has all the same issues) this has caused bugs like Bulk marking articles read can produce negative unread count #323. It would be better to model the server-side stuff in a more formal way i.e. fetch feed → postprocess → available for read.
Write amplification of client operations.
Tables are stored in the SQLite record format in b-tree pages. When a page changes, its revised content is written to a WAL frame. Because the
read
andfave
fields are part of the same page as large fields likecontent
, bulk mutation of articles is likely to cause many page rewrites.Possible read amplification.
SQLite stores the columns of a row together in a page. Large fields which are not needed for a query are still read from disk. The "raw" fields are located before the sanitized content on disk, and may be large enough to force SQLite to chase to a continuation page.
The text was updated successfully, but these errors were encountered: