Commit 427d701

Tweak performance storage guide (#278)
Changed some wording and swapped text table with its DDL query
1 parent 8946a4b commit 427d701

1 file changed: +37 -21 lines changed

docs/performance/storage.md

Lines changed: 37 additions & 21 deletions
@@ -8,28 +8,44 @@ the query, the engine uses the most efficient store.
 This is one of the many features that makes CrateDB very fast when reading
 and aggregating data, but it has an impact on storage size.
 
-We are going to
-use [Yellow taxi trip - January 2024](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
-which has 2_964_624 rows.
+We are going to use [Yellow taxi trip - January 2024](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page) which has 2_964_624 rows.
 
-| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee |
-|----------|----------------------|-----------------------|-----------------|---------------|------------|--------------------|--------------|--------------|--------------|-------------|-------|---------|------------|--------------|-----------------------|--------------|----------------------|-------------|
-| 2 | 1704073016000 | 1704074392000 | 4 | 6.88 | 1 | "N" | 170 | 231 | 1 | 32.4 | 1 | 0.5 | 7.48 | 0 | 1 | 44.88 | 2.5 | 0 |
-| 1 | 1704071008000 | 1704072649000 | 0 | 4.1 | 1 | "N" | 148 | 233 | 2 | 22.6 | 3.5 | 0.5 | 0 | 0 | 1 | 27.6 | 2.5 | 0 |
-| 1 | 1704071126000 | 1704071510000 | 2 | 1 | 1 | "N" | 140 | 141 | 1 | 7.9 | 3.5 | 0.5 | 2.55 | 0 | 1 | 15.45 | 2.5 | 0 |
-| 2 | 1704072696000 | 1704073070000 | 1 | 1.03 | 1 | "N" | 262 | 75 | 1 | 8.6 | 1 | 0.5 | 2.72 | 0 | 1 | 16.32 | 2.5 | 0 |
-| 2 | 1704074134000 | 1704074399000 | 1 | 1.08 | 1 | "N" | 249 | 68 | 1 | 7.2 | 1 | 0.5 | 2.44 | 0 | 1 | 14.64 | 2.5 | 0 |
+This is the schema:
 
-The taxi dataset takes:
+```sql
+CREATE TABLE IF NOT EXISTS "doc"."taxi" (
+"VendorID" BIGINT,
+"tpep_pickup_datetime" TIMESTAMP WITHOUT TIME ZONE,
+"tpep_dropoff_datetime" TIMESTAMP WITHOUT TIME ZONE,
+"passenger_count" BIGINT,
+"trip_distance" REAL,
+"RatecodeID" BIGINT,
+"store_and_fwd_flag" TEXT,
+"PULocationID" BIGINT,
+"DOLocationID" BIGINT,
+"payment_type" BIGINT,
+"fare_amount" REAL,
+"extra" REAL,
+"mta_tax" REAL,
+"tip_amount" REAL,
+"tolls_amount" REAL,
+"improvement_surcharge" REAL,
+"total_amount" REAL,
+"congestion_surcharge" REAL,
+"Airport_fee" REAL
+)
+```
+
+It takes:
 
-- ~48MiB in Parquet (very optimized for storage)
-- ~342MiB in CSV
-- ~1.2GiB in JSON
-- ~510MiB in PostgreSQL 16.1 (Debian 16.1-1.pgdg120+1)
-- ~775MiB in CrateDB 5.9.3 (3 nodes, default settings)
-- ~431MiB in CrateDB 5.10.9 (3 nodes, default settings)
+- ~`48MiB` in Parquet
+- ~`342MiB` in CSV
+- ~`1.2GiB` in JSON
+- ~`510MiB` in PostgreSQL 16.1 (Debian 16.1-1.pgdg120+1)
+- ~`775MiB` in CrateDB 5.9.3 (3 nodes, default settings)
 
-We will dive deeper to really understand what is going on.
+At first sight, it might look that in CrateDB data takes more space than in PostgreSQL,
+but we need to dive deeper to really understand what is going on, the reality is the opposite.
 
 :::{note}
 In version 5.10 storage usage was improved, some users report up to 70% of storage reduction,
@@ -156,10 +172,10 @@ CREATE TABLE taxi_noindex
 )
 ```
 
-The index can only be disabled when the table is created, if the table already exists,
-it will have to be re-created.
+The index can only be disabled when the table is created, if the table already exists and it cannot
+be deleted, it will have to be re-created.
 
-One of the ways of re-creating a table is by `renaming`, for example:
+One of the ways of re-creating a table is by renaming it, for example:
 
 1. Rename table `taxi` (with INDEX) to `taxi_deleteme` with:
 