Commit 9f83314

authored
Update amazon-reviews.md
1 parent 3d351fc commit 9f83314

1 file changed

+27
-19
lines changed

docs/getting-started/example-datasets/amazon-reviews.md

@@ -8,7 +8,8 @@ title: 'Amazon Customer Review'
 This dataset contains over 150M customer reviews of Amazon products. The data is in snappy-compressed Parquet files in AWS S3 that total 49GB in size (compressed). Let's walk through the steps to insert it into ClickHouse.
 
 :::note
-The queries below were executed on a **Production** instance of [ClickHouse Cloud](https://clickhouse.cloud).
+The queries below were executed on a **Production** instance of ClickHouse Cloud. For more information see
+["Playground specifications"](/getting-started/playground#specifications).
 :::
 
 ## Loading the dataset {#loading-the-dataset}
@@ -86,21 +87,26 @@ CREATE DATABASE amazon
 
 CREATE TABLE amazon.amazon_reviews
 (
-review_date Date,
-marketplace LowCardinality(String),
-customer_id UInt64,
-review_id String,
-product_id String,
-product_parent UInt64,
-product_title String,
-product_category LowCardinality(String),
-star_rating UInt8,
-helpful_votes UInt32,
-total_votes UInt32,
-vine Bool,
-verified_purchase Bool,
-review_headline String,
-review_body String
+`review_date` Date,
+`marketplace` LowCardinality(String),
+`customer_id` UInt64,
+`review_id` String,
+`product_id` String,
+`product_parent` UInt64,
+`product_title` String,
+`product_category` LowCardinality(String),
+`star_rating` UInt8,
+`helpful_votes` UInt32,
+`total_votes` UInt32,
+`vine` Bool,
+`verified_purchase` Bool,
+`review_headline` String,
+`review_body` String,
+PROJECTION helpful_votes
+(
+SELECT *
+ORDER BY helpful_votes
+)
 )
 ENGINE = MergeTree
 ORDER BY (review_date, product_category)
@@ -146,7 +152,7 @@ The original data was about 70G, but compressed in ClickHouse it takes up about
 
 ## Example queries {#example-queries}
 
-7. Let's run some queries...here are the top 10 most-helpful reviews in the dataset:
+7. Let's run some queries. Here are the top 10 most-helpful reviews in the dataset:
 
 ```sql runnable
 SELECT
@@ -157,7 +163,9 @@ ORDER BY helpful_votes DESC
 LIMIT 10
 ```
 
-Notice the query has to process all 151M rows in less than a second!
+:::note
+This query is using a projection to speed up performance.
+:::
 
 8. Here are the top 10 products in Amazon with the most reviews:
 
@@ -214,7 +222,7 @@ ORDER BY count DESC
 LIMIT 50;
 ```
 
-The query only takes 4 seconds - which is impressive - and the results are a fun read:
+Notice the query time for such a large amount of data. The results are also a fun read!
 
 12. We can run the same query again, except this time we search for **awesome** in the reviews:
 