modify es read write

puppylpg · puppylpg · commit 239682d63861 · 2024-03-22T17:28:51.000+08:00
diff --git a/_posts/2022-01-23-innodb-buffer-pool.md b/_posts/2022-01-23-innodb-buffer-pool.md
@@ -1,3 +1,5 @@
+[toc]
+
 ---
 layout: post
 title: "Innodb - Buffer Pool"
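@@ -230,7 +232,7 @@ MySQL has a [guide on optimizing InnoDB disk I/O](https://dev.mysql.com/doc/refman/8
 - **For a database that supports transactions, a commit counts as successful, and control returns to the application, only after the transaction log (all of the transaction's modifications plus a commit record) has been fully written to disk**;
 - **For database programs that must guarantee durability and consistency, the "loose asynchronous semantics" of write() are not enough; they usually need the synchronized-IO primitives provided by the operating system (`fsync`)**;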
 
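-> **But es does not work this way: if you are bold enough, even the redo log can skip fsync**! [Elasticsearch: Shard Reads and Writes]({% post_url 2022-05-05-es-deep-dive %})
+> **But es does not work this way: if you are bold enough, even the redo log can skip fsync**! [Elasticsearch: Shard Reads and Writes]({% post_url 2022-05-05-es-read-write %})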
 
 # disk buffer
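 There is also something called the disk buffer, which sounds a lot like the disk cache:
@@ -248,3 +250,4 @@ The page cache and the disk buffer share almost the same idea, but they are two completely different
 That is why Wikipedia explicitly notes that the page cache (disk cache) and the disk buffer are two different things that should not be confused: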
 - https://en.wikipedia.org/wiki/Disk_cache
 
+
diff --git a/_posts/2022-04-19-es-summary.md b/_posts/2022-04-19-es-summary.md
@@ -17,7 +17,7 @@ tags: elasticsearch
 - Aggregations: [Elasticsearch: aggregation]({% post_url 2022-09-04-es-agg %});
 - reindex and task: [Elasticsearch: alias, reindex, task]({% post_url 2022-05-02-es-reindex-task %});
 - es support for relational data, which also introduces global ordinals: [Elasticsearch: Relational Documents]({% post_url 2022-05-03-es-relations %});
-- Shards, queries, and data commits under the hood: [Elasticsearch: Internals]({% post_url 2022-05-05-es-deep-dive %});
+- Shards, queries, and data commits under the hood: [Elasticsearch: Shard Reads and Writes]({% post_url 2022-05-05-es-read-write %});
 - Tuning, JVM memory, SSD, pagination: [Elasticsearch: performance]({% post_url 2022-05-08-es-performance %});
 - Cluster configuration and deployment: [Elasticsearch: Configuration and Deployment]({% post_url 2022-05-09-es-config-deploy %});
 - index default template: [Elasticsearch: default index template]({% post_url 2022-05-05-es-template %});
diff --git a/_posts/2022-04-22-es-search.md b/_posts/2022-04-22-es-search.md
@@ -1,3 +1,5 @@
+[toc]
+
 ---
 layout: post
 title: "Elasticsearch：search"
@@ -18,7 +20,7 @@ es has two ways to search:
 
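 The search API is the most commonly used and most feature-rich way to search, but it **searches from disk**, so newly written data becomes searchable only after a refresh; **data that is only in the buffer pool cannot be searched**.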
 
-> About refresh and the buffer pool: [Elasticsearch: deep dive]({% post_url 2022-05-05-es-deep-dive %})
+> About refresh and the buffer pool: [Elasticsearch: Shard Reads and Writes]({% post_url 2022-05-05-es-read-write %})
 
 - https://www.elastic.co/guide/cn/elasticsearch/guide/current/making-text-searchable.html
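 > Early full-text search would build one big inverted index for the entire document collection and **write it to disk**. Once the new index was ready, it replaced the old one, and recent changes became searchable.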
@@ -980,3 +982,4 @@ mega <em>pikachu</em> z"""
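 By default, highlight marks whatever the search query matched, but elasticsearch also accepts a custom `highlight_query`, which can be unrelated to the search query; whatever it matches is what ends up highlighted. See [this scenario](https://github.com/spring-projects/spring-data-elasticsearch/issues/2636).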
 
 > `highlight_query` is probably not used much.
+
diff --git a/_posts/2022-05-03-es-relations.md b/_posts/2022-05-03-es-relations.md
@@ -1,3 +1,5 @@
+[toc]
+
 ---
 layout: post
 title: "Elasticsearch: Relational Documents"
@@ -152,14 +154,19 @@ GET my-index-000001/_search
 
 ### Stored in the same segment
 
-> What a segment is: [Elasticsearch: Shard Reads and Writes]({% post_url 2022-05-05-es-deep-dive %})
+> What a segment is: [Elasticsearch: Shard Reads and Writes]({% post_url 2022-05-05-es-read-write %})
 
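 **Logically, a nested document is still one large document with child documents embedded in it. Physically, however, it is stored as n child documents plus 1 parent document, all placed in the same segment**:
 - Same segment: https://discuss.elastic.co/t/index-nested-documents-separately/11748/3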
 - https://discuss.elastic.co/t/whats-nested-documents-layout-inside-the-lucene/59944
 - https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html
 - https://www.elastic.co/guide/cn/elasticsearch/guide/current/nested-objects.html
 - How they are laid out next to each other: https://stackoverflow.com/a/54023434/7676237
+- https://www.elastic.co/guide/en/elasticsearch/reference/6.8/nested.html#_limits_on_nested_mappings_and_objects
+
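+> For this reason, **the index's document count is not the real count; it is the total of documents plus nested documents. The number returned by the `_count` API is the number of parent documents in the index**.
+>
+> See [How come my elasticsearch doc count is greater than number of items?](https://github.com/Smile-SA/elasticsuite/issues/1729#issuecomment-592522434)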
 
 **Why put them in the same segment? Speed! At query time the "one" and the "many" sit together, so they can be retrieved quickly and joined**.
 
@@ -432,7 +439,7 @@ GET user-blogs-nested/_search
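 ### Drawbacks
 Because segments are read-only, updating a document means creating it in a new segment. And because a nested document must keep the parent and all of its children in the same segment, **updating any child or the parent means reindexing the whole document into a new segment. So nested documents are a poor fit when child documents are updated frequently**.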
 
-> Partially updating nested data requires a script; see [this article](https://iridakos.com/programming/2019/05/02/add-update-delete-elasticsearch-nested-objects).
+> Partially updating nested data requires a script; see this article
 
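 ## parent child - stored in the same shard
 **Nested documents store children as implicit separate documents, while parent child uses an explicit, dedicated field to declare the parent-child relationship and link parent and child documents**.
@@ -1187,3 +1194,4 @@ The drawback of denormalizing is that the data exists in multiple copies and is hard to maintain. So it is better suited to
 1. Fit the solution to the system: different systems are designed for different purposes and so have different characteristics. When they support features that are conceptually similar, those characteristics often lead to very different implementations, and the key point is to design and implement each feature according to your own characteristics. In a distributed database, a join that incurs network overhead is bound to be slow, and es, as a fast search database, cannot tolerate such slow searches. So what to do? **Do not let related data span nodes**! Both nested (same segment) and parent join (same shard) in es rest on this premise.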
 
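 Building software means remembering your original purpose and understanding your own characteristics; only then can you come up with an approach that suits you and build something with character.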
+
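The doc-count note added to the nested section of this post can be illustrated with a small calculation. This is only a sketch of the arithmetic, assuming each nested object is stored as one hidden document next to its parent; the function name and the `comments` field are hypothetical, and no Elasticsearch API is called.

```python
def nested_doc_counts(docs, nested_field):
    """Return (parent_count, total_count) for a list of source documents.

    parent_count models what the `_count` API reports (top-level documents).
    total_count models the index-level doc count, which also includes one
    hidden document per nested object (assumption stated in the lead-in).
    """
    parent_count = len(docs)
    # Each entry in the nested array is assumed to become its own hidden document.
    child_count = sum(len(doc.get(nested_field, [])) for doc in docs)
    return parent_count, parent_count + child_count


# Hypothetical sample data: three blog posts with 2, 1 and 0 nested comments.
sample_docs = [
    {"title": "a", "comments": [{"text": "x"}, {"text": "y"}]},
    {"title": "b", "comments": [{"text": "z"}]},
    {"title": "c"},
]
```

With this sample, the parent count is 3 while the total is 6, which is the kind of gap the linked issue describes.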
diff --git a/_posts/2022-05-05-es-read-write.md b/_posts/2022-05-05-es-read-write.md
@@ -1,3 +1,5 @@
+[toc]
+
 ---
 layout: post
 title: "Elasticsearch: Shard Reads and Writes"
@@ -93,6 +95,8 @@ es sorts query results by score. To return the top 10 with two mast
 - shard: a Lucene index, made up of multiple segments and a commit point;
 - segment: an inverted index;
 
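+> The commit point records the segments that have been flushed to disk; those segments are durable. The translog records data that has not yet been persisted to disk. So does "a shard also contains a commit point" mean: persisted segments + unpersisted data in the translog (both the data in memory and the unpersisted segments) = the whole index? Either way, this reading certainly works.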
+
 Ref：
 - https://stackoverflow.com/a/15429578/7676237
 
@@ -120,14 +124,15 @@ es is built on Lucene, and Lucene searches segment by segment. A Lucene index contains:
 
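 ### Creating a segment - adding data
 Segments live on disk, but for speed they are first written in memory:
-1. A new segment (a new inverted index) is first written to the in-memory buffer;
-2. The segment is committed:
-    1. the segment is written to disk;
-    2. the commit point is written to disk;
-    3. only after fsync is the physical write truly complete;
-3. **The new segment is opened and becomes searchable (in fact it already counts as written, and is already searchable, before the fsync)**;
+1. Data is first written to the in-memory buffer;
+2. The segment is produced by a refresh:
+    1. the segment (a new inverted index) is written to disk (meaning the page cache. Note: the physical write is truly complete only after fsync);
+3. **The new segment is opened and becomes searchable (searchable but uncommitted)**;
 4. The in-memory buffer is cleared;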
 
+[searchable but uncommitted](https://discuss.elastic.co/t/question-about-segment-described-in-elasticsearch-the-definitive-guide/35980/6?u=puppylpg): if the system crashes, this segment is lost, but its data can be redone from the translog.
+> On a re-refresh the IndexReader used to read the segments is updated to include the new uncommitted segment (this is why the segment is referred to in the documentation as searchable but uncommitted). If the node was to fail and restart that segment would be lost since it is uncommitted, but the data from that segment would be replayed from the translog as part of the startup sequence so the data was not lost from Elasticsearch entirely.
+
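 ### Per-segment search
 A search request is actually executed segment by segment:
 1. Searching an index sends the query to every shard (for a given shard, the primary and its replicas are equivalent);
@@ -139,11 +144,9 @@ es is built on Lucene, and Lucene searches segment by segment. A Lucene index contains: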
 
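 ### Deleting and updating data
 Segments are immutable, so how are deletes and updates handled?
-- Delete: **deletes are marks; every commit point includes a `.del` file recording which documents in which segments have been deleted**;
+- Delete: **deletes are marks; [`.del` files keep being generated just like segments](https://discuss.elastic.co/t/question-about-segment-described-in-elasticsearch-the-definitive-guide/35980/9?u=puppylpg), recording which documents in which segments have been deleted**;
 - Update: delete, then add. Mark the document deleted in the old segment and add it to a new segment;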
 
-> The commit point is effectively a metadata file recording all opened segments, so it also records which documents in which segments have been deleted.
-
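 ## `refresh`: no write-back to disk
 When a segment is created, in theory the write is complete only once it is fully on disk (fsync). **But a full write-back with fsync is too slow. To let the segment be opened and searched sooner, Lucene calls sync to write the segment into the page cache, at which point it can be opened and searched. In effect the page cache is used the way innodb uses its buffer pool.**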
 
@@ -168,7 +171,7 @@ es is built on Lucene, and Lucene searches segment by segment. A Lucene index contains:
 - https://www.elastic.co/guide/cn/elasticsearch/guide/current/near-real-time.html
 - https://www.elastic.co/guide/en/elasticsearch/guide/current/near-real-time.html
 
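-**By default a refresh runs once per second, so a new inverted index becomes visible only after 1s. This is why es is described as near real-time search**.
+**By default a refresh runs once per second, so a new inverted index becomes visible only after 1s. This is why es is described as [near real-time search](https://www.elastic.co/guide/cn/elasticsearch/guide/current/near-real-time.html)**.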
 
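 So data that has just been indexed into es cannot be searched immediately. But calling the refresh API by hand after every indexed document hurts performance badly when the indexing volume is high.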
 
@@ -177,7 +180,9 @@ es is built on Lucene, and Lucene searches segment by segment. A Lucene index contains:
 You can change the global [`refresh_interval`](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-refresh-interval-setting) setting, or change `refresh_interval` for a single index:
 ```json
 PUT /<index>/_settings
-{ "refresh_interval": "1s" }
+{
+  "refresh_interval": "1s"
+}
 ```
 
 ## `flush`: translog - the redo log of es
@@ -223,10 +228,10 @@ The translog is written at the same point as the redo log: after each write request completes (e.g
 - https://www.elastic.co/guide/cn/elasticsearch/guide/current/translog.html
 
 ### refresh vs. flush
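-refresh and flush are two separate concepts:
-- **Once data is written to the buffer pool it is also written to the translog, which guarantees it will not be lost; it is just not visible yet**;
-- **refresh operates on the buffer pool**: buffer pool -> page cache. **The data becomes visible** and the in-memory buffer is cleared, but the translog is not involved;
-- **flush operates on the translog**: fsync to disk, after which the translog is cleared;
+So refresh and flush are two separate concepts:
+- **Data is written to elasticsearch's memory (you can call it a buffer pool, but it is not MySQL's buffer pool: data in it is not visible, because elasticsearch is a file-based db that queries files directly (in practice the page cache), whereas MySQL loads data into its in-memory buffer pool and queries it there) and also to the translog. After that the client gets a 200 and the data will not be lost; it is just not visible yet**;
+- **refresh operates on the buffer pool**: `sync` to page cache. **The data becomes visible** and the in-memory buffer is cleared, but the translog is not involved;
+- **flush operates on the translog**: `fsync` to disk, after which the translog is cleared;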
 
 So after an es refresh, documents are visible and searchable. After an es flush, the data is durable and the translog can be deleted.
 
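@@ -254,7 +259,7 @@ refresh and flush are two separate concepts:
 Keep three intervals apart:
 1. refresh interval: 1s, the time until a segment becomes available. **The larger it is, the longer data takes to become visible**;
 2. translog sync interval: 5s, how long writes may go without being fsynced to the translog if you are feeling bold. **The larger it is, the more data is lost when es crashes**;
-3. flush interval: 30min, the time until data is persisted from the translog to disk. **The larger it is, the longer recovery takes after es crashes**;
+3. flush interval: 30min, the time until data is persisted from the translog to disk. **The larger it is, the longer recovery takes after es crashes**, provided the translog is large enough; otherwise a flush is triggered early when it is nearly full;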
 
 Ref：
 - translog settings: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-translog.html#_translog_settings
@@ -282,3 +287,4 @@ api:
 Under normal circumstances, though, letting es merge segments on its own is enough.
 
 
+
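The refresh/flush distinction described in this post's changes can be sketched as a toy in-memory model. This is not how Elasticsearch or Lucene is implemented; the class, its fields and its methods are all hypothetical names, and the model assumes a flush also turns any buffered documents into a segment before committing.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of one shard's write path (hypothetical, for illustration only)."""
    buffer: list = field(default_factory=list)       # in-memory buffer: not searchable
    translog: list = field(default_factory=list)     # redo log of writes not yet flushed
    uncommitted: list = field(default_factory=list)  # segments in page cache: searchable, not fsynced
    committed: list = field(default_factory=list)    # segments covered by the commit point

    def index(self, doc):
        # A write lands in the buffer and in the translog; it is not visible yet.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # The buffer becomes a new searchable segment; the translog is untouched.
        if self.buffer:
            self.uncommitted.append(tuple(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Simplification: refresh first, then treat every segment as fsynced
        # and recorded in the commit point, so the translog can be cleared.
        self.refresh()
        self.committed.extend(self.uncommitted)
        self.uncommitted.clear()
        self.translog.clear()

    def search(self):
        # Only segments are searched, committed or not; the buffer is invisible.
        return [doc for seg in self.committed + self.uncommitted for doc in seg]

    def crash_and_recover(self):
        # A crash loses the buffer and the uncommitted segments;
        # recovery replays the translog to rebuild them.
        pending = list(self.translog)
        self.buffer.clear()
        self.uncommitted.clear()
        self.translog.clear()
        for doc in pending:
            self.index(doc)
        self.refresh()
```

In this model a document is invisible after `index`, searchable after `refresh`, and durable without the translog only after `flush`; a crash between refresh and flush loses the segment but not the data, because the translog is replayed.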
diff --git a/_posts/2022-05-08-es-performance.md b/_posts/2022-05-08-es-performance.md
@@ -1,3 +1,5 @@
+[toc]
+
 ---
 layout: post
 title: "Elasticsearch：performance"
@@ -17,7 +19,7 @@ tags: elasticsearch
 es has many tunable knobs:
 1. Segment merging: https://www.elastic.co/guide/cn/elasticsearch/guide/current/indexing-performance.html#segments-and-merging
 2. Shard count: https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html
-3. translog size/async or not, refresh_interval: [Elasticsearch: deep dive]({% post_url 2022-05-05-es-deep-dive %});
+3. translog size/async or not, refresh_interval: [Elasticsearch: Shard Reads and Writes]({% post_url 2022-05-05-es-read-write %});
 4. Whether to preload global ordinals: [Elasticsearch: Relational Documents]({% post_url 2022-05-03-es-relations %});
 4. Store throttling: if merges are too aggressive, they slow down indexing and queries. `indices.store.throttle.max_bytes_per_se`;
 5. Cache size: mainly the cache for filter queries, `indices.cache.filter.size`;
@@ -212,7 +214,7 @@ The scroll API generates a scroll id at "the moment" of the query; this scroll
 This means that until the scroll finishes, these contexts are kept around:
 - https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#scroll-search-context
 
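-In the background, elasticsearch produces a segment on every refresh, and background tasks also merge segments ([Elasticsearch: deep dive]({% post_url 2022-05-05-es-deep-dive %})). "Keeping the context" therefore means:
+In the background, elasticsearch produces a segment on every refresh, and background tasks also merge segments ([Elasticsearch: Shard Reads and Writes]({% post_url 2022-05-05-es-read-write %})). "Keeping the context" therefore means:
 1. **Disk space and file descriptors are held**: segments that have already been merged away cannot be deleted and must be kept until the scroll id is deleted;
 2. **Memory (heap space) is held**: not every doc in a segment is live; docs that were deleted or updated are recorded in `.del` files, and keeping them takes extra memory: https://www.elastic.co/guide/cn/elasticsearch/guide/current/dynamic-indices.html#deletes-and-updates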
 
diff --git a/_posts/2022-11-11-es-traverse-index.md b/_posts/2022-11-11-es-traverse-index.md
diff --git a/_posts/2023-02-09-es-rehash.md b/_posts/2023-02-09-es-rehash.md