Commit 1dbace8

modify elasticsearch: dot in names
1 parent 77f1442 commit 1dbace8

3 files changed: +129 -2 lines changed

_posts/2022-04-20-es-basic.md

+40 -1
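@@ -215,7 +215,46 @@ The object type stores hierarchical data; **in essence it takes the nested field
 Nested data uses the nested type. **Its biggest difference from the object type shows up with arrays: object cannot preserve the correspondence between fields, whereas nested can.**
 - https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html#nested-arrays-flattening-objects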
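
The object-versus-nested difference on arrays can be illustrated with a small Python sketch. This is a toy model, not Elasticsearch code: `flatten_object_array`, `object_match`, and `nested_match` are hypothetical helpers, and the sample users are assumed for illustration. It only mimics the idea that an object array is flattened into multi-valued fields, while nested keeps each element as its own unit.

```python
from collections import defaultdict


def flatten_object_array(field, items):
    """Toy model of the object type: every leaf becomes one multi-valued
    field keyed by its full dotted path, so element boundaries are lost."""
    flat = defaultdict(list)
    for item in items:
        for key, value in item.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


def object_match(flat, conditions):
    """Each condition is checked against its own flattened field independently."""
    return all(value in flat.get(path, []) for path, value in conditions.items())


def nested_match(field, items, conditions):
    """Each array element is checked as a unit: all conditions must hold
    inside the same element."""
    prefix_len = len(field) + 1  # strip "user." from "user.first"
    return any(
        all(item.get(path[prefix_len:]) == value for path, value in conditions.items())
        for item in items
    )


# Hypothetical sample data: two users stored in one array field.
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]
flat = flatten_object_array("user", users)
query = {"user.first": "Alice", "user.last": "Smith"}

print(flat)
print(object_match(flat, query))           # cross-element match slips through
print(nested_match("user", users, query))  # no single element satisfies both
```

In this model the query for "Alice Smith" matches the flattened form even though no such user exists, which is the loss of correspondence described above.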
-TODO: **object, nested, and join will be covered in more detail in the post on storing relational data.**
+object, nested, and join are covered in more detail in [Elasticsearch: relational documents]({% post_url 2022-05-03-es-relations %}).
+
+#### Names containing dots
+[Previously](https://stackoverflow.com/a/67278093/7676237), es could be configured to decide whether dotted names were rewritten into object form; **since es5, [dotted names](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/dots-in-names.html) are always converted to object form**.
+
+For example, `"server.latency.max": 100` produces this mapping:
+```json
+{
+  "properties": {
+    "server": {
+      "type": "object",
+      "properties": {
+        "latency": {
+          "type": "object",
+          "properties": {
+            "max": {
+              "type": "long"
+            }
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+One case fails, though:
+```json
+{
+  "properties": {
+    "server.latency": {
+      "type": "long"
+    },
+    "server": {
+      "type": "string"
+    }
+  }
+}
+```
+**Here server would have to be both an object (because of `server.latency`) and a string (`server`), which is impossible.**
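
A minimal Python sketch of the expansion rule and of the conflict may help. It is an illustration under assumptions, not how Elasticsearch builds mappings: `expand_dotted` is a hypothetical helper, and the tree it returns is simplified (it omits the `"type": "object"` and `"properties"` wrappers of a real mapping).

```python
def expand_dotted(fields):
    """Expand {"a.b.c": type} into nested dicts {"a": {"b": {"c": type}}}.

    Raises ValueError when one name would have to be both a leaf field
    and an object, as in the failing mapping above.
    """
    tree = {}
    for name, leaf_type in fields.items():
        *parents, last = name.split(".")
        node = tree
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # An earlier field already made this name a leaf.
                raise ValueError(f"{part!r} is already a {child} field, cannot also be an object")
            node = child
        if isinstance(node.get(last), dict):
            # An earlier dotted name already made this name an object.
            raise ValueError(f"{last!r} is already an object, cannot also be a {leaf_type} field")
        node[last] = leaf_type
    return tree


print(expand_dotted({"server.latency.max": "long"}))

try:
    expand_dotted({"server.latency": "long", "server": "string"})
except ValueError as err:
    print("conflict:", err)
```

The check runs in both directions, so the conflict is reported regardless of which of the two fields is seen first.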
 ### date
 - https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html

_posts/2022-05-03-es-relations.md

+6 -1
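@@ -32,8 +32,13 @@ How es handles relational data:
 ## object - flatten
 [object](https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html) is the simplest way to organize hierarchical data: it is flattened, not nested in the way we usually think of nesting.

+### dot in names
+First, every [dotted name](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/dots-in-names.html) is converted into object fields (every segment except the last); see [Elasticsearch: basic]({% post_url 2022-04-20-es-basic %}).
+
+Or see [this answer](https://stackoverflow.com/a/72095595/7676237).
+
 ### flatten
-A mapping contains an object whenever properties are nested in its definition. **But these nested properties are not nested in the sense we usually mean: in es an object is actually flattened into the full path of each property, joined with dots, and each is stored as an independent field.** For example:
+A mapping contains an object whenever properties are nested in its definition. **The properties of an object are not nested in the sense we usually mean: in es an object is actually flattened into the full path of each property, joined with dots, and each is stored as an independent field.** For example:
 ```json
 PUT my-index-000001/_doc/1
 {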

_posts/2022-08-27-es-pipeline.md

+83
@@ -1,3 +1,5 @@
+[toc]
+
 ---
 layout: post
 title: "Elasticsearch:pipeline"
@@ -167,6 +169,86 @@ POST /_ingest/pipeline/_simulate?verbose
 ```
 For pikachu, remove succeeded while set was skipped because its if precondition was not met; for raichu, both remove and set succeeded. The output also shows the document state after each processor runs.

+Here is another example, **remove object field**:
+
+We want to remove similarity under hello, so hello should be an object:
+```json
+PUT _ingest/pipeline/remove_similarity
+{
+  "description": "remove hello.similarity",
+  "processors": [
+    {
+      "remove": {
+        "field": "hello.similarity",
+        "ignore_missing": true
+      }
+    }
+  ]
+}
+```
+We expect both pika and pikachu to be removed:
+```json
+POST /_ingest/pipeline/_simulate?verbose
+{
+  "pipeline": {
+    "processors": [
+      {
+        "pipeline": {
+          "name": "remove_similarity"
+        }
+      }
+    ]
+  },
+  "docs": [
+    {
+      "_source": {
+        "hello.similarity": "pika",
+        "hello": {
+          "similarity": "pikachu"
+        },
+        "age": 12,
+        "height": 171
+      }
+    }
+  ]
+}
+```
+In fact only pikachu was removed:
+```json
+{
+  "docs": [
+    {
+      "processor_results": [
+        {
+          "processor_type": "pipeline",
+          "status": "success"
+        },
+        {
+          "processor_type": "remove",
+          "status": "success",
+          "doc": {
+            "_index": "_index",
+            "_version": "-3",
+            "_id": "_id",
+            "_source": {
+              "hello.similarity": "pika",
+              "hello": {},
+              "age": 12,
+              "height": 171
+            },
+            "_ingest": {
+              "pipeline": "remove_similarity",
+              "timestamp": "2024-05-21T03:45:36.742463115Z"
+            }
+          }
+        }
+      ]
+    }
+  ]
+}
+```
+A likely explanation: the simulate api does not convert the `hello.similarity` field into an object, and instead treats the supplied doc as the final doc as-is.
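
The observed result is consistent with a path-based lookup that splits the field name on dots and walks nested objects. The Python sketch below is a toy model of that idea, not Elasticsearch's implementation: `remove_by_path` is a hypothetical helper, and whether the real remove processor works exactly this way is an assumption.

```python
import copy


def remove_by_path(source, path, ignore_missing=True):
    """Toy path-based remove: split the path on dots and walk nested dicts.

    A top-level key that literally contains a dot ("hello.similarity")
    is never visited, because the walk looks up "hello" and then "similarity".
    """
    doc = copy.deepcopy(source)  # leave the caller's document untouched
    *parents, last = path.split(".")
    node = doc
    for part in parents:
        node = node.get(part) if isinstance(node, dict) else None
        if node is None:
            break
    if isinstance(node, dict) and last in node:
        del node[last]
    elif not ignore_missing:
        raise KeyError(path)
    return doc


# Same _source as in the simulate request above.
source = {
    "hello.similarity": "pika",
    "hello": {"similarity": "pikachu"},
    "age": 12,
    "height": 171,
}
result = remove_by_path(source, "hello.similarity")
print(result)
```

In this model the nested value is removed and the literal dotted key survives, matching the simulate output. Elasticsearch also ships a `dot_expander` ingest processor for expanding dotted field names into objects; whether placing it before remove gives the expected result here has not been tested in this post.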
+
 # processor
 processors are the soul of a pipeline!

@@ -416,3 +498,4 @@ PUT _ingest/pipeline/branding
 # Performance analysis
 It can also report how often each pipeline is used and how much time it takes, which is impressive:
 - https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html#get-pipeline-usage-stats
+
