
What is the relationship between a segment and a binlog? #39032

Closed Answered by yhmo
ivivi asked this question in Q&A and General discussion
Jan 7, 2025 · 1 comments · 3 replies

Most users don't need to understand this in such detail, and it is somewhat tedious to explain.

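When enough data has accumulated for a binlog, that binlog is written to S3. Once a segment reaches a bit over one hundred megabytes it is sealed, meaning no more data will be written into that segment; by then, those hundred-plus megabytes have in fact already been written to S3. The flush that the client calls really means: convert all current growing segments to sealed, and write to S3 any binlogs of those growing segments that have not yet been persisted.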
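To make the two thresholds concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described above, not Milvus code: the `Segment` class, the `flush` helper, and the tiny byte thresholds are all made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds, kept tiny so the example is easy to trace.
BINLOG_FLUSH_BYTES = 4    # buffered bytes that trigger one binlog write
SEGMENT_SEAL_BYTES = 10   # total bytes at which a segment gets sealed


@dataclass
class Segment:
    state: str = "growing"
    buffer: int = 0                              # bytes not yet persisted
    binlogs: list = field(default_factory=list)  # sizes of binlogs already in object storage

    @property
    def size(self) -> int:
        return sum(self.binlogs) + self.buffer

    def persist_buffer(self) -> None:
        """Write the buffered bytes as one binlog (stand-in for an S3 write)."""
        if self.buffer > 0:
            self.binlogs.append(self.buffer)
            self.buffer = 0

    def insert(self, nbytes: int) -> None:
        if self.state != "growing":
            raise RuntimeError("a sealed segment accepts no more writes")
        self.buffer += nbytes
        if self.buffer >= BINLOG_FLUSH_BYTES:
            self.persist_buffer()
        if self.size >= SEGMENT_SEAL_BYTES:
            # By the time a segment is sealed, all of its data is persisted.
            self.persist_buffer()
            self.state = "sealed"


def flush(segments: list) -> None:
    """Toy client flush: persist pending binlogs and seal every growing segment."""
    for seg in segments:
        if seg.state == "growing":
            seg.persist_buffer()
            seg.state = "sealed"
```

In this model a segment can become sealed in two ways: by growing past `SEGMENT_SEAL_BYTES`, or because `flush` was called while it was still small. In both cases nothing is left in the in-memory buffer afterwards.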

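Suppose a growing segment has written 3 binlog files and the 4th binlog has not yet reached the size needed to be persisted. If Milvus crashes at this point, then after the restart the datanode has to pull the data for the 4th binlog from Kafka/Pulsar again and then wait until it reaches the persistence threshold.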
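The recovery scenario can be sketched the same way. This is a simplified illustration under assumptions of my own: `MessageLog` stands in for a Kafka/Pulsar topic, `DataNode.recover` is a hypothetical helper, and the replay position is derived simply from how many rows are already persisted. Real Milvus tracks checkpoints differently.

```python
BINLOG_FLUSH_ROWS = 3  # hypothetical threshold: rows per binlog


class MessageLog:
    """Stand-in for a Kafka/Pulsar topic: an append-only list of rows."""

    def __init__(self):
        self.rows = []

    def append(self, row):
        self.rows.append(row)

    def read_from(self, offset):
        return self.rows[offset:]


class DataNode:
    def __init__(self, log, storage):
        self.log = log
        self.storage = storage  # persisted binlogs; survives a crash
        self.buffer = []        # in-memory rows; lost on a crash

    def consume(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= BINLOG_FLUSH_ROWS:
            # Enough rows for one binlog: persist it and clear the buffer.
            self.storage.append(list(self.buffer))
            self.buffer.clear()

    def recover(self):
        # Everything already in a binlog is safe; replay only what came after.
        persisted = sum(len(binlog) for binlog in self.storage)
        for row in self.log.read_from(persisted):
            self.consume(row)


def write(log, node, row):
    """A write goes to the message log first, then the datanode consumes it."""
    log.append(row)
    node.consume(row)
```

For example, after 11 writes there are 3 persisted binlogs and 2 rows still in memory. A fresh `DataNode` built on the same `log` and `storage` starts with an empty buffer, and calling `recover()` pulls those 2 rows back from the log so the 4th binlog can keep filling up.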

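Indexes are built at segment granularity, and only a sealed segment can be indexed: the index node reads all of that segment's binlogs into memory and then builds the index.
Once a segment's index has been built, the querynode is notified; it reads that segment's index files from S3 and loads the index data into memory.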
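The index flow can be sketched as follows. Again this is only an illustrative model, not the actual Milvus implementation: the dict used as object storage, `build_index`, `QueryNode.on_index_built`, the key layout, and the sorted list standing in for a real vector index are all assumptions.

```python
def build_index(store, segment_id, state, binlog_keys):
    """Toy index node: read every binlog of a sealed segment and build one index."""
    if state != "sealed":
        raise ValueError("only a sealed segment can be indexed")
    rows = []
    for key in binlog_keys:
        rows.extend(store[key])      # read all binlogs into memory
    index_key = f"index/{segment_id}"
    store[index_key] = sorted(rows)  # a sorted list stands in for a real index
    return index_key


class QueryNode:
    def __init__(self, store):
        self.store = store
        self.loaded = {}  # segment_id -> index data held in memory

    def on_index_built(self, segment_id, index_key):
        """Called when the index is ready: load the index file from storage."""
        self.loaded[segment_id] = self.store[index_key]


# `store` is a plain dict playing the role of S3.
store = {"seg1/binlog0": [5, 1], "seg1/binlog1": [4], "seg1/binlog2": [3, 2]}
keys = ["seg1/binlog0", "seg1/binlog1", "seg1/binlog2"]
index_key = build_index(store, "seg1", "sealed", keys)

query_node = QueryNode(store)
query_node.on_index_built("seg1", index_key)
```

The point of the sketch is the ordering: the segment must be sealed first, the index is built from the complete set of binlogs, and the querynode only loads the finished index file rather than the raw binlogs.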

yhmo Jan 7, 2025
Collaborator

Answer selected by ivivi