enhance: make pchannel level flusher #39275

Merged

@chyezh chyezh commented Jan 15, 2025

issue: #38399

  • Add a pchannel-level checkpoint for flush processing
  • Refactor how the WAL flushers are recovered
  • Create one shared WAL scanner first, then build multiple datasyncservice instances on top of it
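
The third point can be pictured with a minimal sketch, assuming a single ordered message stream per pchannel and one consumer per vchannel. The `msg`, `service`, and `dispatch` names are hypothetical and are not taken from this PR; the real flusher works with Milvus WAL messages and datasyncservice objects.

```go
package main

import "fmt"

// msg is a hypothetical WAL message carrying only an ID and its vchannel.
type msg struct {
	ID       int64
	VChannel string
}

// service is a hypothetical stand-in for a per-vchannel data sync service.
type service struct{ received []int64 }

func (s *service) handle(m msg) { s.received = append(s.received, m.ID) }

// dispatch drains one shared stream and routes every message to the service
// registered for its vchannel. Messages for unknown vchannels are skipped.
// It returns the ID of the last message it scanned.
func dispatch(stream <-chan msg, services map[string]*service) (last int64) {
	for m := range stream {
		if svc, ok := services[m.VChannel]; ok {
			svc.handle(m)
		}
		last = m.ID
	}
	return last
}

func main() {
	stream := make(chan msg, 3)
	stream <- msg{ID: 1, VChannel: "v1"}
	stream <- msg{ID: 2, VChannel: "v2"}
	stream <- msg{ID: 3, VChannel: "v1"}
	close(stream)

	services := map[string]*service{"v1": &service{}, "v2": &service{}}
	last := dispatch(stream, services)
	fmt.Println(last, services["v1"].received, services["v2"].received)
}
```

Because only one reader exists in this sketch, scan progress can be tracked in a single place instead of once per vchannel.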

@sre-ci-robot sre-ci-robot added area/internal-api area/test sig/testing test/integration integration test size/XXL Denotes a PR that changes 1000+ lines. labels Jan 15, 2025
@mergify mergify bot added the dco-passed DCO check passed. label Jan 15, 2025

mergify bot commented Jan 15, 2025

@chyezh

Invalid PR Title Format Detected

Your PR submission does not adhere to our required standards. To ensure clarity and consistency, please meet the following criteria:

  1. Title Format: The PR title must begin with one of these prefixes:
     • feat: for introducing a new feature.
     • fix: for bug fixes.
     • enhance: for improvements to existing functionality.
     • test: for adding tests to existing functionality.
     • doc: for modifying documentation.
     • auto: for pull requests from bots.
  2. Description Requirement: The PR must include a non-empty description, detailing the changes and their impact.

Required Title Structure:

[Type]: [Description of the PR]

Where Type is one of feat, fix, enhance, test or doc.

Example:

enhance: improve search performance significantly 

Please review and update your PR to comply with these guidelines.

codecov bot commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 79.68338% with 154 lines in your changes missing coverage. Please review.

Project coverage is 80.16%. Comparing base (2d9bef4) to head (b172067).
Report is 12 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...mingnode/server/flusher/flusherimpl/wal_flusher.go | 67.56% | 37 Missing and 11 partials ⚠️ |
| ...e/server/flusher/flusherimpl/flusher_components.go | 74.37% | 31 Missing and 10 partials ⚠️ |
| ...l/streamingnode/server/flusher/flusherimpl/util.go | 68.49% | 18 Missing and 5 partials ⚠️ |
| .../server/flusher/flusherimpl/pchannel_checkpoint.go | 85.84% | 12 Missing and 3 partials ⚠️ |
| internal/metastore/kv/streamingnode/kv_catalog.go | 52.17% | 10 Missing and 1 partial ⚠️ |
| internal/flushcommon/pipeline/data_sync_service.go | 66.66% | 2 Missing and 2 partials ⚠️ |
| internal/rootcoord/meta_table.go | 0.00% | 4 Missing ⚠️ |
| ...server/flusher/flusherimpl/data_service_wrapper.go | 91.30% | 2 Missing ⚠️ |
| pkg/streaming/util/message/message_id.go | 60.00% | 1 Missing and 1 partial ⚠️ |
| pkg/streaming/util/message/test_case.go | 97.05% | 1 Missing and 1 partial ⚠️ |

... and 2 more
Additional details and impacted files

@@             Coverage Diff             @@
##           master   #39275       +/-   ##
===========================================
+ Coverage   69.39%   80.16%   +10.76%     
===========================================
  Files         302     1480     +1178     
  Lines       27077   204774   +177697     
===========================================
+ Hits        18790   164152   +145362     
- Misses       8287    34773    +26486     
- Partials        0     5849     +5849     
| Components | Coverage Δ |
|---|---|
| Client | 79.44% <ø> (∅) |
| Core | 69.39% <ø> (ø) |
| Go | 81.89% <79.46%> (∅) |

| Files with missing lines | Coverage Δ |
|---|---|
| internal/flushcommon/syncmgr/sync_manager.go | 91.13% <100.00%> (ø) |
| internal/flushcommon/util/checkpoint_updater.go | 88.57% <100.00%> (ø) |
| internal/metastore/catalog.go | 100.00% <ø> (ø) |
| ...erycoordv2/balance/channel_level_score_balancer.go | 89.24% <ø> (ø) |
| internal/streamingnode/server/builder.go | 100.00% <100.00%> (ø) |
| .../server/flusher/flusherimpl/vchannel_checkpoint.go | 100.00% <100.00%> (ø) |
| internal/streamingnode/server/resource/resource.go | 96.96% <100.00%> (ø) |
| internal/streamingnode/server/server.go | 93.10% <ø> (ø) |
| ...ternal/streamingnode/server/wal/adaptor/builder.go | 61.53% <100.00%> (ø) |
| ...al/streamingnode/server/wal/adaptor/wal_adaptor.go | 93.45% <100.00%> (ø) |

... and 20 more

... and 1148 files with indirect coverage changes

mergify bot commented Jan 23, 2025

@chyezh go-sdk check failed, comment rerun go-sdk can trigger the job again.

@chyezh chyezh force-pushed the enhance_make_pchannel_lv_flusher branch from 44e5e19 to 22ad3ac Compare January 23, 2025 10:03

mergify bot commented Jan 23, 2025

@chyezh go-sdk check failed, comment rerun go-sdk can trigger the job again.

mergify bot commented Jan 23, 2025

@chyezh E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@chyezh chyezh force-pushed the enhance_make_pchannel_lv_flusher branch from 22ad3ac to e2f0c28 Compare February 5, 2025 08:14

mergify bot commented Feb 5, 2025

@chyezh E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

mergify bot commented Feb 5, 2025

@chyezh cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

chyezh commented Feb 5, 2025

/run-cpu-e2e

mergify bot commented Feb 5, 2025

@chyezh go-sdk check failed, comment rerun go-sdk can trigger the job again.

@chyezh chyezh force-pushed the enhance_make_pchannel_lv_flusher branch 2 times, most recently from 1c8d96c to 35c53a9 Compare February 5, 2025 12:35

mergify bot commented Feb 5, 2025

@chyezh go-sdk check failed, comment rerun go-sdk can trigger the job again.

mergify bot commented Feb 5, 2025

@chyezh E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@chyezh chyezh force-pushed the enhance_make_pchannel_lv_flusher branch from 35c53a9 to ec9d889 Compare February 6, 2025 02:41

mergify bot commented Feb 6, 2025

@chyezh cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

mergify bot commented Feb 6, 2025

@chyezh go-sdk check failed, comment rerun go-sdk can trigger the job again.

@chyezh chyezh force-pushed the enhance_make_pchannel_lv_flusher branch from 13e31a0 to 86e9c2a Compare February 7, 2025 03:43
@mergify mergify bot added ci-passed and removed ci-passed labels Feb 7, 2025

mergify bot commented Feb 7, 2025

@chyezh go-sdk check failed, comment rerun go-sdk can trigger the job again.

@chyezh chyezh force-pushed the enhance_make_pchannel_lv_flusher branch from 86e9c2a to c020806 Compare February 7, 2025 07:15

chyezh commented Feb 7, 2025

rerun ut

@chyezh chyezh changed the title from "enhance: make pchannel level flusher" to "enhance: make pchannel level flusher" Feb 7, 2025
@mergify mergify bot added kind/enhancement Issues or changes related to enhancement and removed do-not-merge/invalid-pr-format labels Feb 7, 2025

mergify bot commented Feb 7, 2025

@chyezh E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

chyezh commented Feb 7, 2025

/run-cpu-e2e

chyezh commented Feb 7, 2025

/run-cpu-e2e

@chyezh chyezh force-pushed the enhance_make_pchannel_lv_flusher branch from c020806 to 3e8866b Compare February 7, 2025 10:03

mergify bot commented Feb 7, 2025

@chyezh go-sdk check failed, comment rerun go-sdk can trigger the job again.

- Add a pchannel level checkpoint for flush processing
- Refactor the recovery of flushers of wal
- make a shared wal scanner first, then make multi datasyncservice on it

Signed-off-by: chyezh <[email protected]>
@chyezh chyezh force-pushed the enhance_make_pchannel_lv_flusher branch from 3e8866b to b172067 Compare February 7, 2025 12:55

mergify bot commented Feb 7, 2025

@chyezh go-sdk check failed, comment rerun go-sdk can trigger the job again.

chyezh commented Feb 7, 2025

rerun go-sdk

@mergify mergify bot added the ci-passed label Feb 7, 2025
        startMessageID = message.MustUnmarshalMessageID(walName, checkpoint.MessageID.Id)
        previous = startMessageID
    } else {
        startMessageID = vchannelManager.MinimumCheckpoint()

Contributor

Could vchannelManager.MinimumCheckpoint() possibly be nil?

Contributor Author

Yes. If there is no vchannel, the scan operation consumes from the beginning of the WAL.
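
The fallback order discussed in this thread can be sketched as follows, assuming a nil start position means "scan from the earliest message". `MessageID`, `intID`, `chooseStart`, and `describe` are hypothetical names for illustration; the real code uses the Milvus message ID type and `vchannelManager.MinimumCheckpoint()`.

```go
package main

import "fmt"

// MessageID is a hypothetical stand-in for the WAL message ID type.
type MessageID interface {
	String() string
}

// intID is a toy message ID used only in this sketch.
type intID int64

func (i intID) String() string { return fmt.Sprintf("%d", int64(i)) }

// chooseStart prefers the pchannel checkpoint, then the minimum vchannel
// checkpoint. A nil result means no position is known, so the caller is
// assumed to scan from the beginning of the WAL.
func chooseStart(pchannelCheckpoint, vchannelMinimum MessageID) MessageID {
	if pchannelCheckpoint != nil {
		return pchannelCheckpoint
	}
	return vchannelMinimum
}

// describe renders the chosen start position for logging.
func describe(start MessageID) string {
	if start == nil {
		return "earliest"
	}
	return "at " + start.String()
}

func main() {
	fmt.Println(describe(chooseStart(intID(42), intID(7))))
	fmt.Println(describe(chooseStart(nil, intID(7))))
	fmt.Println(describe(chooseStart(nil, nil)))
}
```

Handling nil explicitly at the call site keeps the "no vchannel yet" case visible to readers instead of relying on the scanner's default.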

        impl.logger.DPanic("the message type is not CreateCollectionMessage", zap.Error(err))
        return nil
    }
    impl.flusherComponents.WhenCreateCollection(createCollectionMsg)

Contributor

What if the create collection message is produced successfully but the collection creation fails? I think we should add a rollback mechanism.

Contributor Author

Got it, I will modify rootcoord's current create collection logic.
If a rollback happens, rootcoord must guarantee that it sends a drop collection message to the WAL.

Contributor Author

@chyezh chyezh Feb 8, 2025

I will implement it in another PR.

@sre-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: chyezh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jaime0815

/lgtm

@sre-ci-robot sre-ci-robot merged commit d3e32bb into milvus-io:master Feb 10, 2025
19 of 20 checks passed
@chyezh chyezh deleted the enhance_make_pchannel_lv_flusher branch February 10, 2025 08:47
            dataServices[vchannel] = ds.(*dataSyncServiceWrapper)
            continue
        }
        if firstErr == nil {

Contributor

Return the error directly as soon as any error is non-nil.

        return err
    }
    // The channel has been dropped, skip to recover it.
    if len(resp.GetInfo().GetSeekPosition().GetMsgID()) == 0 && resp.GetInfo().GetSeekPosition().GetTimestamp() == math.MaxUint64 {

Contributor

Wrapping this check in an isDroppedChannel method would make the implicit conditions that identify a dropped channel explicit.

Labels
approved area/internal-api area/test ci-passed dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement lgtm sig/testing size/XXL Denotes a PR that changes 1000+ lines. test/integration integration test

4 participants