Fixes for RDMA #65
…c_rdma_res()) returns err.
…_irq methods, to prevent the interrupt state from being changed. One example is softirqs being inadvertently re-enabled inside an IRQ handler, via _raw_spin_unlock_bh() called from dtr_send_flow_control_msg():

    [ 84.546271] ------------[ cut here ]------------
    [ 84.546290] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:362 __local_bh_enable_ip+0x3a/0x60
    [ 84.546313] Modules linked in: drbd_transport_rdma(O) drbd(O) ip6table_nat iptable_nat nf_nat bpfilter nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c rdma_ucm rdma_cm ib_cm iw_cm ib_umad ib_ipoib mlx4_ib kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd nvme nvme_core mlx4_core ib_uverbs ib_core dummy bonding [last unloaded: drbd]
    [ 84.546374] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W O 5.15.75 #2
    [ 84.546386] Hardware name: Insyde Grantley/Analytic Blade Board, BIOS 05.04.21.0038.00.011 05/09/2018
    [ 84.546394] RIP: 0010:__local_bh_enable_ip+0x3a/0x60
    [ 84.546406] Code: a9 00 00 0f 00 75 23 83 ee 01 f7 de 65 01 35 5d 6a f9 7e 65 8b 05 56 6a f9 7e a9 00 ff ff 00 74 0d 5d 65 ff 0d 47 6a f9 7e c3 <0f> 0b eb d9 65 66 8b 05 fa 5a fa 7e 66 85 c0 74 e6 e8 20 ff ff ff
    [ 84.546419] RSP: 0018:ffff88fe7f805c80 EFLAGS: 00010206
    [ 84.546429] RAX: 0000000080010200 RBX: ffff888104de3000 RCX: 0000000000000001
    [ 84.546437] RDX: ffff888104de3238 RSI: 0000000000000200 RDI: ffffffffa022c3f0
    [ 84.546445] RBP: ffff88fe7f805c80 R08: 000000000000000e R09: 0000000000000535
    [ 84.546452] R10: 0000000000000001 R11: ffff8881082a0450 R12: 0000000000000000
    [ 84.546460] R13: 0000000000000a20 R14: 0000000000000021 R15: ffff888104de323c
    [ 84.546468] FS: 0000000000000000(0000) GS:ffff88fe7f800000(0000) knlGS:0000000000000000
    [ 84.546477] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 84.546486] CR2: 00005586e40e7688 CR3: 000000000220a006 CR4: 00000000003706f0
    [ 84.546494] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 84.546502] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 84.546509] Call Trace:
    [ 84.546517] <IRQ>
    [ 84.546525] _raw_spin_unlock_bh+0x1a/0x20
    [ 84.546555] dtr_send_flow_control_msg+0xb0/0x410 [drbd_transport_rdma]
    [ 84.546566] ? mlx4_ib_post_recv+0x10/0x20 [mlx4_ib]
    [ 84.546582] dtr_recycle_rx_desc.constprop.0+0xb7/0xc0 [drbd_transport_rdma]
    [ 84.546591] dtr_control_data_ready+0xbe/0xe0 [drbd_transport_rdma]
    [ 84.546599] dtr_rx_cq_event_handler+0x413/0x5e0 [drbd_transport_rdma]
    [ 84.546607] mlx4_ib_cq_comp+0x20/0x30 [mlx4_ib]
    [ 84.546618] mlx4_cq_completion+0x42/0x60 [mlx4_core]
    [ 84.546650] mlx4_eq_int+0x1d4/0x7f0 [mlx4_core]
    [ 84.546670] ? note_gp_changes+0x60/0x70
    [ 84.546681] mlx4_msi_x_interrupt+0x11/0x20 [mlx4_core]
    [ 84.546701] ? mlx4_msi_x_interrupt+0x11/0x20 [mlx4_core]
    [ 84.546722] __handle_irq_event_percpu+0x3f/0x150
    [ 84.546731] handle_irq_event+0x4d/0xb0
    [ 84.546739] handle_edge_irq+0x94/0x1f0
    [ 84.546745] __common_interrupt+0x44/0xa0
    [ 84.546752] common_interrupt+0x85/0xa0
    [ 84.546761] </IRQ>
    [ 84.546765] <TASK>
    [ 84.546769] asm_common_interrupt+0x27/0x40
    [ 84.546776] RIP: 0010:cpuidle_enter_state+0xd3/0x350
    [ 84.546784] Code: 89 c6 0f 1f 44 00 00 31 ff e8 59 15 a0 ff 80 7d d7 00 74 12 9c 58 f6 c4 02 0f 85 69 02 00 00 31 ff e8 f1 46 a5 ff fb 45 85 ff <0f> 88 fa 00 00 00 49 63 cf 4c 8b 55 c8 48 8d 04 49 48 8d 14 81 48
    [ 84.546793] RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
    [ 84.546798] RAX: ffff88fe7f82a580 RBX: ffffe8ffff402888 RCX: 000000000000001f
    [ 84.546804] RDX: 00000013af5996af RSI: 000000003d17f3e5 RDI: 0000000000000000
    [ 84.546809] RBP: ffffffff82203e28 R08: 0000000000000002 R09: ffff88fe7f8294c4
    [ 84.546814] R10: 0000000000000018 R11: 0000000000000067 R12: 0000000000000004
    [ 84.546819] R13: ffffffff82377ce0 R14: 00000013af5996af R15: 0000000000000004
    [ 84.546826] cpuidle_enter+0x2e/0x40
    [ 84.546834] do_idle+0x1ca/0x220
    [ 84.546842] cpu_startup_entry+0x1d/0x20
    [ 84.546849] rest_init+0xbf/0xd0
    [ 84.546854] arch_call_rest_init+0xe/0x1b
    [ 84.546871] start_kernel+0x65f/0x685
    [ 84.546879] x86_64_start_reservations+0x24/0x26
    [ 84.546887] x86_64_start_kernel+0x9c/0x9f
    [ 84.546894] secondary_startup_64_no_verify+0xc2/0xcb
    [ 84.546903] </TASK>
    [ 84.546907] ---[ end trace baa9db6983265450 ]---
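The difference between unconditional unlock variants and save/restore variants can be shown with a small userspace model. This is a sketch, not kernel code: a single boolean stands in for the CPU's local interrupt-enable state, and every function name below is hypothetical. In the kernel itself the save/restore pair is spin_lock_irqsave(lock, flags) / spin_unlock_irqrestore(lock, flags); spin_unlock_irq() always re-enables interrupts, and spin_unlock_bh() must not be used from hard-IRQ context at all, which is what the warning above flags.

```c
#include <stdbool.h>

/* Hypothetical model: this flag stands in for the CPU's local
 * interrupt-enable state. Not real kernel code. */
static bool irqs_enabled = true;

/* Models spin_lock_irq()/spin_unlock_irq(): unlock unconditionally
 * re-enables interrupts, whatever the state was at lock time. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Models spin_lock_irqsave()/spin_unlock_irqrestore(): the caller's
 * previous state is saved in flags and restored exactly. */
static void model_lock_irqsave(bool *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
}

static void model_unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* A helper (think: a send routine) reachable from process context and
 * from an interrupt handler. Each returns the interrupt state it leaves. */
static bool send_msg_irq(void)
{
    model_lock_irq();
    /* ... critical section ... */
    model_unlock_irq();
    return irqs_enabled;
}

static bool send_msg_irqsave(void)
{
    bool flags;
    model_lock_irqsave(&flags);
    /* ... critical section ... */
    model_unlock_irqrestore(flags);
    return irqs_enabled;
}

/* Simulates an interrupt handler calling the helper: interrupts are off
 * on entry and back on after the handler returns. Returns the state the
 * handler sees right after the helper returns. */
static bool call_from_irq_handler(bool (*helper)(void))
{
    irqs_enabled = false;
    bool after = helper();
    irqs_enabled = true;
    return after;
}
```

With the unconditional form, the handler continues with interrupts enabled (the bug); with the save/restore form, it continues with them still disabled, and the same helper still behaves correctly from process context.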
…ot yet started drbd.
…ace condition possibly ending in GPF or NPR
    drbd testres1: Preparing cluster-wide state change 1915447682 (0->-1 3/2)
    drbd testres1: State change 1915447682: primary_nodes=0, weak_nodes=0
    drbd testres1: Committing cluster-wide state change 1915447682 (0ms)
    drbd testres1: role( Primary -> Secondary )
    drbd testres1: Preparing cluster-wide state change 4059059782 (0->1 496/16)
    drbd testres1: State change 4059059782: primary_nodes=0, weak_nodes=0
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: Cluster is now split
    drbd testres1: Committing cluster-wide state change 4059059782 (0ms)
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
    drbd testres1/0 drbd2 ybos-00000000-0000-0000-0000-38b8ebd03c78: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: sock_recvmsg returned -4
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: Terminating sender thread
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: Starting sender thread (from drbd_r_testres1 [3923])
    BUG: kernel NULL pointer dereference, address: 0000000000000008
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP
    CPU: 0 PID: 16 Comm: kworker/0:1 Tainted: G O 5.15.75 #3
    Hardware name: Insyde Grantley/Analytic Blade Board, BIOS 05.04.21.0038.00.011 05/09/2018
    Workqueue: events dtr_end_rx_work_fn [drbd_transport_rdma]
    RIP: 0010:dtr_free_rx_desc.part.0+0x15/0xa0 [drbd_transport_rdma]
    Code: 00 48 89 d7 e8 8c 6f 2a e1 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 4c 8b 67 20 48 89 fb <49> 8b 44 24 08 4d 8b 6c 24 10 48 8b 00 48 8b 38 48 85 ff 74 1f 49
    RSP: 0018:ffff888100c0fe30 EFLAGS: 00010082
    RAX: ffff888460e46b88 RBX: ffff888460e46b80 RCX: ffff888460e46bc8
    RDX: 0000000000000001 RSI: 807fffffffffffff RDI: ffff888460e46b80
    RBP: ffff888100c0fe48 R08: ffff8881102cb728 R09: ffff88810006b1b4
    R10: 0000000000000018 R11: fefefefefefefeff R12: 0000000000000000
    R13: ffff8881102cb718 R14: ffff888460e46bc0 R15: ffff88fe7f829f00
    FS: 0000000000000000(0000) GS:ffff88fe7f800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 000000000220a002 CR4: 00000000003706f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    <TASK>
    dtr_end_rx_work_fn+0x48/0x70 [drbd_transport_rdma]
    process_one_work+0x1e4/0x390
    worker_thread+0x50/0x3e0
    ? rescuer_thread+0x3a0/0x3a0
    kthread+0x12a/0x150
    ? set_kthread_struct+0x50/0x50
    ret_from_fork+0x1f/0x30
    </TASK>
    Modules linked in: ext4 mbcache jbd2 drbd_transport_rdma(O) drbd(O) ip6table_nat iptable_nat nf_nat bpfilter nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c rdma_ucm rdma_cm ib_cm iw_cm ib_umad ib_ipoib mlx4_ib kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd nvme nvme_core mlx4_core ib_uverbs ib_core dummy bonding [last unloaded: drbd]
    CR2: 0000000000000008
    ---[ end trace 5d134c4748bcd1c9 ]---
    RIP: 0010:dtr_free_rx_desc.part.0+0x15/0xa0 [drbd_transport_rdma]
    Code: 00 48 89 d7 e8 8c 6f 2a e1 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 4c 8b 67 20 48 89 fb <49> 8b 44 24 08 4d 8b 6c 24 10 48 8b 00 48 8b 38 48 85 ff 74 1f 49
    RSP: 0018:ffff888100c0fe30 EFLAGS: 00010082
    RAX: ffff888460e46b88 RBX: ffff888460e46b80 RCX: ffff888460e46bc8
    RDX: 0000000000000001 RSI: 807fffffffffffff RDI: ffff888460e46b80
    RBP: ffff888100c0fe48 R08: ffff8881102cb728 R09: ffff88810006b1b4
    R10: 0000000000000018 R11: fefefefefefefeff R12: 0000000000000000
    R13: ffff8881102cb718 R14: ffff888460e46bc0 R15: ffff88fe7f829f00
    FS: 0000000000000000(0000) GS:ffff88fe7f800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 000000000220a002 CR4: 00000000003706f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Kernel panic - not syncing: Fatal exception
    Kernel Offset: disabled
    Rebooting in 5 seconds..
This can be reproduced within about 10-20 iterations of:

    while true; do drbdadm up testres1 && sleep 2 && drbdadm down testres1 && sleep 2; done

The key is that testres1.res points to a peer node that either doesn't exist or hasn't booted yet.

    drbd testres1/0 drbd2: disk( UpToDate -> Detaching )
    drbd testres1/0 drbd2: disk( Detaching -> Diskless )
    drbd testres1/0 drbd2: drbd_bm_resize called with capacity == 0
    drbd testres1: Terminating worker thread
    drbd testres1: Starting worker thread (from drbdsetup [4990])
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: Starting sender thread (from drbdsetup [4997])
    drbd testres1/0 drbd2: meta-data IO uses: blk-bio
    drbd testres1/0 drbd2: disk( Diskless -> Attaching )
    drbd testres1/0 drbd2: Maximum number of peer devices = 1
    drbd testres1: Method to ensure write ordering: flush
    drbd testres1/0 drbd2: drbd_bm_resize called with capacity == 7501244792
    drbd testres1/0 drbd2: resync bitmap: bits=937655599 words=14650869 pages=28615
    drbd2: detected capacity change from 0 to 7501244792
    drbd testres1/0 drbd2: size = 3577 GB (3750622396 KB)
    drbd testres1/0 drbd2: size = 3577 GB (3750622396 KB)
    drbd testres1/0 drbd2: recounting of set bits took additional 40ms
    drbd testres1/0 drbd2: disk( Attaching -> UpToDate )
    drbd testres1/0 drbd2: attached to current UUID: 027B94FF1B3EC8D4
    drbd testres1/0 drbd2: Setting exposed data uuid: 027B94FF1B3EC8D4
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: conn( StandAlone -> Unconnected )
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: Starting receiver thread (from drbd_w_testres1 [4991])
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: conn( Unconnected -> Connecting )
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: conn( Connecting -> Disconnecting )
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: Failed to initiate connection, err=-512
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: Terminating sender thread
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: Starting sender thread (from drbd_r_testres1 [5010])
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: Connection closed
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: conn( Disconnecting -> StandAlone )
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: Terminating receiver thread
    drbd testres1 ybos-00000000-0000-0000-0000-38b8ebd03c78: Terminating sender thread
    BUG: kernel NULL pointer dereference, address: 0000000000000208
    PGD 0 P4D 0
    Oops: 0002 [#1] SMP
    CPU: 11 PID: 0 Comm: swapper/11 Tainted: G O 5.15.75 #3
    Hardware name: Insyde Grantley/Analytic Blade Board, BIOS 05.04.21.0038.00.011 05/09/2018
    RIP: 0010:__run_timers+0x1df/0x280
    Code: 48 c7 43 08 00 00 00 00 48 85 c0 0f 84 86 00 00 00 49 8b 0c 24 48 89 4b 08 66 90 48 8b 01 48 8b 51 08 48 89 02 48 85 c0 74 04 <48> 89 50 08 48 c7 41 08 00 00 00 00 48 8b 71 18 4c 89 31 f6 41 22
    RSP: 0018:ffff88fe7fac5ed0 EFLAGS: 00010006
    RAX: 0000000000000200 RBX: ffff88fe7fadb740 RCX: ffff88810bf17578
    RDX: ffff88fe7fac5ef8 RSI: 0000000140003d80 RDI: ffff88fe7fadb768
    RBP: ffff88fe7fac5f68 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000001 R12: ffff88fe7fac5ef8
    R13: 0000000100003d80 R14: dead000000000122 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff88fe7fac0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000208 CR3: 000000000220a002 CR4: 00000000003706e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    <IRQ>
    run_timer_softirq+0x1d/0x40
    __do_softirq+0xc6/0x27d
    irq_exit_rcu+0x86/0xb0
    sysvec_apic_timer_interrupt+0x78/0xa0
    </IRQ>
    <TASK>
    asm_sysvec_apic_timer_interrupt+0x1b/0x20
    RIP: 0010:cpuidle_enter_state+0xd3/0x350
    Code: 89 c6 0f 1f 44 00 00 31 ff e8 59 15 a0 ff 80 7d d7 00 74 12 9c 58 f6 c4 02 0f 85 69 02 00 00 31 ff e8 f1 46 a5 ff fb 45 85 ff <0f> 88 fa 00 00 00 49 63 cf 4c 8b 55 c8 48 8d 04 49 48 8d 14 81 48
    RSP: 0018:ffff8881010dbe70 EFLAGS: 00000202
    RAX: ffff88fe7faea580 RBX: ffffe8ffff6c2888 RCX: 000000000000001f
    RDX: 0000006a8769d584 RSI: 000000003d17f1fb RDI: 0000000000000000
    RBP: ffff8881010dbea8 R08: 0000000000000002 R09: ffff88fe7fae94a4
    R10: 0000000000000008 R11: 0000000000066863 R12: 0000000000000004
    R13: ffffffff82377ce0 R14: 0000006a8769d584 R15: 0000000000000004
    cpuidle_enter+0x2e/0x40
    do_idle+0x1ca/0x220
    cpu_startup_entry+0x1d/0x20
    start_secondary+0xe1/0xf0
    secondary_startup_64_no_verify+0xc2/0xcb
    </TASK>
    Modules linked in: drbd_transport_rdma(O) ip6table_nat iptable_nat nf_nat bpfilter drbd(O) nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c rdma_ucm rdma_cm ib_cm iw_cm ib_umad ib_ipoib mlx4_ib kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd nvme nvme_core mlx4_core ib_uverbs ib_core dummy bonding
    CR2: 0000000000000208
    ---[ end trace 9617d0e986125e0b ]---
    RIP: 0010:__run_timers+0x1df/0x280
    Code: 48 c7 43 08 00 00 00 00 48 85 c0 0f 84 86 00 00 00 49 8b 0c 24 48 89 4b 08 66 90 48 8b 01 48 8b 51 08 48 89 02 48 85 c0 74 04 <48> 89 50 08 48 c7 41 08 00 00 00 00 48 8b 71 18 4c 89 31 f6 41 22
    RSP: 0018:ffff88fe7fac5ed0 EFLAGS: 00010006
    RAX: 0000000000000200 RBX: ffff88fe7fadb740 RCX: ffff88810bf17578
    RDX: ffff88fe7fac5ef8 RSI: 0000000140003d80 RDI: ffff88fe7fadb768
    RBP: ffff88fe7fac5f68 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000001 R12: ffff88fe7fac5ef8
    R13: 0000000100003d80 R14: dead000000000122 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff88fe7fac0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000208 CR3: 000000000220a002 CR4: 00000000003706e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Kernel panic - not syncing: Fatal exception in interrupt
    Kernel Offset: disabled
    Rebooting in 5 seconds..
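Both crashes above look like deferred work running after the object it refers to has been torn down: a work item (dtr_end_rx_work_fn) in the first trace, and a timer in the second, where R14 holds a list-poison value. The usual kernel remedy for that pattern is to cancel the deferred work synchronously, for example with cancel_work_sync() or del_timer_sync(), before freeing what it touches; this thread does not show whether the patch does exactly that. The sketch below is a single-threaded userspace model of the pattern, with hypothetical names throughout (struct work, struct conn, cancel_work_item_sync() and so on are not drbd_transport_rdma code).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of deferred work (a workqueue item or a timer):
 * callbacks are queued now and run later, after their scheduler returned. */
struct work {
    void (*fn)(struct work *w);
    bool pending;
};

#define MAX_WORK 8
static struct work *work_queue[MAX_WORK];
static size_t queued;

static void schedule_work_item(struct work *w)
{
    assert(w->fn != NULL);
    if (!w->pending && queued < MAX_WORK) {
        w->pending = true;
        work_queue[queued++] = w;
    }
}

/* Models cancel_work_sync()/del_timer_sync(): once this returns, the item
 * is no longer queued and its callback will not run. (Single-threaded, so
 * there is never an in-flight callback to wait for.) */
static bool cancel_work_item_sync(struct work *w)
{
    for (size_t i = 0; i < queued; i++) {
        if (work_queue[i] == w) {
            work_queue[i] = work_queue[--queued];
            w->pending = false;
            return true;
        }
    }
    return false;
}

/* Runs every queued callback, as a worker thread or timer softirq would. */
static void run_queue(void)
{
    while (queued > 0) {
        struct work *w = work_queue[--queued];
        w->pending = false;
        w->fn(w);
    }
}

/* Hypothetical connection object that owns a deferred-work item.
 * rx_work is the first member, so a struct work * converts back to a
 * struct conn * (the model's stand-in for container_of()). */
struct conn {
    struct work rx_work;
    int *rx_descs;
};

static int callbacks_run;

static void rx_work_fn(struct work *w)
{
    struct conn *c = (struct conn *)w;
    callbacks_run++;
    c->rx_descs[0] = 0; /* a use-after-free if c had already been freed */
}

static struct conn *conn_create(void)
{
    struct conn *c = malloc(sizeof *c);
    if (!c)
        return NULL;
    c->rx_descs = calloc(4, sizeof *c->rx_descs);
    if (!c->rx_descs) {
        free(c);
        return NULL;
    }
    c->rx_work.fn = rx_work_fn;
    c->rx_work.pending = false;
    return c;
}

/* Safe teardown order: cancel pending deferred work first, then free. */
static void conn_destroy(struct conn *c)
{
    cancel_work_item_sync(&c->rx_work);
    free(c->rx_descs);
    free(c);
}
```

Swapping the order in conn_destroy() (free first, cancel never) would leave a queued callback pointing at freed memory, which is the shape of both oopses above.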
In my setup, I need more control-stream buffers to prevent the "Not sending flow_control mgs, no receive window!" message from appearing (usually during resync). This is a hackish way to increase their number; perhaps it should be configurable. The original value was 64. I get better resync performance with 2 (half as many buffers as the data stream).
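Judging by the parenthetical, the value changed here seems to act as a divisor: the control stream gets the data stream's receive-buffer count divided by it, so 64 leaves it 1/64 as many buffers and 2 leaves it half. The sketch below models that sizing under this assumption, together with a simple credit-based check of the kind the warning implies: a flow-control message can only be sent while the peer has free receive buffers posted. All names (control_rx_buffers(), struct flow, try_send_flow_control_msg(), on_peer_reposted()) are hypothetical, not taken from drbd_transport_rdma.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed sizing rule: control-stream buffers are the data-stream
 * buffer count divided by a divisor (64 originally, 2 in this setup). */
static int control_rx_buffers(int data_rx_buffers, int divisor)
{
    assert(divisor > 0);
    int n = data_rx_buffers / divisor;
    return n > 0 ? n : 1; /* keep at least one buffer */
}

/* Credit-based flow control: each message consumes one receive buffer
 * that the peer has posted (its "receive window"). */
struct flow {
    int peer_rx_window; /* buffers the peer currently has posted */
    int skipped;        /* messages not sent for lack of window */
};

static bool try_send_flow_control_msg(struct flow *f)
{
    if (f->peer_rx_window <= 0) {
        f->skipped++;
        fprintf(stderr, "no receive window, not sending\n");
        return false;
    }
    f->peer_rx_window--;
    return true;
}

/* The peer reposts n buffers after consuming messages, reopening the window. */
static void on_peer_reposted(struct flow *f, int n)
{
    f->peer_rx_window += n;
}
```

With few control buffers the window runs dry quickly under load such as a resync, so more messages are skipped; a smaller divisor gives the control stream more headroom at the cost of buffers taken from the overall budget.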
Hi @mtisza! Thanks for your contribution to the LINBIT software!

Development for this project happens on mailing lists, rather than on GitHub - this GitHub repository is a read-only mirror that isn't used for accepting contributions. So that your change can become part of our software, please email it to us as a patch. Here's what to do:

How do I format my contribution?
Firstly, all contributions need to be formatted as patches. A patch is a plain text document showing the change you want to make to the code, and documenting why it is a good idea. You can create patches with …
Secondly, patches need 'commit messages', which are the human-friendly documentation explaining what the change is and why it's necessary.

Who do I send my contribution to?
There are two mailing lists: …
If you're interested in DRBD development, subscribing to these mailing lists is a good idea.

How do I send my contribution?
Use …
For more information about using …

How do I get help if I'm stuck?
Firstly, don't get discouraged, we are here to help! If you are lost in the process, and really tried, you will usually find contact information in header/implementation files, or see who touched the code with …

I sent my patch - now what?
You wait. You can check that your email has been received by checking the mailing list archives for the mailing list you sent your patch to. Messages may not be received instantly, so be patient. Developers are generally very busy people, so it may take a few days, even weeks, before your patch is looked at.
Then, you keep waiting. It is fine to kick us again if you did not receive an answer within 2 weeks, but usually we are a lot faster.

Further information: …

Happy hacking!

This message was posted by a bot - if you have any questions or suggestions, please talk to my owner, @rck
I know this is closed, but I submitted a fixed version as #66: this one had a build issue, caused by rebasing onto the latest master, that I failed to catch before submitting it.
I've spent some time debugging RDMA issues. Without these patches, RDMA did not work at all (crashes, hangs, random ping timeouts, ...). With them, it works quite well.
This does resolve #58, as well as other issues.