Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve gossip handling of in-memory expired certs #5110

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions gossip/comm/comm_impl.go
Original file line number Diff line number Diff line change
Expand Up @@ -162,6 +162,14 @@ func (c *commImpl) createConnection(endpoint string, expectedPKIID common.PKIidT
return nil, errors.WithStack(err)
}

if len(expectedPKIID) > 0 {
identity, _ := c.idMapper.Get(expectedPKIID)
if len(identity) == 0 {
c.logger.Warningf("Identity of %x is no longer found in the identity store, aborting connection", expectedPKIID)
return nil, errors.New("identity no longer recognized")
}
}

Comment on lines +165 to +172
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came close to fixing the problem... membership is indeed established now, but the new warning and one of the existing warnings keeps getting repeated in the logs forever:

      2025-01-23 00:59:56.269 EST 00a4 WARN [gossip.gossip] disclosurePolicy -> Cannot determine organization of Endpoint: 127.0.0.1:22510, InternalEndpoint: , PKI-ID: fd8548f3b6fb8164a711b6d6dfa2a55f5ae013f512015f884e9af282ef553aa5, Metadata: 
      2025-01-23 00:59:56.273 EST 00a5 WARN [gossip.comm] createConnection -> Identity of 66643835343866336236666238313634613731316236643664666132613535663561653031336635313230313566383834653961663238326566353533616135 is no longer found in the identity store, aborting connection
      2025-01-23 00:59:56.273 EST 00a6 WARN [gossip.comm] sendToEndpoint -> Failed obtaining connection for 127.0.0.1:22510, PKIid:fd8548f3b6fb8164a711b6d6dfa2a55f5ae013f512015f884e9af282ef553aa5 reason: identity no longer recognized
      2025-01-23 00:59:57.778 EST 00a7 WARN [gossip.gossip] func3 -> Unable to determine org of message tag:EMPTY alive_msg:{membership:{endpoint:"127.0.0.1:22510" pki_id:"\xfd\x85H\xf3\xb6\xfb\x81d\xa7\x11\xb6\xd6ߢ\xa5_Z\xe0\x13\xf5\x12\x01_\x88N\x9a\xf2\x82\xefU:\xa5"} timestamp:{inc_num:1737611884416490000 seq_num:28}}
      2025-01-23 01:00:01.780 EST 00a8 WARN [gossip.gossip] func3 -> Unable to determine org of message tag:EMPTY alive_msg:{membership:{endpoint:"127.0.0.1:22510" pki_id:"\xfd\x85H\xf3\xb6\xfb\x81d\xa7\x11\xb6\xd6ߢ\xa5_Z\xe0\x13\xf5\x12\x01_\x88N\x9a\xf2\x82\xefU:\xa5"} timestamp:{inc_num:1737611884416490000 seq_num:28}}
      2025-01-23 01:00:05.781 EST 00a9 WARN [gossip.gossip] func3 -> Unable to determine org of message tag:EMPTY alive_msg:{membership:{endpoint:"127.0.0.1:22510" pki_id:"\xfd\x85H\xf3\xb6\xfb\x81d\xa7\x11\xb6\xd6ߢ\xa5_Z\xe0\x13\xf5\x12\x01_\x88N\x9a\xf2\x82\xefU:\xa5"} timestamp:{inc_num:1737611884416490000 seq_num:28}}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came close to fixing the problem... membership is indeed established now, but the new warning and one of the existing warnings keeps getting repeated in the logs forever:

      2025-01-23 00:59:56.269 EST 00a4 WARN [gossip.gossip] disclosurePolicy -> Cannot determine organization of Endpoint: 127.0.0.1:22510, InternalEndpoint: , PKI-ID: fd8548f3b6fb8164a711b6d6dfa2a55f5ae013f512015f884e9af282ef553aa5, Metadata: 
      2025-01-23 00:59:56.273 EST 00a5 WARN [gossip.comm] createConnection -> Identity of 66643835343866336236666238313634613731316236643664666132613535663561653031336635313230313566383834653961663238326566353533616135 is no longer found in the identity store, aborting connection
      2025-01-23 00:59:56.273 EST 00a6 WARN [gossip.comm] sendToEndpoint -> Failed obtaining connection for 127.0.0.1:22510, PKIid:fd8548f3b6fb8164a711b6d6dfa2a55f5ae013f512015f884e9af282ef553aa5 reason: identity no longer recognized
      2025-01-23 00:59:57.778 EST 00a7 WARN [gossip.gossip] func3 -> Unable to determine org of message tag:EMPTY alive_msg:{membership:{endpoint:"127.0.0.1:22510" pki_id:"\xfd\x85H\xf3\xb6\xfb\x81d\xa7\x11\xb6\xd6ߢ\xa5_Z\xe0\x13\xf5\x12\x01_\x88N\x9a\xf2\x82\xefU:\xa5"} timestamp:{inc_num:1737611884416490000 seq_num:28}}
      2025-01-23 01:00:01.780 EST 00a8 WARN [gossip.gossip] func3 -> Unable to determine org of message tag:EMPTY alive_msg:{membership:{endpoint:"127.0.0.1:22510" pki_id:"\xfd\x85H\xf3\xb6\xfb\x81d\xa7\x11\xb6\xd6ߢ\xa5_Z\xe0\x13\xf5\x12\x01_\x88N\x9a\xf2\x82\xefU:\xa5"} timestamp:{inc_num:1737611884416490000 seq_num:28}}
      2025-01-23 01:00:05.781 EST 00a9 WARN [gossip.gossip] func3 -> Unable to determine org of message tag:EMPTY alive_msg:{membership:{endpoint:"127.0.0.1:22510" pki_id:"\xfd\x85H\xf3\xb6\xfb\x81d\xa7\x11\xb6\xd6ߢ\xa5_Z\xe0\x13\xf5\x12\x01_\x88N\x9a\xf2\x82\xefU:\xa5"} timestamp:{inc_num:1737611884416490000 seq_num:28}}

This happens because the peer with the renewed certificate sends the first peer a membership request ("give me all the peers you know about, alive or dead").

The first peer still has the old incarnation of the second peer in its memory. It then asks itself whether it can disclose knowledge of that peer in its membership response, but it cannot determine that because it no longer knows what is the identity corresponding to the PKI-ID of its previous incarnation.

If you run the peer for long enough (I think up to 2 hours?) it will garbage collect that peer's alive messages and it will no longer print that.

Ideally we would purge everything we know about a peer once its certificate has expired.
However I consider this a separate problem and it's orthogonal to this issue.
The fact that the peer issues warnings in the log doesn't mean something is necessarily wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There exists functionality to update PKI-ID and to purge as can be seen in the regular (non-expired) cert renewal scenario, e.g.:

INFO [gossip.comm] createConnection -> Peer 127.0.0.1:22510 changed its PKI-ID from 3257ef1cb73517a1c07b06c0c8f06903cb537199ffb11ecd23643b900f3f43d2 to 9f3e40a456e9f1a22c73e1206bd78b3b1b0ed7276ffbb37b9de00d23530f0877
INFO [gossip.discovery] purge -> Purging 3257ef1cb73517a1c07b06c0c8f06903cb537199ffb11ecd23643b900f3f43d2 from membership

Why wouldn't we do the same in the case of an expired cert renewal?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There exists functionality to update PKI-ID and to purge as can be seen in the regular (non-expired) cert renewal scenario, e.g.:

INFO [gossip.comm] createConnection -> Peer 127.0.0.1:22510 changed its PKI-ID from 3257ef1cb73517a1c07b06c0c8f06903cb537199ffb11ecd23643b900f3f43d2 to 9f3e40a456e9f1a22c73e1206bd78b3b1b0ed7276ffbb37b9de00d23530f0877
INFO [gossip.discovery] purge -> Purging 3257ef1cb73517a1c07b06c0c8f06903cb537199ffb11ecd23643b900f3f43d2 from membership

Why wouldn't we do the same in the case of an expired cert renewal?

Yeah I know the code :-)

I didn't have the time to look whether it's the best option from a code change perspective.

I'll look at it more carefully when I have time

ctx, cancel = context.WithCancel(context.Background())
if stream, err = cl.GossipStream(ctx); err == nil {
connInfo, err = c.authenticateRemotePeer(stream, true, false)
Expand Down
180 changes: 180 additions & 0 deletions integration/gossip/gossip_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,13 @@
package gossip

import (
"crypto/ecdsa"
"crypto/rand"
"crypto/sha256"
"crypto/x509"
"encoding/pem"
"fmt"
"math/big"
"os"
"path/filepath"
"syscall"
Expand Down Expand Up @@ -285,8 +291,182 @@ var _ = Describe("Gossip State Transfer and Membership", func() {
assertPeerMembershipUpdate(network, peer1Org1, []*nwo.Peer{peer0Org2, peer1Org2}, nwprocs, expectedMsgFromExpirationCallback)
})
})

It("updates membership for a peer with a renewed certificate", func() {
network.Bootstrap()
orderer := network.Orderer("orderer")
nwprocs.ordererRunner = network.OrdererRunner(orderer)
nwprocs.ordererProcess = ifrit.Invoke(nwprocs.ordererRunner)
Eventually(nwprocs.ordererProcess.Ready(), network.EventuallyTimeout).Should(BeClosed())

peer0Org1 := network.Peer("Org1", "peer0")
peer0Org2 := network.Peer("Org2", "peer0")

By("bringing up a peer in each organization")
startPeers(nwprocs, false, peer0Org1, peer0Org2)

channelparticipation.JoinOrdererAppChannel(network, "testchannel", orderer, nwprocs.ordererRunner)

By("joining peers to channel")
network.JoinChannel(channelName, orderer, peer0Org1, peer0Org2)

By("verifying membership of both peers")
// bothPeers := []*nwo.Peer{peer0Org1, peer0Org2}
// network.VerifyMembership(bothPeers, channelName)
// expectedPeers := make([]nwo.DiscoveredPeer, 1)
// expectedPeers[0] = network.DiscoveredPeer(peer0Org2, "_lifecycle")
Eventually(nwo.DiscoverPeers(network, peer0Org1, "User1", "testchannel"), 50*time.Second, 100*time.Millisecond).Should(ContainElements(network.DiscoveredPeer(peer0Org2, "_lifecycle")))

fmt.Println("===MEMBERSHIP VERIFIED===")
time.Sleep(5 * time.Second)

By("stopping, renewing peer0Org2 certificate before expiration, and restarting")
stopPeers(nwprocs, peer0Org2)
renewPeerCertificate(network, peer0Org2, time.Now().Add(time.Minute))

fmt.Println("===STOPPED AND RENEWED peer0Org2, RESTARTING, CHECK FOR PKI-ID REPLACEMENT===")
time.Sleep(5 * time.Second)

startPeers(nwprocs, false, peer0Org2)

By("ensuring that peer0Org1 replaces peer0Org2 PKI-ID")
peer0Org1Runner := nwprocs.peerRunners[peer0Org1.ID()]
Eventually(peer0Org1Runner.Err(), network.EventuallyTimeout).Should(gbytes.Say("changed its PKI-ID from"))

fmt.Println("===PKI-ID REPLACED===")
time.Sleep(5 * time.Second)

By("verifying membership after cert renewed")
Eventually(
nwo.DiscoverPeers(network, peer0Org1, "User1", "testchannel"),
60*time.Second,
100*time.Millisecond).
Should(ContainElements(network.DiscoveredPeer(network.Peer("Org2", "peer0"), "_lifecycle")))

fmt.Println("===MEMBERSHIP VERIFIED WITH RENEWED CERT, NOW WAIT FOR CERT TO EXPIRE===")
time.Sleep(5 * time.Second)

By("waiting for cert to expire within a minute")
Eventually(peer0Org1Runner.Err(), network.EventuallyTimeout).Should(gbytes.Say("gossipping peer identity expired"))

By("stopping, renewing peer0Org2 certificate again after its expiration")
stopPeers(nwprocs, peer0Org2)
renewPeerCertificate(network, peer0Org2, time.Now().Add(time.Hour))

fmt.Println("===STOPPED AND RENEWED peer0Org2 AGAIN AFTER EXPIRATION, RESTARTING, CHECK FOR MEMBERSHIP AGAIN===")
time.Sleep(5 * time.Second)

By("ensuring that peer0Org1 establishes membership with peer0Org2 after final restart post-expiration")
startPeers(nwprocs, false, peer0Org2)

Eventually(
nwo.DiscoverPeers(network, peer0Org1, "User1", "testchannel"),
60*time.Second,
100*time.Millisecond).
Should(ContainElements(network.DiscoveredPeer(network.Peer("Org2", "peer0"), "_lifecycle")))

time.Sleep(300 * time.Minute)
})
})

// renewPeerCertificate renews the certificate with a given expirationTime and re-writes it to the peer's signcert directory
func renewPeerCertificate(network *nwo.Network, peer *nwo.Peer, expirationTime time.Time) {
peerDomain := network.Organization(peer.Organization).Domain

peerCAKeyPath := filepath.Join(network.RootDir, "crypto", "peerOrganizations", peerDomain, "ca", "priv_sk")
peerCAKey, err := os.ReadFile(peerCAKeyPath)
Expect(err).NotTo(HaveOccurred())

peerCACertPath := filepath.Join(network.RootDir, "crypto", "peerOrganizations", peerDomain, "ca", fmt.Sprintf("ca.%s-cert.pem", peerDomain))
peerCACert, err := os.ReadFile(peerCACertPath)
Expect(err).NotTo(HaveOccurred())

peerCertPath := filepath.Join(network.PeerLocalMSPDir(peer), "signcerts", fmt.Sprintf("peer0.%s-cert.pem", peerDomain))
peerCert, err := os.ReadFile(peerCertPath)
Expect(err).NotTo(HaveOccurred())

renewedCert, _ := expireCertificate(peerCert, peerCACert, peerCAKey, expirationTime)
err = os.WriteFile(peerCertPath, renewedCert, 0o600)
fmt.Println(peerCertPath)
Expect(err).NotTo(HaveOccurred())
}

// expireCertificate re-creates and re-signs a certificate with a new expirationTime
func expireCertificate(certPEM, caCertPEM, caKeyPEM []byte, expirationTime time.Time) (expiredcertPEM []byte, earlyMadeCACertPEM []byte) {
keyAsDER, _ := pem.Decode(caKeyPEM)
caKeyWithoutType, err := x509.ParsePKCS8PrivateKey(keyAsDER.Bytes)
Expect(err).NotTo(HaveOccurred())
caKey := caKeyWithoutType.(*ecdsa.PrivateKey)

caCertAsDER, _ := pem.Decode(caCertPEM)
caCert, err := x509.ParseCertificate(caCertAsDER.Bytes)
Expect(err).NotTo(HaveOccurred())

certAsDER, _ := pem.Decode(certPEM)
cert, err := x509.ParseCertificate(certAsDER.Bytes)
Expect(err).NotTo(HaveOccurred())

cert.Raw = nil
caCert.Raw = nil
// The certificate was made 1 minute ago (1 hour doesn't work since cert will be before original CA cert NotBefore time)
cert.NotBefore = time.Now().Add((-1) * time.Minute)
// As well as the CA certificate
caCert.NotBefore = time.Now().Add((-1) * time.Minute)
// The certificate expires now
cert.NotAfter = expirationTime

// The CA creates and signs a temporary certificate
tempCertBytes, err := x509.CreateCertificate(rand.Reader, cert, caCert, cert.PublicKey, caKey)
Expect(err).NotTo(HaveOccurred())

// Force the certificate to use Low-S signature to be compatible with the identities that Fabric uses

// Parse the certificate to extract the TBS (to-be-signed) data
tempParsedCert, err := x509.ParseCertificate(tempCertBytes)
Expect(err).NotTo(HaveOccurred())

// Hash the TBS data
hash := sha256.Sum256(tempParsedCert.RawTBSCertificate)

// Sign the hash using forceLowS
r, s, err := forceLowS(caKey, hash[:])
Expect(err).NotTo(HaveOccurred())

// Encode the signature (DER format)
signature := append(r.Bytes(), s.Bytes()...)

// Replace the signature in the certificate with the low-s signature
tempParsedCert.Signature = signature

// Re-encode the certificate with the low-s signature
certBytes, err := x509.CreateCertificate(rand.Reader, tempParsedCert, caCert, cert.PublicKey, caKey)
Expect(err).NotTo(HaveOccurred())

// The CA signs its own certificate
caCertBytes, err := x509.CreateCertificate(rand.Reader, caCert, caCert, caCert.PublicKey, caKey)
Expect(err).NotTo(HaveOccurred())

expiredcertPEM = pem.EncodeToMemory(&pem.Block{Bytes: certBytes, Type: "CERTIFICATE"})
earlyMadeCACertPEM = pem.EncodeToMemory(&pem.Block{Bytes: caCertBytes, Type: "CERTIFICATE"})
return
}

// forceLowS ensures the ECDSA signature's S value is low
func forceLowS(priv *ecdsa.PrivateKey, hash []byte) (r, s *big.Int, err error) {
r, s, err = ecdsa.Sign(rand.Reader, priv, hash)
Expect(err).NotTo(HaveOccurred())

curveOrder := priv.Curve.Params().N
halfOrder := new(big.Int).Rsh(curveOrder, 1) // curveOrder / 2

// If s is greater than half the order, replace it with curveOrder - s
if s.Cmp(halfOrder) > 0 {
s.Sub(curveOrder, s)
}

return r, s, nil
}

func runTransactions(n *nwo.Network, orderer *nwo.Orderer, peer *nwo.Peer, chaincodeName string, channelID string) {
for i := 0; i < 5; i++ {
sess, err := n.PeerUserSession(peer, "User1", commands.ChaincodeInvoke{
Expand Down
1 change: 1 addition & 0 deletions integration/nwo/network.go
Original file line number Diff line number Diff line change
Expand Up @@ -1211,6 +1211,7 @@ func (n *Network) ConfigTxGen(command Command) (*gexec.Session, error) {
func (n *Network) Discover(command Command) (*gexec.Session, error) {
cmd := NewCommand(n.Components.Discover(), command)
cmd.Args = append(cmd.Args, "--peerTLSCA", n.CACertsBundlePath())
cmd.Env = []string{"FABRIC_LOGGING_SPEC=info:grpc=warn"} // suppress chatty grpc info messages
return n.StartSession(cmd, command.SessionName())
}

Expand Down
Loading