
hap-ingress in external mode not updating certificates on disk (sporadic) #721

@joachimbuechse


Setup:

  • Two instances of hap-ingress (3.1.6/3.1.7) running on Debian, serving as redundant external ingress controllers for k8s.
  • Identical setup and versions on both instances.
  • A pipeline updates certificates by replacing (updating) the TLS secrets issued by Let's Encrypt in k8s.

Problem:

  • Updating a TLS secret in k8s replaces the runtime certificate in haproxy (good) but not the certificate file on disk (problem).
  • The problem is sporadic: sometimes both hap-ingress instances replace the file on disk, sometimes only one does.

Analysis:

Another certificate was successfully updated on disk by both hap-ingress instances just 2 days earlier. There are no errors or warnings in the hap-ingress logs.

The file on disk is replaced on only one instance

Checked with

ls -ltr /opt/haproxy-ingress/config/certs/frontend/
openssl x509 -enddate -noout -in certs/frontend/haproxy-ingress_epg.[snip].pem

The runtime cert is replaced on both instances

Checked with

openssl s_client -connect [snip]:443 -servername epg.[snip] < /dev/null 2>/dev/null | openssl x509 -noout -enddate -subject
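
The two checks above can be combined so that each instance reports whether its on-disk certificate and its runtime certificate agree. This is only a minimal sketch: the function names and the `CERT_FILE`, `HOST` and `SNI` placeholders are assumptions for illustration, not part of hap-ingress or of the setup described here.

```shell
#!/bin/sh
# Sketch: compare the expiry of the on-disk PEM with the certificate served at runtime.
# CERT_FILE, HOST and SNI below are placeholders (assumptions), not values from this report.

# Print the notAfter line of the first certificate in a PEM file.
disk_enddate() {
    openssl x509 -enddate -noout -in "$1"
}

# Print the notAfter line of the leaf certificate served for a given SNI name.
# $1 = host:port, $2 = server name
runtime_enddate() {
    openssl s_client -connect "$1" -servername "$2" < /dev/null 2>/dev/null |
        openssl x509 -enddate -noout
}

# Compare two notAfter lines; returns 0 when they match, 1 otherwise.
compare_enddates() {
    if [ "$1" = "$2" ]; then
        echo "in sync: $1"
    else
        echo "MISMATCH: disk=[$1] runtime=[$2]"
        return 1
    fi
}

# Example (run on each instance):
#   compare_enddates "$(disk_enddate "$CERT_FILE")" "$(runtime_enddate "$HOST:443" "$SNI")"
```

Running this from cron on both instances after each renewal would show when the disk copy falls behind, which may help narrow down the sporadic case.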

The defunct instance no longer returns a complete certificate chain: the issuer (intermediate) certificate is missing

Checked with

openssl s_client -connect [snip]:443 -servername epg.[snip] < /dev/null 2>/dev/null

OK instance:

depth=2 C=US, O=Internet Security Research Group, CN=ISRG Root X1
verify return:1
depth=1 C=US, O=Let's Encrypt, CN=R10
verify return:1
depth=0 CN=epg.[snip]
verify return:1
---
Certificate chain
 0 s:CN=epg.[snip]
   i:C=US, O=Let's Encrypt, CN=R10
   a:PKEY: RSA, 2048 (bit); sigalg: sha256WithRSAEncryption
   v:NotBefore: May 21 06:40:13 2025 GMT; NotAfter: Aug 19 06:40:12 2025 GMT
 1 s:C=US, O=Let's Encrypt, CN=R10
   i:C=US, O=Internet Security Research Group, CN=ISRG Root X1
   a:PKEY: RSA, 2048 (bit); sigalg: sha256WithRSAEncryption
   v:NotBefore: Mar 13 00:00:00 2024 GMT; NotAfter: Mar 12 23:59:59 2027 GMT

Defunct instance:

depth=0 CN=epg.[snip]
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN=epg.[snip]
verify error:num=21:unable to verify the first certificate
verify return:1
depth=0 CN=epg.[snip]
verify return:1
---
Certificate chain
 0 s:CN=epg.[snip]
   i:C=US, O=Let's Encrypt, CN=R10
   a:PKEY: RSA, 2048 (bit); sigalg: sha256WithRSAEncryption
   v:NotBefore: May 21 06:40:13 2025 GMT; NotAfter: Aug 19 06:40:12 2025 GMT
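
The difference between the two outputs above (two certificates in the chain versus one) can also be checked by counting PEM certificate blocks, both in the file on disk and in what the instance sends. This is a rough sketch: the function names are hypothetical, and it assumes the on-disk file is a PEM bundle in which every certificate starts with the usual `-----BEGIN CERTIFICATE-----` line.

```shell
#!/bin/sh
# Sketch: count certificates in a PEM bundle and in the chain served over TLS.
# Function names are illustrative; they are not part of hap-ingress.

# Count certificate blocks in a PEM file (private key blocks are not counted).
# grep exits non-zero when the count is 0, so "|| true" keeps the function usable under "set -e".
count_pem_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1" || true
}

# Count certificates the server actually sends for a given SNI name.
# $1 = host:port, $2 = server name
served_chain_count() {
    openssl s_client -connect "$1" -servername "$2" -showcerts < /dev/null 2>/dev/null |
        grep -c -e '-----BEGIN CERTIFICATE-----' || true
}

# Example:
#   count_pem_certs /path/to/cert.pem
#   served_chain_count host:443 server.name
```

For the outputs shown above, the OK instance would be expected to report 2 served certificates (leaf plus R10) and the defunct instance 1.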

I am not sure how to debug this further. Any help is appreciated.
