Open
Labels
core-team: Issue can be worked on by the core team
ops-team: Issue can be worked on by the ops team
Description
What problem does your feature solve?
We currently see very large CoreDNS traffic during some missions. For example, small tests can drive 80k or even 100k requests per second, at which point CoreDNS starts failing.
What happens is this:
- When SSC starts, we create a headless Service that catches all pods. For example:

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: stellar-core
  name: ssc-1015z-15a2d2-stellar-core
  namespace: stellar-supercluster
spec:
  clusterIP: None
  selector:
    app: stellar-core
```
- We also create a Service of type `ExternalName` for each pod:
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: stellar-core
  name: ssc-1015z-15a2d2-sts-complete1-0
  namespace: stellar-supercluster
spec:
  externalName: ssc-1015z-15a2d2-sts-complete1-0.ssc-1015z-15a2d2-stellar-core.stellar-supercluster.svc.cluster.local
  ports:
  - name: core
    port: 11626
    protocol: TCP
    targetPort: 11626
  - name: history
    port: 80
    protocol: TCP
    targetPort: 80
  type: ExternalName
```
- We also create fairly large Ingresses that use the ExternalName services:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: private
    nginx.ingress.kubernetes.io/rewrite-target: /$2
  generation: 1
  name: ssc-1015z-15a2d2-stellar-core-ingress
  namespace: stellar-supercluster
spec:
  rules:
  - host: ssc-1015z-15a2d2.stellar-supercluster.example.com
    http:
      paths:
      - backend:
          service:
            name: ssc-1015z-15a2d2-sts-complete1-0
            port:
              number: 11626
        path: /ssc-1015z-15a2d2-sts-complete1-0/core(/|$)(.*)
        pathType: Prefix
      - backend:
          service:
            name: ssc-1015z-15a2d2-sts-complete2-0
            port:
              number: 11626
        path: /ssc-1015z-15a2d2-sts-complete2-0/core(/|$)(.*)
        pathType: Prefix
```
- When the nginx ingress controller sets things up, it needs to resolve all of the ExternalNames.
- When pods are not ready, DNS queries for those names return NXDOMAIN.
- The above causes each nginx controller, of which there are many, to flood CoreDNS with requests.
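The multiplicative nature of the steps above is the core of the problem. A back-of-the-envelope sketch (every number below is an illustrative assumption, not a measurement from the cluster):

```shell
# Illustrative only: every nginx controller re-resolves every ExternalName,
# and an unresolved name is additionally retried with each suffix in the
# pod's DNS search path, so query volume multiplies quickly.
controllers=20        # ingress controller replicas (assumed)
external_names=200    # one ExternalName Service per pod (assumed)
search_suffixes=4     # DNS search-path entries tried per failed lookup (assumed)
retries_per_sec=5     # re-resolution attempts per name per second (assumed)
echo $(( controllers * external_names * search_suffixes * retries_per_sec ))  # → 80000
```

With assumptions in that ballpark, the product lands around the 80k rps we observe, and no single factor is large on its own, which is why throwing capacity at CoreDNS does not help much.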
What would you like to see?
I found a way to significantly simplify the whole setup. What we need is:
- Create one headless Service that catches all pods. Let's call it `foobar`.
- Create a small "proxy" nginx instance. This instance uses the Service above and proxies traffic to the pods. Example config:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: proxy
  namespace: stellar-supercluster
data:
  default.conf: |
    server {
      listen 80 default_server;
      server_name _;
      resolver 10.96.0.10 ipv6=off;
      location ~ ^/(.+)/core$ {
        proxy_pass http://$1.foobar.stellar-supercluster.svc.cluster.local:11626/;
      }
      location ~ ^/(.+)/core/(.*)$ {
        proxy_pass http://$1.foobar.stellar-supercluster.svc.cluster.local:11626/$2;
      }
      location ~ ^/(.+)/history$ {
        proxy_pass http://$1.foobar.stellar-supercluster.svc.cluster.local:80/;
      }
      location ~ ^/(.+)/history/(.*)$ {
        proxy_pass http://$1.foobar.stellar-supercluster.svc.cluster.local:80/$2;
      }
    }
```
- Expose the above proxy nginx using an Ingress.
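The last step could look something like the sketch below. It assumes a ClusterIP Service named `proxy` in front of the proxy nginx Deployment (neither shown here); the per-pod rewrite rules disappear because the proxy's own `location` blocks do the routing, so the Ingress shrinks to a single catch-all path:

```yaml
# Hypothetical Ingress exposing the single proxy nginx instance.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: private
  name: ssc-1015z-15a2d2-proxy-ingress
  namespace: stellar-supercluster
spec:
  rules:
  - host: ssc-1015z-15a2d2.stellar-supercluster.example.com
    http:
      paths:
      - backend:
          service:
            name: proxy
            port:
              number: 80
        path: /
        pathType: Prefix
```

With this shape, the ingress controllers only ever resolve one ClusterIP Service, and per-pod DNS lookups happen lazily inside the proxy, only when a request actually arrives.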
What alternatives are there?
It may be possible to throw resources at the problem, but given the way the DNS traffic is amplified, that won't take us very far.