Symptom:
Pods intermittently get "unknown name" or "no such host" errors when resolving an external domain name, e.g. test.testcorp.com.
Actions:
- Follow the Kubernetes DNS debugging guide and check that all DNS pods are running and ready.
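Below is a minimal sketch of the pod check. It assumes two things: the CoreDNS pods carry the label k8s-app=kube-dns (the label the Kubernetes DNS debugging guide selects on), and kubectl prints its default column order. The helper not_ready_pods is hypothetical. It reads that table on stdin and prints every pod that is not fully ready or not Running, so empty output means all DNS pods look healthy.

```shell
# Hypothetical helper: read `kubectl get pods` table output on stdin and print
# the names of pods whose READY count is incomplete or whose STATUS is not Running.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
# Usage: kubectl -n kube-system get pods -l k8s-app=kube-dns | not_ready_pods
not_ready_pods() {
  awk 'NR > 1 {
    split($2, r, "/")                          # READY column, e.g. "1/1"
    if (r[1] != r[2] || $3 != "Running") print $1
  }'
}
```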
- One possible cause: one or more of the nameservers in the hosts' /etc/resolv.conf cannot resolve test.testcorp.com.
- For example, *.testcorp.com is a corporate intranet domain that only the corporate name servers can resolve. In a typical cloud VM setup, however, /etc/resolv.conf also lists nameserver 169.254.169.254, which knows nothing about *.testcorp.com. With the default "forward . /etc/resolv.conf" setting, CoreDNS forwards external queries to the nameservers from the node's /etc/resolv.conf. Only the queries that happen to reach 169.254.169.254 fail, which is why the errors are intermittent.
- To fix this, update the DHCP server configuration so that 169.254.169.254 is no longer written to /etc/resolv.conf. Then restart CoreDNS so it picks up the new nameserver list:
kubectl rollout restart deployment coredns -n kube-system
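To find the culprit, query each nameserver listed in the node's /etc/resolv.conf directly. This is a sketch: list_nameservers and check_nameservers are hypothetical helpers, and dig (from the bind utilities) must be installed on the node.

```shell
# Print the nameserver addresses in a resolv.conf file (default: /etc/resolv.conf).
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Query NAME against each nameserver in the file and report whether it answered.
# +short prints only answer data (dig's own error lines start with ";");
# +time/+tries keep a dead server from stalling the loop.
# Usage: check_nameservers NAME [RESOLV_CONF]
check_nameservers() {
  local name=$1 ns
  for ns in $(list_nameservers "$2"); do
    if dig @"$ns" "$name" +short +time=2 +tries=1 | grep -qv '^;'; then
      echo "$ns OK"
    else
      echo "$ns NO ANSWER"
    fi
  done
}
```

Run check_nameservers test.testcorp.com on each node. In the scenario above, 169.254.169.254 would be expected to report NO ANSWER, while the corporate name servers report OK.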
- Another possible cause: some nodes have network issues, so the DNS pods on them do not work properly. The commands below print a curl command for each CoreDNS pod IP. Running each printed command checks whether TCP port 53 on that pod is reachable:
kubectl -n kube-system get po -o wide | grep coredns | awk '{print $6}' > /tmp/1.txt
while read -r ip; do echo "curl -v --connect-timeout 10 telnet://$ip:53"; done < /tmp/1.txt
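The curl check only proves that TCP port 53 accepts connections, while most DNS traffic is UDP. A hedged alternative is to send a real query to each pod with dig. The helper probe_dns_ips is hypothetical. The test name kubernetes.default.svc.cluster.local assumes the default cluster domain, and the usage comment assumes the CoreDNS pods carry the label k8s-app=kube-dns.

```shell
# Read one DNS server IP per line on stdin and send each a real DNS query with
# dig (UDP by default). QNAME defaults to kubernetes.default.svc.cluster.local.
# Usage:
#   kubectl -n kube-system get pods -l k8s-app=kube-dns \
#     -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' | probe_dns_ips
probe_dns_ips() {
  local qname=${1:-kubernetes.default.svc.cluster.local} ip
  while read -r ip; do
    [ -n "$ip" ] || continue
    # </dev/null keeps dig away from the IP list; answer lines never start with ";"
    if dig @"$ip" "$qname" +short +time=2 +tries=1 </dev/null | grep -qv '^;'; then
      echo "$ip OK"
    else
      echo "$ip FAILED"
    fi
  done
}
```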
- Enable query logging for the DNS pods (the CoreDNS log plugin) as described in the Kubernetes guide.
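For reference, the Kubernetes DNS debugging guide enables query logging by adding the CoreDNS log plugin to the Corefile. The Corefile is stored in the coredns ConfigMap and can be edited with kubectl -n kube-system edit configmap coredns. Below is a trimmed sketch of the edited server block, where ... stands for the existing plugins, left unchanged:

```text
.:53 {
    log
    errors
    ...
}
```

According to the same guide, it can take a minute or two for the change to reach the CoreDNS pods.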
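- Reproduce the failing lookup and follow the logs of all DNS pods. Note that deployment/coredns would only show one pod, so select the pods by their usual label k8s-app=kube-dns instead:
kubectl -n kube-system logs -f -l k8s-app=kube-dns --all-containers=true --since=1m | grep testcorp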
- You may see log lines like these:
[INFO] 10.244.2.151:43653 - 48702 "AAAA IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000300408s
[INFO] 10.244.2.151:43653 - 64047 "A IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000392158s
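To see how many lookups fail for each record type, the log lines can be tallied. This sketch assumes the default CoreDNS log format shown above, with the query type in field 5 and the response code in field 12. The helper summarize_coredns_log is hypothetical.

```shell
# Tally CoreDNS `log` plugin lines read on stdin as "<qtype> <rcode> <count>".
# Assumes the default log format shown above:
#   [INFO] <client> - <id> "<qtype> IN <name> <proto> <size> <do> <bufsize>" <rcode> ...
summarize_coredns_log() {
  awk '$5 ~ /^"/ {
         qtype = substr($5, 2)      # drop the leading quote: "AAAA -> AAAA
         count[qtype " " $12]++     # field 12 is the response code, e.g. NXDOMAIN
       }
       END { for (k in count) print k, count[k] }' | LC_ALL=C sort
}
```

To get a summary, pipe the output of the logs command (without -f, so it terminates) into summarize_coredns_log.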
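- The pod's /etc/resolv.conf contains "options ndots:5". Any name with fewer than five dots is therefore first tried with each cluster search suffix appended (hence test.testcorp.com.default.svc.cluster.local in the log above). For external names, this adds extra lookups and extra chances to fail. Using a fully qualified name mitigates the issue: test.testcorp.com --> test.testcorp.com. (note the trailing dot).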
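The ordering can be illustrated with a simplified model of the glibc stub resolver's search behaviour. This sketch models the rule only and is not the resolver's actual code. The helper expand_query is hypothetical and prints the names that would be tried, in order.

```shell
# Simplified model (an assumption, not the real resolver code) of the order in
# which a glibc-style stub resolver tries names for NAME, given ndots and a
# search list:
#   - NAME ending in "." is absolute: only NAME is tried;
#   - fewer than NDOTS dots: NAME + each search suffix first, then NAME itself;
#   - otherwise: NAME itself first, then NAME + each search suffix.
# Usage: expand_query NAME NDOTS [SEARCH_DOMAIN...]
expand_query() {
  local name=$1 ndots=$2 dots d
  shift 2
  case $name in
    *.) echo "$name"; return 0 ;;
  esac
  # Count the dots in NAME (test.testcorp.com has 2).
  dots=$(printf '%s\n' "$name" | awk -F. '{ print NF - 1 }')
  if [ "$dots" -lt "$ndots" ]; then
    for d in "$@"; do echo "$name.$d."; done
    echo "$name."
  else
    echo "$name."
    for d in "$@"; do echo "$name.$d."; done
  fi
}
```

Consider a pod in the default namespace, whose search list starts with default.svc.cluster.local svc.cluster.local cluster.local, with ndots:5. For test.testcorp.com, the first name tried is test.testcorp.com.default.svc.cluster.local., matching the log lines above. In contrast, test.testcorp.com. is sent as-is.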
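- Disable separate AAAA (IPv6) lookups in CoreDNS by rewriting them into A queries. This reduces the number of NXDOMAIN (not found) responses and thus the failure rate seen by DNS clients.
- To do so, add the following line to the CoreDNS configuration (Corefile); see the CoreDNS rewrite plugin documentation:
rewrite stop type AAAA A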
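Below is a trimmed sketch of where the rewrite line sits. It goes inside the .:53 { ... } server block of the Corefile in the coredns ConfigMap, with ... standing for the existing plugins. As far as I understand CoreDNS, plugin execution order is fixed when CoreDNS is built (plugin.cfg), not by the order of lines in the Corefile. The exact position inside the block therefore should not matter.

```text
.:53 {
    errors
    rewrite stop type AAAA A
    ...
}
```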
- Install NodeLocal DNSCache to speed up DNS queries; see the Kubernetes documentation.
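The Kubernetes NodeLocal DNSCache page has you download nodelocaldns.yaml and substitute a few __PILLAR__ placeholders before applying it. Below is a sketch of that substitution for kube-proxy in iptables mode. IPVS mode uses different substitutions, so check the page for your setup. The helper render_nodelocaldns is hypothetical, and 169.254.20.10 is the example link-local address used in the documentation.

```shell
# Hypothetical helper: fill in the placeholders of nodelocaldns.yaml
# (kube-proxy iptables mode) and print the rendered manifest.
#   $1 manifest path   $2 kube-dns ClusterIP
#   $3 cluster domain (default cluster.local)
#   $4 link-local listen address (default 169.254.20.10)
render_nodelocaldns() {
  local manifest=$1 kubedns=$2 domain=${3:-cluster.local} localdns=${4:-169.254.20.10}
  sed -e "s/__PILLAR__LOCAL__DNS__/$localdns/g" \
      -e "s/__PILLAR__DNS__DOMAIN__/$domain/g" \
      -e "s/__PILLAR__DNS__SERVER__/$kubedns/g" \
      "$manifest"
}
# Usage (needs cluster access):
#   kubedns=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
#   render_nodelocaldns nodelocaldns.yaml "$kubedns" | kubectl apply -f -
```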
- Run dig test.testcorp.com +all many times; the output includes the AUTHORITY SECTION, for example:
;; AUTHORITY SECTION:
test.com.  4878  IN  NS  dnsmaster1.test.com.
test.com.  4878  IN  NS  dnsmaster5.test.com.
- Query those name servers to find out which one times out.
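Below is a sketch for the repeated test and the per-server check. The helpers count_failures and resolve_once are hypothetical. The zone name testcorp.com in the usage comment is an assumption, so use the zone shown in your own AUTHORITY SECTION.

```shell
# Run a command N times and print how many runs failed (non-zero exit status).
count_failures() {
  local n=$1 i=0 fails=0
  shift
  while [ "$i" -lt "$n" ]; do
    "$@" >/dev/null 2>&1 || fails=$((fails + 1))
    i=$((i + 1))
  done
  echo "$fails"
}

# One lookup of NAME, optionally against a specific SERVER. Succeeds only if dig
# prints at least one answer line (dig's own error lines start with ";").
# Usage: resolve_once NAME [SERVER]
resolve_once() {
  if [ -n "$2" ]; then
    dig @"$2" "$1" +short +time=2 +tries=1 | grep -qv '^;'
  else
    dig "$1" +short +time=2 +tries=1 | grep -qv '^;'
  fi
}

# Usage: 20 lookups through the normal resolvers, then 20 against each
# authoritative server of the (assumed) zone testcorp.com.
#   echo "default: $(count_failures 20 resolve_once test.testcorp.com) failures"
#   for ns in $(dig +short NS testcorp.com); do
#     echo "$ns: $(count_failures 20 resolve_once test.testcorp.com "$ns") failures"
#   done
```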
- Add the parameter below to /etc/resolv.conf to improve DNS query performance.
Another solution is to use an ExternalName Service:
apiVersion: v1
kind: Service
metadata:
  name: test-stage
  namespace: default
spec:
  externalName: test-stage.testcorp.com
  ports:
  - port: 636
    protocol: TCP
    targetPort: 636
  type: ExternalName
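To check the Service, resolve its in-cluster name from a throwaway pod. The lookup should return the addresses behind test-stage.testcorp.com. This is a sketch: svc_fqdn and check_service_dns are hypothetical helpers, and the cluster domain cluster.local is assumed. The image busybox:1.28 is the one the Kubernetes DNS debugging guide uses for nslookup.

```shell
# Build the in-cluster DNS name of a Service: NAME.NAMESPACE.svc.DOMAIN
# (namespace defaults to "default", domain to the assumed "cluster.local").
svc_fqdn() {
  echo "$1.${2:-default}.svc.${3:-cluster.local}"
}

# Resolve the Service name from a throwaway pod; for this ExternalName Service
# the lookup should end at the addresses of test-stage.testcorp.com.
# Usage: check_service_dns test-stage default
check_service_dns() {
  kubectl run dnscheck --rm -i --restart=Never --image=busybox:1.28 -- \
    nslookup "$(svc_fqdn "$@")"
}
```

Pods would then connect to test-stage.default.svc.cluster.local on port 636, or just test-stage from the same namespace. A name with no dots matches the first search suffix on the first try.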