Recently, a vulnerability scan flagged the NodePort of one of our k8s components. Fixing the vulnerability itself would be troublesome, and since this NodePort is not needed, we decided to simply block traffic to it with iptables. However, we found that the iptables blocking rule did not take effect.
Blocking with the filter table does not work
Our first idea was to add a DROP rule for the NodePort in the INPUT chain of the iptables filter table:
iptables -I INPUT -p tcp --dport 31002 -j DROP
But this rule did not take effect, and access to the NodePort was not blocked.
To rule out problems with the DROP rule itself, we used nc to start a local service listening on port 30146:
nc -l -p 30146 <<<'{"status":"ok"}'
Then blocked it:
iptables -I INPUT -p tcp --dport 30146 -j DROP
Then tried to access:
curl localhost:30146
The request failed, which shows that the DROP rule itself works. The reason it has no effect on the NodePort is therefore most likely that NodePort traffic is handled differently from traffic to a local process.
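Another way to confirm that a rule is never reached is to look at its packet counter. The helper below is a minimal sketch, not part of the original investigation: drop_hits is a hypothetical name, and it assumes the usual column layout printed by iptables -L with the -v, -n and -x flags (packet count first, target third, the dpt match last).

```shell
# drop_hits PORT: read `iptables -L <chain> -v -n -x` output on stdin and
# print the packet counter of every DROP rule that matches tcp dpt:PORT.
# (Hypothetical helper name; relies on the standard listing columns.)
drop_hits() {
    awk -v port="$1" '
        # Column 1 is the packet counter, column 3 the target,
        # and the last column the "dpt:<port>" match.
        $3 == "DROP" && $NF == ("dpt:" port) { print $1 }
    '
}
```

Run as root, `iptables -L INPUT -v -n -x | drop_hits 31002` prints the counter of the NodePort rule. If it is still 0 after a request to the NodePort, the packets never reached that rule.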
Enable iptables logging
To confirm which iptables rules were hit when accessing NodePort, we need to enable iptables logging. For CentOS, enable it as follows:
# load the nf_log_ipv4 kernel module
modprobe nf_log_ipv4
# use the nf_log_ipv4 logger for IPv4 traffic
sysctl net.netfilter.nf_log.2=nf_log_ipv4
# update /etc/rsyslog.conf to include config: kern.* /var/log/kern.log
vi /etc/rsyslog.conf
# restart rsyslog service
systemctl restart rsyslog
Then add a TRACE rule for the NodePort:
iptables -t raw -I OUTPUT -p tcp --dport 31002 -j TRACE
Trigger access again:
curl localhost:31002
Then you can find all iptables rules hit by the request in /var/log/kern.log. I excerpted the beginning part, keeping only the important fields SRC, DST, and DPT, and replaced the IPs in SRC and DST with <ip1> and <ip2>:
Sep 30 10:04:04 [localhost] kernel: TRACE: raw:OUTPUT:policy:3 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: mangle:OUTPUT:policy:1 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:OUTPUT:rule:1 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:cali-OUTPUT:rule:1 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:cali-fip-dnat:return:1 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:cali-OUTPUT:return:2 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:OUTPUT:rule:2 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-SERVICES:rule:458 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-NODEPORTS:rule:64 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-SVC-EANZGUQV3HZVGERW:rule:2 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-MARK-MASQ:rule:1 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-MARK-MASQ:return:2 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-SVC-EANZGUQV3HZVGERW:rule:3 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-SEP-E6KGZ7LKAMZFQR62:rule:2 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: filter:OUTPUT:rule:1 SRC=<ip1> DST=<ip2> DPT=8200
We can see many chains starting with KUBE in the nat table; these are generated by k8s kube-proxy. The rule in the KUBE-SEP-E6KGZ7LKAMZFQR62 chain performed DNAT on DST and DPT, changing the node IP and NodePort to the pod IP and port, which is why the last line shows DST=<ip2> and DPT=8200. This explains why blocking the NodePort in the filter table does not work: by the time the request reaches the filter table, the destination port has already been rewritten to the pod port, so a rule matching --dport 31002 never matches.
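The raw TRACE lines are long, so it can help to reduce each one to the rule that was hit plus the destination at that point. The following is a rough sketch under the assumption that each log line contains the literal token TRACE: followed by table:chain:type:rulenum, and DST= and DPT= fields as in the excerpt above; trace_path is a hypothetical helper name.

```shell
# trace_path: read kernel log lines on stdin and, for each TRACE entry,
# print "table:chain:type:rulenum dst:dpt". Non-TRACE lines are ignored.
trace_path() {
    awk '
        /TRACE:/ {
            hop = ""; dst = ""; dpt = ""
            for (i = 1; i <= NF; i++) {
                if ($i == "TRACE:")     hop = $(i + 1)       # token after "TRACE:"
                else if ($i ~ /^DST=/)  dst = substr($i, 5)  # strip "DST="
                else if ($i ~ /^DPT=/)  dpt = substr($i, 5)  # strip "DPT="
            }
            print hop, dst ":" dpt
        }
    '
}
```

With `trace_path < /var/log/kern.log`, the point where the destination changes from the NodePort to the pod port stands out immediately. Once the investigation is done, the TRACE rule can be removed again with `iptables -t raw -D OUTPUT -p tcp --dport 31002 -j TRACE`.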
Block before the DNAT rule
After understanding the iptables rules handling NodePort access, the solution is obvious: just block the NodePort before DNAT happens. The iptables flowchart is as follows:
network -> PREROUTING -> routing decision -> INPUT -------> process
           raw                  |            mangle
           mangle               |            filter
           nat                  |            nat
                                V
                             FORWARD
                             filter
                                |
                                |
                                V
process -> OUTPUT -------> POSTROUTING -> network
           raw             mangle
           mangle          nat
           nat
           filter
This matches the order of rule hits we saw in the logs: raw, mangle, nat, and then filter in the OUTPUT chain. Since the mangle table's OUTPUT chain is traversed before the nat table's, we can add the DROP rule there:
iptables -t mangle -I OUTPUT -p tcp --dport 31002 -j DROP
Then try accessing the NodePort again, and you’ll find it is finally unreachable.
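The rule above only covers requests generated on the node itself, which is what the curl localhost test exercises. According to the flowchart, traffic arriving from other hosts enters through PREROUTING instead of OUTPUT, so it would presumably need an equivalent rule in a table that runs before nat there. The sketch below is an assumption on top of the original fix, not something tested in this investigation; nodeport_drop_rules is a hypothetical helper that only prints the commands so they can be reviewed first.

```shell
# nodeport_drop_rules PORT: print (do not run) iptables commands that drop
# TCP traffic to PORT before kube-proxy's DNAT rules in the nat table.
nodeport_drop_rules() {
    # Locally generated traffic: mangle OUTPUT runs before nat OUTPUT.
    echo "iptables -t mangle -I OUTPUT -p tcp --dport $1 -j DROP"
    # Traffic from other hosts (assumed case): mangle PREROUTING runs
    # before nat PREROUTING.
    echo "iptables -t mangle -I PREROUTING -p tcp --dport $1 -j DROP"
}
```

After checking the output of `nodeport_drop_rules 31002`, it can be piped to `sh` as root. Note that the PREROUTING rule matches every TCP packet with that destination port passing through the node; adding `-d` with the node IP would narrow it if that is too broad.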
Summary
k8s kube-proxy adds DNAT rules to the iptables nat table to forward NodePort traffic to pods. So if you want to block a NodePort, the DROP rule has to be evaluated before those DNAT rules. For example, you can add a DROP rule in the OUTPUT chain of the mangle table to block local access to the NodePort.