Why Blocking a k8s NodePort with iptables Does Not Work

Recently, a vulnerability scan flagged the NodePort of one of our k8s components. Fixing the vulnerability itself would be troublesome, and since this NodePort is not needed, we decided to simply block traffic to it with iptables. However, the blocking rule had no effect.

Blocking with the filter table does not work

Our first idea was to add a DROP rule for the NodePort in the INPUT chain of the iptables filter table:

iptables -I INPUT -p tcp --dport 31002 -j DROP

But this rule had no effect: the NodePort was still reachable.
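One way to check whether a rule is hit at all is its packet counter: `iptables -nvxL INPUT` prints per-rule packet and byte counts. The function below is a minimal sketch that extracts the counter of a DROP rule for a given port from that output. The name `drop_rule_packets` is hypothetical, and it assumes the default column layout of `iptables -v` listings.

```shell
# Read `iptables -nvxL <chain>` output on stdin and print the packet
# counter of every DROP rule whose last field is "dpt:<port>".
# Assumed column layout: pkts bytes target prot opt in out source destination [match...]
drop_rule_packets() {
  local port="$1"
  awk -v want="dpt:${port}" '$3 == "DROP" && $NF == want { print $1 }'
}
```

For example, run `iptables -nvxL INPUT | drop_rule_packets 31002` before and after `curl localhost:31002`. If the count does not change, the requests never matched the rule.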

To rule out problems with the DROP rule itself, we used nc to start a local service listening on port 30146:

nc -l -p 30146 <<<'{"status":"ok"}'

Then we blocked it:

iptables -I INPUT -p tcp --dport 30146 -j DROP

Then we tried to access it:

curl localhost:30146

It was unreachable. So the DROP rule itself works, which suggests that traffic to a NodePort is handled differently from traffic to a local process.
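A DROP rule discards packets silently, so a client usually hangs until it gives up instead of getting "connection refused". A small sketch to tell these cases apart without curl, assuming bash (for its `/dev/tcp` redirection) and coreutils `timeout` are installed; `probe_port` is a hypothetical name:

```shell
# Classify a TCP connection attempt as open, refused, or timed out.
# A DROP rule discards the SYN, so the attempt usually times out; a port
# with no listener and no DROP rule is typically refused right away.
probe_port() {
  local host="$1" port="$2" secs="${3:-3}" rc
  timeout "$secs" bash -c ": < /dev/tcp/$host/$port" 2>/dev/null
  rc=$?
  case "$rc" in
    0) echo "open" ;;
    124) echo "timed out" ;;  # timeout(1) exits with 124 when the limit expires
    *) echo "refused or error" ;;
  esac
}
```

With the nc listener running and the INPUT DROP rule in place, `probe_port 127.0.0.1 30146` should report `timed out`, and `open` once the rule is deleted.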

Enable iptables logging

To see which iptables rules a request to the NodePort actually hits, we need to enable iptables logging. On CentOS, enable it as follows:

# load the nf_log_ipv4 kernel module
modprobe nf_log_ipv4

# use the nf_log_ipv4 logger for IPv4 traffic
sysctl net.netfilter.nf_log.2=nf_log_ipv4

# update /etc/rsyslog.conf to include config: kern.* /var/log/kern.log
vi /etc/rsyslog.conf

# restart rsyslog service
systemctl restart rsyslog
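If you would rather script the rsyslog change than edit the file in vi, an idempotent append does the same thing and can be re-run safely. A minimal sketch, with `ensure_line` as a hypothetical helper name:

```shell
# Append LINE to FILE unless FILE already contains exactly that line.
ensure_line() {
  local file="$1" line="$2"
  touch "$file" || return
  # If the file does not end with a newline, add one first so the new
  # line is not glued onto the last existing line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  # -q: quiet, -x: match the whole line, -F: fixed string (no regex).
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example: `ensure_line /etc/rsyslog.conf 'kern.* /var/log/kern.log'`, followed by `systemctl restart rsyslog`.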

Then add a TRACE rule for the NodePort. The TRACE target is only valid in the raw table:

iptables -t raw -I OUTPUT -p tcp --dport 31002 -j TRACE
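TRACE logs every rule the matching packets pass through, so it is worth removing the rule when you are done; the delete command is the same rule with `-D` instead of `-I`. A sketch of a wrapper for both. The function name and the `DRY_RUN` switch (print the command instead of running it) are hypothetical:

```shell
# Print the command when DRY_RUN is set, otherwise run it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Add or delete the raw-table TRACE rule for a NodePort.
# Usage: trace_nodeport add|del PORT
trace_nodeport() {
  local flag
  case "$1" in
    add) flag=-I ;;
    del) flag=-D ;;
    *) echo "usage: trace_nodeport add|del PORT" >&2; return 2 ;;
  esac
  # raw OUTPUT sees locally generated packets before any NAT happens.
  run iptables -t raw "$flag" OUTPUT -p tcp --dport "$2" -j TRACE
}
```

For example, `trace_nodeport add 31002`, run the curl, then `trace_nodeport del 31002`.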

Then access the NodePort again:

curl localhost:31002

All iptables rules hit by the request are now logged in /var/log/kern.log. Below is the beginning of that output, keeping only the relevant fields SRC, DST, and DPT, with the IPs in SRC and DST replaced by <ip1> and <ip2>:

Sep 30 10:04:04 [localhost] kernel: TRACE: raw:OUTPUT:policy:3 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: mangle:OUTPUT:policy:1 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:OUTPUT:rule:1 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:cali-OUTPUT:rule:1 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:cali-fip-dnat:return:1 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:cali-OUTPUT:return:2 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:OUTPUT:rule:2 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-SERVICES:rule:458 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-NODEPORTS:rule:64 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-SVC-EANZGUQV3HZVGERW:rule:2 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-MARK-MASQ:rule:1 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-MARK-MASQ:return:2 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-SVC-EANZGUQV3HZVGERW:rule:3 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: nat:KUBE-SEP-E6KGZ7LKAMZFQR62:rule:2 SRC=<ip1> DST=<ip1> DPT=31002
Sep 30 10:04:04 [localhost] kernel: TRACE: filter:OUTPUT:rule:1 SRC=<ip1> DST=<ip2> DPT=8200

The nat table contains many chains starting with KUBE, generated by kube-proxy. A rule in the KUBE-SEP-E6KGZ7LKAMZFQR62 chain performs DNAT, rewriting DST and DPT from the node IP and NodePort to the pod IP and port; the next line, in the filter table, already shows DST=<ip2> DPT=8200. This explains why blocking the NodePort in the filter table does not work: by the time the request reaches the filter table, its destination port has already been changed to the pod port.
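In a longer trace, the DNAT step can be located mechanically: it is the last rule logged before DST or DPT changes. The awk-based sketch below (`find_dnat_step` is a hypothetical name) reads TRACE lines like the excerpt above and prints that rule together with the rewritten destination. It assumes the input starts with a fresh trace of the first packet of the connection.

```shell
# Read iptables TRACE log lines on stdin and print the last rule hit
# before the packet's DST or DPT changed, plus the new DST/DPT.
find_dnat_step() {
  awk '
    {
      hop = ""; dst = ""; dpt = ""
      for (i = 1; i <= NF; i++) {
        if ($i == "TRACE:") hop = $(i + 1)        # e.g. nat:KUBE-SERVICES:rule:458
        else if ($i ~ /^DST=/) dst = substr($i, 5)
        else if ($i ~ /^DPT=/) dpt = substr($i, 5)
      }
      if (hop == "") next                         # not a TRACE line
      if (seen && (dst != prev_dst || dpt != prev_dpt)) {
        print prev_hop " -> DST=" dst " DPT=" dpt
        exit
      }
      seen = 1; prev_hop = hop; prev_dst = dst; prev_dpt = dpt
    }'
}
```

Usage: `grep 'TRACE:' /var/log/kern.log | find_dnat_step`. On the excerpt above it prints `nat:KUBE-SEP-E6KGZ7LKAMZFQR62:rule:2 -> DST=<ip2> DPT=8200`.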

Block before the DNAT rule

Once we understand how iptables handles NodePort traffic, the solution is obvious: block the NodePort before DNAT happens. The iptables packet flow is as follows:

network  ->  PREROUTING  ->  routing decision ->  INPUT  ------->  process
               raw              |                  mangle
               mangle           |                  filter
               nat              |                  nat
                                V
                             FORWARD
                               mangle
                               filter
                                |
                                |
                                V
process  ->  OUTPUT  ----->  POSTROUTING  ->  network
               raw             mangle
               mangle          nat
               nat
               filter

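This matches the order of the rule hits in the log: for locally generated packets, the mangle table’s OUTPUT chain runs before the nat table’s, where kube-proxy’s DNAT happens. Therefore, we can add the DROP rule to the OUTPUT chain of the mangle table: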

iptables -t mangle -I OUTPUT -p tcp --dport 31002 -j DROP

Then try accessing the NodePort again, and you’ll find it is finally unreachable.
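The rule above only covers connections made from the node itself. Connections from other hosts, such as the vulnerability scanner, enter through PREROUTING, where kube-proxy also jumps to its KUBE-SERVICES chain in the nat table, so by the same reasoning they would need a DROP rule in the mangle table’s PREROUTING chain. Below is a sketch that adds or removes both rules. Unlike the command above, it adds `-m addrtype --dst-type LOCAL` so that only packets addressed to one of the node’s own IPs are dropped, leaving connections to the same port on other hosts alone. The function names and the `DRY_RUN` switch are hypothetical; treat this as a starting point rather than a tested recipe.

```shell
# Print the command when DRY_RUN is set, otherwise run it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Insert (-I) or delete (-D) mangle-table DROP rules for a NodePort.
# Both chains run before the nat table, where kube-proxy's DNAT rewrites
# the destination port:
#   OUTPUT     - connections made from the node itself
#   PREROUTING - connections arriving from other hosts
nodeport_drop_rules() {
  local flag="$1" port="$2" chain
  for chain in OUTPUT PREROUTING; do
    # addrtype LOCAL: only match packets addressed to this node.
    run iptables -t mangle "$flag" "$chain" -p tcp --dport "$port" \
      -m addrtype --dst-type LOCAL -j DROP || return
  done
}

block_nodeport()   { nodeport_drop_rules -I "$1"; }
unblock_nodeport() { nodeport_drop_rules -D "$1"; }
```

For example, `DRY_RUN=1 block_nodeport 31002` prints the two commands for review, and `block_nodeport 31002` (as root) applies them.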

Summary

k8s kube-proxy adds DNAT rules to the iptables nat table to forward NodePort traffic to pods, so to block a NodePort, you have to drop its traffic before those DNAT rules run. For example, a DROP rule in the OUTPUT chain of the mangle table blocks local access to the NodePort.