Kubernetes Networking Deep Dive: Part 1
Foundations
As promised in the Introduction, here is the first post in the series following the life of a packet through a Kubernetes cluster. Before we start tracing packets, let’s establish the foundational concepts: how Kubernetes allocates IP addresses to resources, how pods get their network interfaces, and which components in a cluster are responsible for routing traffic.
The Kubernetes Networking Model
Kubernetes imposes three fundamental requirements on any networking implementation:
Pods can communicate with all other pods on any node without NAT
Nodes can communicate with all pods without NAT
The IP that a pod sees for itself is the same IP that other pods see for it
These requirements are defined in the Kubernetes documentation and any CNI plugin you install in your cluster must satisfy them. This also means every pod gets a routable IP address.
This is different from, say, default Docker networking, where containers on different hosts cannot communicate without port mapping or overlay configuration (which was always a bit annoying, and is one reason I like working in Kubernetes). Kubernetes abstracts this away: from an app’s perspective, all pods are directly reachable by IP, though Services are the preferred way to connect to them.
Reference: Kubernetes Networking Model documentation at: https://kubernetes.io/docs/concepts/services-networking/
IP Address Allocation
A Kubernetes cluster uses two separate IP ranges defined when a cluster is first created.
Pod CIDR
The “Pod CIDR” is the IP range from which all pod IP addresses are allocated.
For example: the CIDR block 10.244.0.0/16 provides 65,536 addresses. The cluster divides this range among its nodes. Each node gets a chunk, say a /24 subnet (256 addresses per node), and from that chunk Kubernetes assigns IPs to the pods scheduled on that node.
For example in a 3-node cluster:
Cluster pod CIDR: 10.244.0.0/16
Node 1 range: 10.244.0.0/24 (pods get 10.244.0.2, 10.244.0.3, etc.)
Node 2 range: 10.244.1.0/24 (pods get 10.244.1.2, 10.244.1.3, etc.)
Node 3 range: 10.244.2.0/24 (pods get 10.244.2.2, 10.244.2.3, etc.)
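To make the carve-up concrete, here’s a quick Python sketch using the standard ipaddress module, with the same example CIDRs as above (the node names are made up):

```python
import ipaddress

# The cluster-wide pod CIDR from the example above
cluster_cidr = ipaddress.ip_network("10.244.0.0/16")

# Split it into per-node /24 chunks, as the example 3-node cluster does
node_ranges = list(cluster_cidr.subnets(new_prefix=24))

print(cluster_cidr.num_addresses)  # 65536 addresses in the pod CIDR
for node, rng in zip(["node-1", "node-2", "node-3"], node_ranges):
    # node-1 10.244.0.0/24, node-2 10.244.1.0/24, node-3 10.244.2.0/24
    print(node, rng, rng.num_addresses)
```

A /16 split into /24s yields 256 node-sized chunks, so this example layout supports up to 256 nodes before the pod CIDR runs out.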
The node’s kubelet, along with the CNI plugin, handles the IP assignment when a pod starts up.
NOTE:
Remember from Kubernetes basics that Kubernetes itself does not provide any networking functionality per se. It has a Container Network Interface (CNI) which lets you install whatever networking plugin you wish, and the kubelet doesn’t have to care which. I’ll talk more about that later.
Service CIDR
The Service CIDR is a different range used for ClusterIP services. For example, let’s use 10.96.0.0/12. Unlike pod IPs, service IPs are virtual. They DO NOT get assigned to any network interface. They exist only as entries in iptables (or IPVS) rules that redirect traffic to pod endpoints. Neither is exactly fun to manage by hand, which is part of the beauty of Kubernetes: it takes care of all that for you.
When a Service gets created, Kubernetes grabs an IP from this range and updates the routing rules on EVERY node to handle traffic destined for that specific Service IP.
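As a rough sketch of the allocation side, here’s how picking a free ClusterIP from that range might look. This is a deliberately naive model for illustration; the real apiserver allocator uses a bitmap, but the idea is the same:

```python
import ipaddress

service_cidr = ipaddress.ip_network("10.96.0.0/12")

def allocate_cluster_ip(allocated: set) -> ipaddress.IPv4Address:
    """Naive sketch: hand out the first free host IP in the service range."""
    for ip in service_cidr.hosts():
        if ip not in allocated:
            allocated.add(ip)
            return ip
    raise RuntimeError("service CIDR exhausted")

in_use = set()
print(allocate_cluster_ip(in_use))  # 10.96.0.1 (the first host address)
print(allocate_cluster_ip(in_use))  # 10.96.0.2
```

Note that the first allocatable address, 10.96.0.1, is why the default `kubernetes` Service in a cluster with this range so often has exactly that ClusterIP.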
Showing the Cluster CIDR Configuration
You can check the CIDR ranges for your cluster with kubectl commands like the ones below:
# View the pod CIDR, for example
kubectl cluster-info dump | grep -m 1 cluster-cidr
# The output would contain something like: cluster-cidr=10.244.0.0/16
# View the service CIDR, for example
kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
# The output would contain something like: service-cluster-ip-range=10.96.0.0/12
# View the CIDR allocated to a specific node
kubectl get node node-1 -o jsonpath='{.spec.podCIDR}'
# Output would be something like: 10.244.0.0/24
Reference: Cluster Networking documentation at: https://kubernetes.io/docs/concepts/cluster-administration/networking/
Network Namespaces and the Pod Sandbox
As I was researching this topic, I dug into some interesting details about how Kubernetes actually hands out IP addresses from the Pod CIDR and handles pod creation and isolation.
Every pod has its own Linux network namespace. A network namespace (not to be confused with a Kubernetes Namespace) provides isolated networking components: its own interfaces, routing tables, iptables rules, etc. Processes running in one network namespace cannot see or interact with network resources in another namespace unless explicitly connected. This gives each pod an isolated environment for its containers.
The Pause Container
Whenever Kubernetes creates a new pod, the container runtime (e.g. containerd, cri-o) first creates what is called a “pause” container, also known as a “sandbox” container. This pause container doesn’t do any work itself. It just sleeps forever, but it holds the network namespace open for the pod’s upcoming workload containers.
These workload containers (e.g. nginx, your app, a logging sidecar) then join this existing network namespace instead of creating their own. This shared namespace is how containers within the same pod share the same IP address and can communicate with each other over localhost. It’s also what makes the sidecar pattern work. Kinda neat.
You can check out pause containers on a node (though you’ll seldom have to do so, it’s still interesting to see how things work):
# On a node that has containerd running, you use the ctr command
sudo ctr -n k8s.io containers list | grep pause
# Output example:
# a1b2c3d4e5f6 registry.k8s.io/pause:3.9 io.containerd.runc.v2
# Or with crictl, which works with any CRI runtime (containerd or cri-o)
sudo crictl ps -a | grep pause
# Output example:
# 7f8e9d0c1b2a 3 hours ago Running pause 0 abc123def456 nginx-pod
Here’s more info on container runtimes.
Examining Network Namespaces
With root access on a node, you can inspect a pod’s networking namespace. Note that in many cases you may not have direct access to a node, especially in a production environment, so you may be able to do it via a pod with the right security context.
# List network namespaces (requires root on the node or a pod must have CAP_SYS_ADMIN)
sudo lsns -t net
# Output example:
# NS TYPE NPROCS PID USER NETNSID NSFS COMMAND
# 4026531840 net 145 1 root unassigned /sbin/init
# 4026532509 net 2 1842 65535 0 /run/netns/cni-a1b2c3d4-e5f6 /pause
# 4026532592 net 3 2156 65535 1 /run/netns/cni-f7g8h9i0-j1k2 /pause
# You can also “enter” a pod’s network namespace and inspect it
POD_PID=$(sudo crictl inspect <container-id> | jq .info.pid)
sudo nsenter -t $POD_PID -n ip addr
# Output example:
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
# link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# inet 127.0.0.1/8 scope host lo
# valid_lft forever preferred_lft forever
# 3: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
# link/ether 62:a1:b2:c3:d4:e5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
# inet 10.244.0.5/24 brd 10.244.0.255 scope global eth0
# valid_lft forever preferred_lft forever
That funny-looking eth0@if12 name indicates that this is one end of a veth pair (next section), with the other end being interface index 12 on the node itself.
Reference: Linux network namespaces documentation in man 7 network_namespaces and Kubernetes Pod documentation at: https://kubernetes.io/docs/concepts/workloads/pods/
Virtual Ethernet Pairs
To understand a veth (virtual ethernet) pair, think of a networking cable (virtual, of course) connecting two different network namespaces. This functionality is part of Linux. Packets sent into one side of the pair come out of the other side. Standard Kubernetes networking uses veth pairs to join pod namespaces to the host namespace; this is how packets get into and out of a pod. Without a veth pair (or something similar such as macvlan), the pod's network namespace would be isolated, unable to communicate with anything else.
When a pod is created:
The CNI plugin creates the veth pair for the pod.
One end of the pair is placed in the pod namespace (usually called something like eth0)
The other end stays in the host namespace (named vethXXXXXX or something like that)
The host end is attached to a bridge or configured with routes. More on that further below.
Taking a look at the veth Pairs
# On the node, list veth interfaces
ip link show type veth
# Output something like:
# 12: veth9f8e7d6c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP
# link/ether 8a:1b:2c:3d:4e:5f brd ff:ff:ff:ff:ff:ff link-netns cni-a1b2c3d4-e5f6
# The “master cni0” means these are attached to a bridge named cni0
# The “link-netns” shows which network namespace the other end is located in
# View the bridge and its attached interfaces
bridge link show
# Output:
# 12: veth9f8e7d6c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 master cni0 state forwarding
Reference: veth documentation in man 4 veth
Container Network Interface (CNI)
CNI is a CNCF specification and set of libraries for configuring network interfaces in Linux containers. As mentioned above, Kubernetes does not implement pod networking directly. Instead, it defers the work to CNI plugins; this “loose coupling” provides the flexibility to use different types of networking, including cloud-provider networking.
What does the CNI actually do?
When a kubelet needs to set up the networking for a new pod:
The kubelet will call whichever container runtime is installed (containerd, cri-o)
The runtime creates the pod pause container with a new network namespace
The runtime invokes the CNI plugin specified in the node’s CNI configuration.
The CNI plugin then configures the network namespace: creating interfaces, assigning IPs, and setting up routes in the pod’s routing table.
The CNI plugin then returns the configuration to the runtime, which reports it back to the kubelet
CNI plugins perform three operations:
ADD (configure networking for a new container)
DEL (clean up networking when container stops)
CHECK (verify configuration is correct).
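To make the contract concrete, here’s a hedged Python sketch of the result a CNI plugin hands back to the runtime on ADD. The field shapes follow the CNI result format, but all the values (interface name, IPs, gateway, netns path) are invented for illustration:

```python
import json

def cni_add(stdin_config: dict, container_ip: str) -> dict:
    """Sketch of a CNI ADD result: the plugin tells the runtime which
    interfaces it created, which IPs it assigned, and which routes it set up.
    All concrete values here are made up for illustration."""
    return {
        "cniVersion": stdin_config.get("cniVersion", "0.3.1"),
        "interfaces": [{"name": "eth0", "sandbox": "/run/netns/example"}],
        "ips": [{"address": container_ip, "gateway": "10.244.0.1"}],
        "routes": [{"dst": "0.0.0.0/0"}],
    }

# The network config the runtime would read from /etc/cni/net.d/ (simplified)
conf = {"cniVersion": "0.3.1", "name": "cbr0", "type": "flannel"}
result = cni_add(conf, "10.244.0.5/24")
print(json.dumps(result, indent=2))
```

A real plugin is an executable that receives this config on stdin plus environment variables such as CNI_COMMAND, and prints a result like this to stdout; the function above only mimics the shape of that exchange.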
Are you still with me? Good... let’s keep going! 😅
CNI Configuration
CNI configuration lives in /etc/cni/net.d/ on each node. For example, with the flannel CNI plugin, below is a typical config. I won’t go into what the individual configuration options mean.
cat /etc/cni/net.d/10-flannel.conflist
# Output:
# {
#   "name": "cbr0",
#   "cniVersion": "0.3.1",
#   "plugins": [
#     {
#       "type": "flannel",
#       "delegate": {
#         "hairpinMode": true,
#         "isDefaultGateway": true
#       }
#     },
#     {
#       "type": "portmap",
#       "capabilities": {
#         "portMappings": true
#       }
#     }
#   ]
# }
Overlay vs Routed Networking
Here’s where it gets even more interesting...
CNI plugins fall into two large categories depending on how they handle cross-node traffic:
Overlay Networks
Encapsulate pod traffic in an outer packet with node IPs
Function on any kind of network infrastructure
Add overhead: extra headers reduce the effective MTU*, and encap/decap adds CPU cost
Examples: Flannel (VXLAN mode), Calico (VXLAN or IPinIP mode), Cilium (VXLAN mode)
*MTU = Maximum Transmission Unit, the largest packet size a network link can carry, usually 1500 bytes for Ethernet. With the overhead of the extra encapsulation headers, each packet carries less of your data.
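You can do the MTU arithmetic yourself. For VXLAN over IPv4, the standard header sizes add up to 50 bytes of overhead, which is exactly why the example outputs earlier in this post show mtu 1450:

```python
# VXLAN encapsulation overhead on top of a standard 1500-byte Ethernet MTU.
# These are the standard header sizes for VXLAN-over-IPv4.
ETH_MTU = 1500
OUTER_IP = 20    # outer IPv4 header
OUTER_UDP = 8    # outer UDP header
VXLAN = 8        # VXLAN header
INNER_ETH = 14   # inner Ethernet frame header

overhead = OUTER_IP + OUTER_UDP + VXLAN + INNER_ETH
effective_mtu = ETH_MTU - overhead
print(overhead, effective_mtu)  # 50 1450
```

That 50-byte tax is per packet, so small-packet workloads feel it proportionally more than bulk transfers.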
Routed Networks (BGP, host routing)
Pod IPs are routed directly on the physical network connecting nodes
Require network infrastructure configuration: BGP peering or static routes
No encapsulation overhead, full MTU available
Pod IPs visible in network flow logs and to firewalls
Examples: Calico (BGP mode), Cilium (native routing)
Oof that’s a lot! I’ll go into both in detail in Part 2 when tracing cross-node pod-to-pod communication.
Reference: CNI specification at https://www.cni.dev/docs/spec/ and Kubernetes Network Plugins documentation at: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/
kube-proxy and Service Routing
The kube-proxy is a Kubernetes component that runs on every node in a cluster and maintains all the network rules defined to route traffic to Services. Despite how it’s named, it does not proxy any traffic itself. Instead, it configures the kernel’s packet filtering and NAT facilities.
What kube-proxy actually does
kube-proxy watches the Kubernetes API for Service and EndpointSlice (the latter you generally won’t have to deal with directly) objects. Whenever these change, it updates the node’s packet processing (e.g. iptables) rules to:
Intercept traffic whose destination is a Service’s ClusterIP
Select a backend pod (load balancing)
Redirect the traffic to that pod’s IP (DNAT: rewriting the packet’s destination IP and port)
Handle return traffic back to the source
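Here’s a toy Python model of those steps, just to make the DNAT idea concrete. This is not how iptables works internally (iptables mode actually picks backends randomly; I use round-robin here so the output is predictable), and the Service and pod addresses are the example values from earlier:

```python
import itertools

# A ClusterIP:port maps to a set of pod endpoints (EndpointSlice, in effect).
endpoints = {
    "10.96.0.10:53": ["10.244.0.2:53", "10.244.1.3:53"],
}
_rr = {svc: itertools.cycle(pods) for svc, pods in endpoints.items()}

def dnat(dest: str) -> str:
    """Rewrite a Service destination to a pod backend; pass through otherwise."""
    if dest in _rr:
        return next(_rr[dest])   # pick a backend (load balancing)
    return dest                  # not a Service IP: kube-proxy rules don't apply

print(dnat("10.96.0.10:53"))    # 10.244.0.2:53
print(dnat("10.96.0.10:53"))    # 10.244.1.3:53
print(dnat("10.244.0.7:8080"))  # unchanged: direct pod-to-pod traffic
```

The last call illustrates the point of the next section: traffic addressed straight to a pod IP never touches these rules.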
What kube-proxy Does Not Do
kube-proxy is not involved in pod-to-pod communication that does not go through a Service. When Pod A communicates directly with Pod B’s IP address, the traffic is handled by the CNI networking layer. kube-proxy’s rules are only followed when the destination is a Service IP or NodePort.
kube-proxy Modes: iptables vs IPVS
kube-proxy can operate in two main modes (a third, nftables, is newer and I won’t be covering it). The mode affects how Service routing rules are implemented.
iptables Mode
In iptables mode, kube-proxy creates iptables firewall rules for each Service and endpoint.
Advantages:
iptables has been around for a while so it’s mature and well-understood (as well as one can understand that dark art!)
No additional kernel modules are required
Works on any Linux distribution (well, I’d say most)
Disadvantages:
Rule count can grow with more Services and endpoints
Rule updates require rewriting entire chains, which adds latency with a large rule set
Sequential rule evaluation can add latency (if you have a large list and the matching rule is towards the end)
Performance characteristics:
Works well up to approximately 1,000 Services; beyond that, things start to slow down because of the sequential processing of rules
Rule-update latency increases beyond 5,000 Services and updates are not atomic (i.e. the whole thing needs to be updated if there is one change)
Memory usage for the rules can become significant as you scale up
Viewing iptables rules (if you like to torture yourself):
# List Service-related NAT rules
sudo iptables -t nat -L KUBE-SERVICES -n | head -20
# Output example:
# Chain KUBE-SERVICES (2 references)
# target prot opt source destination
# KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- 0.0.0.0/0 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
# KUBE-SVC-TCOU7JCQXEZGVUNU udp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
# KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
# Count total Service-related rules
sudo iptables -t nat -L -n | wc -l
# Output example: 847
IPVS Mode
In IPVS mode, kube-proxy uses the kernel’s IPVS (IP Virtual Server) subsystem. IPVS is purpose-built for load balancing and uses hash tables for constant-time lookups regardless of the number of Services, which speeds things up.
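A toy comparison makes the difference obvious: iptables walks a rule list top to bottom, while a hash table jumps straight to the matching entry. The Service addresses and backend names below are synthetic:

```python
# 1,000 synthetic Service rules: ClusterIP:port -> backend name
rules = [(f"10.96.{i // 256}.{i % 256}:443", f"backend-{i}") for i in range(1000)]

def iptables_style(dest):
    """Sequential scan, like walking an iptables chain: O(n)."""
    for match, target in rules:
        if match == dest:
            return target
    return None  # no rule matched

# Hash-table lookup, like IPVS: O(1) regardless of Service count
ipvs_table = dict(rules)

dest = "10.96.3.231:443"  # the last rule (i = 999): worst case for the scan
assert iptables_style(dest) == ipvs_table[dest] == "backend-999"
```

The worst-case scan touches every rule before matching, which is why the iptables guidance above is framed in terms of Service count while the IPVS guidance isn’t.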
Advantages:
Rule matching via hash tables (fast!)
Supports multiple load balancing algorithms
Lower latency rule updates
Better performance at scale (10,000+ Services or so)
Disadvantages:
Requires IPVS kernel modules
More complex debugging (ipvsadm command)
Still uses iptables for some functions (masquerading/SNAT, NodePort handling)
Viewing IPVS rules (here’s some more fun!):
# Check if IPVS mode is active
sudo ipvsadm -Ln | head -10
# Output:
# IP Virtual Server version 1.2.1 (size=4096)
# Prot LocalAddress:Port Scheduler Flags
# -> RemoteAddress:Port Forward Weight ActiveConn InActConn
# TCP 10.96.0.1:443 rr
# -> 192.168.1.10:6443 Masq 1 3 0
# TCP 10.96.0.10:53 rr
# -> 10.244.0.2:53 Masq 1 0 0
# -> 10.244.1.3:53 Masq 1 0 0
# UDP 10.96.0.10:53 rr
# -> 10.244.0.2:53 Masq 1 0 12
# -> 10.244.1.3:53 Masq 1 0 8
# View IPVS connection tracking
sudo ipvsadm -Lnc | head -10
# Output:
# IPVS connection entries
# pro expire state source virtual destination
# TCP 14:56 ESTABLISHED 10.244.0.5:48892 10.96.0.1:443 192.168.1.10:6443
Choosing Between Modes (The right mode for the right job)
Use iptables mode when things are smaller:
Running smaller clusters (under 1,000 Services)
You want to keep it simple
You don’t/can’t have IPVS kernel modules installed
Use IPVS mode when things scale up:
Running larger clusters (1,000+ Services)
You need specific load balancing algorithms
Service creation/update latency reduction is important
To configure the mode, set --proxy-mode in kube-proxy’s configuration:
# Check current mode
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode
# Output: mode: "ipvs" or "iptables" or empty (defaults to iptables)
Reference: kube-proxy documentation at: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/ and IPVS-based proxying at https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-ipvs
OSI Model Context
As we trace packets through the cluster in later posts, we will reference OSI layers to clarify where processing occurs:
Layer 7 (Application): HTTP, gRPC, DNS queries. This is where your application code operates.
Layer 4 (Transport): TCP/UDP. Ports, connections, and load balancing decisions happen here.
Layer 3 (Network): IP addresses and routing. CNI plugins, iptables, NAT, and IPVS operate at this layer.
Layer 2 (Data Link): MAC addresses and switching. Bridges like cni0 and VXLAN encapsulation operate here.
Kubernetes networking operates mostly at Layers 3 and 4, with Layer 2 involvement for local bridging and overlay encapsulation.
Summary
This post covered the foundational concepts required to understand Kubernetes networking:
The flat network model guarantees pod-to-pod communication without NAT
Pod CIDR provides addresses for pods; Service CIDR provides virtual IPs for Services
Each pod runs in an isolated network namespace
Veth pairs connect pod namespaces to the host network
CNI plugins handle the actual network interface configuration
kube-proxy handles Service routing rules using iptables or IPVS
Overlay and routed networking are the two main approaches for cross-node traffic
Part 2 will trace packets through pod-to-pod communication, covering both same-node traffic (through the bridge) and cross-node traffic (using VXLAN overlay and BGP routing).

