Avoiding collisions
One of the primary philosophies of Kubernetes is that users should not be exposed to situations that could cause their actions to fail through no fault of their own. In this situation, we are looking at network ports - users should not have to choose a port number if that choice might collide with another user. That is an isolation failure.
In order to allow users to choose a port number for their Services, we must ensure that no two Services can collide. We do that by allocating each Service its own IP address.
To ensure each Service receives a unique IP, an internal allocator atomically updates a global allocation map in etcd prior to creating each Service. The map object must exist in the registry for Services to get IPs; otherwise, creations will fail with a message indicating an IP could not be allocated. A background controller is responsible for creating that map (to migrate from older versions of Kubernetes that used in-memory locking) as well as checking for invalid assignments due to administrator intervention and cleaning up any IPs that were allocated but which no Service currently uses.
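As a sketch of what this buys users (the names and ports below are hypothetical, not taken from this document), two Services can both ask for port 80 because each one is answered on its own allocated IP:

```yaml
# Hypothetical example: both Services request port 80, but each is allocated
# its own cluster IP by the allocator described above, so there is no collision.
kind: Service
apiVersion: v1
metadata:
  name: frontend        # hypothetical name
spec:
  selector:
    app: frontend
  ports:
  - port: 80            # user-chosen port, answered on this Service's own IP
    targetPort: 8080
---
kind: Service
apiVersion: v1
metadata:
  name: reports         # hypothetical name
spec:
  selector:
    app: reports
  ports:
  - port: 80            # same port number, different Service IP
    targetPort: 8080
```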
IPs and VIPs
Unlike Pod IP addresses, which actually route to a fixed destination, Service IPs are not actually answered by a single host. Instead, we use iptables (packet processing logic in Linux) to define virtual IP addresses which are transparently redirected as needed. When clients connect to the VIP, their traffic is automatically transported to an appropriate endpoint. The environment variables and DNS for Services are actually populated in terms of the Service's VIP and port.
We support two proxy modes - userspace and iptables, which operate slightly differently.
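Which mode a cluster uses is a kube-proxy setting rather than anything on the Service itself. A minimal sketch, assuming a cluster whose kube-proxy reads a component configuration file (older setups pass the equivalent --proxy-mode command-line flag instead):

```yaml
# Minimal sketch of a kube-proxy component configuration selecting the
# iptables proxy mode; "userspace" selects the other mode described below.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "iptables"
```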
Userspace
As an example, consider the image processing application described above. When the backend Service is created, the Kubernetes master assigns a virtual IP address, for example 10.0.0.1. Assuming the Service port is 1234, the Service is observed by all of the kube-proxy instances in the cluster. When a proxy sees a new Service, it opens a new random port, establishes an iptables redirect from the VIP to this new port, and starts accepting connections on it.
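A backend Service matching the numbers in this walkthrough might look like the hypothetical manifest below; note that the virtual IP (10.0.0.1 in the example) is assigned by the master, not written into the spec:

```yaml
# Hypothetical backend Service for the walkthrough: the user chooses port 1234,
# while the virtual IP (e.g. 10.0.0.1) is assigned by the Kubernetes master.
kind: Service
apiVersion: v1
metadata:
  name: image-backend    # hypothetical name
spec:
  selector:
    app: image-backend   # hypothetical label on the backend Pods
  ports:
  - protocol: TCP
    port: 1234           # the Service port observed by every kube-proxy
    targetPort: 8080     # hypothetical container port on the backend Pods
```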
When a client connects to the VIP, the iptables rule kicks in and redirects the packets to the Service proxy's own port. The Service proxy chooses a backend and starts proxying traffic from the client to the backend.
This means that Service owners can choose any port they want without risk of collision. Clients can simply connect to an IP and port, without being aware of which Pods they are actually accessing.
Iptables
Again, consider the image processing application described above. When the backend Service is created, the Kubernetes master assigns a virtual IP address, for example 10.0.0.1. Assuming the Service port is 1234, the Service is observed by all of the kube-proxy instances in the cluster. When a proxy sees a new Service, it installs a series of iptables rules which redirect from the VIP to per-Service rules. The per-Service rules link to per-Endpoint rules which redirect (Destination NAT) to the backends.
When a client connects to the VIP, the iptables rule kicks in. A backend is chosen (either based on session affinity or randomly) and packets are redirected to the backend. Unlike the userspace proxy, packets are never copied to userspace, the kube-proxy does not have to be running for the VIP to work, and the client IP is not altered.
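Whether a given client keeps hitting the same backend or gets a random one is controlled by the Service's session affinity. A minimal sketch, reusing the hypothetical backend Service from above:

```yaml
# Hypothetical Service with client-IP session affinity: the installed iptables
# rules keep sending a given client IP to the same backend instead of choosing
# one at random for every new connection.
kind: Service
apiVersion: v1
metadata:
  name: image-backend    # hypothetical name
spec:
  selector:
    app: image-backend
  sessionAffinity: ClientIP   # the default, "None", picks a backend randomly
  ports:
  - protocol: TCP
    port: 1234
    targetPort: 8080
```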
This same basic flow executes when traffic comes in through a node-port or through a load-balancer, though in those cases the client IP does get altered.
Using the userspace proxy for VIPs will work at small to medium scale, but will not scale to very large clusters with thousands of Services. See the original design proposal for portals for more details.
Using the userspace proxy obscures the source IP of a packet accessing a Service. This makes some kinds of firewalling impossible. The iptables proxier does not obscure in-cluster source IPs, but it does still impact clients coming through a load-balancer or node-port.
The Type field is designed as nested functionality - each level adds to the previous. This is not strictly required on all cloud providers (e.g. Google Compute Engine does not need to allocate a NodePort to make LoadBalancer work, but AWS does) but the current API requires it.
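For example, creating the hypothetical LoadBalancer Service below also allocates a NodePort (and, one level further down, a cluster IP), even if the cloud provider would not strictly need it:

```yaml
# Hypothetical Service of type LoadBalancer. Because the types nest, the API
# also allocates a cluster IP and a NodePort for it.
kind: Service
apiVersion: v1
metadata:
  name: my-loadbalancer   # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: MyApp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9376
```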
External IPs
If there are external IPs that route to one or more cluster nodes, Kubernetes services can be exposed on those externalIPs. Traffic that ingresses into the cluster with the external IP (as destination IP), on the service port, will be routed to one of the service endpoints. externalIPs are not managed by Kubernetes and are the responsibility of the cluster administrator.
In the ServiceSpec, externalIPs can be specified along with any of the ServiceTypes. In the example below, my-service can be accessed by clients on 80.11.12.10:80 (externalIP:port).
```yaml
kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 9376
  externalIPs:
  - 80.11.12.10
```