MetalLB L2 and BGP mode
MetalLB L2 mode
In L2 mode, MetalLB announces the address of the `LoadBalancerIP` via ARP (for IPv4) or NDP (for IPv6). Before v0.13.2, MetalLB could only be configured through a `ConfigMap`; since v0.13.2 it is configured through CRD resources, and the `ConfigMap` method has been deprecated.
In Layer2 mode, when a Service is created, MetalLB (the `speaker` component) elects one node in the cluster as the host that exposes the Service to the outside world. When a request is made to the `externalIP` of the Service, this node answers the ARP request on behalf of the `externalIP`. Requests sent to the Service therefore first reach this node, then pass through the `kube-proxy` component on it, and are finally directed to a specific endpoint of the Service.
The node election logic has three main points:

- Nodes that are not ready, and nodes whose endpoints are not ready, are filtered out first.
- If all endpoints of the Service sit on the same node, that node is chosen as the ARP responder for the Service IP.
- If the endpoints of the Service are spread across different nodes, each candidate is hashed as `sha256(node + "#" + externalIP)` and the first result in dictionary order wins.
In this way, MetalLB selects one node per Service as its exposed host. Since MetalLB directs all traffic for a single Service to that one node, the node can become a performance bottleneck, and the Service's bandwidth is capped by the bandwidth of a single node. This is the most important limitation of using ARP or NDP.
Also, when this node fails, MetalLB must elect a new node for the Service. MetalLB then sends a gratuitous ARP to clients, telling them that their MAC address cache needs to be updated. Traffic keeps being forwarded to the failed node until clients update the cache, so from this point of view the failover time depends on how quickly clients refresh their MAC address caches.
Usage
- Create an IP pool
  - `addresses`: a list of IP addresses. Each entry can be a CIDR or an address range (such as 192.168.9.1-192.168.9.5), and entries may use different `ipFamily` values; MetalLB allocates `LoadBalancer` Service IPs from this list.
  - `autoAssign`: whether to assign IP addresses automatically, defaults to true. In some cases (scarce IP addresses or public IPs) you may not want pool IPs handed out freely; set it to false and request the pool explicitly with the annotation `metallb.universe.tf/address-pool: <pool-name>` on the Service, or set the IP in the `spec.loadBalancerIP` field (note that this field has been deprecated by Kubernetes).
  - `avoidBuggyIPs`: whether to avoid the `.0` and `.255` addresses in the pool, defaults to false. A minimal pool sketch follows this item.
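A minimal `IPAddressPool` covering the fields above; the pool name and addresses are illustrative:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lan
  namespace: metallb-system
spec:
  addresses:
  - 192.168.9.1-192.168.9.5    # a plain address range
  - 10.254.254.0/24            # a CIDR entry
  autoAssign: true             # hand out addresses automatically (default)
  avoidBuggyIPs: false         # allow .0/.255 addresses (default)
```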
- Configure the `LoadBalancerIP` advertisement rule (L2)
  Bind IP pools via `L2Advertisement`, which tells MetalLB that these addresses should be advertised via ARP or NDP (see the sketch after this item).
  - `ipAddressPools`: optional, selects IP pools by name; if neither `ipAddressPools` nor `ipAddressPoolSelectors` is specified, the advertisement applies to all IP pools.
  - `ipAddressPoolSelectors`: optional, selects IP pools by label; same default behavior as above.
  - `nodeSelectors`: optional, filters which nodes may serve as the next hop for the `loadBalancerIP`, defaults to all nodes.
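A minimal `L2Advertisement` sketch binding the pool above (the names are illustrative; omitting `ipAddressPools` entirely would apply it to every pool):

```yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-lan
  namespace: metallb-system
spec:
  ipAddressPools:
  - lan          # advertise only the "lan" pool via ARP/NDP
```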
- Create a `LoadBalancer` Service

```yaml
apiVersion: v1
kind: Service
metadata:
  name: metallb1-cluster
  labels:
    name: metallb
  #annotations:
  #  metallb.universe.tf/address-pool: lan
spec:
  type: LoadBalancer
  allocateLoadBalancerNodePorts: false
  ports:
  - port: 18081
    targetPort: 8080
    protocol: TCP
  selector:
    app: metallb-cluster
```
Just specify `spec.type=LoadBalancer` and MetalLB will take over the lifecycle of this Service.

Note

If you want the Service to get its address from a specific IP pool, set the annotation `metallb.universe.tf/address-pool: <pool-name>`, or specify the IP in the `service.spec.loadBalancerIP` field (the IP must exist in some pool; this method is not recommended). If there are multiple load balancers, the one to use can be selected through the `service.spec.loadBalancerClass` field; the class MetalLB responds to is set with the `--lb-class` flag when deploying it (see the sketch below).
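A sketch of a Service pinned to a pool by annotation and scoped to a load-balancer class; the class name `metallb.universe.tf/metallb` is an assumption and must match whatever value was passed to `--lb-class`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: metallb-pinned
  annotations:
    metallb.universe.tf/address-pool: lan          # allocate from the "lan" pool
spec:
  type: LoadBalancer
  loadBalancerClass: metallb.universe.tf/metallb   # assumed class name; must match --lb-class
  ports:
  - port: 18081
    targetPort: 8080
  selector:
    app: metallb-cluster
```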
Load Balancing
- When `Service.spec.externalTrafficPolicy=cluster`

  In this mode load balancing is good, but traffic may take multiple hops, which hides the client's source IP.

```
                                                 | -> kube-proxy(SNAT) -> pod A
client -> loadBalancerIP:port -> node A(Leader) -|
                                                 | -> kube-proxy(SNAT) -> node B -> kube-proxy -> pod B
```
- When `Service.spec.externalTrafficPolicy=local`

  In this mode the client's source IP is preserved, but load balancing is poor and traffic lands on a particular backend Pod (a sketch of setting this field follows the diagram).

```
                                                 | -> kube-proxy -> pod A (the backend Pod is on this node)
client -> loadBalancerIP:port -> node A(Leader) -|
                                                 | -> kube-proxy -> node B -> kube-proxy -> pod B (the backend Pod is on a different node)
```
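The policy is set directly on the Service; a minimal sketch (ports and selector are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: metallb-local
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # preserve the client source IP; "Cluster" is the default
  ports:
  - port: 18081
    targetPort: 8080
  selector:
    app: metallb-cluster
```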
MetalLB BGP Mode (L3)
The Layer2 mode is limited to a Layer 2 network, and traffic destined for a Service is first forwarded to a single node, so it is not true load balancing. BGP mode has no such restriction: each node in the cluster establishes a BGP session with a BGP router and announces itself as the next hop for the `ExternalIP` of the Service. External traffic thus enters the cluster through the BGP router, and every time the router receives new traffic destined for a `LoadBalancer` IP address, it creates a new connection to one of the nodes. Which node is chosen depends on the router vendor's specific algorithm, so from this point of view BGP mode provides good load balancing.
Usage
- Create an IP pool
- Configure the `LoadBalancerIP` advertisement rule (L3)

Note

BGP mode requires a router that runs the BGP protocol. If no hardware router is available, software such as `frr` or `bird` can be used instead.

Installing `frr` is recommended. The `frr` BGP configuration:

```
router bgp 7675                            # BGP AS number
 bgp router-id 172.16.1.1                  # router-id, usually an interface IP
 no bgp ebgp-requires-policy               # disable the eBGP requires-policy filter !!!
 neighbor 172.16.1.11 remote-as 7776       # eBGP neighbor 1, 172.16.1.11 is a cluster node
 neighbor 172.16.1.11 description master1  # description
 neighbor 172.16.2.21 remote-as 7776       # node 2
 neighbor 172.16.2.21 description worker1
```

The MetalLB configuration:
configuration: -
Configure
BGPAdvertisement
This CRD is mainly used to specify the IP pool that needs to be announced through BGP. Like the L2 mode, it can be filtered by the pool name or
labelSelector
. At the same time, some attributes of BGP can be configured:apiVersion: metallb.io/v1beta1 kind: BGPAdvertisement metadata: name: local namespace: metallb-system spec: ipAddressPools: -bgp-pool aggregationLength: 32
  - `aggregationLength`: prefix length used to aggregate advertised routes; defaults to 32, meaning routes are advertised via BGP with a /32 mask. Lower the value to aggregate routes and reduce their number.
  - `aggregationLengthV6`: same as above for IPv6, defaults to 128.
  - `ipAddressPools`: []string, selects by name the IP pools to advertise via BGP.
  - `ipAddressPoolSelectors`: selects IP pools by label.
  - `nodeSelectors`: filters the next-hop nodes for the `loadBalancerIP` by node label, defaults to all nodes.
  - `peers`: []string, names of `BGPPeer` objects declaring which BGP sessions this `BGPAdvertisement` applies to.
  - `communities`: BGP communities to attach; values can be given directly or by referencing the name of a communities CRD (see the sketch below).
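For reference, a hedged sketch of a `Community` object that `communities` can reference by name; the object name and the community value are illustrative:

```yaml
apiVersion: metallb.io/v1beta1
kind: Community
metadata:
  name: communities
  namespace: metallb-system
spec:
  communities:
  - name: no-advertise
    value: 65535:65282   # the well-known NO_ADVERTISE community
```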
- Configure BGP Peers

  `BGPPeer` holds the BGP session configuration, including the peer's ASN and IP address:

```yaml
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: test
  namespace: metallb-system
spec:
  myASN: 7776
  peerASN: 7675
  peerAddress: 172.16.1.1
  routerID: 172.16.1.11
```
  - `myASN`: local ASN; the range is 1-64511 (public AS) or 64512-65535 (private AS).
  - `peerASN`: peer ASN, same range as above. If the two are equal the session is iBGP; otherwise it is eBGP.
  - `peerAddress`: IP address of the peer router.
  - `sourceAddress`: source address used to establish the BGP session; by default it is selected automatically from the node's interfaces.
  - `nodeSelectors`: selects by node label which nodes establish a session with the BGP router (a sketch using the optional fields follows).
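A hedged sketch of a peer restricted to a single node with a pinned source address; the hostname label value and addresses are illustrative:

```yaml
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: test-master1-only
  namespace: metallb-system
spec:
  myASN: 7776
  peerASN: 7675
  peerAddress: 172.16.1.1
  sourceAddress: 172.16.1.11          # pin the session source address
  nodeSelectors:                      # only master1 peers with this router
  - matchLabels:
      kubernetes.io/hostname: master1
```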
- Create a Service of type `LoadBalancer`
Verify
You can see the routes learned through BGP on the BGP Router:
```
$ vtysh

Hello, this is FRRouting (version 8.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

router# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [0/100] via 10.0.2.2, eth0, src 10.0.2.15, 03:52:17
C>* 10.0.2.0/24 [0/100] is directly connected, eth0, 03:52:17
K>* 10.0.2.2/32 [0/100] is directly connected, eth0, 03:52:17
B>* 10.254.254.1/32 [20/0] via 172.16.1.11, eth1, weight 1, 03:32:16
  *                        via 172.16.2.21, eth2, weight 1, 03:32:16
C>* 172.16.1.0/24 is directly connected, eth1, 03:52:17
```
You can see that the next hops to the `LoadBalancerIP` are cluster node 1 and node 2 respectively. Perform a connectivity test on the BGP router:

```
root@router:~# curl 10.254.254.1:18081
{"pod_name":"metallb-demo","pod_ip":"172.20.166.20","host_name":"worker1","client_ip":"172.20.161.0"}
```
FRR Mode

MetalLB's BGP mode currently has two backend implementations: `Native BGP` and `FRR BGP`.

`FRR BGP` is still in the experimental stage. Compared with `Native BGP`, it has the following advantages:
- `BFD` protocol support (improves fault response capability and shortens failover time)
- `IPv6 BGP` support
- `ECMP` support
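With the FRR backend, BFD is enabled by referencing a `BFDProfile` from a `BGPPeer`; a hedged sketch, assuming the MetalLB `BFDProfile` CRD with intervals in milliseconds:

```yaml
apiVersion: metallb.io/v1beta1
kind: BFDProfile
metadata:
  name: fast-failover
  namespace: metallb-system
spec:
  receiveInterval: 300       # ms between expected BFD control packets
  transmitInterval: 300      # ms between transmitted BFD control packets
  detectMultiplier: 3        # missed packets before the session is declared down
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: test-with-bfd
  namespace: metallb-system
spec:
  myASN: 7776
  peerASN: 7675
  peerAddress: 172.16.1.1
  bfdProfile: fast-failover  # assumed field; requires the FRR backend
```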