Better Alignment: How to Use F5 to Avoid Overloading Business Instances Behind Individual NGINX Nodes in Multi-AZ Two-Tier Load Balancing

January 3, 2020

Scenario

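Placing LTM in front of NGINX as the load balancer is a fairly typical two-tier load balancing design, that is, the classic scheme that separates L4 from L7.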

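In this architecture, if every NGINX instance balances across the same set of servers, no NGINX instance will see a different number of available servers than the others.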

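However, if the NGINX instances in the LTM pool sit in different availability zones or different data centers, and LTM only performs application-layer load balancing or only monitors NGINX itself, then LTM has no way of knowing how many business servers are actually available behind each NGINX (in its upstream). If the upstream of one NGINX has very few available servers left, LTM still sends that NGINX the same share of connections as the others. The servers behind that NGINX become overloaded and service quality drops.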
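To make the imbalance concrete, here is a small arithmetic sketch. It assumes LTM splits connections evenly across NGINX instances and that each NGINX splits its share evenly across its healthy upstream servers; the function name and all numbers are hypothetical, and every instance is assumed to have at least one healthy server.

```python
def per_server_load(total_connections, healthy_servers_per_nginx):
    """Connections per backend server behind each NGINX instance.

    Assumes an even split across NGINX instances, then an even split
    across each instance's healthy servers (each count must be > 0).
    """
    share = total_connections / float(len(healthy_servers_per_nginx))
    return [share / n for n in healthy_servers_per_nginx]


# Hypothetical numbers: 3000 connections, three zones with 10, 10 and 2 healthy servers.
print(per_server_load(3000, [10, 10, 2]))  # [100.0, 100.0, 500.0]
```

In this illustration the two servers in the third zone each carry five times the load of servers in the other zones, although LTM sees all three NGINX instances as equally healthy.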

Approach

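Suppose LTM could see how many servers are currently available in each NGINX upstream, and we set a threshold: when the number of available servers falls below it, LTM stops assigning connections to that NGINX instance. This avoids the problem described above quite well. Based on the logs reported by LTM or the Telemetry Streaming output, operators can promptly trigger the relevant automation to scale out the service instances behind that NGINX. Once the number of available service instances is back at or above the threshold, LTM starts assigning new connections to that NGINX again.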
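The threshold rule can be sketched as a simple filter. This only illustrates the decision, not how LTM implements it; the instance names, counts and the `eligible_instances` helper are hypothetical.

```python
def eligible_instances(up_counts, low_water):
    """Return the NGINX instances whose number of 'up' servers meets the threshold."""
    return sorted(name for name, up in up_counts.items() if up >= low_water)


# Hypothetical instance names and available-server counts.
up_counts = {"nginx-az1": 10, "nginx-az2": 10, "nginx-az3": 2}
print(eligible_instances(up_counts, low_water=3))  # ['nginx-az1', 'nginx-az2']
```

Once `nginx-az3` is scaled back to three or more available servers, it reappears in the result and would receive new connections again.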

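NGINX Plus itself provides an API endpoint. By querying it and processing the response, we can obtain the number of available server instances. On LTM, an external monitor can automate the polling and processing of this API.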

Solution

1. The API resource path to query is: http://your-domain.com/api/6/http/upstreams/your-upstream-name/?fields=peers
Note: the version number 6 after api may differ depending on the NGINX Plus version.
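Because the version segment can change, it may help to build the URL from its parts instead of hard-coding it. A minimal sketch; the `upstream_peers_url` helper is hypothetical, and the host and upstream name are the placeholders from the path above.

```python
def upstream_peers_url(host, upstream, api_version=6):
    """Build the URL that returns only the peers of one HTTP upstream."""
    return "http://{0}/api/{1}/http/upstreams/{2}/?fields=peers".format(
        host, api_version, upstream
    )


print(upstream_peers_url("your-domain.com", "your-upstream-name"))
# http://your-domain.com/api/6/http/upstreams/your-upstream-name/?fields=peers
```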

2. A sample response is shown below. We mainly care about state: up; all we need is the total number of peers whose state is up.

{
  "peers": [
    {
      "id": 0,
      "server": "10.0.0.1:8080",
      "name": "10.0.0.1:8080",
      "backup": false,
      "weight": 1,
      "state": "up",
      "active": 0,
      "requests": 3468,
      "header_time": 778,
      "response_time": 778,
      "responses": {
        "1xx": 0,
        "2xx": 3435,
        "3xx": 6,
        "4xx": 20,
        "5xx": 4,
        "total": 3465
      },
      "sent": 1511086,
      "received": 99693373,
      "fails": 0,
      "unavail": 0,
      "health_checks": {
        "checks": 1754,
        "fails": 0,
        "unhealthy": 0,
        "last_passed": true
      },
      "downtime": 0,
      "selected": "2020-01-03T07:52:57Z"
    },
    {
      "id": 1,
      "server": "10.0.0.1:8081",
      "name": "10.0.0.1:8081",
      "backup": true,
      "weight": 1,
      "state": "unhealthy",
      "active": 0,
      "requests": 0,
      "responses": {
        "1xx": 0,
        "2xx": 0,
        "3xx": 0,
        "4xx": 0,
        "5xx": 0,
        "total": 0
      },
      "sent": 0,
      "received": 0,
      "fails": 0,
      "unavail": 0,
      "health_checks": {
        "checks": 1759,
        "fails": 1759,
        "unhealthy": 1,
        "last_passed": false
      },
      "downtime": 17588406,
      "downstart": "2020-01-03T03:00:00.427Z"
    }
  ]
}
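As a quick check of how this response is read, the sketch below tallies peers by their `state` value. `SAMPLE` is a trimmed copy of the response above that keeps only the fields used here, and `summarize_states` is a hypothetical helper.

```python
import json
from collections import Counter

# Trimmed copy of the sample response: only the fields this sketch reads.
SAMPLE = """
{"peers": [
  {"id": 0, "server": "10.0.0.1:8080", "backup": false, "state": "up"},
  {"id": 1, "server": "10.0.0.1:8081", "backup": true, "state": "unhealthy"}
]}
"""


def summarize_states(body):
    """Count the peers in an upstream response body, grouped by state."""
    peers = json.loads(body)["peers"]
    return Counter(peer["state"] for peer in peers)


states = summarize_states(SAMPLE)
print(states["up"])  # 1
```

A `Counter` returns 0 for states that do not occur, so `states["up"]` is safe to read even when no peer is up.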

3. Based on this, the following Python script can be written:

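#!/usr/bin/python
# -*- coding: UTF-8 -*-
# LTM external monitor: prints 'UP' when enough upstream peers are up.
# LTM passes the member IP and port as argv[1] and argv[2];
# argv[3] is the NGINX Plus API URL and argv[4] is the low-water mark.

import sys
import json

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3


def get_nginxapi(url):
    """Fetch the upstream status from the NGINX Plus API and return the raw body."""
    ct_headers = {'Content-type': 'application/json'}
    request = urllib2.Request(url, headers=ct_headers)
    response = urllib2.urlopen(request)
    return response.read()


api = sys.argv[3]
lowwater = int(sys.argv[4])

# Count the peers whose state is 'up'; any fetch or parse error counts as zero.
m = 0
try:
    data = json.loads(get_nginxapi(api))
    for peer in data['peers']:
        if peer['state'] == 'up':
            m = m + 1
except Exception:
    m = 0

# Any output on stdout marks the pool member as up; no output marks it down.
if m >= lowwater:
    print('UP')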
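For testing the decision logic away from the BIG-IP, the same check can be wrapped in a function whose HTTP fetch is replaceable. This is a sketch under assumptions, not the monitor itself: `fetch`, `is_healthy`, the `fetcher` parameter and the 3-second timeout are hypothetical additions that do not appear in the script above.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def fetch(url, timeout=3):
    """Return the response body as text; raises on network or HTTP errors."""
    response = urlopen(url, timeout=timeout)
    try:
        return response.read().decode("utf-8")
    finally:
        response.close()


def is_healthy(url, low_water, fetcher=fetch):
    """True when at least low_water peers report state 'up'.

    Any fetch or parse failure counts as unhealthy, matching the
    fail-closed behavior of the monitor script.
    """
    try:
        peers = json.loads(fetcher(url))["peers"]
        up = sum(1 for peer in peers if peer.get("state") == "up")
    except Exception:
        return False
    return up >= low_water


# Example with a canned response instead of a live NGINX Plus API.
canned = '{"peers": [{"state": "up"}, {"state": "unhealthy"}, {"state": "up"}]}'
print(is_healthy("http://unused.example", 2, fetcher=lambda url: canned))  # True
```

The timeout keeps a hung API call from outliving the monitor interval, and injecting `fetcher` lets the threshold behavior be verified without network access.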