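ETSI NFV-SOL CNF Auto Healing With Prometheus via FM Interfaces

This document describes how to auto heal a CNF in the Tacker v2 API via the Fault Management (FM) interfaces.

Note

The content of this document has been confirmed to work with Prometheus 2.45 and Alertmanager 0.26.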

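Overview

With the Fault Management interfaces, auto healing can be implemented in two ways: Polling Mode and Notification Mode.

The figure below gives an overview of CNF auto healing. Its steps are as follows.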

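  1. Create FM subscription (Notification Mode)

    NFVO sends a request to Tacker to create an FM subscription.

  2. Collect metrics

    Prometheus collects metrics and decides whether an alert needs to be triggered.

  3. POST alert

    Prometheus sends the alert to Tacker.

  4. Convert alert to alarm

    Tacker receives the notified alert, converts it to an alarm, and saves the alarm to the Tacker DB.

  5. Get alarms and return result (Polling Mode)

    NFVO sends requests at regular intervals to get the alarms held in Tacker. Tacker searches the Tacker DB with the query conditions specified by NFVO and returns the alarms that match those conditions to NFVO.

  6. Send alarm notification (Notification Mode)

    VnffmDriver looks up all FM subscriptions in the DB and matches the alert against them. If an FM subscription matches, the alarm is sent to the path specified by NFVO. If no FM subscription matches, processing ends.

  7. Heal

    NFVO recognizes the failure of the CNF from the alarm and sends a heal request to Tacker.

  8. Call Kubernetes API

    In tacker-conductor, the request is redirected again to the appropriate infra-driver (in this case the Kubernetes infra-driver) according to the content of the instantiate parameters. The Kubernetes infra-driver then calls the Kubernetes APIs.

  9. Create a new Pod

    The Kubernetes Master creates the new Pods as requested by the API calls.

  10. Delete the old Pod

    The Kubernetes Master deletes the old Pods as requested by the API calls.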

[Figure: overview of CNF auto healing via the FM interfaces (auto_heal_fm.svg)]

Prerequisites

How to configure the Prometheus Plugin

The Prometheus Plugin is disabled by default in Tacker. To enable it, set fault_management to True in the [prometheus_plugin] section of tacker.conf.

$ vi /etc/tacker/tacker.conf
...
[prometheus_plugin]
fault_management = True
[v2_vnfm]
# Enable https access to notification server from Tacker (boolean value)
notification_verify_cert = true
...

After modifying the configuration file, restart the Tacker services for the change to take effect.

$ sudo systemctl stop devstack@tacker
$ sudo systemctl restart devstack@tacker-conductor
$ sudo systemctl start devstack@tacker
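A quick way to confirm the setting before restarting is to parse the file and read the option back. The following is a minimal sketch, assuming tacker.conf is plain INI text that Python's standard configparser can read; the helper name fault_management_enabled is hypothetical and not part of Tacker.

```python
import configparser


def fault_management_enabled(conf_text: str) -> bool:
    """Return True if [prometheus_plugin] fault_management is set to a true value.

    Hypothetical helper for illustration; Tacker itself reads this option
    through its own configuration machinery, not through this function.
    """
    # strict=False tolerates duplicated sections or options in the file.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    # A missing section or option is treated as "disabled", the documented default.
    return parser.getboolean("prometheus_plugin", "fault_management", fallback=False)


# The fragment shown in this document, used here as sample input.
SAMPLE = """\
[prometheus_plugin]
fault_management = True
[v2_vnfm]
# Enable https access to notification server from Tacker (boolean value)
notification_verify_cert = true
"""

if __name__ == "__main__":
    # For a real check, read /etc/tacker/tacker.conf instead of SAMPLE.
    print(fault_management_enabled(SAMPLE))
```

If the function returns False for the real file, the option is either missing or sits under a different section header.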

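How to configure Prometheus

Unlike auto scaling via the PM interfaces, auto healing via the FM interfaces does not involve logging in to the Prometheus server over SSH to modify its configuration. Instead, users must edit the Prometheus configuration files manually so that Prometheus monitors the specified resources.

For details on how to write the Prometheus configuration file, see Prometheus Configuration.

The following is an example of prometheus.yml: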

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - <IP of Alertmanager>:9093

rule_files:
- "tacker-samplevnf-rules.yaml"

scrape_configs:
- job_name: "kube-state-metrics"
  static_configs:
  - targets: ["<IP of Kubernetes>:<port of metrics>"]

The following is an example of tacker-samplevnf-rules.yaml:

groups:
- name: example
  rules:
  - alert: KubePodCrashLooping
    annotations:
      probable_cause: The server cannot be connected.
      fault_type: Server Down
      fault_details: fault details
    expr: |
      rate(kube_pod_container_status_restarts_total{job="kube-state-metrics"}[10m]) * 60 * 5 > 0
    for: 5m
    labels:
      receiver_type: tacker
      function_type: vnffm
      vnf_instance_id: <VNF instance ID>
      perceived_severity: WARNING
      event_type: EQUIPMENT_ALARM
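Because the labels of a rule end up as alarm attributes, it can help to check them before loading the rule file. The sketch below is illustrative only. It assumes that the five labels used in the sample rule are all required, that receiver_type and function_type must be exactly tacker and vnffm, and that perceived_severity and event_type must be one of the values that ETSI NFV-SOL 003 enumerates for PerceivedSeverityType and EventType. The function name check_fm_labels is hypothetical.

```python
# Labels used by the sample rule; treating all of them as required is an assumption.
REQUIRED_LABELS = (
    "receiver_type",
    "function_type",
    "vnf_instance_id",
    "perceived_severity",
    "event_type",
)

# Enumerations defined by ETSI NFV-SOL 003 for the FM interface.
PERCEIVED_SEVERITIES = {
    "CRITICAL", "MAJOR", "MINOR", "WARNING", "INDETERMINATE", "CLEARED",
}
EVENT_TYPES = {
    "COMMUNICATIONS_ALARM", "PROCESSING_ERROR_ALARM", "ENVIRONMENTAL_ALARM",
    "QOS_ALARM", "EQUIPMENT_ALARM",
}

# Allowed values per label; the first two entries come from the sample rule.
EXPECTED_VALUES = {
    "receiver_type": {"tacker"},
    "function_type": {"vnffm"},
    "perceived_severity": PERCEIVED_SEVERITIES,
    "event_type": EVENT_TYPES,
}


def check_fm_labels(labels: dict) -> list:
    """Return a list of human-readable problems found in a rule's labels."""
    problems = [f"missing label: {key}" for key in REQUIRED_LABELS
                if not labels.get(key)]
    for key, allowed in EXPECTED_VALUES.items():
        value = labels.get(key)
        if value and value not in allowed:
            problems.append(f"unexpected {key}: {value}")
    return problems


if __name__ == "__main__":
    sample = {
        "receiver_type": "tacker",
        "function_type": "vnffm",
        "vnf_instance_id": "c21fd71b-2866-45f6-89d0-70c458a5c32e",
        "perceived_severity": "WARNING",
        "event_type": "EQUIPMENT_ALARM",
    }
    print(check_fm_labels(sample))
```

An empty list means the labels pass these checks; it does not prove that Tacker will accept the alert.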

The following is an example of alertmanager.yml:

route:
  group_by: ['cluster']
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 1h
  receiver: 'web.boo'
  routes:
  - match:
      alertname: KubePodCrashLooping
    receiver: 'web.boo'
receivers:
- name: 'web.boo'
  webhook_configs:
  - url: 'http://<IP of Tacker>:9890/alert'
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['dev', 'instance']

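How does NFVO auto heal a CNF

Via the FM interfaces, a CNF can be auto healed in two modes.

Polling Mode

In this mode, NFVO actively sends requests to Tacker at regular intervals to get the alarms. From the content of the response, NFVO identifies the VNFC instance ID of the CNF in which the problem occurred.

The following is an example of the response to a get-alarms request: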

[
    {
        "id": "de8e74e8-1845-40dd-892c-cb7a67c26f9f",
        "managedObjectId": "c21fd71b-2866-45f6-89d0-70c458a5c32e",
        "vnfcInstanceIds": [
            "VDU1-curry-probe-test001-798d577c96-5624p"
        ],
        "alarmRaisedTime": "2023-12-08T13:16:30Z",
        "alarmChangedTime": "",
        "alarmClearedTime": "",
        "alarmAcknowledgedTime": "",
        "ackState": "UNACKNOWLEDGED",
        "perceivedSeverity": "CRITICAL",
        "eventTime": "2023-12-08T13:16:00Z",
        "eventType": "PROCESSING_ERROR_ALARM",
        "faultType": "fault_type",
        "probableCause": "Process Terminated",
        "isRootCause": false,
        "correlatedAlarmIds": [],
        "faultDetails": [
            "fingerprint: 5ee739bb8840a190",
            "detail: fault_details"
        ],
        "_links": {
            "self": {
                "href": "http://127.0.0.1:9890/vnffm/v1/alarms/de8e74e8-1845-40dd-892c-cb7a67c26f9f"
            },
            "objectInstance": {
                "href": "http://127.0.0.1:9890/vnflcm/v2/vnf_instances/c21fd71b-2866-45f6-89d0-70c458a5c32e"
            }
        }
    }
]

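Note

The value of managedObjectId is the VNF instance ID. The value of vnfcInstanceIds is the list of VNFC instance IDs.

NFVO then sends a heal request that specifies the VNFC instance ID to Tacker. For the format of the heal request, see Heal request.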
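The NFVO-side step from alarm list to heal request can be sketched as follows. This is only an illustration of the idea, not NFVO or Tacker code. It assumes that the heal request body carries the targets in a vnfcInstanceId array (as in the SOL 003 HealVnfRequest; check Heal request for the exact format) and that alarms which are already acknowledged should be skipped, which is a policy choice made for this example. The function name heal_requests_from_alarms is hypothetical.

```python
def heal_requests_from_alarms(alarms: list) -> dict:
    """Map each VNF instance ID to a heal request body built from its alarms.

    Only UNACKNOWLEDGED alarms are considered (an assumption of this sketch).
    """
    targets = {}
    for alarm in alarms:
        if alarm.get("ackState") != "UNACKNOWLEDGED":
            continue
        # managedObjectId is the VNF instance ID, as noted above.
        vnfc_ids = targets.setdefault(alarm["managedObjectId"], [])
        for vnfc_id in alarm.get("vnfcInstanceIds", []):
            if vnfc_id not in vnfc_ids:  # keep order, drop duplicates
                vnfc_ids.append(vnfc_id)
    # "vnfcInstanceId" as the body key is an assumption; see Heal request.
    return {vnf_id: {"vnfcInstanceId": ids} for vnf_id, ids in targets.items()}


if __name__ == "__main__":
    # Trimmed copy of the response example above.
    response = [
        {
            "id": "de8e74e8-1845-40dd-892c-cb7a67c26f9f",
            "managedObjectId": "c21fd71b-2866-45f6-89d0-70c458a5c32e",
            "vnfcInstanceIds": ["VDU1-curry-probe-test001-798d577c96-5624p"],
            "ackState": "UNACKNOWLEDGED",
        }
    ]
    for vnf_id, body in heal_requests_from_alarms(response).items():
        print(vnf_id, body)
```

Grouping by VNF instance lets NFVO send one heal request per instance even when several alarms point at Pods of the same CNF.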

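Notification Mode

In this mode, NFVO creates an FM subscription on Tacker. Multiple filter conditions can be set in the FM subscription to match VNF instances that have been instantiated in Tacker.

An FM subscription can be created with the following CLI command.

$ openstack vnffm sub create sample_param_file.json --os-tacker-api-version 2

The content of the sample sample_param_file.json in this document is as follows: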

{
    "filter": {
        "vnfInstanceSubscriptionFilter": {
            "vnfdIds": [
                "4d5ffa3b-9dde-45a9-a805-659dc8df0c02"
            ],
            "vnfProductsFromProviders": [
                {
                    "vnfProvider": "Company",
                    "vnfProducts": [
                        {
                            "vnfProductName": "Sample VNF",
                            "versions": [
                                {
                                    "vnfSoftwareVersion": "1.0",
                                    "vnfdVersions": ["1.0", "2.0"]
                                }
                            ]
                        }
                    ]
                }
            ],
            "vnfInstanceIds": [
                "aad7d2fe-ed51-47da-a20d-7b299860607e"
            ],
            "vnfInstanceNames": [
                "test"
            ]
        },
        "notificationTypes": [
            "AlarmNotification"
        ],
        "faultyResourceTypes": [
            "COMPUTE"
        ],
        "perceivedSeverities": [
            "WARNING"
        ],
        "eventTypes": [
            "EQUIPMENT_ALARM"
        ],
        "probableCauses": [
            "The server cannot be connected."
        ]
    },
    "callbackUri": "http://127.0.0.1:9890/vnffm/v1/subscriptions/407cb9c5-60f2-43e8-a43a-925c0323c3eb",
    "authentication": {
        "authType": [
            "BASIC",
            "OAUTH2_CLIENT_CREDENTIALS",
            "OAUTH2_CLIENT_CERT"
        ],
        "paramsBasic": {
            "userName": "nfvo",
            "password": "nfvopwd"
        },
        "paramsOauth2ClientCredentials": {
            "clientId": "auth_user_name",
            "clientPassword": "auth_password",
            "tokenEndpoint": "token_endpoint"
        },
        "paramsOauth2ClientCert": {
            "clientId": "auth_user_name",
            "certificateRef": {
                "type": "x5t#S256",
                "value": "certificate_fingerprint"
            },
            "tokenEndpoint": "token_endpoint"
        }
    }
}
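The sample above sets every available filter. A much smaller file is often enough, for example one that filters only on the VNF instance ID. The following sketch generates such a file. The helper name build_fm_subscription and the output file name are hypothetical, the callback URI is only an example value, and authentication is left out here on the assumption that the notification endpoint does not need it.

```python
import json


def build_fm_subscription(callback_uri: str, vnf_instance_ids, severities=None) -> dict:
    """Build a minimal FM subscription body that filters on VNF instance IDs."""
    body = {
        "filter": {
            "vnfInstanceSubscriptionFilter": {
                "vnfInstanceIds": list(vnf_instance_ids),
            }
        },
        "callbackUri": callback_uri,
    }
    if severities:
        # Optional extra condition; values such as "WARNING" or "CRITICAL".
        body["filter"]["perceivedSeverities"] = list(severities)
    return body


if __name__ == "__main__":
    params = build_fm_subscription(
        # Example callback endpoint on the NFVO side (an assumed value).
        "http://127.0.0.1:9990/notification/callbackuri/c21fd71b-2866-45f6-89d0-70c458a5c32e",
        ["c21fd71b-2866-45f6-89d0-70c458a5c32e"],
    )
    with open("sample_param_file.json", "w", encoding="utf-8") as fp:
        json.dump(params, fp, indent=4)
```

The generated file can then be passed to openstack vnffm sub create in the same way as the full sample.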

The following is an example of creating an FM subscription:

$ openstack vnffm sub create sample_param_file.json --os-tacker-api-version 2
+--------------+-----------------------------------------------------------------------------------------------------+
| Field        | Value                                                                                               |
+--------------+-----------------------------------------------------------------------------------------------------+
| Callback Uri | http://127.0.0.1:9890/vnffm/v1/subscriptions/407cb9c5-60f2-43e8-a43a-925c0323c3eb                   |
| Filter       | {                                                                                                   |
|              |     "vnfInstanceSubscriptionFilter": {                                                              |
|              |         "vnfdIds": [                                                                                |
|              |             "4d5ffa3b-9dde-45a9-a805-659dc8df0c02"                                                  |
|              |         ],                                                                                          |
|              |         "vnfProductsFromProviders": [                                                               |
|              |             {                                                                                       |
|              |                 "vnfProvider": "Company",                                                           |
|              |                 "vnfProducts": [                                                                    |
|              |                     {                                                                               |
|              |                         "vnfProductName": "Sample VNF",                                             |
|              |                         "versions": [                                                               |
|              |                             {                                                                       |
|              |                                 "vnfSoftwareVersion": "1.0",                                        |
|              |                                 "vnfdVersions": [                                                   |
|              |                                     "1.0",                                                          |
|              |                                     "2.0"                                                           |
|              |                                 ]                                                                   |
|              |                             }                                                                       |
|              |                         ]                                                                           |
|              |                     }                                                                               |
|              |                 ]                                                                                   |
|              |             }                                                                                       |
|              |         ],                                                                                          |
|              |         "vnfInstanceIds": [                                                                         |
|              |             "aad7d2fe-ed51-47da-a20d-7b299860607e"                                                  |
|              |         ],                                                                                          |
|              |         "vnfInstanceNames": [                                                                       |
|              |             "test"                                                                                  |
|              |         ]                                                                                           |
|              |     },                                                                                              |
|              |     "notificationTypes": [                                                                          |
|              |         "AlarmNotification"                                                                         |
|              |     ],                                                                                              |
|              |     "faultyResourceTypes": [                                                                        |
|              |         "COMPUTE"                                                                                   |
|              |     ],                                                                                              |
|              |     "perceivedSeverities": [                                                                        |
|              |         "WARNING"                                                                                   |
|              |     ],                                                                                              |
|              |     "eventTypes": [                                                                                 |
|              |         "EQUIPMENT_ALARM"                                                                           |
|              |     ],                                                                                              |
|              |     "probableCauses": [                                                                             |
|              |         "The server cannot be connected."                                                           |
|              |     ]                                                                                               |
|              | }                                                                                                   |
| ID           | a7a18ac6-a668-4d94-8ba0-f04c20cfeacd                                                                |
| Links        | {                                                                                                   |
|              |     "self": {                                                                                       |
|              |         "href": "http://127.0.0.1:9890/vnffm/v1/subscriptions/407cb9c5-60f2-43e8-a43a-925c0323c3eb" |
|              |     }                                                                                               |
|              | }                                                                                                   |
+--------------+-----------------------------------------------------------------------------------------------------+

After the FM subscription is created, whenever Prometheus sends an alert to Tacker, Tacker uses the information in the alert to find the FM subscriptions that match it.
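The matching idea can be illustrated with a small function. This is a simplified sketch and not the VnffmDriver implementation. It assumes the usual ETSI filter semantics, where a filter attribute that is absent matches anything, the values inside one attribute are alternatives, and all attributes that are present must match. It covers only four of the filter attributes, and the name alarm_matches_filter is hypothetical.

```python
def alarm_matches_filter(alarm: dict, fm_filter: dict) -> bool:
    """Return True if the alarm satisfies every condition present in the filter."""
    instance_filter = fm_filter.get("vnfInstanceSubscriptionFilter", {})
    # Pairs of (allowed values from the filter, actual value from the alarm).
    checks = (
        (instance_filter.get("vnfInstanceIds"), alarm.get("managedObjectId")),
        (fm_filter.get("perceivedSeverities"), alarm.get("perceivedSeverity")),
        (fm_filter.get("eventTypes"), alarm.get("eventType")),
        (fm_filter.get("probableCauses"), alarm.get("probableCause")),
    )
    # An absent or empty condition places no restriction on the alarm.
    return all(not allowed or value in allowed for allowed, value in checks)


if __name__ == "__main__":
    fm_filter = {
        "vnfInstanceSubscriptionFilter": {
            "vnfInstanceIds": ["aad7d2fe-ed51-47da-a20d-7b299860607e"]
        },
        "perceivedSeverities": ["WARNING"],
        "eventTypes": ["EQUIPMENT_ALARM"],
    }
    alarm = {
        "managedObjectId": "aad7d2fe-ed51-47da-a20d-7b299860607e",
        "perceivedSeverity": "WARNING",
        "eventType": "EQUIPMENT_ALARM",
        "probableCause": "The server cannot be connected.",
    }
    print(alarm_matches_filter(alarm, fm_filter))
```

A subscription with an empty filter therefore receives every alarm, while each added condition narrows the set.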

The following is an example of the request body of an alert sent by Prometheus (through Alertmanager):

{
    "receiver": "receiver",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "receiver_type": "tacker",
                "function_type": "vnffm",
                "vnf_instance_id": "c21fd71b-2866-45f6-89d0-70c458a5c32e",
                "pod": "VDU1-curry-probe-test001-798d577c96-5624p",
                "perceived_severity": "CRITICAL",
                "event_type": "PROCESSING_ERROR_ALARM"
            },
            "annotations": {
                "fault_type": "fault_type",
                "probable_cause": "Process Terminated",
                "fault_details": "fault_details"
            },
            "startsAt": "2023-12-08T13:16:00Z",
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://192.168.121.35:9090/graph?g0.expr=up%7Bjob%3D%22node%22%7D+%3D%3D+0&g0.tab=1",
            "fingerprint": "5ee739bb8840a190"
        }
    ],
    "groupLabels": {},
    "commonLabels": {
        "alertname": "NodeInstanceDown",
        "job": "node"
    },
    "commonAnnotations": {
        "description": "sample"
    },
    "externalURL": "http://192.168.121.35:9093",
    "version": "4",
    "groupKey": "{}:{}",
    "truncatedAlerts": 0
}
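Comparing this alert with the alarm example shown for Polling Mode suggests how the fields correspond: labels supply the instance IDs, severity and event type, annotations supply the fault information, and startsAt becomes eventTime. The sketch below writes that correspondence down. It is inferred from the two samples only and is not Tacker's conversion code; the function name alert_to_alarm is hypothetical.

```python
import uuid
from datetime import datetime, timezone


def alert_to_alarm(alert: dict, now: datetime = None) -> dict:
    """Convert one entry of the "alerts" array into an alarm-like dict.

    Field mapping inferred from the sample alert and sample alarm in this
    document; the real conversion in Tacker may differ.
    """
    labels = alert["labels"]
    annotations = alert.get("annotations", {})
    raised = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "id": str(uuid.uuid4()),
        "managedObjectId": labels["vnf_instance_id"],
        # The Pod name plays the role of the VNFC instance ID.
        "vnfcInstanceIds": [labels["pod"]] if "pod" in labels else [],
        "alarmRaisedTime": raised,
        "ackState": "UNACKNOWLEDGED",
        "perceivedSeverity": labels["perceived_severity"],
        "eventTime": alert["startsAt"],
        "eventType": labels["event_type"],
        "faultType": annotations.get("fault_type"),
        "probableCause": annotations.get("probable_cause"),
        "isRootCause": False,
        "faultDetails": [
            f"fingerprint: {alert['fingerprint']}",
            f"detail: {annotations.get('fault_details')}",
        ],
    }


if __name__ == "__main__":
    sample_alert = {
        "labels": {
            "vnf_instance_id": "c21fd71b-2866-45f6-89d0-70c458a5c32e",
            "pod": "VDU1-curry-probe-test001-798d577c96-5624p",
            "perceived_severity": "CRITICAL",
            "event_type": "PROCESSING_ERROR_ALARM",
        },
        "annotations": {
            "fault_type": "fault_type",
            "probable_cause": "Process Terminated",
            "fault_details": "fault_details",
        },
        "startsAt": "2023-12-08T13:16:00Z",
        "fingerprint": "5ee739bb8840a190",
    }
    print(alert_to_alarm(sample_alert))
```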

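Finally, Tacker sends a notification to the Callback Uri of the FM subscription (that is, to NFVO). Based on the content of the notification, NFVO sends a heal request to Tacker. For the format of the heal request, see Heal request.

The following is an example of the request body of a notification sent by Tacker: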

{
    "id": "0ab777dc-b3a0-42d6-85c1-e5f80711b988",
    "notificationType": "AlarmNotification",
    "subscriptionId": "0155c914-8573-463c-a97a-aef5a3ca9c72",
    "timeStamp": "2023-12-08T13:16:30Z",
    "alarm": {
        "id": "de8e74e8-1845-40dd-892c-cb7a67c26f9f",
        "managedObjectId": "c21fd71b-2866-45f6-89d0-70c458a5c32e",
        "vnfcInstanceIds": ["VDU1-curry-probe-test001-798d577c96-5624p"],
        "alarmRaisedTime": "2023-12-08T13:16:30+00:00",
        "ackState": "UNACKNOWLEDGED",
        "perceivedSeverity": "CRITICAL",
        "eventTime": "2023-12-08T13:16:00Z",
        "eventType": "PROCESSING_ERROR_ALARM",
        "faultType": "fault_type",
        "probableCause": "Process Terminated",
        "isRootCause": false,
        "faultDetails": [
            "fingerprint: 5ee739bb8840a190",
            "detail: fault_details"
        ],
        "_links": {
            "self": {
                "href": "http://127.0.0.1:9890/vnffm/v1/alarms/de8e74e8-1845-40dd-892c-cb7a67c26f9f"
            },
            "objectInstance":{
                "href": "http://127.0.0.1:9890/vnflcm/v2/vnf_instances/c21fd71b-2866-45f6-89d0-70c458a5c32e"
            }
        }
    },
    "_links": {
        "subscription": {
            "href": "http://127.0.0.1:9890/vnffm/v1/subscriptions/0155c914-8573-463c-a97a-aef5a3ca9c72"
        }
    }
}
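On the NFVO side, the notification already contains everything needed to address the heal request: the objectInstance link points at the VNF instance, and vnfcInstanceIds names the failed Pods. The following sketch derives the heal call from a notification. It assumes that the heal operation is exposed at the instance URL followed by /heal, as defined by ETSI NFV-SOL 003, and that the request body uses a vnfcInstanceId array (see Heal request for the exact format). The function name heal_call_from_notification is hypothetical.

```python
def heal_call_from_notification(notification: dict):
    """Return (url, body) for the heal request, or None if no heal is needed.

    Only AlarmNotification is acted on; other notification types, such as
    a notification that an alarm was cleared, are ignored in this sketch.
    """
    if notification.get("notificationType") != "AlarmNotification":
        return None
    alarm = notification["alarm"]
    instance_href = alarm["_links"]["objectInstance"]["href"]
    url = instance_href.rstrip("/") + "/heal"
    # "vnfcInstanceId" as the body key is an assumption; see Heal request.
    body = {"vnfcInstanceId": list(alarm.get("vnfcInstanceIds", []))}
    return url, body


if __name__ == "__main__":
    # Trimmed copy of the notification example above.
    notification = {
        "notificationType": "AlarmNotification",
        "alarm": {
            "vnfcInstanceIds": ["VDU1-curry-probe-test001-798d577c96-5624p"],
            "_links": {
                "objectInstance": {
                    "href": "http://127.0.0.1:9890/vnflcm/v2/vnf_instances/"
                            "c21fd71b-2866-45f6-89d0-70c458a5c32e"
                }
            },
        },
    }
    print(heal_call_from_notification(notification))
```

Sending the request itself (authentication, the Version header, error handling) is left out on purpose.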

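How to use the CLI of FM interfaces

Get all alarms

All alarms can be retrieved with the following CLI command.

$ openstack vnffm alarm list --os-tacker-api-version 2

The following is an example of getting all alarms: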

$ openstack vnffm alarm list --os-tacker-api-version 2
+--------------------------------------+--------------------------------------+----------------+------------------------+--------------------+--------------------+
| ID                                   | Managed Object Id                    | Ack State      | Event Type             | Perceived Severity | Probable Cause     |
+--------------------------------------+--------------------------------------+----------------+------------------------+--------------------+--------------------+
| de8e74e8-1845-40dd-892c-cb7a67c26f9f | c21fd71b-2866-45f6-89d0-70c458a5c32e | UNACKNOWLEDGED | PROCESSING_ERROR_ALARM | CRITICAL           | Process Terminated |
+--------------------------------------+--------------------------------------+----------------+------------------------+--------------------+--------------------+

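Get the specified alarm

A specified alarm can be retrieved with the following CLI command.

$ openstack vnffm alarm show ALARM_ID --os-tacker-api-version 2

The following is an example of getting the specified alarm: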

$ openstack vnffm alarm show de8e74e8-1845-40dd-892c-cb7a67c26f9f --os-tacker-api-version 2
+----------------------------+------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                |
+----------------------------+------------------------------------------------------------------------------------------------------+
| Ack State                  | UNACKNOWLEDGED                                                                                       |
| Alarm Acknowledged Time    |                                                                                                      |
| Alarm Changed Time         |                                                                                                      |
| Alarm Cleared Time         |                                                                                                      |
| Alarm Raised Time          | 2023-12-08T13:16:30Z                                                                                 |
| Correlated Alarm Ids       |                                                                                                      |
| Event Time                 | 2023-12-08T13:16:00Z                                                                                 |
| Event Type                 | PROCESSING_ERROR_ALARM                                                                               |
| Fault Details              | [                                                                                                    |
|                            |     "fingerprint: 5ee739bb8840a190",                                                                 |
|                            |     "detail: fault_details"                                                                          |
|                            | ]                                                                                                    |
| Fault Type                 | fault_type                                                                                           |
| ID                         | de8e74e8-1845-40dd-892c-cb7a67c26f9f                                                                 |
| Is Root Cause              | False                                                                                                |
| Links                      | {                                                                                                    |
|                            |     "self": {                                                                                        |
|                            |         "href": "http://127.0.0.1:9890/vnffm/v1/alarms/de8e74e8-1845-40dd-892c-cb7a67c26f9f"         |
|                            |     },                                                                                               |
|                            |     "objectInstance": {                                                                              |
|                            |         "href": "http://127.0.0.1:9890/vnflcm/v2/vnf_instances/c21fd71b-2866-45f6-89d0-70c458a5c32e" |
|                            |     }                                                                                                |
|                            | }                                                                                                    |
| Managed Object Id          | c21fd71b-2866-45f6-89d0-70c458a5c32e                                                                 |
| Perceived Severity         | CRITICAL                                                                                             |
| Probable Cause             | Process Terminated                                                                                   |
| Root Cause Faulty Resource |                                                                                                      |
| Vnfc Instance Ids          | [                                                                                                    |
|                            |     "VDU1-curry-probe-test001-798d577c96-5624p"                                                      |
|                            | ]                                                                                                    |
+----------------------------+------------------------------------------------------------------------------------------------------+

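Change target alarm

The ackState of an alarm can be changed with the following CLI command.

$ openstack vnffm alarm update ALARM_ID --ack-state ACKNOWLEDGED --os-tacker-api-version 2

Note

The value of --ack-state can only be ACKNOWLEDGED or UNACKNOWLEDGED.

The following is an example of changing the target alarm: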

$ openstack vnffm alarm update de8e74e8-1845-40dd-892c-cb7a67c26f9f --ack-state ACKNOWLEDGED --os-tacker-api-version 2
+-----------+--------------+
| Field     | Value        |
+-----------+--------------+
| Ack State | ACKNOWLEDGED |
+-----------+--------------+
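For an NFVO that talks to the API directly instead of using the CLI, the same restriction on the value applies. The sketch below builds the target path and request body for such an update, assuming the alarm resource path /vnffm/v1/alarms/ALARM_ID seen in the self links of the alarm examples and an ackState attribute in the body, as in the SOL 003 AlarmModifications type. The function name build_alarm_update is hypothetical.

```python
# The only two values accepted for the acknowledgement state.
ACK_STATES = ("ACKNOWLEDGED", "UNACKNOWLEDGED")


def build_alarm_update(alarm_id: str, ack_state: str):
    """Return (path, body) for changing the ackState of an alarm.

    Raises ValueError for any value other than the two allowed states.
    """
    if ack_state not in ACK_STATES:
        raise ValueError(
            f"ack_state must be one of {ACK_STATES}, got {ack_state!r}")
    # Path taken from the "self" link of the alarm examples in this document.
    path = f"/vnffm/v1/alarms/{alarm_id}"
    return path, {"ackState": ack_state}


if __name__ == "__main__":
    print(build_alarm_update(
        "de8e74e8-1845-40dd-892c-cb7a67c26f9f", "ACKNOWLEDGED"))
```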

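Create a new FM subscription

Creating an FM subscription is described in Notification Mode above; refer to that section for an example of the CLI command.

Get all FM subscriptions

All FM subscriptions can be retrieved with the following CLI command.

$ openstack vnffm sub list --os-tacker-api-version 2

The following is an example of getting all FM subscriptions: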

$ openstack vnffm sub list --os-tacker-api-version 2
+--------------------------------------+-------------------------------------------------------------------------------------+
| ID                                   | Callback Uri                                                                        |
+--------------------------------------+-------------------------------------------------------------------------------------+
| d6da0fff-a032-429e-8560-06e8af685e2c | http://127.0.0.1:9990/notification/callbackuri/c21fd71b-2866-45f6-89d0-70c458a5c32e |
+--------------------------------------+-------------------------------------------------------------------------------------+

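Get the specified FM subscription

A specified FM subscription can be retrieved with the following CLI command.

$ openstack vnffm sub show FM_SUBSCRIPTION_ID --os-tacker-api-version 2

The following is an example of getting the specified FM subscription: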

$ openstack vnffm sub show d6da0fff-a032-429e-8560-06e8af685e2c --os-tacker-api-version 2
+--------------+-----------------------------------------------------------------------------------------------------+
| Field        | Value                                                                                               |
+--------------+-----------------------------------------------------------------------------------------------------+
| Callback Uri | http://127.0.0.1:9990/notification/callbackuri/c21fd71b-2866-45f6-89d0-70c458a5c32e                 |
| Filter       | {                                                                                                   |
|              |     "vnfInstanceSubscriptionFilter": {                                                              |
|              |         "vnfInstanceIds": [                                                                         |
|              |             "c21fd71b-2866-45f6-89d0-70c458a5c32e"                                                  |
|              |         ]                                                                                           |
|              |     }                                                                                               |
|              | }                                                                                                   |
| ID           | d6da0fff-a032-429e-8560-06e8af685e2c                                                                |
| Links        | {                                                                                                   |
|              |     "self": {                                                                                       |
|              |         "href": "http://127.0.0.1:9890/vnffm/v1/subscriptions/d6da0fff-a032-429e-8560-06e8af685e2c" |
|              |     }                                                                                               |
|              | }                                                                                                   |
+--------------+-----------------------------------------------------------------------------------------------------+

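Delete the specified FM subscription

A specified FM subscription can be deleted with the following CLI command.

$ openstack vnffm sub delete FM_SUBSCRIPTION_ID --os-tacker-api-version 2

The following is an example of deleting the specified FM subscription: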

$ openstack vnffm sub delete d6da0fff-a032-429e-8560-06e8af685e2c --os-tacker-api-version 2
VNF FM subscription 'd6da0fff-a032-429e-8560-06e8af685e2c' deleted successfully