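Octavia Amphora Failover Circuit Breaker
During a large infrastructure outage, the automatic failover of stale amphorae can trigger a mass failover event and put considerable extra load on servers. The amphora failover circuit breaker feature lets you avoid these unwanted failover events. The circuit breaker is a configurable threshold: once the threshold is reached, Octavia stops failing over amphorae automatically. The circuit breaker feature is disabled by default.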
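Configuration
You define the threshold for the failover circuit breaker by setting the failover_threshold variable. The failover_threshold variable belongs to the health_manager group in the configuration file /etc/octavia/octavia.conf.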
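As a rough illustration of where the option lives, the sketch below embeds a sample configuration fragment and reads it back with Python's standard configparser module. The value 3 and the helper name read_failover_threshold are assumptions made for this example. Octavia itself loads its configuration through oslo.config, so this is only a way to inspect the file, not how the service reads it.

```python
import configparser

# Illustrative fragment of /etc/octavia/octavia.conf; the value 3 is an example.
SAMPLE_CONF = """
[health_manager]
failover_threshold = 3
"""


def read_failover_threshold(conf_text):
    """Return the configured threshold, or None when the option is unset."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback=None covers both a missing section and a missing option.
    return parser.getint("health_manager", "failover_threshold", fallback=None)


if __name__ == "__main__":
    print(read_failover_threshold(SAMPLE_CONF))
```

A return value of None here corresponds to the option being absent, which is the disabled default described above.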
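Whenever the number of stale amphorae reaches or exceeds the value of failover_threshold, Octavia performs the following actions:
Stops automatic failovers of amphorae.
Sets the status of the stale amphorae to FAILOVER_STOPPED.
Logs an error message.
The following line shows a typical error message: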
ERROR octavia.db.repositories [-] Stale amphora count reached the threshold (3). 4 amphorae were set into FAILOVER_STOPPED status.
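The three actions above can be sketched in a few lines of Python. This is a simplified model, not Octavia's implementation: the function name apply_circuit_breaker, the in-memory statuses dictionary, and the use of None to represent the disabled default are all assumptions made for illustration.

```python
import logging

LOG = logging.getLogger("circuit_breaker_sketch")


def apply_circuit_breaker(stale_ids, statuses, failover_threshold=None):
    """Return True if automatic failover may proceed, False if it is stopped.

    stale_ids: IDs of the amphorae currently considered stale (hypothetical input).
    statuses: dict mapping amphora ID to status; updated in place.
    failover_threshold: None models the feature being disabled.
    """
    if failover_threshold is None or len(stale_ids) < failover_threshold:
        return True
    # Threshold reached or exceeded: mark every stale amphora and log an error.
    for amp_id in stale_ids:
        statuses[amp_id] = "FAILOVER_STOPPED"
    LOG.error(
        "Stale amphora count reached the threshold (%d). "
        "%d amphorae were set into FAILOVER_STOPPED status.",
        failover_threshold,
        len(stale_ids),
    )
    return False
```

With a threshold of 3 and four stale amphorae, the function marks all four and returns False, which mirrors the error message shown above.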
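Note
Base the value of failover_threshold on the size of your environment. We recommend a value greater than the typical number of amphorae that you estimate run on a single host, or a value that corresponds to 20% to 30% of the total number of amphorae.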
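The sizing guidance in the note can be turned into a quick calculation. The sketch below returns both candidate values so that an operator can choose between them. The function name suggest_failover_threshold, the default fraction of 0.25, and the "per host plus one" reading of "greater than" are assumptions for illustration only.

```python
import math


def suggest_failover_threshold(amphorae_per_host, total_amphorae, fraction=0.25):
    """Return (by_host, by_fraction) candidate thresholds.

    by_host: one more than the typical amphora count on a single host.
    by_fraction: the given share (20% to 30%) of all amphorae, rounded up.
    """
    if not 0.20 <= fraction <= 0.30:
        raise ValueError("fraction should be between 0.20 and 0.30")
    by_host = amphorae_per_host + 1
    by_fraction = math.ceil(total_amphorae * fraction)
    return by_host, by_fraction


if __name__ == "__main__":
    # Hypothetical environment: about 10 amphorae per host, 200 in total.
    print(suggest_failover_threshold(10, 200))
```

For the hypothetical environment in the usage lines, the two candidates are 11 and 50.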
Error Recovery
Automatic Error Recovery
For amphorae whose status is FAILOVER_STOPPED, Octavia automatically resets the status to ALLOCATED after it receives new updates from these amphorae.
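Manual Error Recovery
To recover from the FAILOVER_STOPPED condition, you must manually reduce the number of stale amphorae below the circuit breaker threshold.
You can use the openstack loadbalancer amphora list command to list the amphorae that are in the FAILOVER_STOPPED state. Use the openstack loadbalancer amphora failover command to manually trigger a failover of an amphora.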
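When many amphorae are listed, it can help to filter the output programmatically. The sketch below assumes the list was produced with the client's generic JSON formatter (openstack loadbalancer amphora list -f json) and that the JSON keys match the column headers of the table output. The function name stopped_amphora_ids and the two sample rows are illustrative.

```python
import json


def stopped_amphora_ids(amphora_list_json):
    """Return the IDs of amphorae whose status is FAILOVER_STOPPED."""
    rows = json.loads(amphora_list_json)
    return [row["id"] for row in rows if row.get("status") == "FAILOVER_STOPPED"]


# Illustrative sample; the key names are assumed to match the table columns.
SAMPLE_JSON = json.dumps([
    {"id": "79f0e06d-446d-448a-9d2b-c3b89d0c700d", "status": "FAILOVER_STOPPED", "role": "BACKUP"},
    {"id": "9c0416d7-6293-4f13-8f67-61e5d757b36e", "status": "ALLOCATED", "role": "MASTER"},
])

if __name__ == "__main__":
    for amp_id in stopped_amphora_ids(SAMPLE_JSON):
        print(amp_id)
```

Each ID that is returned can then be passed to openstack loadbalancer amphora failover.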
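In this example, failover_threshold = 3 and an infrastructure outage has made four amphorae unavailable. After the health manager process detects this state, it sets the status of all stale amphorae to FAILOVER_STOPPED, as shown below.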
openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+------------------+--------+---------------+------------+
| id                                   | loadbalancer_id                      | status           | role   | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+------------------+--------+---------------+------------+
| 79f0e06d-446d-448a-9d2b-c3b89d0c700d | 8fd2cac5-cbca-4bb1-bcfc-daba43e097ab | FAILOVER_STOPPED | BACKUP | 192.168.0.108 | 192.0.2.17 |
| 9c0416d7-6293-4f13-8f67-61e5d757b36e | 4b13dda1-296a-400c-8248-1abad5728057 | ALLOCATED        | MASTER | 192.168.0.198 | 192.0.2.42 |
| e11208b7-f13d-4db3-9ded-1ee6f70a0502 | 8fd2cac5-cbca-4bb1-bcfc-daba43e097ab | FAILOVER_STOPPED | MASTER | 192.168.0.154 | 192.0.2.17 |
| ceea9fff-71a2-48c8-a968-e51dc440c572 | ab513cb3-8f5d-461e-b7ae-a06b5083a371 | ALLOCATED        | MASTER | 192.168.0.149 | 192.0.2.26 |
| a1351933-2270-493c-8201-d8f9f9fe42f7 | 4b13dda1-296a-400c-8248-1abad5728057 | FAILOVER_STOPPED | BACKUP | 192.168.0.103 | 192.0.2.42 |
| 441718e7-0956-436b-9f99-9a476339d7d2 | ab513cb3-8f5d-461e-b7ae-a06b5083a371 | FAILOVER_STOPPED | BACKUP | 192.168.0.148 | 192.0.2.26 |
+--------------------------------------+--------------------------------------+------------------+--------+---------------+------------+
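After operators resolve the infrastructure outage, they might need to trigger failovers manually to return to normal operation. In this example, two manual failovers are needed to bring the number of stale amphorae below the configured threshold of three.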
openstack loadbalancer amphora failover --wait 79f0e06d-446d-448a-9d2b-c3b89d0c700d
openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+------------------+--------+---------------+------------+
| id                                   | loadbalancer_id                      | status           | role   | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+------------------+--------+---------------+------------+
| 9c0416d7-6293-4f13-8f67-61e5d757b36e | 4b13dda1-296a-400c-8248-1abad5728057 | ALLOCATED        | MASTER | 192.168.0.198 | 192.0.2.42 |
| e11208b7-f13d-4db3-9ded-1ee6f70a0502 | 8fd2cac5-cbca-4bb1-bcfc-daba43e097ab | FAILOVER_STOPPED | MASTER | 192.168.0.154 | 192.0.2.17 |
| ceea9fff-71a2-48c8-a968-e51dc440c572 | ab513cb3-8f5d-461e-b7ae-a06b5083a371 | ALLOCATED        | MASTER | 192.168.0.149 | 192.0.2.26 |
| a1351933-2270-493c-8201-d8f9f9fe42f7 | 4b13dda1-296a-400c-8248-1abad5728057 | FAILOVER_STOPPED | BACKUP | 192.168.0.103 | 192.0.2.42 |
| 441718e7-0956-436b-9f99-9a476339d7d2 | ab513cb3-8f5d-461e-b7ae-a06b5083a371 | FAILOVER_STOPPED | BACKUP | 192.168.0.148 | 192.0.2.26 |
| cf734b57-6019-4ec0-8437-115f76d1bbb0 | 8fd2cac5-cbca-4bb1-bcfc-daba43e097ab | ALLOCATED        | BACKUP | 192.168.0.141 | 192.0.2.17 |
+--------------------------------------+--------------------------------------+------------------+--------+---------------+------------+
openstack loadbalancer amphora failover --wait e11208b7-f13d-4db3-9ded-1ee6f70a0502
openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| id                                   | loadbalancer_id                      | status    | role   | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| 9c0416d7-6293-4f13-8f67-61e5d757b36e | 4b13dda1-296a-400c-8248-1abad5728057 | ALLOCATED | MASTER | 192.168.0.198 | 192.0.2.42 |
| ceea9fff-71a2-48c8-a968-e51dc440c572 | ab513cb3-8f5d-461e-b7ae-a06b5083a371 | ALLOCATED | MASTER | 192.168.0.149 | 192.0.2.26 |
| cf734b57-6019-4ec0-8437-115f76d1bbb0 | 8fd2cac5-cbca-4bb1-bcfc-daba43e097ab | ALLOCATED | BACKUP | 192.168.0.141 | 192.0.2.17 |
| d2909051-402e-4e75-86c9-ec6725c814a1 | 8fd2cac5-cbca-4bb1-bcfc-daba43e097ab | ALLOCATED | MASTER | 192.168.0.25  | 192.0.2.17 |
| 5133e01a-fb53-457b-b810-edbb5202437e | 4b13dda1-296a-400c-8248-1abad5728057 | ALLOCATED | BACKUP | 192.168.0.76  | 192.0.2.42 |
| f82eff89-e326-4e9d-86bc-58c720220a3f | ab513cb3-8f5d-461e-b7ae-a06b5083a371 | ALLOCATED | BACKUP | 192.168.0.86  | 192.0.2.26 |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
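After the number of stale amphorae falls below the configured threshold, normal operation resumes and the automatic failover process attempts to restore the remaining stale amphorae.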
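The number of manual failovers in the example follows from simple arithmetic: the stale count must drop strictly below the threshold. A minimal sketch, assuming that each manual failover removes exactly one stale amphora and using the hypothetical helper name manual_failovers_needed:

```python
def manual_failovers_needed(stale_count, failover_threshold):
    """Return how many manual failovers bring stale_count below the threshold."""
    # The breaker trips at stale_count >= threshold, so the target is threshold - 1.
    return max(0, stale_count - failover_threshold + 1)


if __name__ == "__main__":
    # Values from the example: four stale amphorae, failover_threshold = 3.
    print(manual_failovers_needed(4, 3))
```

With four stale amphorae and a threshold of three, the result is two, which matches the two manual failovers performed in the example.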