Troubleshooting tips

Diagnose: Customer complains of receiving HTTP status 500 when trying to browse containers

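This entry is based on a real customer issue and focuses solely on how the problem was identified. An HTTP status of 500 can have many causes. If there is no obvious problem with the Swift object store, you may need to take a closer look at the user's transactions. Once you have found the user's Swift account, search the Swift proxy logs on each Swift proxy server for transactions from that user. The Linux bzgrep command can search all the proxy log files on a node, including the .bz2 compressed files. For example: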

$ PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -l <yourusername> -R ssh \
  -w <redacted>.68.[4-11,132-139],<redacted>.132.[4-11,132-139] \
  'sudo bzgrep -w AUTH_redacted-4962-4692-98fb-52ddda82a5af /var/log/swift/proxy.log*' |  dshbak -c
.
.
----------------
<redacted>.132.6
----------------
Feb 29 08:51:57 sw-aw2az2-proxy011 proxy-server <redacted>.16.132
<redacted>.66.8 29/Feb/2012/08/51/57 GET /v1.0/AUTH_redacted-4962-4692-98fb-52ddda82a5af
/%3Fformat%3Djson HTTP/1.0 404 - - <REDACTED>_4f4d50c5e4b064d88bd7ab82 - - -
tx429fc3be354f434ab7f9c6c4206c1dc3 - 0.0130

This shows a GET operation on the user's account.

Note

The HTTP status returned is 404, Not Found, rather than the 500 reported by the user.
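As a small aid for reading such entries, the sketch below pulls the status code and the transaction ID out of one proxy log line. The function names `proxy_status` and `proxy_txid` are made up for this illustration, the sample line is the entry above joined back onto one line, and the pattern for the transaction ID (tx followed by 32 hex digits) only reflects the format seen in this log; other Swift releases may format it differently.

```shell
# One proxy log entry, joined onto a single line (copied from the output above).
line='Feb 29 08:51:57 sw-aw2az2-proxy011 proxy-server <redacted>.16.132 <redacted>.66.8 29/Feb/2012/08/51/57 GET /v1.0/AUTH_redacted-4962-4692-98fb-52ddda82a5af/%3Fformat%3Djson HTTP/1.0 404 - - <REDACTED>_4f4d50c5e4b064d88bd7ab82 - - - tx429fc3be354f434ab7f9c6c4206c1dc3 - 0.0130'

# Print the field that follows the HTTP version token, i.e. the status code.
# (Hypothetical helper, not part of Swift.)
proxy_status() {
    printf '%s\n' "$1" |
        awk '{ for (i = 1; i < NF; i++) if ($i ~ /^HTTP\//) { print $(i + 1); exit } }'
}

# Print the first token that looks like a transaction ID as it appears in
# this log: "tx" followed by 32 hex digits. (Hypothetical helper.)
proxy_txid() {
    printf '%s\n' "$1" | grep -Eo 'tx[0-9a-f]{32}' | head -n 1
}

echo "status=$(proxy_status "$line") txid=$(proxy_txid "$line")"
```

For the sample line this prints status=404 and the transaction ID tx429fc3be354f434ab7f9c6c4206c1dc3, which is the value used in the next search.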

Using the transaction ID, tx429fc3be354f434ab7f9c6c4206c1dc3, search the Swift object server log files for this transaction:

$ PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -l <yourusername> -R ssh \
  -w <redacted>.72.[4-67],<redacted>.[4-67],<redacted>.[4-67],<redacted>.204.[4-131] \
  'sudo bzgrep tx429fc3be354f434ab7f9c6c4206c1dc3 /var/log/swift/server.log*' | dshbak -c
.
.
----------------
<redacted>.72.16
----------------
Feb 29 08:51:57 sw-aw2az1-object013 account-server <redacted>.132.6 - -
[29/Feb/2012:08:51:57 +0000] "GET /disk9/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
404 - "tx429fc3be354f434ab7f9c6c4206c1dc3" "-" "-" 0.0016 ""
----------------
<redacted>.31
----------------
Feb 29 08:51:57 node-az2-object060 account-server <redacted>.132.6 - -
[29/Feb/2012:08:51:57 +0000] "GET /disk6/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
404 - "tx429fc3be354f434ab7f9c6c4206c1dc3" "-" "-" 0.0011 ""
----------------
<redacted>.204.70
----------------
Feb 29 08:51:57 sw-aw2az3-object0067 account-server <redacted>.132.6 - -
[29/Feb/2012:08:51:57 +0000] "GET /disk6/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
404 - "tx429fc3be354f434ab7f9c6c4206c1dc3" "-" "-" 0.0014 ""

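Note

Three GET operations went to three different object servers, which hold the three replicas of this user's account. Each GET returned HTTP status 404, Not Found.

Next, use the swift-get-nodes command to determine exactly where the user's account data is stored: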

$ sudo swift-get-nodes /etc/swift/account.ring.gz AUTH_redacted-4962-4692-98fb-52ddda82a5af
Account AUTH_redacted-4962-4692-98fb-52ddda82a5af
Container None
Object None

Partition 198875
Hash 1846d99185f8a0edaf65cfbf37439696

Server:Port Device <redacted>.31:6202 disk6
Server:Port Device <redacted>.204.70:6202 disk6
Server:Port Device <redacted>.72.16:6202 disk9
Server:Port Device <redacted>.204.64:6202 disk11 [Handoff]
Server:Port Device <redacted>.26:6202 disk11 [Handoff]
Server:Port Device <redacted>.72.27:6202 disk11 [Handoff]

curl -I -XHEAD "http://<redacted>.31:6202/disk6/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
curl -I -XHEAD "http://<redacted>.204.70:6202/disk6/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
curl -I -XHEAD "http://<redacted>.72.16:6202/disk9/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af"
curl -I -XHEAD "http://<redacted>.204.64:6202/disk11/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af" # [Handoff]
curl -I -XHEAD "http://<redacted>.26:6202/disk11/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af" # [Handoff]
curl -I -XHEAD "http://<redacted>.72.27:6202/disk11/198875/AUTH_redacted-4962-4692-98fb-52ddda82a5af" # [Handoff]

ssh <redacted>.31 "ls -lah /srv/node/disk6/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/"
ssh <redacted>.204.70 "ls -lah /srv/node/disk6/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/"
ssh <redacted>.72.16 "ls -lah /srv/node/disk9/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/"
ssh <redacted>.204.64 "ls -lah /srv/node/disk11/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" # [Handoff]
ssh <redacted>.26 "ls -lah /srv/node/disk11/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" # [Handoff]
ssh <redacted>.72.27 "ls -lah /srv/node/disk11/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/" # [Handoff]
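The directory paths in the ls commands follow a fixed pattern: device, then partition, then a suffix directory named after the last three characters of the hash, then the hash itself. The sketch below rebuilds such a path from the values printed by swift-get-nodes. The function name `account_db_dir` is made up for this illustration, and /srv/node is simply the mount root that appears in the output above; a different deployment may use another root.

```shell
# Build the directory that holds an account database.
# Arguments: device, partition, hash (all taken from swift-get-nodes output).
# /srv/node is the mount root seen in this runbook; adjust it if yours differs.
# (Hypothetical helper, not part of Swift.)
account_db_dir() {
    device=$1
    partition=$2
    hash=$3
    # The suffix directory is the last three characters of the hash.
    suffix=$(printf '%s\n' "$hash" | sed 's/.*\(...\)$/\1/')
    printf '/srv/node/%s/accounts/%s/%s/%s/\n' "$device" "$partition" "$suffix" "$hash"
}

account_db_dir disk9 198875 1846d99185f8a0edaf65cfbf37439696
```

With the values for <redacted>.72.16 this yields the same directory as the third ssh command above.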

Check each of the primary servers, <redacted>.31, <redacted>.204.70 and <redacted>.72.16, for this user's account. For example, on <redacted>.72.16:

$ ls -lah /srv/node/disk9/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/
total 1.0M
drwxrwxrwx 2 swift swift 98 2012-02-23 14:49 .
drwxrwxrwx 3 swift swift 45 2012-02-03 23:28 ..
-rw------- 1 swift swift 15K 2012-02-23 14:49 1846d99185f8a0edaf65cfbf37439696.db
-rw-rw-rw- 1 swift swift 0 2012-02-23 14:49 1846d99185f8a0edaf65cfbf37439696.db.pending

So this user's account database, an sqlite database, is present. Use sqlite to check the account:

$ sudo cp /srv/node/disk9/accounts/198875/696/1846d99185f8a0edaf65cfbf37439696/1846d99185f8a0edaf65cfbf37439696.db /tmp
$ sudo sqlite3 /tmp/1846d99185f8a0edaf65cfbf37439696.db
sqlite> .mode line
sqlite> select * from account_stat;
account = AUTH_redacted-4962-4692-98fb-52ddda82a5af
created_at = 1328311738.42190
put_timestamp = 1330000873.61411
delete_timestamp = 1330001026.00514
container_count = 0
object_count = 0
bytes_used = 0
hash = eb7e5d0ea3544d9def940b19114e8b43
id = 2de8c8a8-cef9-4a94-a421-2f845802fe90
status = DELETED
status_changed_at = 1330001026.00514
metadata =
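The status is DELETED and delete_timestamp is later than put_timestamp. Converting these epoch values to wall-clock time gives a moment to look for in the proxy logs. The sketch below does that conversion; the function name `swift_ts_to_utc` is made up for this illustration, and it assumes GNU date, which accepts the `-d @SECONDS` form.

```shell
# Convert a Swift timestamp (epoch seconds with a fractional part) to UTC.
# Assumes GNU date; LC_ALL=C keeps the month name in English.
# (Hypothetical helper, not part of Swift.)
swift_ts_to_utc() {
    # Strip the fractional part before handing the value to date.
    LC_ALL=C date -u -d "@${1%%.*}" '+%b %d %H:%M:%S %Y'
}

swift_ts_to_utc 1330000873.61411   # put_timestamp
swift_ts_to_utc 1330001026.00514   # delete_timestamp
```

The second call prints Feb 23 12:43:46 2012, so the delete happened a little over two minutes after the last PUT on the account.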

Next, try to find a DELETE operation for this account in the proxy server logs:

$ PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -l <yourusername> -R ssh \
  -w <redacted>.68.[4-11,132-139],<redacted>.132.[4-11,132-139] \
  'sudo bzgrep AUTH_redacted-4962-4692-98fb-52ddda82a5af /var/log/swift/proxy.log* \
  | grep -w DELETE | awk "{print $3,$10,$12}"' | dshbak -c
.
.
Feb 23 12:43:46 sw-aw2az2-proxy001 proxy-server <redacted> <redacted>.66.7 23/Feb/2012/12/43/46 DELETE
/v1.0/AUTH_redacted-4962-4692-98fb-52ddda82a5af/ HTTP/1.0 204 - Apache-HttpClient/4.1.2%20%28java%201.5%29
<REDACTED>_4f458ee4e4b02a869c3aad02 - - - tx4471188b0b87406899973d297c55ab53 - 0.0086

This shows the operation that caused the account to be deleted.

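Procedure: Deleting objects

Simple case - deleting a small number of objects and containers

Note

swift-direct is specific to the Hewlett Packard Enterprise Helion Public Cloud. Use swiftly as an alternative.

Note

Object and container names are in UTF8. swift-direct accepts UTF8 directly, not URL-encoded UTF8 (the REST API expects UTF8 that is then URL-encoded). In practice, cutting and pasting foreign-language strings into a terminal window produces the right result.

Hint: Use the head command before running any destructive command.

To delete a small number of objects, log in to any proxy node and proceed as follows:

Examine the object in question: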

$ sudo -u swift /opt/hp/swift/bin/swift-direct head 132345678912345 container_name obj_name

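If X-Object-Manifest or X-Static-Large-Object is set, this is a manifest object, and its segment objects may be in another container.

If the X-Object-Manifest attribute is set, the object is a DLO and you need to find the names of its segment objects. For example, if X-Object-Manifest is container2/seg-blah, list the contents of the container container2 as follows: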

$ sudo -u swift /opt/hp/swift/bin/swift-direct show 132345678912345 container2

Pick out the objects whose names start with seg-blah. Delete the segment objects as follows:

$ sudo -u swift /opt/hp/swift/bin/swift-direct delete 132345678912345 container2 seg-blah01
$ sudo -u swift /opt/hp/swift/bin/swift-direct delete 132345678912345 container2 seg-blah02
etc
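The X-Object-Manifest value splits at the first slash into the segment container and the segment name prefix. The sketch below shows that split and a prefix filter over a list of object names. The variable names and the `filter_segments` function are made up for this illustration, and the input is assumed to be one object name per line, which may not match the actual listing format printed by swift-direct.

```shell
# Value of X-Object-Manifest from the example above.
manifest='container2/seg-blah'

# Everything before the first slash is the container; the rest is the prefix.
seg_container=${manifest%%/*}
seg_prefix=${manifest#*/}

# Keep only the names that start with the given prefix (literal match, no
# regular expressions). Input: one object name per line, which is an
# assumption about the listing format. (Hypothetical helper.)
filter_segments() {
    awk -v p="$1" 'index($0, p) == 1'
}

echo "container=$seg_container prefix=$seg_prefix"
printf '%s\n' seg-blah01 other-object seg-blah02 | filter_segments "$seg_prefix"
```

Reviewing the filtered list before deleting anything is in line with the hint above about checking first and deleting second.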

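If X-Static-Large-Object is set, you need to read the contents of the manifest. Do this as follows:

  • Use swift-get-nodes to get the details of the object's location.

  • Change -X HEAD to -X GET and run curl against one copy.

  • This returns a JSON body that lists the container and object names.

  • Delete the objects as described above for DLO segments.

Once the segments are deleted, you can delete the object itself using swift-direct as described above.

Finally, use swift-direct to delete the container.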

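Procedure: Decommissioning Swift nodes

If Swift nodes need to be decommissioned (for example, because they are being repurposed), it is very important to follow these steps.

  1. For object servers, follow the procedure for removing the node from the rings.

  2. For Swift proxy servers, have the network team remove the node from the load balancers.

  3. Open a network ticket to have the node removed from the network firewalls.

  4. Make sure that you remove the /etc/swift directory and everything in it.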