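This particular issue involved our custom internal FUSE filesystem: ndrive. It had been festering for some time, but it needed someone to sit down and look at it in anger. This blog post describes how I poked around in /proc to get a sense of what was going on, before posting the issue to the kernel mailing list and learning how the kernel's wait code actually works!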
Symptoms: Stuck Docker Kill and a Zombie Process
We had a stalled Docker API call:
    goroutine 146 [select, 8817 minutes]:
    net/http.(*persistConn).roundTrip(0xc000658fc0, 0xc0003fc080, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/transport.go:2610 +0x765
    net/http.(*Transport).roundTrip(0xc000420140, 0xc000966200, 0x30, 0x1366f20, 0x162)
        /usr/local/go/src/net/http/transport.go:592 +0xacb
    net/http.(*Transport).RoundTrip(0xc000420140, 0xc000966200, 0xc000420140, 0x0, 0x0)
        /usr/local/go/src/net/http/roundtrip.go:17 +0x35
    net/http.send(0xc000966200, 0x161eba0, 0xc000420140, 0x0, 0x0, 0x0, 0xc00000e050, 0x3, 0x1, 0x0)
        /usr/local/go/src/net/http/client.go:251 +0x454
    net/http.(*Client).send(0xc000438480, 0xc000966200, 0x0, 0x0, 0x0, 0xc00000e050, 0x0, 0x1, 0x10000168e)
        /usr/local/go/src/net/http/client.go:175 +0xff
    net/http.(*Client).do(0xc000438480, 0xc000966200, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/client.go:717 +0x45f
    net/http.(*Client).Do(...)
        /usr/local/go/src/net/http/client.go:585
    golang.org/x/net/context/ctxhttp.Do(0x163bd48, 0xc000044090, 0xc000438480, 0xc000966100, 0x0, 0x0, 0x0)
        /go/pkg/mod/golang.org/x/net@v0.0.0-20211209124913-491a49abca63/context/ctxhttp/ctxhttp.go:27 +0x10f
    github.com/docker/docker/client.(*Client).doRequest(0xc0001a8200, 0x163bd48, 0xc000044090, 0xc000966100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/pkg/mod/github.com/moby/moby@v0.0.0-20190408150954-50ebe4562dfc/client/request.go:132 +0xbe
    github.com/docker/docker/client.(*Client).sendRequest(0xc0001a8200, 0x163bd48, 0xc000044090, 0x13d8643, 0x3, 0xc00079a720, 0x51, 0x0, 0x0, 0x0, ...)
        /go/pkg/mod/github.com/moby/moby@v0.0.0-20190408150954-50ebe4562dfc/client/request.go:122 +0x156
    github.com/docker/docker/client.(*Client).get(...)
        /go/pkg/mod/github.com/moby/moby@v0.0.0-20190408150954-50ebe4562dfc/client/request.go:37
    github.com/docker/docker/client.(*Client).ContainerInspect(0xc0001a8200, 0x163bd48, 0xc000044090, 0xc0006a01c0, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/pkg/mod/github.com/moby/moby@v0.0.0-20190408150954-50ebe4562dfc/client/container_inspect.go:18 +0x128
    github.com/Netflix/titus-executor/executor/runtime/docker.(*DockerRuntime).Kill(0xc000215180, 0x163bdb8, 0xc000938600, 0x1, 0x0, 0x0)
        /var/lib/buildkite-agent/builds/ip-192-168-1-90-1/netflix/titus-executor/executor/runtime/docker/docker.go:2835 +0x310
    github.com/Netflix/titus-executor/executor/runner.(*Runner).doShutdown(0xc000432dc0, 0x163bd10, 0xc000938390, 0x1, 0xc000b821e0, 0x1d, 0xc0005e4710)
        /var/lib/buildkite-agent/builds/ip-192-168-1-90-1/netflix/titus-executor/executor/runner/runner.go:326 +0x4f4
    github.com/Netflix/titus-executor/executor/runner.(*Runner).startRunner(0xc000432dc0, 0x163bdb8, 0xc00071e0c0, 0xc0a502e28c08b488, 0x24572b8, 0x1df5980)
        /var/lib/buildkite-agent/builds/ip-192-168-1-90-1/netflix/titus-executor/executor/runner/runner.go:122 +0x391
    created by github.com/Netflix/titus-executor/executor/runner.StartTaskWithRuntime
        /var/lib/buildkite-agent/builds/ip-192-168-1-90-1/netflix/titus-executor/executor/runner/runner.go:81 +0x411
Here, our management engine has made an HTTP call to the Docker API's unix socket, asking it to kill a container. Our containers are configured to be killed via SIGKILL. But this is strange. kill(SIGKILL) should be relatively fatal, so what is the container doing?
    $ docker exec -it 6643cd073492 bash
    OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: process_linux.go:130: executing setns process caused: exit status 1: unknown
Hmm. It seems to be alive, but setns(2) fails. Why would that be? If we look at the process tree via ps awwfux, we see:
    \_ containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/6643cd073492ba9166100ed30dbe389ff1caef0dc3d35
    |   \_ [docker-init]
    |       \_ [ndrive] <defunct>
OK, so the container's init process is still alive, but it has one zombie child. What could the container's init process possibly be doing?
    # cat /proc/1528591/stack
    [<0>] do_wait+0x156/0x2f0
    [<0>] kernel_wait4+0x8d/0x140
    [<0>] zap_pid_ns_processes+0x104/0x180
    [<0>] do_exit+0xa41/0xb80
    [<0>] do_group_exit+0x3a/0xa0
    [<0>] __x64_sys_exit_group+0x14/0x20
    [<0>] do_syscall_64+0x37/0xb0
    [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
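When the same check has to be repeated across many hosts, it can help to pull the symbol names out of a /proc/<pid>/stack dump and test for this exact pattern: do_wait sitting on top of do_exit. The following is a minimal sketch in Go, not tooling used in the investigation; the names stackSymbols and blockedInExitWait are hypothetical, and reading /proc/<pid>/stack on a real host normally requires root.

```go
package main

import (
	"fmt"
	"strings"
)

// stackSymbols returns the function names found in the text of
// /proc/<pid>/stack, innermost frame first. Each frame line looks like
// "[<0>] do_wait+0x156/0x2f0"; lines in any other shape are skipped.
func stackSymbols(stack string) []string {
	var syms []string
	for _, line := range strings.Split(stack, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 || !strings.HasPrefix(fields[0], "[<") {
			continue
		}
		sym := fields[1]
		// Drop the "+offset/size" suffix to keep only the symbol name.
		if i := strings.IndexByte(sym, '+'); i >= 0 {
			sym = sym[:i]
		}
		syms = append(syms, sym)
	}
	return syms
}

// blockedInExitWait reports whether do_wait appears closer to the top of
// the stack than do_exit, i.e. the task is waiting while it is exiting.
func blockedInExitWait(syms []string) bool {
	sawWait := false
	for _, s := range syms {
		if s == "do_wait" {
			sawWait = true
		}
		if s == "do_exit" && sawWait {
			return true
		}
	}
	return false
}

func main() {
	// Sample taken from the stack dump shown in this post.
	sample := `[<0>] do_wait+0x156/0x2f0
[<0>] kernel_wait4+0x8d/0x140
[<0>] zap_pid_ns_processes+0x104/0x180
[<0>] do_exit+0xa41/0xb80`
	syms := stackSymbols(sample)
	fmt.Println(syms, blockedInExitWait(syms))
}
```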
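The init process is in the middle of exiting, but it seems to be stuck. Its only child, though, is the ndrive process, which is in Z (i.e. "zombie") state. Zombies are processes that have already exited and are waiting to be reaped by a corresponding wait() syscall from their parent. So how could the kernel be stuck waiting on a zombie?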
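The Z state mentioned above is the same value that ps reads from the third field of /proc/<pid>/stat. As a rough illustration of how to check it directly, here is a small Go sketch; procState and isZombie are hypothetical helper names, and the parsing assumes the field layout documented in proc(5), where the command name is wrapped in parentheses and may itself contain spaces or parentheses.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// procState extracts the command name and the one-letter process state
// from the contents of /proc/<pid>/stat. Because the command name may
// contain spaces or ')' characters, the split is done on the LAST ')'.
func procState(stat string) (comm string, state byte, err error) {
	open := strings.IndexByte(stat, '(')
	closeIdx := strings.LastIndexByte(stat, ')')
	if open < 0 || closeIdx < open {
		return "", 0, errors.New("malformed stat line")
	}
	rest := strings.Fields(stat[closeIdx+1:])
	if len(rest) == 0 || len(rest[0]) != 1 {
		return "", 0, errors.New("missing state field")
	}
	return stat[open+1 : closeIdx], rest[0][0], nil
}

// isZombie reports whether the stat line describes a process in state Z,
// meaning it has exited but has not yet been reaped by its parent.
func isZombie(stat string) bool {
	_, state, err := procState(stat)
	return err == nil && state == 'Z'
}

func main() {
	// Inspect the current process as a harmless demonstration.
	data, err := os.ReadFile("/proc/self/stat")
	if err != nil {
		fmt.Println("could not read /proc/self/stat:", err)
		return
	}
	comm, state, err := procState(string(data))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s is in state %c (zombie: %v)\n", comm, state, isZombie(string(data)))
}
```

Pointing the same helper at the stat file of the ndrive process would be one way to confirm the state that ps reported as <defunct>.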