
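TCP State Diagram

TCP three-way handshake, TCP four-way close, and the TCP state machine

TCP three-way handshake

TCP four-way close

Why a CLOSE_WAIT pile-up is harmful

Each CLOSE_WAIT connection holds one file descriptor. A large pile-up of CLOSE_WAIT connections can exhaust the available file descriptors, so that new connections or file opens fail with a too many open files error: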

dial udp 9.215.0.48:9073: socket: too many open files
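To see how close a process is to that limit, it can compare its open descriptor count with its soft RLIMIT_NOFILE. The following is a minimal sketch, assuming Linux (it lists /proc/self/fd); the function name fdUsage is made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// fdUsage returns how many file descriptors this process currently has open
// and the soft RLIMIT_NOFILE limit. Assumption: Linux, where /proc/self/fd
// contains one entry per open descriptor.
func fdUsage() (open int, limit uint64, err error) {
	var rl syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, 0, err
	}
	return len(entries), uint64(rl.Cur), nil
}

func main() {
	open, limit, err := fdUsage()
	if err != nil {
		fmt.Fprintln(os.Stderr, "fdUsage:", err)
		os.Exit(1)
	}
	fmt.Printf("open descriptors: %d, soft limit: %d\n", open, limit)
}
```

Once the open count reaches the soft limit, every further socket or open call in that process fails with the error shown above.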

How do I check?
Count the CLOSE_WAIT connections on the whole system:

lsof | grep CLOSE_WAIT | wc -l

Count the CLOSE_WAIT connections of a specific process:

lsof -p $PID | grep CLOSE_WAIT | wc -l
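The same count can be taken without lsof by reading the kernel's socket table. This is a rough sketch, assuming Linux, where the fourth column of /proc/net/tcp and /proc/net/tcp6 holds the state as a hex code (08 is CLOSE_WAIT, 06 is TIME_WAIT). The helper name countTCPStates is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// tcpStates maps the hex "st" column of /proc/net/tcp to state names.
var tcpStates = map[string]string{
	"01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
	"04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
	"07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
	"0A": "LISTEN", "0B": "CLOSING",
}

// countTCPStates counts sockets per state in the text of a /proc/net/tcp table.
func countTCPStates(table string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(table))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 4 || fields[0] == "sl" {
			continue // skip the header and malformed lines
		}
		name, ok := tcpStates[strings.ToUpper(fields[3])]
		if !ok {
			name = "UNKNOWN"
		}
		counts[name]++
	}
	return counts
}

func main() {
	total := make(map[string]int)
	for _, path := range []string{"/proc/net/tcp", "/proc/net/tcp6"} {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, "skipping:", err)
			continue
		}
		for state, n := range countTCPStates(string(data)) {
			total[state] += n
		}
	}
	fmt.Println("CLOSE_WAIT:", total["CLOSE_WAIT"], "TIME_WAIT:", total["TIME_WAIT"])
}
```

Unlike lsof -p, this counts sockets for the whole network namespace, not for a single process.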

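The side that closes first (active close) sends a FIN, and the other side (passive close) replies with an ACK. At that point the passive side enters CLOSE_WAIT. If all goes well, the passive side soon sends its own FIN and moves to LAST_ACK.

Normally a connection stays in CLOSE_WAIT on the server only briefly. If you see a large number of CLOSE_WAIT connections, the passive side is not sending its FIN in time, and in most cases the application on the passive side is at fault.

The application never calls Close

If the CLOSE_WAIT pile-up is very large (say 100,000+), even to the point of exhausting file descriptors, the usual cause is that the application does not close its connections.

When the peer closes the connection and the passive side never closes the corresponding socket in code, no FIN is sent, so CLOSE_WAIT connections accumulate. The code may simply never call Close, or it may be sloppy, for example stuck in an infinite loop, so that a close written further down is never reached.

The application is slow to accept connections

If the CLOSE_WAIT pile-up is not very large, the accept queue (the queue of fully established connections) may be backing up. Let's first look at how a TCP connection is established:

Once a connection is established it is placed in the accept queue, where it waits for the application to call accept. If the application takes too long to accept it, the client eventually times out and closes the connection itself. On the server the connection is still sitting in the accept queue, and its state becomes CLOSE_WAIT.

As long as the application does not accept the connection, the kernel will not close it on the application's behalf: it acknowledges the client's FIN but never sends a FIN of its own. The pile-up in this case is usually small, though, because it is bounded by the size of the accept queue.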
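A common way to avoid this in Go is to defer the Close at the top of the handler, so it runs on every return path. The sketch below is only an illustration: handleConn and demo are hypothetical names, and an in-memory net.Pipe stands in for a real TCP connection.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
)

// handleConn reads until the peer closes its side (Read returns io.EOF) and
// always closes our side on the way out. It returns the number of bytes read.
func handleConn(conn net.Conn) (int, error) {
	defer conn.Close() // runs on every return path, so our FIN is always sent
	buf := make([]byte, 4096)
	total := 0
	for {
		n, err := conn.Read(buf)
		total += n // a real handler would process buf[:n] here
		if errors.Is(err, io.EOF) {
			// The peer has closed. On a real TCP socket we are now in
			// CLOSE_WAIT until the deferred Close runs.
			return total, nil
		}
		if err != nil {
			return total, err
		}
	}
}

// demo plays the peer that hangs up first, then checks that our side is closed.
func demo() (bytesRead int, closed bool, err error) {
	client, server := net.Pipe()
	go func() {
		client.Write([]byte("ping"))
		client.Close() // active close by the peer
	}()
	bytesRead, err = handleConn(server)
	// On a net.Pipe, reading from an end that was closed locally returns
	// io.ErrClosedPipe, which shows that handleConn really closed it.
	_, readErr := server.Read(make([]byte, 1))
	return bytesRead, errors.Is(readErr, io.ErrClosedPipe), err
}

func main() {
	n, closed, err := demo()
	fmt.Println("bytes read:", n, "closed:", closed, "err:", err)
}
```

The defer only helps if the handler eventually returns. A handler stuck in a loop that never sees the EOF still leaves the socket in CLOSE_WAIT.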
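On the application side, the usual remedy is an accept loop that does nothing but accept and hand off, so connections never wait in the queue behind slow request handling. A minimal sketch under that assumption follows; acceptLoop and demo are illustrative names, and the demo uses a loopback listener on an ephemeral port.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync"
	"sync/atomic"
)

// acceptLoop takes connections off the accept queue as soon as they arrive
// and gives each one its own goroutine, so a slow handler never delays Accept.
func acceptLoop(ln net.Listener, handle func(net.Conn)) error {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return err // for example, the listener was closed
		}
		go handle(conn)
	}
}

// demo opens n short-lived client connections and reports how many the
// server handled and closed.
func demo(n int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	var handled int64
	var wg sync.WaitGroup
	wg.Add(n)
	go acceptLoop(ln, func(c net.Conn) {
		defer wg.Done()
		defer c.Close()        // send our FIN and leave CLOSE_WAIT
		io.Copy(io.Discard, c) // read until the client's FIN arrives
		atomic.AddInt64(&handled, 1)
	})

	for i := 0; i < n; i++ {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, err
		}
		c.Close() // the client closes first
	}
	wg.Wait()
	return int(atomic.LoadInt64(&handled)), nil
}

func main() {
	handled, err := demo(5)
	fmt.Println("handled:", handled, "err:", err)
}
```

This sketch stops on the first Accept error. A production loop would normally tell a closed listener apart from transient errors before giving up.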

TIME_WAIT

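Here is a case I ran into earlier: after deploying a service, I noticed that many of its connections to ES were in TIME_WAIT.

TIME_WAIT clearly means that the local side closed the connection actively and is now waiting out 2MSL.

In theory the connections to ES are long-lived (Keep-Alive), so there should not be that many connections being closed.

After reading the code and the issue tracker, I found the cause: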
https://github.com/elastic/go-elasticsearch/issues/123

It turns out to come from Golang's behavior:
https://pkg.go.dev/net/http#Response

// The http Client and Transport guarantee that Body is always
// non-nil, even on responses without a body or responses with
// a zero-length body. It is the caller's responsibility to
// close Body. The default HTTP client's Transport may not
// reuse HTTP/1.x "keep-alive" TCP connections if the Body is
// not read to completion and closed.

Add the following to the code:

_ = res.String()

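Problem solved.

If you send requests with net/http directly, then besides sharing a single client you also need to do the following after each request: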

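resp, err := httpc.HttpClient.Do(req)
if err != nil {
	return err
}
defer resp.Body.Close()
// Besides calling Close(), the body must also be read to completion;
// otherwise the connection is not reused and ends up in TIME_WAIT.
// Either of the two reads below works. io.ReadAll and io.Discard replace
// the ioutil helpers, which are deprecated since Go 1.16.
// _, err = io.ReadAll(resp.Body)
_, _ = io.Copy(io.Discard, resp.Body)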
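To check that draining the body really lets the connection be reused, one option is to watch the Reused field that net/http/httptrace reports for each request. The sketch below assumes a throwaway local httptest server in place of ES; drainAndClose, countReused and demo are names invented for this example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// drainAndClose reads whatever is left of body and then closes it, so the
// Transport can return the HTTP/1.x connection to its idle pool.
func drainAndClose(body io.ReadCloser) error {
	_, copyErr := io.Copy(io.Discard, body)
	closeErr := body.Close()
	if copyErr != nil {
		return copyErr
	}
	return closeErr
}

// countReused sends n GET requests through one shared client and reports how
// many of them ran on a connection that had already served a request.
func countReused(client *http.Client, url string, n int) (int, error) {
	reused := 0
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			if info.Reused {
				reused++
			}
		},
	}
	for i := 0; i < n; i++ {
		req, err := http.NewRequest(http.MethodGet, url, nil)
		if err != nil {
			return reused, err
		}
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
		resp, err := client.Do(req)
		if err != nil {
			return reused, err
		}
		if err := drainAndClose(resp.Body); err != nil {
			return reused, err
		}
	}
	return reused, nil
}

// demo runs countReused against a local test server (a stand-in for ES).
func demo(n int) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello")
	}))
	defer srv.Close()
	return countReused(srv.Client(), srv.URL, n)
}

func main() {
	reused, err := demo(3)
	fmt.Println("reused connections:", reused, "err:", err)
}
```

When the body is drained, every request after the first should report a reused connection. If the count stays at zero in a real service, some code path is probably closing responses without reading them.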

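Other recommended material on TIME_WAIT:

TCP state machine

Combining the two sequence diagrams, connection setup and connection teardown, gives the well-known TCP state machine. When studying it, it helps to read the state machine side by side with the sequence diagrams; otherwise it is easy to get lost.

In this figure, the bold black parts are the main flows described above. The Arabic numerals mark the order of steps during connection setup, and the Chinese capital numerals mark the order of steps during teardown. The bold solid lines are the state transitions of client A, and the bold dashed lines are those of server B.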
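The main flows of the state machine can also be written down as a small transition table. The following sketch covers only the normal handshake and teardown paths (no simultaneous open or close, no RST). The event names are made up for this example and are not taken from any RFC.

```go
package main

import "fmt"

type State string
type Event string

const (
	Closed      State = "CLOSED"
	Listen      State = "LISTEN"
	SynSent     State = "SYN_SENT"
	SynRcvd     State = "SYN_RCVD"
	Established State = "ESTABLISHED"
	FinWait1    State = "FIN_WAIT_1"
	FinWait2    State = "FIN_WAIT_2"
	TimeWait    State = "TIME_WAIT"
	CloseWait   State = "CLOSE_WAIT"
	LastAck     State = "LAST_ACK"

	ActiveOpen  Event = "app: connect, send SYN"
	PassiveOpen Event = "app: listen"
	RecvSYN     Event = "recv SYN, send SYN+ACK"
	RecvSYNACK  Event = "recv SYN+ACK, send ACK"
	RecvACK     Event = "recv ACK"
	RecvFIN     Event = "recv FIN, send ACK"
	AppClose    Event = "app: close, send FIN"
	Timeout2MSL Event = "2MSL timer expires"
)

// transitions holds the main-path subset of the TCP state machine.
var transitions = map[State]map[Event]State{
	Closed:      {ActiveOpen: SynSent, PassiveOpen: Listen},
	Listen:      {RecvSYN: SynRcvd},
	SynSent:     {RecvSYNACK: Established},
	SynRcvd:     {RecvACK: Established},
	Established: {AppClose: FinWait1, RecvFIN: CloseWait},
	FinWait1:    {RecvACK: FinWait2},
	FinWait2:    {RecvFIN: TimeWait},
	TimeWait:    {Timeout2MSL: Closed},
	CloseWait:   {AppClose: LastAck},
	LastAck:     {RecvACK: Closed},
}

// run applies events in order and returns the resulting state.
func run(start State, events ...Event) (State, error) {
	s := start
	for _, ev := range events {
		next, ok := transitions[s][ev]
		if !ok {
			return s, fmt.Errorf("no transition from %s on %q", s, ev)
		}
		s = next
	}
	return s, nil
}

func main() {
	// Passive close where the application has not called close yet.
	s, err := run(Established, RecvFIN)
	fmt.Println(s, err)
	// Active close up to the 2MSL wait.
	s, err = run(Established, AppClose, RecvACK, RecvFIN)
	fmt.Println(s, err)
}
```

In this table the only way out of CLOSE_WAIT is the application's own close, which is why a missing Close leaves connections stuck there.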

TCP Flags

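The Flags field in TCP

  • SYN requests connection setup; a packet capture (tcpdump format) shows it as: Flags [S];
  • FIN requests connection close; shown as: Flags [F];
  • ACK acknowledges what has been received; shown as: Flags [.];
  • PSH asks the receiver to push the data to the application; shown as: Flags [P];
  • RST resets the connection; shown as: Flags [R].

Combinations of flags

  • Acknowledge and set up the connection: Flags [S.]
  • Acknowledge and close the connection: Flags [F.]
  • Acknowledge and transfer data: Flags [P.]
  • Acknowledge and reset the connection: Flags [R.]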
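The notation above can be reproduced from the raw flag bits of a TCP header. Here is a small sketch, assuming the order in which tcpdump prints the flag letters (F, S, R, P, ., U, E, W); formatFlags is a hypothetical helper, not part of any library.

```go
package main

import (
	"fmt"
	"strings"
)

// Flag bits as laid out in the low byte of the TCP header flags field.
const (
	FIN uint8 = 1 << iota // 0x01
	SYN                   // 0x02
	RST                   // 0x04
	PSH                   // 0x08
	ACK                   // 0x10
	URG                   // 0x20
	ECE                   // 0x40
	CWR                   // 0x80
)

// flagChars lists each bit with its capture letter, in printing order.
var flagChars = []struct {
	bit uint8
	ch  string
}{
	{FIN, "F"}, {SYN, "S"}, {RST, "R"}, {PSH, "P"},
	{ACK, "."}, {URG, "U"}, {ECE, "E"}, {CWR, "W"},
}

// formatFlags renders a flag byte in the "Flags [..]" style used above.
func formatFlags(flags uint8) string {
	var b strings.Builder
	for _, f := range flagChars {
		if flags&f.bit != 0 {
			b.WriteString(f.ch)
		}
	}
	if b.Len() == 0 {
		return "Flags [none]"
	}
	return "Flags [" + b.String() + "]"
}

func main() {
	fmt.Println(formatFlags(SYN))           // first packet of the handshake
	fmt.Println(formatFlags(SYN | ACK))     // second packet of the handshake
	fmt.Println(formatFlags(FIN | PSH | ACK)) // last data segment that also closes
}
```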

Flag abbreviations explained

  • SYN - The synchronization flag is used to establish a three-way handshake between two hosts. Only the first packet from both the sender and receiver should have this flag set.
  • ACK - The acknowledgment flag is used to acknowledge the successful receipt of a packet. As we can see from the diagram above, the receiver sends an ACK as well as a SYN in the second step of the three-way handshake process to tell the sender that it received its initial packet.
  • FIN - The finished flag means there is no more data from the sender. Therefore, it is used in the last packet sent from the sender. It frees the reserved resources and gracefully terminates the connection.
  • URG - The urgent flag is used to notify the receiver to process the urgent packets before processing all other packets. The receiver will be notified when all known urgent data has been received. See RFC 6093 for more details.
  • PSH - The push flag is similar to the URG flag and tells the receiver to process these packets as they are received instead of buffering them. Usually, by default, the transport layer waits some time for the application layer to send enough data according to the maximum segment size so that the number of packets transmitted over the network is minimized. However, this is not desirable for certain applications, such as interactive applications (chatting). By using Push, this problem is solved.
  • RST - The reset flag gets sent from the receiver to the sender when a packet is sent to a particular host that was not expecting it.
  • ECE - This flag is responsible for indicating if the TCP peer is ECN capable. See RFC 3168 for more details.
  • CWR - The congestion window reduced flag is used by the sending host to indicate it received a packet with the ECE flag set. See RFC 3168 for more details.
  • NS (experimental) - The nonce sum flag is an experimental flag intended to protect against accidental or malicious concealment of marked packets from the sender. See RFC 3540 for more details.

Is it allowed to send FIN, PSH and ACK in a single packet?

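Yes, this is allowed, and it is normal. If this is the last data to be sent, setting all three flags is the most efficient way to:

  1. acknowledge the data received so far,
  2. indicate that no more data is coming, and
  3. indicate that the data should be pushed to the application without waiting for any further data.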

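Recommended reading

How to improve the performance of the TCP three-way handshake?

How to improve the performance of the TCP four-way close?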
