TCP 粘包和拆包
# TCP 粘包和拆包
TCP是一种面向连接的、可靠的、基于字节流的传输协议。其可能产生粘包和拆包问题。
# 什么是粘包和拆包
- TCP 粘包是指接收方在一次接收中收到了发送方多次发送的数据。如发送方分两次发送了
Hello
和World
, 接收方一次接收到HelloWorld
。 - TCP 拆包是指发送方发送一次数据,接收方一次只接收部分数据。如发送方分两次发送了
HelloWorld
, 接收方分多次接收到Hello
和World
。
# 粘包
# 产生原因
为了提高传输效率和减少网络流量,TCP会使用Nagle算法 (opens new window)将短时间内接收、发送的多个小数据块合并成一个大数据块放在缓存中,接收端从缓存中取到了连续的多个数据块,这些数据块之间没有边界,无法将这些数据还原。
# 示例
server.py
import socket
import time
host = "localhost"
port = 8090
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((host, port))
s.listen(5)
conn,addr=s.accept()
data1=conn.recv(1024)
data2=conn.recv(1024)
print('data1: ', data1.decode())
print('data2: ', data2.decode())
s.close()
client.py
import socket
import time
host = "localhost"
port = 8090
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, port))
s.send("hello".encode())
s.send("python3".encode())
打开两个 Terminal,一个运行python3 server.py
,另一个运行python3 client.py
,第一个终端将输出:
bing@GH-HT6M693:~$ python3 server.py
data1: hellopython3
data2:
如果同时使用 tcpdump 监视网络流量,输出如下:
root@GH-HT6M693:~# tcpdump -i lo 'port 8090'
13:33:28.371499 IP localhost.60048 > localhost.8090: Flags [S], seq 2472192704, win 65495, options [mss 65495,sackOK,TS val 4195690721 ecr 0,nop,wscale 7], length 0
13:33:28.371505 IP localhost.8090 > localhost.60048: Flags [S.], seq 3673077898, ack 2472192705, win 65483, options [mss 65495,sackOK,TS val 4195690721 ecr 4195690721,nop,wscale 7], length 0
13:33:28.371511 IP localhost.60048 > localhost.8090: Flags [.], ack 1, win 512, options [nop,nop,TS val 4195690721 ecr 4195690721], length 0
13:33:28.371538 IP localhost.60048 > localhost.8090: Flags [P.], seq 1:6, ack 1, win 512, options [nop,nop,TS val 4195690721 ecr 4195690721], length 5
13:33:28.371540 IP localhost.8090 > localhost.60048: Flags [.], ack 6, win 512, options [nop,nop,TS val 4195690721 ecr 4195690721], length 0
13:33:28.371544 IP localhost.60048 > localhost.8090: Flags [P.], seq 6:13, ack 1, win 512, options [nop,nop,TS val 4195690721 ecr 4195690721], length 7
13:33:28.371545 IP localhost.8090 > localhost.60048: Flags [.], ack 13, win 512, options [nop,nop,TS val 4195690721 ecr 4195690721], length 0
13:33:28.372139 IP localhost.60048 > localhost.8090: Flags [F.], seq 13, ack 1, win 512, options [nop,nop,TS val 4195690722 ecr 4195690721], length 0
13:33:28.373304 IP localhost.8090 > localhost.60048: Flags [F.], seq 1, ack 14, win 512, options [nop,nop,TS val 4195690723 ecr 4195690722], length 0
13:33:28.373312 IP localhost.60048 > localhost.8090: Flags [.], ack 2, win 512, options [nop,nop,TS val 4195690723 ecr 4195690723], length 0
可见 60048 端口向 8090 端口发送了两次请求,长度分别是 5 和 7,对应的数据分别是 hello
和 python3
,但接收方收到的数据是hellopython3
,出现了粘包。
# 拆包
# 产生原因
发送方一次发送的数据量超过了 MSS 的大小,将会被拆分成多个TCP报文发送。
# 示例
server.py
import socket
import time
host = "localhost"
port = 8090
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((host, port))
s.listen(5)
conn,addr=s.accept()
# 接收大于 MTU 的数据量
data1=conn.recv(100000)
data2=conn.recv(100000)
print('data1: ', len(data1.decode()))
print('data2: ', len(data2.decode()))
s.close()
client.py
import socket
import time
host = "localhost"
port = 8090
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, port))
msg=''
for i in range(0, 10000):
msg+='python3'
# 发送大于 MTU 的数据量
s.send(msg.encode())
打开两个 Terminal,一个运行python3 server.py
,另一个运行python3 client.py
,第一个终端将输出:
bing@GH-HT6M693:~$ python3 server.py
data1: 70000
data2: 0
如果同时使用 tcpdump 监视网络流量,输出如下:
root@GH-HT6M693:~# tcpdump -i lo 'port 8090'
14:01:48.026845 IP localhost.44340 > localhost.8090: Flags [S], seq 2339273820, win 65495, options [mss 65495,sackOK,TS val 4197390377 ecr 0,nop,wscale 7], length 0
14:01:48.026852 IP localhost.8090 > localhost.44340: Flags [S.], seq 3182816013, ack 2339273821, win 65483, options [mss 65495,sackOK,TS val 4197390377 ecr 4197390377,nop,wscale 7], length 0
14:01:48.026857 IP localhost.44340 > localhost.8090: Flags [.], ack 1, win 512, options [nop,nop,TS val 4197390377 ecr 4197390377], length 0
14:01:48.028102 IP localhost.44340 > localhost.8090: Flags [.], seq 1:32742, ack 1, win 512, options [nop,nop,TS val 4197390378 ecr 4197390377], length 32741
14:01:48.028118 IP localhost.8090 > localhost.44340: Flags [.], ack 32742, win 380, options [nop,nop,TS val 4197390378 ecr 4197390378], length 0
14:01:48.028127 IP localhost.44340 > localhost.8090: Flags [P.], seq 32742:65483, ack 1, win 512, options [nop,nop,TS val 4197390378 ecr 4197390377], length 32741
14:01:48.028132 IP localhost.44340 > localhost.8090: Flags [P.], seq 65483:70001, ack 1, win 512, options [nop,nop,TS val 4197390378 ecr 4197390378], length 4518
14:01:48.028220 IP localhost.8090 > localhost.44340: Flags [.], ack 70001, win 512, options [nop,nop,TS val 4197390378 ecr 4197390378], length 0
14:01:48.028916 IP localhost.44340 > localhost.8090: Flags [F.], seq 70001, ack 1, win 512, options [nop,nop,TS val 4197390379 ecr 4197390378], length 0
14:01:48.030105 IP localhost.8090 > localhost.44340: Flags [F.], seq 1, ack 70002, win 512, options [nop,nop,TS val 4197390380 ecr 4197390379], length 0
14:01:48.030111 IP localhost.44340 > localhost.8090: Flags [.], ack 2, win 512, options [nop,nop,TS val 4197390380 ecr 4197390380], length 0
可见 44340 端口向 8090 端口发送了三次数据,长度分别是 32741、32741 和 4518,出现了拆包。
此处尚有疑问:为何三次数据的总长度不等于接收到的数据长度?
# 其他概念
# MSS
是指 TCP 连接建立后双方约定的可传输的最大报文长度,它的值需通过 MTU 计算,计算公式为 MSS = MTU - 20(IP Header) - 20(TCP Header)
MTU 可以通过 ip
命令查看:
bing@GH-HT6M693:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:15:5d:e3:0f:de brd ff:ff:ff:ff:ff:ff
inet 172.22.214.60/20 brd 172.22.223.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::215:5dff:fee3:fde/64 scope link
valid_lft forever preferred_lft forever
如上,lo
网卡的 MSS = 65536 - 20 - 20 = 65496