redis-cluster

Redis Cluster is a decentralized, distributed Redis storage architecture that addresses Redis's high-availability and scalability problems.

Cluster

redis-cluster uses a sharding mechanism: the whole key space is divided into 16384 slots, which by default are distributed evenly across the cluster's nodes.
Each node is responsible for a subset of the slots and for the key-value pairs that map to them.
Sharding algorithm: slot = CRC16(key) % 16384
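The slot mapping can be sketched in Python. Redis uses the CRC16-CCITT (XModem) variant, so a bare-bones version looks like this (an illustrative sketch, not the C implementation Redis ships):

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    """slot = CRC16(key) % 16384"""
    return crc16(key) % 16384

print(key_slot(b"foo"))  # 12182, matching the redirect for "foo" in the Redis cluster tutorial
```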

Request flow

A request can be sent to any node. If the receiving node does not own the key's slot, it does not execute the command itself; instead it returns a redirection (a MOVED error) pointing at the correct node, much like an HTTP 302 redirect, and the client retries there.
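The redirect-following loop can be sketched like this. `send_command` is a hypothetical transport callback (not an API of any real client library); only the reply format `MOVED <slot> <host>:<port>` comes from the actual protocol:

```python
def execute(host, port, command, send_command, max_redirects=5):
    """Send a command, following MOVED redirections like an HTTP 302.

    send_command(host, port, command) is a hypothetical helper that returns
    either the command's reply or an error string such as
    'MOVED 15495 172.19.0.4:7002'.
    """
    for _ in range(max_redirects):
        reply = send_command(host, port, command)
        if isinstance(reply, str) and reply.startswith("MOVED "):
            # The slot lives elsewhere: retry against the node named in the error.
            _, _slot, addr = reply.split()
            h, p = addr.rsplit(":", 1)
            host, port = h, int(p)
            continue
        return reply
    raise RuntimeError("too many MOVED redirections")
```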

Consistency

Replication between masters and replicas is asynchronous, so data inconsistency between master and replica nodes can still occur.

Fault tolerance

redis-cluster adds redundancy by running extra nodes as replicas (slaves). When a master fails, a new master is elected from among its replicas.
If a master and all of its replicas fail, the whole cluster enters the fail state.

Failure detection

Subjective offline (PFAIL)

Every node in the cluster periodically sends ping messages to the other nodes, and each receiving node replies with a pong. If ping-pong communication with some node keeps failing for longer than cluster-node-timeout, the sender considers that node faulty and marks it as subjectively offline (PFAIL).
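As a rough illustration (not Redis's actual code), subjective-offline marking boils down to comparing each node's last pong timestamp against cluster-node-timeout:

```python
def pfail_nodes(last_pong_ms, now_ms, node_timeout_ms=15000):
    """Return the node ids whose last pong is older than cluster-node-timeout.

    last_pong_ms: node id -> timestamp (ms) of the last pong received from it.
    """
    return {node for node, ts in last_pong_ms.items() if now_ms - ts > node_timeout_ms}
```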

Objective offline (FAIL)

Once a node has marked another node as subjectively offline, that state travels with gossip messages through the cluster, so the nodes keep accumulating offline reports about the faulty node. When more than half of the slot-holding masters have marked a node as subjectively offline, the objective-offline process is triggered:

  1. Count the valid offline reports; if the count is no more than half of the slot-holding masters in the cluster, stop.
  2. If the report count exceeds half of the slot-holding masters, mark the faulty node as objectively offline (FAIL).
  3. Broadcast a fail message to the cluster telling every node to mark the faulty node as objectively offline; the message body contains only the faulty node's ID.
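The quorum check in steps 1-2 can be sketched as follows (a simplified model; real Redis additionally expires stale reports and counts the reporting node itself):

```python
def is_objectively_down(reporters, slot_masters):
    """Decide whether a node should be marked FAIL.

    reporters: ids of masters that reported the node as PFAIL.
    slot_masters: ids of all masters currently holding slots.
    FAIL is reached when more than half of the slot-holding masters agree.
    """
    valid = len(set(reporters) & set(slot_masters))  # only slot-holding masters count
    return valid > len(slot_masters) // 2
```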

Broadcasting the fail message

Broadcasting the fail message is the final step of taking a node objectively offline, and it serves two important purposes:

  1. It tells every node in the cluster to mark the faulty node as objectively offline, effective immediately.
  2. It tells the faulty node's replicas to start the failover process.

Note: during failover there is a short window in which the cluster is down; the cluster state changes ok -> fail -> ok.
Node timeout option: cluster-node-timeout, default 15 seconds (15000 ms).

Failover

When a replica discovers that the master it replicates has gone offline, it starts a failover against that master:

  1. From all replicas of the offline master n0, one replica n1 is chosen by election.
  2. n1 executes SLAVEOF NO ONE and becomes the new master.
  3. The new master n1 revokes all slot assignments held by n0 and assigns those slots to itself.
  4. n1 broadcasts a PONG message to the cluster, which lets every other node learn immediately that this node has been promoted from replica to master and has taken over the slots previously served by the offline node.
  5. The new master starts accepting commands for the slots it is now responsible for.
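Steps 3-4 (the slot takeover) can be modeled as a pure function over a slot table (a toy model of the outcome, not the gossip-based mechanism itself):

```python
def take_over_slots(slot_owner, failed_master, new_master):
    """Reassign every slot owned by the failed master to the promoted replica.

    slot_owner: dict mapping slot number -> owning node id.
    """
    return {slot: (new_master if owner == failed_master else owner)
            for slot, owner in slot_owner.items()}
```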

Read/write scaling

In redis-cluster, masters handle reads and writes while replicas act as backups. If you can tolerate possibly stale reads and have no writes to send, you can issue the READONLY command on a replica connection to allow reading from it.

That said, masters in redis-cluster can be added freely, and scaling out masters is usually more effective than scaling out replicas.

Pros and cons

The main advantage of the redis-cluster approach is that it decouples data from nodes, which greatly simplifies scaling the cluster out and in.
The drawbacks are:
1. A key is the smallest unit of partitioning; a single large value, such as a big hash, cannot be split across nodes.
2. Only database 0 (db0) is available in cluster mode.
3. Transaction support is limited: a transaction may only touch keys that live on the same node.
4. Batch (multi-key) operations are limited: they only work on keys that map to the same slot.
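Limitations 3 and 4 can often be worked around with hash tags: when a key contains a non-empty `{...}` section, only the text inside the braces is hashed, so related keys can be forced into the same slot. A self-contained sketch of the rule (same CRC16 as the sharding formula above):

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    """Hash only the first non-empty {...} section, if any (the hash-tag rule)."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # ignore empty tags like "{}"
            key = key[start + 1:end]
    return crc16(key) % 16384

# {user:1}:name and {user:1}:age land in the same slot,
# so MGET / MULTI across them is allowed in cluster mode.
assert key_slot(b"{user:1}:name") == key_slot(b"{user:1}:age")
```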

Common commands

# Create a cluster (redis-cli 5.0+)
./redis-cli --cluster create --cluster-replicas 1 172.19.0.2:7000 172.19.0.3:7001 172.19.0.4:7002 172.19.0.5:7003 172.19.0.6:7004 172.19.0.7:7005
# Show cluster info
./redis-cli --cluster info host:port
# List all nodes
./redis-cli -p xxxx cluster nodes
# Reshard slots between nodes
./redis-cli --cluster reshard host:port
# Connect to the cluster (-c enables cluster mode, i.e. follow MOVED redirects)
redis-cli -c -h 172.19.0.5 -p 7003
# Simulate a failure (crash a node)
redis-cli -h 172.19.0.3 -p 7001 debug segfault

Hands-on

# First start 6 nodes on ports 7000-7005, each with a redis.conf like the following
port 7000
cluster-enabled yes # enable cluster mode
cluster-config-file nodes.conf # cluster state file, maintained by the cluster itself; do not edit by hand
cluster-node-timeout 5000 # node timeout in milliseconds (default 15000)
appendonly yes
# Build the cluster
$ /usr/local/bin/redis-cli --cluster create --cluster-replicas 1 172.19.0.2:7000 172.19.0.3:7001 172.19.0.4:7002 172.19.0.5:7003 172.19.0.6:7004 172.19.0.7:7005
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 172.19.0.6:7004 to 172.19.0.2:7000
Adding replica 172.19.0.7:7005 to 172.19.0.3:7001
Adding replica 172.19.0.5:7003 to 172.19.0.4:7002
M: d4ed4e41e78b3cdd7ec2c4a7999e536074a13777 172.19.0.2:7000
slots:[0-5460] (5461 slots) master
M: b4fc21183821f4dea0e6870711f92880a474898b 172.19.0.3:7001
slots:[5461-10922] (5462 slots) master
M: d0e37b840cb6c0792b858530c39c13a021b8b75d 172.19.0.4:7002
slots:[10923-16383] (5461 slots) master
S: d19b92cbbb9eb5d662ea91c4c7eacf1f5d3fdc2d 172.19.0.5:7003
replicates d0e37b840cb6c0792b858530c39c13a021b8b75d
S: 6179af99c40c338e8e93ceddfa1dd3b8168c486c 172.19.0.6:7004
replicates d4ed4e41e78b3cdd7ec2c4a7999e536074a13777
S: 10a1bab4d1da3bd514101905834983396ab5f8d1 172.19.0.7:7005
replicates b4fc21183821f4dea0e6870711f92880a474898b
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
....
>>> Performing Cluster Check (using node 172.19.0.2:7000)
M: d4ed4e41e78b3cdd7ec2c4a7999e536074a13777 172.19.0.2:7000
slots:[0-5460] (5461 slots) master
1 additional replica(s)
M: d0e37b840cb6c0792b858530c39c13a021b8b75d 172.19.0.4:7002
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
M: b4fc21183821f4dea0e6870711f92880a474898b 172.19.0.3:7001
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: 10a1bab4d1da3bd514101905834983396ab5f8d1 172.19.0.7:7005
slots: (0 slots) slave
replicates b4fc21183821f4dea0e6870711f92880a474898b
S: 6179af99c40c338e8e93ceddfa1dd3b8168c486c 172.19.0.6:7004
slots: (0 slots) slave
replicates d4ed4e41e78b3cdd7ec2c4a7999e536074a13777
S: d19b92cbbb9eb5d662ea91c4c7eacf1f5d3fdc2d 172.19.0.5:7003
slots: (0 slots) slave
replicates d0e37b840cb6c0792b858530c39c13a021b8b75d
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
# Show cluster info
$ /usr/local/bin/redis-cli --cluster info 127.0.0.1:7000
127.0.0.1:7000 (204c6d60...) -> 1 keys | 5461 slots | 1 slaves.
172.19.0.4:7002 (270a4a6f...) -> 1 keys | 5461 slots | 1 slaves.
172.19.0.3:7001 (dfa441f6...) -> 1 keys | 5462 slots | 1 slaves.
[OK] 3 keys in 3 masters.
0.00 keys per slot on average.
# Check cluster state and slot coverage
$ /usr/local/bin/redis-cli --cluster check 127.0.0.1:7000
127.0.0.1:7000 (204c6d60...) -> 1 keys | 5461 slots | 1 slaves.
172.19.0.4:7002 (270a4a6f...) -> 1 keys | 5461 slots | 1 slaves.
172.19.0.3:7001 (dfa441f6...) -> 1 keys | 5462 slots | 1 slaves.
[OK] 3 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 204c6d6069b34754624d8207aeac58d7b323cbc7 127.0.0.1:7000
slots:[0-5460] (5461 slots) master
1 additional replica(s)
M: 270a4a6f575434eac9581920255754257d8f9efa 172.19.0.4:7002
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: 8d184e1386c510ac8d4ac97b9d676fa31793d56e 172.19.0.5:7003
slots: (0 slots) slave
replicates 270a4a6f575434eac9581920255754257d8f9efa
M: dfa441f6328a98fec0075b14a5162dd8f8e9df85 172.19.0.3:7001
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: f65b96c3dba73ae2dd63cd0c8b68a960e559a27c 172.19.0.6:7004
slots: (0 slots) slave
replicates 204c6d6069b34754624d8207aeac58d7b323cbc7
S: a0dc7f9ca08aa26909778d2ae2030c49b555f92f 172.19.0.7:7005
slots: (0 slots) slave
replicates dfa441f6328a98fec0075b14a5162dd8f8e9df85
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
# List cluster nodes (info and check sometimes hang; cluster nodes works as a fallback)
$ /usr/local/bin/redis-cli -h 127.0.0.1 -p 7000 cluster nodes
d0e37b840cb6c0792b858530c39c13a021b8b75d 172.19.0.4:7002@17002 master - 0 1561172107513 3 connected 10923-16383
d4ed4e41e78b3cdd7ec2c4a7999e536074a13777 172.19.0.2:7000@17000 myself,master - 0 1561172108000 1 connected 0-5460
b4fc21183821f4dea0e6870711f92880a474898b 172.19.0.3:7001@17001 master - 0 1561172108319 2 connected 5461-10922
10a1bab4d1da3bd514101905834983396ab5f8d1 172.19.0.7:7005@17005 slave b4fc21183821f4dea0e6870711f92880a474898b 0 1561172109328 6 connected
6179af99c40c338e8e93ceddfa1dd3b8168c486c 172.19.0.6:7004@17004 slave d4ed4e41e78b3cdd7ec2c4a7999e536074a13777 0 1561172107817 5 connected
d19b92cbbb9eb5d662ea91c4c7eacf1f5d3fdc2d 172.19.0.5:7003@17003 slave d0e37b840cb6c0792b858530c39c13a021b8b75d 0 1561172107513 4 connected
# Try the cluster out
$ /usr/local/bin/redis-cli -c -h 127.0.0.1 -p 7000
127.0.0.1:7000> set a 1
-> Redirected to slot [15495] located at 172.19.0.4:7002
OK
172.19.0.4:7002> set b 1
-> Redirected to slot [3300] located at 172.19.0.2:7000
OK
172.19.0.2:7000> set c 1
-> Redirected to slot [7365] located at 172.19.0.3:7001
OK
# Fault-tolerance test: crash 172.19.0.3:7001. cluster nodes now shows 7001 as fail, 7001's former replica 7005 has become master, and the cluster keeps working
$ /usr/local/bin/redis-cli -h 172.19.0.3 -p 7001 debug segfault
Error: Server closed the connection
$ /usr/local/bin/redis-cli -h 127.0.0.1 -p 7000 cluster nodes
d0e37b840cb6c0792b858530c39c13a021b8b75d 172.19.0.4:7002@17002 master - 0 1561172353558 3 connected 10923-16383
d4ed4e41e78b3cdd7ec2c4a7999e536074a13777 172.19.0.2:7000@17000 myself,master - 0 1561172352000 1 connected 0-5460
b4fc21183821f4dea0e6870711f92880a474898b 172.19.0.3:7001@17001 master,fail - 1561172342561 1561172341000 2 connected
10a1bab4d1da3bd514101905834983396ab5f8d1 172.19.0.7:7005@17005 master - 0 1561172353357 7 connected 5461-10922
6179af99c40c338e8e93ceddfa1dd3b8168c486c 172.19.0.6:7004@17004 slave d4ed4e41e78b3cdd7ec2c4a7999e536074a13777 0 1561172352348 5 connected
d19b92cbbb9eb5d662ea91c4c7eacf1f5d3fdc2d 172.19.0.5:7003@17003 slave d0e37b840cb6c0792b858530c39c13a021b8b75d 0 1561172353000 4 connected
$ /usr/local/bin# ./redis-cli -c -h 127.0.0.1 -p 7000
172.19.0.2:7000> set c 1
-> Redirected to slot [7365] located at 172.19.0.7:7005
OK
# If 7005 is then killed as well, the cluster state becomes fail
$ /usr/local/bin/redis-cli -h 172.19.0.7 -p 7005 debug segfault
$ /usr/local/bin/redis-cli -c -h 127.0.0.1 -p 7000
127.0.0.1:7000> set a 1
(error) CLUSTERDOWN The cluster is down
--------------- server logs
1:M 22 Jun 2019 02:59:08.309 * Marking node b4fc21183821f4dea0e6870711f92880a474898b as failing (quorum reached). # 7001 crashed
1:M 22 Jun 2019 02:59:08.309 # Cluster state changed: fail # cluster unavailable
1:M 22 Jun 2019 02:59:09.364 # Failover auth granted to 10a1bab4d1da3bd514101905834983396ab5f8d1 for epoch 7 # 7005 promoted from replica to master
1:M 22 Jun 2019 02:59:09.372 # Cluster state changed: ok # cluster available again
1:M 22 Jun 2019 03:05:03.353 * Marking node 10a1bab4d1da3bd514101905834983396ab5f8d1 as failing (quorum reached). # 7005 went down
1:M 22 Jun 2019 03:05:03.353 # Cluster state changed: fail # cluster unavailable
Whenever a master goes down, the cluster is briefly down:
cluster state: ok -> fail -> ok

FAQ

1. Why does redis-cluster use 16384 slots?

  • It keeps heartbeat packets small. With 16384 slots, the slot bitmap in a heartbeat occupies 2 KB; with 65536 slots it would occupy 8 KB.
  • A redis-cluster deployment is not recommended to grow beyond about 1000 nodes, so 16384 slots are more than enough.

So 16384 slots keeps the bandwidth cost of heartbeats low while covering virtually every practical deployment (up to about 1000 nodes).
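The arithmetic behind the bullet above: each slot occupies one bit in the slot bitmap carried by every ping/pong header.

```python
# One bit per slot in the heartbeat's slot bitmap.
print(16384 // 8)  # 2048 bytes = 2 KB with 16384 slots
print(65536 // 8)  # 8192 bytes = 8 KB with 65536 slots
```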