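An earlier article explained that metadata can be kept in RAM or on disk. Seen this way, the nodes of a RabbitMQ cluster fall into two kinds: RAM nodes and disk nodes. A RAM node keeps metadata only in RAM; a disk node persists metadata to disk.
A single-node system has no real choice: it must be a disk node, because with no redundancy a restart would lose all configuration. In a multi-node cluster you can choose which nodes are RAM nodes.
Deploying a RabbitMQ cluster is straightforward, but a few details need attention. If you have connected Erlang nodes before, they will all sound familiar:
[1] Use the same Erlang cookie everywhere. The official site describes editing .erlang.cookie, but I have never done it that way; I always set the cookie explicitly with -setcookie when starting the Erlang node. The side effect is that rabbitmqctl, which is not given the cookie, stops working. It can be fixed the same way, by adding -setcookie. For convenience I copied rabbitmqctl into a new tool, rabbitmq-util, with the cookie set, as follows: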
exec erl \
    -pa "${RABBITMQ_HOME}/ebin" \
    -noinput \
    -hidden \
    ${RABBITMQ_CTL_ERL_ARGS} \
    -setcookie zen_rabbitmq \
    -name rabbitmqctl@zen.com \
    -s rabbit_control \
    -nodename $RABBITMQ_NODENAME \
    -extra "$@"
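The scripts that need the same Erlang cookie are rabbitmqctl and rabbitmq-server.
[2] If you create the Erlang node with -sname, the node name does not include the domain of the machine it runs on; with -name you must specify a domain, for example through a hosts entry such as: 127.0.0.1 zen.com
On Windows the hosts file is c:\Windows\System32\drivers\etc\hosts
On CentOS it is /etc/hosts
[3] If you start several nodes on one machine, they must be told apart by port and node name; even when deploying across several machines we usually avoid RabbitMQ's default port. There are many ways to do this. The quickest is to pass environment variables when starting the node, although in production you would certainly use a configuration file. Here is the set of start commands I used for testing: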
RABBITMQ_NODE_PORT=9991 RABBITMQ_NODENAME=z_91@zen.com ./rabbitmq-server -detached
RABBITMQ_NODE_PORT=9992 RABBITMQ_NODENAME=z_92@zen.com ./rabbitmq-server -detached
RABBITMQ_NODE_PORT=9993 RABBITMQ_NODENAME=z_93@zen.com ./rabbitmq-server -detached
RABBITMQ_NODE_PORT=9994 RABBITMQ_NODENAME=z_94@zen.com ./rabbitmq-server -detached
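[4] If you test across several physical machines, make sure port 4369 is open so that EPMD can work. You can also use a different port by setting the ERL_EPMD_PORT environment variable.
[5] Run commands carefully. In particular, the command that stops the application is stop_app; if you run stop instead, the whole node shuts down and every later step goes wrong.
Some procedures do not need reset. Watch for this as well, and think through what you want to do before you act.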
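The start commands in item [3] all follow one pattern, so a small helper can generate them. The following is a minimal sketch; the function name start_cmd is made up for illustration, and it only prints the commands instead of running them, so it can be tried on a machine without RabbitMQ.

```shell
#!/bin/sh
# start_cmd PORT NODENAME (hypothetical helper, not part of RabbitMQ):
# print the command line that would start one test node.
start_cmd() {
    port=$1
    node=$2
    printf 'RABBITMQ_NODE_PORT=%s RABBITMQ_NODENAME=%s ./rabbitmq-server -detached\n' \
        "$port" "$node"
}

# Ports 9991..9994 and names z_91..z_94 follow the convention used in this post.
for n in 91 92 93 94; do
    start_cmd "99$n" "z_$n@zen.com"
done
```

Printing first and pasting afterwards makes it harder to start two nodes on the same port by accident.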
Below we walk step by step through some common operations for building a RabbitMQ cluster.
Creating a cluster from scratch
[root@localhost scripts]# RABBITMQ_NODE_PORT=9991 RABBITMQ_NODENAME=z_91@zen.com ./rabbitmq-server -detached
Activating RabbitMQ plugins ...
0 plugins activated:
[root@localhost scripts]# RABBITMQ_NODE_PORT=9992 RABBITMQ_NODENAME=z_92@zen.com ./rabbitmq-server -detached
Activating RabbitMQ plugins ...
0 plugins activated:
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com stop_app
Stopping node 'z_91@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com reset
Resetting node 'z_91@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com cluster z_92@zen.com
Clustering node 'z_91@zen.com' with ['z_92@zen.com'] ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com start_app
Starting node 'z_91@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com cluster_status
Cluster status of node 'z_91@zen.com' ...
[{nodes,[{disc,['z_92@zen.com']},{ram,['z_91@zen.com']}]},
 {running_nodes,['z_92@zen.com','z_91@zen.com']}]
...done.
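If you look closely you will notice something odd in the result: we ran the cluster command on node 91 to join it with node 92, yet the disc node is 92 and 91 has become a RAM node! What is going on? Let's set that aside for now and come back to it later; first we finish the experiment.
Leaving the cluster
I remember a documentary about 9/11 mentioning that the hijackers' flying lessons covered only take-off and never landing; they never meant to come back.
Having learned how to join an Erlang node to a cluster, we should also learn how to leave one. Here are the detailed steps: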
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com stop_app
Stopping node 'z_91@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com reset
Resetting node 'z_91@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com start_app
Starting node 'z_91@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com cluster_status
Cluster status of node 'z_92@zen.com' ...
[{nodes,[{disc,['z_92@zen.com']}]},
 {running_nodes,['z_92@zen.com']}]
...done.
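As you can see, node 91 is no longer in the cluster.
Building the cluster another way
Now let's build the cluster a different way, to observe how RabbitMQ chooses disc nodes when a cluster is formed. The difference from the first approach lies in this command:
./rabbitmq-util -n z_91@zen.com cluster z_92@zen.com z_91@zen.com
Once the cluster is formed, check its status and note that the disk nodes are now:
[{nodes,[{disc,['z_91@zen.com','z_92@zen.com']}]},{running_nodes,['z_92@zen.com','z_91@zen.com']}]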
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com stop_app
Stopping node 'z_91@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com reset
Resetting node 'z_91@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com cluster z_92@zen.com z_91@zen.com
Clustering node 'z_91@zen.com' with ['z_92@zen.com','z_91@zen.com'] ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com start_app
Starting node 'z_91@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com cluster_status
Cluster status of node 'z_91@zen.com' ...
[{nodes,[{disc,['z_91@zen.com','z_92@zen.com']}]},
 {running_nodes,['z_92@zen.com','z_91@zen.com']}]
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com cluster_status
Cluster status of node 'z_92@zen.com' ...
[{nodes,[{disc,['z_91@zen.com','z_92@zen.com']}]},
 {running_nodes,['z_91@zen.com','z_92@zen.com']}]
...done.
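The two runs so far point to a simple rule for the cluster command used here: the node becomes a disc node only when its own name appears in the node list it is given, and a RAM node otherwise. The sketch below just encodes that observation; node_type_after_cluster is a made-up name, and the function does not talk to RabbitMQ at all.

```shell
#!/bin/sh
# node_type_after_cluster SELF NODE... (hypothetical helper):
# predict the type SELF ends up with after "cluster NODE...",
# based only on the behaviour observed in the transcripts above.
node_type_after_cluster() {
    self=$1
    shift
    for n in "$@"; do
        if [ "$n" = "$self" ]; then
            echo disc    # own name is in the list
            return 0
        fi
    done
    echo ram             # own name is missing from the list
}

node_type_after_cluster z_91@zen.com z_92@zen.com
node_type_after_cluster z_91@zen.com z_92@zen.com z_91@zen.com
```

The first call mirrors the first experiment (91 clustered with 92 only), the second mirrors the command in this section.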
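Changing node type: converting 92 from a disk node to a RAM node
The cluster built above has two nodes, 91 and 92, and both are disk nodes. We would like to change node types on the fly, for example turning 92 from a disk node into a RAM node. Here are the steps.
The cluster status at the end of the transcript shows that 92 has become a RAM node. Note that after stopping the application on 92 we did not run reset.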
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com stop_app
Stopping node 'z_92@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com cluster z_91@zen.com
Clustering node 'z_92@zen.com' with ['z_91@zen.com'] ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com start_app
Starting node 'z_92@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com cluster_status
Cluster status of node 'z_92@zen.com' ...
[{nodes,[{disc,['z_91@zen.com']},{ram,['z_92@zen.com']}]},
 {running_nodes,['z_91@zen.com','z_92@zen.com']}]
...done.
Changing node type: converting 92 back to a disk node
The steps above turned 92 from a disk node into a RAM node. Now we do the reverse and turn 92 back into a disk node:
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com cluster_status
Cluster status of node 'z_91@zen.com' ...
[{nodes,[{disc,['z_91@zen.com']},{ram,['z_92@zen.com']}]},
 {running_nodes,['z_92@zen.com','z_91@zen.com']}]
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com stop_app
Stopping node 'z_92@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com cluster z_91@zen.com z_92@zen.com
Clustering node 'z_92@zen.com' with ['z_91@zen.com','z_92@zen.com'] ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com start_app
Starting node 'z_92@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com cluster_status
Cluster status of node 'z_91@zen.com' ...
[{nodes,[{disc,['z_92@zen.com','z_91@zen.com']}]},
 {running_nodes,['z_92@zen.com','z_91@zen.com']}]
...done.
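The walkthroughs so far use three command sequences that differ only in whether reset and cluster are run. The sketch below prints each sequence so they can be compared side by side. The function name plan and the action names join, retype and leave are invented for this illustration; rabbitmq-util is the wrapper script described at the start of this post, and nothing is executed.

```shell
#!/bin/sh
# plan ACTION NODE [CLUSTER_NODE...] (hypothetical helper):
# print, without running, the steps used in this post for each operation.
#   join   : stop_app, reset, cluster, start_app
#   retype : stop_app, cluster, start_app   (no reset)
#   leave  : stop_app, reset, start_app     (no cluster)
plan() {
    action=$1
    node=$2
    shift 2
    ctl="./rabbitmq-util -n $node"
    case $action in
        join|retype|leave) ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    echo "$ctl stop_app"
    [ "$action" = retype ] || echo "$ctl reset"
    [ "$action" = leave ] || echo "$ctl cluster $*"
    echo "$ctl start_app"
}

# Example: the disk-to-RAM conversion of node 92 shown above.
plan retype z_92@zen.com z_91@zen.com
```

Keeping reset out of the retype path is the point: the node stays a member of the cluster and only changes how it stores metadata.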
Adding node 93
Building on the cluster above, we add a new node, 93:
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com cluster_status
Cluster status of node 'z_93@zen.com' ...
[{nodes,[{disc,['z_93@zen.com']}]},
 {running_nodes,['z_93@zen.com']}]
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com stop_app
Stopping node 'z_93@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com reset
Resetting node 'z_93@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com cluster z_91@zen.com z_92@zen.com
Clustering node 'z_93@zen.com' with ['z_91@zen.com','z_92@zen.com'] ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com start_app
Starting node 'z_93@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com cluster_status
Cluster status of node 'z_93@zen.com' ...
[{nodes,[{disc,['z_92@zen.com','z_91@zen.com']},{ram,['z_93@zen.com']}]},
 {running_nodes,['z_92@zen.com','z_91@zen.com','z_93@zen.com']}]
...done.
Restarting the nodes: we stop everything and then start the RAM node, 93, first
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com stop
Stopping and halting node 'z_91@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com stop
Stopping and halting node 'z_92@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com stop
Stopping and halting node 'z_93@zen.com' ...
...done.
[root@localhost scripts]# RABBITMQ_NODE_PORT=9993 RABBITMQ_NODENAME=z_93@zen.com ./rabbitmq-server -detached
Activating RabbitMQ plugins ...
0 plugins activated:
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com cluster_status
Cluster status of node 'z_93@zen.com' ...
Error: unable to connect to node 'z_93@zen.com': nodedown
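Starting 93 failed. The reason is that 93 is a RAM node and has not persisted the cluster metadata; at startup it has to connect to a disk node to fetch that metadata, and since no other node is running yet, startup fails. Next we try starting node 92 first: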
[root@localhost scripts]# RABBITMQ_NODE_PORT=9992 RABBITMQ_NODENAME=z_92@zen.com ./rabbitmq-server -detached
Activating RabbitMQ plugins ...
0 plugins activated:
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com cluster_status
Cluster status of node 'z_92@zen.com' ...
[{nodes,[{disc,['z_92@zen.com','z_91@zen.com']},{ram,['z_93@zen.com']}]},
 {running_nodes,['z_92@zen.com']}]
...done.
That works. Now we start node 93:
[root@localhost scripts]# RABBITMQ_NODE_PORT=9993 RABBITMQ_NODENAME=z_93@zen.com ./rabbitmq-server -detached
Activating RabbitMQ plugins ...
0 plugins activated:
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com cluster_status
Cluster status of node 'z_93@zen.com' ...
[{nodes,[{disc,['z_92@zen.com','z_91@zen.com']},{ram,['z_93@zen.com']}]},
 {running_nodes,['z_92@zen.com','z_93@zen.com']}]
...done.
Pressing on, we start node 91 as well. Watch how running_nodes in the cluster_status output changes:
[root@localhost scripts]# RABBITMQ_NODE_PORT=9991 RABBITMQ_NODENAME=z_91@zen.com ./rabbitmq-server -detached
Activating RabbitMQ plugins ...
0 plugins activated:
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com cluster_status
Cluster status of node 'z_93@zen.com' ...
[{nodes,[{disc,['z_92@zen.com','z_91@zen.com']},{ram,['z_93@zen.com']}]},
 {running_nodes,['z_91@zen.com','z_92@zen.com','z_93@zen.com']}]
...done.
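The restart experiment suggests a start-up order: bring up a disk node before any RAM node. One way to find the disk nodes is to pull the {disc,[...]} list out of cluster_status output saved while the cluster was healthy. The sketch below does that with tr and sed; disc_nodes is a made-up helper name, and the text-based parsing assumes the output format shown in this post, so treat it as an illustration and not a robust parser of Erlang terms.

```shell
#!/bin/sh
# disc_nodes (hypothetical helper): read cluster_status text on stdin and
# print the members of the {disc,[...]} list, one node name per line.
disc_nodes() {
    tr -d ' \n' |
        sed -n 's/.*{disc,\[\([^]]*\)]}.*/\1/p' |
        tr -d "'" |
        tr ',' '\n'
}

# Sample copied from the cluster_status output shown earlier in this post.
status="[{nodes,[{disc,['z_92@zen.com','z_91@zen.com']},{ram,['z_93@zen.com']}]},{running_nodes,['z_92@zen.com']}]"

echo "Start one of these nodes first:"
printf '%s' "$status" | disc_nodes
echo
```

Any node printed here can be started first; the RAM nodes can then fetch the cluster metadata from it.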
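Publishing messages to the cluster
We use a piece of C# code to connect to node 92, declare the queues, and publish two messages. Even though the client connected only to node 92, the other nodes in the cluster have the queue information too.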
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com list_queues
Listing queues ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com list_queues
Listing queues ...
zen_qp_pic_queue 1
qp_pic_queue2 1
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com list_queues
Listing queues ...
zen_qp_pic_queue 1
qp_pic_queue2 1
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com list_queues
Listing queues ...
zen_qp_pic_queue 1
qp_pic_queue2 1
...done.
Adding and removing nodes while no disk node is running
To set up the next experiment we add node 94 to the cluster (steps omitted) and check the cluster status:
[root@localhost scripts]# ./rabbitmq-util -n z_94@zen.com cluster_status
Cluster status of node 'z_94@zen.com' ...
[{nodes,[{disc,['z_92@zen.com','z_91@zen.com']},
         {ram,['z_94@zen.com','z_93@zen.com']}]},
 {running_nodes,['z_92@zen.com','z_93@zen.com','z_91@zen.com',
                 'z_94@zen.com']}]
...done.
The cluster now has two disk nodes, 91 and 92, and two RAM nodes, 93 and 94. Next we stop both disk nodes:
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com stop
Stopping and halting node 'z_91@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_92@zen.com stop
Stopping and halting node 'z_92@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com cluster_status
Cluster status of node 'z_93@zen.com' ...
[{nodes,[{disc,['z_92@zen.com','z_91@zen.com']},
         {ram,['z_94@zen.com','z_93@zen.com']}]},
 {running_nodes,['z_94@zen.com','z_93@zen.com']}]
...done.
Now we add a new node, 95, to the cluster (steps omitted) and look at the result:
[root@localhost scripts]# ./rabbitmq-util -n z_95@zen.com cluster_status
Cluster status of node 'z_95@zen.com' ...
[{nodes,[{disc,['z_92@zen.com','z_91@zen.com']},
         {ram,['z_95@zen.com','z_94@zen.com','z_93@zen.com']}]},
 {running_nodes,['z_94@zen.com','z_93@zen.com','z_95@zen.com']}]
...done.
Now we shut down every node in the cluster and start them again to see what happens:
[root@localhost scripts]# ./rabbitmq-util -n z_95@zen.com stop
Stopping and halting node 'z_95@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_94@zen.com stop
Stopping and halting node 'z_94@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com stop
Stopping and halting node 'z_93@zen.com' ...
...done.
Starting all the nodes in the cluster:
[root@localhost scripts]# RABBITMQ_NODE_PORT=9991 RABBITMQ_NODENAME=z_91@zen.com ./rabbitmq-server -detached
Activating RabbitMQ plugins ...
0 plugins activated:
[root@localhost scripts]# ./rabbitmq-util -n z_91@zen.com cluster_status
Cluster status of node 'z_91@zen.com' ...
[{nodes,[{disc,['z_92@zen.com','z_91@zen.com']},
         {ram,['z_94@zen.com','z_93@zen.com']}]},
 {running_nodes,['z_91@zen.com']}]
...done.
[root@localhost scripts]# RABBITMQ_NODE_PORT=9992 RABBITMQ_NODENAME=z_92@zen.com ./rabbitmq-server -detached
Activating RabbitMQ plugins ...
0 plugins activated:
[root@localhost scripts]# RABBITMQ_NODE_PORT=9993 RABBITMQ_NODENAME=z_93@zen.com ./rabbitmq-server -detached
Activating RabbitMQ plugins ...
0 plugins activated:
[root@localhost scripts]# RABBITMQ_NODE_PORT=9994 RABBITMQ_NODENAME=z_94@zen.com ./rabbitmq-server -detached
Activating RabbitMQ plugins ...
0 plugins activated:
[root@localhost scripts]# ./rabbitmq-util -n z_93@zen.com cluster_status
Cluster status of node 'z_93@zen.com' ...
[{nodes,[{disc,['z_92@zen.com','z_91@zen.com']},
         {ram,['z_94@zen.com','z_93@zen.com']}]},
 {running_nodes,['z_94@zen.com','z_92@zen.com','z_93@zen.com']}]
...done.
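Everything is still normal at this point. But if we now start node 95 as well, errors appear and nodes 91 and 92 may go down; restarting after that leaves the cluster in a very confused state. Note that the disk nodes' status above does not list 95 at all, because 95 joined while they were down. Likewise, removing a node while no disk node is running causes the same confusion and can even prevent the disk nodes from starting; the node has to be added back before the disk nodes will start normally.
With every disk node shut down we can keep using the cluster as if nothing had happened, but any new exchanges, queues, and so on declared during that time vanish when the nodes restart.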
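Given the trouble described above, it seems worth checking that at least one disk node is running before adding or removing a node. The sketch below makes that check against cluster_status text passed in as a string. The helper names list_of and has_running_disc_node are invented here, and the parsing relies on the output format shown in this post.

```shell
#!/bin/sh
# list_of KEY (hypothetical helper): read cluster_status text on stdin and
# print the members of the {KEY,[...]} list, one node name per line.
list_of() {
    tr -d ' \n' |
        sed -n 's/.*{'"$1"',\[\([^]]*\)]}.*/\1/p' |
        tr -d "'" |
        tr ',' '\n'
}

# has_running_disc_node STATUS_TEXT (hypothetical helper):
# exit status 0 if some disc node also appears in running_nodes, 1 otherwise.
has_running_disc_node() {
    status=$1
    running=$(printf '%s' "$status" | list_of running_nodes)
    for node in $(printf '%s' "$status" | list_of disc); do
        if printf '%s\n' "$running" | grep -Fxq -- "$node"; then
            return 0
        fi
    done
    return 1
}

# Status copied from the experiment above, taken after 91 and 92 were stopped.
status="[{nodes,[{disc,['z_92@zen.com','z_91@zen.com']}, {ram,['z_94@zen.com','z_93@zen.com']}]},{running_nodes,['z_94@zen.com','z_93@zen.com']}]"

if has_running_disc_node "$status"; then
    echo "a disk node is running"
else
    echo "no disk node is running: do not add or remove nodes now"
fi
```

In practice the status text would come from running cluster_status against any live node just before the membership change.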
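Question 1: How do I start over?
Suppose you have made a mess of the node relationships during testing, every restart fails, and you want to start over, but, maddeningly, the reset command fails too. Don't worry, stay calm: go to the /var/lib/rabbitmq/mnesia directory, delete the files that belong to the broken node, and restart it.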
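Deleting the wrong directory is easy to do in a hurry, so it can help to print the path first. The sketch below assumes the usual layout in which each node keeps its data in a subdirectory of the Mnesia base directory named after the node, and that the base can be overridden with RABBITMQ_MNESIA_BASE; check your own installation before relying on either assumption. The helper names mnesia_dir_for and reset_hint are made up, and the script prints a move command instead of deleting anything.

```shell
#!/bin/sh
# mnesia_dir_for NODENAME (hypothetical helper): print the directory where
# the node's data is assumed to live, i.e. <base>/<nodename>.
mnesia_dir_for() {
    printf '%s/%s\n' "${RABBITMQ_MNESIA_BASE:-/var/lib/rabbitmq/mnesia}" "$1"
}

# reset_hint NODENAME (hypothetical helper): print, but do not run, a command
# that moves the broken node's data aside so it can still be inspected later.
reset_hint() {
    dir=$(mnesia_dir_for "$1")
    printf 'mv %s %s.broken\n' "$dir" "$dir"
}

reset_hint z_95@zen.com
```

Moving the directory aside has the same effect on the next start as deleting it, while keeping a copy in case something turns out to be needed.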
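Question 2: "Incompatible schema cookies. Please, restart from old backup"
While building a RabbitMQ cluster you may run into "Incompatible schema cookies. Please, restart from old backup". The usual cause is that you cluster with several nodes that do not themselves form a cluster. To reproduce the error, we start three independent nodes, 95, 96, and 97, and then run:
./rabbitmq-util -n z_97@zen.com cluster z_95@zen.com z_96@zen.com
At this point 95 and 96 have not been clustered with each other, so the command fails with the "Incompatible schema cookies. Please, restart from old backup" error.
It is an easy mistake to make when you are in a hurry to see results. While you are still unfamiliar with the tools, practising step by step is well worth it and helps you avoid a few pitfalls.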
[root@localhost scripts]# ./rabbitmq-util -n z_95@zen.com cluster_status
Cluster status of node 'z_95@zen.com' ...
[{nodes,[{disc,['z_95@zen.com']}]},
 {running_nodes,['z_95@zen.com']}]
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_96@zen.com cluster_status
Cluster status of node 'z_96@zen.com' ...
[{nodes,[{disc,['z_96@zen.com']}]},
 {running_nodes,['z_96@zen.com']}]
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_97@zen.com cluster_status
Cluster status of node 'z_97@zen.com' ...
[{nodes,[{disc,['z_97@zen.com']}]},
 {running_nodes,['z_97@zen.com']}]
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_97@zen.com stop_app
Stopping node 'z_97@zen.com' ...
...done.
[root@localhost scripts]# ./rabbitmq-util -n z_97@zen.com cluster z_95@zen.com z_96@zen.com
Clustering node 'z_97@zen.com' with ['z_95@zen.com','z_96@zen.com'] ...
Error: {unable_to_join_cluster,
 ['z_95@zen.com','z_96@zen.com'],
 {merge_schema_failed,
  "Incompatible schema cookies. Please, restart from old backup.
   'z_95@zen.com' = [{name,schema},{type,set},{ram_copies,[]},{disc_copies,['z_95@zen.com']},{disc_only_copies,[]},{load_order,0},{access_mode,read_write},{majority,false},{index,[]},{snmp,[]},{local_content,false},{record_name,schema},{attributes,[table,cstruct]},{user_properties,[]},{frag_properties,[]},{storage_properties,[]},{cookie,{{1352,726635,757709},'z_95@zen.com'}},{version,{{3,0},{'z_95@zen.com',{1352,727291,753066}}}}],
   'z_97@zen.com' = [{name,schema},{type,set},{ram_copies,['z_97@zen.com']},{disc_copies,['z_96@zen.com']},{disc_only_copies,[]},{load_order,0},{access_mode,read_write},{majority,false},{index,[]},{snmp,[]},{local_content,false},{record_name,schema},{attributes,[table,cstruct]},{user_properties,[]},{frag_properties,[]},{storage_properties,[]},{cookie,{{1352,727184,282322},'z_96@zen.com'}},{version,{{3,0},{'z_97@zen.com',{1352,727291,780429}}}}]\n"}}
Note: other RabbitMQ users have run into the same problem.
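With the hands-on experiments above we can now create and manage a RabbitMQ cluster. But should a given node be a RAM node or a disc node, and how do we make that choice? More on that next time.
Appendix: RabbitMQ Cluster documentation:
In addition, when testing we often run several instances on a single machine, and the following commands come in handy: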
A cluster on a single machine
Under some circumstances it can be useful to run a cluster of RabbitMQ nodes on a single machine. This would typically be useful for experimenting with clustering on a desktop or laptop without the overhead of starting several virtual machines for the cluster. The two main requirements for running more than one node on a single machine are that each node should have a unique name and bind to a unique port / IP address combination for each protocol in use.
You can start multiple nodes on the same host manually by repeated invocation of rabbitmq-server (rabbitmq-server.bat on Windows). You must ensure that for each invocation you set the environment variables RABBITMQ_NODENAME and RABBITMQ_NODE_PORT to suitable values.
For example:
$ RABBITMQ_NODE_PORT=5672 RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
$ RABBITMQ_NODE_PORT=5673 RABBITMQ_NODENAME=hare rabbitmq-server -detached
$ rabbitmqctl -n hare stop_app
$ rabbitmqctl -n hare join_cluster rabbit@`hostname -s`
$ rabbitmqctl -n hare start_app
will set up a two node cluster, both nodes as disc nodes. Note that if you have RabbitMQ opening any ports other than AMQP, you'll need to configure those not to clash as well - for example:
$ RABBITMQ_NODE_PORT=5672 RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15672}]" RABBITMQ_NODENAME=rabbit rabbitmq-server -detached
$ RABBITMQ_NODE_PORT=5673 RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15673}]" RABBITMQ_NODENAME=hare rabbitmq-server -detached
will start two nodes (which can then be clustered) when the management plugin is installed.
Reposted from: http://wtuoo.baihongyu.com/