Today, we’ll review MySQL Fabric’s HA concepts, and then walk you through the setup of a 3-node cluster with one Primary and two Secondaries, doing a few basic tests with it. In a second post, we will spend more time generating failure scenarios and documenting how Fabric handles them. (MySQL Fabric is an extensible framework to manage large farms of MySQL servers, with support for high-availability and sharding.)
Before we begin, we recommend you read this post by Oracle’s Mats Kindahl, which, among other things, addresses the issues we raised in our first post. Mats leads the MySQL Fabric team.
All our tests use our Vagrant test environment (https://github.com/martinarrieta/vagrant-fabric).
If you want to play with MySQL Fabric, you can have these VMs running on your desktop by following the instructions in the README file. If you don’t want full VMs, our colleague Jervin Real created a set of wrapper scripts that let you test MySQL Fabric using sandboxes.
Here is a basic representation of our environment.
To set up MySQL Fabric without using our Vagrant environment, you can follow the official documentation, or check the ansible playbooks in our lab repo. If you follow the manual, the only caveat is that when creating the user, you should either disable binary logging for your session, or use a GRANT statement instead of CREATE USER. You can read here for more info on why this is the case.
A description of all the options in the configuration file can be found here. For HA tests, the one thing to mention is that, in our experience, the failure detector will only trigger an automatic failover if the value for failover_interval in the [failure_tracking] section is greater than 0. Otherwise, failures will be detected and written to the log, but no action will be taken.
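As a quick sanity check before an HA test, a sketch like the following can report whether a given fabric.cfg would allow automatic failover according to the rule above. It assumes the file is plain INI that Python's configparser can read, and it treats a missing option as 0, which is our assumption and not documented behaviour. The function name is ours.

```python
import configparser


def automatic_failover_enabled(cfg_text):
    """Return True if failover_interval in [failure_tracking] is greater than 0.

    Assumption: a missing section or option is treated as 0 (no failover).
    """
    # interpolation=None so that any '%' characters in other options are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    interval = parser.getint("failure_tracking", "failover_interval", fallback=0)
    return interval > 0


if __name__ == "__main__":
    sample = "[failure_tracking]\nfailover_interval = 0\n"
    print(automatic_failover_enabled(sample))
```

To check a real file, read it first and pass its contents, for example `automatic_failover_enabled(open(path).read())` with whatever path your fabric.cfg lives at.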
In order to manage a mysqld instance with MySQL Fabric, the following options need to be set in the [mysqld] section of its my.cnf file:
log_bin
gtid-mode=ON
enforce-gtid-consistency
log_slave_updates
Additionally, as in any replication setup, you must make sure that all servers have a distinct server_id.
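If you manage several instances, a small check like this one can flag a my.cnf that is missing any of the options listed above. It is only a sketch: it assumes the file is simple enough for Python's configparser (no !include or !includedir directives), it only looks at the [mysqld] section, and it does not verify that server_id is distinct across servers. The function and constant names are ours.

```python
import configparser

# Options named in this post; MySQL accepts dashes and underscores
# interchangeably in option names, so we normalise to underscores.
REQUIRED_OPTIONS = (
    "log_bin",
    "gtid_mode",
    "enforce_gtid_consistency",
    "log_slave_updates",
    "server_id",
)


def missing_fabric_options(my_cnf_text):
    """Return the required option names that are absent from [mysqld]."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # options such as "log_bin" may have no value
        interpolation=None,
        strict=False,          # tolerate repeated options
    )
    parser.read_string(my_cnf_text)
    if not parser.has_section("mysqld"):
        return list(REQUIRED_OPTIONS)
    present = {name.replace("-", "_") for name in parser.options("mysqld")}
    return [name for name in REQUIRED_OPTIONS if name not in present]


if __name__ == "__main__":
    sample = (
        "[mysqld]\n"
        "log_bin\n"
        "gtid-mode=ON\n"
        "enforce-gtid-consistency\n"
        "log_slave_updates\n"
    )
    print(missing_fabric_options(sample))
```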
When everything is in place, you can setup and start MySQL Fabric with the following commands:
[vagrant@store ~]$ mysqlfabric manage setup
[vagrant@store ~]$ mysqlfabric manage start --daemon
The setup command creates the database schema used by MySQL Fabric to store information about managed servers, and the start one, well, starts the daemon. The --daemon option makes Fabric start as a daemon, logging to a file instead of to standard output. Depending on the port and file name you configured in fabric.cfg, this may need to be run as root.
While testing, you can make MySQL Fabric reset its state at any time (though it won’t change existing node configurations such as replication) by running:
[vagrant@store ~]$ mysqlfabric manage teardown
[vagrant@store ~]$ mysqlfabric manage setup
If you’re using our Vagrant environment, you can run the reinit_cluster.sh script from your host OS (from the root of the vagrant-fabric repo) to do this for you, and also initialise the datadir of the three instances.
A High Availability Cluster is a set of servers using the standard Asynchronous MySQL Replication with GTID.
The first step is to create the group by running mysqlfabric with this syntax:
$ mysqlfabric group create <group_id>
In our example, to create the cluster “mycluster” you can run:
[vagrant@store ~]$ mysqlfabric group create mycluster
Procedure :
{ uuid        = 605b02fb-a6a1-4a00-8e24-619cad8ec4c7,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
The second step is to add the servers to the group. The syntax to add a server to a group is:
$ mysqlfabric group add <group_id> <host_name or IP>[:port]
The port number is optional and only required if distinct from 3306. It is important to mention that the clients that will use this cluster must be able to resolve this host or IP. This is because clients will connect directly both to MySQL Fabric’s XML-RPC server and to the managed mysqld servers. Let’s add the nodes to our group.
[vagrant@store ~]$ for i in 1 2 3; do mysqlfabric group add mycluster node$i; done
Procedure :
{ uuid        = 9d65c81c-e28a-437f-b5de-1d47e746a318,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
Procedure :
{ uuid        = 235a7c34-52a6-40ad-8e30-418dcee28f1e,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
Procedure :
{ uuid        = 4da3b1c3-87cc-461f-9705-28a59a2a4f67,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
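If you generate these commands from a script, the optional-port rule is easy to encode. The sketch below only builds the argument list for `mysqlfabric group add`; it does not run anything, and the helper names are ours.

```python
DEFAULT_MYSQL_PORT = 3306


def fabric_address(host, port=DEFAULT_MYSQL_PORT):
    """Return host, or host:port when the port is not the default 3306."""
    if port == DEFAULT_MYSQL_PORT:
        return host
    return "{}:{}".format(host, port)


def group_add_command(group, host, port=DEFAULT_MYSQL_PORT):
    """Build the argument list for adding one server to a group."""
    return ["mysqlfabric", "group", "add", group, fabric_address(host, port)]


if __name__ == "__main__":
    for i in (1, 2, 3):
        print(" ".join(group_add_command("mycluster", "node{}".format(i))))
```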
Now that we have all our nodes in the group, we have to promote one of them. You can promote one specific node or you can let MySQL Fabric choose one for you.
The syntax to promote a specific node is:
$ mysqlfabric group promote <group_id> --slave_uuid='<mysql_server_uuid>'
or to let MySQL Fabric pick one:
$ mysqlfabric group promote <group_id>
Let’s do that:
[vagrant@store ~]$ mysqlfabric group promote mycluster
Procedure :
{ uuid        = c4afd2e7-3864-4b53-84e9-04a40f403ba9,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
You can then check the health of the group like this:
[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success     = True
  return      = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities  =
}
One current limitation of the ‘health’ command is that it only identifies servers by their uuid. To get a list of the servers in a group, along with a quick status summary and their host names, use lookup_servers instead:
[vagrant@store ~]$ mysqlfabric group lookup_servers mycluster
Command :
{ success     = True
  return      = [{'status': 'SECONDARY', 'server_uuid': 'e245ec83-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node1'}, {'status': 'SECONDARY', 'server_uuid': 'e826d4ab-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node2'}, {'status': 'PRIMARY', 'server_uuid': 'edf2c45b-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node3'}]
  activities  =
}
We sent a merge request to use a JSON string instead of the “print” of the object in the “return” field from the XML-RPC, so that the information can be used to display the results in a friendlier way. In the same merge, we have also added the address of the servers to the health command.
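In the meantime, the printed return value in the output shown in this post happens to be a valid Python literal, so a script can read it back with ast.literal_eval. The sketch below assumes the `return = ...` value sits on a single line, as it does in our examples; that may not hold for other commands or versions. The function names are ours.

```python
import ast
import re

SAMPLE_OUTPUT = """Command :
{ success     = True
  return      = [{'status': 'SECONDARY', 'server_uuid': 'e245ec83-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node1'}, {'status': 'SECONDARY', 'server_uuid': 'e826d4ab-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node2'}, {'status': 'PRIMARY', 'server_uuid': 'edf2c45b-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node3'}]
  activities  =
}
"""


def parse_return(output):
    """Extract and evaluate the value printed after 'return =' in Command output."""
    match = re.search(r"^\s*return\s*=\s*(.+)$", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'return =' line found in output")
    # literal_eval only accepts Python literals, so it will not execute code.
    return ast.literal_eval(match.group(1).strip())


def uuid_to_address(lookup_servers_output):
    """Map each server_uuid to its address, from lookup_servers output."""
    servers = parse_return(lookup_servers_output)
    return {server["server_uuid"]: server["address"] for server in servers}


if __name__ == "__main__":
    for uuid, address in sorted(uuid_to_address(SAMPLE_OUTPUT).items()):
        print(uuid, address)
```

The resulting mapping can be used to label the uuid-only output of the health command with host names.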
Now we have the three lab machines set up in a replication topology of one master (the PRIMARY server) and two slaves (the SECONDARY ones). To make MySQL Fabric start monitoring the group for problems, you need to activate it:
[vagrant@store ~]$ mysqlfabric group activate mycluster
Procedure :
{ uuid        = 230835fc-6ec4-4b35-b0a9-97944c18e21f,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
Now MySQL Fabric will monitor the group’s servers, and depending on the configuration (remember the failover_interval we mentioned before) it may trigger an automatic failover. But let’s start testing a simpler case, by stopping mysql on one of the secondary nodes:
[vagrant@node2 ~]$ sudo service mysqld stop
Stopping mysqld:                                           [  OK  ]
And checking how MySQL Fabric reports the group’s health after this:
[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success     = True
  return      = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'FAULTY', 'is_alive': False, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities  =
}
We can see that MySQL Fabric successfully marks the server as faulty. In our next post we’ll show an example of this by using one of the supported connectors to handle failures in a group, but for now, let’s keep on the DBA/sysadmin side of things, and try to bring the server back online:
[vagrant@node2 ~]$ sudo service mysqld start
Starting mysqld:                                           [  OK  ]
[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success     = True
  return      = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'FAULTY', 'is_alive': True, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities  =
}
So the server is back online, but Fabric still considers it faulty. To add the server back into rotation, we need to look at the server commands:
[vagrant@store ~]$ mysqlfabric help server
Commands available in group 'server' are:
    server set_weight uuid weight [--synchronous]
    server lookup_uuid address
    server set_mode uuid mode [--synchronous]
    server set_status uuid status [--update_only] [--synchronous]
The specific command we need is set_status, and in order to add the server back to the group, we need to change its status twice: first to SPARE and then back to SECONDARY. You can see what happens if we try to set it to SECONDARY directly:
[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SECONDARY
Procedure :
{ uuid        = 9a6f2273-d206-4fa8-80fb-6bce1e5262c8,
  finished    = True,
  success     = False,
  return      = ServerError: Cannot change server's (e826d4ab-d889-11e3-86df-0800274fb806) status from (FAULTY) to (SECONDARY).,
  activities  =
}
So let’s try it the right way:
[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SPARE
Procedure :
{ uuid        = c3a1c244-ea8f-4270-93ed-3f9dfbe879ea,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SECONDARY
Procedure :
{ uuid        = 556f59ec-5556-4225-93c9-b9b29b577061,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}
And check the group’s health again:
[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success     = True
  return      = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities  =
}
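The two-step recovery (FAULTY to SPARE, then SPARE to SECONDARY) is a natural candidate for a small wrapper. The sketch below builds the two `mysqlfabric server set_status` calls and runs them in order. The function names are ours, and we have not verified whether mysqlfabric exits with a non-zero code when a procedure reports success = False, so subprocess.check_call may not catch such a failure; treat the error handling as an assumption.

```python
import subprocess

# The only path we exercised in this post: FAULTY -> SPARE -> SECONDARY.
RECOVERY_STATUSES = ("SPARE", "SECONDARY")


def reinstate_commands(server_uuid):
    """Return the set_status commands, in order, to bring a server back."""
    return [
        ["mysqlfabric", "server", "set_status", server_uuid, status]
        for status in RECOVERY_STATUSES
    ]


def reinstate(server_uuid, run=subprocess.check_call):
    """Run the recovery commands in sequence, stopping if the runner raises.

    The runner is injectable so the sequence can be dry-run or tested
    without a Fabric installation.
    """
    for argv in reinstate_commands(server_uuid):
        run(argv)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    reinstate("e826d4ab-d889-11e3-86df-0800274fb806", run=lambda argv: print(" ".join(argv)))
```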
In our next post, when we discuss how to use the Fabric-aware connectors, we’ll also test other failure scenarios like hard VM shutdown and network errors, but for now, let’s try the same thing but on the PRIMARY node instead:
[vagrant@node3 ~]$ sudo service mysqld stop
Stopping mysqld:                                           [  OK  ]
And let’s check the servers again:
[vagrant@store ~]$ mysqlfabric group lookup_servers mycluster
Command :
{ success     = True
  return      = [{'status': 'SECONDARY', 'server_uuid': 'e245ec83-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node1'}, {'status': 'PRIMARY', 'server_uuid': 'e826d4ab-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node2'}, {'status': 'FAULTY', 'server_uuid': 'edf2c45b-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node3'}]
  activities  =
}
We can see that MySQL Fabric successfully marked node3 as FAULTY, and promoted node2 to PRIMARY to resolve this. Once we start mysqld again on node3, we can add it back as SECONDARY using the same process of setting its status to SPARE first, as we did for node2 above.
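Since Fabric does not return a restarted server to rotation by itself, it can be handy to list the servers that are running again but still marked FAULTY. The sketch below works on a health result that has already been parsed into a Python dict with the shape shown in this post; the sample is the node2 output from earlier, and the function name is ours.

```python
def alive_but_faulty(health):
    """Return the uuids of servers that respond again but are still FAULTY.

    `health` is the dict printed in the 'return' field of
    'mysqlfabric group health', keyed by server uuid.
    """
    return sorted(
        uuid
        for uuid, info in health.items()
        if info["status"] == "FAULTY" and info["is_alive"]
    )


SAMPLE_HEALTH = {
    "e245ec83-d889-11e3-86df-0800274fb806": {"status": "SECONDARY", "is_alive": True, "threads": {}},
    "e826d4ab-d889-11e3-86df-0800274fb806": {"status": "FAULTY", "is_alive": True, "threads": {}},
    "edf2c45b-d889-11e3-86df-0800274fb806": {"status": "PRIMARY", "is_alive": True, "threads": {}},
}

if __name__ == "__main__":
    for uuid in alive_but_faulty(SAMPLE_HEALTH):
        print("needs SPARE then SECONDARY:", uuid)
```

A server that is FAULTY and not alive is deliberately left out, since changing its status would not help until mysqld is running again.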
Remember that unless failover_interval is greater than 0, MySQL Fabric will detect problems in an active group, but it won’t take any automatic action. We think it’s a good thing that the value for this variable in the documentation is 0, so that automatic failover is not enabled by default (if people follow the manual, of course), as even in mature HA solutions like Pacemaker, automatic failover is tricky to get right. But even without this, we believe the main benefit of using MySQL Fabric for promotion is that it takes care of reconfiguring replication for you, which should reduce the risk of error in this process, especially once the project becomes GA.
In this post we’ve presented a basic replication setup managed by MySQL Fabric and reviewed a couple of failure scenarios, but many questions are left unanswered, among them:
We’ll try to answer these and other questions in our next post. If you have some questions of your own, please leave them in the comments section and we’ll address them in the next or other posts, depending on the topic.