
Corosync Pacemaker - Execute script on failover

Published: 20-11-2013 | Author: Remy van Elst


❗ This post is over eleven years old. It may no longer be up to date. Opinions may have changed.

With Corosync/Pacemaker there is no easy way to simply run a script on failover. There are good reasons for this, but sometimes you want to do something simple. This tutorial describes how to change the Dummy OCF resource to execute a script on failover.

In this example, the script triggers a few SNMP traps, sends an alert to Nagios and sends some data to Graphite. SNMP alone could be handled by the ocf:heartbeat:ClusterMon resource, but the rest could not.

This is a very simple way of doing it; I consider it more of a quick hack. For example, the script path is hard-coded. For me that is not a problem, because both the script and the Dummy resource are managed via Ansible, so I can change them at any time.

Start by copying the Dummy resource over to a new resource. On Ubuntu the resource files are located here:

/usr/lib/ocf/resource.d/heartbeat/

In there, copy the Dummy file to a new resource, for example FailOverScript. If you don't have the Dummy resource, you can also find it here.

Edit the name and description:

Name:

meta_data() {
    cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="FailOverScript" version="0.9">
<version>1.0</version>

Description:

<longdesc lang="en">
Script run on failover
</longdesc>
<shortdesc lang="en">Script run on failover</shortdesc>

Make sure the script you want to execute is placed on the host and is executable (chmod +x /usr/local/bin/failover.sh).
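What goes into the script is up to you. As a rough sketch of the kind of thing described above, here are two small helpers: one appends a line to a log file, the other formats a metric in Graphite's plaintext protocol ("path value timestamp"). The function names, the metric name, the log path and the Graphite host are all made up for illustration; this is not the script I actually use.

```shell
#!/bin/sh
# Hypothetical failover script: the function names, metric name, log path
# and Graphite host below are illustrative assumptions.

# Format one metric in Graphite's plaintext protocol: "path value timestamp".
graphite_line() {
    printf '%s %s %s\n' "$1" "$2" "$3"
}

# Append a timestamped record of the failover to the log file given as $1.
log_failover() {
    printf '%s failover: resource started on %s\n' "$(date +%s)" "$(uname -n)" >> "$1"
}

# Example wiring (commented out; host, port and paths are assumptions):
# log_failover /var/log/failover.log
# graphite_line cluster.failover 1 "$(date +%s)" | nc -w 1 graphite.example.org 2003
```

Keeping the script short and quick matters here, since it runs as part of the resource start.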

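A bit lower in the resource agent file, edit the dummy_start function. Add the script path below the closing fi of the if [ $? = $OCF_SUCCESS ]; then block and above the touch ${OCF_RESKEY_state} line. The if must stay directly after dummy_monitor, because $? only holds the exit status of the last command. Like so:

dummy_start() {
    dummy_monitor
    # Already running: return without running the script again.
    if [ $? = $OCF_SUCCESS ]; then
        return $OCF_SUCCESS
    fi
    # A real start on this node: run the failover script.
    /usr/local/bin/failover.sh
    touch ${OCF_RESKEY_state}
}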
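The early return after the monitor call is what keeps start safe to call on a resource that is already running. To play with that logic outside of the cluster, here is a standalone sketch with stub functions. All the demo_* names are hypothetical and nothing in it talks to Pacemaker; it only shows one way to make sure the script fires on a real start and not on a repeated one.

```shell
#!/bin/sh
# Standalone illustration with stub functions; all demo_* names are
# hypothetical and nothing here talks to Pacemaker.
state_file=$(mktemp -u)   # stands in for ${OCF_RESKEY_state}
script_runs=0

# "Running" simply means the state file exists.
demo_monitor() { [ -f "$state_file" ]; }

# Stands in for the failover script; it only counts how often it ran.
demo_script() { script_runs=$((script_runs + 1)); }

demo_start() {
    # Already started: report success without running the script again.
    if demo_monitor; then
        return 0
    fi
    demo_script
    touch "$state_file"
}

demo_start   # first start: script runs, state file is created
demo_start   # second start: nothing happens
echo "script ran $script_runs time(s)"
rm -f "$state_file"
```

Calling demo_start twice runs the stub script only once, because the second call sees the state file and returns early.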

After that has been done, replace all instances of Dummy and dummy with your name of choice:

sed -i 's/Dummy/FailOverScript/g' /usr/lib/ocf/resource.d/heartbeat/FailOverScript
sed -i 's/dummy/failoverscript/g' /usr/lib/ocf/resource.d/heartbeat/FailOverScript
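To check that a rename like this caught everything, you can count the lines that still mention the old name. The sketch below does that on a scratch file with made-up content instead of the real agent, and assumes GNU sed as shipped on Ubuntu (for the -i flag).

```shell
#!/bin/sh
# Demonstration on a scratch file; the file content is made up.
tmp=$(mktemp)
printf 'dummy_start() {\n    dummy_monitor\n}\n# Dummy resource agent\n' > "$tmp"

# Replace every occurrence on every line (note the trailing /g).
sed -i 's/Dummy/FailOverScript/g' "$tmp"
sed -i 's/dummy/failoverscript/g' "$tmp"

# Count lines that still contain the old name, ignoring case.
# grep exits non-zero when nothing matches, hence the "|| true".
leftover=$(grep -ci 'dummy' "$tmp" || true)
echo "remaining matches: $leftover"
rm -f "$tmp"
```

Running the same grep against the real agent file should also report 0 once both sed commands have been applied.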

Test the script using the ocf-tester program to see if you have any mistakes:

ocf-tester -n resourcename /usr/lib/ocf/resource.d/heartbeat/FailOverScript

Output:

Beginning tests for /usr/lib/ocf/resource.d/heartbeat/FailOverScript...
/usr/sbin/ocf-tester: 214: /usr/sbin/ocf-tester: xmllint: not found
* rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
Tests failed: /usr/lib/ocf/resource.d/heartbeat/FailOverScript failed 1 tests

Oops. Seems we need xmllint. On Ubuntu, install it:

apt-get install libxml2-utils

Test again and you'll see it pass:

Beginning tests for /usr/lib/ocf/resource.d/heartbeat/FailOverScript...
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
/usr/lib/ocf/resource.d/heartbeat/FailOverScript passed all tests

As an extra test, to see if the script you've created is correctly executed, you can do a test start of the resource:

 export OCF_ROOT=/usr/lib/ocf
 bash -x /usr/lib/ocf/resource.d/heartbeat/FailOverScript start
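When running the agent by hand like this, the exit status tells you what the action returned. The small helper below maps a status to the standard OCF return code name; the helper name ocf_rc_name is hypothetical, the numeric codes are the ones the OCF resource agent API defines.

```shell
#!/bin/sh
# Hypothetical helper: map an exit status to the OCF return code name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "unknown ($1)" ;;
    esac
}

# Usage after a manual run (path as used in this article):
# /usr/lib/ocf/resource.d/heartbeat/FailOverScript monitor; ocf_rc_name $?
```

A monitor call that reports OCF_NOT_RUNNING before the start and OCF_SUCCESS after it is a good sign that the state file handling works.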

To use this resource, add it like so:

crm configure primitive script ocf:heartbeat:FailOverScript op monitor interval="30"

If you want to test it, you can, for example, have the script send you an email. Put a node in standby and see if you get an email.

Tags: cluster , corosync , crm , heartbeat , high-availability , network , pacemaker , tutorials