Application Status Monitoring
What is Application Status Monitoring?
The Application Status Monitor tracks the health of your application. When the number of consecutive failures reported for the application reaches a threshold, the virtual instance in which the application resides is restarted. The application, assuming it has a startup script, is restarted as well. Monitoring is done by calling a watchdog script that you write, which checks the state of your application and returns a status of '0' (application is running) or a non-zero value (application has failed).
What do I need to know about this watchdog script that I must write?
The script must be executable and reside in the opt/app_status_monitor/watchdogs/ directory of your build source. It must be named W<##><name>[.ext]. The <##> is the startup order, from 01 to 99. The <name> is whatever you want to call your script, and [.ext] is an optional suffix.
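The naming rule above can be checked before a build. The following is a minimal sketch, not part of the product: the function name is_valid_watchdog_name is hypothetical, and it assumes that <name> must be at least one character long.

```shell
# Hypothetical helper: succeeds if the given file name follows the
# W<##><name> pattern, with <##> in the range 01 to 99.
# Assumption: <name> must contain at least one character.
is_valid_watchdog_name() {
    case "$1" in
        W00*) return 1 ;;            # 00 is outside the 01 to 99 range
        W[0-9][0-9]?*) return 0 ;;   # W, two digits, then a non-empty name
        *) return 1 ;;               # anything else does not match
    esac
}
```

For example, is_valid_watchdog_name W01myapp.sh succeeds, while W1myapp and W00myapp.sh are rejected. Remember that the script must also be made executable, for instance with chmod +x.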
What are the default monitoring interval and recovery threshold?
The default monitoring interval is 12. Each interval unit is equivalent to 5 seconds, so the watchdog script is called every 60 seconds (12 * 5 = 60 seconds). The default recovery threshold is 5. Therefore, if the system receives a non-zero value or no value five times in a row when it checks the application status, it requests a restart of the virtual instance in which the application resides. Whenever a zero value is received, the threshold count starts over.
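The counting rule described above can be illustrated with a small simulation. This is only a sketch of the behavior as stated in this answer, not the monitor's actual implementation; the function name would_restart is hypothetical.

```shell
# Hypothetical simulation of the consecutive-failure counter.
# Usage: would_restart <threshold> <status> [<status> ...]
# Each <status> stands for one watchdog exit status, in order.
# Prints "restart" once <threshold> consecutive non-zero statuses
# have been seen, otherwise prints "ok".
would_restart() {
    threshold=$1
    shift
    failures=0
    for status in "$@"; do
        if [ "$status" -eq 0 ]; then
            failures=0                    # a success resets the count
        else
            failures=$((failures + 1))    # one more consecutive failure
        fi
        if [ "$failures" -ge "$threshold" ]; then
            echo "restart"
            return 0
        fi
    done
    echo "ok"
}
```

With the default threshold, would_restart 5 1 1 1 1 1 prints restart, while would_restart 5 1 1 1 1 0 1 1 1 1 prints ok, because the single zero status resets the count.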
Are the monitoring interval and recovery threshold configurable via a file?
Yes, they are. Valid values are 1 to 99. You can configure these attributes by creating a file called 'config' and placing it in the opt/app_status_monitor/config directory of your build source directory. The example below sets the application to be monitored every 30 seconds (6 * 5 seconds = 30) and causes the virtual instance that contains your application to be restarted if a failure occurs 3 consecutive times.
monitor_interval 6
recovery_threshold 3
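Before packaging, it can be useful to check that the values in the file are within the valid range. The sketch below assumes the "name value" layout shown in the example; the function names read_setting and interval_seconds are hypothetical and are not part of the product.

```shell
# Hypothetical helper: prints the value of a setting from a config file
# in "name value" format, and fails if the setting is missing, is not a
# whole number, or is outside the valid range of 1 to 99.
read_setting() {
    file=$1
    name=$2
    value=$(awk -v key="$name" '$1 == key { print $2; exit }' "$file")
    case "$value" in
        ''|*[!0-9]*) return 1 ;;    # missing or not a whole number
    esac
    if [ "$value" -lt 1 ] || [ "$value" -gt 99 ]; then
        return 1                    # outside 1 to 99
    fi
    echo "$value"
}

# Converts a monitor_interval value to seconds (one unit is 5 seconds).
interval_seconds() {
    echo $(( $1 * 5 ))
}
```

For the example file, read_setting config monitor_interval prints 6, and interval_seconds 6 prints 30.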
Are the monitoring interval and recovery threshold configurable via the CLI?
Yes, they are. Valid values are 1 to 99. The example below sets the application to be monitored every 30 seconds (6 * 5 seconds = 30) and causes the virtual instance that contains your application to be restarted if a failure occurs 3 consecutive times.
AXP_SM> app-service myapp
AXP_SM (myapp)> config t
AXP_SM (myapp)> status-monitor monitor_interval 6 recovery_threshold 3
AXP_SM (myapp)> end
AXP_SM (myapp)> copy run start
What package must my application depend upon to use this utility?
No package dependency is required.
Do I need to configure the ISR or AXP Module for this API to work?
No additional configuration is required.
Please provide a source code example of the watchdog script.
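#!/bin/bash
# This test script assumes that your application or startup script creates
# a <app name>.pid file in the /var/run directory. It is further assumed
# that the PID file is created when the application starts, is populated
# with the application's PID, and is removed when the application
# terminates.
APP=test.sh
APPNAME_NO_EXT=test
PID_FILE=/var/run/${APPNAME_NO_EXT}.pid
# No PID file means the application is not running: report failure.
if [ ! -e "$PID_FILE" ]; then
    exit 1
fi
PID_FROM_FILE=$(cat "$PID_FILE")
# Compare the recorded PID with every running process whose command line
# matches $APP. "grep -v grep" drops the grep command itself from the list.
for x in $(ps -ef | grep "$APP" | grep -v grep | awk '{print $2}')
do
    if [ "$x" = "$PID_FROM_FILE" ]; then
        # The recorded PID belongs to a running instance: report success.
        exit 0
    fi
done
# No matching process has the recorded PID: report failure.
exit 1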
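As a possible variant of the script above, the recorded PID can be tested directly with kill -0 instead of searching the ps output. This is a sketch of an alternative technique, not the method the example itself uses; the function name check_pid_file is hypothetical. Note that kill -0 only confirms that a process with that PID exists and can be signalled by the caller. It does not confirm that the process is your application.

```shell
# Hypothetical variant: returns 0 if the PID recorded in the given file
# belongs to a running process, and 1 otherwise.
check_pid_file() {
    pid_file=$1
    [ -r "$pid_file" ] || return 1      # no readable PID file: failed
    pid=$(cat "$pid_file")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;        # empty or non-numeric content
    esac
    # kill -0 sends no signal; it only tests whether the process exists
    # and whether the caller is allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}
```

A watchdog built on this function would call check_pid_file /var/run/test.pid and then exit with the resulting status.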