Watchdog
A watchdog is either a hardware component or a separate software process that monitors whether an application is still alive (i.e. has neither crashed nor hung).
The monitored application needs to "feed" the watchdog within a configured timeout.
When using a hardware watchdog, typically only one application can be monitored. If that application does not feed the watchdog within the configured timeout, a hard system reset is performed.
A software watchdog can be implemented to monitor multiple applications at once, and the reaction to a timed-out application can be configured.
Our devices combine both approaches. A so-called watchdog daemon runs on the system. It uses the hardware watchdog to make sure that a system reset is performed if the daemon itself crashes.
Other applications can subscribe to the watchdog daemon, each with a configurable timeout and a configurable reaction if that application times out (e.g. restart the application or restart the device).
This makes it possible to monitor the state of multiple applications. It prevents a (partially) non-working system that could otherwise only be recovered manually, e.g. by switching the supply voltage off and on, or by switching off the ignition until the device shuts down (which can take a long time if a long low-power or sleep timeout was configured).
With watchdog support enabled, a hanging or crashed application is restarted automatically, so the end user does not need to restart the system manually and downtime is minimized.
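The subscription idea described above can be illustrated with a small model. This is only a sketch of the concept, not the interface of the actual watchdog daemon (which is available from support); the class `WatchdogModel`, its methods, and the application names "hmi" and "logger" are hypothetical, and time is passed in explicitly so that the example runs without waiting.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RESTART_APPLICATION = "restart application"
    RESTART_DEVICE = "restart device"


@dataclass
class Subscription:
    timeout_s: float    # maximum allowed gap between two feeds
    action: Action      # reaction when the gap is exceeded
    last_feed_s: float  # time of the most recent feed


class WatchdogModel:
    """Toy model of a software watchdog supervising several applications."""

    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name, timeout_s, action, now_s):
        # Subscribing counts as the first feed.
        self.subscriptions[name] = Subscription(timeout_s, action, now_s)

    def feed(self, name, now_s):
        self.subscriptions[name].last_feed_s = now_s

    def check(self, now_s):
        """Return (name, action) for every application that has timed out."""
        return [
            (name, sub.action)
            for name, sub in self.subscriptions.items()
            if now_s - sub.last_feed_s > sub.timeout_s
        ]


wd = WatchdogModel()
wd.subscribe("hmi", timeout_s=10, action=Action.RESTART_APPLICATION, now_s=0)
wd.subscribe("logger", timeout_s=5, action=Action.RESTART_DEVICE, now_s=0)

wd.feed("hmi", now_s=4)      # "hmi" keeps feeding, "logger" hangs
expired = wd.check(now_s=6)  # only "logger" has exceeded its timeout
print(expired)
```

Each application carries its own timeout and its own reaction, which is what distinguishes this approach from a plain hardware watchdog that can only reset the whole system.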
If a custom application needs to subscribe to the watchdog daemon, please contact support for more information.
"Watchdog Support" can be enabled in the project properties. A timeout can be configured, which the watchdog daemon uses to determine whether an application hangs or has crashed. In addition, it can be configured whether only the application or the whole system is restarted when a timeout occurs.
Things to consider when enabling watchdog support:
•All functionality of the project needs to be tested and validated with the configured watchdog settings before shipping
•The application will tell the watchdog daemon that it is alive in its main loop. If the project has functions that will block the main loop for longer than the configured timeout, the watchdog will consider the application to be dead and will execute the configured timeout action. Especially check:
▪Entering complex pages (with many objects and/or containing video / multimedia objects)
▪Complex JavaScripts. Be careful with loops and take care of blocking functions like writeToFile, writeEEPROM, readEEPROM, copyFileOrDirectory, moveFileOrDirectory, syncFilesystem, runSystemCommand, RS232Handler.writeFromBuffer, RS232Handler.readIntoBuffer
▪For testing, enable only the "Restart Application" action at first and check that everything works as expected. With this action, a "restart loop" (e.g. if the application blocks for too long in a repeat script) is fairly easy to break. If "Restart Device" is configured, a restart loop can only be broken by re-installing PClient or, if the timeout is long enough, by logging into the device and stopping the watchdogDaemon. To break a restart loop, stop the watchdogDaemon with "killall watchdogDaemon" on the console. It might be necessary to restart the downloader application (via "/opt/etc/init.d/04_ud4 restart") so that a fixed project can be installed.
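The caution about long loops can be illustrated with a generic pattern: instead of processing everything in one pass, the work is split into slices and control returns to the main loop after each slice. The sketch below is written in Python purely for illustration. The names `sliced_work` and `handle` are hypothetical, and whether the project's scripting environment offers an equivalent way to spread work over several cycles is an assumption that needs to be checked for the concrete project.

```python
def sliced_work(items, slice_size, handle):
    """Generator: handles `slice_size` items per step, then hands control back."""
    for start in range(0, len(items), slice_size):
        for item in items[start:start + slice_size]:
            handle(item)
        # Report how many items are done and pause until the next cycle.
        yield min(start + slice_size, len(items))


results = []
job = sliced_work(list(range(10)), slice_size=4, handle=results.append)

cycles = 0
for done in job:   # each iteration stands in for one main-loop cycle
    cycles += 1    # between slices the main loop can feed the watchdog

print(cycles, done, results)
```

The slice size has to be chosen so that a single slice, including any blocking call it contains, finishes well within the configured timeout.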
•Enable Watchdog Support
•Switch on/off watchdog support for this project
•Timeout
•min: 1s, max: 30s
•Timeout in seconds within which the application has to "feed" the watchdog. If the main loop is blocked for longer than this timeout, the action configured below will be executed
•On Timeout
•Configure what shall happen when a timeout occurs
oRestart Application: Application will be killed (if it hangs) and then started again
oRestart Device: A reboot will be executed
The PClient sends a heartbeat to the watchdog once every quarter of the configured timeout (timeout / 4). Depending on the situation, this can mean that the PClient is restarted slightly before the configured timeout has elapsed, so test your system accordingly and adjust the timeout if necessary.
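The arithmetic behind this note can be sketched as follows. The heartbeat interval of timeout / 4 and the range of 1 s to 30 s come from the text above. The worst-case figure rests on an assumption: that the daemon counts from the last heartbeat it received and that the main loop starts blocking just before the next heartbeat would have been sent. The function names are hypothetical.

```python
MIN_TIMEOUT_S = 1
MAX_TIMEOUT_S = 30


def heartbeat_interval_s(timeout_s):
    """Interval between two heartbeats for a configured timeout."""
    if not MIN_TIMEOUT_S <= timeout_s <= MAX_TIMEOUT_S:
        raise ValueError("timeout must be between 1 s and 30 s")
    return timeout_s / 4


def worst_case_block_budget_s(timeout_s):
    """Longest main-loop block tolerated even in the worst case.

    Assumption: the timeout is measured from the last received heartbeat,
    and the block begins almost a full heartbeat interval after it.
    """
    return timeout_s - heartbeat_interval_s(timeout_s)


for timeout in (4, 10, 30):
    print(timeout, heartbeat_interval_s(timeout), worst_case_block_budget_s(timeout))
```

Under this assumption, a timeout of 10 s leaves only about 7.5 s of guaranteed blocking time, so it is safer to plan with roughly three quarters of the configured value when estimating how long a script may block.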