How the Program Works
The program is designed to monitor multiple hosts simultaneously. The Enterprise edition does not limit the number of hosts, but the number that can actually be monitored depends on the hardware in use. Modern hardware allows monitoring 2,500+ hosts.
What is Ping Monitoring and How it Works
The program uses the ICMP protocol to monitor hosts over a network. Every host is monitored independently of the other hosts to ensure high monitoring performance. The program sends ICMP echo requests to a monitored host and analyzes its echo replies. Echo requests, which are called pings, are sent at regular time intervals, so the program continuously interacts with the host and can detect the moment when it stops replying to ping requests.
When the program sends a ping echo request and gets a reply from the host, the ping is considered passed, and the program saves its round-trip time. When there is no echo reply from the host, the ping is considered failed. If pings fail, it means that there is no connection between the program (the Ping Monitor server that sends pings) and the monitored host.
What is the reason for a ping failure? A ping can fail for different reasons, for example, when the pinged host is turned off, when there is a network problem between the source and the destination, or when there is a DNS problem and the host name cannot be resolved to the correct IP address. Usually, a ping fails when its round-trip time exceeds the configured timeout. This happens when the pinged host or the network infrastructure is overloaded and ping requests cannot be processed in time.
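The pass/fail rule described above can be sketched in a few lines of Python. This is only an illustration, not the program's actual code: the names PingResult and classify_ping are hypothetical, and the sketch assumes the round-trip time is already measured (or is None when no echo reply arrived at all).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PingResult:
    passed: bool
    rtt_ms: Optional[float]  # round-trip time, kept only for passed pings


def classify_ping(rtt_ms: Optional[float], timeout_ms: float) -> PingResult:
    """Classify one ping (hypothetical helper, not the program's API).

    rtt_ms is None when no echo reply was received.
    """
    # No reply, or a reply that arrived after the timeout: the ping failed.
    if rtt_ms is None or rtt_ms > timeout_ms:
        return PingResult(passed=False, rtt_ms=None)
    # Reply arrived in time: the ping passed and its round-trip time is saved.
    return PingResult(passed=True, rtt_ms=rtt_ms)


if __name__ == "__main__":
    print(classify_ping(12.5, timeout_ms=1000))    # reply in time
    print(classify_ping(1500.0, timeout_ms=1000))  # reply too late
    print(classify_ping(None, timeout_ms=1000))    # no reply
```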
How the Host State is Detected
One of the main goals of Ping Monitor is to help you detect problems with the monitored hosts automatically, so the program continuously monitors the hosts to detect their current state. The Up state means that the monitored host works and replies to ping requests. The Down state means that the monitored host is unreachable and doesn't reply to ping requests.
How does the program detect the host states using pings? If pings pass successfully, it means that the host works, so it has the Up state. If pings fail, it means that the monitored host isn't reachable. A single failed ping usually doesn't indicate a problem, because pings can fail sometimes even for properly working hosts. However, if several pings fail in a row, it means that there is a problem and the host state should be changed to Down.
All the monitoring parameters used by the program are configurable and can be changed if required. Below you can see an example of a host monitoring sequence to understand how monitoring works and what parameters are used in the monitoring configuration.
The program continuously monitors a host by sending ping echo requests on a regular basis at a Regular Pings Interval (10 sec in the example). If a ping fails, the next ping request is sent after a State Check Interval (3 sec in the example). As you can see, the program uses different ping intervals when the host state is stable (either Up or Down) and when it is changing (from Up to Down or vice versa). Using different intervals allows you to tune monitoring according to your needs.
The program uses Up Check Attempts (1 ping in the example) and Down Check Attempts (3 pings in the example) to detect state changes. For example, with the default settings the host state changes from Up to Down when 3 pings fail in a row, and from Down to Up as soon as 1 ping passes.
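The detection logic described above can be modeled as a small state machine. The Python sketch below is a simplified illustration under stated assumptions, not the program's implementation: the class HostMonitor and its method record_ping are hypothetical names, the host is assumed to start in the Up state, and the values are the ones from the example (10 sec, 3 sec, 1 ping, 3 pings).

```python
from dataclasses import dataclass


@dataclass
class HostMonitor:
    """Hypothetical model of the Up/Down detection described in the text."""
    regular_interval: float = 10.0     # Regular Pings Interval, sec
    state_check_interval: float = 3.0  # State Check Interval, sec
    up_check_attempts: int = 1         # passed pings in a row to report Up
    down_check_attempts: int = 3       # failed pings in a row to report Down
    state: str = "Up"                  # assumed initial state
    streak: int = 0                    # pings in a row that contradict the state

    def record_ping(self, passed: bool) -> float:
        """Process one ping result; return the delay before the next ping."""
        contradicts = passed != (self.state == "Up")
        if not contradicts:
            # The result matches the current state: the state is stable.
            self.streak = 0
            return self.regular_interval
        self.streak += 1
        needed = (self.down_check_attempts if self.state == "Up"
                  else self.up_check_attempts)
        if self.streak >= needed:
            # Enough contradicting pings in a row: change the state.
            self.state = "Down" if self.state == "Up" else "Up"
            self.streak = 0
            return self.regular_interval
        # The state may be changing: re-check sooner.
        return self.state_check_interval


if __name__ == "__main__":
    monitor = HostMonitor()
    for passed in [True, False, False, False, False, True]:
        delay = monitor.record_ping(passed)
        print(f"passed={passed} state={monitor.state} next ping in {delay} sec")
```

In this model a single failed ping only shortens the delay before the next ping. The state is reported as Down on the third failure in a row, and one passed ping brings it back to Up.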
As you can see in the diagram, the program can send notifications when the host state changes. It is possible to configure the program to send e-mail notifications, show balloons in the Windows tray, play sounds or execute custom actions on state changes.
Changing and Tuning the Monitoring Settings
You can change the configuration on the Monitor Settings page of the program preferences (Pic 1). These settings are used by all monitored hosts. It is possible to override the common settings and specify individual settings for hosts and groups. To enter individual settings for a host or group, open it for editing and switch to Monitor Settings.
Monitoring settings allow you to change the ping intervals and the check attempts used to detect the Up/Down states. These settings were explained above. You can also change the ping packet size, the ping timeout and the TTL (time to live). Note that the default monitoring settings may vary depending on the edition of the program and the initial configuration set at its first start, but you can change them at any time if required.
What settings should you use in different cases? There is no universal answer to this question, because the optimal settings depend on the case, but it makes sense to follow these recommendations:
- Don't use intervals that are too large or too small. Small intervals increase the monitoring workload, which may matter if you monitor hundreds or thousands of hosts. For example, if you halve the interval, the program will send twice as many pings over the same period of time. Large intervals make monitoring less responsive because more time is required to detect state changes. For example, it makes no sense to ping a host once per hour because the host state may change many times during an hour and these changes will not be detected.
- Avoid false-positive reports by increasing the Up/Down check attempts. If you set the check attempts to just one ping, the host state can change on every ping. If the connection is unstable, you will constantly see state changes and will not be able to react appropriately. By increasing the number of check attempts, you reduce the number of potential false-positive state changes, but too large a number of check attempts makes monitoring less responsive. For example, if State Check Interval is set to 10 seconds and you set Down Check Attempts to 1000 pings, an outage will be detected 10,000 sec later at the earliest, i.e. about 2 hours 47 minutes after the onset of problems.
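To get a feeling for how the ping interval affects the workload, the rate of outgoing pings can be estimated with simple arithmetic. The Python sketch below is a rough estimate only: the function pings_per_second is a hypothetical helper, and it assumes that all hosts are in a stable state and are therefore pinged at the Regular Pings Interval.

```python
def pings_per_second(hosts: int, regular_interval_sec: float) -> float:
    """Rough estimate of outgoing pings per second (hypothetical helper).

    Assumes every host is in a stable state, so each host is pinged once
    per Regular Pings Interval.
    """
    if regular_interval_sec <= 0:
        raise ValueError("the interval must be positive")
    return hosts / regular_interval_sec


if __name__ == "__main__":
    for interval in (10, 5):
        rate = pings_per_second(2500, interval)
        print(f"2500 hosts, interval {interval} sec: {rate:.0f} pings per second")
```

With 2,500 hosts, reducing the interval from 10 sec to 5 sec raises the estimate from 250 to 500 pings per second.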
How can you estimate the monitoring responsiveness, i.e. the time required for the monitoring system to detect a host state change? The program sends ping requests on a regular basis, so it immediately detects that pings have started to fail. However, to avoid false positives, the failure isn't reported as a host state change until the number of failed pings in a row reaches the Down Check Attempts value. During this period, pings are sent at the State Check Interval. To evaluate the minimum time required to report the Down state after the first failed ping, multiply State Check Interval by Down Check Attempts. To evaluate the minimum time required to report the Up state after the first passed ping, multiply State Check Interval by Up Check Attempts.
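The estimate above is a single multiplication, shown here as a short Python sketch. The function name detection_time_sec is hypothetical, and the result is the estimate described in the text, not a measured value.

```python
from datetime import timedelta


def detection_time_sec(state_check_interval_sec: int, check_attempts: int) -> int:
    """Estimated minimum time to report a state change, in seconds.

    Follows the rule from the text: State Check Interval multiplied by the
    Up or Down Check Attempts. The function name is hypothetical.
    """
    return state_check_interval_sec * check_attempts


if __name__ == "__main__":
    # Values from the monitoring example: 3 sec interval, 3 Down / 1 Up attempts.
    print("Down:", detection_time_sec(3, 3), "sec")
    print("Up:", detection_time_sec(3, 1), "sec")
    # The extreme case from the recommendations: 10 sec interval, 1000 attempts.
    extreme = detection_time_sec(10, 1000)
    print("Extreme case:", extreme, "sec =", timedelta(seconds=extreme))
```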
What settings should I use if state changes should be reported no sooner than 2 minutes after the onset of a problem (in other words, outages shorter than 2 minutes shouldn't be reported)?
In this case, we first need to decide how often the host should be pinged in the transitional state. If it should be pinged every five seconds (State Check Interval should be set accordingly), how many pings do we need to send to detect a state change? Let's calculate: 120 sec. (2 min.) / 5 sec. = 24, so we need 24 pings to detect a host state change with this interval and can enter this number as Up Check Attempts and Down Check Attempts.
Note that the interval and the number of attempts are inversely related, so if you increase one value, you should decrease the other to maintain the same monitoring responsiveness. In the above example, if we set State Check Interval to 3 sec., how many pings are required to detect a state change? This number can be calculated as 120 sec. (2 min.) / 3 sec. = 40, so we need 40 pings to detect a host state change with this interval and need to set Up Check Attempts and Down Check Attempts to 40 pings.
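The same calculation can be written as a small Python helper. The function name check_attempts_for is hypothetical, and rounding up when the target time is not evenly divisible by the interval is an assumption made here so that the change is never reported earlier than the target time.

```python
def check_attempts_for(target_sec: int, state_check_interval_sec: int) -> int:
    """Number of check attempts needed for a target detection time.

    Works with whole seconds. Rounding up is an assumption of this sketch,
    not something stated in the text. The function name is hypothetical.
    """
    if target_sec <= 0 or state_check_interval_sec <= 0:
        raise ValueError("both values must be positive")
    # Integer ceiling division: round up without floating-point arithmetic.
    return -(-target_sec // state_check_interval_sec)


if __name__ == "__main__":
    for interval in (5, 3):
        attempts = check_attempts_for(120, interval)
        print(f"State Check Interval {interval} sec: {attempts} check attempts")
```

For the 2-minute target this gives 24 attempts with a 5 sec. interval and 40 attempts with a 3 sec. interval, matching the calculations above.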