Stay Alert!
July 12th, 2007 Charles Lee
IT administrators have a love-hate relationship with alerts. They don’t want to miss any important outages or performance problems, because their jobs depend on it. However, those issues have a tendency to surface at 2 a.m. Hyperic HQ may collect vast amounts of monitoring data from the IT infrastructure, but the most important function that it provides is timely alerts.
We commissioned a user experience study a couple of months ago, and one of our focus group participants summed it up best: “Alerts are very important and indispensable.” We always understood the importance of alerting, but by better understanding how Hyperic HQ participates in the problem resolution process, we were able to identify some important new features and directions for alerting in the upcoming Hyperic HQ 3.1 release.
In a typical workflow, an alert initiates a process in which the administrator has to assess the situation and fix the problem. After the alert is received, the administrator has to collect information. We had previously assumed that the administrator would perform the diagnosis using HQ’s UI. However, our focus group participants informed us that this is not how they function in real life. In fact, up to 80% of alerts received are self-explanatory or repeated, and the administrators already know what to do based on the information contained in the alert itself. In other cases, an administrator does not have easy access to a browser, because he or she received the alert on a pager or because the servers in the datacenter do not have graphical environments. Either way, only the other 20% of alerts require additional investigation and correlation in HQ’s UI to arrive at the root of the problem. The goal is to reduce the number of steps between receiving an alert and resolving the problem. To this end, we are introducing some improvements to alert notification that reduce the need for additional interaction with the UI.
First of all, we have added the last collected values for a resource’s indicator metrics to the email notification. This may help a user quickly assess whether a failure is due to memory consumption, the number of concurrent connections, or a myriad of other conditions that are exposed by the indicator metric values. It’s also important to note that this data represents the state of the system before the alert, rather than data collected post-mortem, which exposes the symptom and not the cause. Having the metric values in the notification means that the administrator can make decisions from the alert notification alone.
Next, we added a log field to alert resolution (which marks an alert as Fixed in HQ). This seems like such a simple addition to alerting, but it means that an administrator can document the resolution to an incident and have that resolution appear in the email notification the next time the alert fires. It offers a great starting point for dealing with the alert and allows the admin to try the previous solutions quickly. One participant said, “Those types of things at 3 in the morning are extremely helpful.” Again, this leads to quicker problem resolution.
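To make the two notification changes concrete, here is a minimal sketch of how such an email body might be put together. It is not HQ's code: the `Alert` class, its field names, and `build_notification` are all hypothetical, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Alert:
    """Hypothetical alert record; the field names are illustrative, not HQ's schema."""
    resource: str
    condition: str
    # Last values collected for the resource's indicator metrics before the alert fired.
    indicator_metrics: dict = field(default_factory=dict)
    # Note logged the last time this alert was marked Fixed, if there was one.
    previous_resolution: Optional[str] = None


def build_notification(alert: Alert) -> str:
    """Assemble a plain-text notification body (assumed layout)."""
    lines = [
        f"ALERT: {alert.condition} on {alert.resource}",
        "",
        "Last collected indicator metrics:",
    ]
    if alert.indicator_metrics:
        for name, value in sorted(alert.indicator_metrics.items()):
            lines.append(f"  {name}: {value}")
    else:
        lines.append("  (none collected)")
    # Only include the resolution section when an earlier fix was documented.
    if alert.previous_resolution:
        lines += ["", "Previous resolution:", f"  {alert.previous_resolution}"]
    return "\n".join(lines)


example = Alert(
    resource="db01",
    condition="Availability < 100%",
    indicator_metrics={"Free Memory": "12 MB", "Concurrent Connections": 498},
    previous_resolution="Restarted the listener after clearing stale connections.",
)
print(build_notification(example))
```

The point of the sketch is that everything an admin needs at 3 in the morning sits in one message: what fired, what the metrics looked like beforehand, and what worked last time.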
Lastly, we have often received requests to provide a centralized place in the application for users to view all of the alerts across their infrastructure. We are responding to that need with the Alert Center, which lists all of the alerts in the system in reverse chronological order and can also be sorted by a number of other fields.
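The listing behavior described for the Alert Center amounts to a sort with a default key. A rough sketch follows, assuming alerts are plain dictionaries; the field names (`fired_at`, `resource`, `priority`), the sample records, and `list_alerts` are hypothetical and not taken from HQ.

```python
from datetime import datetime
from operator import itemgetter

# Hypothetical sample alerts, invented for illustration.
alerts = [
    {"fired_at": datetime(2007, 7, 10, 2, 0), "resource": "web01", "priority": 2},
    {"fired_at": datetime(2007, 7, 12, 3, 15), "resource": "db01", "priority": 1},
    {"fired_at": datetime(2007, 7, 11, 14, 30), "resource": "app02", "priority": 3},
]


def list_alerts(alerts, sort_by="fired_at", descending=True):
    """Return alerts newest first by default, or ordered by another field."""
    return sorted(alerts, key=itemgetter(sort_by), reverse=descending)


# Default view: reverse chronological order.
for alert in list_alerts(alerts):
    print(alert["fired_at"], alert["resource"])

# Alternative view: ascending by the (assumed) priority field.
by_priority = list_alerts(alerts, sort_by="priority", descending=False)
```

Defaulting to newest first keeps the most recent problems at the top, while the `sort_by` argument stands in for the other sortable fields.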
These changes are just some of the ideas that we have for improving the alerting feature, and they are available in HQ 3.1. We received some really great feedback through our user experience study, as well as requests from users, and we will continue to innovate around it in the future. So please let us know how else the alert feature can be improved to help solve your infrastructure problems faster and more accurately.
Entry Filed under: HQ, HQ Beta, IT Industry
1 Comment
1. Dan Gorman | July 12th, 2007 at 10:39 am
Look forward to these features in 3.1. Looks like Hyperic is spot-on.