We do not remember days, we remember moments and events.
Events are triggers, good or bad, that come either from a well-defined monitored environment (known problems) or from unknown problems, the layer we are not yet aware exists.
Either way, we need to place those meaningful triggers in the perspective of the complete business and IT environment. We want to smell the smoke before IT is on fire.
Let us first ask ourselves: what does an event need in order to be noticed?
A valid and trusted source of information. Sense the environment. In traditional monitoring solutions, systems sit under a magnifying glass with a well-defined but limited scope of information, and only that scope generates events.
You know the environment by heart: which segments are the most critical and what care those systems need. With that knowledge, you have defined a monitoring framework of meaningful parameters that trigger an event when a known issue occurs.
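As a minimal sketch of such a framework, the snippet below checks a sample of measurements against a small set of rules and emits an event for each breached threshold. The rule names, thresholds, and the `Rule`, `Event`, and `evaluate` identifiers are illustrative assumptions, not part of any specific monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    metric: str       # hypothetical parameter name
    threshold: float  # value above which an event is triggered
    severity: str


@dataclass
class Event:
    source: str
    metric: str
    value: float
    severity: str


# Hypothetical rules encoding what we already know about the environment.
RULES = [
    Rule("cpu_percent", 90.0, "critical"),
    Rule("disk_used_percent", 80.0, "warning"),
]


def evaluate(source: str, sample: dict[str, float]) -> list[Event]:
    """Return one event per rule whose threshold the sample exceeds."""
    events = []
    for rule in RULES:
        value = sample.get(rule.metric)
        if value is not None and value > rule.threshold:
            events.append(Event(source, rule.metric, value, rule.severity))
    return events


# "fan_rpm" has no rule, so it is silently ignored: it lies outside the framework.
events = evaluate(
    "db01",
    {"cpu_percent": 97.5, "disk_used_percent": 40.0, "fan_rpm": 0.0},
)
```

Note that the stopped fan never produces an event here. Anything the rules do not name stays invisible, which is exactly the limit of a framework built only from known issues.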
All of us want to think outside the framework – but what is outside that box when you know it by heart?
The unknown will always be out there. Embrace it, analyze it, and strip away the veil so that it becomes the known. Operations Analytics can do the hard work for you, as long as you keep feeding the hungry machine.
A place to settle. You get hundreds of events per hour (when you are lucky), but only one of them may be the lifesaver. The monitoring framework you put in place needs multiple dimensions for an event to dwell in. Each dimension enriches the event, and in that enriched form it presents itself in the Operations Bridge dashboard.
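One way to picture this enrichment is as a series of lookups that attach context to a raw event. The sketch below is only an illustration: the lookup tables, field names, and the `enrich` function are assumptions made for the example, not the Operations Bridge data model.

```python
# Hypothetical lookup tables, one per "dimension" of context.
TOPOLOGY = {"db01": "Online Shop"}           # host -> business service
OWNERS = {"db01": "database-team"}           # host -> responsible team
BUSINESS_IMPACT = {"Online Shop": "high"}    # service -> impact level


def enrich(event: dict) -> dict:
    """Return a copy of the event with service, owner and impact added."""
    enriched = dict(event)  # leave the raw event untouched
    service = TOPOLOGY.get(event["source"], "unknown")
    enriched["service"] = service
    enriched["owner"] = OWNERS.get(event["source"], "unassigned")
    enriched["business_impact"] = BUSINESS_IMPACT.get(service, "unknown")
    return enriched


raw = {"source": "db01", "metric": "cpu_percent", "severity": "critical"}
enriched = enrich(raw)
```

With the extra fields in place, a dashboard can sort or filter hundreds of events by business impact instead of by raw severity alone.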
Places to go and adopt. This is not the end of the event's journey, just the first leap. The real power comes into play when we use the event as a springboard. We want to close the loop, from registration to problem remediation, with as little human effort as possible. That is possible because we have already built human wisdom into our monitoring framework. This is the Closed Loop Incident Process (CLIP). Every incident requires a proper action. In the best case, the action resolves the issue before end users ever become aware of it. In the worst case, the incident is assigned to a specific team that can restore a functional state before it escalates into a problem for end users.
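The best-case and worst-case paths of such a loop can be sketched as follows. This is a simplified illustration under stated assumptions: the event types, the `restart_service` placeholder, and the `handle_incident` function are hypothetical and do not represent an actual CLIP implementation.

```python
from typing import Callable


def restart_service(event: dict) -> bool:
    """Placeholder for an automated action; here it simply reports success."""
    return True


# Hypothetical mapping from known incident types to automated remediations.
REMEDIATIONS: dict[str, Callable[[dict], bool]] = {
    "service_down": restart_service,
}


def handle_incident(event: dict) -> dict:
    """Try automated remediation first; otherwise assign to the owning team."""
    action = REMEDIATIONS.get(event["type"])
    if action is not None and action(event):
        # Best case: resolved without anyone being paged.
        return {"status": "resolved", "resolved_by": "automation"}
    # Worst case: hand the incident over to a specific team.
    return {"status": "assigned", "assigned_to": event.get("owner", "operations")}


auto = handle_incident({"type": "service_down", "owner": "web-team"})
manual = handle_incident({"type": "disk_failure", "owner": "storage-team"})
```

Each incident that ends up assigned to a team is a candidate for a new entry in the remediation table, which is how the loop gradually tightens.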
It is a circular journey, on which we must make sure that the monitoring framework evolves side by side with the environment inside the loop.