SnappyFlow is a comprehensive monitoring and log management solution addressing the needs of today’s cloud-native applications. This blog highlights SnappyFlow’s signature analysis feature that users will find extremely useful to improve their troubleshooting effectiveness, reduce noise and reduce log storage costs.
Logs provide valuable insights about an application. They are useful to troubleshoot issues happening with the application, track user access, understand usage of application’s features, track load patterns etc. Consequently, log analysis and log management solutions have become a “must-have” in a SRE’s tool repertoire, more so, with the growing complexity of cloud-native stacks that the SRE is needing to manage.
There are several good log management solutions in the market but most have two pronounced drawbacks:
Log Signature is a unique feature in SnappyFlow that is used to reduce noisy logs, improving triaging as well as reduce log storage costs.
Signature is a string pattern present in the log, which uniquely identifies it. String patterns used to define signatures can contain variables $w or $i . The variable $w represents a word consisting of alphanumeric characters and $i a decimal number.
For example, the signature, ‘’missed heartbeat from $w@’’ would uniquely identify these logs
"missed heartbeat from provision@stage-apm-sfapm-apm-celery-provision-5579fbffc9-st9dm"
"missed heartbeat from notify@stage-apm-sfapm-apm-celery-notify-78b46bd6cc-85shb"
"missed heartbeat from default@stage-apm-sfapm-apm-celery-default-6c44857687-4zx2t"
Signature Group is a grouping of multiple signatures that are related to a problem or a workflow.
Users can perform the following operations related to Log Signatures in SnappyFlow:
A large proportion of logs are of very little or no value. Many of these just add to volume and cost. We have seen situations where 80-95% of logs may belong to this category. Users live with them because it is not easy to selectively turn them off at the source.
With SnappyFlow, users can turn on or off the collection of the log with a single click. In the example below, with just 2 clicks we are able to turn off 2 logs that are taking 40% of storage space, logs that have very little value. If the user does indeed want to retain the log for a future purpose, the user can continue to store the log in the archive with 10-40x compression and search the archive as needed.
When a critical issue occurs and SRE is racing to troubleshoot the issue, SRE has to first wade through a ton of noisy logs to get to the few logs of interest. The experience can range from irritating to frustrating depending on the situation at hand
With SnappyFlow, if user finds a noisy log and wants to mask it out, all he the user has to do is to “hide” a log or set of logs and they will be removed from the log view
Depending on the problem that a user is troubleshooting, the user’s field of interest is a finite set of logs. These logs of interest vary based on the problem. User would ideally like to see the trail of these logs of interest, i.e., when, where & how many, and easily mask-out everything else
This is not possible in most log management solutions and the workflow in these solutions is fairly cumbersome. Users typically filter logs based on log levels, instance and file, after which they search for individual logs or scroll through logs to find what they are looking for. This is a time-consuming process with a big impact on resolution time
Suppose user is debugging an OOM issue and is interested in a set of 8-10 logs to understand the behavior of the application, user can group these logs into a group called “OOM” and only show logs that belong to “OOM” group. Overtime, user can create multiple such groups that correspond to playbooks of specific issues