Improve triaging and reduce log management spend

What is SnappyFlow?

SnappyFlow is a comprehensive monitoring and log management solution that addresses the needs of today’s cloud-native applications. This blog highlights SnappyFlow’s signature analysis feature, which helps users troubleshoot more effectively, cut noise and reduce log storage costs.

Monitoring

  • Infrastructure monitoring
  • Application monitoring
  • Kubernetes monitoring
  • Cloud services monitoring

Log Management

  • Out-of-box standard parsers
  • Search & Analytics
  • Feature extraction at ingest
  • Signature based filtering

APM

  • Trace services, transactions, spans
  • Multi-service analysis
  • Asynchronous analysis
  • Anomalous span analysis
  • Jaeger integration

Dashboards

  • Powerful dashboard builder
  • Pre-built dashboards & auto recommendation
  • Rich correlation within application context

Alerts

  • Pre-built alert library with auto-recommendation
  • Auto-thresholding
  • Integration with multiple notification systems
  • Noise reduction constructs

Easy On-boarding

  • sfAgent, sfPoller, sfPod
  • Single agent for Metrics, Logs, Tracing
  • Simple discovery & configuration
  • Multi-cloud support

What is the problem we are trying to solve?

Logs provide valuable insights about an application. They help troubleshoot issues, track user access, understand how the application’s features are used, track load patterns and so on. Consequently, log analysis and log management solutions have become a “must-have” in an SRE’s tool repertoire, all the more so with the growing complexity of the cloud-native stacks that SREs need to manage.

There are several good log management solutions in the market but most have two pronounced drawbacks:

  • Triaging issues is not easy. Users have to find a trail of relevant logs amid a deluge of noise, and the time spent cutting through that noise directly increases resolution time
  • As the size of deployments, the number of deployments and the load grow, the volume of logs grows rapidly, and the cost grows with it

What are Log Signatures in SnappyFlow?

Log Signature is a unique feature in SnappyFlow that is used to filter out noisy logs, which improves triaging and reduces log storage costs.

A Signature is a string pattern present in a log that uniquely identifies it. String patterns used to define signatures can contain the variables $w and $i. The variable $w represents a word consisting of alphanumeric characters, and $i represents a decimal number.

For example, the signature “missed heartbeat from $w@” would uniquely identify these logs:

"missed heartbeat from provision@stage-apm-sfapm-apm-celery-provision-5579fbffc9-st9dm"
"missed heartbeat from notify@stage-apm-sfapm-apm-celery-notify-78b46bd6cc-85shb"
"missed heartbeat from default@stage-apm-sfapm-apm-celery-default-6c44857687-4zx2t"
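One way to picture how such a pattern matches is to translate it into a regular expression. The sketch below is not SnappyFlow’s implementation: the function name signature_to_regex is hypothetical, and it assumes that $w maps to one or more ASCII letters or digits and $i to one or more digits.

```python
import re

def signature_to_regex(signature: str) -> re.Pattern:
    """Translate a signature string into a compiled regex (illustrative only)."""
    # Split on the two variables, keeping them as separate tokens.
    parts = re.split(r"(\$w|\$i)", signature)
    pieces = []
    for part in parts:
        if part == "$w":
            pieces.append(r"[A-Za-z0-9]+")   # assumed meaning of "alphanumeric word"
        elif part == "$i":
            pieces.append(r"\d+")            # assumed meaning of "decimal number"
        else:
            pieces.append(re.escape(part))   # everything else is literal text
    return re.compile("".join(pieces))

pattern = signature_to_regex("missed heartbeat from $w@")
logs = [
    "missed heartbeat from provision@stage-apm-sfapm-apm-celery-provision-5579fbffc9-st9dm",
    "missed heartbeat from notify@stage-apm-sfapm-apm-celery-notify-78b46bd6cc-85shb",
    "missed heartbeat from default@stage-apm-sfapm-apm-celery-default-6c44857687-4zx2t",
]
for line in logs:
    print(bool(pattern.search(line)), line)
```

The literal “@” after $w is what keeps the signature specific: the word must be followed immediately by “@” for the log to match.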

A Signature Group is a set of signatures that are related to a problem or a workflow.

Users can perform the following operations related to Log Signatures in SnappyFlow:

  • Add or delete a signature
  • Group multiple signatures into a group
  • Get volume statistics of logs based on a signature or a group
  • Hide or Unhide logs belonging to a signature or a group
  • Show only logs belonging to a signature or a group
  • Stop or Restart collection of logs to primary store that belong to a signature or a group
  • Stop or Restart collection of logs to archive that belong to a signature or a group

SnappyFlow’s overall Signature Analysis flow is described below:

[Figure: Log Forwarders]

So how do Signatures help users?

A large proportion of logs are of very little or no value. Many of these just add to volume and cost. We have seen situations where 80-95% of logs may belong to this category. Users live with them because it is not easy to selectively turn them off at the source.

With SnappyFlow, users can turn the collection of a log on or off with a single click. In the example below, just 2 clicks turn off 2 logs that have very little value yet take up 40% of storage space. If the user does want to retain a log for future use, it can still be stored in the archive with 10-40x compression, and the archive can be searched as needed.

[Figure: Log Compression]
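To get a feel for these numbers, the arithmetic can be sketched as follows. The 100 GB daily volume is a made-up figure for illustration and the function name is hypothetical; only the 40% share and the 10-40x compression range come from the text above.

```python
def storage_after_archiving(daily_gb, noisy_fraction, compression_ratio):
    """Return (primary_gb, archive_gb) once noisy logs are moved to the archive."""
    noisy_gb = daily_gb * noisy_fraction       # volume no longer sent to the primary store
    primary_gb = daily_gb - noisy_gb           # what remains in the primary store
    archive_gb = noisy_gb / compression_ratio  # compressed size of the archived logs
    return primary_gb, archive_gb

# Hypothetical deployment producing 100 GB of logs per day.
for ratio in (10, 40):
    primary, archive = storage_after_archiving(100.0, 0.40, ratio)
    print(f"{ratio}x compression: primary {primary:.1f} GB, archive {archive:.1f} GB")
```

Under these assumptions, the primary store shrinks by the full 40%, while the archived copy takes up only a small fraction of the space the same logs would need uncompressed.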

  • When a critical issue occurs and an SRE is racing to troubleshoot it, the SRE first has to wade through a ton of noisy logs to get to the few logs of interest. The experience can range from irritating to frustrating depending on the situation at hand

    With SnappyFlow, if a user finds a noisy log and wants to mask it out, all the user has to do is “hide” the log or set of logs, and they will be removed from the log view

  • Depending on the problem being troubleshot, the user’s field of interest is a finite set of logs, and this set varies from problem to problem. The user would ideally like to see the trail of these logs of interest, i.e., when, where and how many, and easily mask out everything else

    This is not possible in most log management solutions, where the workflow is fairly cumbersome. Users typically filter logs based on log level, instance and file, after which they search for individual logs or scroll through logs to find what they are looking for. This is a time-consuming process with a big impact on resolution time

    Suppose a user is debugging an OOM issue and is interested in a set of 8-10 logs to understand the behavior of the application. The user can group these logs into a group called “OOM” and show only the logs that belong to the “OOM” group. Over time, the user can create multiple such groups that correspond to playbooks for specific issues
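The hide and show-only behaviour of signature groups can be illustrated with a small sketch. This is not SnappyFlow’s code: the function names, the group contents and all sample log lines except the heartbeat one are hypothetical, and $w and $i are assumed to mean an ASCII alphanumeric word and a run of digits.

```python
import re

_TOKENS = {"$w": r"[A-Za-z0-9]+", "$i": r"\d+"}  # assumed variable meanings

def compile_signature(signature):
    # Variables become regex fragments; all other text is matched literally.
    parts = re.split(r"(\$w|\$i)", signature)
    return re.compile("".join(_TOKENS.get(p) or re.escape(p) for p in parts))

def filter_view(lines, groups, show_only=None, hidden=()):
    """Return the log view after hiding groups and/or showing only one group."""
    compiled = {name: [compile_signature(s) for s in sigs]
                for name, sigs in groups.items()}

    def in_group(line, name):
        return any(p.search(line) for p in compiled[name])

    view = []
    for line in lines:
        if any(in_group(line, g) for g in hidden):
            continue  # hidden groups are masked out of the view
        if show_only is not None and not in_group(line, show_only):
            continue  # in show-only mode, everything outside the group is dropped
        view.append(line)
    return view

# Hypothetical groups; a real "OOM" group would hold the 8-10 signatures of interest.
groups = {
    "OOM": ["Out of memory: Kill process $i", "java.lang.OutOfMemoryError"],
    "heartbeat": ["missed heartbeat from $w@"],
}
lines = [
    "missed heartbeat from provision@stage-apm-sfapm-apm-celery-provision-5579fbffc9-st9dm",
    "Out of memory: Kill process 4121 (java)",
    "GET /health 200",
    "java.lang.OutOfMemoryError: Java heap space",
]
for line in filter_view(lines, groups, show_only="OOM"):
    print(line)
```

Calling filter_view(lines, groups, hidden=("heartbeat",)) instead would keep everything except the heartbeat log, which corresponds to the “hide” operation.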
