SnappyFlow helps run a
massive big data stack at peak performance

Many unanswered questions

  • Why is an app running slowly? Is it due to the cluster or the data?
  • What caused erratic performance issues such as node failures or high latency?
  • Why were the apps crashing?
  • Why does an app run on one cluster but not on another?
  • Why is the cluster not performing to the expected SLA?
  • How can capacity utilization be optimized and managed?

And a nightmare for troubleshooters

Where to look and what to look for?

  • Information was spread across multiple Big Data platform components
  • Multiple independent dashboards for Oozie, Yarn, MapReduce, Spark, Name Node stats, Yarn stats and Linux metrics

Server issue? Yes, but which server?

  • Were the servers healthy when running jobs?
  • Which servers to isolate? Was there any unusual activity?

More logs but less joy

  • Too many verbose logs
  • No information on relevancy or correlation

How SnappyFlow helped

Data Ingestion
  • Ingest 5-10 TB of data per day with a 1-year retention
  • Plugin suite for Linux, Name Node, Resource Manager, Yarn, Oozie Service, Hive and Hadoop logs
  • Cost-effective data management compared to alternatives
Key Analysis

Run Comparisons

Compare the same workflow across its multiple runs: trends in data size vs. runtime and CPU efficiency
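As an illustration of this kind of run-over-run comparison, the sketch below derives runtime-per-GB and CPU efficiency from per-run metrics. The run records and field names are hypothetical examples, not SnappyFlow's data model or API:

```python
# Illustrative only: hypothetical per-run metrics, not SnappyFlow data.
runs = [
    {"run": 1, "data_gb": 800, "runtime_min": 42,
     "cpu_core_min_used": 5000, "cpu_core_min_allocated": 8000},
    {"run": 2, "data_gb": 950, "runtime_min": 61,
     "cpu_core_min_used": 5400, "cpu_core_min_allocated": 9000},
    {"run": 3, "data_gb": 990, "runtime_min": 95,
     "cpu_core_min_used": 5600, "cpu_core_min_allocated": 12000},
]

def compare_runs(runs):
    """Return per-run minutes-per-GB and CPU efficiency (used / allocated)."""
    return [
        {
            "run": r["run"],
            "min_per_gb": r["runtime_min"] / r["data_gb"],
            "cpu_efficiency": r["cpu_core_min_used"] / r["cpu_core_min_allocated"],
        }
        for r in runs
    ]

for row in compare_runs(runs):
    print(f"run {row['run']}: {row['min_per_gb']:.3f} min/GB, "
          f"CPU efficiency {row['cpu_efficiency']:.0%}")
```

A rising minutes-per-GB trend across runs, as in this example data, points at a regression that is worth correlating with cluster-level metrics.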

Workflow /Cluster performance correlation

How did the Hadoop services and Hadoop nodes affect workflow performance?

Analyze performance of a specific workflow action across runs

How did a specific workflow action perform across runs?

Workflow Gantt Chart

Illustrates the progress of workflow actions and child jobs

Node Performance analysis

Which nodes were used to run the app, and how did these nodes typically perform for the same workflow across runs?

Comparison of workflow with a baseline

Select a baseline workflow and compare it with other, poorly performing workflows

Map and Reduce Job analysis

Identify stragglers in map jobs, data spread across jobs, shuffle performance and GC performance
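Straggler detection of the kind described above is often done by flagging tasks that run far longer than their peers. The sketch below uses a common heuristic (duration above 1.5x the median); the task durations are hypothetical, and this is not SnappyFlow's internal algorithm:

```python
from statistics import median

# Hypothetical map-task durations in seconds; not real job data.
task_durations = [30, 32, 29, 31, 210, 33, 28, 198]

def find_stragglers(durations, factor=1.5):
    """Flag tasks whose duration exceeds factor x the median duration."""
    m = median(durations)
    return [(i, d) for i, d in enumerate(durations) if d > factor * m]

print(find_stragglers(task_durations))  # tasks 4 and 7 stand out
```

In this example the two ~200-second tasks dominate job latency even though the other map tasks finish in about 30 seconds, which is exactly the pattern a straggler report surfaces.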

Bringing significant benefits

Lower resolution time

Order of magnitude reduction in resolution time through improved diagnostics

Improved capacity planning

Accurate assessment of the capacity needs of jobs allows improved scheduling as well as infrastructure planning

Lower CapEx need

Accurate sizing of job capacity needs enables better long-term infrastructure planning and lowers capital expenditure

Get in touch

Or fill in the form below and we will get back to you!

Is SnappyFlow right for you?