Profiling is a powerful tool to get inside the mind of your application and understand the root cause of bottlenecks, and it can cut troubleshooting time by an order of magnitude. However, the workflow for setting it up, operating it, and analyzing the results is usually not trivial. Setting up heap profiling in continuous mode can impact the performance of the application, whereas setting it up in on-demand mode is arduous. Users typically juggle multiple tools to trigger heap dumps, parse them, and analyze them.
To address these issues, we’ve built a CPU and Memory Profiling feature for Java right into SnappyFlow to provide a seamless monitoring experience for SREs and performance engineers. It is easy to instrument profiling into an application, trigger profiling on demand, and analyze the profiles right then and there – while remaining in the context of the application and easily accessing metrics, logs, and traces in an integrated workflow.
Memory profiling is the process of analyzing the memory used by a Java process at a given point in time. To understand memory profiling better, let us look at how the Java Virtual Machine (which runs the Java process) handles memory. The heap is where the JVM stores referenced objects as they are created; the heap grows (up to the predefined maximum heap size) and shrinks during runtime. A heap dump is a snapshot of the Java heap at a given point in time. This snapshot contains information on the different objects and classes and their individual memory usage at the moment the dump was triggered. A heap dump can be triggered manually, automated on OutOfMemory errors, or requested on demand by a heap analysis tool. Analyzing a heap dump helps developers pinpoint specific issues in the code, such as large data structures or unused (but still referenced) objects occupying memory.
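As an illustration of how an on-demand heap dump can be triggered (independently of any particular tool), HotSpot JVMs expose a diagnostic MXBean for exactly this; the class and file names below are illustrative:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumpDemo {
    public static void main(String[] args) throws Exception {
        // Obtain the HotSpot diagnostic MXBean exposed by the running JVM
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Snapshot only live (reachable) objects into an .hprof file;
        // recent JDKs require the file name to end in ".hprof"
        String file = "demo-heap.hprof";
        diagnostics.dumpHeap(file, true);
        System.out.println("Heap dump written: "
                + new java.io.File(file).length() + " bytes");
    }
}
```

The same dump can be taken from the command line with `jmap -dump:live,format=b,file=heap.hprof <pid>`, which is how it is typically done against a running production process.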
CPU profiling provides thread-level CPU usage and helps identify the threads and code paths that consume the most CPU time.
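To sketch what thread-level CPU accounting looks like at the JVM level (independent of any particular profiler), the standard `ThreadMXBean` reports per-thread CPU time; the class name below is illustrative:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadCpuDemo {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (long id : threads.getAllThreadIds()) {
            ThreadInfo info = threads.getThreadInfo(id);
            if (info == null) continue; // thread exited in the meantime
            // CPU time in nanoseconds; -1 if measurement is unsupported/disabled
            long cpuNanos = threads.getThreadCpuTime(id);
            System.out.printf("%-24s %12d ns%n", info.getThreadName(), cpuNanos);
        }
    }
}
```

A sampling profiler essentially collects this kind of data, plus stack traces, at a regular interval to attribute CPU time to methods.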
Memory management in Java is handled by the JVM garbage collector – a big reason for Java’s growth and popularity. While garbage collection is generally efficient and automatic, it is quite common for applications to suffer from crippling memory leaks and out-of-memory errors. Over-reliance on the garbage collector, poor handling of object references, and misconfigured heap sizes are typical causes of memory leaks.
An OutOfMemoryError occurs when the allocated heap is not large enough to hold all the referenced objects, or when a bug in the code makes some objects grow far too large. It is important to note that garbage collection frees only objects that are no longer referenced; it will not clear objects that are still in use. This means nothing stops a reachable object from growing until it exhausts the heap and triggers a runtime error.
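A minimal, hypothetical illustration of this pattern: a static collection keeps every added object reachable, so the garbage collector can never reclaim them and the heap grows until the limit is hit. The class name and sizes are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class LeakDemo {
    // A static collection stays reachable for the lifetime of the class,
    // so everything added to it is never eligible for garbage collection
    private static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest() {
        // Per-request data cached "temporarily" but never evicted
        CACHE.add(new byte[1024 * 1024]); // ~1 MB retained per call
    }

    public static void main(String[] args) {
        for (int i = 0; i < 20; i++) handleRequest();
        // With a small enough -Xmx, a loop like this eventually ends in
        // java.lang.OutOfMemoryError: Java heap space
        System.out.println("Retained: " + CACHE.size() + " MB");
    }
}
```

In a heap dump of such a process, the `ArrayList` and its backing `byte[]` entries would show up as the dominant retained memory, pointing straight at the leaking code path.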
A very large heap size can seemingly stave off OutOfMemory issues, at the expense of higher memory requirements at the infrastructure level, but it does not give a clear picture of what is causing the error. The JVM also allows fine-tuning garbage collection to suit the application – the number of parallel collection threads, parallel garbage collection for scavenges/full collections, the old/new generation size ratio, and the eden/survivor space ratio. A larger heap increases the execution time of each garbage collection but decreases the number of collections; a smaller heap decreases execution time but increases the number of collections.
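The tuning knobs mentioned above correspond to standard HotSpot command-line flags; the values and the `app.jar` name below are placeholders, not recommendations:

```shell
# Illustrative HotSpot sizing/GC flags (placeholder values):
#   -Xms / -Xmx              initial and maximum heap size
#   -XX:NewRatio             old-to-young generation size ratio
#   -XX:SurvivorRatio        eden-to-survivor space ratio
#   -XX:ParallelGCThreads    number of parallel GC worker threads
java -Xms512m -Xmx2g -XX:NewRatio=2 -XX:SurvivorRatio=8 \
     -XX:ParallelGCThreads=4 -jar app.jar
```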
During the app development and testing phases, OutOfMemory errors occur quite often, and when they do, they can be identified and plugged given the luxury of time. In production, however, these errors tend to surface only after prolonged application run time, and once they occur, the issue needs to be identified and plugged as soon as possible. In general, memory leaks are very gradual and go unnoticed during the dev/testing phase.
In a microservices architecture with multiple applications running in parallel, the overall performance is determined by the aggregate performance of every single application. Thus, it becomes important to drill down to the individual process level to troubleshoot performance issues.
There are many standalone tools for heap dump analysis and profiling, such as VisualVM, JProfiler, and Eclipse Memory Analyzer. While these tools are powerful in their own right, they have some major shortcomings:
In a typical SRE use case, the troubleshooting workflow starts with APM tracing data to identify bottlenecks. Once a process is identified as slow or stuck, a heap dump analysis and profiling of that process helps us drill down. In such scenarios, the ability to quickly shift between tracing, heap dump, and profiling data can significantly improve troubleshooting times.