Instaclustr by NetApp is committed to providing you with the most accurate and actionable data on your clusters’ performance. We have recently discovered and fixed a bug that resulted in lower than actual CPU utilization being reported by Instaclustr managed application nodes.
In August 2023, Instaclustr deployed changes that introduced significantly more in-depth CPU metrics for our support staff, allowing them to investigate bottlenecks in single-threaded applications. Unfortunately, the change also introduced a regression: system CPU time was no longer included in the total CPU calculation, so total CPU utilization was underreported. The discrepancy between actual and reported total CPU usage is particularly noticeable on instances with high system CPU usage.
Upon the discovery of the regression, we immediately identified the root cause and began deploying a fix to managed instances on July 15, 2024. With this correction, CPU utilization metrics now include system CPU time in the total CPU calculation, providing an accurate representation of your CPU usage.
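To illustrate the effect of the regression, here is a minimal sketch of how leaving system time out of a total lowers the reported figure. It assumes the total is a simple sum of the busy CPU categories named in this post. The sample values and the `total_cpu` helper are hypothetical and are not Instaclustr's actual implementation.

```python
# Hypothetical per-node CPU breakdown, as percentages of total CPU time.
# The category names follow the ones listed in this post; the values are invented.
sample = {"user": 35.0, "system": 20.0, "steal": 1.0, "irq": 0.5, "nice": 0.5}


def total_cpu(breakdown, include_system=True):
    """Sum the busy CPU categories, optionally leaving out system time."""
    return sum(
        value
        for name, value in breakdown.items()
        if include_system or name != "system"
    )


# Behaviour during the regression (assumed): system time is left out of the total.
reported_before_fix = total_cpu(sample, include_system=False)

# Behaviour after the fix: system time is part of the total again.
reported_after_fix = total_cpu(sample, include_system=True)

print(f"before fix: {reported_before_fix}%  after fix: {reported_after_fix}%")
```

With these invented numbers the gap between the two totals equals the system share, which is why nodes with high system CPU usage show the largest apparent jump.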
What This Means for You
There are no actions required by customers at this time. You may have noticed an apparent increase in CPU usage reported for your clusters around the time the fix was released. We would like to assure our customers that this is not a real increase in usage but rather a correction to reflect the true values. The metrics shown will now present a comprehensive overview of CPU utilization, ensuring you have the correct information for capacity planning and performance analysis.
As a result of this fix, historical total CPU usage data recorded after August 2023 and before mid-July 2024 may appear lower than it should, because it did not include system CPU usage. All historical data for the detailed CPU utilization types (including user, system, steal, irq, and nice) is correct. Instaclustr is confident that this error has not caused any impact to application performance or alerting. However, we understand that total CPU can be a useful indicator for evaluating cluster performance and available overhead.
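If you keep your own exports of these metrics, one way to approximate the corrected historical totals is to add the system series back onto totals recorded in the affected window. The sketch below is illustrative only. It assumes the missing system time is the only difference between reported and actual totals, and it uses window boundaries taken from the dates in this post. The sample data and the `estimate_total` helper are hypothetical.

```python
from datetime import date

# Assumed affected window, taken from the dates in this post. The exact
# boundaries for a given node depend on when the change and the fix reached it.
AFFECTED_START = date(2023, 8, 1)
AFFECTED_END = date(2024, 7, 15)


def estimate_total(day, reported_total, system):
    """Add system CPU back to totals recorded during the affected window."""
    if AFFECTED_START <= day < AFFECTED_END:
        return reported_total + system
    return reported_total


# Hypothetical exported samples: (date, reported total %, system %).
history = [
    (date(2023, 7, 1), 50.0, 15.0),   # before the regression: left unchanged
    (date(2024, 1, 10), 35.0, 15.0),  # inside the window: system added back
    (date(2024, 8, 1), 52.0, 16.0),   # after the fix: left unchanged
]

estimated = [(d, estimate_total(d, total, system)) for d, total, system in history]

for day, value in estimated:
    print(day.isoformat(), value)
```

Treat the result as an estimate for capacity planning rather than an exact reconstruction, and contact Support if you need to confirm the window for a specific cluster.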
Please don’t hesitate to reach out to our Support team if you are concerned about the impact this change has on your cluster or on any future scaling plans, or if you have any further queries related to this. We can assist you in working out the best path forward. We apologize for any inconvenience caused and are taking steps to prevent similar issues from occurring in the future. Thank you for your understanding and continued support.