Rob Faraj, Co-Founder of Kubecost, an open-source tool delivering Kubernetes cost monitoring and management at scale, outlines the best practices for implementing Kubernetes cost monitoring.
Kubernetes overspend is an easy trap to fall into: costly resources such as GPUs are simple to provision, and tools like cluster autoscaling provision resources programmatically.
Particularly as Kubernetes scales (and particularly in multi-tenant environments), small oversights and bugs can quickly result in larger-than-necessary bills. To get a sense of how big, the recent and first-of-its-kind FinOps for Kubernetes Report put out by the Cloud Native Computing Foundation (CNCF) is worth reading.
On a more positive note, software engineers naturally reduce Kubernetes spending simply by having insight into the spend. The cultural shift of having people outside of finance care about costs is important. Sensible cost monitoring processes and accountability can rapidly yield meaningful savings. Making developers more cognizant of the resources they utilize boosts not just cost efficiency, but productivity and security as well.
Cost monitoring methods
Even small steps for software engineers toward monitoring Kubernetes costs have a swift and beneficial impact on budgets. With more robust showback or chargeback methods and enforcement mechanisms that emphasize team accountability, organizations can further optimize infrastructure and realize greater savings.
Let’s take a look at these cost monitoring methods:
- Limited cost monitoring – One or more centralized teams (such as DevOps or finance) manage Kubernetes costs by responding to spending once they receive the bill each month, addressing any issues contributing to unnecessary costs. This method is best suited to organizations with less advanced environments and two or fewer applications engineering teams. However, organizations with larger multi-tenant environments would likely find this method unsustainable.
- Showback – Under this model, a cross-organization accounting process accurately tracks the Kubernetes and cloud spending that each team or business unit is responsible for. This detailed cost breakdown is shared directly with the responsible teams to help them understand their spend, enabling more proactive resource management. This method (like chargebacks, which I’ll get to below) is more appropriate for larger organizations with three or more applications engineering teams and 20 or more engineers.
- Chargeback – The chargeback method goes beyond showbacks to actually require teams and business units to pay the Kubernetes and cloud costs they’re responsible for out of their budgets. To introduce chargebacks effectively, an organization must first achieve broad cultural buy-in to the importance of controlling these costs.
- Hybrid cost monitoring – With this method, teams only pay for resource usage that surpasses pre-set spending limits. Alternatively, teams can be charged for certain specific resources only. As with chargebacks, this method requires complete buy-in across an organization to succeed.
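To make the difference between these methods concrete, here is a minimal Python sketch of what a team would actually be billed under each one. The function name, the method strings and the dollar figures are illustrative assumptions, not part of any particular tool.

```python
def billed_amount(method: str, allocated_cost: float, spending_limit: float = 0.0) -> float:
    """Return what a team pays out of its own budget for one billing period.

    allocated_cost: the Kubernetes and cloud cost attributed to the team.
    spending_limit: the pre-set limit used by the hybrid method (hypothetical value).
    """
    if method in ("limited", "showback"):
        # Costs are reviewed centrally or reported to the team, but never billed.
        return 0.0
    if method == "chargeback":
        # The team pays everything it is responsible for.
        return allocated_cost
    if method == "hybrid":
        # The team pays only for usage above its pre-set limit.
        return max(0.0, allocated_cost - spending_limit)
    raise ValueError(f"unknown cost monitoring method: {method}")


if __name__ == "__main__":
    for method in ("showback", "chargeback", "hybrid"):
        print(method, billed_amount(method, 12_000.0, spending_limit=10_000.0))
```

Under these assumed numbers, a team with $12,000 of allocated cost and a $10,000 limit pays nothing under showback, the full amount under chargeback, and only the overage under the hybrid method.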
Best practices for implementing Kubernetes cost monitoring
1) Work up to a chargeback strategy
A holistic understanding of cloud costs and fair expectations can’t be built in a day. It’s not unusual for teams’ initial spending reviews to include a series of surprises. Going straight to a chargeback strategy would only foster resentment among team leaders, who need time to understand the costs they’re now more acutely responsible for (and how to control them). Starting out with a limited monitoring or showback method offers teams time to vet the fairness of the costs allocated to them and to introduce cost controls thoughtfully, before the bills begin arriving.
2) Demonstrate the fairness of cost allocations through transparency
By its nature as a distributed system, Kubernetes makes cost allocation complicated. Achieving fairness and buy-in across teams requires completely transparent and reproducible allocation models. Be sure to audit these findings as well, to verify that all costs assigned to teams are correct. Importantly, allocation data must also be actionable, enabling teams to clearly and directly address their sources of overspending.
To ensure fair and actionable data, look at the following criteria:
- Consider how idle resources are allocated (often this relates to whoever makes provisioning decisions at the cluster level).
- Look at allocation of system-wide or shared resources.
- Make sure that team or business unit resource allocations are carefully delineated; for example, allocating by namespace offers a clear-cut approach.
- Decide if allocations are based on resource requests or usage (using the maximum of requests and usage is recommended, but only if teams are able to control those settings).
- Lastly, decide how to fairly bill teams responsible for costly but non-recurring jobs, such as research projects.
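As a rough illustration of several of these criteria together, the Python sketch below allocates a cluster's cost by namespace, bills each namespace on the maximum of its CPU requests and usage, and assigns whatever is left over (the idle cost) to whoever makes cluster-level provisioning decisions. It is a simplification under stated assumptions: it considers CPU only, uses a flat per-core rate, and the namespace names, the "platform-team" idle owner and all figures are hypothetical.

```python
from typing import Dict, Tuple


def allocate_by_namespace(
    cluster_cores: float,
    cluster_cost: float,
    workloads: Dict[str, Tuple[float, float]],
    idle_owner: str = "platform-team",  # hypothetical owner of idle capacity
) -> Dict[str, float]:
    """Split cluster_cost across namespaces.

    workloads maps namespace -> (requested_cores, used_cores).
    Each namespace is billed on max(requests, usage); the unallocated
    remainder is reported as idle cost against idle_owner.
    """
    if cluster_cores <= 0:
        raise ValueError("cluster_cores must be positive")
    rate_per_core = cluster_cost / cluster_cores
    costs = {
        namespace: max(requested, used) * rate_per_core
        for namespace, (requested, used) in workloads.items()
    }
    # Idle cost is whatever capacity no namespace requested or used.
    idle_cost = max(0.0, cluster_cost - sum(costs.values()))
    costs[idle_owner] = costs.get(idle_owner, 0.0) + idle_cost
    return costs


if __name__ == "__main__":
    report = allocate_by_namespace(
        cluster_cores=10,
        cluster_cost=1000.0,
        workloads={"checkout": (4.0, 2.0), "search": (1.0, 3.0)},
    )
    for owner, cost in report.items():
        print(f"{owner}: ${cost:,.2f}")
```

Because the inputs and the formula are both visible, any team can reproduce its own number, which is the kind of transparency this practice depends on. A real allocation model would also need to cover memory, storage, GPUs, shared resources and non-recurring jobs.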
3) Assign each resource a clear owner
Use an escalation approach in conjunction with an admission controller to establish the owner of each resource. To implement an escalation approach, define an owner label at the deployment, namespace and cluster levels, which gives a clear escalation path when issues occur. Open Policy Agent (OPA) or a custom admission controller webhook can enforce these labels.
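The logic behind this practice can be sketched in a few lines of Python. The sketch is a stand-in for the decision an admission policy would make, not the actual OPA or webhook API; the "owner" label key, the function names and the team names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

OWNER_LABEL = "owner"  # hypothetical label key; use whatever your organization standardizes on

Labels = Dict[str, str]


def escalation_path(deployment: Labels, namespace: Labels, cluster: Labels) -> List[str]:
    """Return owners to contact, most specific first, skipping missing or repeated owners."""
    path: List[str] = []
    for labels in (deployment, namespace, cluster):
        owner = labels.get(OWNER_LABEL, "").strip()
        if owner and owner not in path:
            path.append(owner)
    return path


def check_owner_label(labels: Labels) -> Tuple[bool, str]:
    """Mimic an admission decision: deny any resource that has no owner label."""
    if labels.get(OWNER_LABEL, "").strip():
        return True, "allowed"
    return False, f"denied: missing required label '{OWNER_LABEL}'"


if __name__ == "__main__":
    print(check_owner_label({"app": "api"}))
    print(escalation_path(
        {"owner": "checkout-team"},
        {"owner": "payments-org"},
        {"owner": "platform-team"},
    ))
```

Rejecting unlabeled resources at admission time keeps the ownership data complete, so every cost item can be traced to someone who is able to act on it.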
4) Regularly review spending data
Proactive data review (weekly is ideal, to avoid month-end surprises) enables teams to rapidly identify overspend and avoid expensive future waste. Set up automated alerts to warn teams about excessive or unusual resource usage that would result in cost overages if left unchecked.
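One simple form such an alert could take is a comparison of today's spend against a trailing average. The Python sketch below assumes daily cost figures are already available per team; the 1.5x threshold, the function name and the team name are arbitrary illustrative choices rather than recommendations.

```python
from statistics import mean
from typing import Optional, Sequence


def spend_alert(
    team: str,
    recent_daily_costs: Sequence[float],
    today_cost: float,
    threshold: float = 1.5,  # hypothetical multiplier; tune to your environment
) -> Optional[str]:
    """Return a warning if today's cost exceeds threshold x the trailing average, else None."""
    if not recent_daily_costs:
        return None  # no baseline yet, so nothing to compare against
    baseline = mean(recent_daily_costs)
    if baseline > 0 and today_cost > threshold * baseline:
        ratio = today_cost / baseline
        return (
            f"{team}: today's spend ${today_cost:,.2f} is {ratio:.1f}x "
            f"the trailing average of ${baseline:,.2f}"
        )
    return None


if __name__ == "__main__":
    message = spend_alert("checkout-team", [100.0, 110.0, 90.0], 240.0)
    if message:
        print(message)
```

A check like this can run daily and feed whatever notification channel teams already watch, so unusual usage is caught well before the weekly review.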
Cultural implementation can be the hardest part
From a technical perspective, implementing cost controls for Kubernetes spending isn’t inherently challenging. But it also isn’t effective unless the organization’s culture trusts and supports those cost management methods. By ensuring the transparency, accuracy and fairness of allocated costs, and by providing the monitoring and tooling that enable teams to make data-backed cost control decisions, organizations can build that necessary culture.
Utilizing the methods detailed above, organizations can successfully prepare teams that are more mindful of Kubernetes spending and empowered to reduce costs where feasible (likely by 30% or more in many cases), while also achieving related productivity benefits.