Experiences in Optimizing Cloud Costs for AI
As AI is added to the application mix, cloud services bills are growing, leading to efforts to manage the costs; new-generation tools are helping
By John P. Desmond, Editor, AI in Business

Increased reliance on cloud services, especially as more AI is added to the application mix for many enterprises, translates to growing bills for cloud services. Companies are exploring the use of automated tools to help manage the costs of cloud computing.
For example, at Phlexglobal, a provider of automation and AI services for clinical and regulatory functions of pharmaceutical companies, cloud engineers have been exploring ways to achieve greater efficiencies. “AI and machine learning are a big part of our technical strategy,” stated Alex Potter-Dixon, VP, cloud engineering and operations at Phlexglobal, in a case study published on the site of Cast AI. Cast AI offers optimization tools for Kubernetes, the container orchestration platform that Phlexglobal uses to deploy and manage its software systems.
The company has developed a secure and scalable cloud infrastructure. “Our clients often operate during regular business hours, so we need the capability to meet their demands during the day and scale down outside of business hours,” stated Potter-Dixon.
Phlexglobal uses the Azure Kubernetes Service (AKS) from Microsoft, which it has found beneficial for quickly “spinning up” new Kubernetes clusters needed by clients. “I want it all to be automated,” Potter-Dixon stated. “I don’t want to deal with it on a day-to-day basis.”
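The business-hours pattern Potter-Dixon describes can be illustrated with a simple scheduling rule. What follows is a minimal sketch, not Phlexglobal's or Cast AI's actual implementation; the function name, the 8:00 to 18:00 weekday window, and the node counts are all hypothetical values chosen for illustration.

```python
from datetime import datetime

# Hypothetical settings for illustration; none come from the case study.
BUSINESS_START_HOUR = 8   # inclusive, local time
BUSINESS_END_HOUR = 18    # exclusive, local time
DAYTIME_NODES = 10        # capacity while clients are working
OFF_HOURS_NODES = 2       # reduced capacity nights and weekends


def desired_node_count(now: datetime) -> int:
    """Return the target node count for the given local time."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_hours = BUSINESS_START_HOUR <= now.hour < BUSINESS_END_HOUR
    return DAYTIME_NODES if (is_weekday and in_hours) else OFF_HOURS_NODES


if __name__ == "__main__":
    print(desired_node_count(datetime.now()))
```

In practice a rule like this would feed an autoscaler rather than be run by hand, which is the kind of automation Potter-Dixon says he wants.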
Understanding cloud costs is important to any effort to manage them. An examination of cloud invoices can help track costs of cloud resources. “However, cloud billing and invoicing can be complex and challenging, especially if you have multiple cloud providers and different pricing models,” stated Samir Ranjan Bhol, a delivery partner with Kyndryl, a multinational IT infrastructure service provider headquartered in New York City, in an account published on LinkedIn.
Among best practices in cloud cost control, he recommends:
Choosing the right pricing model for your company, whether fixed, variable, tiered or usage-based;
Automating billing and invoicing;
Providing detailed and transparent bills to customers;
Monitoring invoice and billing data and seeking feedback to improve.
On pricing models, Bhol stated, “Each model has its own advantages and disadvantages, and you should weigh them carefully before deciding. You should also communicate your pricing model clearly to your customers, and explain how it affects their bills and invoices.”
Also, “Establish clear cost allocation by tagging cloud resources accurately to identify usage by teams or projects. Implement robust budgeting, setting spending limits and alerts. Automate usage tracking and reporting to gain real-time insights into expenses.”
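The tagging and budgeting advice can be made concrete with a small example. This is a minimal sketch under stated assumptions: the line items, the "team" tag key, and the budget figures are hypothetical, and a real setup would read this data from a cloud provider's billing export rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical billing line items: (resource_id, tags, monthly cost in USD).
LINE_ITEMS = [
    ("vm-001", {"team": "data-science"}, 420.0),
    ("vm-002", {"team": "platform"}, 180.0),
    ("db-001", {"team": "data-science"}, 250.0),
    ("vm-003", {}, 75.0),  # untagged resource, cannot be attributed
]

# Hypothetical monthly spending limits per team, in USD.
BUDGETS = {"data-science": 600.0, "platform": 300.0}


def allocate_by_tag(line_items, tag_key="team"):
    """Sum cost per tag value; resources missing the tag go to 'untagged'."""
    totals = defaultdict(float)
    for _resource_id, tags, cost in line_items:
        totals[tags.get(tag_key, "untagged")] += cost
    return dict(totals)


def over_budget(totals, budgets):
    """Return the sorted names of teams whose spend exceeds their budget."""
    return sorted(
        team for team, spent in totals.items()
        if team in budgets and spent > budgets[team]
    )


if __name__ == "__main__":
    totals = allocate_by_tag(LINE_ITEMS)
    print(totals)
    print("Over budget:", over_budget(totals, BUDGETS))
```

The "untagged" bucket is kept visible on purpose: the size of that bucket is a rough measure of how accurate the tagging is.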
To reduce costs for predictable workloads, “leverage reserved instances or savings plans,” and “Embrace pay-as-you-go models and rightsize instances to match workload requirements,” he suggests. To create cost transparency among teams, he suggests implementing chargeback mechanisms, followed by reviews to optimize usage and remove unused resources. “By incorporating these practices, organizations can effectively manage cloud costs,” Bhol stated.
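The choice between a commitment and pay-as-you-go comes down to utilization. The sketch below assumes a simplified model in which a reserved instance is billed for every hour of the month whether or not it is used, while pay-as-you-go is billed only for hours used; the hourly rates and function names are hypothetical and do not reflect any provider's actual pricing.

```python
HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def monthly_cost_pay_as_you_go(hourly_rate, hours_used):
    """Pay-as-you-go: billed only for the hours actually used."""
    return hourly_rate * hours_used


def monthly_cost_reserved(reserved_hourly_rate):
    """Reserved (simplified): billed for every hour, used or not."""
    return reserved_hourly_rate * HOURS_PER_MONTH


def break_even_utilization(pay_as_you_go_rate, reserved_rate):
    """Fraction of the month above which the reservation is cheaper."""
    return reserved_rate / pay_as_you_go_rate


def cheaper_option(pay_as_you_go_rate, reserved_rate, hours_used):
    """Name the cheaper model for the given monthly usage."""
    payg = monthly_cost_pay_as_you_go(pay_as_you_go_rate, hours_used)
    reserved = monthly_cost_reserved(reserved_rate)
    return "reserved" if reserved < payg else "pay-as-you-go"


if __name__ == "__main__":
    # Hypothetical rates in USD per hour.
    print(break_even_utilization(0.10, 0.05))
    print(cheaper_option(0.10, 0.05, hours_used=200))
    print(cheaper_option(0.10, 0.05, hours_used=600))
```

Under these assumptions, a workload that runs most of the month favors the reservation, while one that runs only part of the time favors pay-as-you-go.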
Harry Mylonas, an AWS specialist based in the Netherlands, added this to Bhol’s LinkedIn post: “A dimension missed in the article is internal customers, internal billing/charge-back: Transform your public cloud provider bill to business meaningful billing, using cost allocation tags, insights, third-party internal data sources, so that the internal bill focuses on projects … with the option to drill down to the actual building blocks. Not a task for the faint at heart, when considering shared cloud services, but one that has the maximum business benefits.”
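The difficulty Mylonas points to with shared services is deciding who pays for them. One common approach, shown here only as an illustrative sketch and not as a method taken from his comment, is to split the shared cost among projects in proportion to each project's direct spend. The project names and amounts used with it would be hypothetical.

```python
def chargeback(direct_costs, shared_cost):
    """Build an internal bill: direct cost plus a proportional share of shared cost.

    direct_costs maps project name to its directly attributable spend.
    shared_cost is the spend on services used by all projects.
    """
    if not direct_costs:
        return {}
    total_direct = sum(direct_costs.values())
    if total_direct == 0:
        # No direct spend to weight by, so fall back to an even split.
        even_share = shared_cost / len(direct_costs)
        return {project: even_share for project in direct_costs}
    return {
        project: cost + shared_cost * cost / total_direct
        for project, cost in direct_costs.items()
    }


if __name__ == "__main__":
    # Hypothetical figures in USD.
    print(chargeback({"project-a": 300.0, "project-b": 100.0}, shared_cost=200.0))
```

Proportional splitting is only one possible policy; an even split or a usage-metered split would change the internal bills without changing the provider invoice.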
Cloud Billing Mechanics from FinOps Foundation
The FinOps Foundation is a nonprofit project of the Linux Foundation that offers training and certification, including sessions on cloud cost forecasting and managing cloud cost anomalies. A module on Cloud Billing Mechanics offers to bring participants up to speed on how organizations incur charges for using cloud services, reconcile charges with cloud usage, pay suppliers, and use billing data to optimize the business value of cloud.
From the training, participants will learn: the key elements of the billing relationship between customer and supplier; why an understanding of cloud billing mechanics is important for FinOps; how cloud billing differs from traditional IT billing; and best practices for managing and using cloud billing data.
How UK Mortgage Provider finova Manages Cloud Costs
Finova, a leading mortgage provider in the UK, has 200 financial institutions using its platform, encompassing 60 lenders and 3,000 brokers. The company has committed to Microsoft Azure as its primary cloud provider. At any given time, finova’s development team is running hundreds of Azure virtual machines (VMs), the on-demand computing resources that Azure offers as Infrastructure as a Service.
Finova initially relied on Azure’s pay-as-you-go cost model for VMs, but as the company grew, senior management became more concerned about the increasing cloud costs and sought new ways to keep track of spending and resource requirements, according to a case study on the site of Spot by NetApp.
Richard Marsh, finova’s director of operations, led an initiative to assess the sprawling cloud environments and reduce the number of VMs, compute use hours, and developers’ unsupervised ability to instantiate and scale up more servers. The initiative reduced finova’s VM count to just over 100, down from 1,200. Finova then focused on minimizing the cost of the remaining VMs, which required a shift in the company’s cloud computing consumption model, one that finova needed a trusted partner to help navigate.
Finova had a long-standing relationship with Crayon, a UK-based Microsoft Cloud Solutions Provider that assists clients with plans to optimize and manage their IT complex through the entire lifecycle. Crayon helped search for a solution and recommended the Elastigroup cloud infrastructure automation service from Spot by NetApp, according to the case study.
Crayon suggested that finova deploy Elastigroup into their development and test environment first, given that this environment was one of finova’s main sources of compute spend. Designed to support a single-VM use case, Elastigroup’s AI/ML-driven automation enables stateful workloads to run on Azure spot VMs with the reliability of pay-as-you-go VMs. This results in substantial savings compared to the pay-as-you-go option finova previously used.
“Regardless of whether you are a developer, tester, or engineer, having consistent environments and stable compute levels are important for us, and we achieved that with Elastigroup,” stated Marsh. The result has been a savings of close to 70 percent of what had been the company’s annualized Azure compute spend for their development and test environment.
Through the Elastigroup console, developers can see how many VMs they are using versus the capacity they need. “Developers want as much compute power as they can have,” Marsh stated. “Elastigroup allows us to offer developers more compute power at less cost for their projects. This makes developers very happy and allows us to test a wider range of scenarios during the QA phase.”
Expect more cloud management services on the market in coming months.
Read the source articles and information on the site of Cast AI, on LinkedIn, and at Cloud Billing Mechanics from the FinOps Foundation.