When the Bill Comes Due, GenAI is Pricey
As early users learn how GenAI providers price their products, they seek ways to optimize, manage costs, and perhaps switch to a new provider, as Latitude did
By John P. Desmond, Editor, AI in Business

The bills coming due from consumption-based generative AI services are bound to be daunting for companies trying the technology, but at this early stage of experience with generative AI pricing, the size of those bills can be hard to predict.
That was the drift of a recent account in ZDNet based on reporting from the Dreamforce event held this week in San Francisco by Salesforce. "We're in the early stages, so all companies are wrangling with these issues," stated Gavin Barfield, VP and CTO for Salesforce ASEAN solutions, making a comparison to the early days of cloud service pricing.
"As the market and products mature, these things will get ironed out," he stated. Salesforce is examining a range of pricing models and for now has chosen a credits-based system for some services, with the amount of credits consumed depending on how the AI model is called to run the query.
The adoption of generative AI tools within an organization can grow organically, leaving management with little awareness of the rate of consumption, stated Tim Dillon, founder and director of Tech Research Asia. His research suggests that some 40 percent of organizations in the Asia-Pacific and Japan region have only informal policies around the tools, while 60 percent have formal policies.
Good Practice to Monitor Consumption
Jan Morgenthal, chief digital officer of M1, a telecommunications company based in Singapore, told ZDNet that it is critical for companies to monitor their consumption of generative AI. M1 is using several AI tools from various vendors, including Salesforce, to monitor consumption. His organization attempts to put a dollar value on a search, in an attempt to manage the number of queries made.
Having a dollar value for the ROI of a query, for instance, will enable him to manage how many gen AI queries should be made. In many instances the automation may not be worth the cost, once expenses such as acquiring the needed data are factored in, he suggested.
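That decision rule can be sketched in a few lines. This is an illustrative sketch, not M1's actual system, and all figures in it are hypothetical: run a query only when the dollar value assigned to the answer covers the query cost plus any data-acquisition cost.

```python
# Hypothetical sketch of a per-query ROI gate: compare the estimated
# value of an answer against the cost of producing it.

def worth_querying(value_per_answer: float,
                   cost_per_query: float,
                   extra_data_cost: float = 0.0) -> bool:
    """Return True when the expected value covers query plus data costs."""
    return value_per_answer > cost_per_query + extra_data_cost

# A $0.04 query yielding an answer worth $0.10 clears the bar...
print(worth_querying(0.10, 0.04))           # True
# ...but not if acquiring the needed data adds another $0.08.
print(worth_querying(0.10, 0.04, 0.08))     # False
```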
Among the new gen AI offerings Salesforce unveiled at the event is Einstein Copilot, a conversational AI assistant said to be capable of integrating with any Salesforce application. Responses are generated via Salesforce Data Cloud, previously called Genie, which can pull from customer data, telemetry data, and even Slack conversations to create a view of the customer.
Data Cloud currently processes 30 trillion transactions per month and connects 100 billion records daily, according to Salesforce. The data engine is now integrated with the Einstein 1 Platform, which enables businesses to apply AI, automation, and analytics to every customer experience. Lucky customers! It even enables Einstein Copilot to provide options for additional actions, such as a recommended action plan after a sales call.
GenAI Model Example Pricing for Prompts and Completion
An explanation of gen AI pricing recently published by Acceleration Economy, a Scottsdale, Arizona-based advisory and event services firm run by practitioners, emphasized the importance of tokens as the basis of pricing by computational consumption.
Tokens in this context are basic units of text or code the LLMs use to process and generate language, stated the author, Toni Witt, technology analyst and entrepreneur. These can be individual characters, words or parts of a sentence. A rule of thumb is that 1,000 tokens is about 750 words in English. A prompt is the text instruction given to a model, and completion is the response of the model.
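The rule of thumb above can be turned into a quick back-of-the-envelope estimator. This is only the article's rough ratio, not an actual tokenizer, so treat the results as approximations:

```python
# Rough token estimate using the rule of thumb from the article:
# 1,000 tokens is about 750 English words (~0.75 words per token).
# A real tokenizer will give different counts for specific text.

WORDS_PER_TOKEN = 0.75  # rule-of-thumb ratio, not an exact measure

def estimate_tokens(word_count: int) -> int:
    return round(word_count / WORDS_PER_TOKEN)

print(estimate_tokens(750))   # → 1000
print(estimate_tokens(1500))  # → 2000
```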
In a pricing example from Anthropic, a competitor of OpenAI, a prompt for the Claude Instant family of models is priced at $1.63 per million tokens, and completion is priced at $5.51 per million tokens. A prompt for the company’s Claude-v1 model is priced at $11.02 per million tokens, and completion is priced at $32.58 per million tokens. The author recommends trying a prompt in a test environment to see which model delivers best value.
“You will notice that Anthropic charges per Prompt and per Completion. This means for every interaction with an LLM, you will be charged for the length of the input you give, as well as the length of the output,” Witt stated. “This double charge is very common across LLM providers, and it’s worth noting that the Prompt charge is always less than the Completion charge.”
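The double charge amounts to a simple formula: prompt tokens at the prompt rate plus completion tokens at the completion rate. A minimal sketch, using the Claude Instant rates quoted above (dollars per million tokens) and a hypothetical query size:

```python
# Cost of one LLM interaction under prompt/completion pricing.
# Rates are in dollars per million tokens, as quoted in the article.

def interaction_cost(prompt_tokens: int, completion_tokens: int,
                     prompt_rate_per_m: float,
                     completion_rate_per_m: float) -> float:
    return (prompt_tokens * prompt_rate_per_m
            + completion_tokens * completion_rate_per_m) / 1_000_000

# A hypothetical 2,000-token prompt with a 500-token completion
# on Claude Instant ($1.63 prompt, $5.51 completion per million):
cost = interaction_cost(2_000, 500, 1.63, 5.51)
print(f"${cost:.6f}")  # → $0.006015
```

Small per-interaction numbers like this are why costs surprise teams only at volume: millions of queries multiply fractions of a cent into substantial monthly bills.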
In an example from Microsoft’s Azure OpenAI Service for a fine-tuned model, pricing is by hours deployed and by tokens consumed. With a price per token of $0.002 and 120 hours deployed at a price per hour of $0.24, total charges are $69.80. The pricing is similar to that of other Azure services, of which many are offered, including sentiment analysis. Pricing is in chunks of 1,000 characters; model training is priced by the hour; and for services such as computer vision, pricing is by the “transaction,” similar to an API call.
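The fine-tuned-model bill is the sum of a flat hosting charge and a usage charge. The article does not state the token volume behind the $69.80 total; 20,500 tokens is the figure that reproduces it at the quoted rates, so that assumed count appears in this sketch:

```python
# Sketch of the hours-plus-tokens bill described above. The 20,500-token
# volume is an assumption: it is the count that makes the quoted rates
# ($0.24/hour for 120 hours, $0.002/token) sum to the $69.80 total.

def finetune_cost(hours: float, hourly_rate: float,
                  tokens: int, token_rate: float) -> float:
    return hours * hourly_rate + tokens * token_rate

total = finetune_cost(120, 0.24, 20_500, 0.002)
print(f"${total:.2f}")  # → $69.80  ($28.80 hosting + $41.00 usage)
```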
In a third example from LLM provider Cohere, which recently announced a partnership with Oracle, pricing is $15 for one million tokens, higher than that of OpenAI’s gpt-3.5-turbo, the model behind ChatGPT, which costs about $2 for one million tokens. “However, Cohere has more offerings around enterprise security, flexibility and privacy that might justify this cost,” Witt stated.
Use of Cost Management Tools Recommended
To minimize costs, Witt advised: use cost management tools, such as Microsoft Cost Management; choose the right model for the work, which will require some trial and error; reduce the prompt length; limit the maximum response; consolidate prompts; consider using prompt management or token cost-tracking software.
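The last recommendation, cost-tracking software, can be as simple as accumulating per-query spend against a budget cap. This is a generic illustrative sketch, not a description of any particular product:

```python
# Minimal token-cost tracker: record per-query spend and refuse
# further queries once a monthly budget would be exceeded.
# Figures below are hypothetical.

class CostTracker:
    def __init__(self, monthly_budget: float):
        self.monthly_budget = monthly_budget
        self.spent = 0.0

    def record(self, cost: float) -> None:
        """Add a completed query's cost to the running total."""
        self.spent += cost

    def can_afford(self, cost: float) -> bool:
        """Check whether another query fits within the budget."""
        return self.spent + cost <= self.monthly_budget

tracker = CostTracker(monthly_budget=100.0)
tracker.record(30.0)
print(tracker.can_afford(60.0))  # True:  30 + 60 <= 100
print(tracker.can_afford(80.0))  # False: 30 + 80 > 100
```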
Generative AI cost management tools are emerging. One AI, based in Tel Aviv, Israel, is focused on managing the costs of ChatGPT. Founder and CMO Yochai Levi stated in a recent post on the company’s website, “The costs of using ChatGPT can add up quickly, especially for businesses with a high volume of requests. One AI offers a cost-effective solution by optimizing those costs.” The company offers tools for preparing efficient prompts for ChatGPT, and for creating generative AI models customized “to fit a business’s exact needs.”
Latitude Startup Switched from ChatGPT to AI21Labs
The CEO of AI startup Latitude went up the learning curve on the cost of ChatGPT when the company’s generative AI-powered game AI Dungeon started getting popular. The more people played the game, the bigger the bill Latitude had to pay to OpenAI, according to a recent account from CNBC.
Adding to the bill, the company’s marketing team began using ChatGPT to help generate promotional copy. At its peak in 2021, Latitude CEO Nick Walton estimated that the company was spending nearly $200,000 per month with OpenAI and Amazon Web Services to keep up with the millions of user queries it was processing each day.
“We spent hundreds of thousands of dollars a month on AI, and we are not a big startup, so it was a very massive cost,” Walton stated. By the end of 2021, Latitude switched from OpenAI’s ChatGPT to software offered by startup AI21 Labs, which resulted in bills under $100,000 per month, he said.
Software suppliers are investing heavily in their generative AI offerings. Financial analysts estimate that Microsoft’s Bing AI chatbot, powered by OpenAI’s ChatGPT models, has required an investment of $4 billion in infrastructure to service responses to Bing users.
Read the source articles and information from ZDNet, Acceleration Economy and CNBC.