End of the consumer AI honeymoon
The trend
A day ago, GitHub announced that it is switching to usage-based pricing for its Copilot product. This means that instead of paying a fixed monthly fee, users will now be charged based on how much they use the product.
Usage will be calculated based on token consumption, including input, output, and cached tokens, using the listed API rates for each model.
– GitHub’s blog: GitHub Copilot is moving to usage-based pricing
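To make the quoted scheme concrete, here is a minimal sketch of how a token-based bill might be computed. The per-million-token rates below are hypothetical placeholders for illustration, not GitHub's or any vendor's actual prices.

```python
def usage_cost(input_tokens, output_tokens, cached_tokens,
               input_rate, output_rate, cached_rate):
    """Cost in dollars, given token counts and per-million-token rates."""
    return (input_tokens * input_rate
            + output_tokens * output_rate
            + cached_tokens * cached_rate) / 1_000_000

# Hypothetical rates: $3/M input, $15/M output, $0.30/M cached tokens.
cost = usage_cost(input_tokens=50_000, output_tokens=8_000,
                  cached_tokens=200_000,
                  input_rate=3.0, output_rate=15.0, cached_rate=0.30)
print(f"${cost:.2f}")  # $0.33 for this single session
```

Note how output tokens dominate the bill at typical rate ratios: generating text is several times more expensive than reading it, and cached input is the cheapest of all.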
The community’s reaction has been overwhelmingly negative.
This move has been attributed to the underlying fact that running and offering generative AI products is expensive.
More than that, the current genAI market feels like a huge bait-and-switch: vendors offer their products at very low prices to attract users and then, once users are hooked, raise prices for the same products without any increase in the value being offered.
What really influences generative AI cost?
- Model size
The size and complexity of a model greatly impacts the pricing of compute and training. Larger models require more compute resources and data to train, which can significantly increase costs.
Take Claude Opus 4.7, which has a 1 million token context window, 128k token output, and high-resolution vision support of up to 2576 pixels on the long edge. Compare it with Claude 2.0 from 2023, which had a 100k context window, 8k token output, and no vision support, and you can see why the cost of training and running the former is significantly higher.
Vendors will offer pricing tiers based on model size.
Large model = more expensive to train and run, hence higher vendor pricing.
Small model = cheaper to train and run, hence lower vendor pricing.
- Use case
Different use cases call for different methods and model sizes, and therefore different amounts of compute and data to train and run.
For example, suppose you want to build a model that can answer ONLY specific questions about a business’s internal data. In that case, you can get away with a smaller model and less compute than if you wanted to build a general-purpose chatbot that can answer any question about any topic.
- Pre-training
Pre-training is the process of training a foundation model (usually on a large dataset) before fine-tuning it for specific tasks. The cost of pre-training can be significant, especially for large models.
Pre-training is prohibitively expensive for most enterprises, since it requires enormous amounts of compute, time, and effort to do right.
For example, OpenAI currently has over a million GPUs. With each data center GPU costing over $20,000, the cost is astronomical for smaller players.
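The arithmetic implied by those figures is worth spelling out. A back-of-envelope calculation, using the numbers above as a lower bound:

```python
# Rough lower bound on the hardware bill implied by the figures above:
# one million data-center GPUs at $20,000 apiece.
gpus = 1_000_000
cost_per_gpu = 20_000          # dollars, lower bound from the text
total = gpus * cost_per_gpu
print(f"${total:,}")           # $20,000,000,000
```

Twenty billion dollars in GPUs alone, before power, cooling, networking, or staff: that is the scale that keeps pre-training out of reach for smaller players.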
If you are not going to pre-train a model, you can still use a foundation model that has already been pre-trained by a vendor.
A case in point is Cursor, which uses Kimi K2.5 as the foundation model for its coding assistant, Composer.
- Tuning
Tuning is the process of fine-tuning a pre-trained model for specific tasks. For example, if you have a foundation model that can answer general questions, you can train it on specific knowledge bases to make it better at answering questions about those topics. The cost of tuning varies with the size of the model, the complexity of the task, and the amount of compute required to fine-tune it.
- Inferencing
Inferencing is the process by which a model generates a response from the input it receives: given your input, the model works out the kind of response you need based on its learned knowledge.
The basic unit of inferencing is a token. A token can be a word, a part of a word, or even a character. The more tokens you use in your input and output, the more expensive the inferencing will be.
The cost of inferencing can vary depending on the size of the model, the complexity of the task, and the amount of compute resources required to generate a response.
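Since tokens drive the bill, a quick way to reason about inference cost is to estimate token counts from text length. The sketch below uses the common rule of thumb of roughly 4 characters per token for English text; real tokenizers (BPE-based) will give different counts, and the rates are purely illustrative.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate using the ~4-chars-per-token heuristic for
    English text. Real BPE tokenizers will differ, sometimes a lot."""
    return max(1, len(text) // chars_per_token)

def inference_cost(prompt, completion, input_rate, output_rate):
    """Estimated dollar cost; rates are per million tokens, illustrative only."""
    return (estimate_tokens(prompt) * input_rate
            + estimate_tokens(completion) * output_rate) / 1_000_000

prompt = "Summarize the quarterly sales report in three bullet points."
reply = "Revenue grew 12%. Costs were flat. Margins improved two points."
print(estimate_tokens(prompt), estimate_tokens(reply))
print(f"${inference_cost(prompt, reply, 3.0, 15.0):.6f}")
```

The takeaway is the scaling, not the exact numbers: doubling your prompt or asking for longer answers roughly doubles that part of the bill.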
What’s next?
Many users are now looking at competing products that offer fixed pricing, such as Claude Code and DeepSeek.
However, the reality is that the cost of running generative AI products is not going to go down anytime soon.
Realistically, we will see more price increases in the future as vendors try to recoup their costs and make a profit.
The best hedge is to use models that can run locally on your own hardware, or open-source models that are freely available.