Can I build a model like ChatGPT on a company budget?

Not from scratch — a frontier model costs hundreds of millions to over a billion dollars and is limited to mega-labs and nations. But you can build 'your own AI' that competes in your domain by tailoring an open model to your data, starting from tens of thousands of riyals.

What is the cheapest path to start?

Building on an open-weight model (such as DeepSeek or Qwen) and tailoring it with fine-tuning and RAG on your data. It starts from tens of thousands of riyals for a specialized assistant and rises with data volume, infrastructure, and self-hosting.

Does a tailored open model really compete?

Yes, but on the right battlefield: it won't beat ChatGPT at everything, but it beats it in your domain, your Arabic language, your own data, your privacy, and your cost — which is usually what actually matters for your business.

Exactly how much does a frontier model from scratch cost?

According to Epoch AI, the cost grows roughly 2.4x per year, and the largest training runs are projected to exceed a billion dollars by 2027. That is before R&D salaries (up to half the cost), infrastructure, and energy. In practice: hundreds of millions to billions, which is why it is limited to a few of the most well-funded organizations.

Feasibility Study: How to Build an AI to Rival Claude and ChatGPT, and What It Costs

The honest answer from Origami's experts: building a frontier model from scratch at the scale of Claude or ChatGPT costs hundreds of millions to over a billion dollars, and is realistically within reach only of the mega-labs and nation-states. But "rival" can mean three very different things, each with its own price tag, and for an ordinary company the smart path costs a tiny fraction of that. This study breaks down the three paths and the cost of each.

What does "rival Claude and ChatGPT" actually mean?

Before talking cost, you must define the ambition, because the cheapest path and the most expensive one differ in cost by a factor of thousands:

A frontier model from scratch: you build a new "brain" that rivals the world's latest models at everything.
A sovereign or specialized model: you build a smaller model, or continue-train an open model, to serve a language, a sector, or a country.
Building on an open model: you take a ready open model and tailor it to your data so it wins in your own domain.

Path one: a frontier model from scratch — an astronomical cost

This is the hardest and most expensive path. According to Epoch AI, which specializes in tracking training costs, the cost of training frontier models grows roughly 2.4x per year, and the largest training runs are projected to exceed one billion dollars by 2027, putting them "out of reach for all but the most well-funded organizations." The cost splits between hardware (47–67%), R&D staff salaries (29–49%), and energy (2–6%).
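To make the 2.4x figure concrete, here is a minimal Python sketch of compound cost growth. The growth factor is the Epoch AI estimate cited above; the $100 million starting cost is a hypothetical baseline chosen only for illustration, not a figure from the source.

```python
import math

GROWTH_PER_YEAR = 2.4  # annual growth factor cited from Epoch AI


def doubling_time_months(growth_per_year: float) -> float:
    """Months needed for the cost to double under steady compound growth."""
    return 12 * math.log(2) / math.log(growth_per_year)


def projected_cost(base_cost_usd: float, years: float,
                   growth_per_year: float = GROWTH_PER_YEAR) -> float:
    """Cost after `years` of compound growth from `base_cost_usd`."""
    return base_cost_usd * growth_per_year ** years


def years_until(base_cost_usd: float, target_usd: float,
                growth_per_year: float = GROWTH_PER_YEAR) -> float:
    """Years of compound growth needed to go from the base cost to the target."""
    return math.log(target_usd / base_cost_usd) / math.log(growth_per_year)


if __name__ == "__main__":
    base = 100e6  # hypothetical baseline, for illustration only
    print(f"Doubling time: {doubling_time_months(GROWTH_PER_YEAR):.1f} months")
    for year in range(4):
        print(f"Year {year}: ${projected_cost(base, year) / 1e6:,.0f}M")
    print(f"Years to reach $1B: {years_until(base, 1e9):.2f}")
```

At 2.4x per year the cost doubles about every 9.5 months, so under this assumed baseline a run would pass the billion-dollar mark in under three years.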

In practice that means tens of thousands of GPUs (an H100 costs about $25,000–$40,000, a B200 about $30,000–$50,000) forming a cluster worth hundreds of millions, plus top-tier researchers earning over a million dollars each, enormous datasets, and years of work. The result: not an option for a company, but for labs like OpenAI, Anthropic, Google, and DeepSeek, or major national programs.
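The cluster arithmetic can be sketched as follows. The per-card price ranges are the market estimates quoted above; the cluster size of 20,000 cards is a hypothetical example of "tens of thousands", and the result covers the cards only, not networking, data centers, power, or staff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuPrice:
    """Estimated price range for one GPU, in US dollars."""
    name: str
    low_usd: int
    high_usd: int


# Price ranges as quoted in the article (market estimates, subject to change).
H100 = GpuPrice("H100", 25_000, 40_000)
B200 = GpuPrice("B200", 30_000, 50_000)


def cluster_cost_range(gpu: GpuPrice, count: int) -> tuple[int, int]:
    """Return (low, high) cost in USD for `count` cards, GPUs only."""
    return gpu.low_usd * count, gpu.high_usd * count


if __name__ == "__main__":
    gpu_count = 20_000  # hypothetical cluster size, for illustration only
    for gpu in (H100, B200):
        low, high = cluster_cost_range(gpu, gpu_count)
        print(f"{gpu_count:,} x {gpu.name}: ${low / 1e6:,.0f}M to ${high / 1e6:,.0f}M")
```

Even before any other expense, a cluster of this assumed size lands in the hundreds of millions of dollars, which is consistent with the range given above.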

Path two: a sovereign or specialized model — a large strategic investment

Here you don't compete globally at everything; you build a smaller model (or deeply continue-train an open one) to excel in a language or sector. Saudi Arabia is a clear example through its sovereign-AI push, such as the company HUMAIN and SDAIA's "ALLaM" model. This path costs millions to tens of millions of dollars and requires a specialized research team, a compute cluster, clean data, and 6 to 18 months. It fits governments and large enterprises with a strategic objective.

Path three: building on an open model — the smartest for 99% of companies

This is where the practical answer lies. You take a strong open-weight model (DeepSeek V4, Qwen, Llama, or GLM) and tailor it to your data via fine-tuning and Retrieval-Augmented Generation (RAG). You don't beat ChatGPT at everything — you beat it in your domain, your language, and your data, which is what matters for your business. The cost here drops from "astronomical" to between tens of thousands and a few million riyals depending on ambition and infrastructure, in weeks to months. A bonus: you can self-host it so your data stays with you — important under the Personal Data Protection Law (PDPL).
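As a rough illustration of the RAG idea, the sketch below picks the most relevant document for a question and places it in the prompt sent to the model. It is a toy under stated assumptions: the documents and function names are hypothetical, and relevance is scored by simple word overlap, whereas a production system would typically use an embedding model and a vector database.

```python
import math
import re
from collections import Counter

# Hypothetical company knowledge base, for illustration only.
DOCUMENTS = [
    "Refund policy: customers can request a refund within 14 days of purchase.",
    "Shipping: orders inside Riyadh are delivered within two business days.",
    "Support hours: the help desk is open Sunday to Thursday, 9am to 5pm.",
]


def vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words vector (lowercased word counts)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q = vectorize(question)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Assemble a prompt that grounds the model in the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")


if __name__ == "__main__":
    print(build_prompt("How many days do I have to request a refund?", DOCUMENTS))
```

The model itself is never retrained here: its answers improve because the right passage from your own data travels with each question.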

The three paths compared

Option	What it is	Estimated cost	Time	Who it fits
Frontier model from scratch	A Claude/GPT-scale model from the ground up	Hundreds of millions to over a billion dollars	Years	Mega-labs and nations
Sovereign or specialized model	A smaller model, or deep continue-training of an open one	Millions to tens of millions of dollars	6–18 months	Governments and large enterprises
Building on an open model	Tailoring an open model to your data (fine-tuning + RAG)	Tens of thousands to a few million riyals	Weeks to months	Most companies

You rarely need to build a new engine; usually you need to build the right car for your road around an existing one.

The open-source option in detail: which model, and how to build on it?

Since building on an open model is the right path for most companies, here is the practical detail: which models to choose, how to build on them, and what each approach costs.

The leading open models in 2026:

Model	Maker	Known for
DeepSeek V4	DeepSeek	The strongest open-source model for agentic coding, and the cheapest
Qwen3-Coder	Alibaba	Code-specialized and multilingual
GLM-5.2	Z.ai	Long-horizon coding and a 1M-token context
Llama 4	Meta	A huge tooling ecosystem and broad community support
Kimi K2	Moonshot	Long context and strong general performance

How do you build on them? Four approaches, in rising cost and effort:

1) Direct use + prompt engineering: run the model as-is (via a cloud API or self-hosted) and steer it with smart instructions, no training at all. The cheapest and fastest — days to weeks, from thousands to tens of thousands of riyals.
2) Retrieval-Augmented Generation (RAG): connect the model to your knowledge base and documents so it answers from your own data without training it — ideal for a knowledge assistant that knows your products and policies. Weeks, tens of thousands of riyals.
3) Fine-tuning / LoRA: train the model on examples from your domain and style so it masters your specific task (tone, classification, output format). Weeks to months, tens to hundreds of thousands of riyals.
4) Continued pretraining: feed the model large amounts of your domain or language data to deepen its knowledge fundamentally — the most powerful and most expensive, approaching the "sovereign path." Months, hundreds of thousands and up.
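One reason approach 3 costs far less than approach 4 is that LoRA trains only small add-on matrices rather than the full weights. For a weight matrix of shape d_out x d_in, LoRA adds two low-rank matrices with rank x (d_in + d_out) trainable parameters in total. The sketch below counts them; the 4096 x 4096 layer and rank 8 are hypothetical values for illustration and are not taken from any model listed above.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters updated when fine-tuning the whole weight matrix."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in, d_out, rank = 4096, 4096, 8  # hypothetical layer shape and rank
    full = full_params(d_in, d_out)
    lora = lora_params(d_in, d_out, rank)
    print(f"Full fine-tuning: {full:,} parameters")
    print(f"LoRA (rank {rank}):  {lora:,} parameters")
    print(f"LoRA trains {100 * lora / full:.2f}% of the layer")
```

For this assumed layer, LoRA trains 65,536 parameters instead of 16,777,216, which is under half a percent, and that is why it fits on much smaller hardware budgets.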

Self-hosting and privacy: the big advantage of open models is that you run them on your own infrastructure (a private cloud or local servers), so your data never leaves your organization — decisive under the Personal Data Protection Law (PDPL). You need GPUs sized to the model (from a single card for small models to a cluster for large ones), and you balance fixed hosting cost against pay-as-you-go cloud API cost.
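Both sizing questions above come down to simple arithmetic, sketched below. Every number is a hypothetical placeholder (the model size, the monthly hosting cost, and the API price per million tokens), so substitute your own quotes. The memory estimate covers the model weights only and ignores the extra memory needed at run time.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in gigabytes.

    16-bit weights use 2 bytes per parameter; 4-bit quantized weights use 0.5.
    """
    return params_billions * bytes_per_param


def breakeven_million_tokens(fixed_monthly_cost: float,
                             api_price_per_million: float) -> float:
    """Monthly usage (in millions of tokens) at which both options cost the same."""
    return fixed_monthly_cost / api_price_per_million


def cheaper_option(monthly_million_tokens: float, fixed_monthly_cost: float,
                   api_price_per_million: float) -> str:
    """Compare a flat self-hosting cost with pay-as-you-go API spend."""
    api_cost = monthly_million_tokens * api_price_per_million
    return "self-hosting" if fixed_monthly_cost < api_cost else "cloud API"


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model.
    print(f"16-bit weights: {weight_memory_gb(7, 2):.1f} GB")
    print(f"4-bit weights:  {weight_memory_gb(7, 0.5):.1f} GB")

    # Hypothetical prices in riyals: 15,000 per month hosting, 10 per million tokens.
    hosting, api_price = 15_000, 10
    print(f"Break-even: {breakeven_million_tokens(hosting, api_price):,.0f}M tokens/month")
    print(f"At 500M tokens/month:  {cheaper_option(500, hosting, api_price)}")
    print(f"At 3000M tokens/month: {cheaper_option(3000, hosting, api_price)}")
```

Below the break-even volume the pay-as-you-go API is cheaper on cost alone; above it, fixed hosting wins. Privacy requirements may justify self-hosting even when the arithmetic favors the API.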

Origami's experts' recommendation

Don't try to out-train the mega-labs at their own game — that is a battle of billions. The smart move is to build a specialized solution on an open model that wins in your domain, your Arabic language, your data privacy, and your cost. This path is achievable, high-ROI, and gives you "your own AI" without the billion-dollar bill.

How Origami helps

At Origami we study feasibility first, then choose the right path and model for your goal and budget, tailor the model to your data (fine-tuning + RAG), self-host it for full privacy, and connect it to your systems and agents via MCP. The goal is an AI that competes in your domain, at a realistic, well-studied cost.

Sources

Epoch AI — the cost of training frontier models: epoch.ai
Saudi Data and AI Authority (SDAIA): sdaia.gov.sa

Hardware cost figures are market estimates for NVIDIA GPUs at publication time and are subject to change.
