Why it's not totally hopeless to build a ChatGPT clone
"It's totally hopeless"
These three words have put the entire Indian startup ecosystem in general and the AI startup ecosystem in particular on notice.
Unless you have been living under a rock, by now you know that Rajan Anandan asked Sama for advice on how Indian startups with three engineers and around $10 million can create foundational models and compete with ChatGPT, and Sama responded that it was hopeless, but they could try. Later on, Sama clarified that the "hopeless" part was meant for the $10 million part of the question.
Enough people have tweeted their outrage, and more people have tweeted their outrage at the outrage. But whatever the outcome, one thing is for sure: a whole industry has been galvanized into action. In this article I will try to convince you that Sama is wrong. I will make some historical references, share some technical details, and point to some efforts which have already started.
Historical examples:
"It's hopeless". These are the words which will spur great people into action. For example, the Indian Space Research Organisation (ISRO) faced immense skepticism and doubts when it announced its intention to send a mission to Mars. Many people around the world believed that such a mission would be prohibitively expensive, especially for a developing country like India. They considered it an audacious and nearly impossible goal, often expressing skepticism and labeling it as a hopeless endeavor. If some had asked Nasa if it was possible to do a Mars mission in less than $100 million, they would have said, "It's hopeless".
Do you know the cost of the Mars Orbiter Mission?
It was estimated to be around $75 million.
Do you know the cost of NASA's Mars Atmosphere and Volatile Evolution (MAVEN) mission?
Approximately $700 million.
So Indian scientists achieved what NASA achieved at roughly a tenth of the cost.
And if we can build rockets at a tenth of the cost, I am sure we can build a ChatGPT. After all, it's not rocket science :)
Let's now look at an example closer to our topic.
The first time OpenAI made waves was not with ChatGPT. It might seem like a long while ago, but just a year back OpenAI released DALL-E 2, which took the generative AI world by storm. You could write text and the model would generate images as if by magic. If someone had asked at that time whether anyone could compete with OpenAI on generating images, the answer would have been, "It's hopeless".
But just a few months later, we had a new model, Stable Diffusion. It was as good as, and sometimes better than, DALL-E 2. And once Stable Diffusion was open sourced, it overtook OpenAI by leaps and bounds. Today, if someone wants to generate images from text, they don't go to OpenAI. They go to a whole host of startups like Stability AI and Midjourney, many built on top of open source models. OpenAI has already lost the game in generative AI for images.
Let's now turn our focus to ChatGPT.
First of all, should we build another ChatGPT? Does it make business sense? The jury is still out on that. Technically, what OpenAI has achieved is nothing short of a miracle. But it has been more than six months since ChatGPT launched and around three years since OpenAI launched their APIs, and the business use cases that have emerged are few and far between (code completion and marketing copy helpers come to mind). I think this is one of the main reasons why the VC community in India has not shown much interest in foundational models so far.
That said, I believe it is important for multiple parties (businesses and governments) to invest in and build multiple foundational models, rather than leave the fate of foundational models in the hands of a few large corporations. We may not have viable business cases right now, but foundation models will form the foundation of AI-enabled systems in the future, so people need to invest in that future. As Wolfram observes, the biggest takeaway from ChatGPT is that it has proven that by just predicting the next word, AI systems can learn language. No one could have predicted this. But OpenAI put in the money and effort to train transformer systems on humongous amounts of data (almost the whole of the Internet), and luckily for them, it turns out that this is enough for machines to learn language.
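To make that concrete, here is a minimal sketch of the next-word (next-token) prediction objective in PyTorch. The model and data below are toy placeholders, not OpenAI's setup; the point is simply how small the training signal is.

```python
# A toy sketch of the "predict the next word" objective.
# The model is a stand-in for a transformer; the tokens are random.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                      nn.Linear(d_model, vocab_size))  # placeholder model

tokens = torch.randint(0, vocab_size, (1, 16))   # a toy token sequence
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # targets = inputs shifted by one

logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size),
                                   targets.reshape(-1))
loss.backward()  # this single loss is the entire "learning language" signal
```

Everything the base model knows about language comes from minimizing this one loss at enormous scale, before any human-feedback fine-tuning.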
But here’s the thing. Once you know something is possible, it’s only a matter of time before others will replicate the success. Never in history has there been a scenario where someone has invented something and no one else was able to replicate it.
Sama knows this. Sama also knows he does not have a moat. That's the main reason for his world tour, and that's the reason he is visiting political capitals instead of the tech capitals of those countries: so he can convince politicians to regulate AI and make sure no one else builds what OpenAI has built. And to convince the politicians, he has taken the route of AI safety. The claim that AI (especially current AI) will achieve AGI and wipe out humanity is so preposterous that I am not going to spend any more time writing about it.
Let’s instead look at the technical pieces needed to build something like ChatGPT and see if it really is hopeless to build something like it.
1. OpenAI has built it. So we know that it is possible.
2. OpenAI used the transformer architecture to build it. So the secret sauce is not the AI architecture; it is known to everyone (a minimal sketch follows this list).
3. OpenAI has a curated dataset, and they have trained their systems with human feedback (RLHF). This is their supposed moat.
4. OpenAI has financed thousands of GPU-hours of training. This is the cost factor, and most probably the reason Sama said others cannot compete with $10 million.
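On point 2: to show just how public the "secret sauce" is, here is a minimal sketch of a decoder-only transformer block in PyTorch. The dimensions are illustrative defaults, not GPT's actual configuration.

```python
# A minimal decoder-only transformer block: self-attention with a causal
# mask, plus a feed-forward sublayer, each wrapped in residual + layer norm.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # Causal mask: each position may attend only to earlier positions
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                     device=x.device), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.ln1(x + attn_out)     # residual connection + layer norm
        x = self.ln2(x + self.mlp(x))  # feed-forward sublayer
        return x
```

Stack a few dozen of these blocks on top of token embeddings, add a softmax head over the vocabulary, and you have the essential shape of a GPT-style model. The architecture fits on one screen; it is the data and compute that cost money.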
For points 3 and 4 we have open source to the rescue. And open source warriors have found an unlikely ally here: Meta. Meta has spent millions of dollars collecting data and training on GPUs, and has released its model, LLaMA. The release of LLaMA has thrown open the door for open source innovation. Just in the last couple of months we have seen the following developed:
Alpaca is a Stanford project that fine-tuned LLaMA on 52,000 instruction-following examples for a few hundred dollars of compute, producing a surprisingly ChatGPT-like assistant. The code and data are freely available.
LoRA (Low-Rank Adaptation) is a technique for fine-tuning LLMs that freezes the base weights and trains only small low-rank update matrices, drastically reducing the number of parameters that need to be updated. This makes fine-tuning faster and cheaper while maintaining good performance (see the sketch after this list).
LLaMA itself is not a single model but a family, from 7B to 65B parameters, and its weights have become the base on which the community builds text generation, question answering, and summarization systems.
llama.cpp is a C/C++ port of LLaMA inference. With aggressive quantization, it makes it possible to run LLMs on devices with limited resources, such as laptops and even mobile phones.
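Since LoRA is the piece that makes fine-tuning affordable, here is a hedged sketch of applying it with Hugging Face's peft library. The model name is a placeholder, and the target modules assume a LLaMA-style attention layout.

```python
# A sketch of a LoRA fine-tuning setup with the peft library.
# "some-org/some-7b-model" is a placeholder, not a real checkpoint.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("some-org/some-7b-model")

config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA-style models
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```

Only the small LoRA matrices are trained while the base weights stay frozen, which is how fine-tuning 7B-scale models on a single GPU became routine.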
These projects are all open source, which means they are freely available to use, modify, and build on. Researchers and developers can experiment with different LLMs and techniques and contribute their own improvements back to the community.
As a result of these efforts, the field of LLMs is advancing rapidly. New and improved open models appear constantly and are being applied to a wide range of problems, such as language translation, question answering, and text summarization.
If this can be achieved in just a couple of months, think of what is possible in a few more months.
Additionally, we have investors and companies coming forward to help build an "India LLM".
Emad from Stability.ai has promised an Indian foundation model that outperforms ChatGPT on benchmarks.
Tech leaders think we can achieve this: https://restofworld.org/2023/tech-mahindra-india-ai-openai/. And remember, Infosys was one of the early funders of OpenAI. If they could fund OpenAI, they can certainly fund an Indian startup doing something similar.
All of this assumes that transformers and deep learning are the only way to build something like ChatGPT. We have been working on an alternative to deep learning, and we already have some promising results with sentence embeddings that could lead to a ChatGPT-like system without deep learning. The simple insight that predicting the next word is enough for an AI to learn language opens up experiments in multiple areas. After all, it is now a search problem: given a billion sentences, if I give you 3-4 words (or even 100-200 words), can you tell me the most probable next word? Do you need deep learning to solve this, or can it be solved in some other mathematical way? I am sure people will find multiple ways to solve this problem in the coming years.
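To illustrate the search framing, here is a toy sketch in plain Python: look up which word most often follows a given context in a corpus, no deep learning involved. The corpus is a placeholder; a real system would index billions of sentences and back off to shorter or fuzzier contexts (for example via sentence embeddings).

```python
# Next-word prediction as pure counting and lookup, no neural network.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran on the road".split()  # placeholder corpus

n = 2  # context length in words
follow = defaultdict(Counter)
for i in range(len(corpus) - n):
    context = tuple(corpus[i:i + n])
    follow[context][corpus[i + n]] += 1  # count what follows each context

def predict_next(words):
    counts = follow.get(tuple(words[-n:]))
    return counts.most_common(1)[0][0] if counts else None

print(predict_next(["the", "cat"]))  # "sat" (ties break by first occurrence)
```

Whether counting, embeddings, or something else entirely ends up being competitive is an open question, but the problem statement itself no longer requires deep learning to even attempt.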
So: we have the architecture, the datasets (thanks to sites like Hugging Face), open source models, and new techniques. And we have the money. Talent has never been a problem in India. All the ingredients to build a ChatGPT.
Still think it’s hopeless?