The shock-and-awe strategy of the VC-funded AI companies seems to be working. Their goal is to convince everyone that:
1. This is the only way to build AI.
2. No one else can build AI because they have all the compute and data.
They are wrong on both counts.
The shock and awe did work, to some extent. But once open source takes over and we start chipping away one block at a time, solving these problems far more efficiently, we will see just how much compute and data they wasted building these systems.
Thanks to Meta open sourcing Llama2, we could see that building a GPT4-level open source model is possible. All it needed was compute. And since the Llama2 weights were open sourced, we don't need to train from scratch again.
OpenAI and others have shown one path of AI:
1. Use cheap labor under horrible conditions to collect and clean data. (https://time.com/6247678/openai-chatgpt-kenya-workers/)
2. Brute force with lots of GPU compute.
There are many other ways. Better ways. There are ways in which AI can be built by the people and for the people.
We have already shown that we can collect data in a fun way during the Chandamama kathalu datathon with participation from students.
Captchas showed that you can collect training data as a byproduct of people solving a problem.
Last month we also collected speech data in a unique, fun way, which we will unveil at the end of this month. With almost zero effort, while having fun, we were able to collect 1.5 million voice samples. It's possible.
Couple that with research in cluster computing and new algorithms, and we will solve the compute problem as well. I give it till next year. No one will be in shock and awe by then.