A hitchhiker's guide to GPT-3. Part 1.
A hitchhiker's guide to everything related to GPT-3, through experiments. What works, what doesn't.
I got access to GPT-3 a couple of days back, thanks to Greg Brockman and team. I was excited, as I had seen the demo of a JSX code creator. Since I did not have access to the OpenAI API at that time, I went ahead and built a no-GPT-3 HTML code creator. After that, I got access to the API, and I have had no sleep for the past two days, trying out everything under the sun. This post is a result of those experiments. There is a lot to cover, so let's get started.
What is GPT-3?
Google it. Since I have a lot to cover, I don't want to SEO-stuff this page with details about GPT-3 and how it works.
Is it worth the hype?
Yes and No.
Grady Booch (@Grady_Booch): I find this article fascinating... but likely for reasons different than most. I'm reading it through the lens of software architecture: it offers some insights into architectural patterns that are emerging for neuro-centric models of computation https://t.co/bPpnrEhpZh
The API is a huge thing. While people have been debating the AI capabilities and whether we are closer to AGI, the simple fact is that, as far as natural language APIs are concerned, this is the most powerful one out there. By far. Hugging Face, AllenAI, spaCy, etc. have good APIs, but OpenAI's API is much more powerful.
"The one API to rule them all" approach is groundbreaking. It's surprising how many things you can do with that one API.
The fact that OpenAI has scraped most of the Internet (and I mean pretty much all of it) and added a natural language layer on top has suddenly made the world's knowledge accessible as an API. I am sure that if Google wanted to do this, it could. But I don't think the money it could make with such an API would exceed the ad business, so it doesn't have the incentive. I bring up Google because, through the course of my experiments, that's what struck me as amazing: not the natural language generation or NLP capabilities (still a long way to go), but the general access to knowledge as an API. The design of the API (priming with prompts) is really good too. Maybe OpenAI should call themselves OpenAPI :)
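To make "priming with prompts" concrete, here is a minimal sketch of the pattern: you prepend a handful of worked examples and let the model continue. The example pairs and the `build_prompt` helper are my own invention for illustration; the commented-out `openai.Completion.create` call reflects the beta Python client at the time of writing and needs an API key.

```python
def build_prompt(examples, new_input, input_label="Input", output_label="Output"):
    """Turn (input, output) example pairs into a few-shot priming prompt."""
    lines = []
    for inp, out in examples:
        lines.append(f"{input_label}: {inp}")
        lines.append(f"{output_label}: {out}")
    # End with the new input and an empty output label for the model to fill in.
    lines.append(f"{input_label}: {new_input}")
    lines.append(f"{output_label}:")
    return "\n".join(lines)

# Prime with two antonym pairs, then ask for a third.
prompt = build_prompt([("happy", "sad"), ("tall", "short")], "fast")

# The actual API call would look roughly like this (beta client, hedged):
# import openai
# completion = openai.Completion.create(
#     engine="davinci", prompt=prompt, max_tokens=16, stop="\n",
# )
```

The whole "one API" trick is that only the examples change between use cases; the call itself stays the same.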
My first thought goes to NLP startups. Startups that specialize in NLP tasks like summarization, NLG, chatbots, etc. may face competition if the OpenAI API does what it says it does. If you can build a summarization engine with a few prompts, or a chatbot with a few examples, then where is the moat? Similarly for startups building on top of the OpenAI API (I have seen a lot of prototypes on Twitter recently asking for beta sign-ups): again, where is the moat? People have suggested that priming is an art. After playing around with the API, I can say that you can recreate any idea out there by trying out different priming examples in a day or two. But the good news is that though OpenAI is good, it's not good enough. Yet. So there is still hope for the NLP startups.
If you just want to look at the videos of my experiments, you can click through the tweet.
I will explain these and other experiments in detail below.
I did not do a lot of text generation experiments, or actual AI experiments to find out if we are close to AGI, as enough people have tried that. For text generation experiments you can check out tweets by Gwern and Quasimondo.
I just did a small experiment to verify for myself that we are nowhere close to AGI.
It's very hard to design a clean experiment with GPT-3, as it has already been polluted with all the knowledge in the world :). So I tested it with something completely from my imagination. The following are the answers from the GPT-3 Playground.
My 6-year-old son was able to answer these questions. Maybe a 3-4-year-old may not be able to answer the first one, but they can answer the second one.
Ok, so "Billangodi" trumped GPT-3. What about a common sport?
Well, now GPT-3 is in a familiar domain :). Even though I didn't mention anything about the bat hitting the ball, it gave the answer. This actually leads to the Eliza effect, which I have seen a lot of other examples fall into.
With the AI piece out of the way, I wanted to stress test the API with use cases. I will post one screenshot for each experiment. You can find the prompts for these experiments at GPT3-Prompts. For all these experiments I tried multiple priming prompts.
Title: Text to SQL
Experimentation Methodology: I started off by feeding it common single-table queries picked up from websites. It got all of them right. Sometimes it prints some extra queries as part of its generation strategy. From those prompts I could sort of see where the training data for this set came from. I think OpenAI has scraped w3schools.com (I may be wrong about this; it's just a hunch).
Then I tried it with multi-table queries. It couldn't get most of them right. I tried priming in different ways and fed it a lot more data, but it just couldn't get the queries right.
Analysis: You start off with wow! Then you say great. Then you feel ok. In my trials, it mostly gets the single-table queries right. If you prime it with enough examples, it sometimes gets join queries right. But it still has a long, long way to go, and it's too much work to craft priming examples for each database structure. I couldn't use this in a production environment. And it's not just about priming.
Use cases: I can see it being useful as a learning tool for teaching someone SQL, or as a tool to get a quick SQL template. Nothing more than that.
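For a sense of what the priming looked like, here is a sketch of a text-to-SQL prompt in the style I describe above. The table schema and the example question/answer pairs are invented for illustration; only the new question changes per request.

```python
# A text-to-SQL priming prompt: a schema hint, a few worked examples,
# then the new question left open for the model to complete.
PRIMING = """Table: employees(id, name, salary, dept_id)

Q: Show all employee names.
A: SELECT name FROM employees;

Q: Show the names of employees earning more than 50000.
A: SELECT name FROM employees WHERE salary > 50000;

Q: {question}
A:"""

def sql_prompt(question):
    """Fill the new question into the primed template."""
    return PRIMING.format(question=question)
```

With single-table examples like these the model usually follows the pattern; the join queries that tripped it up would need multi-table schemas and several join examples in the priming, which is exactly the per-database effort that makes it impractical.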
Title: Summarization
I tried to get the API to summarize content behind a paywall (I hope the https://the-ken.com/ people can forgive me), as I didn't want it to summarize something it already knew. I tried multiple things with multiple priming examples and also preset settings. I chose the summary preset in the OpenAI Playground dashboard and changed the priming prompt to hint at the type of article it was (I asked my historian friend for history articles, etc.). I also tried it without presets.
Mostly the results were "meh". It's a little better than other summarization tools, but I wouldn't switch yet. Hugging Face provides summarization as a function call, so I would still use that.
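The Hugging Face call I would still reach for looks roughly like this. The `pipeline("summarization")` helper is real `transformers` API; the chunking helper is my own sketch, needed because these models cap input length, and the word limit of 400 is an assumed ballpark, not a documented constant.

```python
def chunk_text(text, max_words=400):
    """Split a long article into word-capped chunks that fit the model's input limit."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def summarize(text):
    # Import kept inside the function: it is heavy and downloads a model on first use.
    from transformers import pipeline
    summarizer = pipeline("summarization")
    # Summarize each chunk and stitch the partial summaries together.
    return " ".join(out["summary_text"] for out in summarizer(chunk_text(text)))
```

No priming needed here: summarization is baked into the model, which is the "function call" convenience I mean.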
Title: Summarization take 2
Next, I tried using the API to generate tweet headlines from news articles. I chose the BBC Twitter handle and picked up some tweets from them, then copied the story and asked the API to generate a tweet.
OPENAI Tweet: German police arrest ‘Rambo’ after manhunt in forest
OPENAI Tweet: Ruth Bader Ginsburg: Supreme Court justice ‘home and doing well’ after cancer treatment
I just primed it with a couple of tweet headlines, and it mostly captured the tweeting essence.
It mostly got the tweets right. I think this is one area where the OpenAI API could replace a human tweeter, with proper priming.
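The priming here followed the same few-shot shape as the other experiments. The sketch below shows the prompt format and how to trim the completion at a newline, mimicking a stop sequence; the example articles and tweets are placeholders, not the actual BBC stories I used.

```python
def tweet_prompt(examples, article):
    """Few-shot prompt: (article, tweet) example pairs, then the new article."""
    shots = "\n\n".join(f"Article: {a}\nTweet: {t}" for a, t in examples)
    return shots + "\n\n" + f"Article: {article}\nTweet:"

def first_line(completion):
    """Keep only the first line of the completion, as a '\\n' stop sequence would."""
    return completion.split("\n", 1)[0].strip()
```

Priming with a couple of real headline pairs is what teaches the model the terse, headline-style register; without them it tends to ramble past tweet length.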
I think we have covered enough for Part 1. In Part 2, I will cover the code generation capabilities of GPT-3, recipes, and movies.