It's the year of NLP. @BigScienceW released a new model for zero-shot task generalization, with big claims.
Naturally, I wanted to test it out right away. Hugging Face had already made the model available in their model hub, so all I had to do was try out different prompts.
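For anyone who wants to follow along, here is a minimal sketch of how you can query the model through the transformers library. The helper functions are my own, not part of any library; note that T0pp is an ~11B-parameter checkpoint, so loading it needs serious hardware, and the smaller bigscience/T0_3B variant is a more practical choice for a quick test.

```python
def load_t0pp(name: str = "bigscience/T0pp"):
    """Download the tokenizer and seq2seq checkpoint from the Hub."""
    # Imported here so the rest of the file works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    return model, tokenizer

def ask(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
    """Feed one natural-language prompt to the model and decode its reply."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Usage would be along the lines of `model, tok = load_t0pp()` followed by `ask(model, tok, "How many hydrogen atoms are in a water molecule?")`.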
I first tried some of the default prompts that show up, like:
“How many hydrogen atoms are in a water molecule?”
The model replied with 2.
“How many oxygen atoms are in a water molecule?”
2.
Well, we now know T0pp is not a chemist.
No biggie. After all, this is a language model.
So after having some fun with it, I wanted to get serious. I opened up my OpenAI Playground, pulled up different prompts I had saved, and compared how well T0pp performs relative to GPT-3 in real-world scenarios.
First I checked whether the model can fix grammar mistakes. I tried prompts like:
“Can you fix the grammar mistake in the following sentence. I are going home.”
Ans: I am going home.
Good. The model was pretty good at fixing obvious grammar errors.
Next I tried summarization. Here the model was not great; it would mostly just spit out the first two lines.
Example:
“T0* is a series of encoder-decoder models trained on a large set of different tasks specified in natural language prompts. We convert numerous English supervised datasets into prompts, each with multiple templates using varying formulations. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. To obtain T0*, we fine-tune a pretrained language model on this multitask mixture covering many different NLP tasks.
How would you rephrase that in a few words?”
Ans: DeepMind has released a new language model called T0*, which it says is "the first general purpose deep learning model for natural language understanding".
I then asked the model to generate some emojis. Looks like the model cannot do that.
“The Incredible Hulk. Can you represent that in emoji?”
Ans: “” (empty output)
What about tweet sentiment classification? Below are some examples I tried. It looks like prompt framing is important: the model gives much better answers when I frame the question as “Is this tweet positive?”
Examples:
I can't wait for Halloween!!! What is the sentiment of this tweet?
Ans: Negative
I can't wait for Halloween!!! Is this tweet positive?
Ans: Yes
This is the link to the article. Is this tweet positive?
Ans: No
This is the link to the article. What is the sentiment of this tweet?
Ans: Negative
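The framing effect above is easy to encode as a tiny prompt helper. The function below is a sketch of my own, not anything from a library:

```python
def sentiment_prompt(tweet: str, yes_no: bool = True) -> str:
    """Wrap a tweet in a sentiment question.

    The yes/no framing ("Is this tweet positive?") gave noticeably more
    reliable answers in my runs than the open-ended framing.
    """
    if yes_no:
        return f"{tweet} Is this tweet positive?"
    return f"{tweet} What is the sentiment of this tweet?"
```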
The model also seems to work well for review sentiment classification.
Customer gave the following review. Alexa is very good but came with defective smart bulb..., took very much time for setting and now Alexa is replying 'device is not compatible'. Now the smart bulb is working as a normal bulb of Rs. 25/-
What is the general sentiment?
Ans: Negative
Slot filling also seems to work, though I had to try different prompts.
Examples:
I want to go to Mumbai on 26th October and come back on 30th. I am traveling from Hyderabad. What is the source city?
Ans: Mumbai (wrong)
I want to go to Mumbai on 26th October and come back on 30th. I am traveling from Hyderabad. What is the from city?
Ans: Hyderabad (correct)
I want to go to Mumbai on 26th October and come back on 30th. I am traveling from Hyderabad. What is the to city?
Ans: Mumbai
I want to go to Mumbai on 26th October and come back on 30th. I am traveling from Hyderabad. When is the travel date?
Ans: 26th October
I want to go to Mumbai on 26th October and come back on 30th. I am traveling from Hyderabad. When is the return date?
Ans: 30th October
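Since the exact wording mattered (“from city” worked where “source city” failed), I ended up asking one question per slot. Here is a sketch of that idea; the slot names and helper are hypothetical, and each generated prompt would still be sent through the model separately:

```python
# One question per slot; these phrasings are the ones that worked for me
# ("from city" succeeded where "source city" failed).
SLOT_QUESTIONS = {
    "from_city": "What is the from city?",
    "to_city": "What is the to city?",
    "travel_date": "When is the travel date?",
    "return_date": "When is the return date?",
}

def slot_prompts(utterance: str) -> dict:
    """Build one slot-filling prompt per slot for a single utterance."""
    return {slot: f"{utterance} {question}"
            for slot, question in SLOT_QUESTIONS.items()}
```

The answers can then be collected into a single record, one model call per slot.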
Finally, I tried a lot of creative generation tasks, the kind GPT-3 is really good at. In most cases T0pp performed really well here.
Examples:
A home milkshake maker. Can you generate a product name for that?
Ans: The Shake Maker
The name of the startup is Hubspot. We provide marketing, sales and service software to help you grow. Can you generate a tagline for that?
Ans: Grow with Hubspot.
Can you generate an icon for Idea?
Ans: a scribbled note
What is the metaphor for brave?
Ans: a Lion
Observations:
T0pp seems to go after the generic intent, while GPT-3 is better at picking up the exact intent. For example, T0pp gives the same answer whether the prompt says “generate a product name” or “generate a funky product name”; GPT-3 changes its answer because it recognizes that “funky” is also an important word in the context.
T0pp is pretty good for single-word or single-sentence responses, but it is not meant for long essays or paragraphs, which GPT-3 handles really well. For example, T0pp cannot generate recipes.
Can I use T0pp in my products? I need to experiment a little more. It might work in some very specific cases. It looks like a good candidate for question answering, though I have not fully tried that yet. I will update this post once I do.
There also seems to be some scope for using this model in customer service, though I still have to figure out how to productize it.
For example:
Where is my order. What is the intent in that?
Ans: I am not satisfied with the order