
What’s GPT-3, the Language Model Built by OpenAI, and What’s So Exciting About It?

Photo: Jimmy Chan/Pexels.

In 2018, OpenAI – the AI company co-founded by Elon Musk, among others – released GPT, an AI language model that could perform a variety of language tasks, such as writing letters and composing articles. Two years and one iteration later, OpenAI has released the newest version of this model, called GPT-3. The name is plain, but GPT-3 could make your head turn with its reservoir of abilities.

GPT-3 can write essays, stories, blog posts, tweets, poems, press releases, business memos and technical manuals – and with better grammar than most of us. It can imitate the styles of different authors, compose music and even write code. It can answer questions requiring basic comprehension and translate languages.

OpenAI unveiled GPT-3 in a paper uploaded to the arXiv preprint server in May. Last week, the company opened up beta access, allowing developers to play around with the model. Soon after, the web was flooded with text samples generated by GPT-3, together with exclamations of surprise and delight. Here is a snippet from a news article written by GPT-3, which closely resembles something a human writer might have penned:

After two days of intense debate, the United Methodist Church has agreed to a historic split – one that is expected to end in the creation of a new denomination, one that will be “theologically and socially conservative,” according to The Washington Post. The majority of delegates attending the church’s annual General Conference in May voted to strengthen a ban on the ordination of LGBTQ clergy and to write new rules that will “discipline” clergy who officiate at same-sex weddings. But those who opposed these measures have a new plan: They say they will form a separate denomination by 2020, calling their church the Christian Methodist denomination.

GPT-3 is the most powerful language model built to date. Its purpose is simple: to consume a large volume of text and then predict which word will come next. It achieves this feat using an artificial neural network – a computing architecture, loosely inspired by the brain, designed to help machines learn from data and make predictions.

The artificial neural network at the heart of GPT-3 contains 175 billion parameters – over a hundred times as many as GPT-2, released last year – with which it learns and predicts. GPT-3 was trained on 45 TB of text sourced from all over the internet, including Wikipedia. Using this data, GPT-3 taught itself the statistical dependencies between different words, which it encoded as the parameters of its neural network.

So, given an input sequence of words, the neural network can predict the next word. If this seems like a mundane task, it’s probably because humans take for granted the wondrously complex neural architecture in our heads. In fact, this ability alone gives GPT-3 innumerable applications.
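
Since GPT-3’s own weights cannot be downloaded, the sketch below illustrates the same next-word-prediction idea with GPT-2 – its openly released predecessor – loaded through the Hugging Face transformers library. It is a minimal illustration of the principle, not a description of OpenAI’s internal tooling.

```python
# A minimal sketch of next-word prediction. GPT-3's weights are not publicly
# downloadable, so this uses GPT-2 - its openly released predecessor - via the
# Hugging Face 'transformers' library as a stand-in.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "After two days of intense debate, the United Methodist Church has"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# The scores at the last position rank every word in the vocabulary as a
# candidate for the next token; print the five most likely continuations.
next_token_scores = logits[0, -1]
top5 = torch.topk(next_token_scores, 5)
print([tokenizer.decode(int(i)) for i in top5.indices])
```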

GPT-3 is the latest instance of a long line of pre-trained models, like Google’s BERT, Facebook’s RoBERTa and Microsoft’s Turing NLG. Pre-trained models are large networks trained on massive datasets, usually without supervision. Taking pre-trained models and fine-tuning them to solve specific problems has become a popular trend in the field of natural-language processing.

If a model has already learned how to identify cats in images, it can quickly learn how to identify dogs. However, training the model from scratch to identify dogs will require far more images. Similarly, it is easier for developers to adapt GPT-3 for their purposes instead of developing custom models from scratch.
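
The cat-and-dog analogy can be sketched in code. The snippet below is a rough illustration rather than anyone’s actual pipeline: it takes a ResNet pre-trained on ImageNet, available through PyTorch’s torchvision, freezes what the network has already learned and replaces only its final layer so it can be retrained on a small cats-versus-dogs dataset.

```python
# A rough sketch of transfer learning: reuse a network pre-trained on ImageNet
# and retrain only its final layer for a new two-class task (cats vs dogs).
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)  # weights already learned on ImageNet

# Freeze the pre-trained layers so the knowledge they encode is retained.
for param in model.parameters():
    param.requires_grad = False

# Swap in a fresh final layer with two outputs: 'cat' and 'dog'.
model.fc = nn.Linear(model.fc.in_features, 2)

# Only model.fc is now trainable; a short training loop over a small labelled
# dataset fine-tunes it, needing far fewer images than training from scratch.
```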

However, the GPT models do one thing differently. Language models like BERT need to be fine-tuned before they can be used for downstream tasks, but GPT-3 can perform a range of tasks out of the box, without any fine-tuning. This flexibility is enhanced by its ‘text in, text out’ API, which allows users to ‘reprogram’ the model using simple instructions written in plain English.
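
As an illustration of the ‘text in, text out’ idea, the sketch below builds a few-shot prompt: the task (English-to-French translation) is specified in plain English along with a handful of examples, and the model is simply asked to continue the pattern. The client call and engine name are assumptions based on OpenAI’s beta-era Python library and may differ from what beta testers actually received.

```python
# A hedged sketch of the 'text in, text out' interface: the task is described
# in the prompt itself, with no fine-tuning. Client and engine names are
# assumptions based on OpenAI's beta-era Python library, not confirmed details.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "cheese => fromage\n"
    "plush giraffe =>"
)

response = openai.Completion.create(
    engine="davinci",   # assumed engine name
    prompt=prompt,
    max_tokens=10,
    temperature=0,
)
print(response["choices"][0]["text"])  # the model continues the pattern in French
```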

“GPT-3 looks very promising. It will allow us to solve many natural language generation problems for our clients in accelerated fashion even with limited data,” Adwait Bhave, the CTO of AlgoAnalytics, an AI services company in Pune, said. “Instead of talking about training models, we are now talking more about tuning models for business problems.”

For example, you could describe, in plain English, how you want a website designed, and GPT-3 could give you the corresponding code.


OpenAI was founded as a non-profit backed by Musk, Reid Hoffman and Peter Thiel, and has since added a ‘capped-profit’ arm; Musk himself has stepped back from the organisation. The company plans to make GPT-3 commercially available to developers to further adapt it for custom purposes. Any task that involves taking a piece of text as input and providing another piece of text as output is potentially GPT-3 territory.

For example, “The legal language used in contracts is highly complex. Organisations can potentially use AI models like GPT-3 for simplifying contracts and making them more understandable,” Monish Darda, cofounder and CTO of ICERTIS, a company known for AI-infused contract management, said. “This will result in faster negotiation and execution of contracts, leading to massive reduction in time and increase in the speed of business.”

On the flip side, many jobs involving customer support, billing, pre-sales, proposal writing, report generation and the like may become vulnerable to intelligent automation built on models like GPT-3 in future.

That said, it is important to cut through the hype and address GPT-3’s limitations as well. Models like GPT-3 work with statistical patterns in word occurrences, but they have no intelligence of their own. They can’t actually understand the meanings of the words they’re working with, and without such understanding they are incapable of logical reasoning or moral judgement. For example, GPT-3 would be stumped by the following question: “If I put cheese into the fridge, will it melt?”

OpenAI delayed the full release of GPT-2 last year, calling it a ‘dangerous’ model for its potential to produce high-quality fake news. Since these models are trained on data collected from the web, they are prone to internalising the biases, prejudices and hate speech rampant online as well. When GPT-3 is deployed to generate text, it could reproduce these biases. The OpenAI authors take cognisance of this in their new preprint as well. However, mitigating biases in the training data to develop a fully fair model is an exceedingly difficult task, if not an impossible one.

Despite these limitations, GPT-3 is a significant achievement that pushes the boundaries of AI research in natural-language processing. OpenAI has demonstrated that, when it comes to AI, bigger is in fact better. GPT-3 uses the same architectural framework as GPT-2 but performs markedly better owing chiefly to its sheer size. This leads us to an important question: can the limitations of GPT-3 be overcome simply by throwing more data and computational horsepower at it?

As mentioned before, GPT-3 has 175 billion parameters. The adult human brain, by comparison, has on the order of a hundred trillion synapses. It took OpenAI one year to go from 1.5 billion parameters in GPT-2 to 175 billion in GPT-3. What happens when these models grow to sizes comparable to those of human brains?

Indeed, GPT-3’s astounding success brings an even larger philosophical question to the fore: Is human intelligence only quantitatively superior to AI, or are there qualitative differences? Is intelligence merely a function of computation? We don’t know yet. Geoffrey Hinton, a leading AI researcher often credited with popularising neural networks, said in 2013, “When you get to a trillion [parameters], you’re getting to something that’s got a chance of really understanding some stuff.”


GPT-3 doesn’t just memorise sequences of words to spit them out later. It synthesises them into an internal mathematical representation that it can use to answer questions. Think of it as a student who studies the contents of a textbook and later writes exam answers in her own words.

We are still far from AI that possesses general intelligence – i.e. the ability to read a textbook, understand what it says and apply its lessons in new contexts, in new ways, much like a human might. That said, the versatility and generalisation exhibited by GPT-3 mark a significant step towards making that scenario real.

Viraj Kulkarni has a master’s degree in computer science from UC Berkeley and is currently pursuing a PhD in artificial intelligence.
