GPT-2 Text Generation

What is it?

GPT-2 (Generative Pre-trained Transformer 2) is a machine learning language model created by OpenAI. Its basic purpose is to predict which word comes next after a prompt of some seed text. The model was trained on over 40 GB of Internet text, which is an enormous amount of data; because it is text only, with no images, that space holds a huge volume of writing. Common estimates put plain text at approximately 680,000 pages per GB, so the 40 GB of text GPT-2 was trained on equates to roughly 27.2 million pages of text!
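As a quick sanity check on that arithmetic (the 680,000 pages per GB figure is a commonly quoted approximation, not an exact measurement):

```python
# Rough estimate of how many pages of plain text fit in the training data.
PAGES_PER_GB = 680_000  # commonly quoted approximation for plain text
training_data_gb = 40

total_pages = PAGES_PER_GB * training_data_gb
print(f"{total_pages:,} pages")  # 27,200,000 pages
```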

Originally OpenAI was reluctant to release the models publicly, fearing they could be used to auto-generate copious amounts of fake news and spam. Since then they have generously released all their models (even the largest, with 1.5 billion neural network parameters) for anyone to experiment with.

If you want to use GPT-2 outside Visions of Chaos, you can download the code from OpenAI's GitHub repository.

Visions of Chaos front end GUI for GPT-2

I have now wrapped all the GPT-2 text generation behind a simple GUI dialog in Visions of Chaos, as long as you have all the prerequisite programs and libraries installed. See my TensorFlow Tutorial for the steps needed to get this and other machine learning systems working in Visions of Chaos.

You give the model a sentence and, after a minute or so, it spits out what it thinks the continuation of that prompt should be. Each time you run the model you get a new, unique result.
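The run-to-run variety comes from sampling: rather than always picking the single most likely next word, the model draws from its predicted probability distribution. A minimal sketch of that idea over a toy vocabulary (the words and scores here are made up for illustration, not taken from GPT-2):

```python
import math
import random

# Toy vocabulary and made-up model scores (logits) for the next word.
vocab = ["cat", "dog", "robot", "banana"]
logits = [2.0, 1.5, 0.5, -1.0]

def sample_next_word(logits, vocab, temperature=1.0):
    """Softmax over temperature-scaled logits, then draw one word at random."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(vocab, weights=probs, k=1)[0]

# Each call can return a different word, which is why every run of the
# model produces a different continuation of the same prompt.
print([sample_next_word(logits, vocab) for _ in range(5)])
```

Lowering the temperature makes the draw concentrate on the highest-scoring word, so output becomes more repeatable but less varied.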

There is an option to select which model to use, because my 2080 Super with 8 GB of VRAM cannot handle the largest 1.5-billion-parameter model without running out of memory. The 774-million-parameter model works fine.
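A back-of-the-envelope calculation shows why the largest model is tight on an 8 GB card. Assuming 4 bytes per parameter (32-bit floats) just for the weights, before any activations or framework overhead:

```python
BYTES_PER_PARAM = 4  # assuming 32-bit float weights

for name, params in [("774M", 774e6), ("1.5B", 1.5e9)]:
    gb = params * BYTES_PER_PARAM / 1024**3
    print(f"{name}: ~{gb:.1f} GB for weights alone")
# 774M: ~2.9 GB for weights alone
# 1.5B: ~5.6 GB for weights alone
```

Around 5.6 GB of weights plus working memory during generation can easily push past 8 GB, while the 774M model leaves plenty of headroom.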

Some example results

What does AI need to do to get rid of us

A nightmare

The future for the human race

How to be happy

These early test results are really interesting. At first I thought the model was just stitching together sentences it had found online, but if you take random chunks of the generated text and do a Google search (in quotes, so it searches for the complete sentence) you get no results. The model really is assembling these mostly grammatically correct sentences and paragraphs by itself.

It can be accurate when answering “what is” questions, but it can just as easily spit out grammatically correct nonsense, so don’t take anything it says as truth.

More to come

One future use I have in mind for GPT-2 is a basic chat bot you can talk with. OpenAI’s MuseNet is also very promising for generating music, and gives much better output than my previous best LSTM results.

OpenAI has also since released GPT-3, with limited access. I hope they release the model to the general public as they did with GPT-2, as I have seen some very impressive results generated with GPT-3. GPT-3’s largest model has 175 billion parameters, compared to 1.5 billion for GPT-2. Although, if my 8 GB GPU cannot handle the 1.5-billion-parameter GPT-2 model, it has no hope of running the 175-billion-parameter one.
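The same back-of-the-envelope weight arithmetic makes that gap concrete. Even at only 2 bytes per parameter (16-bit floats), the 175-billion-parameter model needs hundreds of gigabytes just to hold its weights:

```python
params = 175e9
bytes_per_param = 2  # assuming 16-bit float weights, half the 32-bit size
gb = params * bytes_per_param / 1024**3
print(f"~{gb:.0f} GB")  # ~326 GB, far beyond any single consumer GPU
```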