AI Dungeon harnesses GPT-2 to generate novel experiences in the framework of a text-adventure game. Because GPT-2's output loses logical coherence after a few sentences, the play session quickly drifts in unpredictable directions. This is why the game is fun, but it also means there are no clear goals, indications of progress, or win conditions.
Can we build a superstructure atop the output of GPT-2 to enable such features? In this post I will explore some of the terrain.
Let's start with something well-constrained. We'll use a block of text generated by GPT-2 to lay out the room of a roguelike game. It's a 2D space with setting and entities represented as ASCII characters. Our win condition is simple: if you can reach the exit tile, you win.
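To make the target concrete, here is a minimal sketch of such a room, assuming a hard-coded layout. The tile characters (`#`, `.`, `@`, `>`) and the function names are my own illustrative choices; nothing here comes from GPT-2 yet.

```python
# Hypothetical tile legend: '#' wall, '.' floor, '@' player start, '>' exit.
ROOM = [
    "#######",
    "#@....#",
    "#.##..#",
    "#....>#",
    "#######",
]

# Compass directions mapped to (row, column) offsets.
MOVES = {"n": (-1, 0), "s": (1, 0), "w": (0, -1), "e": (0, 1)}


def find_tile(room, tile):
    """Return (row, col) of the first occurrence of tile, or None."""
    for r, line in enumerate(room):
        c = line.find(tile)
        if c != -1:
            return (r, c)
    return None


def step(room, pos, direction):
    """Move one tile in the given direction unless a wall blocks the way."""
    dr, dc = MOVES[direction]
    r, c = pos[0] + dr, pos[1] + dc
    if room[r][c] == "#":
        return pos
    return (r, c)


def has_won(room, pos):
    """The win condition: the player is standing on the exit tile."""
    return room[pos[0]][pos[1]] == ">"
```

Everything that follows is about replacing that hard-coded layout with something derived from generated text, while keeping the win condition this simple.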
Now, how the hell do you build that?
Perhaps we could parse the GPT-2 output and extract the nouns. We detect the word "teacup" as a noun and imbue it with damage points. Of course the game has no idea that a teacup would be a terribly ineffective weapon.
How can we distinguish between nouns like "teacup" and "forest"? Clearly these have completely different purposes in-game. Teacup may be a weapon, but forest must be a setting.
Let's try an example. My handwritten prompt is in bold, while GPT-2 has generated the rest:
The dark forest is full of shadows and hate. Orcs patrol the narrow paths, eager for blood. Rusty weapons of past adventurers litter the ground. With a loud and vicious laugh, the elven rogue now casts a spell that hurls shards of broken chunks at the nearest enemy. It did not take long to find a scrap of bronze in the rubble. It could be all that was left.
Already we see GPT-2 taking things in a different direction.
We can pick out the nouns by hand, but if we're building a game we need to offload this task to the computer. Let's use the CMU Link Parser API to find the nouns.
We end up with this list:
Building our roguelike room from this list could go in all sorts of directions. In fact, there's no way we'd end up with anything like the original description (not that it was particularly coherent to begin with).
Now we need to categorize each noun. To keep things super simple, we'll only extract two kinds of nouns to build our roguelike room:
What’s our method for categorizing nouns? By hand? Surely not. There are tens of thousands of nouns in the English language, if not hundreds of thousands.
Luckily for us, linguistics has already mapped this territory. When we want to know a noun's category, we're looking for its hypernym (a noun can have several of these).
There's even an API! The WordNet database can provide us the hypernyms of a given noun.
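One way to query WordNet from Python is through NLTK's `wordnet` corpus reader. This is a sketch under that assumption, not necessarily the interface used for this post. The helper name `hypernym_names` and its `synsets_for` parameter are hypothetical; the parameter exists only so the lookup can be swapped out.

```python
def hypernym_names(noun, synsets_for=None):
    """Collect the names of every ancestor of every noun sense of `noun`."""
    if synsets_for is None:
        # Imported lazily so the function can be exercised without NLTK installed.
        from nltk.corpus import wordnet as wn

        def synsets_for(word):
            return wn.synsets(word, pos=wn.NOUN)

    names = set()
    for synset in synsets_for(noun):
        # hypernym_paths() lists each chain from the root down to the synset itself.
        for path in synset.hypernym_paths():
            # Skip the last element, which is the synset we started from.
            for ancestor in path[:-1]:
                # Synset names look like "artifact.n.01"; keep only the lemma part.
                names.add(ancestor.name().split(".")[0])
    return names
```

With NLTK installed and the corpus fetched once via `nltk.download("wordnet")`, calling `hypernym_names("teacup")` returns the set of ancestor names across all noun senses. Merging senses like this is a deliberate simplification: a word with several meanings will report hypernyms from all of them.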
Here's what we get:
As a quick aside, notice that:
How we use WordNet's hypernyms will be hugely influential on the resulting game. For example, do we discard the "feeling" hypernym outright, or would it be more interesting to consider such nouns as enemies, spawning a "hate" entity for the player to defeat?
We'll start by extracting two hypernyms, "person" and "artifact":
We have enough here to get the job done.
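The filtering step might look like the sketch below. It assumes some lookup function that returns a noun's hypernyms as a set of strings; `categorize` and `SAMPLE_HYPERNYMS` are hypothetical names, and the sample table is hand-written for illustration rather than real WordNet output.

```python
# The only two hypernyms we track, in priority order.
CATEGORIES = ("person", "artifact")


def categorize(nouns, hypernyms_of):
    """Group nouns under the first tracked hypernym they have; drop the rest."""
    buckets = {category: [] for category in CATEGORIES}
    for noun in nouns:
        found = hypernyms_of(noun)
        for category in CATEGORIES:
            if category in found:
                buckets[category].append(noun)
                break
    return buckets


# Hand-written stand-in for a real hypernym lookup (assumption, not WordNet data).
SAMPLE_HYPERNYMS = {
    "rogue": {"person"},
    "weapons": {"artifact"},
    "hate": {"feeling"},
    "forest": {"land"},
}

rooms_cast = categorize(
    ["rogue", "weapons", "hate", "forest"],
    lambda noun: SAMPLE_HYPERNYMS.get(noun, set()),
)
```

Nouns with neither hypernym, such as "hate" and "forest" here, are silently discarded. A noun with both would land in "person" because that category is checked first; that ordering is an arbitrary choice.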
We'll use some number of Persons to populate the room with enemies. If you want to add more complexity, you could make some of them neutral agents like merchants or mercenaries.
Artifacts will be scattered across the room. Each can be picked up by the player and placed in their inventory. Artifacts can be mapped to weapons, consumables, or wearables.
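Placement could be as simple as dropping each entity on a random free floor tile. This is a rough sketch: the `populate` function, the `"enemy"`/`"item"` labels, and the `.` floor character are all assumptions of mine, and mapping artifacts to weapons, consumables, or wearables is left out.

```python
import random


def populate(room, persons, artifacts, seed=0):
    """Place one enemy per person and one item per artifact on free floor tiles."""
    rng = random.Random(seed)  # seeded so the same text yields the same room
    floor = [
        (r, c)
        for r, line in enumerate(room)
        for c, ch in enumerate(line)
        if ch == "."
    ]
    entities = [("enemy", name) for name in persons]
    entities += [("item", name) for name in artifacts]
    if len(entities) > len(floor):
        raise ValueError("not enough floor tiles for every entity")
    # sample() picks distinct tiles, so no two entities share a position.
    spots = rng.sample(floor, len(entities))
    return dict(zip(spots, entities))
```

The result maps `(row, col)` positions to `(kind, name)` pairs, which a renderer can overlay on the ASCII grid.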
I'll stop here for now. There's a ton more to cover, and we could go in a few different directions:
I will explore these topics in future posts.
Comments? Reach me on Twitter.