How Long Should a Prompt Be for Stable Diffusion?

The Highly Detailed, Intricate, UHD FAQ & Guide About Prompt Length

If you’ve looked at the prompts of other AI art enthusiasts online, you may have noticed some very lengthy walls of text and wondered…how long can a text prompt actually be? In this painfully detailed guide, I’ll tell you everything you wanted to know about prompt length and tokens in Stable Diffusion.

How Long Can a Prompt Be for Stable Diffusion?

A Stable Diffusion prompt should be approximately 320 characters or less to prevent it getting cut off at the end. This is because Stable Diffusion has a limit of 77 tokens that can be used in a single prompt, so anything beyond that limit gets ignored.

But, although I gave you an approximate number, that does not mean every 320 characters you write out will always equal exactly 77 tokens. That’s because a token is made up of a group of characters, and the number of characters in it varies depending on which words or punctuation marks are used. I’ll explain why that is in the section titled “How Long is a Stable Diffusion Token?”.

If you are using Stable Diffusion on your own computer, then there are actually ways to bypass the token limit (I’ll show you that bypass later, as well).

How Long Ought a Prompt To Be for Stable Diffusion?

We covered what the objective limit is on your prompt’s length. But how long does a prompt really need to be? That question can’t be answered with a set number, because it depends on:

  • the type of image you want to generate,
  • how much stuff you want included in the image,
  • the level of detail that you want added, and
  • how specific you want the image to look

I would recommend that you go for the minimum number of words necessary to get the image you want. Don’t just stuff a bunch of keywords into your prompt on the first go. Start with a short prompt and add more details/keywords in if you can’t get what you’re after.

What is the Maximum Length of a Stable Diffusion Prompt?

A Stable Diffusion prompt has a maximum length of 77 tokens. The code used for tokenization instructs the AI algorithm to truncate anything beyond that. Truncate is just a fancy word for “cut off” or “ignore”. So if you write a 100-token prompt, the last 23 tokens won’t even be considered when generating an image.
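To make the arithmetic concrete, here is a minimal sketch of truncation. It treats a prompt as a plain list of token numbers; the function name and the made-up ids are mine for illustration and are not taken from Stable Diffusion's code.

```python
MAX_TOKENS = 77  # the limit quoted in this guide


def truncate_tokens(token_ids, max_tokens=MAX_TOKENS):
    """Cut a list of token ids at max_tokens and return (kept, dropped)."""
    kept = token_ids[:max_tokens]
    dropped = token_ids[max_tokens:]
    return kept, dropped


# A made-up 100-token prompt, represented by the ids 0..99.
prompt_ids = list(range(100))
kept, dropped = truncate_tokens(prompt_ids)
print(len(kept), len(dropped))  # 77 23
```

Everything in `dropped` never reaches the image generator, which is why keywords at the tail end of an overlong prompt have no effect.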

But remember that a token is not the same thing as a character. A token can include several characters or even a full word.

Token Max of 75 or 77?

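A few articles online claim the limit is actually 75 tokens, and some of the web interfaces show it as 75 tokens as well. Both numbers are right, but they count different things. You can read the actual configuration for any model that you download and see what the limit is. In Stable Diffusion V1.4, the “max model length” is set to 77, not 75. However, the tokenizer wraps every prompt in a special start-of-text token and a special end-of-text token, and those two use up 2 of the 77 slots. That leaves 75 tokens for your own words. The setting is on line 21 of the following file in the model directory: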

stable-diffusion-v1-4/tokenizer/tokenizer_config.json
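If you would rather not open the file by hand, a few lines of Python can read the value for you. This is only a sketch: it assumes the model was downloaded into the current folder under the path shown above, and that the setting is stored under the JSON key `model_max_length` (my reading of the “max model length” entry, so check your own file).

```python
import json
from pathlib import Path


def read_max_length(config_path):
    """Return the "model_max_length" value from a tokenizer config, or None if absent."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return config.get("model_max_length")


# Assumed location of a locally downloaded copy of the model.
CONFIG_PATH = Path("stable-diffusion-v1-4/tokenizer/tokenizer_config.json")

if CONFIG_PATH.exists():
    print(read_max_length(CONFIG_PATH))
else:
    print("Config file not found; adjust CONFIG_PATH to your model folder.")
```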

Okay. We’ve thrown around the word “token” several times now. But what does it actually mean?

What is a Stable Diffusion Token?

A Stable Diffusion token is simply a clump of characters that are grouped together and assigned a number. The AI algorithm then reads the number when embedding and knows what characters it represents. In many instances, a token represents a single word…but not always.

You see, those numbers (and the characters assigned to them) are part of a massive dictionary of tokens that Stable Diffusion understands. The AI model reads your prompt, compares it to the dictionary of tokens it knows, and splits up the words and characters into chunks that match a dictionary entry.
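Here is a toy version of that dictionary lookup. Be warned that it is a deliberate simplification: the vocabulary and the numbers are made up, and the real tokenizer builds its chunks with byte-pair encoding merges rather than the simple "longest match first" rule used here. It is only meant to show the idea of splitting text into chunks that each map to a number.

```python
# Made-up vocabulary and ids for illustration; not Stable Diffusion's real list.
TOY_VOCAB = {
    "throwback": 1,
    "thursday": 2,
    "throwbackthursday": 3,
    "cat": 4,
    "s": 5,
    ",": 6,
}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into the longest chunks found in vocab, scanning left to right."""
    ids = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest remaining piece first, then shrink it.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in vocab:
                    ids.append(vocab[piece])
                    start = end
                    break
            else:
                # Nothing in the vocabulary matches here: skip this character.
                start += 1
    return ids


print(toy_tokenize("throwbackthursday"))   # [3]
print(toy_tokenize("throwback thursday"))  # [1, 2]
print(toy_tokenize("cats, cat"))           # [4, 5, 6, 4]
```

Notice that "cats," turns into three tokens ("cat", "s", and ",") while the much longer "throwbackthursday" is just one, because the whole phrase has its own dictionary entry.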

How Long is a Stable Diffusion Token?

A Stable Diffusion token can be as small as 1 character or as long as 32 characters. That is based on the actual entries in Stable Diffusion’s Tokenizer Vocabulary list. It all depends on how the tokenizer splits up your text and what words are in that dictionary. Some things to note based on the vocab list that was used to train the AI:

  • Most words will equal 1 token
  • Most single punctuation marks equal 1 token (so every time you add a “,” between words you are adding another token)
  • A space does not add a token of its own (the tokenizer only uses spaces to find where one word ends and the next begins)
  • Some hashtag phrases are 1 token (there are several phrases made of several words that equal just one token because that phrase is all one word in the vocab list. Example: “throwbackthursday” is a single token)

Not All Words Have Their Own Token

Stable Diffusion’s vocabulary list is big, but it still does not include every word in the English language. If you try to use a word that isn’t in the tokenizer dictionary, it gets split into smaller chunks that are, and Stable Diffusion likely won’t know how to use the original word for the output image. Such words will therefore probably not show up in your images.

In short, obscure words are unlikely to directly influence your image results. A super specific type of model plane that was only in production in the 1950s may not have reference images in the training dataset so it may not be available for recreation in Stable Diffusion.

Tokens Can Be Added Through Fine-Tuning

Some models may have a slightly larger dictionary. That’s because custom models can be trained to understand new tokens. Likely, those custom models will only have a handful more words than the base dataset.

But if you want to generate images that look like a very specific model of car, then you could train Stable Diffusion on pictures of that car and introduce a new token to represent that newly-trained subject.

This means you can add your own custom subjects or styles into Stable Diffusion, something that is impossible with all the other AI art generators on the current market.

Do you want to make AI images of your own face? How about your cat? Or a specific character from your favorite manga? All of those can be custom trained and added to Stable Diffusion’s token list.

How Can I Tell If a Word is Known by Stable Diffusion?

There is a way to determine if Stable Diffusion will understand a word. Go to this web page and type your word or phrase into the search bar, then hit enter:

https://rom1504.github.io/clip-retrieval/?back=https%3A%2F%2Fknn.laion.ai&index=laion5B-H-14&useMclip=false

This will show you images from Stable Diffusion’s training data whose captions match your word. It also lets you see the kinds of images Stable Diffusion is drawing on when it interprets that word. You may discover that you’ve been including a word in your prompt that Stable Diffusion doesn’t actually understand.

Stable Diffusion Tokens are Not the Same as GPT Tokens

Most of the articles and discussions I’ve read on this topic assume that all tokenizers are the same, or that Stable Diffusion tokens are the same as OpenAI’s ChatGPT tokens. Therefore, they assume that a prompt can be about 340-380 characters based on OpenAI’s online token estimator. But this is incorrect.

Stable Diffusion uses its own CLIP-based tokenizer. So the OpenAI estimator will not be precise if you’re trying to calculate length.
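If you want a count from a CLIP tokenizer rather than a GPT one, the Hugging Face `transformers` library can load one. The sketch below assumes your model uses the same tokenizer as the `openai/clip-vit-large-patch14` checkpoint; that name is my assumption, so swap in the tokenizer folder of your own model if it differs. The helper function names are mine as well.

```python
def load_clip_tokenizer(name="openai/clip-vit-large-patch14"):
    """Load a CLIP tokenizer with Hugging Face transformers.

    Needs `pip install transformers` and downloads files on first use.
    The default checkpoint name is an assumption, not taken from this guide.
    """
    from transformers import CLIPTokenizer  # imported lazily on purpose
    return CLIPTokenizer.from_pretrained(name)


def count_tokens(prompt, tokenizer):
    """Count the tokens in a prompt, not counting start/end markers."""
    return len(tokenizer.tokenize(prompt))


# Usage (uncomment to run; needs network access the first time):
# tokenizer = load_clip_tokenizer()
# print(count_tokens("a highly detailed painting of a cat, intricate, UHD", tokenizer))
```

Because `count_tokens` accepts any object with a `tokenize` method, you can also pass in a tokenizer loaded from a local model folder.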

How Can I Figure Out the Number of Tokens in My Prompt?

There is an easy way to determine your number of tokens if you are running Stable Diffusion on your computer with the Automatic1111 web UI. The token count is right in the interface!

Do you see those 5 picture buttons to the right of the prompt boxes and to the left of the big orange Generate button? Look below those picture buttons. Do you see the numbers underneath them?

That is your token length checker.

In my screenshot, the numbers say “15/75”. That means I’ve used 15 out of 75 tokens so far.

How Do You Bypass Token Limits in Stable Diffusion?

With all of this said, how can we circumvent the token limit? If you’re using an online service like DreamStudio, you can’t. Code can’t just be tinkered with when it’s on someone else’s server.

But if you have a local installation of Stable Diffusion, then you can use Automatic1111’s Web UI to get around the limit. Its developer has already added an automatic feature that allows longer prompts.

The code modification in that interface breaks your prompt into chunks of 75 tokens each. It then sends these chunks separately through the CLIP text encoder and then links all the parts back together before sending them to the Unet neural network. Thus…super long prompts can still go into Stable Diffusion and the words at the end will still be considered by the algorithm when making an image.
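The chunking idea can be sketched in a few lines. To be clear, this is not the web UI's actual code: the function names are mine, the "encoder" is a stand-in that returns one placeholder per token, and the real feature has to handle extra details (such as the start and end markers around each chunk) that I've left out.

```python
CHUNK_SIZE = 75  # chunk length described above


def split_into_chunks(token_ids, chunk_size=CHUNK_SIZE):
    """Break a list of token ids into consecutive chunks of chunk_size."""
    return [token_ids[i:i + chunk_size] for i in range(0, len(token_ids), chunk_size)]


def encode_long_prompt(token_ids, encode_chunk, chunk_size=CHUNK_SIZE):
    """Encode each chunk separately, then join the results back together in order."""
    combined = []
    for chunk in split_into_chunks(token_ids, chunk_size):
        combined.extend(encode_chunk(chunk))
    return combined


def fake_encoder(chunk):
    """Stand-in for the CLIP text encoder: one placeholder 'embedding' per token."""
    return [("emb", token_id) for token_id in chunk]


# A made-up 200-token prompt, represented by the ids 0..199.
chunks = split_into_chunks(list(range(200)))
print([len(c) for c in chunks])  # [75, 75, 50]

encoded = encode_long_prompt(list(range(200)), fake_encoder)
print(len(encoded))  # 200
```

The point to take away is that no token gets thrown out: the last chunk is simply shorter, and its contents still end up in the combined result.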

I tested this out and, even at a prompt length of 1000+ tokens, the last words in my prompt were still being recognized by Stable Diffusion. So this bypass method works quite well.

Conclusion

That’s every question about Stable Diffusion tokens I can think of. If you’ve made it this far, consider yourself a zen master on the subject. I hope this guide helped you out. Please consider reading some of my other articles if you want to learn more about making art in Stable Diffusion: