Does Prompt Order Matter in Stable Diffusion?

Have you ever wondered about the best way to order the words in your AI art prompts? Today, I’ll explain whether and how word order affects image generation in Stable Diffusion.

Quick Note: the images in this guide were generated using the DreamShaper model.

Does Prompt Order Matter in Stable Diffusion?

Yes, prompt order matters…to an extent. Stable Diffusion will usually pay more attention to the first noun in your prompt. Beyond that, the position of a word doesn’t matter nearly as much as the contextual association of your words. We’ll discuss word association later in this article.

Emphasis Matters

What is more likely to impact the weight of a keyword in a prompt is the use of emphasis punctuation. Check out my guide on emphasis to learn about that topic. But, in short, you can tell Stable Diffusion to focus more attention on a certain word or phrase by putting it in parentheses. Once a word is emphasized, it likely won’t matter where it appears in the prompt order.
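Here is a minimal sketch of how parenthetical emphasis can translate into a numeric weight. It assumes the convention used by the AUTOMATIC1111 web UI, where each layer of parentheses multiplies attention by 1.1, each layer of square brackets divides it by 1.1, and (text:1.4) sets a weight directly. Other interfaces may use different rules, and the function name emphasis_weight is made up for this illustration.

```python
import re
from typing import Tuple

# Assumption: AUTOMATIC1111-style emphasis. Each ( ) layer multiplies the
# weight by 1.1, and each [ ] layer divides it by 1.1.
EMPHASIS_STEP = 1.1


def emphasis_weight(term: str) -> Tuple[str, float]:
    """Return (bare_text, weight) for a single prompt term."""
    text = term.strip()

    # Explicit form, e.g. "(blue eyes:1.4)".
    explicit = re.fullmatch(r"\(([^():]+):([0-9]*\.?[0-9]+)\)", text)
    if explicit:
        return explicit.group(1).strip(), float(explicit.group(2))

    weight = 1.0
    # Peel off matching outer parentheses, one layer at a time.
    while len(text) >= 2 and text[0] == "(" and text[-1] == ")":
        text = text[1:-1].strip()
        weight *= EMPHASIS_STEP
    # Peel off matching outer square brackets, one layer at a time.
    while len(text) >= 2 and text[0] == "[" and text[-1] == "]":
        text = text[1:-1].strip()
        weight /= EMPHASIS_STEP
    return text, round(weight, 3)


print(emphasis_weight("(((blue eyes)))"))  # three layers: 1.1 * 1.1 * 1.1
print(emphasis_weight("orange sweater"))   # no emphasis: weight stays at 1.0
```

Under this assumption, three layers of parentheses raise a phrase’s weight by about a third, which is a big push compared with simply moving a word earlier in the prompt.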

Prompt Length Matters, Too

The exception is an emphasized word at the very end of a very long prompt. That’s because Stable Diffusion, by default, has a limit on how many “tokens” can appear in a prompt. So if your prompt has too many words and punctuation marks, the AI will eventually cut off your prompt at a certain length. I have an article all about prompt lengths and the cutoff number as well.
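If you want a quick sanity check on prompt length, here is a rough sketch. It assumes a budget of 75 prompt tokens, based on the 77-token window of the CLIP text encoder used by Stable Diffusion 1.x minus its start and end markers. The count is only an estimate: the real tokenizer can split an uncommon word into several tokens, and some interfaces handle long prompts differently instead of cutting them off.

```python
import re

# Assumption: about 75 usable tokens (a 77-token CLIP window minus the
# start and end markers). Check your own interface for its actual limit.
ASSUMED_TOKEN_LIMIT = 75


def rough_token_count(prompt: str) -> int:
    """Estimate tokens by counting each word and each punctuation mark."""
    return len(re.findall(r"\w+|[^\w\s]", prompt))


def past_limit(prompt: str, limit: int = ASSUMED_TOKEN_LIMIT) -> bool:
    """Return True if the rough estimate exceeds the assumed limit."""
    return rough_token_count(prompt) > limit


prompt = "portrait, young woman in a blue sweater, detailed face and eyes"
print(rough_token_count(prompt), past_limit(prompt))
```

Note that every parenthesis counts toward the estimate too, so heavy emphasis punctuation eats into the budget.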

What Should Go First in a Prompt?

So what words should appear at the beginning of your prompt? Whichever words are the most important for an accurate rendering, I suppose. This will depend on the subject matter and the medium you want to emulate.

The first noun mentioned can easily become the dominant keyword of a prompt, so I recommend that you start off your prompt by mentioning either the primary subject or the art medium you want. If you start with a word that you did not intend to dominate the image, then you may be disappointed with the resulting art.

Word Order Examples

Let’s look at some examples. Our prompt will be about a portrait of a woman. In the first prompt, we’ll put “young woman” as the first key phrase of the prompt and specify “portrait” at the end. In the second version, we’ll move “portrait” to the beginning of the prompt. Let’s see the difference.

Prompt: young woman in a blue sweater, detailed face and eyes, portrait

Prompt: portrait, young woman in a blue sweater, detailed face and eyes

Moving the word “portrait” from the end to the beginning of the prompt made a perceptible difference in the images that were generated. Notice that, for the prompt starting with “portrait”, the background and hair have a more photo-like quality and depth of field. I suspect that the AI closely associates “portrait” with “photography” and, since we mentioned that word first, it pulled more photo-like elements into the generations.

You’ll also notice that the color blue dominated the images. It caused her eyes to also appear blue in every instance. But what would happen if we introduce another color before we mention the blue sweater?

Prompt: portrait, green eyes, young woman in a blue sweater, detailed face and eyes

We still requested a blue sweater, but because the term “green eyes” appeared first in the prompt, it overpowered the other color.

So if you specify colors for an image, it’s very likely that the first color you mention will be dominant in your output image.
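To run this kind of comparison yourself, it helps to generate prompt variants that differ only in the position of one phrase. The sketch below is one way to do that; split_terms and move_to_front are made-up helper names, and it assumes your prompt is a plain comma-separated list. For a fair test, keep the seed and all other settings the same between runs so that word order is the only thing that changes.

```python
def split_terms(prompt: str) -> list:
    """Split a comma-separated prompt into trimmed, non-empty terms."""
    return [term.strip() for term in prompt.split(",") if term.strip()]


def move_to_front(prompt: str, term: str) -> str:
    """Return the prompt with one term moved to the first position."""
    terms = split_terms(prompt)
    if term not in terms:
        raise ValueError(f"{term!r} is not a term in the prompt")
    terms.remove(term)  # removes the first occurrence only
    return ", ".join([term] + terms)


original = "young woman in a blue sweater, detailed face and eyes, portrait"
print(original)
print(move_to_front(original, "portrait"))
```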

Emphasis Overrides Word Order

But as soon as you start adding parenthetical emphasis to a word or phrase, the order of your words really becomes arbitrary, even if you place the emphasized word at the very end of a very long prompt. If a keyword is in parentheses, it’s going to be weighted more regardless of its location in the prompt. Here’s an example of a needlessly long prompt with emphasis placed on the very last phrase.

Prompt: young woman, portrait, detailed face and eyes, orange sweater, autumn colors, soft lighting, morning light, detailed, realistic, 4K, HD, freckles, cute, beautiful, gorgeous, brunette wavy hair, smiling, shallow depth of field, bokeh background, city street with buildings, Boston, 50mm lens, Kodak Portra 800 film, by Alec Soth, by Marta Bevacqua, by Petra Collins, photorealistic, (((blue eyes)))

Sure, I mentioned an orange sweater and even added “autumn colors” to support that suggestion. But Stable Diffusion just didn’t care. It saw the heavily weighted phrase “blue eyes” at the very end and added blue to the subject’s clothing and even hair in some places!

The lesson here: order is not that important. Any internet guru who claims otherwise is probably not testing the software out enough.

Prompt Associations are Important!

After testing many different prompts, I’ve found that the AI’s understanding of a word or phrase matters more than the order of words. You can use the same keywords in a variety of orders and still get a very similar output image.

But if you ask Stable Diffusion for something absent from its training data, it won’t matter what order you put the words in…your output image likely won’t match your prompt very accurately.

That’s because Stable Diffusion can’t actually think for itself; it’s just a statistical mapping of keywords to images. The training for this software went something like this:

  • Internet images were fed into the training computer
  • Each image also had keyword tags linked with it
  • The computer associated those keywords with the way that their corresponding images looked
  • It therefore “learned” what an image is probably going to look like if it has a certain keyword phrase
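The steps above can be sketched as a toy lookup table. This is a deliberately oversimplified illustration of the idea, not how Stable Diffusion is actually built: the real system uses a neural text encoder and a denoising network rather than a table of counts. The tiny dataset and the function names below are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made "training data": (keyword tags, how the image looks).
TOY_DATA = [
    (["portrait", "blue sweater"], "blue knitwear"),
    (["blue sweater", "woman"], "blue knitwear"),
    (["portrait", "woman"], "shallow depth of field"),
    (["portrait", "green eyes"], "shallow depth of field"),
]


def learn_associations(data):
    """Count how often each keyword appears alongside each look."""
    table = defaultdict(Counter)
    for keywords, look in data:
        for keyword in keywords:
            table[keyword][look] += 1
    return table


def most_likely_look(table, keyword):
    """Return the look seen most often with a keyword, or None if unseen."""
    if keyword not in table:
        return None
    return table[keyword].most_common(1)[0][0]


table = learn_associations(TOY_DATA)
print(most_likely_look(table, "blue sweater"))   # a keyword seen in the data
print(most_likely_look(table, "mauve sweater"))  # a keyword never seen
```

In this toy, a keyword that never appeared in the data has nothing to map to, no matter where it sits in the prompt.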

Syntax Isn’t Very Important

But no AI art software is reading your prompt and consciously understanding the syntax of your sentence. It’s associating and mapping words together (like data points) based on the probability of them appearing together in the training data.

If you told me “draw a woman with green eyes and a blue sweater”, then my brain knows to break down your sentence based on English language syntax into the following chunks:

  • a woman
  • who has green eyes
  • and is wearing a blue sweater

But Stable Diffusion isn’t reading the sentence and making intuitive decisions on how to chunk up those words. It’s going to mix and match any words you write into pairs that make sense based on its dataset.

So don’t write full sentences describing the scene you want to see. Stable Diffusion doesn’t care about sentence structure; it just looks for keywords that it knows.

Common Word Associations

Let’s look again at the topic of a “blue sweater”. Both of those words are pretty common, so it’s likely that a few images in the training data had the term “blue sweater” associated with them. But what about a more obscure color name like chartreuse or mauve?

Prompt: portrait, young woman in a chartreuse sweater, detailed face and eyes

Prompt: portrait, young woman in a mauve sweater, detailed face and eyes

It looks like chartreuse and mauve are not well represented in the dataset…at least not in relation to sweaters. When we pushed for a mauve sweater, Stable Diffusion started taking more creative freedoms with the outputs because it did not have enough data to understand the color request.

Does Prompt Order Matter for Negative Prompts?

No, prompt order does not matter for the negative prompt. I’ve run several tests but I’ve been unable to find any image results that suggest order matters here. The hierarchy of negative prompts showed even less correlation than the order of positive prompt words.
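If you’d like to repeat this kind of test, one approach is to generate every ordering of a short negative prompt and render each one with the same seed. Here is a minimal sketch; the function name order_variants and the example negative terms are hypothetical and not taken from my tests.

```python
from itertools import islice, permutations


def order_variants(prompt: str, max_variants: int = 6) -> list:
    """Return up to max_variants reorderings of a comma-separated prompt."""
    terms = [term.strip() for term in prompt.split(",") if term.strip()]
    # permutations() yields the original order first, then the rest.
    return [", ".join(p) for p in islice(permutations(terms), max_variants)]


# Hypothetical negative prompt, used only for illustration.
for variant in order_variants("blurry, extra fingers, watermark"):
    print(variant)
```

The number of orderings grows very quickly with the number of terms, which is why the sketch caps how many variants it returns.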

Conclusion

I recommend placing your most important keywords first in a Stable Diffusion prompt. Beyond that, emphasis through word weighting matters more than word order, even in a very long prompt.
