Should You Install Stable Diffusion or Use an AI Art Web Service?

The Complete Comparison of Online Services vs Local Installation

The industry of AI-generated art is growing at an exponential rate. Every day, new advancements are released and more people are jumping at the chance to create beautiful art out of the images in their heads.

But there are so many service options for AI art that a newcomer can quickly get confused and frustrated. Which service provider should I use? Can I install the software myself on my own computer? Today, I’d like to answer these questions for you.

In this guide, I will explain the how, what, and why of AI art generation services; then I will provide you with the pros and cons for using Stable Diffusion on your own computer so that you can make a well-informed choice and start crafting your own AI creations.

How Do Online AI Services Work?

Before we tackle the bigger questions, we need to understand:

  1. What is an AI model?
  2. What are “AI Art Services” and how do they operate?

To begin, an AI model is essentially just a program trained on datasets to learn how to simulate the type of data at hand. For AI art, this means the program is fed a lot of images so it can figure out what makes an image an image…and how to make its own images. If you want more information on the basics of AI modeling, please check out my article here.

An AI Art Service Provider is a website that allows you to input a prompt and outputs a unique AI-generated image based on your prompt’s requests; the service provider then allows you to download the image. In a bit more detail, it works like this:

  • A company hosts the AI model on their servers
  • That company then creates a web interface which allows end users to interact with the AI model
  • Users create an account to gain access and then ask the AI to generate image outputs
  • The output images are stored on the company’s servers, but users have (limited) access to download those images
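The four steps above can be sketched as a toy program. This is only an illustration of the business model, not how any real provider is built: the class name, the method names, and the download cap of 2 are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyArtService:
    """Hypothetical stand-in for a hosted AI art provider (not a real API)."""

    download_limit: int = 2  # assumed per-user download cap, for illustration
    accounts: set = field(default_factory=set)
    stored_images: dict = field(default_factory=dict)   # image_id -> (owner, prompt)
    downloads_used: dict = field(default_factory=dict)  # user -> downloads so far

    def create_account(self, user: str) -> None:
        self.accounts.add(user)
        self.downloads_used[user] = 0

    def generate(self, user: str, prompt: str) -> int:
        # Only account holders can reach the model the company hosts.
        if user not in self.accounts:
            raise PermissionError("create an account first")
        image_id = len(self.stored_images) + 1
        # The output is kept on the provider's side, not on the user's computer.
        self.stored_images[image_id] = (user, prompt)
        return image_id

    def download(self, user: str, image_id: int) -> str:
        owner, prompt = self.stored_images[image_id]
        if owner != user:
            raise PermissionError("not your image")
        if self.downloads_used[user] >= self.download_limit:
            raise PermissionError("download limit reached")
        self.downloads_used[user] += 1
        return f"image for: {prompt}"


service = ToyArtService()
service.create_account("alice")
image_id = service.generate("alice", "a lighthouse at dusk")
print(service.download("alice", image_id))
```

Notice that every rule (who can generate, who can download, how often) lives inside the provider's code. The user only ever sees what the service decides to hand back.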

As you can guess by this business model, the service provider retains control of who can do what with the AI hosted on their servers. But you might be wondering: can’t I host an AI model on my own server? Yes, you can! In fact, you don’t need an expensive and high-powered server to run an AI image generator…you can do it with just a personal computer as long as it has enough processing power (we’ll get to that later).

What AI Services Are There?

AI Art has been around for many years now, but the diffusion-based methods gaining popularity today are quite new. As such, there are only a handful of options for working with diffusion models. The biggest contenders are:

  1. Midjourney
  2. Dall-E 2
  3. DreamStudio by StabilityAI

Each of these companies provides end users with access to its servers for a recurring fee. However, just because they all appear to offer the same service does not mean they were all created equal. In fact, each of these service providers operates its own model with its own “secret sauce” tweaks:

  • Midjourney uses a proprietary model based on v-diffusion for their service, but with handpicked data subsets [1]. In addition to that, I’ve heard it said that Midjourney also applies post-processing models and modifications to prompts behind the scenes to improve the image quality [2].
  • Dall-E 2, on the other hand, employs a diffusion model built on CLIP latents (like Stable Diffusion, it leans on CLIP), but as far as I know the specific model for their paid services is closed-source [3].
  • StabilityAI actually uses Stable Diffusion as their AI. And, unlike the last two, this diffusion model is open-source. It’s a latent diffusion model using a frozen CLIP ViT-L/14 text encoder [4].

Benefits of Online Services

I won’t sit here and claim that online services are all bad and open-sourced local installations are all good. In fact, there are some very good reasons why you might prefer a web service for your AI art wants & needs. When discussing the pros and cons of these AI Service Providers, let’s first look at the benefits they have to offer:

  1. Ease of Use. There is a lot less technojumble you have to learn when using a clean, sleek online interface from one of these companies. Their sites are designed so you never have to tweak code…you never have to figure out what in the world “git pull” means.
  2. GPU Quality. Because the images are generated on the service’s hardware, you don’t have to worry about the specifications of your own graphics card. These company servers are likely equipped with GPUs that we normal people can only dream of affording. If you’re stuck with an under-performing GPU on your personal computer, you may be completely unable to load Stable Diffusion.
  3. Cheap Pricing (for short-term enthusiasts). If you only plan on generating a few images for a month or two, then the pricing for some of these web services may be a better value (compared to buying a flashy GPU for using SD at home). Several of these services even offer a free trial period or limited free version.

Drawbacks of Online Services

However, these web AI art providers are not without their disadvantages either. These drawbacks include:

  1. Price
  2. Lack of Privacy
  3. Censorship

Price

Pricing gets expensive over the long haul. If you want to generate a lot of images, then the pricing plans for these online services are going to get costly quickly. These plans bill you per GPU minute used…yes, even the monthly plans on Midjourney have a limit on how many GPU minutes you can use each month [5]. If you want to have your generations done privately on Midjourney, that’s an extra fee. It might actually be cheaper for you to buy a modest graphics card and generate as many images as you want on your own hardware. For example:

So far, I’ve averaged 1,389 images a month (on my own computer using Stable Diffusion). If I wanted that kind of access on Midjourney, I would have to pay $30 a month plus roughly $32.60 each month for additional “fast time” GPU minutes. That’s a total of $62.60 a month to make AI art at my current capacity. But my current graphics card only cost me $350 new and it should last several years of use. The same amount of money I spent on that GPU would cover less than 6 months of online service ($350 ÷ $62.60 ≈ 5.6). Dall-E 2’s pricing is even more convoluted…you’re charged based on the number of tokens you use in your prompts, and the rate depends on which model you use to generate.
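If you want to run the same comparison with your own numbers, here is a minimal sketch of the break-even math. The dollar figures are the ones from my example above; the function names are just ones I picked for illustration, and the calculation ignores electricity and any resale value of the card.

```python
import math


def monthly_service_cost(base_fee: float, extra_gpu_fee: float) -> float:
    """Total monthly bill: subscription plus extra GPU time."""
    return base_fee + extra_gpu_fee


def break_even_months(gpu_price: float, monthly_cost: float) -> float:
    """Months of service that cost the same as buying the GPU outright."""
    if monthly_cost <= 0:
        raise ValueError("monthly_cost must be positive")
    return gpu_price / monthly_cost


monthly = monthly_service_cost(base_fee=30.00, extra_gpu_fee=32.60)
months = break_even_months(gpu_price=350.00, monthly_cost=monthly)

print(f"Service cost per month: ${monthly:.2f}")                      # $62.60
print(f"GPU price equals about {months:.1f} months of service")       # 5.6
print(f"First month where the GPU comes out ahead: {math.ceil(months)}")  # 6
```

Swap in your own subscription fee and card price to see where your break-even point lands.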

Lack of Privacy

With any online service, the images are generated on their servers and then made available to you for download. But the company keeps the original copy of your generations on their servers. With Midjourney, you have to use a public Discord server to generate art and your generations are available for all the other users to see. You have to pay extra for private generations. If you want to use the art for commercial purposes, it may make you a bit uncomfortable knowing that anyone else can grab those images and use them before you do.

Censorship

This may be the biggest turnoff of them all. Every one of those online services restricts what kind of content you can and cannot generate. You cannot create R-rated imagery or likenesses of celebrities and public figures with online providers. Doing so can get your account banned.

Maybe you don’t want to make nude paintings, but still…how do you feel knowing that the company is monitoring what images you generate and will control your creative output if it strays outside the boundaries of what they consider acceptable?

What if they start censoring the use of certain art styles out of fear that those artists might file a lawsuit? It’s entirely up to those companies if they want to ban the use of “in the style of Norman Rockwell” on their servers. You may not be aware, but a number of popular modern visual artists hate AI art and want it censored because they think it’s “theft”. That argument is for another article, but regardless…I don’t like the idea of a company gate-keeping what type of art I can create (even if that art is “created” with the help of machine learning).

Now that I’ve beaten the topic of web services like a dead horse, let’s move on to the alternative.

Stable Diffusion Local Install

There is another method of AI art generation that is unencumbered by the restrictions and paywalls of those web services. It’s called Stable Diffusion, and it’s an AI art program that can be run on your own computer for free. Stable Diffusion is an open-source latent diffusion model, which means the program code can be reviewed by anyone and even built upon by independent programmers.

While it requires a bit more learning and tweaking to get it operational, Stable Diffusion puts the power of image synthesis in the hands of everyday consumers. And, in my opinion, it’s really not that hard to install anyway. I don’t know how to code beyond the most basic CSS and HTML, but I got it up and running in just a few minutes.

There are a number of GUI options available for Stable Diffusion (for the uninformed, a GUI is a “graphical user interface” i.e. a visual interface for controlling a program with buttons and sliders). This means you can still create AI art at home without needing to slog through tedious command line terminals. So let’s look at some of the pros and cons of Stable Diffusion.

Benefits of Local Installation

The advantages of installing Stable Diffusion on your own computer include:

  1. It’s cheaper in the long run
  2. More privacy
  3. No restrictions
  4. Advanced customization

It’s Cheaper

The Stable Diffusion software itself is free. If you plan on using AI art for a long time (like, longer than 6 months), then it’s probably going to be cheaper for you to just invest in a good graphics card than pay for a monthly service. If you already have a gaming card with enough dedicated VRAM, then you’re good to go.

Greater Privacy

If you run Stable Diffusion from your own computer, then you’re the only one who has access to the output images (well, unless you share your computer with someone else). Whatever images you generate are your own business.

No Content Restrictions

When you operate Stable Diffusion from your own computer, you are not bound by the limitations of any corporate entity’s community guidelines. In many of the GUIs available, the NSFW filter is already deactivated; if you want to create lewd images you can. But it goes much further than that. With Stable Diffusion, I can create caricatures of politicians, re-imagine Emilia Clarke as a live-action version of Snow White, or use a custom model that makes everything look Seussical if I want to.

[Image: AI image of Emilia Clarke as Snow White]

Customization and Flexibility

Because Stable Diffusion is open-source and has training extensions, any person with a sufficiently powerful GPU can create limitless new models catering to any specific niche style of art they want. That’s the greatest strength of Stable Diffusion: its base model may not look like much compared to Midjourney or Dall-E 2, but its customization features let it specialize far more. If you want your art to look like 80s film camera photos, you can use a model just for that. If you want to generate art that looks like Barbie dolls, there’s a model for that too!

Drawbacks of Local Installation

The upsides of installing Stable Diffusion yourself are impressive, but there are a few downsides as well, like:

  1. GPU limitations and other hardware constraints may prevent use
  2. More fine-tuning is required than with online services
  3. A bit of geekery is required

GPU Limitations

There are minimum requirements for running Stable Diffusion on a computer. Specifically, you need enough VRAM on your graphics card or the model will run out of memory before it can create anything. I’ve seen several posts showing that people can run SD on as little as 6GB of VRAM. More dedicated memory generally lets you generate larger images or bigger batches without running out of memory. If you want to train your own textual inversions or models, you will probably need at least 12GB of VRAM. You also need to have an Nvidia GPU in order to use most distributions of Stable Diffusion interfaces.
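If you’re not sure how much VRAM your card has, here is a rough sketch of a check you could run. It assumes an Nvidia card with the `nvidia-smi` tool installed (it ships with the Nvidia driver), and it uses the 6GB and 12GB figures from the paragraph above as rules of thumb, not official requirements. The function names are my own, and rounding to the nearest whole GB is my assumption to allow for cards that report slightly less than their advertised size.

```python
import shutil
import subprocess

MIN_GENERATE_GB = 6   # rule of thumb from this guide, not an official minimum
MIN_TRAIN_GB = 12     # rule of thumb for training textual inversions or models


def parse_vram_mib(smi_output: str) -> list:
    """Turn nvidia-smi's one-number-per-line output into a list of MiB values."""
    lines = (raw.strip() for raw in smi_output.splitlines())
    return [int(line) for line in lines if line.isdigit()]


def classify_vram(total_mib: int) -> str:
    """Compare a card's memory against the rule-of-thumb thresholds."""
    total_gb = round(total_mib / 1024)  # assumption: round to the nearest GB
    if total_gb >= MIN_TRAIN_GB:
        return "enough to generate and probably to train"
    if total_gb >= MIN_GENERATE_GB:
        return "enough to generate images"
    return "probably too little VRAM"


def detect_vram_mib() -> list:
    """Ask nvidia-smi for each GPU's total memory; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_vram_mib(result.stdout) if result.returncode == 0 else []


cards = detect_vram_mib()
if not cards:
    print("No Nvidia GPU detected (or nvidia-smi is not installed)")
for index, mib in enumerate(cards):
    print(f"GPU {index}: {mib} MiB -> {classify_vram(mib)}")
```

If the script reports no GPU, that doesn’t necessarily mean you’re out of luck; it only means this particular check couldn’t find an Nvidia card.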

Fine-Tuning Required

I’ve seen complaints that Stable Diffusion’s base models (v1.4 and v1.5, but especially v2.0) are not as effective as Midjourney’s model. By “effective”, I mean that Stable Diffusion requires a few more generations, settings adjustments, and prompt tweaks to get a good output image. However, I would argue that this slightly underwhelming baseline becomes irrelevant when you start using custom models and extensions; once you throw the customization into the mix, it becomes apparent that SD can do styles that Midjourney just cannot deliver.

Minor Computer Skills Needed

A local installation also means you have to do more legwork to get it up and running. This requires software and file downloads and a few command prompts (but most of the commands you’ll need have been pre-written in your GUI’s installation guide). But if you’ve gotten this far in my guide, then I’m betting you have a willingness to learn a tidbit or two about software and code in exchange for free art generation at your fingertips.

Which Should You Choose for Your AI Needs?

Perhaps you came to this article with just one burning question: what AI art service or model should I use? Well, that depends on what you want it for. Let’s review what we’ve learned to find out what AI model will be the best option for your needs:

I recommend that you use an online AI art service if you:

  • Just want to dabble for fun;
  • Don’t have a GPU that meets the minimum requirements for a local installation and don’t want to spend the money on a better graphics card at this time;
  • Prefer the more refined digital painting style of Midjourney;
  • Don’t want to install new software on your computer; and/or
  • Don’t want or need extra custom models and features

I recommend that you install Stable Diffusion on your own computer if you:

  • Want to generate a lot of images;
  • Have a good GPU or you’re willing to buy a nicer one;
  • Want to use your generations for commercial purposes;
  • Want to try out non-standard styles that aren’t available from the online services (like a specific manga look or photo-realistic portraiture);
  • Are interested in training models based on your own art style or face;
  • Want to generate images depicting celebrities, politicians, or public figures; and/or
  • Want to create lewd (NSFW) content

If you decide to install Stable Diffusion on your own, I have several guides on how to use it for TXT2IMG and IMG2IMG generations.

References

[1] https://www.reddit.com/r/MachineLearning/comments/xpb2c5/d_is_midjourney_ai_moreorless_the_same/

[2] https://www.reddit.com/r/StableDiffusion/comments/ytgxgb/v4_of_midjourney_is_beyond_ridiculous_hope_sd_can/

[3] https://www.assemblyai.com/blog/how-dall-e-2-actually-works/

[4] https://github.com/CompVis/stable-diffusion

[5] https://midjourney.gitbook.io/docs/billing#basic-membership-usd10-month-offers