AI image generation has captured the imagination of millions of people, but to date, most applications of the technology have felt more like a fun toy than a professional tool. Typing in a prompt and getting an image can be a delight, but it likely doesn’t help professional creatives do their jobs better.
That is starting to change as businesses experiment more with generative AI and build more extensible and powerful solutions into their teams’ professional workflows and creative processes. Choosing an AI image generation solution can be a daunting task, especially in a landscape that is changing so rapidly. Rather than tell you what to choose, we’ll help you understand the landscape and the technology that makes an AI image generator work.
We’re also going to skip the basic steps of choosing software that any guide could tell you (identify your goals, set your budget, define stakeholders). If you need a template or project plan for a software procurement process, we recommend asking ChatGPT to draft one for you.
What Exactly Is An AI Image Generator?
“AI image generator” is a broad term, generally used today for tools that let users prompt a model through an application, using text or image inputs, in order to generate an output. Most AI image generators are made up of a few key components:
- A base or foundational model: This is the core trained model at the heart of the AI image generation tool, and the foundation upon which additional functionality, customizations, and specialized models are built. You can think of it as the “dictionary” of concepts, styles, and visual elements that an AI understands.
- Specialized models: These are models that have been customized from a base model to perform specific tasks or cater to particular domains. These models take the general capabilities of the base model and refine them to better handle certain types of input, produce specific kinds of outputs, or meet unique requirements of a particular field or application.
- An end-user application: This is the software interface or platform that allows end-users, such as artists, designers, or business professionals, to interact with and utilize the underlying AI models (both base and specialized models) to generate images.
Where To Start: Straightforward Tools For Stock Photo Replacement Or Inspiration?
For businesses that are primarily using generative AI as a tool to replace stock photography or to create images for inspiration, the best options are straightforward closed source AI image generators. A closed source AI image generator is one where:
- The base model is owned by the vendor and you license access to it
- The vendor controls the set of features, updates, and models that are available to you
- The vendor does not publish their code
- The vendor chooses which systems their tool integrates with
Midjourney, DALL·E 3, and Adobe Firefly are great examples of solutions with simple, easy interfaces that require little effort to produce results good enough for most stock photo purposes. Other closed source solutions focus on specific business verticals; Scenario, for example, quickly generates game assets that can replace many of the mobile game asset packs a smaller indie developer may have purchased in the past.
These solutions are typically easy to use out of the box, even “fool-proof,” with limited custom settings or processes. Both the models and applications are designed for you to quickly provide simple inputs and reliably get back a high-quality image. They also typically have large communities of users, so there may be more freely available user-generated support documentation and tutorial content.
However, the main limitations of closed source solutions are that:
- Customers license access to a proprietary model, so there is little freedom to modify or adapt it to their specific needs or art direction
- The applications are designed to output a generally high-quality image, but do not allow for a high level of creative control or customization
- It is impossible to know for sure how users’ input data is being used, and many closed source solutions use user input data to improve their own proprietary models
These solutions can be fantastic tools for individual users or small businesses who:
- create assets or images that don’t require a significant amount of creative control
- use the tool more for inspiration in the creative process than for production in an existing professional workflow
- don’t have a team or individual responsible for managing technology infrastructure at their organization
- don’t have sensitive intellectual property or content that they are concerned might be used to train other people’s models
What Tools Work For Confidential IP Or More Complex Asset Production Pipelines?
For businesses dealing with sensitive intellectual property or those with complex, multi-step workflows for asset generation (e.g., game design studios, film & TV studios, retail e-commerce), open source AI image generators make a lot of sense.
An open source AI image generator is one where:
- The base model is openly licensed, meaning you can maintain complete ownership of a version of the base model that is fine-tuned for your business, subject to the terms of that model’s license.
- You can contribute to the open source code, meaning you can develop features or models to adapt the core tool to your business’ specific needs.
- The code is published, so you can be more confident that your investments in the technology and workflow will remain accessible long-term.
- Users have the liberty to modify and integrate the tool with any system, which is particularly beneficial for organizations with unique or complex tech infrastructure.
Open source solutions typically offer greater customization and flexibility but demand more technical involvement to see that value, whereas closed source options provide a more controlled, “out-of-the-box” experience with less customization, control, and ownership.
Invoke, Stability AI, and Hugging Face are some of the businesses in the open source community building models and applications for generative AI. Open source solutions vary widely. Some projects focus on providing businesses with end-to-end solutions that combine models with an end-user application (like Invoke), while others focus on specialized functions or on improving overall model performance.
In general, open source AI image generation solutions are great for businesses that:
- Require a high level of creative control in the image generation process (e.g., creative teams working on specific asset generation projects, workflows, or tasks)
- Work with sensitive or confidential IP that they don’t want shared between organizations or used to train others’ models
- Want to be able to customize the model and application technology infrastructure to meet their organization’s specific needs and use cases
To decide whether closed source or open source makes the most sense for your business, start by getting alignment with your team on some of these key considerations:
- How much creative control do the end users of these tools need to complete their work?
- How important is it for your intellectual property to be isolated from other accounts and/or not used to train others’ models?
- How important is it to be able to develop and deploy custom models for my business?
- How important is owning your own model vs licensing access to someone else’s model to produce images?
- What level of security and permissions do you need to deploy an AI solution?
If you are actively considering generative AI tools for your business, or want help thinking through the answers to these questions, we’re happy to talk! We know it’s difficult to keep up with all the rapid advancements, so we’re glad to help anyone who wants to better understand these tools and how they may impact their business.