Resource
How to Build a Custom AI Image Model Trained on Your Company's Content
February 1, 2024
Many creative teams are starting to explore building custom AI image models using their company's proprietary visual content. Most creatives have some familiarity with general-purpose, consumer AI models like DALL-E and Midjourney. These models produce high-quality images, but lack the control, specialization, and context to provide much value in the creative process.

Building custom AI models, trained on your team's proprietary content, might position you better than relying on general-purpose models.

But it’s difficult to know where to start. 

This guide will show you when custom AI models might make sense for your business, how to navigate the complex AI model landscape, and what to look for when training and deploying custom AI models. While it focuses on AI image models and creative studios and teams, the frameworks and advice can be applied to other modalities and businesses.

When you should build a custom AI image model

If you or your team have explored general-purpose generative AI products but are not getting the level of control you need, it might make sense to invest in training a custom AI model.

Think about how your team currently uses AI image models. If you are mostly using these tools for inspiration and idea generation, but they have not become a core competitive advantage or value multiplier in your production pipelines and creative workflows (for example, on games or creative products built around novel, unreleased IP), then building a custom AI model trained on your proprietary content probably makes sense for your business.

Building a custom AI image model is particularly advantageous for businesses that are rich in intellectual property (IP) and have established creative workflows.

If your company or studio has a wealth of unique images, designs, products, characters, worlds, or stories, a custom AI model can use that IP to improve image generation quality and help your creative team generate assets that consistently reflect your team's vision and aesthetic (e.g., creating consistent images of a specific character in different clothing, environments, or poses).

You can also consider developing custom AI image models when you have specific types of controlled image generation or transformation that must conform to specifications unique to your creative pipeline and process. Examples include game studios that need particular image compositions, UI elements with a consistent theme specific to your project, or specialized assets that feed into the technical art process.

To summarize, custom AI image models are great for studios and businesses with any of the following:

  • Lots of intellectual property
  • Defined creative workflows and production pipelines that need controllable, consistent image formatting and aesthetics
  • Defined briefs and requirements around asset & image generation or transformation

AI image model types

Although there are a growing number of AI image model types, most can be grouped into a handful of important categories. Here's a basic overview of the most important distinctions:

1. Source Availability and Licensing

Open Source Models: Open source models are models whose code and trained parameters (known as "weights") are freely available for use, modification, and distribution. With open source models, businesses take their own copy of the model and customize it to their needs. This allows businesses to maintain ownership of their own version of the original open source model.

Source Available Models: Source available models are models that share their code and information about their training, but restrict usage of the model through a license. Some licenses completely restrict commercial use, require licensing agreements for commercial use, or require any changes you make to the model to be published openly (e.g., copyleft licensing like the AGPL). It's important to understand the licensing restrictions of any model or technology you choose or build on top of.

Note: Most source available models will still call themselves "open source", so it's imperative to understand the licensing of the underlying tools you're using.

Closed Source Models: Closed source models are proprietary models developed by companies or organizations where the source code and training methodologies are not publicly accessible. With a closed source model, you are licensing temporary access to the model, along with any additional training data used to improve it.

2. Generalization and Specialization

Foundation Models: Sometimes referred to as a "base model", this is the fundamental architecture of the AI image generation tool, and it serves as the core upon which additional functionality, customizations, and specialized models are built. You can think of it as the "dictionary" of concepts, styles, and visual elements that an AI understands.

  • Examples: GPT-4 (the model that powers ChatGPT), DALL-E, and Stable Diffusion (SDXL)
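
To make the distinction concrete, here is a minimal sketch of generating an image directly from a foundation model. It uses the open source diffusers library as one possible toolchain (our assumption; any comparable stack works). The model ID is the public SDXL base release; the prompt and settings are illustrative.

    import torch
    from diffusers import StableDiffusionXLPipeline

    # Load the SDXL base (foundation) model from the Hugging Face Hub.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    )
    pipe.to("cuda")

    # The foundation model only knows general concepts - nothing about your IP.
    image = pipe(
        prompt="concept art of a castle on a cliff, dramatic lighting",
        num_inference_steps=30,
    ).images[0]
    image.save("castle.png")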

Specialized Models: A specialized model (sometimes called a "fine-tuned model") is a version of a foundation model that has been re-trained or modified in some way, typically to produce better results within a specific domain or industry (e.g., a company like Nintendo might re-train a general-purpose foundation model to recognize the word "link" as the character from Zelda rather than the dictionary definition of the word). The fine-tuning data can either update the base model's weights directly, or it can be kept separate and captured in a standalone, lightweight model that works alongside the base model. The latter is typically called a LoRA model:

  • LoRA Models: LoRA (short for Low-Rank Adaptation) is quickly becoming one of the most popular approaches for training models for specific use cases. LoRA models are small, lightweight models that work alongside a foundation model. Essentially, the model is trained on a small set of images (anywhere from 15 to 100 images can effectively train a LoRA) to guide the generation process toward a specific outcome or goal.
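
Because a LoRA is kept separate from the base model, using one at generation time is just a matter of attaching it to the foundation model. Here is a sketch using diffusers; the LoRA path and trigger word are hypothetical placeholders for a model trained on your own content.

    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    ).to("cuda")

    # Attach a custom LoRA trained on your studio's images.
    # "our-studio-style-lora" is a hypothetical local path or Hub repo.
    pipe.load_lora_weights("our-studio-style-lora")

    # A trigger word chosen during training steers output toward the LoRA's style.
    image = pipe(
        prompt="ourstudio style, hero character in a desert environment",
        num_inference_steps=30,
    ).images[0]
    image.save("hero_desert.png")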

Adapters: Adapters are models trained to perform specific tasks or cater to specific domains. They often operate in partnership with, or "on top of", a foundation model, and are usually only compatible with a specific foundation model or model family. Make sure you choose a foundation model and deployment solution that supports, and will continue to support, the most recent and useful adapters.

There is no widespread consensus on what to call this type of model. Some sites just call these "models", some refer to them as "sidecar models", and some (like us) have started to use the term adapter or control adapter.

  • Examples: ControlNet, T2I Adapter, IP Adapter
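
To illustrate how an adapter constrains generation rather than just restyling it, here is a sketch pairing SDXL with a publicly available Canny-edge ControlNet via diffusers; the reference image path is a placeholder.

    import cv2
    import numpy as np
    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

    # Build an edge map from a reference image; the adapter will push
    # generations to follow this composition.
    ref = cv2.imread("layout_reference.png")  # placeholder path
    gray = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

    # A publicly available Canny ControlNet adapter for the SDXL family.
    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    image = pipe(
        prompt="stylized game environment, painted look",
        image=control_image,
        num_inference_steps=30,
    ).images[0]
    image.save("controlled_environment.png")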

The terminology around AI model types is rapidly evolving. We expect more codification and industry alignment in 2024, and we will update this guide as the industry starts to align on naming conventions. For now, our goal with this guide is to give you a starting point for exploring and understanding the different offerings available to you.

If this all feels like trying to buy eggs at the grocery store and figuring out the difference between "farm raised" and "pasture access" - just know you are not alone. We spend our entire days following this stuff and it's still difficult to keep up.

How to train a custom AI image model

The process of actually training a custom AI image model starts with identifying the workflow or production use case you are trying to improve. Custom AI models are trained to produce a specific result, so the clearer you are about what you want the model to produce and how, the better your results will be after the model is trained.

We recommend identifying one or two use cases that a LoRA model could solve.

For example:

  • I want my team to be able to produce concept art in a consistent style
  • I want my team to be able to produce images with a specific character or environment consistently
  • I want to apply standard proprietary UI components to images
  • I want my team to be able to produce skins or other purchasable products that fit the artistic style of a game within our IP

In this process, a studio would take an openly licensed foundation model (like SDXL) and then train a LoRA model, or other adapter model, to work alongside it. Training a LoRA model requires specialized expertise, but it is significantly less resource-intensive than training your own foundation model. This allows you to prove the ROI and value of a generative AI solution before expanding into larger, more resource-intensive AI model development.

The actual training can be accomplished in a few ways:

  • Hire a professional model trainer. This is a fairly new field, but we fully expect model training to develop into a well-established profession. Today, many of the best model trainers come from the open source community, as well as from academic research positions. (Disclaimer: Invoke offers this service to our enterprise clients.)
  • Use a cloud-based training service like Hugging Face. There are software solutions that let you upload images, tag them, and train your own model. Doing this on your own requires a lot of trial and error, and you'll need to research best practices around metadata tagging and image selection to get a model that successfully accomplishes your specific use case (see the dataset-tagging sketch after this list). You'll also need to make sure the model type the software produces is compatible with how you plan to deploy and use the model.
  • Install a LoRA model trainer locally. There are several open source model training apps (like Kohya, Diffusers Training, or Invoke's Training Repo) that you can install and run locally on your computer. If you have technical experience running open source applications, and hardware powerful enough to run model training, this might be a good option for you.
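
Whichever route you choose, most trainers expect the same inputs: a folder of training images plus a caption ("tag") for each one. One common convention, used by Hugging Face's imagefolder datasets and several open source trainers, is a metadata.jsonl file alongside the images. Below is a minimal, hypothetical sketch of producing one; the file names, captions, and trigger word are placeholders, and your specific trainer's expected format may differ.

    import json
    from pathlib import Path

    # Hypothetical training set: images of one character, each captioned with a
    # consistent trigger word ("ourhero") for the LoRA to learn.
    captions = {
        "hero_front.png": "ourhero character, front view, neutral pose",
        "hero_armor.png": "ourhero character wearing plate armor",
        "hero_desert.png": "ourhero character standing in a desert",
    }

    # Assumes training_images/ already contains the files named above.
    dataset_dir = Path("training_images")
    with open(dataset_dir / "metadata.jsonl", "w") as f:
        for file_name, text in captions.items():
            # One JSON object per line: the image file and its caption.
            f.write(json.dumps({"file_name": file_name, "text": text}) + "\n")
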
It’s important to think about model training as an ongoing process and expense.

Custom AI models help accomplish a specific goal or use case, and you'll need to continually develop new models as you find new opportunities for generative AI in your production process.

How to deploy a custom AI image model to your team

There are different ways to deploy a custom AI image model to your team. Essentially, "deployment" just means "this is tangibly how the model, once it is trained, gets used by you, your team, or your business".

There are three main ways to deploy a custom AI model to your team:

1. Through a web app / UI

For most studios, starting by deploying a custom AI model through a web application or UI makes the most sense. This quickly provides users (typically artists) with a familiar interface to generate, edit, and refine images powered by your custom AI model, without the artist needing to learn or understand the underlying technical infrastructure.

There are numerous web applications / UIs (Disclaimer: this is something Invoke offers), and you should choose one that doesn't use your work to train other people's models, doesn't include any code or licenses that limit commercial use of your images, and meets your other requirements around security, control, speed, and quality (see "Questions to ask when choosing a custom AI model solution" below for additional qualifying criteria).

2. Installing the model and a UI locally or in your own cloud environment

Studios with dedicated IT/ML teams who are familiar with self-hosting open source software might choose to install and run the model and UI directly on their own hardware or in a cloud environment they control. Implementing and maintaining this kind of infrastructure is costly and time-consuming, and it takes a lot of engineering to make it viable for a business. Additionally, any custom development built on top of software with an AGPL license will need to be fully open sourced.

3. Accessing the model via API

Studios with dedicated technical resources might also build a custom application that accesses the model through an API. This might be necessary or preferable for highly bespoke use cases or self-hosted design environments/applications, but it requires significant resource investment and planning. Most API providers offer only stock models, so you may not be able to integrate a custom model, but this is a space to keep an eye on as providers expand their offerings.
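
If you do go the API route, the integration itself is usually a straightforward HTTP call. The sketch below is deliberately generic: the endpoint URL, authentication scheme, payload fields, and the assumption that the response body is raw image bytes are all hypothetical, since every provider's API differs.

    import requests

    # Hypothetical endpoint for a provider- or self-hosted custom model.
    API_URL = "https://inference.example.com/v1/generate"

    response = requests.post(
        API_URL,
        headers={"Authorization": "Bearer YOUR_API_KEY"},  # hypothetical auth
        json={
            "model": "our-studio-style-lora",  # hypothetical custom model ID
            "prompt": "ourstudio style, key art for a seasonal event",
            "width": 1024,
            "height": 1024,
        },
        timeout=120,
    )
    response.raise_for_status()

    # This sketch assumes the endpoint returns raw image bytes.
    with open("key_art.png", "wb") as f:
        f.write(response.content)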

Which custom AI image models are best suited for your studio

When choosing a custom AI image model, studios should think about the entire lifecycle of the model: from implementation planning to training to deployment to maintenance and optimization.

Studios will need to assess model types, training costs, and deployment solutions based on cost, ease of setup, power, functionality, integration, security, and data ownership.

In general, we recommend studios look for a partner that can support them in the end-to-end training and deployment of custom AI models.

As your studio becomes more familiar with the process and proves out ROI, you may be able to bring certain capabilities in-house, like model training or administration of the end-user application.

Few game studios have deployed a custom AI image model, and those that have are still not sharing much detail about their approach. We won’t know the “best-in-class” solution for some time, but as a team that has worked with both AAA game studios and indie developers, we’re seeing some patterns among the game developers that have moved quickly:

  • They are largely using open source foundation models. Most are working with SDXL and the Stable Diffusion family of models as these models have the most community support and innovation. 
  • They are starting small, training custom AI models (like LoRA models) for specific use cases and proving out the value of that investment.
  • They are primarily deploying these models to their teams through web applications / UIs that are familiar to creatives.

Open-source models are appealing to game studios because they provide flexibility, a wide community of support, and customization that aligns with specific game development needs. Some open-source models also allow developers to completely own the model and its weights (the trained parameters), which is appealing while this technology is so new. Closed-source models have yet to offer the same depth of functionality and capabilities as open-source models when it comes to custom model development and training AI image models on your own work, though they are likely working hard to develop that feature set.

Starting by developing or training your own LoRAs (Low-Rank Adaptation models) offers an efficient way to produce consistent images in a predefined workflow, like character designs.

Deploying the model through a web application is the easiest way to get your team using the model and driving value, especially with services that offer flexible workflows built for professional teams. While some studios have explored on-premise implementations, we recommend starting with whatever is easiest to deploy and will help you learn what is valuable - and that is usually some kind of cloud-based deployment.

Questions to ask when choosing a custom AI model solution

There are plenty of considerations that go into choosing any software vendor, like cost and support. Here are some questions that are specifically relevant to choosing an AI model solution: 

Control & Customization

  • How much control do we have over the model's learning process and output generation?
  • Can the model be customized to align with our specific artistic and technical requirements?

Legal, Security & Data Ownership

  • Does the model share data with other customers’ models?
  • Does user behavior help improve other customers’ models? If so, how?
  • Do I own the model, or am I licensing access to it (e.g., via a subscription)? How does the solution ensure our ownership of both the training data and the generated content?
  • Do I own my outputs/generations?
  • Is the model doing any behind-the-scenes prompt enhancement that might open us up to legal risk in the future? 
  • Is there a way to audit the steps used to generate an image?
  • What kind of governance is there?
  • If I want to leave the service, can I take my model with me?

Integration & Interaction

  • How easily can the model be integrated into our existing production pipelines and workflows?
  • What level of user interaction is required, and how user-friendly is the interface?
  • Can we develop custom workflows that integrate with our creative process?

Protection & Privacy

  • What measures are in place to ensure the security of our data during the model's training and operation?
  • Can we be confident that unreleased IP entered into both the model and the deployment solution (e.g., a web application) remains isolated and secure?
  • How does the model handle privacy concerns, especially when dealing with sensitive content? Can we audit inputs for inappropriate content or problematic keywords?
  • Does the deployment solution offer enterprise-level access and permission controls (e.g., SSO, RBAC, MFA)?

Speed

  • How fast does the model generate the desired outputs? How do the model and deployment solution compare to other options?
  • Has the model shown improvements over time in speed and performance?
  • Does the model’s processing speed align with our production timelines and efficiency goals?
  • How does the model perform when multiple users are generating at the same time? How confident are we the model will perform at scale?

As with any emerging technology, the landscape is changing rapidly. We will work to keep this guide updated as there are new developments in custom model training and deploying.

(All images in this article were created in Invoke, with permission, using a custom AI image model trained on acclaimed artist Peter Mohrbacher's work).