Illustrious: The AI Model That Wants to Rule Anime Art Generation

Illustrious, a text-to-image model based on Stable Diffusion XL, has become so dominant in the AI art community that Civitai, the largest hub for AI art models, had to create a separate category just to manage its vast ecosystem of resources.

And it all happened in three months. The secret behind the success? A return to basics with a twist.

While newer models such as SD 3.5 and Flux rely on long natural language descriptions, Onoma AI, the developers of Illustrious, took a different approach: they used Danbooru tags to help the model understand concepts, without having to reinvent the wheel with complex captioning systems.

The model’s training on Danbooru’s vast library of tagged anime images gives it an edge in understanding visual concepts.

Each tag in the Danbooru system represents specific elements such as character traits, clothing items, poses or backgrounds, allowing precise control over the generated images without wasting precious tokens on long descriptions.

These tags have been around for years and have become something of a standard for categorizing images among art/anime enthusiasts.

As a result, the model is accurate and efficient at understanding the characteristics of an image.

“It’s like having an artist who understands exactly what you want without having to explain it in paragraphs,” shared Vishnu, a Discord member who participates in a server focused on NSFW AI content. “You just need to know the right tags.”

At its core, Illustrious uses the good old SDXL architecture with an advanced dual encoder system that combines CLIP ViT-L and OpenCLIP ViT-bigG to understand words and associate them with their visual equivalent.
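To make the dual encoder idea more concrete, here is a minimal sketch of how the two text encoders' outputs are combined in the standard SDXL design. The embedding widths (768 and 1280) and the 77-token context length come from SDXL itself rather than from anything Onoma AI has published about Illustrious, and the all-zero lists are placeholders standing in for real encoder outputs.

```python
# Per-token embedding widths of SDXL's two text encoders (assumed unchanged in Illustrious).
CLIP_VIT_L_DIM = 768
OPENCLIP_VIT_BIGG_DIM = 1280
MAX_TOKENS = 77  # context length shared by both CLIP tokenizers


def concat_token_embeddings(emb_l, emb_g):
    """Concatenate two per-token embedding lists along the feature axis."""
    if len(emb_l) != len(emb_g):
        raise ValueError("both encoders must produce the same number of tokens")
    return [row_l + row_g for row_l, row_g in zip(emb_l, emb_g)]


# Placeholder embeddings: real values would come from the two encoders.
emb_l = [[0.0] * CLIP_VIT_L_DIM for _ in range(MAX_TOKENS)]
emb_g = [[0.0] * OPENCLIP_VIT_BIGG_DIM for _ in range(MAX_TOKENS)]

combined = concat_token_embeddings(emb_l, emb_g)
print(len(combined), len(combined[0]))
```

Each token ends up with a single 2048-wide vector, which is what the image-generating part of the model reads when it matches words to visuals.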

The model can process and generate images at an impressive resolution of 1536×1536, with the ability to stretch to 2048×2048 and even 3744×3744 without significant quality loss.

For context, the original SDXL was built around a native resolution of 1024×1024.
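A quick calculation shows how big the jump is. The sketch below simply compares pixel counts for the square resolutions mentioned above against SDXL's 1024×1024 baseline; the function name is ours, for illustration only.

```python
# Square resolutions mentioned in the article, in pixels per side.
RESOLUTIONS = [1024, 1536, 2048, 3744]


def pixel_ratio(side: int, base_side: int = 1024) -> float:
    """Return how many times more pixels a square image has than the baseline."""
    return (side * side) / (base_side * base_side)


for side in RESOLUTIONS:
    print(f"{side}x{side}: {side * side:,} pixels, {pixel_ratio(side):.2f}x the SDXL baseline")
```

At 1536×1536 the model is already working with 2.25 times as many pixels as the SDXL baseline, and 2048×2048 is four times as many.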

Deep dive

The journey to create Illustrious was methodical and deliberate. The initial training phase, which produced version 0.1, processed 7.5 million images at 1024×1024 resolution with a batch size of 192.

The team carefully balanced the learning rates over twenty epochs (an epoch is one full pass over the entire dataset) to build a solid foundation. Once the results were satisfactory, the team moved on to larger datasets and higher resolutions for subsequent iterations.

The advanced training phase is where Illustrious really started to shine. Version 1.0 expanded the dataset to 10 million images and increased the resolution to 1536 × 1536.

Although the team reduced the batch size to 128, it introduced advanced tag manipulation strategies and register tokens, fundamental changes that defined the model’s exceptional performance.

The final refinement phase for version 2.0 went one step further. Working with 20 million images at the same high resolution, but with a larger batch size of 512, the team integrated a multi-caption method that dramatically improved the correspondence between text and image.
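To get a feel for these numbers, the sketch below works out how many batches make up one pass over the data in each phase. It assumes one training step per batch, no gradient accumulation, and that the last partial batch is kept; the article does not confirm any of those details, and the epoch counts for versions 1.0 and 2.0 are not given, so only per-epoch figures are computed.

```python
import math

# (version, number of images, batch size) as reported above.
PHASES = [
    ("0.1", 7_500_000, 192),
    ("1.0", 10_000_000, 128),
    ("2.0", 20_000_000, 512),
]


def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Batches needed for one full pass, keeping the final partial batch (an assumption)."""
    return math.ceil(num_images / batch_size)


for version, num_images, batch_size in PHASES:
    steps = steps_per_epoch(num_images, batch_size)
    print(f"v{version}: {steps:,} steps per epoch at batch size {batch_size}")
```

Under those assumptions, versions 0.1 and 2.0 take roughly the same number of steps per epoch even though the latter sees far more images, because the batch size grew along with the dataset.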

The result was the best waifu generator known to man, with good fine-tuning capabilities, strong prompt adherence, decent aesthetics and high-quality output.

For the more tech-savvy readers, the Illustrious team also introduced several interesting techniques to turn a simple AI model into a powerhouse: a “No Dropout Token” approach, which ensures that specific tokens are never excluded during training; Quasi-Register Tokens, so that the model can handle unknown or strange concepts; a Cosine Annealing Scheduler for the learning rate; a Multi-Level Dropout system; and Input Perturbation Noise Augmentation.
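Of these, the cosine annealing scheduler is the easiest to illustrate. The sketch below is the textbook form of the schedule, not Onoma AI's actual implementation, and the learning rates and step count in the example are hypothetical values chosen only to show the shape of the curve.

```python
import math


def cosine_annealing_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Learning rate that falls from lr_max to lr_min along half a cosine wave."""
    # Clamp the step so values outside the schedule stay at the endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings, for illustration only.
TOTAL_STEPS = 1000
LR_MAX = 1e-4
LR_MIN = 1e-6

for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_annealing_lr(step, TOTAL_STEPS, LR_MAX, LR_MIN):.2e}")
```

The learning rate starts high, drops slowly at first, falls fastest in the middle of training and then eases gently toward its minimum, which tends to make the final stretch of training more stable than a sudden cut.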

How to use Illustrious

Illustrious does not require any additional steps to work.

The installation process is the same as with any other SDXL model. Download the checkpoint and place it in the corresponding folder depending on which user interface you are using.

Windows and Linux

For ComfyUI, the path is models/checkpoints inside the installation folder.
For A1111/Forge, the path is models/Stable-diffusion.
For Fooocus, the path is also models/checkpoints.
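For those who script their setups, the folder rules above can be captured in a small helper. This is only a sketch: the function name, the installation folders and the checkpoint filename are hypothetical, and the subfolders are simply the ones listed above.

```python
from pathlib import Path

# Checkpoint subfolders per user interface, as listed above.
CHECKPOINT_SUBDIRS = {
    "comfyui": Path("models") / "checkpoints",
    "a1111": Path("models") / "Stable-diffusion",
    "forge": Path("models") / "Stable-diffusion",
    "fooocus": Path("models") / "checkpoints",
}


def checkpoint_destination(ui: str, install_dir: str, filename: str) -> Path:
    """Return where a checkpoint file should be placed for the given interface."""
    try:
        subdir = CHECKPOINT_SUBDIRS[ui.lower()]
    except KeyError:
        raise ValueError(f"unknown user interface: {ui!r}") from None
    return Path(install_dir) / subdir / filename


# Hypothetical installation folder and filename, for illustration only.
print(checkpoint_destination("ComfyUI", "ComfyUI", "illustrious.safetensors"))
```

Because the helper uses pathlib, the same code produces backslash paths on Windows and forward-slash paths on Linux.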

macOS

Mac users have similar paths. However, some popular macOS-oriented user interfaces require additional steps.

Draw Things users will need to click ‘Models’, go to ‘Customise’ and then click ‘Import Model’.
From there, they can enter the URL to download Illustrious directly, or click ‘Import Custom Model’ to select the file if they have already downloaded the model to their local drive.
Diffusion Bee users will need to click the hamburger icon in the top right corner, then click ‘Settings’, then click ‘Add New Model’ and select their locally downloaded Illustrious checkpoint.

Once the model is loaded, there are three things to keep in mind.

Do not use natural language. Rely on Danbooru tags and stick to the old SDXL prompt style for better results.
Do not use Pony LoRAs. Since the two models were trained with different approaches, it is better to use LoRAs made for Illustrious.
Don’t use the original Illustrious model directly; choose one of the popular fine-tunes instead. The original Illustrious is a base model, meant as a starting point for fine-tunes that target specific results. The same is true of SDXL, Pony and Flux: fine-tunes usually produce better results than the base.
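As a rough illustration of the first tip, the sketch below turns a list of Danbooru-style tags into a comma-separated prompt. The helper name is ours, the example tags are just a sample, and replacing underscores with spaces is a common community convention rather than an official requirement, so check the recommendations of whichever fine-tune you use.

```python
def build_tag_prompt(tags):
    """Join Danbooru-style tags into a comma-separated prompt, dropping duplicates."""
    seen = set()
    cleaned = []
    for tag in tags:
        # Assumption: lowercase tags with spaces instead of underscores.
        normalized = tag.strip().lower().replace("_", " ")
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return ", ".join(cleaned)


# Sample tags, for illustration only.
example_tags = ["1girl", "solo", "long_hair", "school_uniform", "outdoors", "cherry_blossoms", "solo"]
print(build_tag_prompt(example_tags))
```

The result is a short list of keywords rather than a sentence, which is the prompt style tag-trained models respond to best.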

The best Illustrious models to choose from

There are many models to choose from, all focusing on different styles, aesthetics and features.

There are even general models like those from Noob AI that used Illustrious as a base and are used by fine tuners to build their models.

However, here are our top picks for different needs. They stand out for prompt adherence, output quality and ease of use. All examples are from the Civitai community and are copyright free.

Best for Versatility: Mistoon_Anime