Deforum & Serverless GPUs: Crafting a User-Friendly AI Art Tool
Innovating with Deforum and Serverless GPUs to Empower AI Creators
Dear friends,
When I generate AI images and videos, I am still operating in a very technical world.
The tools don’t feel intuitive, especially my favorite video generation tool, Deforum.
As mentioned in my previous post, I decided to develop an application that lets users easily generate AI images and videos.
If you want to know how this works — read on. 🤟
If not, you can scroll to the bottom where I attached some nice images… 👇
Here are my technical requirements for the application:
This application is a scalable platform for image and video generation, powered by advanced AI models and tools like Deforum and Stable Diffusion. It leverages AWS for dynamic GPU scaling, ensuring fast performance for users.
With easy integration of custom models and an intuitive API, it’s designed for both casual creators and professionals seeking a powerful, flexible solution for their creative needs.
Autoscaling
When users use the app, the number of GPUs should autoscale. On Runpod, a serverless pod is limited to 5 GPUs when autoscaling; to use more GPUs, a bigger plan needs to be purchased.
As far as I understand, that makes Runpod a poor fit once 100 users want to produce images or videos at the same time. AWS EC2 G4 instances (https://aws.amazon.com/ec2/instance-types/g4/) offer the scale we need. Other services might too, but AWS looks like the best solution to me.
If we scale beyond Runpod’s 5-GPU limit, it would be wise to look into AWS EC2 Auto Scaling groups or AWS Elastic Kubernetes Service (EKS) for dynamic scaling. That would allow handling a much larger number of concurrent users.
Additionally, implementing GPU utilization monitoring with AWS CloudWatch or third-party tools can help optimize resource allocation and trigger autoscaling events when necessary.
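Whatever the provider, the scaling decision itself can stay simple. Here is a minimal sketch of a queue-based heuristic, assuming we track how many jobs are waiting; the function and its parameters are illustrative, not an AWS or Runpod API:

```javascript
// Hypothetical scaling heuristic: derive a GPU count from the job queue depth.
// jobsPerGpu and maxGpus are assumptions, not provider settings.
function desiredGpuCount(queuedJobs, jobsPerGpu, maxGpus) {
  if (queuedJobs <= 0) return 0;                  // scale to zero when idle
  const needed = Math.ceil(queuedJobs / jobsPerGpu);
  return Math.min(needed, maxGpus);               // respect the plan's hard limit
}

// Example: 12 queued jobs, each GPU handles ~4 in parallel, cap at 10 GPUs
console.log(desiredGpuCount(12, 4, 10)); // → 3
```

A CloudWatch alarm on the same queue-depth metric could then trigger the actual Auto Scaling group adjustment.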
Speed
At home, I have a computer with an RTX 4090, and I am very happy with its image generation speed. Faster is always better but also more expensive; the important thing is not to be slower than the 4090.
I have no experience with the GPUs on AWS, so benchmarking them against the RTX 4090 would be a critical first step. Real-time monitoring of GPU performance can then ensure the speed stays consistent.
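The benchmark could be as simple as timing identical prompts on both machines and comparing the averages. A sketch of the comparison math (the timing numbers are made up):

```javascript
// Summarize benchmark timings (seconds per image) so an AWS GPU can be
// compared against the 4090 baseline. Pure math — the timings themselves
// would come from running identical prompts and settings on each machine.
function summarize(timingsSec) {
  const mean = timingsSec.reduce((a, b) => a + b, 0) / timingsSec.length;
  return { meanSec: mean, imagesPerMinute: 60 / mean };
}

const rtx4090 = summarize([2.0, 2.2, 1.8]); // illustrative numbers
console.log(rtx4090.imagesPerMinute);
```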
Deforum Integration
Most important for me is that users can generate videos with Deforum (https://github.com/deforum-art/sd-webui-deforum). I did not find a lot of documentation online on how to run Deforum on a serverless GPU, but this is the crucial part.
Deforum also uses ControlNet (https://github.com/lllyasviel/ControlNet) to manipulate the output, so ControlNet needs to work as well. I don’t know yet if and how this is possible because, as far as I understand, Deforum needs to be integrated into the Stable Diffusion Automatic 1111 web UI (https://github.com/AUTOMATIC1111/stable-diffusion-webui).
So here I am curious how this will work. Additionally, planning now for future integration with ComfyUI workflows, including custom extensions like Refiners, AnimateDiff, and the new image model FLUX, will keep the system flexible and extensible.
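For what it’s worth, the Deforum extension appears to ship a small REST API on top of the Automatic 1111 web UI (enabled, if I read the repo correctly, with a `--deforum-api` launch flag). A hedged sketch of what submitting a batch might look like; the endpoint path and the payload shape are assumptions to verify against the extension’s source:

```javascript
// Assumed shape of a Deforum batch request — check against sd-webui-deforum.
function buildDeforumBatch(prompt, maxFrames) {
  return {
    deforum_settings: {
      prompts: { "0": prompt }, // keyframe 0 → prompt
      max_frames: maxFrames,
      animation_mode: "3D",
    },
  };
}

// POST the batch to a webui instance (Node 18+ has fetch built in).
async function submitBatch(baseUrl, batch) {
  const res = await fetch(`${baseUrl}/deforum_api/batches`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(batch),
  });
  return res.json(); // should contain ids to poll for progress
}
```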
Adding Custom Loras and Checkpoints
I want to easily add custom Loras and Checkpoints once the project is running. It would be annoying if I needed to upload a 90 GB Docker image every time I want to add a new Lora or Checkpoint.
Best would be a folder where I can just drag and drop Loras and Checkpoints and use them right away. This drag-and-drop method should not require downtime or redeployment, making it convenient for continuous updates and scalability.
API / Receiving Data
I want to access the models via API. It would be good to receive not only the image but also the log, so users can follow the status of the image generation and see errors. I assume the API will return the image as a base64-encoded string, which is then decoded back into an image file.
Or it might make more sense to directly save the images somewhere, for example, on an S3 volume. Storing generated images and logs in an AWS S3 bucket could be an efficient solution, with secure, direct links provided to users via the API. Detailed logging will also help with debugging and auditing user activities.
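Decoding a base64 image in Node is a one-liner; the S3 upload would then go through the AWS SDK. A sketch (the bucket name and key are placeholders):

```javascript
// The A1111 API returns images as base64 strings; decoding them in Node
// is trivial. The resulting Buffer can be written to disk or sent to S3.
function decodeImage(base64Png) {
  return Buffer.from(base64Png, "base64"); // raw PNG bytes
}

// Upload would then look roughly like (bucket/key are placeholders):
// await s3.send(new PutObjectCommand({ Bucket: "my-renders", Key: "img.png", Body: buf }));
```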
I still need to add a user area to my frontend code. I will probably use Supabase (https://supabase.com/). It would be good if I could see how much GPU time each user spent generating, so receiving the time or even the cost for each image or batch would be useful.
To manage costs effectively, it’s important to set up AWS budget alerts. API security measures such as OAuth, API keys, or JWT should also be implemented to ensure that only authorized users can access the services.
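The per-user accounting itself is trivial once the backend reports GPU seconds per job; the hourly rate below is a placeholder, not an actual AWS price:

```javascript
// Illustrative per-job cost: GPU seconds times an hourly rate.
// The rate is a placeholder to be replaced with the real instance price.
function jobCostUsd(gpuSeconds, hourlyRateUsd) {
  return (gpuSeconds / 3600) * hourlyRateUsd;
}

console.log(jobCostUsd(90, 1.2).toFixed(4)); // 90 s at $1.20/h → 0.0300
```

Summing these per user in Supabase would give exactly the per-user GPU spend I want to display.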
NodeJs and SvelteKit
I am a frontend developer and can also do a bit of NodeJS. I don’t really need help building my application; only setting up the API endpoints in NodeJS would be helpful. Once they are set up and running, I can do the rest.
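For those endpoints, the handler shape could look like this. It is written framework-agnostic so it can be mounted in Express with `app.post("/api/generate", handler)`; the `enqueue` function stands in for the real GPU job queue:

```javascript
// Factory for a generation endpoint handler. `enqueue` is a placeholder
// for whatever pushes the job onto the GPU queue and returns a job id.
function makeGenerateHandler(enqueue) {
  return (req, res) => {
    const prompt = req.body && req.body.prompt;
    if (!prompt) {
      res.status(400).json({ error: "prompt is required" });
      return;
    }
    const jobId = enqueue(prompt);
    res.status(202).json({ jobId }); // 202: accepted, rendering happens async
  };
}
```

Returning 202 with a job id keeps the HTTP request fast; the client then polls a status endpoint for the log and the finished image.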
Video Generation
Deforum produces an image sequence, and in Automatic 1111 it also generates a video file from that sequence. I think I can take care of that myself; if I have more questions here, I will ask. Still, it would be good to know how to automate the video rendering step with a tool like FFmpeg, integrated into the backend or directly on AWS.
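For reference, assembling the FFmpeg invocation for an image sequence is straightforward; the flags below are standard FFmpeg, while the paths are illustrative:

```javascript
// Build the FFmpeg argument list that turns Deforum's numbered image
// sequence into an MP4. Paths and fps are examples.
function ffmpegArgs(framePattern, fps, outFile) {
  return [
    "-framerate", String(fps),
    "-i", framePattern,     // e.g. "frames/%05d.png"
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",  // widest player compatibility
    outFile,
  ];
}

// Then: spawn("ffmpeg", ffmpegArgs("frames/%05d.png", 24, "out.mp4"))
```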
Image Generation
I do not only want to create videos with Deforum but also generate static images, with the same Loras and Checkpoints applied as in Deforum. I want this so I can create test images before letting Deforum render the whole video.
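A test image could reuse the same Lora via the web UI’s txt2img endpoint (`POST /sdapi/v1/txt2img`), using the `<lora:name:weight>` prompt syntax the webui understands. The Lora name and settings below are made up:

```javascript
// Build a txt2img payload that applies a Lora through A1111's prompt syntax.
// Lora name, weight, and generation settings are illustrative.
function buildTxt2Img(prompt, loraName, weight) {
  return {
    prompt: `${prompt} <lora:${loraName}:${weight}>`,
    steps: 20,
    width: 768,
    height: 768,
  };
}

console.log(buildTxt2Img("misty forest", "myStyle", 0.8).prompt);
// → misty forest <lora:myStyle:0.8>
```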
ComfyUI and Flux
At some point, I also want to use custom workflows from ComfyUI. Those workflows use custom extensions like Refiners, AnimateDiff, or the new image model FLUX. I need an easy way to upload extensions for my ComfyUI workflows.
On Baseten (https://www.baseten.co/blog/how-to-serve-your-comfyui-model-behind-an-api-endpoint/) they explain how to prepare a ComfyUI workflow for API use. I know that we don’t use Baseten, but it might be interesting to see it. Planning for this future integration now will make it easier to incorporate these features later.
LLM
Also, I would like to have a basic LLM running, such as LLaMA. It does not need to be super high-performing; I only want it to help users come up with more creative prompts. But this is not a high priority for now.
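If the LLM ends up being a local LLaMA served through something like Ollama, the prompt helper could be a thin wrapper around its `/api/generate` endpoint; the instruction template is my own invention:

```javascript
// My own template for turning a rough idea into richer image prompts.
function brainstormInstruction(userIdea) {
  return `Rewrite this idea as three vivid Stable Diffusion prompts:\n${userIdea}`;
}

// Assumes a local Ollama server; model name is an example.
async function suggestPrompts(baseUrl, idea) {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3",
      prompt: brainstormInstruction(idea),
      stream: false,
    }),
  });
  return (await res.json()).response;
}
```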
Frontend Backend
I already worked on the NodeJS backend (https://github.com/marius-jopen/limn-server) and the SvelteKit frontend (https://github.com/marius-jopen/limn-client). I am not a NodeJS professional, but feel pretty comfortable with JavaScript. I think I can find my way around it here.
Alrighty… Here are the promised images (including Midjourney prompts):
Well, I hope that was a bit interesting…
Much love!
Marius