This pipeline lets you create unlimited high-quality AI avatar videos from just a portrait image and an audio file, completely free and self-hosted.
It runs entirely inside ComfyUI using WAN 2.1 (14B, fp8) for image-to-video generation and InfiniteTalk for realistic lip-sync animation, producing output comparable to paid services like HeyGen and Kling.
Setup is streamlined with an included shell script that automatically downloads all required models (WAN 2.1, InfiniteTalk, MelBandRoformer, the WAN VAE, the UMT5-XXL text encoder, CLIP Vision, and the Lightx2v LoRA) into the correct ComfyUI folder structure.
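As a rough illustration of what the setup script does, the sketch below creates the standard ComfyUI model folders and wraps each download in a skip-if-exists check. The folder names follow common ComfyUI conventions, and the URLs are placeholders, not the real links from the bundled script.

```shell
#!/usr/bin/env sh
# Hedged sketch of the download script's layout. Paths assume the standard
# ComfyUI conventions; real model URLs live in the bundled script and are
# replaced with placeholders here.
COMFYUI_DIR="${COMFYUI_DIR:-$HOME/ComfyUI}"

mkdir -p \
  "$COMFYUI_DIR/models/diffusion_models" \
  "$COMFYUI_DIR/models/text_encoders" \
  "$COMFYUI_DIR/models/clip_vision" \
  "$COMFYUI_DIR/models/vae" \
  "$COMFYUI_DIR/models/loras"

# fetch() skips files that already exist, so re-running the script is cheap.
fetch() {
  dest="$1"; url="$2"
  if [ -f "$dest" ]; then
    echo "skip: $dest"
  else
    echo "would download $url -> $dest"  # swap echo for: wget -q -O "$dest" "$url"
  fi
}

# Placeholder URLs -- substitute the real links from the bundled script.
fetch "$COMFYUI_DIR/models/diffusion_models/wan2.1_i2v_14b_fp8.safetensors" "WAN_21_URL"
fetch "$COMFYUI_DIR/models/loras/lightx2v.safetensors" "LIGHTX2V_LORA_URL"
```

The skip-if-exists guard matters in practice: the full model set is tens of gigabytes, and an interrupted run should resume rather than re-download everything.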
The workflow is packaged as a single JSON file. Drag it into ComfyUI, upload your portrait and audio, set the frame count based on audio duration, and run. Cloud GPU rental via vast.ai makes it accessible without expensive local hardware.
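The frame-count step is simple arithmetic: frames = audio duration x the workflow's frame rate. The sketch below assumes you already know the clip length (e.g. from ffprobe) and uses 25 fps as an assumed rate; check the fps actually configured in your workflow before relying on this number.

```shell
# Frame count = audio duration (seconds) x workflow fps.
# AUDIO_SECONDS is assumed known (e.g. via ffprobe); FPS=25 is an assumption,
# not a value confirmed by this workflow -- match it to your fps node.
AUDIO_SECONDS=12
FPS=25
FRAMES=$(( AUDIO_SECONDS * FPS ))
echo "set frame count to: $FRAMES"
```

Rounding up by a few frames is harmless; falling short truncates the audio, so when in doubt overshoot slightly.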
Key Features
Talking-head avatar video generation from a single portrait image and MP3 audio
Lip-sync powered by InfiniteTalk inside ComfyUI
WAN 2.1 (14B) image-to-video model with optional 720p upscaling
Automated model download script for one-command setup
Drag-and-drop ComfyUI workflow JSON for instant use
Free alternative to HeyGen and Kling; runs on cloud GPUs via vast.ai