AI Avatar Video Generation Pipeline
AI Application

About

  • This pipeline generates talking-head AI avatar videos from a single portrait image and an audio file. It uses open models that you host yourself, so there are no per-video fees or usage caps; the only cost is GPU time.
  • It runs entirely inside ComfyUI, using WAN 2.1 (14B, fp8) for image-to-video generation and InfiniteTalk for audio-driven lip-sync animation. The aim is output in the same class as paid services such as HeyGen and Kling.
  • Setup uses an included shell script that downloads every required model (WAN 2.1, InfiniteTalk, MelBandRoformer for vocal separation, the WAN 2.1 VAE, the UMT5-XXL text encoder, CLIP Vision, and the Lightx2v LoRA) into the correct ComfyUI folder structure.
  • The workflow is packaged as a single JSON file: drag it into ComfyUI, upload your portrait and audio, set the frame count to cover the audio (duration in seconds × the workflow's frame rate), and run. Renting a cloud GPU on vast.ai makes the pipeline usable without expensive local hardware.
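The frame-count step is simple arithmetic, sketched below in Python. The default of 25 fps is an assumption, not a value taken from the project's workflow; use whatever frame rate your copy of the workflow renders at. The optional alignment step rounds up to the 4n+1 form (for example 81) that WAN 2.1 frame counts normally take, because its VAE compresses time by a factor of 4 after the first frame.

```python
import math


def frames_for_audio(duration_s: float, fps: int = 25, wan_align: bool = True) -> int:
    """Return a frame count long enough to cover `duration_s` seconds of audio.

    ASSUMPTION: fps=25 is a placeholder default; match it to your workflow.
    If `wan_align` is True, round up to the next 4n+1 value for WAN 2.1.
    """
    if duration_s <= 0:
        raise ValueError("audio duration must be positive")
    # Basic conversion: seconds of audio times frames per second, rounded up
    frames = math.ceil(duration_s * fps)
    if wan_align:
        # Smallest n such that 4n + 1 >= frames
        n = math.ceil((frames - 1) / 4)
        frames = 4 * n + 1
    return frames
```

For example, a 5-second clip at 16 fps needs 80 frames, which `frames_for_audio(5.0, fps=16)` rounds up to 81. Rounding up rather than down ensures the video never ends before the audio does.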

Key Features

  • Talking-head avatar video generation from a single portrait image and MP3 audio
  • Lip-sync powered by InfiniteTalk inside ComfyUI
  • WAN 2.1 (14B) image-to-video model with optional 720p upscaling
  • Automated model download script for one-command setup
  • Drag-and-drop ComfyUI workflow JSON for instant use
  • Self-hosted alternative to HeyGen and Kling with no subscription; runs on rented cloud GPUs via vast.ai
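The project's download script is not reproduced here. As a rough illustration of what "downloading into the correct ComfyUI folder structure" involves, the Python sketch below (Python rather than shell, for readability) maps each model to a subfolder of ComfyUI's `models/` directory, builds Hugging Face download URLs, and skips files that already exist. The repo IDs (`ORG/REPO`), the filenames, and the folder chosen for each model are hypothetical placeholders, not the project's actual manifest; only the `resolve/main` URL pattern is Hugging Face's standard raw-file path.

```python
from pathlib import Path
import urllib.request

# HYPOTHETICAL manifest: repo IDs, filenames and folder choices are placeholders.
# Each entry is (Hugging Face repo, file in repo, subfolder under ComfyUI/models).
MANIFEST = [
    ("ORG/REPO", "wan2.1_i2v_14b_fp8.safetensors", "diffusion_models"),
    ("ORG/REPO", "infinitetalk.safetensors", "diffusion_models"),
    ("ORG/REPO", "melbandroformer.safetensors", "diffusion_models"),
    ("ORG/REPO", "wan2.1_vae.safetensors", "vae"),
    ("ORG/REPO", "umt5_xxl_enc.safetensors", "text_encoders"),
    ("ORG/REPO", "clip_vision.safetensors", "clip_vision"),
    ("ORG/REPO", "lightx2v_lora.safetensors", "loras"),
]


def hf_url(repo: str, filename: str) -> str:
    # Hugging Face serves raw repo files at /<repo>/resolve/<revision>/<path>
    return f"https://huggingface.co/{repo}/resolve/main/{filename}"


def plan_downloads(comfy_root, manifest=MANIFEST):
    """List (url, destination) pairs for model files not yet on disk."""
    models_dir = Path(comfy_root) / "models"
    plan = []
    for repo, filename, subdir in manifest:
        dest = models_dir / subdir / filename
        if not dest.exists():  # skip files from a previous run
            plan.append((hf_url(repo, filename), dest))
    return plan


def run(plan, dry_run=True):
    """Print each planned transfer; download it only when dry_run is False."""
    for url, dest in plan:
        print(f"{url} -> {dest}")
        if not dry_run:
            dest.parent.mkdir(parents=True, exist_ok=True)
            urllib.request.urlretrieve(url, str(dest))
```

Calling `run(plan_downloads("/path/to/ComfyUI"))` only prints the planned transfers; pass `dry_run=False` to fetch them. Checking for existing files first makes re-running setup on a fresh cloud instance cheap.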

Tech Stack

ComfyUI, WAN 2.1, InfiniteTalk, Hugging Face, Shell

Details

Type: AI Application
Stack Size: 5 technologies