Stability AI and DeepFloyd have announced the research release of DeepFloyd IF, a cascaded pixel diffusion model for text-to-image generation. It is released under a non-commercial, research-permissible license so that research labs can explore advanced text-to-image generation techniques. The model uses T5-XXL-1.1 as its text encoder, and numerous text-image cross-attention layers improve the alignment between prompt and image. It can render coherent text within images and achieves a high degree of photorealism, with a zero-shot FID score of 6.66 on the COCO dataset. DeepFloyd IF also supports aspect ratio shifts and zero-shot image-to-image translation, which modifies style, patterns, and details while preserving the basic form of the source image. A fully open-source version of the model is planned for a future release.

Stability AI releases DeepFloyd