Generative Modeling - The Batch | DeepLearning.AI (Page 8)

AI-generated image of Joe Rogan interviewing Steve Jobs

Culture

All Synthetic, All the Time: Joe Rogan Meets Steve Jobs in an AI-Generated Podcast

For the debut episode of a new podcast series, Play.ht synthesized a 19-minute interview between the rock-star podcaster and the late Apple CEO.

Example of a video produced from a story-like description

Machine Learning Research

Long-Form Videos from Text Stories: Google's Phenaki Generates Long-Form Video from Text

Only a week ago, researchers unveiled a system that generates a few seconds of video based on a text prompt. New work enables a text-to-video system to produce an entire visual narrative from several sentences of text.

Illustration of the Dialogue Transformer Language Model (DLM)

Machine Learning Research

The Sound of Conversation: AI Learns to Mimic Conversational Pauses and Interruptions

In spoken conversation, people naturally take turns amid interjections and other patterns that aren’t strictly verbal. A new approach generated natural-sounding audio dialogs without training on text transcriptions that mark when one party should stop speaking and the other should chime in.

Machine Learning Research

Text to Video Without Text-Video Training Data: Make-A-Video, an AI System from Meta, Generates Video from Text

Text-to-image generators like DALL·E 2, Midjourney, and Stable Diffusion are winning art contests and worrying artists. A new approach brings the magic of text-to-image generation to video.

Culture

Prompting DALL·E for Fun and Profit: A marketplace for phrases that produce art in DALL·E, Midjourney, and Stable Diffusion

An online marketplace enables people to buy text prompts designed to produce consistent output from the new generation of text-to-image generators.

Animated graphs showing how an ensemble of fine-tuned models can provide better performance.

Machine Learning Research

Ensemble Models Simplified: New Machine Learning Research Simplifies Ensembles

A CLIP model whose weights were the mean of an ensemble of fine-tuned models performed as well as the ensemble and better than its best-performing constituent.
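To make the idea concrete, here is a minimal sketch of averaging weights across fine-tuned models. It assumes every model shares one architecture and stores its parameters as named lists of floats. The function name `average_weights`, the parameter names, and the toy values are hypothetical and are not taken from the research described above.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_weights(models: List[Weights]) -> Weights:
    """Return the element-wise mean of several models' parameters."""
    if not models:
        raise ValueError("need at least one model")
    averaged: Weights = {}
    for name in models[0]:
        # Group the i-th value of this parameter across all models.
        columns = zip(*(m[name] for m in models))
        averaged[name] = [sum(col) / len(models) for col in columns]
    return averaged


# Three toy "fine-tuned" models with identical parameter shapes (assumed values).
fine_tuned = [
    {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]},
    {"layer.weight": [3.0, 4.0], "layer.bias": [3.0]},
    {"layer.weight": [5.0, 6.0], "layer.bias": [6.0]},
]

averaged_model = average_weights(fine_tuned)
print(averaged_model)
```

The result is a single set of weights, so inference costs the same as running one model rather than the whole ensemble.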

Tech & Society

Text-to-Image Goes Viral: Inside Craiyon, Formerly Known as DALL·E Mini

A homebrew re-creation of OpenAI’s DALL·E model is the latest internet sensation. Craiyon has been generating around 50,000 user-prompted images daily, thanks to its ability to produce visual mashups like Darth Vader ice fishing and photorealistic Pokémon characters.

Business

Speaking Your Language: Startup Papercup Offers AI-Powered Voice Translation

A startup that automatically translates video voiceovers into different languages is ready for its big break. London-based Papercup offers a voice translation service that combines algorithmic translation and voice synthesis with human-in-the-loop quality control.

Tech & Society

DALL·E 2’s Emergent Vocabulary: The text-to-image generator DALL·E 2 invents its own words and concepts

OpenAI’s text-to-image generator DALL·E 2 produces pictures with uncanny creativity on demand. Has it invented its own language as well? Ask DALL·E 2 to generate an image that includes text, and often its output will include seemingly random characters.

Variational Neural Cellular Automata (VNCA) overview

Machine Learning Research

Tech Imitates Life, Life Imitates Art: Image Generation Technique Works Pixel By Pixel

The computational systems known as cellular automata reproduce patterns of pixels by iteratively applying simple rules based loosely on the behavior of biological cells. New work extends their utility from reproducing images to generating new ones.

Didactic diagram of a hypothetical embedded-model architecture

Machine Learning Research

Image Generation + Probabilities: New Method Boosts Performance for Normalizing Flow

If you want to both synthesize data and find the probability of any given example — say, generate images of manufacturing defects to train a defect detector and identify the highest-probability defects — you may use the architecture known as a normalizing flow.
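As a rough illustration of how one model can do both jobs, the sketch below builds a one-dimensional flow from a single invertible affine map applied to a standard normal variable. Sampling pushes noise forward through the map, and the probability of an example comes from inverting the map and applying the change-of-variables formula. The class name `AffineFlow` and its parameters are assumptions for illustration only; practical flows stack many learned invertible layers.

```python
import math
import random


class AffineFlow:
    """Toy 1-D flow: x = scale * z + shift, with z drawn from N(0, 1)."""

    def __init__(self, scale: float, shift: float) -> None:
        if scale == 0:
            raise ValueError("scale must be nonzero so the map is invertible")
        self.scale = scale
        self.shift = shift

    def sample(self, rng: random.Random) -> float:
        # Generation: draw base noise and push it through the map.
        z = rng.gauss(0.0, 1.0)
        return self.scale * z + self.shift

    def log_prob(self, x: float) -> float:
        # Density: invert the map, score z under the base distribution,
        # then subtract log|dx/dz| (change of variables).
        z = (x - self.shift) / self.scale
        base_log_prob = -0.5 * z * z - 0.5 * math.log(2 * math.pi)
        return base_log_prob - math.log(abs(self.scale))


flow = AffineFlow(scale=2.0, shift=1.0)
example = flow.sample(random.Random(0))
print(example, flow.log_prob(example))
```

Because the map is invertible and its derivative is known, the same object yields both new examples and an exact log-probability for any given example.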

AI generated images with different descriptions

Machine Learning Research

More Realistic Pictures From Text: How the Glide Diffusion Model Generates Images from Text

OpenAI’s DALL·E got an upgrade that takes in text descriptions and produces images in styles from hand-drawn to photorealistic. The new version is a rewrite from the ground up. It uses the earlier CLIP zero-shot image classifier to represent text descriptions.