Style Upgrade

Comparison of the results using different normalization functions

Image-to-image translation, in which stylistic features from one image are imposed on the content of another to create a new picture, traditionally has been limited to translating either shapes or textures. A new network translates both, allowing more flexible image combinations and creating more visually satisfying output.

What’s new: A team at Boeing’s lab in South Korea created U-GAT-IT, a network that produces image-to-image translations that human judges rated above earlier methods’ output.

Key insights: Where earlier image-to-image translation networks work best with particular image styles, U-GAT-IT adds layers that make it useful across a variety of styles.

  • Such networks typically represent shapes and textures in hidden feature maps. U-GAT-IT adds a layer that weights the importance of each feature map based on each image’s style.
  • The researchers also introduce a layer that learns which normalization method works best; a minimal sketch of this idea follows the list.
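In the paper, this learned normalization is AdaLIN (adaptive layer-instance normalization): a learnable ratio blends instance-normalized and layer-normalized activations, so the network can lean toward whichever suits the style. Below is a minimal PyTorch sketch; the initial ratio, epsilon, and tensor shapes are illustrative assumptions, not the authors’ exact code.

```python
import torch
import torch.nn as nn

class AdaLIN(nn.Module):
    """Minimal sketch of adaptive layer-instance normalization:
    a learnable per-channel ratio rho blends instance norm (per-sample,
    per-channel statistics) with layer norm (per-sample statistics
    across channels and space)."""

    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        # rho in [0, 1]: 1 -> pure instance norm, 0 -> pure layer norm.
        # The initial value 0.9 is an illustrative assumption.
        self.rho = nn.Parameter(torch.full((1, num_features, 1, 1), 0.9))
        self.eps = eps

    def forward(self, x, gamma, beta):
        # x: (batch, channels, height, width); gamma, beta: (batch, channels),
        # style-dependent scale and shift computed elsewhere in the generator.
        in_mean = x.mean(dim=(2, 3), keepdim=True)
        in_var = x.var(dim=(2, 3), keepdim=True)
        x_in = (x - in_mean) / torch.sqrt(in_var + self.eps)

        ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)
        ln_var = x.var(dim=(1, 2, 3), keepdim=True)
        x_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)

        rho = self.rho.clamp(0.0, 1.0)  # keep the blend ratio valid
        x_hat = rho * x_in + (1 - rho) * x_ln
        return x_hat * gamma.view(x.size(0), -1, 1, 1) + beta.view(x.size(0), -1, 1, 1)
```

Roughly, instance norm tends to preserve per-image texture statistics while layer norm normalizes more globally, permitting larger shape changes, so learning the blend lets the network adapt to each style.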

How it works: U-GAT-IT uses a typical GAN architecture: a discriminator classifies images as either real or generated, while a generator tries to fool the discriminator. The system accepts two image inputs.
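As a rough illustration of that adversarial setup, here is a minimal PyTorch training step. The generator G, discriminator D, optimizers, and least-squares loss are assumptions for the sketch; the actual model combines its adversarial loss with several other terms not shown.

```python
import torch
import torch.nn.functional as F

def adversarial_step(G, D, g_opt, d_opt, source_img, target_img):
    # Translate a source-style image into the target style.
    fake = G(source_img)

    # Discriminator step: score real target images toward 1, fakes toward 0.
    real_score = D(target_img)
    fake_score = D(fake.detach())  # detach so this step doesn't update G
    d_loss = (F.mse_loss(real_score, torch.ones_like(real_score))
              + F.mse_loss(fake_score, torch.zeros_like(fake_score)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: fool the updated discriminator into scoring fakes as real.
    fooled_score = D(fake)
    g_loss = F.mse_loss(fooled_score, torch.ones_like(fooled_score))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```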

  • The generator takes the images and uses a CNN to extract feature maps that encode shapes and textures.
  • In earlier models, feature maps are passed directly to an attention layer that models the correspondence between pixels in each image. In U-GAT-IT, an intermediate weighting layer learns the importance of each feature map, letting the system distinguish the importance of different textures and shapes in each style (see the sketch after this list).
  • The weighted feature maps are passed to the attention layer to assess pixel correspondences, and the generator produces an image from there.
  • The discriminator takes the first image as a real-world style example and the second as a candidate in the same style that’s either real or generated.
  • Like the generator, it encodes both images to feature maps via a CNN and uses a weighting layer to guide an attention layer.
  • The discriminator classifies the candidate image based on the attention layer’s output.
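To make the weighting step concrete, here is a minimal PyTorch sketch of the idea described above: an auxiliary classifier learns to tell domains apart from globally pooled feature maps, and its per-channel weights are reused to re-weight the maps themselves. The class name, single pooling method (the paper pools in more than one way), and one-logit head are illustrative assumptions, not the authors’ exact code.

```python
import torch.nn as nn

class FeatureMapWeighting(nn.Module):
    """Minimal sketch of auxiliary-classifier feature weighting:
    the classifier's per-channel weights double as importance scores
    for the feature maps that feed the attention layer."""

    def __init__(self, num_channels):
        super().__init__()
        # One weight per feature map; the logit says whether the pooled
        # features look like they came from the source domain.
        self.classifier = nn.Linear(num_channels, 1, bias=False)

    def forward(self, feature_maps):
        # feature_maps: (batch, channels, height, width)
        pooled = feature_maps.mean(dim=(2, 3))   # global average pooling
        logit = self.classifier(pooled)          # auxiliary domain score
        weights = self.classifier.weight         # (1, channels)
        weighted = feature_maps * weights.view(1, -1, 1, 1)
        return weighted, logit
```

Both the generator and the discriminator apply this weighting before their attention layers; training the auxiliary logit is what pushes the weights to reflect style-relevant channels.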

Results: Test subjects chose their favorite images from among translations produced by U-GAT-IT and four earlier methods. On four of five data sets, subjects preferred U-GAT-IT’s output, choosing it up to 73 percent of the time.

Why it matters: Image-to-image translation is a hot topic with many practical applications. Professional image editors use it to boost image resolution and colorize black-and-white photos. Consumers enjoy the technology in apps like FaceApp.

We’re thinking: The best-performing deepfake networks lean heavily on image-translation techniques. A new generation that takes advantage of U-GAT-IT’s simultaneous shape-and-texture modeling may produce even more convincing fake pictures.
