Transformer - The Batch | DeepLearning.AI

Machine Learning Research

A Transformer Alternative Emerges: Mamba, a new approach that may outperform transformers

An architectural innovation improves upon transformers — up to 2 billion parameters, at least...

Hardware

Nvidia Revs AI Engine: All about Nvidia’s new Blackwell architecture and B200 GPU

Nvidia’s latest chip promises to boost AI’s speed and energy efficiency.

Tech & Society

Ancient Scrolls Recovered: Researchers decipher scrolls charred by Mount Vesuvius using AI

Three researchers decoded scrolls that had gone unread since they were turned into charcoal by the eruption of Mount Vesuvius in the year 79.

Machine Learning Research

Sing a Tune, Generate an Accompaniment: SingSong, a tool that generates instrumental music for unaccompanied input vocals

A neural network makes music for unaccompanied vocal tracks. Chris Donahue, Antoine Caillon, Adam Roberts, and colleagues at Google proposed SingSong, a system that generates musical accompaniments for sung melodies.

Tech & Society

Google’s Multimodal Challenger: All you need to know about Gemini, Google's new multimodal model

Google unveiled Gemini, its bid to catch up to, and perhaps surpass, OpenAI’s GPT-4. Google demonstrated the Gemini family of models that accept any combination of text (including code), images, video, and audio and output text and images. The demonstrations and metrics were impressive...

Machine Learning Research

Robot, Find My Keys: A machine learning model for robots to predict the location of objects in households

Researchers proposed a way for robots to find objects in households where things get moved around. Andrey Kurenkov and colleagues at Stanford University introduced Node Edge Predictor, a model that learned to predict where objects were located in houses.

Machine Learning Research

Taming Transformers: Researchers find new strategies to accelerate the transformer architecture

The transformer architecture is astonishingly powerful but notoriously slow. Researchers have developed numerous tweaks to accelerate it — enough to warrant a look at how these alternatives work, their strengths, and their weaknesses.

Machine Learning Research

Masked Pretraining for CNNs: ConvNeXt V2, the new model family that boosts ConvNet performance

Vision transformers have bested convolutional neural networks (CNNs) in a number of key vision tasks. Have CNNs hit their limit? New research suggests otherwise.

Machine Learning Research

Diffusion Transformed: A new class of diffusion models based on the transformer architecture

A tweak to diffusion models, which are responsible for most of the recent excitement about AI-generated images, enables them to produce more realistic output.

Business

Where Is Meta’s Generative Play?: Why Meta still lacks a flagship generative AI service

While Microsoft and Google scramble to supercharge their businesses with text generation, Meta has yet to launch a flagship generative AI service. Reporters went looking for reasons why.

Machine Learning Research

What the Brain Sees: How a text-to-image model generates images from brain scans

A pretrained text-to-image generator enabled researchers to see — roughly — what other people looked at based on brain scans. Yu Takagi and Shinji Nishimoto developed a method that uses Stable Diffusion to reconstruct images viewed by test subjects...

Tech & Society

Falcon Ascends: Falcon, the new open source commercial LLM, explained

A team in the United Arab Emirates, a seven-state federation on the Arabian Peninsula, built the latest top-performing open source large language model.