Machine Learning Research - The Batch | DeepLearning.AI

Graph showing how training loss affects token prediction accuracy and hallucination elimination.

Machine Learning Research

Getting the Facts Right: A memory method that reduces hallucinations in LLMs

Large language models that remember more hallucinate less.

Game character climbing a ladder with visible controls (QWASD) and health bars.

Machine Learning Research

Game Worlds on Tap: Genie 2 brings interactive 3D worlds to life

A new model improves on recent progress in generating interactive virtual worlds from still images.

o1 Family Benchmarks comparing pass rates across AIME, Codeforces, and GPQA.

Machine Learning Research

Higher Reasoning: OpenAI debuts o1 and pro mode for $200/month

OpenAI launched not only its highly anticipated o1 model but also an operating mode that enables the model to deliver higher performance — at a hefty price.

Table comparing HarmBench and AdvBench ASR performance across models and benchmarks.

Machine Learning Research

Breaking Jailbreaks: New E-DPO method strengthens defenses against jailbreak prompts

Jailbreak prompts can prod a large language model (LLM) to overstep built-in boundaries, leading it to do things like respond to queries it was trained to refuse to answer. Researchers devised a way to further boost the probability that LLMs will respond in ways that respect such limits.

Table comparing model performance on MathVista, MMMU, ChartQA, DocVQA, and other tasks.

Machine Learning Research

Mistral’s Vision-Language Contender: Mistral unveils Pixtral Large, a rival to top vision-language models

Mistral AI unveiled Pixtral Large, which rivals top models at processing combinations of text and images.

Grounding DINO animation depicting object detection with bounding boxes on images.

Hardware

Object Detection for Small Devices: Grounding DINO 1.5, an edge device model built for faster, smarter object detection

An open source model is designed to perform sophisticated object detection on edge devices like phones, cars, medical equipment, and smart doorbells.

Bar charts comparing performance of AI models across six tasks.

Machine Learning Research

Reasoning Revealed: DeepSeek-R1, a transparent challenger to OpenAI o1

An up-and-coming Hangzhou AI lab unveiled a model that implements run-time reasoning similar to OpenAI o1 and delivers competitive performance. Unlike o1, it displays its reasoning steps.

Efficient Foundations animation showing layered AI model components.

Machine Learning Research

More-Efficient Training for Transformers: Researchers reduce transformer training costs by 20% with minimal performance loss

Researchers cut the processing required to train transformers by around 20 percent with only a slight degradation in performance.

Comparison of Minecraft terrain with and without player modifications.

Machine Learning Research

No Game Engine Required: AI creates an interactive Minecraft-like world in real time

A real-time video generator lets you explore an open-ended, interactive virtual world — a video game without a game engine.

Graph showing test loss decreases with more tokens and larger model sizes (10^3 to 10^9 parameters).

Machine Learning Research

Next-Gen Models Show Limited Gains: AI giants rethink model training strategy as scaling laws break down

Builders of large AI models have relied on the idea that bigger neural networks trained on more data and given more processing power would show steady improvements. Recent developments are challenging that idea.

OpenDevin animation illustrating open-source AI model collaboration.

Machine Learning Research

Free Agents: OpenHands launches as an open toolkit for advanced code generation and automation

An open source package inspired by the commercial agentic code generator Devin aims to automate computer programming and more.

Model performance comparison across English, Chinese, Math, and Code tasks, with Hunyuan-Large leading.

Machine Learning Research

Mixture of Experts Pulls Ahead: Hunyuan-Large outshines open competitors with high benchmark scores

A new open source large language model outperforms competitors, including the open-weights Llama 3.1 405B, on a variety of benchmarks.