Machine Learning Research
Interactive Voice-to-Voice With Vision: MoshiVis adds image understanding to voice-first conversations
Researchers updated the highly responsive Moshi voice-to-voice model to discuss visual input.
Machine Learning Research
Diffusion transformers learn faster when they can look at embeddings generated by a pretrained model like DINOv2.
Machine Learning Research
Diffusion models usually need many noise-removal steps to produce an image, which slows inference. There are ways to reduce the number of steps, but the resulting systems are less effective. Researchers devised a streamlined approach that doesn’t sacrifice output quality.
Machine Learning Research
Google updated its open-weights family of large language models to include versions that handle image and video inputs.
Science
Materials that have specific properties are essential to progress in critical technologies like solar cells and batteries. A machine learning model designs new materials to order.
Machine Learning Research
An AI agent synthesizes novel scientific research hypotheses. It's already making an impact in biomedicine.
Machine Learning Research
Multilingual AI models often suffer uneven performance across languages, especially in multimodal tasks. A pair of lean models counters this trend with consistent understanding of text and images across major languages.
Tech & Society
Large language models built by developers in China may, in some applications, be less useful outside that country because they avoid topics its government deems politically sensitive. A developer fine-tuned DeepSeek-R1 to widen its scope without degrading its overall performance.
Machine Learning Research
Microsoft debuted its first official large language model that responds to spoken input.
Machine Learning Research
Most models that have learned to reason via reinforcement learning are huge. A much smaller model now competes with them.
Machine Learning Research
Anthropic’s Claude 3.7 Sonnet implements a hybrid reasoning approach that lets users decide how much thinking they want the model to do before it renders a response.
Machine Learning Research
OpenAI launched GPT-4.5, which may be its last non-reasoning model.