
Machine Learning Research
Better Multimodal Performance With Open Weights: Qwen2.5-Omni 7B raises the bar for small multimodal models
Alibaba’s latest open-weights system raises the bar for multimodal tasks in a relatively small model.
Machine Learning Research
Researchers updated the highly responsive Moshi voice-to-voice model to discuss visual input.
Machine Learning Research
Microsoft debuted its first official large language model that responds to spoken input.
Tech & Society
Amazon announced Alexa+, a major upgrade to its long-running voice assistant.
Machine Learning Research
Even cutting-edge, end-to-end, speech-to-speech systems like ChatGPT’s Advanced Voice Mode tend to get interrupted by interjections like “I see” and “uh-huh” that keep human conversations going. Researchers built an open alternative that’s designed to go with the flow of overlapping speech.
Business
Hate talking to customer service? An AI-powered tool may soon do it for you. Joshua Browder, chief executive of the consumer advocacy organization DoNotPay, demonstrated a system that autonomously navigates phone menus and converses...
Tech & Society
Amazon published a series of web pages designed to help people use AI responsibly. Amazon Web Services introduced so-called AI service cards that describe the uses and limitations of some models it serves.
Tech & Society
Most speech-to-speech translation systems use text as an intermediate mode. So how do you build an automated translator for a language that has no standard written form? A new approach trained neural networks to translate a primarily oral language.
Machine Learning Research
In spoken conversation, people naturally take turns amid interjections and other patterns that aren’t strictly verbal. A new approach generated natural-sounding audio dialogs without training on text transcriptions that mark when one party should stop speaking and the other should chime in.
Tech & Society
Even if we manage to stop robots from taking over the world, they may still have the last laugh. Researchers at Kyoto University developed a series of neural networks that enable a robot engaged in spoken conversation to chortle along with its human interlocutor.
Business
A startup that automatically translates video voiceovers into different languages is ready for its big break. London-based Papercup offers a voice translation service that combines algorithmic translation and voice synthesis with human-in-the-loop quality control.
Tech & Society
Let’s get this out of the way: A brain is not a cluster of graphics processing units, and if it were, it would run software far more complex than the typical artificial neural network. Yet neural networks were inspired by the brain’s architecture.