Claude corners the market on office docs: A path to training AI models on copyrighted music
In today’s edition of Data Points, you’ll learn more about:
- Why Anthropic’s books copyright deal is postponed
- The U.S. government’s new AI youth safety investigation
- Seedream 4.0, ByteDance’s new image model
- Jupyter Agent 2, a set of tools for solving data science problems
But first:
Claude can now create and edit Microsoft Office files directly
Anthropic announced that Claude can generate and modify Excel spreadsheets, Word documents, PowerPoint presentations, and PDFs within Claude and its desktop application. Users can describe their requirements, upload data, and receive completed files rather than just text responses or in-app artifacts. The timing coincides with reports from The Information that Microsoft plans to integrate Anthropic’s AI models into Office 365 alongside OpenAI’s, after internal testing showed Claude Sonnet 4 outperforms OpenAI’s models at visual design and spreadsheet automation tasks. This positions Claude as a comprehensive workplace assistant capable of handling document-based workflows end-to-end, potentially explaining Microsoft’s interest in incorporating Anthropic’s technology into its productivity suite. (Anthropic and Ars Technica)
Swedish music rights group launches AI licensing framework
STIM, representing 100,000 Swedish songwriters and composers, introduced a licensing system that allows AI companies to train on copyrighted music while paying royalties to creators. The framework includes mandatory technology to track AI-generated outputs, ensuring transparency and proper compensation for artists whose works are used in training data. CISAC estimates that AI could reduce music creators’ income by up to 24 percent by 2028, while generative AI outputs in music could reach $17 billion annually by the same year. The initiative addresses growing concerns about AI firms using copyrighted material without consent or compensation, offering a potential model for balancing technological innovation with creators’ rights. Stockholm-based startup Songfox is the first company to operate under the new license, enabling users to create legal AI-generated songs and covers. (Reuters)
Anthropic’s $1.5 billion copyright settlement still needs approval
A federal judge postponed approval of Anthropic’s proposed $1.5 billion copyright settlement with the Authors Guild, expressing concerns that the deal lacks crucial details and could exclude some authors. Judge William Alsup of the U.S. District Court for the Northern District of California called the agreement incomplete and demanded more information about how authors will be notified and compensated. The settlement would resolve claims that Anthropic downloaded millions of pirated books for AI training, potentially establishing a $3,000-per-book benchmark for similar cases against OpenAI, Meta, and other AI companies. The parties must submit additional information by September 15, including a final list of approximately 465,000 works covered by the settlement. (Bloomberg Law)
FTC investigates major AI companies over child safety
The U.S. Federal Trade Commission ordered seven companies — including Alphabet, Meta, OpenAI, and X — to provide information about how their AI chatbots affect children and teens, reflecting growing regulatory concern about AI’s impact on youth. The inquiry focuses on how companies test and monitor potential negative impacts, particularly since chatbots can simulate human relationships and emotions that may lead young users to form trusting bonds with the technology. The FTC seeks details on monetization practices, data handling, age restrictions, and compliance with child privacy laws. (Federal Trade Commission)
ByteDance challenges Google with new image editor Seedream 4.0
ByteDance launched Seedream 4.0, an AI image generation and editing tool that the company claims outperforms Google DeepMind’s Gemini 2.5 Flash Image (known as “Nano Banana”) on several benchmarks. The new model combines text-to-image generation with advanced editing capabilities in a single tool, featuring a new architecture that speeds up image inference by over 10 times compared to previous versions. ByteDance reports that Seedream 4.0 scored higher than Gemini 2.5 Flash Image on its internal MagicBench evaluation for prompt adherence, alignment, and aesthetics, though these results weren’t published in an official technical report. Seedream 4.0 costs $0.03 per image on Fal.ai compared to Gemini 2.5 Flash Image’s $0.039, and is available through ByteDance’s Jimeng and Doubao AI apps domestically and via Volcano Engine for corporate clients. (South China Morning Post)
Hugging Face trains small models to excel at data science tasks in Jupyter notebooks
Hugging Face developed Jupyter Agent, a system that enables AI models to execute code directly within Jupyter notebooks to solve data analysis problems. The team fine-tuned 4-billion-parameter Qwen models on a custom dataset of 51,000 synthetic notebooks derived from Kaggle, achieving up to 75 percent accuracy on easy data science tasks, a 36 percent improvement over the base model. The approach combines simplified scaffolding with high-quality training data generated from educational notebooks, demonstrating that small models can perform competitively as data science agents when properly trained. The models, dataset, and training pipeline are freely available on Hugging Face Hub. (Hugging Face)
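For readers curious what "executing code within a notebook" can look like in practice, here is a minimal sketch of a generic execute-and-observe loop. It is not Hugging Face's implementation: `run_cell`, `scripted_model`, `solve`, and the `FINAL:` convention are hypothetical names invented for illustration, and a hard-coded stand-in takes the place of a fine-tuned model.

```python
import contextlib
import io


def run_cell(code, namespace):
    """Execute one code cell in a shared namespace and return what it printed."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        # Report the error as text so the model can react to it.
        return buffer.getvalue() + f"{type(exc).__name__}: {exc}"
    return buffer.getvalue()


def scripted_model(history):
    """Hypothetical stand-in for a model: picks the next cell from a fixed script."""
    cells = [
        "values = [3, 5, 10]",
        "mean = sum(values) / len(values)\nprint(mean)",
    ]
    if len(history) < len(cells):
        return cells[len(history)]
    # Once the script is exhausted, answer with the last observed output.
    return "FINAL: " + history[-1][1].strip()


def solve(model, max_steps=5):
    """Alternate between asking the model for a cell and running it."""
    namespace = {}  # persists across cells, like notebook kernel state
    history = []    # (code, output) pairs fed back to the model
    for _ in range(max_steps):
        action = model(history)
        if action.startswith("FINAL:"):
            return action[len("FINAL:"):].strip(), history
        history.append((action, run_cell(action, namespace)))
    return None, history


answer, history = solve(scripted_model)
print(answer)
```

The shared `namespace` dictionary is what makes this notebook-like: variables defined in one cell remain visible to the next, so the model can build up an analysis step by step and read each cell's output before deciding what to run.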
Still want to know more about what matters in AI right now?
Read this week’s issue of The Batch for in-depth analysis of news and research.
This week, Andrew Ng reflected on Coursera’s annual conference in Las Vegas, highlighting the shift to skills-based education, the role of AI in learning, and the launch of new “skill tracks” to help learners build applied abilities.
“A lot of traditional education focuses on knowledge. After earning a degree, you know a lot! In contrast, a skills-based approach focuses on developing practical abilities and improving what you can do with what you know.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth:
- Meta and OpenAI are adding new rules to strengthen guardrails for teens’ chatbot use after recent criticism.
- Google has been ordered to share its search index with AI rivals, though it won’t be forced to sell Chrome or Android.
- In Texas, Alpha School is experimenting with a model where students spend two hours learning with AI versus six with a teacher.
- Researchers introduced ATLAS, a transformer-like architecture capable of processing input contexts as large as ten million tokens.