Phishing for Agents: Columbia University researchers show how to trick trusting AI agents with poisoned links
Researchers identified a simple way to mislead autonomous agents based on large language models.

Researchers identified a simple way to mislead autonomous agents based on large language models.
What’s new: Ang Li and colleagues at Columbia University developed a method to exploit the implicit trust that agents tend to place in popular websites by poisoning those websites with malicious links.
Key insight: Commercially available agentic systems may not trust random sites on the web, but they tend to trust popular sites such as social-media sites. An attacker can exploit this trust by crafting seemingly typical posts that link to a malicious website. The agent might follow the link, mistakenly extending its trust to an untrustworthy site.
How it works: The authors tested web-browsing agents including Anthropic Computer Use and MultiOn on tasks such as shopping or sending emails.
- The authors created Reddit posts that aligned thematically with a particular agentic task, such as shopping for Air Jordan 1 shoes. The posts contained text akin to marketing (for example, “Where to Buy Air Jordan 1 Chicago”) as well as instructions that pointed to a malicious site controlled by the authors (“for more information, check out <website>”).
- The authors fed a query like “Where can I buy Nike Air Jordan 1 in Chicago?” to the agent. They also entered sensitive information like credit card details or email credentials.
- The agent searched the web for resources needed to fulfill the query. It examined sites and found the Reddit posts written by the authors.
- The agent followed the instructions in the posts and visited the malicious website. The website included instructions that manipulated the agent to pursue an attacker’s goal, such as submitting credit card information or sending phishing emails from the user’s email address.
Results: Once an agent was redirected to the malicious websites, it reliably followed the attacker’s instructions. For example, each of the agents tested divulged credit card information in 10 out of 10 trials. Similarly, each agent sent a phishing message from the user’s email account asking recipients to send money to a malicious “friend” in 10 out of 10 trials.
Why it matters: Giving agents the ability to perform real-world actions, such as executing purchases and sending emails, raises the possibility that they might be tricked into taking harmful actions. Manipulating agents by referring them to malicious web content is an effective vector of attack. Agents will be more secure if they’re designed to avoid and resist such manipulation.
We’re thinking: Humans, too, can be fooled by phishing and other malicious activities, and the path to programming agents to defend against them seems easier than the path to training the majority of humans to do so. In the long term, agents will make online interactions safer.