igrigorik
Member

Some browsers now offer built-in agents that operate within the user's authenticated browser context and can automate browser tasks: interacting with a page, navigating across pages, operating across tabs. These workflows can run in a foreground or background tab. Because the agent operates with the user's context and can share the session with the user (back-and-forth hand-off workflows), its actions become ~indistinguishable from a human interacting with the content.

Proposal: Sec-Fetch-Agent as a sibling to Sec-Fetch-User that agents can use to self-identify actions performed by an agent from the user's context.

This gives site operators transparency and a signal they can use to facilitate the agent (e.g. additional content, potentially exposing WebMCP tools, etc.), and/or to tell the agent that user review or input is required to proceed.

Example flow

  • Buyer navigates to product page - Sec-Fetch-User: ?1
    • Buyer asks agent to compare and build a cart
  • Agent takes over - Sec-Fetch-Agent: ?1 (while performing the task)
    • Agent navigates to checkout, may fill some details - Sec-Fetch-Agent: ?1
    • Agent requests user to review checkout
  • Buyer reviews and finalizes (taps "Pay") transaction - Sec-Fetch-User: ?1
    • Buyer is navigated to order confirmation page
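To make the flow above concrete, here is a minimal server-side sketch of how a site might read the two headers at checkout. It assumes the proposed `Sec-Fetch-Agent` header carries the same structured-field boolean (`?1`) as `Sec-Fetch-User`. The function names, the "agent wins if both are present" rule, and the checkout policy are all hypothetical illustrations, not part of the proposal.

```python
from typing import Mapping


def sf_true(headers: Mapping[str, str], name: str) -> bool:
    """Return True when the named header carries the structured-field boolean ?1."""
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get(name.lower(), "").strip() == "?1"


def initiator(headers: Mapping[str, str]) -> str:
    """Classify who triggered the request, based only on the self-reported headers."""
    # Assumption: if both headers are present, treat the request as agent-driven.
    if sf_true(headers, "Sec-Fetch-Agent"):
        return "agent"
    if sf_true(headers, "Sec-Fetch-User"):
        return "user"
    return "unspecified"


def checkout_action(headers: Mapping[str, str]) -> str:
    """Hypothetical policy: an agent may reach checkout, but a human must confirm payment."""
    if initiator(headers) == "agent":
        return "require_user_review"
    return "proceed"
```

With this sketch, a request carrying `Sec-Fetch-Agent: ?1` at checkout maps to `require_user_review`, which corresponds to the "Agent requests user to review checkout" step.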

Additional considerations

  • The header only indicates agent presence, not agent capabilities or identity
  • The header is only sent to potentially trustworthy URLs
  • The agent signal should be propagated to nested contexts
    • Agent-initiated iframe loads should inherit the agent flag
    • Subresources in agent-loaded iframe are also marked
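The considerations above can be sketched as a small model of how a browser might decide whether to attach the header. This is only an illustration under stated assumptions: `Context`, `child`, and `agent_headers` are hypothetical names, and `is_potentially_trustworthy` is a deliberately simplified stand-in for the full "potentially trustworthy URL" definition.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlsplit


@dataclass
class Context:
    """Hypothetical browsing context that remembers whether an agent initiated it."""
    agent_initiated: bool = False
    parent: Optional["Context"] = None

    def child(self, agent_initiated: bool = False) -> "Context":
        # A nested context (e.g. an iframe) inherits the agent flag from its parent.
        return Context(self.agent_initiated or agent_initiated, parent=self)


def is_potentially_trustworthy(url: str) -> bool:
    """Simplified approximation: secure schemes plus common loopback hosts."""
    parts = urlsplit(url)
    if parts.scheme in ("https", "wss"):
        return True
    return parts.hostname in ("localhost", "127.0.0.1", "::1")


def agent_headers(ctx: Context, url: str) -> Dict[str, str]:
    """Headers to add to a navigation or subresource request made from ctx."""
    if ctx.agent_initiated and is_potentially_trustworthy(url):
        return {"Sec-Fetch-Agent": "?1"}
    return {}
```

In this model, a subresource fetched from an iframe nested inside an agent-loaded page is marked, while the same fetch to a plain `http://` origin carries no header.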

@krgovind

Interesting idea, thanks for filing this, @igrigorik!

A couple of thoughts:

  • It may not be trivial to narrow down which HTTP requests are a result of the agent's action, since the common agent implementations that I'm familiar with appear to be using frameworks similar to Puppeteer, via UI actions. Would a property on a click event make more sense?
  • Since the proposal is to not have the signal tied to the agent's identity, there is a risk that it could be spoofed; so we need to be careful that developers don't confer trust from the signal, but only infer risk (as in the provided example).
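One way to read the "infer risk, don't confer trust" point is that the signal should only ever add checks, never remove them. A minimal sketch, assuming the proposed header uses the `?1` boolean form; `required_checks` and the check names are hypothetical:

```python
from typing import Mapping, Set


def required_checks(headers: Mapping[str, str], base_checks: Set[str]) -> Set[str]:
    """Return the checks to run; the agent signal can only add to the baseline."""
    checks = set(base_checks)
    lowered = {key.lower(): value for key, value in headers.items()}
    if lowered.get("sec-fetch-agent", "").strip() == "?1":
        # Self-declared agent: ask for extra confirmation from the user.
        checks.add("user_confirmation")
    # The header's absence proves nothing, since it can be omitted or spoofed,
    # so the baseline checks are never relaxed.
    return checks
```

Because the header is self-reported, a request without it gets exactly the baseline checks rather than any reduced set.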

@igrigorik
Member Author

The primary use case I have in mind here is in-browser agents, which we're seeing pop up in various browsers, e.g. Comet, Dia, Aria, etc. In these cases the agent is directly orchestrating the tab and should be able to provide the necessary signals for when it's initiating navigation on behalf of the user, plus subrequests that result from it. That said, yes, I guess CDP-based sessions could use the same mechanism to explicitly advertise automated workflows.

Re: click event: no, the important signal here is for the server/page that is being navigated to.

@mikewest
Member

Hey Ilya!

Rather than discussing this in a PR, it might make more sense to pull it out to a separate explainer so we can really dig into the use cases and your expectations around them. I think there's a lot of complexity hidden in what might be meant in the PR's suggestion of a [=request/agent-initiated=] field that's less about the mechanics of updating this spec and more about what the behavior necessary for servers to make reasonable decisions would actually be.

As a small example, you point to subresource requests above. Should we match Sec-Fetch-User's behavior, which sends that header only for the navigation request itself? Should we shift behavior such that data requested during a page load triggered by an agent would be flagged? To what extent do developers want insight into the provenance of an action during same-page navigation/for single-page apps?

I'd suggest that fleshing out the motivation, the threat model, and the expected behavior would be a helpful first step. It might well be the case that that conversation is happening elsewhere (anti-fraud CG?), in which case I'd appreciate pointers. :)
