Collaborator

@tkattkat tkattkat commented Aug 25, 2025

Why

Replace operator agent with new agent handler

The operator agent was an older implementation that did not use tool calling and used a single model for both high-level reasoning and low-level action execution.

What Changed

  • Removed operator agent (StagehandOperatorHandler)

  • Added new agent handler (StagehandAgentHandler)

    • Leverages AI SDK for proper tool call handling
    • New executionModel option for dual-model architecture
    • Better error handling and retry mechanisms
    • Structured tool system with Zod schema validation
  • ExecutionModel feature:

    • Use a powerful model (like Claude 4 Sonnet) for reasoning and planning
    • Use a faster model (like Gemini 2.0 Flash) for Stagehand operations like act() and extract()
    • Enables cost and performance optimization
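
To make the dual-model idea concrete, here is a minimal sketch of how an operation could be routed to one of two models. Everything in it (`AgentModelConfig`, `modelFor`, the placeholder model identifiers, and the fallback to the reasoning model when no executionModel is set) is an assumption for illustration, not code from this PR.

```typescript
// Hypothetical names throughout; this is not Stagehand's actual code.
type ModelId = string; // written as "provider/model"

interface AgentModelConfig {
  model: ModelId; // used for reasoning and planning
  executionModel?: ModelId; // optional; used for act()/extract()-style operations
}

type Operation = "plan" | "act" | "extract";

// Pick the model for an operation. Assumption: when no executionModel is
// configured, execution falls back to the reasoning model.
function modelFor(op: Operation, config: AgentModelConfig): ModelId {
  if (op === "plan") return config.model;
  return config.executionModel ?? config.model;
}

// Placeholder identifiers, not real model names.
const dual: AgentModelConfig = {
  model: "anthropic/reasoning-model",
  executionModel: "google/fast-model",
};
const single: AgentModelConfig = { model: "anthropic/reasoning-model" };
```

With `dual`, planning goes to the reasoning model and act/extract go to the faster one; with `single`, every operation uses the same model.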

Test Plan

  • Tested locally with various agent tasks
  • Verified backward compatibility
  • Tested dual-model execution with different model combinations
  • Installed the package from the branch for further local testing to catch remaining edge cases

changeset-bot bot commented Aug 25, 2025

🦋 Changeset detected

Latest commit: ed42209

The changes in this PR will be included in the next version bump.

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR implements a major architectural refactor of the agent system, replacing the entire "operator agent" implementation with a new AI SDK-based agent architecture. The changes include:

Core Architecture Changes:

  • Removed StagehandOperatorHandler and types/operator.ts entirely
  • Introduced StagehandAgentHandler as the new default agent implementation
  • Renamed the existing agent handler to cuaAgentHandler (Computer Use Agent) for provider-specific execution
  • Updated the main library exports to use the new handlers while maintaining API compatibility

New Tool System:
The PR introduces a complete tool ecosystem under lib/agent/tools/ made up of 11 files: ten standardized tools that wrap existing Stagehand functionality, plus an index.ts factory:

  • act.ts - Web element interaction with observe-then-act pattern
  • ariaTree.ts - Accessibility tree extraction for page context
  • close.ts - Task completion signaling
  • extract.ts - Data extraction from pages
  • fillform.ts - Optimized multi-field form filling
  • goto.ts - URL navigation
  • navback.ts - Browser history navigation
  • screenshot.ts - JPEG screenshot capture with compression
  • scroll.ts - Page scrolling functionality
  • wait.ts - Time-based delays
  • index.ts - Centralized tool factory function
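
The factory pattern behind index.ts can be sketched as follows. This is a dependency-free illustration under stated assumptions: per the summary, the PR's tools use AI SDK's tool() with Zod schemas, whereas here a hand-written `parse` function stands in for the schema. `ToolDef`, `FakePage`, `createAgentToolsSketch`, and the parameter shapes are all hypothetical.

```typescript
// Dependency-free sketch; the PR's tools use AI SDK's tool() and Zod instead.
interface ToolDef<I, O> {
  description: string;
  parse: (input: unknown) => I; // stands in for a Zod schema
  execute: (input: I) => Promise<O>;
}

// Hypothetical stand-in for the browser page the tools would wrap.
interface FakePage {
  url: string;
  goto(url: string): Promise<void>;
}

// Centralized factory: builds every tool around one shared page object.
function createAgentToolsSketch(page: FakePage) {
  const gotoTool: ToolDef<{ url: string }, { success: boolean; url: string }> = {
    description: "Navigate to a URL",
    parse(input) {
      if (typeof input !== "object" || input === null) {
        throw new Error("expected an object");
      }
      const url = (input as { url?: unknown }).url;
      if (typeof url !== "string") throw new Error("url must be a string");
      return { url };
    },
    async execute({ url }) {
      await page.goto(url);
      return { success: true, url: page.url };
    },
  };

  const closeTool: ToolDef<{ success: boolean }, { done: boolean }> = {
    description: "Signal that the task is complete",
    parse(input) {
      if (typeof input !== "object" || input === null) {
        throw new Error("expected an object");
      }
      const success = (input as { success?: unknown }).success;
      if (typeof success !== "boolean") throw new Error("success must be a boolean");
      return { success };
    },
    async execute() {
      return { done: true };
    },
  };

  return { goto: gotoTool, close: closeTool };
}
```

Keeping construction in one factory means every tool shares the same page handle, and options can later be threaded through a single call site.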

Implementation Details:

  • All tools use AI SDK's tool() function with Zod schema validation
  • The new StagehandAgentHandler leverages AI SDK's generateText with built-in tool calling
  • Added message processing utilities in messageProcessing.ts for context compression
  • Updated LLM client interface with getLanguageModel() getter for AI SDK integration
  • Fixed minor issues like grammar corrections in evaluation tasks
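
For readers unfamiliar with tool calling, the general shape of such a loop is sketched below. It is a generic, synchronous illustration and not the AI SDK's generateText implementation or the handler's code; `FakeModel`, `ToolTable`, `runAgentLoop`, and the scripted model are hypothetical. The one detail borrowed from the tool list is that a close call signals completion.

```typescript
// Generic tool-calling loop; NOT the AI SDK's generateText implementation.
type ToolCall = { name: string; args: Record<string, unknown> };
type ToolResult = { name: string; output: string };

// A "model" here is any function that inspects prior results and picks the next call.
type FakeModel = (history: ToolResult[]) => ToolCall;

type ToolFn = (args: Record<string, unknown>) => string;
type ToolTable = Record<string, ToolFn | undefined>;

function runAgentLoop(model: FakeModel, tools: ToolTable, maxSteps: number): ToolResult[] {
  const history: ToolResult[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const call = model(history);
    const tool = tools[call.name];
    if (!tool) {
      // Report unknown tools back to the model instead of crashing.
      history.push({ name: call.name, output: `error: unknown tool "${call.name}"` });
      continue;
    }
    history.push({ name: call.name, output: tool(call.args) });
    if (call.name === "close") break; // completion signal ends the loop
  }
  return history;
}

// Scripted model: navigate first, then signal completion.
const scripted: FakeModel = (history) =>
  history.length === 0
    ? { name: "goto", args: { url: "https://example.com" } }
    : { name: "close", args: { success: true } };

const demoTools: ToolTable = {
  goto: (args) => `navigated to ${String(args.url)}`,
  close: () => "task complete",
};
```

Calling `runAgentLoop(scripted, demoTools, 5)` runs the goto step, then stops as soon as close is called, well before the step limit.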

The refactor maintains backward compatibility through the same public API while completely overhauling the internal agent execution model from custom schema-based responses to standardized AI SDK tool calling patterns.

Confidence score: 2/5

  • This PR introduces significant architectural changes that could destabilize the agent system due to the complete replacement of core functionality
  • Score reflects the massive scope of changes, removal of entire systems, and potential integration issues with the new AI SDK dependency requirements
  • Pay close attention to lib/handlers/stagehandAgentHandler.ts, lib/agent/tools/act.ts, lib/agent/tools/fillform.ts, and lib/handlers/cuaAgentHandler.ts

20 files reviewed, 12 comments

@tkattkat tkattkat marked this pull request as draft August 25, 2025 22:04
@tkattkat tkattkat marked this pull request as ready for review August 25, 2025 23:05
Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This review covers only the changes made since the last review (commit 80cb25b), not the entire PR.

The latest changes implement the final pieces of the agent architecture refactor, completing the replacement of the operator agent with a new dual-agent system. The key additions include:

  1. Agent Tool Interface Standardization: The createAgentTools function now accepts an optional AgentToolOptions interface with an executionModel parameter, providing a unified way to configure tool behavior across the agent system.

  2. Execution Model Support: Multiple tool files (act.ts, extract.ts, fillform.ts) now support an optional executionModel parameter that allows different models to be used for tool execution versus agent reasoning. When provided, this model is passed to page.observe() and page.extract() operations.

  3. Type System Enhancement: The AgentConfig interface in types/stagehand.ts now includes an optional executionModel field with clear documentation about its format ("provider/model") and purpose for tool execution optimization.

  4. Agent Handler Architecture: Two new handler classes have been introduced:

    • StagehandAgentHandler: A new AI SDK-based agent handler that serves as the default agent implementation with comprehensive error handling and step tracking
    • CuaAgentHandler: A Computer Use Agent handler for advanced visual browser automation with providers like OpenAI and Anthropic
  5. Main Library Integration: The lib/index.ts file has been updated to use class-based agent handlers instead of function-based ones, with the new StagehandAgentHandler becoming the default while maintaining CuaAgentHandler for advanced use cases.
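
Item 3 above describes executionModel as a "provider/model" string. A minimal sketch of splitting such a string is shown here; `parseModelId` and `ParsedModel` are hypothetical helpers, and splitting on the first slash only is an assumption, chosen so that model names containing further slashes survive intact.

```typescript
// Hypothetical helper; not part of the PR.
interface ParsedModel {
  provider: string;
  model: string;
}

// Split "provider/model" on the FIRST slash only (an assumption), and reject
// strings with a missing provider or a missing model part.
function parseModelId(id: string): ParsedModel {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}
```

Validating the format up front gives a clear error at configuration time rather than a confusing failure in the middle of a tool call.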

This refactor enables more flexible model selection where users can specify different models for high-level reasoning versus tool execution, potentially optimizing for cost and performance by using faster models for routine operations while reserving powerful models for complex tasks.

Confidence score: 3/5

  • This PR introduces significant architectural changes that require careful testing to ensure compatibility
  • Score reflects the complexity of the agent system refactor and potential for integration issues
  • Pay close attention to the dynamic schema evaluation in extract.ts and error handling patterns across tool files

Context used:

Rule - Use camelCase naming convention for TypeScript code and snake_case naming convention for Python code in documentation examples. (link)
Context - We enforce linting and prettier at the CI level, so no code style comments that aren't obvious. (link)

10 files reviewed, no comments

5 participants