replace operator agent with base of new agent #1014
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
🦋 Changeset detected. Latest commit: ed42209. The changes in this PR will be included in the next version bump.
Greptile Summary
This PR implements a major architectural refactor of the agent system, replacing the entire "operator agent" implementation with a new AI SDK-based agent architecture. The changes include:
Core Architecture Changes:
- Removed `StagehandOperatorHandler` and `types/operator.ts` entirely
- Introduced `StagehandAgentHandler` as the new default agent implementation
- Renamed the existing agent handler to `cuaAgentHandler` (Computer Use Agent) for provider-specific execution
- Updated the main library exports to use the new handlers while maintaining API compatibility
New Tool System:
The PR introduces a complete tool ecosystem under `lib/agent/tools/` with 11 standardized tools that wrap existing Stagehand functionality:
- `act.ts` - Web element interaction with observe-then-act pattern
- `ariaTree.ts` - Accessibility tree extraction for page context
- `close.ts` - Task completion signaling
- `extract.ts` - Data extraction from pages
- `fillform.ts` - Optimized multi-field form filling
- `goto.ts` - URL navigation
- `navback.ts` - Browser history navigation
- `screenshot.ts` - JPEG screenshot capture with compression
- `scroll.ts` - Page scrolling functionality
- `wait.ts` - Time-based delays
- `index.ts` - Centralized tool factory function
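The factory layout listed above can be pictured with a small, dependency-free sketch. This is not the PR's code: the summary says the real tools are built with AI SDK's `tool()` and Zod schemas, whereas `PageLike`, `ToolDefinition`, `createTools`, and `callTool` below are hypothetical stand-ins, with hand-written validators in place of schemas so the snippet runs without any packages.

```typescript
// Hypothetical stand-in for the page object the tools wrap (not Stagehand's real type).
interface PageLike {
  goto(url: string): string;
  scroll(pixels: number): string;
}

// Hypothetical tool shape: a description, an input validator, and an executor.
interface ToolDefinition<Input> {
  description: string;
  parse(raw: unknown): Input; // throws on invalid input, as a schema would
  execute(input: Input): string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Factory that builds every tool around one shared page (the role the summary gives index.ts).
function createTools(page: PageLike) {
  const goto: ToolDefinition<{ url: string }> = {
    description: "Navigate to a URL",
    parse(raw) {
      if (!isRecord(raw) || typeof raw["url"] !== "string") {
        throw new Error("goto expects { url: string }");
      }
      return { url: raw["url"] };
    },
    execute: ({ url }) => page.goto(url),
  };

  const scroll: ToolDefinition<{ pixels: number }> = {
    description: "Scroll the page by a number of pixels",
    parse(raw) {
      if (!isRecord(raw) || typeof raw["pixels"] !== "number") {
        throw new Error("scroll expects { pixels: number }");
      }
      return { pixels: raw["pixels"] };
    },
    execute: ({ pixels }) => page.scroll(pixels),
  };

  return { goto, scroll };
}

// Validate model-supplied arguments before running the tool.
function callTool<Input>(tool: ToolDefinition<Input>, rawArgs: unknown): string {
  return tool.execute(tool.parse(rawArgs));
}

// Usage with a fake page, so nothing touches a real browser.
const fakePage: PageLike = {
  goto: (url) => `navigated to ${url}`,
  scroll: (pixels) => `scrolled ${pixels}px`,
};
const tools = createTools(fakePage);
console.log(callTool(tools.goto, { url: "https://example.com" }));
```

Keeping validation next to each tool means malformed arguments from the model fail fast with a readable error instead of reaching the page.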
Implementation Details:
- All tools use AI SDK's `tool()` function with Zod schema validation
- The new `StagehandAgentHandler` leverages AI SDK's `generateText` with built-in tool calling
- Added message processing utilities in `messageProcessing.ts` for context compression
- Updated LLM client interface with `getLanguageModel()` getter for AI SDK integration
- Fixed minor issues like grammar corrections in evaluation tasks
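The summary does not describe how `messageProcessing.ts` compresses context. One plausible approach, shown here purely as an assumption, is to keep only the newest result from each bulky tool (for example screenshots or accessibility trees) and replace older ones with a short placeholder. The `Message` shape, the `BULKY_TOOLS` set, and `compressMessages` are hypothetical names for illustration only.

```typescript
// Hypothetical message shape; the AI SDK's real message types are richer.
interface Message {
  role: "user" | "assistant" | "tool";
  toolName?: string;
  content: string;
}

// Tools whose results tend to be large (an assumption for this sketch).
const BULKY_TOOLS = new Set(["screenshot", "ariaTree"]);

// Keep the newest result of each bulky tool; replace older ones with a placeholder.
function compressMessages(messages: Message[]): Message[] {
  const seen = new Set<string>();
  const result: Message[] = [];
  // Walk newest-first so the first bulky result met per tool is the one to keep.
  for (const message of [...messages].reverse()) {
    const name = message.role === "tool" ? message.toolName : undefined;
    if (name === undefined || !BULKY_TOOLS.has(name)) {
      result.push(message);
    } else if (seen.has(name)) {
      result.push({ ...message, content: `[older ${name} result omitted]` });
    } else {
      seen.add(name);
      result.push(message);
    }
  }
  // Restore the original chronological order.
  return result.reverse();
}

// Usage: only the latest screenshot keeps its payload.
const history: Message[] = [
  { role: "user", content: "Find the pricing page" },
  { role: "tool", toolName: "screenshot", content: "<jpeg bytes 1>" },
  { role: "assistant", content: "Scrolling down" },
  { role: "tool", toolName: "screenshot", content: "<jpeg bytes 2>" },
];
const compressed = compressMessages(history);
console.log(compressed.map((m) => m.content));
```

The message count and order are preserved, so the model still sees that an earlier step happened even though its payload is gone.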
The refactor maintains backward compatibility through the same public API while completely overhauling the internal agent execution model from custom schema-based responses to standardized AI SDK tool calling patterns.
Confidence score: 2/5
- This PR introduces significant architectural changes that could destabilize the agent system due to the complete replacement of core functionality
- Score reflects the massive scope of changes, removal of entire systems, and potential integration issues with the new AI SDK dependency requirements
- Pay close attention to `lib/handlers/stagehandAgentHandler.ts`, `lib/agent/tools/act.ts`, `lib/agent/tools/fillform.ts`, and `lib/handlers/cuaAgentHandler.ts`
20 files reviewed, 12 comments
Greptile Summary
This review covers only the changes made since the last review (commit 80cb25b), not the entire PR.
The latest changes implement the final pieces of the agent architecture refactor, completing the replacement of the operator agent with a new dual-agent system. The key additions include:
- Agent Tool Interface Standardization: The `createAgentTools` function now accepts an optional `AgentToolOptions` interface with an `executionModel` parameter, providing a unified way to configure tool behavior across the agent system.
- Execution Model Support: Multiple tool files (`act.ts`, `extract.ts`, `fillform.ts`) now support an optional `executionModel` parameter that allows different models to be used for tool execution versus agent reasoning. When provided, this model is passed to `page.observe()` and `page.extract()` operations.
- Type System Enhancement: The `AgentConfig` interface in `types/stagehand.ts` now includes an optional `executionModel` field with clear documentation about its format ("provider/model") and purpose for tool execution optimization.
- Agent Handler Architecture: Two new handler classes have been introduced:
  - `StagehandAgentHandler`: A new AI SDK-based agent handler that serves as the default agent implementation with comprehensive error handling and step tracking
  - `CuaAgentHandler`: A Computer Use Agent handler for advanced visual browser automation with providers like OpenAI and Anthropic
- Main Library Integration: The `lib/index.ts` file has been updated to use class-based agent handlers instead of function-based ones, with the new `StagehandAgentHandler` becoming the default while maintaining `CuaAgentHandler` for advanced use cases.
This refactor enables more flexible model selection where users can specify different models for high-level reasoning versus tool execution, potentially optimizing for cost and performance by using faster models for routine operations while reserving powerful models for complex tasks.
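To make the split between reasoning and execution models concrete, here is a minimal sketch of how an optional `executionModel` in "provider/model" format might be resolved. Only the `executionModel` field and its format come from the review summary. The `model` field, `parseModelId`, `resolveToolModel`, the placeholder model names, and the fallback to the reasoning model when `executionModel` is absent are all assumptions made for illustration.

```typescript
// Simplified, hypothetical view of the config; only executionModel is taken from the summary.
interface AgentConfig {
  model: string; // model used for high-level reasoning (assumed field name)
  executionModel?: string; // optional "provider/model" used for tool execution
}

interface ModelId {
  provider: string;
  model: string;
}

// Split "provider/model" on the first slash so model names containing "/" survive intact.
function parseModelId(id: string): ModelId {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

// Assumed behavior: tools use executionModel when provided, else the reasoning model.
function resolveToolModel(config: AgentConfig): ModelId {
  return parseModelId(config.executionModel ?? config.model);
}

// Usage: a larger model plans, a smaller one runs the tools.
const dualModel: AgentConfig = {
  model: "example-provider/large-model",
  executionModel: "example-provider/small-model",
};
const singleModel: AgentConfig = { model: "example-provider/large-model" };

console.log(resolveToolModel(dualModel));
console.log(resolveToolModel(singleModel));
```

Resolving the model once in a single helper keeps each tool free of its own fallback logic.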
Confidence score: 3/5
- This PR introduces significant architectural changes that require careful testing to ensure compatibility
- Score reflects the complexity of the agent system refactor and potential for integration issues
- Pay close attention to the dynamic schema evaluation in extract.ts and error handling patterns across tool files
Context used:
Rule - Use camelCase naming convention for TypeScript code and snake_case naming convention for Python code in documentation examples.
Context - We enforce linting and prettier at the CI level, so no code style comments that aren't obvious.
10 files reviewed, no comments
… into agent-revamp
Why
Replace operator agent with new agent handler
The operator agent was an older implementation that did not use tool calling and used a single model for both high-level reasoning and low-level action execution.
What Changed
- Removed operator agent (`StagehandOperatorHandler`)
- Added new agent handler (`StagehandAgentHandler`)
- `executionModel` option for dual-model architecture
- ExecutionModel feature: `act()` and `extract()`
Test Plan