
Conversation

@aaronsb commented Jul 21, 2025

Summary

This PR implements pagination support for the Glean MCP server, addressing large responses that consume excessive context-window tokens and enabling efficient handling of large result sets.

Changes Made

Phase 1: Search Pagination

  • Added cursor-based pagination to company_search and people_profile_search tools
  • New parameters: pageSize (1-100, default 10) and cursor (for fetching the next page)
  • Response enhancements: pagination metadata with hasMoreResults and the next cursor (see the example response under Usage Examples below)
  • Backward compatible: All existing functionality preserved

Phase 2: Chat Response Chunking

  • Automatic chunking for large chat responses exceeding token limits (~15k tokens)
  • Intelligent splitting at natural boundaries (paragraphs → sentences → characters; sketched after this list)
  • Continuation support: Use the continueFrom parameter with responseId and chunkIndex
  • Conservative limits: max 15k tokens per chunk, estimated at 3 chars/token
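
For context, the split strategy is roughly the following (a minimal TypeScript sketch, not the exact ChatResponseBuffer internals; the constants mirror the limits above):

const MAX_TOKENS = 15_000;
const CHARS_PER_TOKEN = 3; // conservative estimate
const MAX_CHUNK_CHARS = MAX_TOKENS * CHARS_PER_TOKEN; // ~45k characters

function splitIntoChunks(text: string): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > MAX_CHUNK_CHARS) {
    const window = rest.slice(0, MAX_CHUNK_CHARS);
    // Prefer paragraph breaks, then sentence ends, then force-split.
    let cut = window.lastIndexOf('\n\n');
    if (cut <= 0) cut = window.lastIndexOf('. ') + 1;
    if (cut <= 0) cut = MAX_CHUNK_CHARS;
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}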

Environment Variable Improvements

  • New GLEAN_SERVER_INSTANCE environment variable for clearer configuration
  • Copy the exact URL from your Glean admin panel (e.g., https://company-be.glean.com/)
  • Dotenv support added for local development via import 'dotenv/config'
  • Backward compatibility: Existing GLEAN_INSTANCE and GLEAN_BASE_URL still supported (fallback sketched below)
  • Added .env.example with comprehensive configuration options
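
For reference, the fallback behavior is roughly this (a minimal sketch with assumed precedence; the actual config-parsing code is authoritative):

import 'dotenv/config'; // loads .env for local development

function resolveServerUrl(env = process.env): string {
  if (env.GLEAN_SERVER_INSTANCE) return env.GLEAN_SERVER_INSTANCE; // full URL from the admin panel
  if (env.GLEAN_BASE_URL) return env.GLEAN_BASE_URL; // legacy full URL
  if (env.GLEAN_INSTANCE) return `https://${env.GLEAN_INSTANCE}-be.glean.com/`; // legacy instance name
  throw new Error('Set GLEAN_SERVER_INSTANCE (or legacy GLEAN_INSTANCE / GLEAN_BASE_URL)');
}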

Testing & Documentation

  • Comprehensive tests: New pagination test suite covering all scenarios
  • Updated documentation: Clear examples for both search pagination and chat chunking
  • Real-world testing: Verified with cprime instance data
  • Snapshot updates: Test schemas updated to reflect new parameters

Breaking Changes

None. All changes are backward compatible.

Usage Examples

Search with pagination:

{
  "query": "Docker projects",
  "pageSize": 5,
  "cursor": "eyJ...pagination_cursor..."
}
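
The response then carries the pagination metadata described above. The envelope shown here is illustrative, but hasMoreResults and the next cursor are the fields to look for:

{
  "results": [ "..." ],
  "pagination": {
    "hasMoreResults": true,
    "cursor": "eyJ...next_cursor..."
  }
}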

Chat continuation:

{
  "message": "",
  "continueFrom": {
    "responseId": "uuid-123",
    "chunkIndex": 1
  }
}
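
Large chat responses arrive in chunks, and the first chunk carries the identifiers needed for the continuation call above. Field names beyond responseId and chunkIndex are illustrative:

{
  "response": "...first chunk of the answer...",
  "chunkMetadata": {
    "responseId": "uuid-123",
    "chunkIndex": 0,
    "totalChunks": 3
  }
}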

Environment configuration:

# New approach (recommended)
GLEAN_SERVER_INSTANCE=https://company-be.glean.com/
GLEAN_API_TOKEN=your_token

# Legacy approach (still supported)
GLEAN_INSTANCE=company
GLEAN_API_TOKEN=your_token

Testing

  • ✅ All existing tests pass
  • ✅ New pagination tests added and passing
  • ✅ Manual testing with real Glean instance
  • ✅ Verified chunking prevents token limit errors
  • ✅ Confirmed backward compatibility

aaronsb added 3 commits July 21, 2025 08:44
- Add cursor-based pagination to company_search and people_profile_search tools
- Implement automatic response chunking for chat tool to handle large responses
- Add ChatResponseBuffer class for intelligent text splitting at natural boundaries
- Update all tool schemas to use proper Glean API pagination (cursor, not pageToken)
- Add comprehensive pagination tests covering all scenarios
- Update documentation with pagination examples and best practices
- Fix search formatter test to match new response format

This addresses the issue of large responses consuming excessive context window
and enables efficient handling of large result sets from Glean APIs.

- Reduce MAX_TOKENS from 20000 to 15000 for more buffer
- Change CHARS_PER_TOKEN from 4 to 3 for more conservative estimation
- Prevents token limit errors on very large chat responses

- Add comprehensive .env.example with all configuration options
- Update README to reference the example file
- Includes both new GLEAN_SERVER_INSTANCE and legacy options
@aaronsb requested a review from a team as a code owner July 21, 2025 14:24

@rwjblue-glean (Member) left a comment

Thanks for working on this! I'm still digesting the overall set of changes, but I've left some initial thoughts/comments inline.

/**
 * Manages chunking of large chat responses to avoid token limit errors.
 */
export class ChatResponseBuffer {

@rwjblue-glean (Member):

3/5 (strong opinion: non-blocking)

Could you create some dedicated semi-unit tests for this class? Having a basic setup to iterate on the logic for how chat results are chunked would make it easier to fix bugs in this area in the future.

 * @param response The response object with potential chunk metadata
 * @returns Formatted response with chunk information if applicable
 */
export function formatChunkedResponse(response: any): string {

@rwjblue-glean (Member):

2/5 (minor preference, non-blocking)

Can we use the actual type for response here (instead of any)?

Comment on .env.example, lines 4 to 5 (outdated):
# Your Glean server instance URL (copy from your Glean admin panel)
GLEAN_SERVER_INSTANCE=https://your-company-be.glean.com/

@rwjblue-glean (Member):

3/5 (strong opinion: non-blocking)

Let's call this GLEAN_SERVER_URL instead of GLEAN_SERVER_INSTANCE. The main reason here is that the underlying API clients already allow passing a server_url param upon new Glean(...) constructor, and using the same verbiage throughout the stack seems better.

@aaronsb (Author) commented Jul 21, 2025

Thanks for the detailed feedback, @rwjblue-glean! I appreciate you taking the time to review this.

I'll address each of your points:

  1. ChatResponseBuffer tests (3/5): Agreed! I'll create dedicated unit tests for the chunking logic to make it easier to maintain and debug in the future.

  2. Proper typing (2/5): Good catch - I'll replace the any type with the proper ChatResponse type from the API client.

  3. GLEAN_SERVER_URL naming (3/5): Excellent point about consistency with the API client's server_url parameter. I'll rename GLEAN_SERVER_INSTANCE to GLEAN_SERVER_URL throughout.

I'll push these updates shortly and let you know when they're ready for another look.

@aaronsb (Author) commented Jul 21, 2025

Hey @rwjblue-glean, thanks for the thorough review! 🙏

Aaron here - working with Claude Code to address your feedback. Here's what we've implemented:

1. Dedicated Unit Tests for ChatResponseBuffer

Created comprehensive test coverage in chat-response-buffer.test.ts with 15 tests covering:

  • Token estimation and chunking thresholds
  • Splitting strategies (paragraph boundaries, sentence boundaries, force splits)
  • Chunk storage/retrieval with proper cleanup
  • Edge cases (empty strings, whitespace, boundary conditions)

Example test validating the chunking logic:

it('should prefer splitting at paragraph boundaries', async () => {
  const paragraph = 'This is a test paragraph with some content.\n\n';
  const largeText = paragraph.repeat(2000); // Creates text large enough to chunk
  
  const result = await buffer.processResponse(largeText);
  
  expect(result.metadata).toBeDefined();
  expect(result.content.length).toBeGreaterThan(0);
  expect(result.content.length).toBeLessThan(largeText.length);
});

2. Replaced any Types with Proper TypeScript Types

This was more involved than expected! We:

  • Imported proper types from @gleanwork/api-client (ChatResponse, ChatMessage, Author enum, etc.)
  • Created type-safe interfaces for chunked responses
  • Added type guards for proper type discrimination (sketched below)
  • Updated all test files to use enums instead of string literals

The typing now properly reflects the API structure:

interface ChunkedChatResponse extends ChatResponse {
  _formatted?: string;
  _chunkMetadata?: ChatChunkMetadata;
}

type FormattableResponse = ChunkedChatResponse | ChatChunk;
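
A type guard along these lines then discriminates the union (a sketch; the actual guard in the PR may check different fields):

function isChunkedChatResponse(
  value: FormattableResponse,
): value is ChunkedChatResponse {
  // Assumption for illustration: a full chat response carries messages,
  // while a bare ChatChunk does not.
  return 'messages' in value;
}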

3. Renamed GLEAN_SERVER_INSTANCE to GLEAN_SERVER_URL

Updated across:

  • Config parsing logic
  • Documentation (README.md)
  • Environment variable examples (.env.example)
  • All references for consistency

This better aligns with the API naming conventions and makes the purpose clearer.


All tests are passing, and the implementation maintains backward compatibility while addressing the review points. The ChatResponseBuffer now has proper test coverage that should catch any regressions in the chunking logic.

Let me know if you'd like any clarification or have additional feedback!

aaronsb and others added 3 commits July 21, 2025 12:20
- Add comprehensive unit tests for ChatResponseBuffer class with 15 test cases
- Replace all 'any' types with proper TypeScript types from @gleanwork/api-client
- Rename GLEAN_SERVER_INSTANCE to GLEAN_SERVER_URL for API consistency
- Update all documentation and examples to use new env var name
- Add type guards for safe type discrimination in formatChunkedResponse
- Fix test files to use Author/MessageType enums instead of string literals

Co-Authored-By: Claude <[email protected]>

The test that creates 100k characters with no natural boundaries
was timing out in CI (5-second default). Increased the timeout to 10 seconds
to handle the heavy processing load.

Co-Authored-By: Claude <[email protected]>

Reduced the test from 100k to 50k characters to exercise the force-split logic
without causing timeouts in slower CI environments. The test still
validates the force split behavior with 2 chunks.

Co-Authored-By: Claude <[email protected]>

@aaronsb (Author) commented Jul 21, 2025

All checks should be passing!

Quick note on the test fix: The force-split test was creating 100k characters which was causing timeouts on GitHub's CI runners (which are, let's be honest, running on potato computers 🥔).

I dialed it back to 50k characters - still enough to properly test the force split logic (since it exceeds the 45k chunk boundary) but much more reasonable for the limited compute resources in CI. The test went from timing out at 10+ seconds to completing in under a second.

Sometimes you gotta optimize for the potatoes! 😄
