
Conversation

@aaronsb commented Jul 21, 2025

Summary

This PR implements pagination support for the Glean MCP server, addressing large responses that consume excessive context-window tokens and enabling efficient handling of large result sets.

Changes Made

Phase 1: Search Pagination

  • Added cursor-based pagination to company_search and people_profile_search tools
  • New parameters: pageSize (1-100, default 10) and cursor (for fetching the next page)
  • Response enhancements: pagination metadata with hasMoreResults and the next cursor (see the example response under Usage Examples below)
  • Backward compatible: All existing functionality preserved

Phase 2: Chat Response Chunking

  • Automatic chunking for large chat responses exceeding token limits (~15k tokens)
  • Intelligent splitting at natural boundaries (paragraphs → sentences → characters; sketched after this list)
  • Continuation support: Use the continueFrom parameter with responseId and chunkIndex
  • Conservative limits: max 15k tokens per chunk, estimated at 3 chars/token
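
For context, the split strategy is roughly the following (a minimal TypeScript sketch, not the exact ChatResponseBuffer internals; the constants mirror the limits above):

const MAX_TOKENS = 15_000;
const CHARS_PER_TOKEN = 3; // conservative estimate
const MAX_CHUNK_CHARS = MAX_TOKENS * CHARS_PER_TOKEN; // ~45k characters

function splitIntoChunks(text: string): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > MAX_CHUNK_CHARS) {
    const window = rest.slice(0, MAX_CHUNK_CHARS);
    // Prefer paragraph breaks, then sentence ends, then force-split.
    let cut = window.lastIndexOf('\n\n');
    if (cut <= 0) cut = window.lastIndexOf('. ') + 1;
    if (cut <= 0) cut = MAX_CHUNK_CHARS;
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}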

Environment Variable Improvements

  • New GLEAN_SERVER_INSTANCE environment variable for clearer configuration
  • Copy the exact URL from your Glean admin panel (e.g., https://company-be.glean.com/)
  • Dotenv support added for local development via import 'dotenv/config'
  • Backward compatibility: Existing GLEAN_INSTANCE and GLEAN_BASE_URL still supported (fallback sketched below)
  • Added .env.example with comprehensive configuration options
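
For reference, the fallback behavior is roughly this (a minimal sketch with assumed precedence; the actual config-parsing code is authoritative):

import 'dotenv/config'; // loads .env for local development

function resolveServerUrl(env = process.env): string {
  if (env.GLEAN_SERVER_INSTANCE) return env.GLEAN_SERVER_INSTANCE; // full URL from the admin panel
  if (env.GLEAN_BASE_URL) return env.GLEAN_BASE_URL; // legacy full URL
  if (env.GLEAN_INSTANCE) return `https://${env.GLEAN_INSTANCE}-be.glean.com/`; // legacy instance name
  throw new Error('Set GLEAN_SERVER_INSTANCE (or legacy GLEAN_INSTANCE / GLEAN_BASE_URL)');
}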

Testing & Documentation

  • Comprehensive tests: New pagination test suite covering all scenarios
  • Updated documentation: Clear examples for both search pagination and chat chunking
  • Real-world testing: Verified with cprime instance data
  • Snapshot updates: Test schemas updated to reflect new parameters

Breaking Changes

None. All changes are backward compatible.

Usage Examples

Search with pagination:

{
  "query": "Docker projects",
  "pageSize": 5,
  "cursor": "eyJ...pagination_cursor..."
}
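
The response then carries the pagination metadata described above. The envelope shown here is illustrative, but hasMoreResults and the next cursor are the fields to look for:

{
  "results": [ "..." ],
  "pagination": {
    "hasMoreResults": true,
    "cursor": "eyJ...next_cursor..."
  }
}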

Chat continuation:

{
  "message": "",
  "continueFrom": {
    "responseId": "uuid-123",
    "chunkIndex": 1
  }
}
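
Large chat responses arrive in chunks, and the first chunk carries the identifiers needed for the continuation call above. Field names beyond responseId and chunkIndex are illustrative:

{
  "response": "...first chunk of the answer...",
  "chunkMetadata": {
    "responseId": "uuid-123",
    "chunkIndex": 0,
    "totalChunks": 3
  }
}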

Environment configuration:

# New approach (recommended)
GLEAN_SERVER_INSTANCE=https://company-be.glean.com/
GLEAN_API_TOKEN=your_token

# Legacy approach (still supported)
GLEAN_INSTANCE=company
GLEAN_API_TOKEN=your_token

Testing

  • ✅ All existing tests pass
  • ✅ New pagination tests added and passing
  • ✅ Manual testing with real Glean instance
  • ✅ Verified chunking prevents token limit errors
  • ✅ Confirmed backward compatibility

aaronsb added 3 commits July 21, 2025 08:44
- Add cursor-based pagination to company_search and people_profile_search tools
- Implement automatic response chunking for chat tool to handle large responses
- Add ChatResponseBuffer class for intelligent text splitting at natural boundaries
- Update all tool schemas to use proper Glean API pagination (cursor, not pageToken)
- Add comprehensive pagination tests covering all scenarios
- Update documentation with pagination examples and best practices
- Fix search formatter test to match new response format

This addresses the issue of large responses consuming excessive context window
and enables efficient handling of large result sets from Glean APIs.

- Reduce MAX_TOKENS from 20000 to 15000 for more buffer
- Change CHARS_PER_TOKEN from 4 to 3 for more conservative estimation
- Prevents token limit errors on very large chat responses

- Add comprehensive .env.example with all configuration options
- Update README to reference the example file
- Includes both new GLEAN_SERVER_INSTANCE and legacy options
@aaronsb requested a review from a team as a code owner July 21, 2025 14:24

@rwjblue-glean (Member) left a comment

Thanks for working on this! I'm still digesting the overall set of changes, but I've left some initial thoughts/comments inline.

/**
 * Manages chunking of large chat responses to avoid token limit errors.
 */
export class ChatResponseBuffer {

@rwjblue-glean (Member):

3/5 (strong opinion: non-blocking)

Could you create some dedicated semi-unit tests for this class? Having a basic setup to iterate on the logic for how chat results are chunked would make it easier to fix bugs in this area in the future.

 * @param response The response object with potential chunk metadata
 * @returns Formatted response with chunk information if applicable
 */
export function formatChunkedResponse(response: any): string {

@rwjblue-glean (Member):

2/5 (minor preference, non-blocking)

Can we use the actual type for response here (instead of any)?

Comment on .env.example, lines 4 to 5 (outdated):
# Your Glean server instance URL (copy from your Glean admin panel)
GLEAN_SERVER_INSTANCE=https://your-company-be.glean.com/

@rwjblue-glean (Member):

3/5 (strong opinion: non-blocking)

Let's call this GLEAN_SERVER_URL instead of GLEAN_SERVER_INSTANCE. The main reason here is that the underlying API clients already allow passing a server_url param upon new Glean(...) constructor, and using the same verbiage throughout the stack seems better.

@aaronsb (Author) commented Jul 21, 2025

Thanks for the detailed feedback, @rwjblue-glean! I appreciate you taking the time to review this.

I'll address each of your points:

  1. ChatResponseBuffer tests (3/5): Agreed! I'll create dedicated unit tests for the chunking logic to make it easier to maintain and debug in the future.

  2. Proper typing (2/5): Good catch - I'll replace the any type with the proper ChatResponse type from the API client.

  3. GLEAN_SERVER_URL naming (3/5): Excellent point about consistency with the API client's server_url parameter. I'll rename GLEAN_SERVER_INSTANCE to GLEAN_SERVER_URL throughout.

I'll push these updates shortly and let you know when they're ready for another look.

@aaronsb (Author) commented Jul 21, 2025

Hey @rwjblue-glean, thanks for the thorough review! 🙏

Aaron here - working with Claude Code to address your feedback. Here's what we've implemented:

1. Dedicated Unit Tests for ChatResponseBuffer

Created comprehensive test coverage in chat-response-buffer.test.ts with 15 tests covering:

  • Token estimation and chunking thresholds
  • Splitting strategies (paragraph boundaries, sentence boundaries, force splits)
  • Chunk storage/retrieval with proper cleanup
  • Edge cases (empty strings, whitespace, boundary conditions)

Example test validating the chunking logic:

it('should prefer splitting at paragraph boundaries', async () => {
  const paragraph = 'This is a test paragraph with some content.\n\n';
  const largeText = paragraph.repeat(2000); // Creates text large enough to chunk
  
  const result = await buffer.processResponse(largeText);
  
  expect(result.metadata).toBeDefined();
  expect(result.content.length).toBeGreaterThan(0);
  expect(result.content.length).toBeLessThan(largeText.length);
});

2. Replaced any Types with Proper TypeScript Types

This was more involved than expected! We:

  • Imported proper types from @gleanwork/api-client (ChatResponse, ChatMessage, Author enum, etc.)
  • Created type-safe interfaces for chunked responses
  • Added type guards for proper type discrimination (sketched below)
  • Updated all test files to use enums instead of string literals

The typing now properly reflects the API structure:

interface ChunkedChatResponse extends ChatResponse {
  _formatted?: string;
  _chunkMetadata?: ChatChunkMetadata;
}

type FormattableResponse = ChunkedChatResponse | ChatChunk;
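
A type guard along these lines then discriminates the union (a sketch; the actual guard in the PR may check different fields):

function isChunkedChatResponse(
  value: FormattableResponse,
): value is ChunkedChatResponse {
  // Assumption for illustration: a full chat response carries messages,
  // while a bare ChatChunk does not.
  return 'messages' in value;
}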

3. Renamed GLEAN_SERVER_INSTANCE to GLEAN_SERVER_URL

Updated across:

  • Config parsing logic
  • Documentation (README.md)
  • Environment variable examples (.env.example)
  • All references for consistency

This better aligns with the API naming conventions and makes the purpose clearer.


All tests are passing, and the implementation maintains backward compatibility while addressing the review points. The ChatResponseBuffer now has proper test coverage that should catch any regressions in the chunking logic.

Let me know if you'd like any clarification or have additional feedback!

aaronsb and others added 3 commits July 21, 2025 12:20
- Add comprehensive unit tests for ChatResponseBuffer class with 15 test cases
- Replace all 'any' types with proper TypeScript types from @gleanwork/api-client
- Rename GLEAN_SERVER_INSTANCE to GLEAN_SERVER_URL for API consistency
- Update all documentation and examples to use new env var name
- Add type guards for safe type discrimination in formatChunkedResponse
- Fix test files to use Author/MessageType enums instead of string literals

Co-Authored-By: Claude <[email protected]>

The test that creates 100k characters with no natural boundaries
was timing out in CI (5-second default). Increased the timeout to 10 seconds
to handle the heavy processing load.

Co-Authored-By: Claude <[email protected]>

Reduced the test from 100k to 50k characters to exercise the force-split logic
without causing timeouts in slower CI environments. The test still
validates the force split behavior with 2 chunks.

Co-Authored-By: Claude <[email protected]>

@aaronsb (Author) commented Jul 21, 2025

All checks should be passing!

Quick note on the test fix: The force-split test was creating 100k characters which was causing timeouts on GitHub's CI runners (which are, let's be honest, running on potato computers 🥔).

I dialed it back to 50k characters - still enough to properly test the force split logic (since it exceeds the 45k chunk boundary) but much more reasonable for the limited compute resources in CI. The test went from timing out at 10+ seconds to completing in under a second.

Sometimes you gotta optimize for the potatoes! 😄
