Fix and Improve Online Search and Web page Read #1147

Merged 4 commits on Mar 31, 2025

@debanjum debanjum commented Mar 31, 2025

New

  • Support Firecrawl as an online search provider

Improve

  • Fallback to other enabled online search providers on failure
  • Speed up online search with Jina by excluding webpage content in search results
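
The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: `search_with_fallback`, the provider names, and the stub providers are all hypothetical, and the sketch assumes that a provider signals failure by raising an exception.

```python
from typing import Callable, Dict, List

# Hypothetical provider signature: takes a query, returns a list of result dicts.
SearchProvider = Callable[[str], List[Dict[str, str]]]


def search_with_fallback(query: str, providers: Dict[str, SearchProvider]) -> List[Dict[str, str]]:
    """Try each enabled provider in priority order and return the first successful result."""
    errors: Dict[str, Exception] = {}
    # Dicts preserve insertion order, so the order of `providers` is the priority order.
    for name, provider in providers.items():
        try:
            return provider(query)
        except Exception as e:
            # Record the failure and move on to the next enabled provider.
            errors[name] = e
    raise RuntimeError(f"All online search providers failed: {errors}")


# Stub providers for illustration only; real providers would call a search API.
def failing_provider(query: str) -> List[Dict[str, str]]:
    raise ConnectionError("provider unavailable")


def working_provider(query: str) -> List[Dict[str, str]]:
    return [{"title": "Example", "link": "https://example.com", "query": query}]


results = search_with_fallback(
    "khoj online search",
    {"serper": failing_provider, "firecrawl": working_provider},
)
```

Here the first provider fails, so the results come from the second one. If every provider fails, the sketch raises a single error listing each failure.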

Fix

  • Fix Jina webpage reader. Improve it to include generated alt text for each image on the webpage
  • Truncate online query to Serper if query exceeds max supported length
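
The query truncation fix could look something like the sketch below. The helper name `truncate_query` and the `MAX_QUERY_LENGTH` value are assumptions made for illustration; the actual limit enforced by Serper is not stated in this pull request and should be taken from the provider's documentation.

```python
# Assumed limit, for illustration only. Not taken from this pull request or Serper's docs.
MAX_QUERY_LENGTH = 2048


def truncate_query(query: str, max_length: int = MAX_QUERY_LENGTH) -> str:
    """Return the query unchanged if it fits, otherwise cut it to max_length characters."""
    if len(query) <= max_length:
        return query
    return query[:max_length]


short_query = truncate_query("latest khoj release notes")
long_query = truncate_query("a" * 5000)
```

Truncating means an overlong query still returns some results rather than failing outright, at the cost of dropping the tail of the query.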

@debanjum debanjum added the labels fix (Fix something that isn't working as expected), upgrade (New feature or request), and coverage (Add content type to search and index) Mar 31, 2025
- Improve webpage read to include image alt text
- Improve Jina webpage search to not include each page content
- Use POST instead of GET for web search, webpage read with Jina
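
As a rough illustration of a POST-based webpage read, the sketch below builds (but does not send) a request. The endpoint URL, the JSON body shape, and the `X-With-Generated-Alt` header are assumptions based on Jina's public Reader documentation, not on the code in this pull request, and `build_jina_read_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Optional

# Assumed Jina Reader endpoint; verify against Jina's documentation.
JINA_READER_URL = "https://r.jina.ai/"


def build_jina_read_request(webpage_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request asking the reader to fetch a webpage with image alt text."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        # Assumed header name for requesting generated alt text for images.
        "X-With-Generated-Alt": "true",
    }
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    # With POST, the target URL travels in the JSON body instead of the request path.
    body = json.dumps({"url": webpage_url}).encode("utf-8")
    return urllib.request.Request(JINA_READER_URL, data=body, headers=headers, method="POST")


request = build_jina_read_request("https://example.com")
```

Passing the result to `urllib.request.urlopen` would send the request. That step is left out here so the sketch runs without network access.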
Previously, a query to Serper that was longer than the maximum supported
length would throw an error instead of returning at least some results.

Truncating the online search query to Serper to the maximum supported
length mitigates that issue.

Make serper.dev higher priority than the official Google SERP API because
it provides more detailed results, such as knowledge cards.
@debanjum debanjum force-pushed the fix-improve-online-search-webpage-read branch from b3b1018 to d62dd4e March 31, 2025 11:40
@debanjum debanjum merged commit 1775606 into master Mar 31, 2025
10 checks passed
@debanjum debanjum deleted the fix-improve-online-search-webpage-read branch March 31, 2025 18:43