Feat/algolia migration #20302
✅ Deploy Preview for cockroachdb-api-docs canceled.
✅ Deploy Preview for cockroachdb-interactivetutorials-docs canceled.
| **`check_ranking_parity.py`** | Production parity verification | ❌ Optional validation |
| **`compare_to_prod_explain.py`** | Index comparison analysis | ❌ Optional analysis |
| **`test_all_files.py`** | File processing validation | ❌ Dev only |
| **`algolia_index_prod_match.py`** | Legacy production matcher | ❌ Reference only |
What does this mean going forward? Will this become obsolete when the new indexing version is released?
**Indexing Rules:**
- ✅ Always include: `/releases/`, `/cockroachcloud/`, `/advisories/`, `/molt/`
- ✅ Include stable version files: Files containing `v25.3`
- ❌ Exclude old versions: `v24.x`, `v23.x`, etc.
Would corrections to old versions of code be indexed if changes are detected?
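For reference while discussing this hunk, the three rules can be sketched as a single path filter. This is only an illustration, not the PR's implementation: the function name `should_index`, the regular expression, and the choice to skip any path that matches none of the rules are assumptions.

```python
import re

# Path segments the hunk lists as "always include".
ALWAYS_INCLUDE = ("/releases/", "/cockroachcloud/", "/advisories/", "/molt/")

# Matches version tokens such as v25.3 or v24.1 anywhere in a path.
VERSION_RE = re.compile(r"v\d+\.\d+")


def should_index(path: str, stable_version: str = "v25.3") -> bool:
    """Return True if a docs path passes the indexing rules (hypothetical helper)."""
    # Rule 1: always-included sections win, even if the path names an old version.
    if any(segment in path for segment in ALWAYS_INCLUDE):
        return True
    # Rule 2: keep files whose path contains the stable version.
    if stable_version in VERSION_RE.findall(path):
        return True
    # Rule 3 (and an assumption for unversioned paths): skip everything else.
    return False
```

Under these assumptions, `should_index("/docs/v24.1/insert.html")` returns `False` regardless of whether the file changed, while `should_index("/docs/releases/v23.1.html")` returns `True`.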
## 🧠 Intelligent Bloat Removal

### What Gets Removed
- **85K+ Duplicate Records**: Content deduplication using MD5 hashing
Does this process run every time? Will we need to run this after the first re-indexing?
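As a rough picture of what "content deduplication using MD5 hashing" could look like, here is a minimal sketch. The record shape (a dict with a `content` key), the whitespace and case normalization, and the name `dedupe_records` are assumptions made for illustration; the PR's actual logic may differ.

```python
import hashlib


def dedupe_records(records):
    """Drop records whose normalized content hashes to an already-seen MD5 digest."""
    seen = set()
    unique = []
    for record in records:
        # Collapse whitespace and lowercase so trivially different copies collide.
        normalized = " ".join(record.get("content", "").split()).lower()
        digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate content: keep only the first occurrence
        seen.add(digest)
        unique.append(record)
    return unique
```

A filter like this is stateless, so it would have to run on every indexing pass to keep duplicates out; whether the PR does that is the open question above.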
2. **Force Override**: `ALGOLIA_FORCE_FULL=true`
3. **Corrupted State**: Invalid state file
4. **Stale State**: State file >7 days old
5. **Content Changes**: Git commits affecting source files
What does this mean? Affecting which source files?
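To make the list easier to discuss, here is one way the decision could be sketched. It assumes the numbered items are conditions that force a full reindex, that the state file is JSON with a `last_indexed` ISO timestamp, and that commit detection is computed elsewhere and passed in as a flag. None of these details are confirmed by the hunk, and `choose_mode` is a hypothetical name.

```python
import json
import os
from datetime import datetime, timedelta, timezone

MAX_STATE_AGE = timedelta(days=7)  # "State file >7 days old" from the list


def choose_mode(state_path, env=None, now=None, commits_detected=False):
    """Return (mode, reason), where mode is "full" or "incremental"."""
    env = os.environ if env is None else env
    now = now or datetime.now(timezone.utc)

    # Force override via ALGOLIA_FORCE_FULL=true.
    if env.get("ALGOLIA_FORCE_FULL", "").lower() == "true":
        return "full", "force override"
    # No state file at all (first run, or the file was deleted).
    if not os.path.exists(state_path):
        return "full", "no state file"
    # Corrupted state: unreadable JSON or a missing/invalid timestamp.
    try:
        with open(state_path, encoding="utf-8") as fh:
            state = json.load(fh)
        last_run = datetime.fromisoformat(state["last_indexed"])
    except (ValueError, KeyError, TypeError):
        return "full", "corrupted state"
    if last_run.tzinfo is None:
        last_run = last_run.replace(tzinfo=timezone.utc)
    # Stale state: older than seven days.
    if now - last_run > MAX_STATE_AGE:
        return "full", "stale state"
    # Content changes: assumed here to force a full run; supplied by the caller.
    if commits_detected:
        return "full", "content changes"
    return "incremental", "state is fresh"
```

Which source paths count toward `commits_detected` is exactly what the comment above asks, so the sketch leaves that decision to the caller.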
## 📊 Performance Metrics

### Size Optimization
What do these numbers represent?
- **Cause**: First run or state file was deleted
- **Solution**: Normal - will do full indexing automatically

**❌ "Git commits detected"**
What changes result in incremental indexing?
- ✅ Comprehensive test coverage (100% pass rate)
- ✅ Performance optimization and bloat removal

### Phase 2: Staging Deployment (Next)
Is this complete?
### 5. **Zero-Downtime Deployment**
Incremental indexing allows continuous updates without search interruption.

## 📞 Support
List contacts or a channel for support.
@@ -6,7 +6,7 @@ algolia:
     - search.html
     - src/current/v23.1/**
     - v23.1/**
-  index_name: cockroachcloud_docs
+  index_name: stage_cockroach_docs
   search_api_key: 372a10456f4ed7042c531ff3a658771b
We might consider making this an env var rather than including directly in the config in plain text.
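One possible shape for this suggestion, sketched on the Python side: prefer an environment variable and fall back to the config value only when the variable is unset. The variable name `ALGOLIA_SEARCH_API_KEY` and the helper `resolve_search_api_key` are hypothetical; the PR does not define them.

```python
import os


def resolve_search_api_key(config, env=None, var_name="ALGOLIA_SEARCH_API_KEY"):
    """Return the search key from the environment, else from the parsed config dict."""
    env = os.environ if env is None else env
    # The environment variable takes precedence over any value in the config file.
    key = env.get(var_name) or config.get("algolia", {}).get("search_api_key")
    if not key:
        raise RuntimeError(f"No search API key: set {var_name} or algolia.search_api_key")
    return key
```

With the fallback in place, the plain-text value could be removed from the config in a later change without breaking existing builds.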
]

# Content that should ALWAYS be preserved (even if short)
self.preserve_patterns = [
Are you sure we've captured all SQL commands and keywords?
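The hunk does not show the patterns themselves, so the following is only a sketch of how a preserve list can interact with a minimum-length filter. The keyword list, the 20-character threshold, and the name `keep_record` are all assumptions, and the keyword list is deliberately incomplete.

```python
import re

# Hypothetical preserve list: a handful of uppercase SQL keywords.
PRESERVE_PATTERNS = [
    re.compile(r"\b(SELECT|INSERT|UPDATE|DELETE|CREATE|ALTER|DROP)\b"),
]

MIN_LENGTH = 20  # assumed threshold below which content counts as bloat


def keep_record(content: str) -> bool:
    """Keep content if it matches a preserve pattern or is long enough."""
    text = content.strip()
    # Preserved content survives even when it is short.
    if any(pattern.search(text) for pattern in PRESERVE_PATTERNS):
        return True
    return len(text) >= MIN_LENGTH
```

In a design like this, any keyword missing from the list means short snippets containing it are dropped, which is why completeness of the list matters.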
Algolia Search Migration: Jekyll to Python
Replaces the Jekyll Algolia gem with a custom Python indexing system that provides intelligent content extraction, incremental updates, and production-ready CI/CD integration.
Key Benefits
(15-20 min)
removal
support
decision logic
index
Performance Improvements
Intelligent Features
Smart Decision Logic
Automatically chooses full vs incremental indexing based on:
Intelligent Bloat Removal
download repetition
release notes
Dynamic Version Detection
Automatically reads from _config_cockroachdb.yml
versions:
  stable: v25.3  # Detected and used for filtering
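A minimal sketch of how the stable version could be read from a config shaped like the snippet above, using only the standard library. The line-based parsing and the name `detect_stable_version` are assumptions; the PR may well use a YAML parser instead.

```python
import re


def detect_stable_version(config_text: str) -> str:
    """Return the value of versions.stable from YAML-like config text."""
    in_versions = False
    for line in config_text.splitlines():
        # Drop trailing comments and whitespace; skip blank or comment-only lines.
        stripped = line.split("#", 1)[0].rstrip()
        if not stripped:
            continue
        # A non-indented line starts a new top-level key.
        if not line[0].isspace():
            in_versions = stripped == "versions:"
            continue
        if in_versions:
            match = re.match(r"\s+stable:\s*(\S+)", stripped)
            if match:
                return match.group(1).strip("'\"")
    raise ValueError("versions.stable not found in config")
```

For example, calling it on the contents of `_config_cockroachdb.yml` would return the string used for version filtering, assuming the file keeps the `versions:` / `stable:` layout shown above.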
Files Changed
New Production Files
TeamCity
with bloat removal
Modified Files
detection
Removed Legacy Files
TeamCity Integration
Simple Deployment
Build Steps
Environment Variables
ALGOLIA_APP_ID=7RXZLDVR5F
ALGOLIA_ADMIN_API_KEY=
ALGOLIA_INDEX_ENVIRONMENT=staging|production
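The three variables above could be loaded and validated along these lines. This is a sketch under assumptions: the `AlgoliaSettings` dataclass, the `load_settings` helper, and the default of `staging` when `ALGOLIA_INDEX_ENVIRONMENT` is unset are illustrative choices, not something the PR description confirms.

```python
import os
from dataclasses import dataclass

VALID_ENVIRONMENTS = ("staging", "production")


@dataclass(frozen=True)
class AlgoliaSettings:
    app_id: str
    admin_api_key: str
    environment: str


def load_settings(env=None) -> AlgoliaSettings:
    """Read the Algolia settings from environment variables and validate them."""
    env = os.environ if env is None else env
    required = ("ALGOLIA_APP_ID", "ALGOLIA_ADMIN_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Assumption: fall back to staging so an unset variable never targets production.
    environment = env.get("ALGOLIA_INDEX_ENVIRONMENT", "staging")
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"ALGOLIA_INDEX_ENVIRONMENT must be one of {VALID_ENVIRONMENTS}")
    return AlgoliaSettings(
        app_id=env["ALGOLIA_APP_ID"],
        admin_api_key=env["ALGOLIA_ADMIN_API_KEY"],
        environment=environment,
    )
```

Failing fast on a missing admin key keeps a misconfigured TeamCity build from running an indexing pass with partial credentials.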
Zero-Configuration Operation
Comprehensive Testing
validation, parity testing
coverage, full field compatibility