add reranker_min_score #355

prasmussen15 · 2025-04-15T16:06:13Z

Important

Add reranker_min_score to filter search results by minimum score across various search functions and configurations.

Behavior:
- Introduces reranker_min_score in SearchConfig in search_config.py to set a minimum score threshold for reranking.
- Updates rank() in openai_reranker_client.py to normalize log probabilities and apply a strict zip.
- Modifies edge_search(), node_search(), and community_search() in search.py to use reranker_min_score.
Functions:
- Updates rrf(), node_distance_reranker(), episode_mentions_reranker(), and maximal_marginal_relevance() in search_utils.py to accept min_score parameter.
Tests:
- Adjusts test_graphiti_int.py to align with new search behavior.

^{This description was created by}^{for a1faae3. It will automatically update as commits are pushed.}

ellipsis-dev

❌ Changes requested. Reviewed everything up to 5d9f289 in 1 minute and 52 seconds

More details

Looked at 342 lines of code in 5 files
Skipped 0 files when reviewing.
Skipped posting 9 drafted comments based on config settings.

1. graphiti_core/cross_encoder/openai_reranker_client.py:107

Draft comment:
Clarify the rationale behind using np.exp(logprob) and the conditional branch on token existence. An inline comment explaining this transformation would improve readability.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 80%
The code is performing a mathematical transformation (exp of logprob) and then using it in a binary scoring system. This is non-obvious logic that future maintainers might need to understand. The transformation from log space to probability space is a key part of the ranking algorithm. The suggestion is specific and actionable.
The code might already be clear enough to other ML engineers who work with log probabilities regularly. The docstring at the class level already mentions that log-probabilities are used for ranking.
While ML engineers might understand the concept, the specific use of exp() for normalization and the binary scoring logic is complex enough to warrant an inline explanation for maintainability.
Keep the comment as it requests documentation for non-obvious mathematical transformation that is central to the ranking logic.

2. graphiti_core/search/search.py:95

Draft comment:
Ensure that the new 'reranker_min_score' parameter is consistently propagated. Consider adding validation to enforce non-negative values and document its usage.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 80%
The comment has two parts:

Ensuring consistent propagation - this is already done, so this part is not useful
Adding validation and documentation - while this could be helpful, it's too vague and speculative. The parameter already has a sensible default and is used consistently. If there were specific validation needs, they should be specified.
The comment does raise a valid point about documentation, which could be helpful for future maintainers. Maybe I'm being too harsh?
While documentation is good, this comment is too vague to be actionable. It's also speculative about validation needs without evidence that validation is actually needed.
Delete the comment. It's partly already addressed (propagation) and partly too vague/speculative (validation/documentation) to be actionable.

3. graphiti_core/search/search_utils.py:721

Draft comment:
Document the behavior of the 'rrf' function with the new 'min_score' parameter. Explain that candidates with scores below min_score are filtered out.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 80%
The comment is requesting documentation for a new parameter that meaningfully changes the function's behavior by filtering results. The current docstring is very minimal and doesn't explain the parameters or return value. Having documentation for min_score would help users understand this filtering behavior. The comment is specific and actionable.
The function may be simple enough that the parameter name min_score is self-documenting. The suggested docstring text is also quite minimal.
While min_score may seem self-documenting, its exact effect (filtering out results below the threshold) is not obvious without documentation. Even minimal documentation is better than none for a public API function.
Keep the comment as it requests important documentation for a new parameter that affects the function's behavior in a non-obvious way.

4. graphiti_core/search/search_utils.py:807

Draft comment:
In maximal_marginal_relevance, the computation of max similarity has O(n²) complexity. Consider adding a comment about potential performance impact on large candidate sets.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

5. tests/test_graphiti_int.py:69

Draft comment:
The test query has been changed to 'Who is the user?'. Ensure that this new phrasing aligns with expected real-world queries.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 80%
None

6. graphiti_core/search/search_utils.py:821

Draft comment:
Using a set() to deduplicate candidate UUIDs in maximal_marginal_relevance loses the ordering established by the ranking. Consider preserving order if the sorted rank is important.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 80%
None

7. graphiti_core/search/search_utils.py:732

Draft comment:
The modifications in the rrf function filtering by min_score change its behavior. Please verify that removing UUIDs with cumulative scores below min_score is intended for all use cases.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 80%
None

8. graphiti_core/search/search.py:379

Draft comment:
Typographical error: In the cross_encoder branch of community_search, the dictionary comprehension incorrectly uses 'node' instead of 'community'. Please update the key to 'community' for consistency and clarity.
Reason this comment was not posted:
Comment was on unchanged code.

9. tests/test_graphiti_int.py:38

Draft comment:
Typographical inconsistency: The variables 'NEO4j_USER' and 'NEO4j_PASSWORD' use a lowercase 'j', which is inconsistent with 'NEO4J_URI'. Consider renaming them to 'NEO4J_USER' and 'NEO4J_PASSWORD' for consistency.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_iI6txikyu9NI7j9a

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

graphiti_core/cross_encoder/openai_reranker_client.py

graphiti_core/search/search_utils.py

ellipsis-dev

👍 Looks good to me! Incremental review on 62ac5c4 in 1 minute and 53 seconds

More details

Looked at 37 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 7 drafted comments based on config settings.

1. graphiti_core/cross_encoder/openai_reranker_client.py:107

Draft comment:
Add comment explaining why empty top_logprobs are skipped.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 80%
The code is handling edge cases in API responses, and the empty check seems like non-trivial logic that future maintainers might question. Looking at lines 98-104, we can see that empty lists are explicitly created when logprobs are None, suggesting this is intentional error handling. The reasoning behind this behavior isn't documented anywhere in the code.
The comment might be too minor since experienced developers could infer that this is standard error handling. Also, the code flow is relatively straightforward even without the comment.
However, this is a critical part of the ranking logic, and documenting error handling behavior is important for maintainability. The explicit creation of empty lists and their subsequent filtering deserves explanation.
Keep the comment. The request for documentation is valuable as it would explain important error handling behavior in a core ranking function.

2. graphiti_core/search/search_utils.py:775

Draft comment:
Clarify handling of score==0 to avoid division by zero. Consider a comment explaining why scores[uuid]==0 is allowed.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 80%
Looking at the code:

scores[uuid]==0 is already handled explicitly and separately
The code structure makes it clear this is intentional
The comment suggests adding redundant explanation
The code is actually quite readable as-is
The special case for score==0 is logical since it represents the center node
The code could be slightly clearer about why score==0 is a special case. A new developer might wonder why zero scores get special treatment.
The context makes it clear - score==0 represents the center node itself, which should always be included. This is evident from the code structure where the center node is explicitly added back and given a score of 0.0.
The comment should be deleted. The code is already clear and the special handling of score==0 is well-structured and logical given the context.

3. graphiti_core/cross_encoder/openai_reranker_client.py:107

Draft comment:
Guard clause added to skip empty top_logprobs. Consider logging an unexpected empty response for future debugging.
Reason this comment was not posted:
Confidence changes required: 66% <= threshold 80%
None

4. graphiti_core/search/search_utils.py:232

Draft comment:
Adjusted Cypher query indentation improves readability. Ensure consistency with other query formats.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 80%
None

5. graphiti_core/search/search_utils.py:775

Draft comment:
Updated node_distance_reranker now checks for score==0 to avoid division by zero. Verify that including nodes with zero score is intentional.
Reason this comment was not posted:
Comment looked like it was already resolved.

6. graphiti_core/search/search_utils.py:129

Draft comment:
There is a missing comma between 'c.name_embedding AS name_embedding' and 'c.created_at AS created_at' in the RETURN clause of the query in get_communities_by_nodes. This typographical error may cause the query to fail. Please insert a comma after 'c.name_embedding AS name_embedding'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

7. graphiti_core/search/search_utils.py:748

Draft comment:
The Cypher query in node_distance_reranker contains 'MATCH p = SHORTEST 1 (center:Entity {uuid: $center_uuid})-[:RELATES_TO]-+(n:Entity {uuid: node_uuid})'. The phrase 'SHORTEST 1' appears to be a typographical error and is likely intended to be 'shortestPath' or a valid Cypher syntax. Please review and correct this segment of the query.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_hcIa390EV1pzw6OH

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on f85b768 in 2 minutes and 0 seconds

More details

Looked at 27 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 6 drafted comments based on config settings.

1. graphiti_core/search/search_utils.py:232

Draft comment:
The change in indentation for the query string is cosmetic. Please ensure it’s consistent with the project’s style guidelines.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 80%
None

2. graphiti_core/search/search_utils.py:773

Draft comment:
Assigning a fixed score (0.1) for the center node and removing the extra condition may alter inclusion logic. Confirm that using (1/score) with this value is intentional, especially for edge cases when min_score is adjusted.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 40% <= threshold 80%
The comment is asking the PR author to confirm their intention regarding a specific change in the code. It points out a potential issue with the logic and asks for confirmation, which is not allowed according to the rules. However, it does highlight a specific change and its potential impact, which could be useful for the author to consider. The comment does not directly suggest a code change or ask for a test to be written, which would make it more acceptable.

3. graphiti_core/search/search_utils.py:232

Draft comment:
The updated MATCH clause indentation improves readability. Verify that the extra whitespace in the literal does not affect the query behavior.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 80%
None

4. graphiti_core/search/search_utils.py:772

Draft comment:
Removing the score==0 check in the return comprehension can risk a division by zero if a score of 0 occurs. The center node is adjusted to 0.1, but ensure no other node can have a score of 0—or use a safe division strategy.
Reason this comment was not posted:
Comment looked like it was already resolved.

5. graphiti_core/search/search_utils.py:747

Draft comment:
Typographical error: The query in the node_distance_reranker function contains 'SHORTEST 1' which may be a typo. Please verify if this should be changed to 'shortestPath' or the correct Cypher function for path finding.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

6. graphiti_core/search/search_utils.py:129

Draft comment:
Typographical error: In the RETURN clause of get_communities_by_nodes, there is a missing comma after 'c.name_embedding AS name_embedding'. This could possibly cause issues in query parsing. Please add a comma after this expression.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_JaI1CaWOJbVzepCv

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on a1faae3 in 1 minute and 0 seconds

More details

Looked at 12 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. graphiti_core/cross_encoder/openai_reranker_client.py:22

Draft comment:
The removal of the unused 'pydantic.BaseModel' import is appropriate. However, the PR title suggests adding 'reranker_min_score', which doesn't appear in the diff.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 80% <= threshold 80%
The comment is pointing out a discrepancy between the PR title and the actual changes in the code. This could be useful for the author to ensure that the PR is complete and accurate.

2. graphiti_core/cross_encoder/openai_reranker_client.py:22

Draft comment:
Remove the unused import from pydantic (BaseModel) as it is not being used in the file. This keeps the dependencies lean.
Reason this comment was not posted:
Comment looked like it was already resolved.

Workflow ID: wflow_jyuYyhKF7yhSXfws

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

add reranker_min_score

5d9f289

prasmussen15 temporarily deployed to development April 15, 2025 16:06 — with GitHub Actions Inactive

prasmussen15 had a problem deploying to development April 15, 2025 16:06 — with GitHub Actions Failure

prasmussen15 temporarily deployed to development April 15, 2025 16:06 — with GitHub Actions Inactive

ellipsis-dev bot reviewed Apr 15, 2025

View reviewed changes

graphiti_core/cross_encoder/openai_reranker_client.py Show resolved Hide resolved

graphiti_core/search/search_utils.py Show resolved Hide resolved

update divide by 0 case

62ac5c4

prasmussen15 temporarily deployed to development April 15, 2025 16:11 — with GitHub Actions Inactive

prasmussen15 had a problem deploying to development April 15, 2025 16:11 — with GitHub Actions Failure

prasmussen15 temporarily deployed to development April 15, 2025 16:11 — with GitHub Actions Inactive

ellipsis-dev bot reviewed Apr 15, 2025

View reviewed changes

center node always gets a score of .1

f85b768

prasmussen15 temporarily deployed to development April 15, 2025 16:15 — with GitHub Actions Inactive

prasmussen15 had a problem deploying to development April 15, 2025 16:15 — with GitHub Actions Failure

prasmussen15 temporarily deployed to development April 15, 2025 16:15 — with GitHub Actions Inactive

ellipsis-dev bot reviewed Apr 15, 2025

View reviewed changes

linter

a1faae3

prasmussen15 temporarily deployed to development April 15, 2025 16:29 — with GitHub Actions Inactive

ellipsis-dev bot reviewed Apr 15, 2025

View reviewed changes

prasmussen15 merged commit 11e19a3 into main Apr 15, 2025
7 checks passed

prasmussen15 deleted the min-reranker-score branch April 15, 2025 16:33

github-actions bot locked and limited conversation to collaborators Apr 15, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add reranker_min_score #355

add reranker_min_score #355

prasmussen15 commented Apr 15, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

add reranker_min_score #355

add reranker_min_score #355

Conversation

prasmussen15 commented Apr 15, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

prasmussen15 commented Apr 15, 2025 •

edited by ellipsis-dev bot

Loading