add reranker_min_score #355
Conversation
❌ Changes requested. Reviewed everything up to 5d9f289 in 1 minute and 52 seconds
More details
- Looked at 342 lines of code in 5 files
- Skipped 0 files when reviewing
- Skipped posting 9 drafted comments based on config settings
1. graphiti_core/cross_encoder/openai_reranker_client.py:107
- Draft comment:
Clarify the rationale behind using np.exp(logprob) and the conditional branch on token existence. An inline comment explaining this transformation would improve readability. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 80%
The code is performing a mathematical transformation (exp of logprob) and then using it in a binary scoring system. This is non-obvious logic that future maintainers might need to understand. The transformation from log space to probability space is a key part of the ranking algorithm. The suggestion is specific and actionable.
The code might already be clear enough to other ML engineers who work with log probabilities regularly. The docstring at the class level already mentions that log-probabilities are used for ranking.
While ML engineers might understand the concept, the specific use of exp() for normalization and the binary scoring logic is complex enough to warrant an inline explanation for maintainability.
Keep the comment as it requests documentation for non-obvious mathematical transformation that is central to the ranking logic.
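The transformation under discussion can be sketched as follows. The function and token handling here are illustrative assumptions, not the PR's actual code: the reranker asks the model a boolean relevance question, exp() maps the top token's log-probability back into probability space, and the token's polarity decides whether that probability counts for or against the candidate.

```python
import math

def relevance_score(top_token: str, logprob: float) -> float:
    # exp() converts the log-probability back to a probability, p in (0, 1].
    p = math.exp(logprob)
    # Binary scoring: a "True" token counts its probability toward relevance,
    # any other token counts against it.
    if top_token.strip().lower().startswith("true"):
        return p
    return 1.0 - p

# A confident "True" (logprob near 0) scores near 1.0,
# a confident "False" scores near 0.0.
assert relevance_score("True", -0.01) > 0.95
assert relevance_score("False", -0.01) < 0.05
```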
2. graphiti_core/search/search.py:95
- Draft comment:
Ensure that the new 'reranker_min_score' parameter is consistently propagated. Consider adding validation to enforce non-negative values and document its usage. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 80%
The comment has two parts:
- Ensuring consistent propagation - this is already done, so this part is not useful
- Adding validation and documentation - while this could be helpful, it's too vague and speculative. The parameter already has a sensible default and is used consistently. If there were specific validation needs, they should be specified.
The comment does raise a valid point about documentation, which could be helpful for future maintainers. Maybe I'm being too harsh?
While documentation is good, this comment is too vague to be actionable. It's also speculative about validation needs without evidence that validation is actually needed.
Delete the comment. It's partly already addressed (propagation) and partly too vague/speculative (validation/documentation) to be actionable.
3. graphiti_core/search/search_utils.py:721
- Draft comment:
Document the behavior of the 'rrf' function with the new 'min_score' parameter. Explain that candidates with scores below min_score are filtered out. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 80%
The comment is requesting documentation for a new parameter that meaningfully changes the function's behavior by filtering results. The current docstring is very minimal and doesn't explain the parameters or return value. Having documentation for min_score would help users understand this filtering behavior. The comment is specific and actionable.
The function may be simple enough that the parameter name min_score is self-documenting. The suggested docstring text is also quite minimal.
While min_score may seem self-documenting, its exact effect (filtering out results below the threshold) is not obvious without documentation. Even minimal documentation is better than none for a public API function.
Keep the comment as it requests important documentation for a new parameter that affects the function's behavior in a non-obvious way.
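For reference, a minimal sketch of reciprocal rank fusion with the filtering behavior the draft comment asks to document (signature simplified; not the real function):

```python
from collections import defaultdict

def rrf(results: list[list[str]], rank_const: int = 1, min_score: float = 0.0) -> list[str]:
    # Each ranked list contributes 1 / (rank + rank_const) to a candidate's
    # cumulative score; candidates below min_score are filtered out entirely.
    scores: dict[str, float] = defaultdict(float)
    for ranking in results:
        for rank, uuid in enumerate(ranking):
            scores[uuid] += 1.0 / (rank + rank_const)
    ranked = sorted(scores, key=lambda u: scores[u], reverse=True)
    return [uuid for uuid in ranked if scores[uuid] >= min_score]

# "a" is first in both lists (fused score 2.0), "b" second (1.0);
# min_score=1.5 drops "b" from the result.
assert rrf([["a", "b"], ["a", "b"]]) == ["a", "b"]
assert rrf([["a", "b"], ["a", "b"]], min_score=1.5) == ["a"]
```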
4. graphiti_core/search/search_utils.py:807
- Draft comment:
In maximal_marginal_relevance, the computation of max similarity has O(n²) complexity. Consider adding a comment about potential performance impact on large candidate sets. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
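The quadratic step the draft comment flags looks roughly like this (a simplified stand-in for the real function, using plain dot products as similarity):

```python
def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def maximal_marginal_relevance(
    query: list[float],
    candidates: dict[str, list[float]],
    mmr_lambda: float = 0.5,
) -> list[str]:
    # For every candidate, the max similarity against every *other*
    # candidate is computed, so this loop costs O(n^2) dot products in
    # the number of candidates -- the performance concern raised above.
    mmr_scores: dict[str, float] = {}
    for uuid, vec in candidates.items():
        relevance = dot(query, vec)
        redundancy = max(
            (dot(vec, other) for key, other in candidates.items() if key != uuid),
            default=0.0,
        )
        mmr_scores[uuid] = relevance - mmr_lambda * redundancy
    return sorted(mmr_scores, key=lambda u: mmr_scores[u], reverse=True)

candidates = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
assert maximal_marginal_relevance([1.0, 0.0], candidates) == ["a", "b", "c"]
```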
5. tests/test_graphiti_int.py:69
- Draft comment:
The test query has been changed to 'Who is the user?'. Ensure that this new phrasing aligns with expected real-world queries. - Reason this comment was not posted:
Confidence changes required: 80% <= threshold 80%
6. graphiti_core/search/search_utils.py:821
- Draft comment:
Using a set() to deduplicate candidate UUIDs in maximal_marginal_relevance loses the ordering established by the ranking. Consider preserving order if the sorted rank is important. - Reason this comment was not posted:
Confidence changes required: 80% <= threshold 80%
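The order-preserving alternative the draft comment hints at is a one-liner in Python 3.7+, where dicts keep insertion order:

```python
ranked_uuids = ["b", "a", "b", "c", "a"]

# set() deduplicates but discards the ranked order established upstream...
deduped_unordered = set(ranked_uuids)

# ...while dict.fromkeys() deduplicates and keeps first-seen order.
deduped_ordered = list(dict.fromkeys(ranked_uuids))

assert deduped_unordered == {"a", "b", "c"}
assert deduped_ordered == ["b", "a", "c"]
```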
7. graphiti_core/search/search_utils.py:732
- Draft comment:
The modifications in the rrf function filtering by min_score change its behavior. Please verify that removing UUIDs with cumulative scores below min_score is intended for all use cases. - Reason this comment was not posted:
Confidence changes required: 80% <= threshold 80%
8. graphiti_core/search/search.py:379
- Draft comment:
Typographical error: In the cross_encoder branch of community_search, the dictionary comprehension incorrectly uses 'node' instead of 'community'. Please update the key to 'community' for consistency and clarity. - Reason this comment was not posted:
Comment was on unchanged code.
9. tests/test_graphiti_int.py:38
- Draft comment:
Typographical inconsistency: The variables 'NEO4j_USER' and 'NEO4j_PASSWORD' use a lowercase 'j', which is inconsistent with 'NEO4J_URI'. Consider renaming them to 'NEO4J_USER' and 'NEO4J_PASSWORD' for consistency. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_iI6txikyu9NI7j9a
Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
👍 Looks good to me! Incremental review on 62ac5c4 in 1 minute and 53 seconds
More details
- Looked at 37 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 7 drafted comments based on config settings
1. graphiti_core/cross_encoder/openai_reranker_client.py:107
- Draft comment:
Add comment explaining why empty top_logprobs are skipped. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 80%
The code is handling edge cases in API responses, and the empty check seems like non-trivial logic that future maintainers might question. Looking at lines 98-104, we can see that empty lists are explicitly created when logprobs are None, suggesting this is intentional error handling. The reasoning behind this behavior isn't documented anywhere in the code.
The comment might be too minor since experienced developers could infer that this is standard error handling. Also, the code flow is relatively straightforward even without the comment.
However, this is a critical part of the ranking logic, and documenting error handling behavior is important for maintainability. The explicit creation of empty lists and their subsequent filtering deserves explanation.
Keep the comment. The request for documentation is valuable as it would explain important error handling behavior in a core ranking function.
2. graphiti_core/search/search_utils.py:775
- Draft comment:
Clarify handling of score==0 to avoid division by zero. Consider a comment explaining why scores[uuid]==0 is allowed. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 80%
Looking at the code:
- scores[uuid]==0 is already handled explicitly and separately
- The code structure makes it clear this is intentional
- The comment suggests adding redundant explanation
- The code is actually quite readable as-is
- The special case for score==0 is logical since it represents the center node
The code could be slightly clearer about why score==0 is a special case. A new developer might wonder why zero scores get special treatment.
The context makes it clear - score==0 represents the center node itself, which should always be included. This is evident from the code structure where the center node is explicitly added back and given a score of 0.0.
The comment should be deleted. The code is already clear and the special handling of score==0 is well-structured and logical given the context.
3. graphiti_core/cross_encoder/openai_reranker_client.py:107
- Draft comment:
Guard clause added to skip empty top_logprobs. Consider logging an unexpected empty response for future debugging. - Reason this comment was not posted:
Confidence changes required: 66% <= threshold 80%
4. graphiti_core/search/search_utils.py:232
- Draft comment:
Adjusted Cypher query indentation improves readability. Ensure consistency with other query formats. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 80%
5. graphiti_core/search/search_utils.py:775
- Draft comment:
Updated node_distance_reranker now checks for score==0 to avoid division by zero. Verify that including nodes with zero score is intentional. - Reason this comment was not posted:
Comment looked like it was already resolved.
6. graphiti_core/search/search_utils.py:129
- Draft comment:
There is a missing comma between 'c.name_embedding AS name_embedding' and 'c.created_at AS created_at' in the RETURN clause of the query in get_communities_by_nodes. This typographical error may cause the query to fail. Please insert a comma after 'c.name_embedding AS name_embedding'. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. graphiti_core/search/search_utils.py:748
- Draft comment:
The Cypher query in node_distance_reranker contains 'MATCH p = SHORTEST 1 (center:Entity {uuid: $center_uuid})-[:RELATES_TO]-+(n:Entity {uuid: node_uuid})'. The phrase 'SHORTEST 1' appears to be a typographical error and is likely intended to be 'shortestPath' or a valid Cypher syntax. Please review and correct this segment of the query. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_hcIa390EV1pzw6OH
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
👍 Looks good to me! Incremental review on f85b768 in 2 minutes and 0 seconds
More details
- Looked at 27 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 6 drafted comments based on config settings
1. graphiti_core/search/search_utils.py:232
- Draft comment:
The change in indentation for the query string is cosmetic. Please ensure it’s consistent with the project’s style guidelines. - Reason this comment was not posted:
Confidence changes required: 0% <= threshold 80%
2. graphiti_core/search/search_utils.py:773
- Draft comment:
Assigning a fixed score (0.1) for the center node and removing the extra condition may alter inclusion logic. Confirm that using (1/score) with this value is intentional, especially for edge cases when min_score is adjusted. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 40% <= threshold 80%
The comment is asking the PR author to confirm their intention regarding a specific change in the code. It points out a potential issue with the logic and asks for confirmation, which is not allowed according to the rules. However, it does highlight a specific change and its potential impact, which could be useful for the author to consider. The comment does not directly suggest a code change or ask for a test to be written, which would make it more acceptable.
3. graphiti_core/search/search_utils.py:232
- Draft comment:
The updated MATCH clause indentation improves readability. Verify that the extra whitespace in the literal does not affect the query behavior. - Reason this comment was not posted:
Confidence changes required: 33% <= threshold 80%
4. graphiti_core/search/search_utils.py:772
- Draft comment:
Removing the score==0 check in the return comprehension can risk a division by zero if a score of 0 occurs. The center node is adjusted to 0.1, but ensure no other node can have a score of 0—or use a safe division strategy. - Reason this comment was not posted:
Comment looked like it was already resolved.
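The arithmetic under discussion can be illustrated with a hypothetical helper; the 0.1 constant mirrors the fixed center-node score mentioned above, and nothing else here is taken from the PR:

```python
def distance_to_score(path_length: int, center_score: float = 0.1) -> float:
    # The center node sits at shortest-path distance 0; substituting the
    # fixed 0.1 keeps 1 / score well-defined (no division by zero) while
    # still ranking the center node above all distance-1 neighbors.
    score = path_length if path_length > 0 else center_score
    return 1.0 / score

assert distance_to_score(0) == 10.0   # center node: 1 / 0.1
assert distance_to_score(2) == 0.5
assert distance_to_score(0) > distance_to_score(1)
```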
5. graphiti_core/search/search_utils.py:747
- Draft comment:
Typographical error: The query in the node_distance_reranker function contains 'SHORTEST 1' which may be a typo. Please verify if this should be changed to 'shortestPath' or the correct Cypher function for path finding. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. graphiti_core/search/search_utils.py:129
- Draft comment:
Typographical error: In the RETURN clause of get_communities_by_nodes, there is a missing comma after 'c.name_embedding AS name_embedding'. This could possibly cause issues in query parsing. Please add a comma after this expression. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_JaI1CaWOJbVzepCv
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
👍 Looks good to me! Incremental review on a1faae3 in 1 minute and 0 seconds
More details
- Looked at 12 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 drafted comments based on config settings
1. graphiti_core/cross_encoder/openai_reranker_client.py:22
- Draft comment:
The removal of the unused 'pydantic.BaseModel' import is appropriate. However, the PR title suggests adding 'reranker_min_score', which doesn't appear in the diff. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 80% <= threshold 80%
The comment is pointing out a discrepancy between the PR title and the actual changes in the code. This could be useful for the author to ensure that the PR is complete and accurate.
2. graphiti_core/cross_encoder/openai_reranker_client.py:22
- Draft comment:
Remove the unused import from pydantic (BaseModel) as it is not being used in the file. This keeps the dependencies lean. - Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_jyuYyhKF7yhSXfws
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
Important
Add reranker_min_score to filter search results by minimum score across various search functions and configurations:
- reranker_min_score in SearchConfig in search_config.py to set a minimum score threshold for reranking.
- rank() in openai_reranker_client.py to normalize log probabilities and apply a strict zip.
- edge_search(), node_search(), and community_search() in search.py to use reranker_min_score.
- rrf(), node_distance_reranker(), episode_mentions_reranker(), and maximal_marginal_relevance() in search_utils.py to accept a min_score parameter.
- test_graphiti_int.py updated to align with new search behavior.
This description was created by Ellipsis for a1faae3. It will automatically update as commits are pushed.
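Taken together, the new threshold behaves roughly like this minimal stand-in (the real SearchConfig has many more fields; the dataclass and helper below are illustrative only):

```python
from dataclasses import dataclass

@dataclass
class SearchConfig:  # minimal stand-in, not the real class
    limit: int = 10
    reranker_min_score: float = 0.0

def filter_reranked(scored: dict[str, float], config: SearchConfig) -> list[str]:
    # Drop everything the reranker scored below the threshold,
    # then return the survivors best-first, capped at limit.
    kept = [u for u, s in scored.items() if s >= config.reranker_min_score]
    kept.sort(key=lambda u: scored[u], reverse=True)
    return kept[: config.limit]

config = SearchConfig(reranker_min_score=0.5)
assert filter_reranked({"a": 0.9, "b": 0.3, "c": 0.7}, config) == ["a", "c"]
```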