Skip to content

add reranker_min_score #355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 15, 2025
Merged

add reranker_min_score #355

merged 4 commits into from
Apr 15, 2025

Conversation

prasmussen15
Copy link
Collaborator

@prasmussen15 prasmussen15 commented Apr 15, 2025

Important

Add reranker_min_score to filter search results by minimum score across various search functions and configurations.

  • Behavior:
    • Introduces reranker_min_score in SearchConfig in search_config.py to set a minimum score threshold for reranking.
    • Updates rank() in openai_reranker_client.py to normalize log probabilities and apply a strict zip.
    • Modifies edge_search(), node_search(), and community_search() in search.py to use reranker_min_score.
  • Functions:
    • Updates rrf(), node_distance_reranker(), episode_mentions_reranker(), and maximal_marginal_relevance() in search_utils.py to accept min_score parameter.
  • Tests:
    • Adjusts test_graphiti_int.py to align with new search behavior.

This description was created by Ellipsis for a1faae3. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 5d9f289 in 1 minute and 52 seconds

More details
  • Looked at 342 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 9 drafted comments based on config settings.
1. graphiti_core/cross_encoder/openai_reranker_client.py:107
  • Draft comment:
    Clarify the rationale behind using np.exp(logprob) and the conditional branch on token existence. An inline comment explaining this transformation would improve readability.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 80%
    The code is performing a mathematical transformation (exp of logprob) and then using it in a binary scoring system. This is non-obvious logic that future maintainers might need to understand. The transformation from log space to probability space is a key part of the ranking algorithm. The suggestion is specific and actionable.
    The code might already be clear enough to other ML engineers who work with log probabilities regularly. The docstring at the class level already mentions that log-probabilities are used for ranking.
    While ML engineers might understand the concept, the specific use of exp() for normalization and the binary scoring logic is complex enough to warrant an inline explanation for maintainability.
    Keep the comment as it requests documentation for non-obvious mathematical transformation that is central to the ranking logic.
2. graphiti_core/search/search.py:95
  • Draft comment:
    Ensure that the new 'reranker_min_score' parameter is consistently propagated. Consider adding validation to enforce non-negative values and document its usage.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 80%
    The comment has two parts:
  1. Ensuring consistent propagation - this is already done, so this part is not useful
  2. Adding validation and documentation - while this could be helpful, it's too vague and speculative. The parameter already has a sensible default and is used consistently. If there were specific validation needs, they should be specified.
    The comment does raise a valid point about documentation, which could be helpful for future maintainers. Maybe I'm being too harsh?
    While documentation is good, this comment is too vague to be actionable. It's also speculative about validation needs without evidence that validation is actually needed.
    Delete the comment. It's partly already addressed (propagation) and partly too vague/speculative (validation/documentation) to be actionable.
3. graphiti_core/search/search_utils.py:721
  • Draft comment:
    Document the behavior of the 'rrf' function with the new 'min_score' parameter. Explain that candidates with scores below min_score are filtered out.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 80%
    The comment is requesting documentation for a new parameter that meaningfully changes the function's behavior by filtering results. The current docstring is very minimal and doesn't explain the parameters or return value. Having documentation for min_score would help users understand this filtering behavior. The comment is specific and actionable.
    The function may be simple enough that the parameter name min_score is self-documenting. The suggested docstring text is also quite minimal.
    While min_score may seem self-documenting, its exact effect (filtering out results below the threshold) is not obvious without documentation. Even minimal documentation is better than none for a public API function.
    Keep the comment as it requests important documentation for a new parameter that affects the function's behavior in a non-obvious way.
4. graphiti_core/search/search_utils.py:807
  • Draft comment:
    In maximal_marginal_relevance, the computation of max similarity has O(n²) complexity. Consider adding a comment about potential performance impact on large candidate sets.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. tests/test_graphiti_int.py:69
  • Draft comment:
    The test query has been changed to 'Who is the user?'. Ensure that this new phrasing aligns with expected real-world queries.
  • Reason this comment was not posted:
    Confidence changes required: 80% <= threshold 80%
    None
6. graphiti_core/search/search_utils.py:821
  • Draft comment:
    Using a set() to deduplicate candidate UUIDs in maximal_marginal_relevance loses the ordering established by the ranking. Consider preserving order if the sorted rank is important.
  • Reason this comment was not posted:
    Confidence changes required: 80% <= threshold 80%
    None
7. graphiti_core/search/search_utils.py:732
  • Draft comment:
    The modifications in the rrf function filtering by min_score change its behavior. Please verify that removing UUIDs with cumulative scores below min_score is intended for all use cases.
  • Reason this comment was not posted:
    Confidence changes required: 80% <= threshold 80%
    None
8. graphiti_core/search/search.py:379
  • Draft comment:
    Typographical error: In the cross_encoder branch of community_search, the dictionary comprehension incorrectly uses 'node' instead of 'community'. Please update the key to 'community' for consistency and clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.
9. tests/test_graphiti_int.py:38
  • Draft comment:
    Typographical inconsistency: The variables 'NEO4j_USER' and 'NEO4j_PASSWORD' use a lowercase 'j', which is inconsistent with 'NEO4J_URI'. Consider renaming them to 'NEO4J_USER' and 'NEO4J_PASSWORD' for consistency.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_iI6txikyu9NI7j9a


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 62ac5c4 in 1 minute and 53 seconds

More details
  • Looked at 37 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 7 drafted comments based on config settings.
1. graphiti_core/cross_encoder/openai_reranker_client.py:107
  • Draft comment:
    Add comment explaining why empty top_logprobs are skipped.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 80%
    The code is handling edge cases in API responses, and the empty check seems like non-trivial logic that future maintainers might question. Looking at lines 98-104, we can see that empty lists are explicitly created when logprobs are None, suggesting this is intentional error handling. The reasoning behind this behavior isn't documented anywhere in the code.
    The comment might be too minor since experienced developers could infer that this is standard error handling. Also, the code flow is relatively straightforward even without the comment.
    However, this is a critical part of the ranking logic, and documenting error handling behavior is important for maintainability. The explicit creation of empty lists and their subsequent filtering deserves explanation.
    Keep the comment. The request for documentation is valuable as it would explain important error handling behavior in a core ranking function.
2. graphiti_core/search/search_utils.py:775
  • Draft comment:
    Clarify handling of score==0 to avoid division by zero. Consider a comment explaining why scores[uuid]==0 is allowed.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 80%
    Looking at the code:
  1. scores[uuid]==0 is already handled explicitly and separately
  2. The code structure makes it clear this is intentional
  3. The comment suggests adding redundant explanation
  4. The code is actually quite readable as-is
  5. The special case for score==0 is logical since it represents the center node
    The code could be slightly clearer about why score==0 is a special case. A new developer might wonder why zero scores get special treatment.
    The context makes it clear - score==0 represents the center node itself, which should always be included. This is evident from the code structure where the center node is explicitly added back and given a score of 0.0.
    The comment should be deleted. The code is already clear and the special handling of score==0 is well-structured and logical given the context.
3. graphiti_core/cross_encoder/openai_reranker_client.py:107
  • Draft comment:
    Guard clause added to skip empty top_logprobs. Consider logging an unexpected empty response for future debugging.
  • Reason this comment was not posted:
    Confidence changes required: 66% <= threshold 80%
    None
4. graphiti_core/search/search_utils.py:232
  • Draft comment:
    Adjusted Cypher query indentation improves readability. Ensure consistency with other query formats.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 80%
    None
5. graphiti_core/search/search_utils.py:775
  • Draft comment:
    Updated node_distance_reranker now checks for score==0 to avoid division by zero. Verify that including nodes with zero score is intentional.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
6. graphiti_core/search/search_utils.py:129
  • Draft comment:
    There is a missing comma between 'c.name_embedding AS name_embedding' and 'c.created_at AS created_at' in the RETURN clause of the query in get_communities_by_nodes. This typographical error may cause the query to fail. Please insert a comma after 'c.name_embedding AS name_embedding'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. graphiti_core/search/search_utils.py:748
  • Draft comment:
    The Cypher query in node_distance_reranker contains 'MATCH p = SHORTEST 1 (center:Entity {uuid: $center_uuid})-[:RELATES_TO]-+(n:Entity {uuid: node_uuid})'. The phrase 'SHORTEST 1' appears to be a typographical error and is likely intended to be 'shortestPath' or a valid Cypher syntax. Please review and correct this segment of the query.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_hcIa390EV1pzw6OH


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on f85b768 in 2 minutes and 0 seconds

More details
  • Looked at 27 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 drafted comments based on config settings.
1. graphiti_core/search/search_utils.py:232
  • Draft comment:
    The change in indentation for the query string is cosmetic. Please ensure it’s consistent with the project’s style guidelines.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 80%
    None
2. graphiti_core/search/search_utils.py:773
  • Draft comment:
    Assigning a fixed score (0.1) for the center node and removing the extra condition may alter inclusion logic. Confirm that using (1/score) with this value is intentional, especially for edge cases when min_score is adjusted.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 40% <= threshold 80%
    The comment is asking the PR author to confirm their intention regarding a specific change in the code. It points out a potential issue with the logic and asks for confirmation, which is not allowed according to the rules. However, it does highlight a specific change and its potential impact, which could be useful for the author to consider. The comment does not directly suggest a code change or ask for a test to be written, which would make it more acceptable.
3. graphiti_core/search/search_utils.py:232
  • Draft comment:
    The updated MATCH clause indentation improves readability. Verify that the extra whitespace in the literal does not affect the query behavior.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 80%
    None
4. graphiti_core/search/search_utils.py:772
  • Draft comment:
    Removing the score==0 check in the return comprehension can risk a division by zero if a score of 0 occurs. The center node is adjusted to 0.1, but ensure no other node can have a score of 0—or use a safe division strategy.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
5. graphiti_core/search/search_utils.py:747
  • Draft comment:
    Typographical error: The query in the node_distance_reranker function contains 'SHORTEST 1' which may be a typo. Please verify if this should be changed to 'shortestPath' or the correct Cypher function for path finding.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. graphiti_core/search/search_utils.py:129
  • Draft comment:
    Typographical error: In the RETURN clause of get_communities_by_nodes, there is a missing comma after 'c.name_embedding AS name_embedding'. This could possibly cause issues in query parsing. Please add a comma after this expression.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_JaI1CaWOJbVzepCv


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on a1faae3 in 1 minute and 0 seconds

More details
  • Looked at 12 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. graphiti_core/cross_encoder/openai_reranker_client.py:22
  • Draft comment:
    The removal of the unused 'pydantic.BaseModel' import is appropriate. However, the PR title suggests adding 'reranker_min_score', which doesn't appear in the diff.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 80% <= threshold 80%
    The comment is pointing out a discrepancy between the PR title and the actual changes in the code. This could be useful for the author to ensure that the PR is complete and accurate.
2. graphiti_core/cross_encoder/openai_reranker_client.py:22
  • Draft comment:
    Remove the unused import from pydantic (BaseModel) as it is not being used in the file. This keeps the dependencies lean.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_jyuYyhKF7yhSXfws


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@prasmussen15 prasmussen15 merged commit 11e19a3 into main Apr 15, 2025
7 checks passed
@prasmussen15 prasmussen15 deleted the min-reranker-score branch April 15, 2025 16:33
@github-actions github-actions bot locked and limited conversation to collaborators Apr 15, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant