[Bug]: ValueError: cannot convert float NaN to integer when running global search with dynamic selection #1864

Open
3 tasks done
lsukharn opened this issue Apr 4, 2025 · 1 comment
Labels
backlog We've confirmed some action is needed on this and will plan it bug Something isn't working

Comments

lsukharn commented Apr 4, 2025

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

I ran this notebook on my data: https://github.com/microsoft/graphrag/blob/main/docs/examples_notebooks/global_search_with_dynamic_community_selection.ipynb
and got an error after running the following cell:

api_key = os.environ["GRAPHRAG_API_KEY"]
llm_model = os.environ["GRAPHRAG_LLM_MODEL"]
api_base = os.environ["API_BASE_TEST"]
deployment_name = os.environ["GRAPHRAG_LLM_MODEL_DEPLOYMENT_NAME"]

config = LanguageModelConfig(
    api_key=api_key,
    type=ModelType.AzureOpenAIChat,
    api_base=api_base,
    api_version='2025-01-01-preview',
    model=llm_model,
    deployment_name=deployment_name,
    max_retries=20,
)
model = ModelManager().get_or_create_chat_model(
    name="global_search",
    model_type=ModelType.AzureOpenAIChat,
    config=config,
)

token_encoder = tiktoken.encoding_for_model(llm_model)

OUTPUT_DIR = "./graphrag_project/output"
COMMUNITY_REPORT_TABLE = "community_reports"
ENTITY_TABLE = "entities"
COMMUNITY_TABLE = "communities"

# we don't fix a specific community level but instead use an agent to dynamically
# search through all the community reports to check if they are relevant.
COMMUNITY_LEVEL = None

community_df = pd.read_parquet(f"{OUTPUT_DIR}/{COMMUNITY_TABLE}.parquet")
entity_df = pd.read_parquet(f"{OUTPUT_DIR}/{ENTITY_TABLE}.parquet")
report_df = pd.read_parquet(f"{OUTPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")

communities = read_indexer_communities(community_df, report_df)
reports = read_indexer_reports(
    report_df,
    community_df,
    community_level=COMMUNITY_LEVEL,
    dynamic_community_selection=True,
)
entities = read_indexer_entities(
    entity_df, community_df, community_level=COMMUNITY_LEVEL
)

print(f"Total report count: {len(report_df)}")
print(
    f"Report count after filtering by community level {COMMUNITY_LEVEL}: {len(reports)}"
)

report_df.head()

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\graphrag\query\indexer_adapters.py:161, in read_indexer_entities.<locals>.<lambda>(x)
    158 # group entities by id and degree and remove duplicated community IDs
    159 nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()
    160 nodes_df["community"] = nodes_df["community"].apply(
--> 161     lambda x: [str(int(i)) for i in x]
    162 )
    163 final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
    164     subset=["id"]
    165 )
    166 # read entity dataframe to knowledge model objects

ValueError: cannot convert float NaN to integer

See full error in the logs section.
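
The failing conversion can be reproduced outside GraphRAG with pandas alone. The sketch below uses a toy frame (not the real indexer output) in which one entity has a missing community value, then applies the same `str(int(i))` conversion shown in the traceback. The helper `to_community_ids_skip_nan` is hypothetical and only illustrates one NaN-tolerant alternative; it is not the library's fix.

```python
import pandas as pd

# Toy stand-in for the joined entity/community frame (assumed shape).
# Entity "b" has no community, so its value is NaN.
nodes_df = pd.DataFrame(
    {"id": ["a", "a", "b"], "community": [0.0, 3.0, float("nan")]}
)
nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()


def to_community_ids(values):
    """Same conversion as the lambda in the traceback."""
    return [str(int(i)) for i in values]


def to_community_ids_skip_nan(values):
    """Hypothetical NaN-tolerant variant: drop missing values before converting."""
    return sorted(str(int(i)) for i in values if not pd.isna(i))


try:
    nodes_df["community"].apply(to_community_ids)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

safe_ids = nodes_df["community"].apply(to_community_ids_skip_nan).tolist()
print(error_message)
print(safe_ids)
```

If the real `community` column contains NaN for some entities, the same error would be expected at line 161 of `indexer_adapters.py`.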

Steps to reproduce

No response

Expected Behavior

I should be able to use the "dynamic" part of global search. The script works when I set COMMUNITY_LEVEL = 2, but it fails when COMMUNITY_LEVEL is None.
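
One plausible explanation for the difference between a numeric level and None, offered as an assumption rather than a confirmed diagnosis: if entities are left-joined to community memberships, entities without a community get NaN in both `community` and `level`. A numeric filter such as `level <= 2` silently drops those rows, because comparisons with NaN are False, while skipping the filter keeps them. The frames, column names, and `filter_under_level` helper below are illustrative, not GraphRAG's actual code.

```python
import pandas as pd

# Toy data: entity "c" does not belong to any community.
entities = pd.DataFrame({"id": ["a", "b", "c"]})
memberships = pd.DataFrame(
    {"entity_id": ["a", "b"], "community": [0, 1], "level": [0, 1]}
)

# A left join keeps "c" and fills its community/level with NaN.
joined = entities.merge(
    memberships, left_on="id", right_on="entity_id", how="left"
)


def filter_under_level(df, community_level):
    """Hypothetical filter: only applied when a numeric level is given."""
    if community_level is None:
        return df
    # NaN <= community_level evaluates to False, so NaN rows are dropped here.
    return df[df["level"] <= community_level]


with_level = filter_under_level(joined, 2)
without_level = filter_under_level(joined, None)

nan_with_level = int(with_level["community"].isna().sum())
nan_without_level = int(without_level["community"].isna().sum())
print(nan_with_level, nan_without_level)
```

Under these assumptions, only the unfiltered path carries NaN values forward to the integer conversion.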

GraphRAG Config Used

# Paste your config here

Logs and screenshots

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[47], line 12
      5 communities = read_indexer_communities(community_df, report_df)
      6 reports = read_indexer_reports(
      7     report_df,
      8     community_df,
      9     community_level=COMMUNITY_LEVEL,
     10     dynamic_community_selection=True,
     11 )
---> 12 entities = read_indexer_entities(
     13     entity_df, community_df, community_level=COMMUNITY_LEVEL
     14 )
     16 print(f"Total report count: {len(report_df)}")
     17 print(
     18     f"Report count after filtering by community level {COMMUNITY_LEVEL}: {len(reports)}"
     19 )

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\graphrag\query\indexer_adapters.py:160, in read_indexer_entities(final_entities, final_communities, community_level)
    158 # group entities by id and degree and remove duplicated community IDs
    159 nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()
--> 160 nodes_df["community"] = nodes_df["community"].apply(
    161     lambda x: [str(int(i)) for i in x]
    162 )
    163 final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
    164     subset=["id"]
    165 )
    166 # read entity dataframe to knowledge model objects

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
   4789 def apply(
   4790     self,
   4791     func: AggFuncType,
   (...)   4796     **kwargs,
   4797 ) -> DataFrame | Series:
   4798     """
   4799     Invoke function on values of Series.
   4800 
   (...)   4915     dtype: float64
   4916     """
   4917     return SeriesApply(
   4918         self,
   4919         func,
   4920         convert_dtype=convert_dtype,
   4921         by_row=by_row,
   4922         args=args,
   4923         kwargs=kwargs,
-> 4924     ).apply()

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\apply.py:1427, in SeriesApply.apply(self)
   1424     return self.apply_compat()
   1426 # self.func is Callable
-> 1427 return self.apply_standard()

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\apply.py:1507, in SeriesApply.apply_standard(self)
   1501 # row-wise access
   1502 # apply doesn't have a `na_action` keyword and for backward compat reasons
   1503 # we need to give `na_action="ignore"` for categorical data.
   1504 # TODO: remove the `na_action="ignore"` when that default has been changed in
   1505 #  Categorical (GH51645).
   1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
   1508     mapper=curried, na_action=action, convert=self.convert_dtype
   1509 )
   1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1512     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1513     #  See also GH#25959 regarding EA support
   1514     return obj._constructor_expanddim(list(mapped), index=obj.index)

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
    918 if isinstance(arr, ExtensionArray):
    919     return arr.map(mapper, na_action=na_action)
--> 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
   1741 values = arr.astype(object, copy=False)
   1742 if na_action is None:
-> 1743     return lib.map_infer(values, mapper, convert=convert)
   1744 else:
   1745     return lib.map_infer_mask(
   1746         values, mapper, mask=isna(values).view(np.uint8), convert=convert
   1747     )

File lib.pyx:2972, in pandas._libs.lib.map_infer()

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\graphrag\query\indexer_adapters.py:161, in read_indexer_entities.<locals>.<lambda>(x)
    158 # group entities by id and degree and remove duplicated community IDs
    159 nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()
    160 nodes_df["community"] = nodes_df["community"].apply(
--> 161     lambda x: [str(int(i)) for i in x]
    162 )
    163 final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
    164     subset=["id"]
    165 )
    166 # read entity dataframe to knowledge model objects

ValueError: cannot convert float NaN to integer

Additional Information

  • GraphRAG Version: 2.1.0
  • Operating System: Windows 11
  • Python Version: 3.12
  • Related Issues:
@lsukharn lsukharn added bug Something isn't working triage Default label assignment, indicates new issue needs reviewed by a maintainer labels Apr 4, 2025
natoverse (Collaborator) commented

We'll double-check the case where None is passed. In the meantime, try a high number like 6 (most community hierarchies top out at 4 levels deep). Dynamic selection should still perform as usual, and we won't try to grab all reports.
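
Before relying on a cap such as 6, it may be worth checking that it really is at least as deep as the hierarchy in your own index. The sketch below uses a toy communities table and assumes a `level` column like the one in `communities.parquet`. `FALLBACK_LEVEL` is an illustrative name, not a GraphRAG setting.

```python
import pandas as pd

# Toy stand-in for communities.parquet; the "level" column is assumed.
community_df = pd.DataFrame(
    {"community": [0, 1, 2, 3], "level": [0, 1, 1, 2]}
)

FALLBACK_LEVEL = 6  # value suggested in the comment above

deepest_level = int(community_df["level"].max())
kept = community_df[community_df["level"] <= FALLBACK_LEVEL]

# True when the cap does not exclude any community from the hierarchy.
cap_covers_all = (
    FALLBACK_LEVEL >= deepest_level and len(kept) == len(community_df)
)
print(deepest_level, cap_covers_all)
```

In the notebook, the equivalent change would be setting `COMMUNITY_LEVEL = 6` in place of `None` and leaving the rest of the cell unchanged.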

@natoverse natoverse added backlog We've confirmed some action is needed on this and will plan it and removed triage Default label assignment, indicates new issue needs reviewed by a maintainer labels Apr 8, 2025