[Bug]: `ValueError: cannot convert float NaN to integer` when running global search with dynamic selection #1864

lsukharn · 2025-04-04T20:09:05Z

Do you need to file an issue?

I have searched the existing issues and this bug is not already filed.
My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

I ran this notebook on my data: https://github.com/microsoft/graphrag/blob/main/docs/examples_notebooks/global_search_with_dynamic_community_selection.ipynb
and got an error message after:

api_key = os.environ["GRAPHRAG_API_KEY"]
llm_model = os.environ["GRAPHRAG_LLM_MODEL"]
api_base = os.environ["API_BASE_TEST"]
deployment_name = os.environ["GRAPHRAG_LLM_MODEL_DEPLOYMENT_NAME"]

config = LanguageModelConfig(
    api_key=api_key,
    type=ModelType.AzureOpenAIChat,
    api_base=api_base,
    api_version='2025-01-01-preview',
    model=llm_model,
    deployment_name=deployment_name,
    max_retries=20,
)
model = ModelManager().get_or_create_chat_model(
    name="global_search",
    model_type=ModelType.AzureOpenAIChat,
    config=config,
)

token_encoder = tiktoken.encoding_for_model(llm_model)

OUTPUT_DIR = "./graphrag_project/output"
COMMUNITY_REPORT_TABLE = "community_reports"
ENTITY_TABLE = "entities"
COMMUNITY_TABLE = "communities"

# we don't fix a specific community level but instead use an agent to dynamically
# search through all the community reports to check if they are relevant.
COMMUNITY_LEVEL = None

community_df = pd.read_parquet(f"{OUTPUT_DIR}/{COMMUNITY_TABLE}.parquet")
entity_df = pd.read_parquet(f"{OUTPUT_DIR}/{ENTITY_TABLE}.parquet")
report_df = pd.read_parquet(f"{OUTPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")

communities = read_indexer_communities(community_df, report_df)
reports = read_indexer_reports(
    report_df,
    community_df,
    community_level=COMMUNITY_LEVEL,
    dynamic_community_selection=True,
)
entities = read_indexer_entities(
    entity_df, community_df, community_level=COMMUNITY_LEVEL
)

print(f"Total report count: {len(report_df)}")
print(
    f"Report count after filtering by community level {COMMUNITY_LEVEL}: {len(reports)}"
)

report_df.head()

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\graphrag\query\indexer_adapters.py:161, in read_indexer_entities.<locals>.<lambda>(x)
    158 # group entities by id and degree and remove duplicated community IDs
    159 nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()
    160 nodes_df["community"] = nodes_df["community"].apply(
--> 161     lambda x: [str(int(i)) for i in x]
    162 )
    163 final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
    164     subset=["id"]
    165 )
    166 # read entity dataframe to knowledge model objects
166 # read entity dataframe to knowledge model objects

ValueError: cannot convert float NaN to integer

See full error in the logs section.

Steps to reproduce

No response
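
In the absence of exact steps, the failing line from the traceback can probably be reproduced with pandas alone. The sketch below uses made-up data and assumes the NaN comes from an entity that has no community assignment; the `id` and `community` column names follow the traceback, everything else is hypothetical.

```python
import pandas as pd

# Toy stand-in for nodes_df: entity "b" has no community, so its value is NaN.
# (Assumption: this is how NaN ends up in the column; the real cause may differ.)
nodes_df = pd.DataFrame(
    {"id": ["a", "a", "b"], "community": [0.0, 1.0, float("nan")]}
)

# Same grouping step as in the traceback.
grouped = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()

try:
    # Same conversion as in the traceback: int(NaN) is not defined.
    grouped["community"].apply(lambda x: [str(int(i)) for i in x])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If that assumption holds, any entity without a community would be enough to trigger the error.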

Expected Behavior

I should be able to use the "dynamic" part of global search. The script works when I set COMMUNITY_LEVEL = 2, but it fails when COMMUNITY_LEVEL is None.

GraphRAG Config Used

# Paste your config here

Logs and screenshots

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[47], line 12
      5 communities = read_indexer_communities(community_df, report_df)
      6 reports = read_indexer_reports(
      7     report_df,
      8     community_df,
      9     community_level=COMMUNITY_LEVEL,
     10     dynamic_community_selection=True,
     11 )
---> 12 entities = read_indexer_entities(
     13     entity_df, community_df, community_level=COMMUNITY_LEVEL
     14 )
     16 print(f"Total report count: {len(report_df)}")
     17 print(
     18     f"Report count after filtering by community level {COMMUNITY_LEVEL}: {len(reports)}"
     19 )

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\graphrag\query\indexer_adapters.py:160, in read_indexer_entities(final_entities, final_communities, community_level)
    158 # group entities by id and degree and remove duplicated community IDs
    159 nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()
--> 160 nodes_df["community"] = nodes_df["community"].apply(
    161     lambda x: [str(int(i)) for i in x]
    162 )
    163 final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
    164     subset=["id"]
    165 )
    166 # read entity dataframe to knowledge model objects

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
   4789 def apply(
   4790     self,
   4791     func: AggFuncType,
   (...)   4796     **kwargs,
   4797 ) -> DataFrame | Series:
   4798     """
   4799     Invoke function on values of Series.
   4800 
   (...)   4915     dtype: float64
   4916     """
   4917     return SeriesApply(
   4918         self,
   4919         func,
   4920         convert_dtype=convert_dtype,
   4921         by_row=by_row,
   4922         args=args,
   4923         kwargs=kwargs,
-> 4924     ).apply()

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\apply.py:1427, in SeriesApply.apply(self)
   1424     return self.apply_compat()
   1426 # self.func is Callable
-> 1427 return self.apply_standard()

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\apply.py:1507, in SeriesApply.apply_standard(self)
   1501 # row-wise access
   1502 # apply doesn't have a `na_action` keyword and for backward compat reasons
   1503 # we need to give `na_action="ignore"` for categorical data.
   1504 # TODO: remove the `na_action="ignore"` when that default has been changed in
   1505 #  Categorical (GH51645).
   1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
   1508     mapper=curried, na_action=action, convert=self.convert_dtype
   1509 )
   1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1512     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1513     #  See also GH#25959 regarding EA support
   1514     return obj._constructor_expanddim(list(mapped), index=obj.index)

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
    918 if isinstance(arr, ExtensionArray):
    919     return arr.map(mapper, na_action=na_action)
--> 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
   1741 values = arr.astype(object, copy=False)
   1742 if na_action is None:
-> 1743     return lib.map_infer(values, mapper, convert=convert)
   1744 else:
   1745     return lib.map_infer_mask(
   1746         values, mapper, mask=isna(values).view(np.uint8), convert=convert
   1747     )

File lib.pyx:2972, in pandas._libs.lib.map_infer()

File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\graphrag\query\indexer_adapters.py:161, in read_indexer_entities.<locals>.<lambda>(x)
    158 # group entities by id and degree and remove duplicated community IDs
    159 nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()
    160 nodes_df["community"] = nodes_df["community"].apply(
--> 161     lambda x: [str(int(i)) for i in x]
    162 )
    163 final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
    164     subset=["id"]
    165 )
    166 # read entity dataframe to knowledge model objects

ValueError: cannot convert float NaN to integer
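
The last frame shows `int()` being applied to every member of the grouped community set, including NaN. A NaN-tolerant version of that conversion might look like the sketch below. `community_ids_as_strings` is a hypothetical helper for illustration, not a function in graphrag, and skipping NaN entries is only one possible policy.

```python
import pandas as pd


def community_ids_as_strings(values):
    """Convert community IDs to strings, skipping missing (NaN/None) entries."""
    return sorted(str(int(v)) for v in values if pd.notna(v))


print(community_ids_as_strings({0.0, 1.0, float("nan")}))
```

An entity whose only community value is NaN ends up with an empty list rather than raising.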

Additional Information

GraphRAG Version: 2.1.0
Operating System: Windows 11
Python Version: 3.12
Related Issues:


natoverse · 2025-04-08T19:56:01Z

We'll double-check the case where None is sent. In the meantime, try a high number like 6 (most community hierarchies top out at 4 levels deep). Dynamic selection should still perform as usual and we won't try to grab all reports.
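
One possible reading of why a numeric level avoids the error, as a pandas-only sketch: if the level filter is a plain `level <= COMMUNITY_LEVEL` comparison (an assumption, not confirmed in this thread), rows whose level is NaN compare as False and are dropped before the `int()` conversion runs. The data and the `level` column here are made up for illustration.

```python
import pandas as pd

COMMUNITY_LEVEL = 6  # deliberately higher than any level in the toy data

# Entity "b" has no community, so both its community and level are NaN.
nodes_df = pd.DataFrame(
    {
        "id": ["a", "a", "b"],
        "community": [0.0, 3.0, float("nan")],
        "level": [0.0, 1.0, float("nan")],
    }
)

# NaN <= 6 evaluates to False, so the NaN row is filtered out here.
filtered = nodes_df[nodes_df["level"] <= COMMUNITY_LEVEL]

grouped = filtered.groupby(["id"]).agg({"community": set}).reset_index()
grouped["community"] = grouped["community"].apply(
    lambda x: sorted(str(int(i)) for i in x)
)
print(grouped)
```

Note that under this assumption entity "b" disappears from the result entirely, which may or may not be acceptable for a given dataset.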

lsukharn added the bug and triage labels · Apr 4, 2025

natoverse added the backlog label and removed the triage label · Apr 8, 2025
