I have searched the existing issues and this bug is not already filed.
My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
I ran this notebook on my data: https://github.com/microsoft/graphrag/blob/main/docs/examples_notebooks/global_search_with_dynamic_community_selection.ipynb
and got an error message after:
api_key = os.environ["GRAPHRAG_API_KEY"]
llm_model = os.environ["GRAPHRAG_LLM_MODEL"]
api_base = os.environ["API_BASE_TEST"]
deployment_name = os.environ["GRAPHRAG_LLM_MODEL_DEPLOYMENT_NAME"]
config = LanguageModelConfig(
    api_key=api_key,
    type=ModelType.AzureOpenAIChat,
    api_base=api_base,
    api_version='2025-01-01-preview',
    model=llm_model,
    deployment_name=deployment_name,
    max_retries=20,
)
model = ModelManager().get_or_create_chat_model(
    name="global_search",
    model_type=ModelType.AzureOpenAIChat,
    config=config,
)
token_encoder = tiktoken.encoding_for_model(llm_model)
OUTPUT_DIR = "./graphrag_project/output"
COMMUNITY_REPORT_TABLE = "community_reports"
ENTITY_TABLE = "entities"
COMMUNITY_TABLE = "communities"
# we don't fix a specific community level but instead use an agent to dynamically
# search through all the community reports to check if they are relevant.
COMMUNITY_LEVEL = None
community_df = pd.read_parquet(f"{OUTPUT_DIR}/{COMMUNITY_TABLE}.parquet")
entity_df = pd.read_parquet(f"{OUTPUT_DIR}/{ENTITY_TABLE}.parquet")
report_df = pd.read_parquet(f"{OUTPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")
communities = read_indexer_communities(community_df, report_df)
reports = read_indexer_reports(
    report_df,
    community_df,
    community_level=COMMUNITY_LEVEL,
    dynamic_community_selection=True,
)
entities = read_indexer_entities(
    entity_df, community_df, community_level=COMMUNITY_LEVEL
)
print(f"Total report count: {len(report_df)}")
print(
    f"Report count after filtering by community level {COMMUNITY_LEVEL}: {len(reports)}"
)
report_df.head()
File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\graphrag\query\indexer_adapters.py:161, in read_indexer_entities.<locals>.<lambda>(x)
158 # group entities by id and degree and remove duplicated community IDs
159 nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()
160 nodes_df["community"] = nodes_df["community"].apply(
--> 161 lambda x: [str(int(i)) for i in x]
162 )
163 final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
164 subset=["id"]
165 )
166 # read entity dataframe to knowledge model objects
ValueError: cannot convert float NaN to integer
See full error in the logs section.
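For anyone trying to reproduce this without the full pipeline, here is a minimal sketch of the failing conversion on toy data. It assumes (this is not confirmed in the thread) that the NaN comes from an entity that belongs to no community. The `to_str_ids` helper is hypothetical and only illustrates one NaN-tolerant alternative; it is not the library's fix.

```python
import pandas as pd

# Toy data (hypothetical): entity "b" has no community, so its value is NaN.
nodes_df = pd.DataFrame(
    {"id": ["a", "a", "b"], "community": [0.0, 1.0, float("nan")]}
)

# Same grouping step as shown in the traceback.
grouped = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()


def to_str_ids(values):
    """Hypothetical helper: stringify community IDs, skipping missing values."""
    return sorted(str(int(v)) for v in values if not pd.isna(v))


# The original lambda fails as soon as it meets a NaN.
try:
    grouped["community"].apply(lambda x: [str(int(i)) for i in x])
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

# The NaN-tolerant variant yields an empty list for the orphan entity.
safe = grouped["community"].apply(to_str_ids).tolist()
print(raised, message)
print(safe)
```

Whether skipping the NaN or mapping it to a sentinel value is the right behavior depends on how downstream code treats entities without a community, which this sketch does not address.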
Steps to reproduce
No response
Expected Behavior
I should be able to use the "dynamic" part of global search. The script works when I set COMMUNITY_LEVEL = 2, but it fails when COMMUNITY_LEVEL is None.
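One possible explanation for the difference between the two settings, offered as a guess rather than a confirmed diagnosis: a left join of entities against community membership leaves NaN for entities in no community, and a numeric level filter silently drops those rows while `None` skips the filter. The sketch below uses toy tables; the `level` and `entity_ids` column names are assumptions about the communities table, and the join is a simplified stand-in for whatever `read_indexer_entities` does internally.

```python
import pandas as pd

# Toy stand-ins for the parquet tables. Column names other than "id" and
# "community" are assumptions, not taken from the issue.
entity_df = pd.DataFrame({"id": ["e1", "e2", "e3"]})
community_df = pd.DataFrame(
    {
        "community": [0, 1],
        "level": [0, 1],
        "entity_ids": [["e1"], ["e1", "e2"]],
    }
)

# One row per (community, entity) pair.
membership = community_df.explode("entity_ids").rename(columns={"entity_ids": "id"})

# Left join keeps every entity; "e3" gets NaN for community and level.
joined = entity_df.merge(membership, on="id", how="left")
orphans = joined.loc[joined["community"].isna(), "id"].tolist()

# A numeric level filter drops NaN rows, because NaN <= 2 evaluates to False.
filtered = joined[joined["level"] <= 2]

print("entities without a community:", orphans)
print("NaN left after level filter:", bool(filtered["community"].isna().any()))
```

Running the same check against the real `entities.parquet` and `communities.parquet` would show whether orphan entities are present in the reporter's data.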
GraphRAG Config Used
# Paste your config here
Logs and screenshots
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[47], line 12
5 communities = read_indexer_communities(community_df, report_df)
6 reports = read_indexer_reports(
7 report_df,
8 community_df,
9 community_level=COMMUNITY_LEVEL,
10 dynamic_community_selection=True,
11 )
---> 12 entities = read_indexer_entities(
13 entity_df, community_df, community_level=COMMUNITY_LEVEL
14 )
16 print(f"Total report count: {len(report_df)}")
17 print(
18 f"Report count after filtering by community level {COMMUNITY_LEVEL}: {len(reports)}"
19 )
File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\graphrag\query\indexer_adapters.py:160, in read_indexer_entities(final_entities, final_communities, community_level)
158 # group entities by id and degree and remove duplicated community IDs
159 nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()
--> 160 nodes_df["community"] = nodes_df["community"].apply(
161 lambda x: [str(int(i)) for i in x]
162 )
163 final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
164 subset=["id"]
165 )
166 # read entity dataframe to knowledge model objects
File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
4789 def apply(
4790 self,
4791 func: AggFuncType,
(...) 4796 **kwargs,
4797 ) -> DataFrame | Series:
4798 """
4799 Invoke function on values of Series.
4800
(...) 4915 dtype: float64
4916 """
4917 return SeriesApply(
4918 self,
4919 func,
4920 convert_dtype=convert_dtype,
4921 by_row=by_row,
4922 args=args,
4923 kwargs=kwargs,
-> 4924 ).apply()
File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\apply.py:1427, in SeriesApply.apply(self)
1424 return self.apply_compat()
1426 # self.func is Callable
-> 1427 return self.apply_standard()
File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\apply.py:1507, in SeriesApply.apply_standard(self)
1501 # row-wise access
1502 # apply doesn't have a `na_action` keyword and for backward compat reasons
1503 # we need to give `na_action="ignore"` for categorical data.
1504 # TODO: remove the `na_action="ignore"` when that default has been changed in
1505 # Categorical (GH51645).
1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
1508 mapper=curried, na_action=action, convert=self.convert_dtype
1509 )
1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
1512 # GH#43986 Need to do list(mapped) in order to get treated as nested
1513 # See also GH#25959 regarding EA support
1514 return obj._constructor_expanddim(list(mapped), index=obj.index)
File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
918 if isinstance(arr, ExtensionArray):
919 return arr.map(mapper, na_action=na_action)
--> 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\pandas\core\algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
1741 values = arr.astype(object, copy=False)
1742 if na_action is None:
-> 1743 return lib.map_infer(values, mapper, convert=convert)
1744 else:
1745 return lib.map_infer_mask(
1746 values, mapper, mask=isna(values).view(np.uint8), convert=convert
1747 )
File lib.pyx:2972, in pandas._libs.lib.map_infer()
File ~\Desktop\graphrag_repo\.venv\Lib\site-packages\graphrag\query\indexer_adapters.py:161, in read_indexer_entities.<locals>.<lambda>(x)
158 # group entities by id and degree and remove duplicated community IDs
159 nodes_df = nodes_df.groupby(["id"]).agg({"community": set}).reset_index()
160 nodes_df["community"] = nodes_df["community"].apply(
--> 161 lambda x: [str(int(i)) for i in x]
162 )
163 final_df = nodes_df.merge(entities_df, on="id", how="inner").drop_duplicates(
164 subset=["id"]
165 )
166 # read entity dataframe to knowledge model objects
ValueError: cannot convert float NaN to integer
Additional Information
GraphRAG Version: 2.1.0
Operating System: Windows 11
Python Version: 3.12
Related Issues:
lsukharn added the bug and triage labels on Apr 4, 2025.
We'll double-check the case where None is passed. In the meantime, try a high number like 6 (most community hierarchies top out at 4 levels deep). Dynamic selection should still perform as usual and we won't try to grab all reports.
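A quick way to sanity-check the suggested workaround is to compare the deepest level in the communities table against the chosen value. This is a sketch on toy data; the `level` column name is an assumption about the communities table, and in the notebook `community_df` would come from `pd.read_parquet`.

```python
import pandas as pd

# Toy stand-in for the communities table (hypothetical values).
community_df = pd.DataFrame({"community": [0, 1, 2], "level": [0, 1, 2]})

# Workaround from the comment above: a level higher than any in the hierarchy.
COMMUNITY_LEVEL = 6

max_level = int(community_df["level"].max())
covers_all = max_level <= COMMUNITY_LEVEL
print(f"deepest level: {max_level}, covered by COMMUNITY_LEVEL={COMMUNITY_LEVEL}: {covers_all}")
```

If `covers_all` is false for a given dataset, a larger value would be needed for the workaround to include every level.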
natoverse added the backlog label and removed the triage label on Apr 8, 2025.