AV2 - graph.sample endpoints #942

DarthMax · 2025-08-27T21:13:59Z

ref GDSA-75

netlify · 2025-08-27T21:14:05Z

✅ Deploy Preview for neo4j-graph-data-science-client canceled.

Name	Link
🔨 Latest commit	`c24762a`
🔍 Latest deploy log	https://app.netlify.com/projects/neo4j-graph-data-science-client/deploys/68b0b7f323e2f300080f6b48

Mats-SX

gonna continue after lunch

graphdatascience/procedure_surface/api/graph_sampling_endpoints.py

Mats-SX

will continue later

graphdatascience/procedure_surface/api/graph_sampling_endpoints.py

Mats-SX · 2025-08-28T11:42:04Z

graphdatascience/procedure_surface/api/graph_sampling_endpoints.py

+        start_nodes : list of int, optional
+            A list of node IDs to start the random walk from. If not provided, all
+            nodes are used as potential starting points.
+        restart_probability : float, optional


Why is it `float, optional` here but `Optional[List[str]], default=None` below?
Shouldn't it be `Optional[float], default=0.44` (or whatever the default value is)?

We do not actually specify the GDS-defined default value here. `None` means that the default value defined by GDS will be used. I made that decision to avoid discrepancies between GDS and the API.

I think we should include either all default values or none. It is strange to me that we include some default values, such as `None` in `default=None`, but not others, such as `float, optional`.

Ah, the `None` is there so that Python allows you to omit this parameter. It has nothing to do with documenting an actual GDS default. The default is set on the GDS side.
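The convention described in this thread can be illustrated with a minimal sketch, assuming optional arguments left as `None` are simply omitted from the configuration sent to GDS, so the server applies its own default. The helpers `build_config` and `random_walk_config` are hypothetical and not part of the client's actual API.

```python
from typing import Any, Optional


def build_config(**params: Any) -> dict[str, Any]:
    # Hypothetical helper: keep only the arguments the caller actually set.
    # `is not None` (rather than truthiness) keeps explicit 0, 0.0 and False values.
    return {key: value for key, value in params.items() if value is not None}


def random_walk_config(
    start_nodes: Optional[list[int]] = None,
    restart_probability: Optional[float] = None,
) -> dict[str, Any]:
    # `= None` only makes the argument optional in Python; it is not the GDS default.
    # Omitted keys let the GDS server fall back to its own documented default.
    return build_config(start_nodes=start_nodes, restart_probability=restart_probability)


print(random_walk_config())                         # {}
print(random_walk_config(restart_probability=0.2))  # {'restart_probability': 0.2}
```

With this design the client never has to mirror GDS default values; the trade-off is that the docstring cannot state the effective default without duplicating it.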

graphdatascience/procedure_surface/api/graph_sampling_endpoints.py

Mats-SX

OK, I will stop commenting on the documentation. I think we should copy the content from the GDS Manual as close to verbatim as we can. I started doing it, but it is inefficient to do as review comments. I will push a commit instead.

graphdatascience/procedure_surface/api/graph_sampling_endpoints.py

Mats-SX

OK I immediately got stuck on the documentation deviation. Frankly, some of the doc content here was plain wrong. Most of it was correct, but reworded from the manual.

I pushed some commits to fix the docstrings to my liking. Happy to receive your thoughts on them, too. The actual code looks good, except for the odd mix of camelCase and snake_case in the config converter, though it could be correct.
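For context on the camelCase/snake_case remark: GDS procedure configurations use camelCase keys (for example `samplingRatio` or `nodeLabelStratification`), while the Python endpoints take snake_case arguments. A converter at that boundary could look like the following sketch; `to_camel_case` and `convert_config` are hypothetical names, not the client's actual implementation.

```python
from typing import Any


def to_camel_case(name: str) -> str:
    # Hypothetical helper: "node_label_stratification" -> "nodeLabelStratification".
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def convert_config(config: dict[str, Any]) -> dict[str, Any]:
    # Rename every snake_case key to the camelCase form GDS expects; values are untouched.
    return {to_camel_case(key): value for key, value in config.items()}


print(convert_config({"sampling_ratio": 0.5, "node_label_stratification": True}))
# {'samplingRatio': 0.5, 'nodeLabelStratification': True}
```

Converting once at this boundary keeps the whole Python surface in snake_case while the payload sent to GDS stays in camelCase.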

graphdatascience/procedure_surface/api/graph_sampling_endpoints.py

graphdatascience/procedure_surface/arrow/graph_sampling_arrow_endpoints.py

Mats-SX · 2025-08-28T15:29:31Z

graphdatascience/procedure_surface/cypher/graph_sampling_cypher_endpoints.py

+            graph_name=graph_name,
+            from_graph_name=G.name(),
+            config=config,


I guess it doesn't matter functionally here, but we could try to be consistent if we like.

Suggested change

-            graph_name=graph_name,
-            from_graph_name=G.name(),
-            config=config,
+            graphName=graph_name,
+            fromGraphName=G.name(),
+            config=config,
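The suggestion only affects naming, which is why it makes no functional difference. A sketch, assuming the keyword arguments end up as Cypher query parameters whose names must match the `$` placeholders in the query; `build_rwr_call` is a hypothetical helper, and the query targets the `gds.graph.sample.rwr` procedure.

```python
import re
from typing import Any


def build_rwr_call(
    graph_name: str, from_graph_name: str, config: dict[str, Any]
) -> tuple[str, dict[str, Any]]:
    # Hypothetical helper: camelCase parameter keys mirror the GDS procedure's argument names.
    query = "CALL gds.graph.sample.rwr($graphName, $fromGraphName, $config)"
    params = {"graphName": graph_name, "fromGraphName": from_graph_name, "config": config}
    return query, params


query, params = build_rwr_call("sampled", "original", {"samplingRatio": 0.2})
# Every $placeholder in the query must have a matching key in the parameter map.
assert set(re.findall(r"\$(\w+)", query)) == set(params)
```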

...tascience/tests/integrationV2/procedure_surface/arrow/test_graph_sampling_arrow_endpoints.py

...science/tests/integrationV2/procedure_surface/cypher/test_graph_sampling_cypher_endpoints.py

DarthMax · 2025-08-28T19:44:45Z

graphdatascience/procedure_surface/api/graph_sampling_endpoints.py

        node_label_stratification : bool, optional
-            If True, the algorithm tries to preserve the label distribution of the original graph in the sampled graph.
+            If true, preserves the node label distribution of the original graph.
+            Default is False.


I do not like specifying the default values here, for several reasons:
- It is a lot of work to get right, which makes it error-prone.
- These values can change, which risks inconsistency.
- We have not specified default values anywhere so far.

I will take your commits and push them to another branch.
So far we have treated the docs as a draft, knowing that they need a structured overhaul.
I like most of your suggestions, but the middle of this PR is not a good place to start changing things, imho; this needs to be a separate task.

Okay, but then maybe we should have no docs at all. Having 50% correct docs doesn't help very much, since we need to come back later to fix them anyway.

DarthMax · 2025-08-28T20:06:53Z

I moved your doc improvements to a new PR, let's discuss them there
#944

DarthMax added 5 commits August 27, 2025 22:21

Add GraphSamplingEndpoints

2a4f61c

Implement GraphSamplingArrowEndpoints

f051ff6

Use snake case names for sampling arguments

470bb31

Implement cypher sampling endpoints

d647c45

Expose sampling endpoints in catalog endpoints

3895d59

Fix tests and code style

e197129

DarthMax mentioned this pull request Aug 28, 2025

AV2 - Cypher Catalog #943

Merged

Mats-SX self-assigned this Aug 28, 2025

Mats-SX reviewed Aug 28, 2025


graphdatascience/procedure_surface/api/graph_sampling_endpoints.py (resolved)

graphdatascience/procedure_surface/api/graph_sampling_endpoints.py (resolved)

graphdatascience/procedure_surface/api/graph_sampling_endpoints.py (resolved)

Mats-SX reviewed Aug 28, 2025


DarthMax commented Aug 28, 2025


DarthMax force-pushed the sampling_endpoints branch from 1c6b3d0 to e197129 on August 28, 2025 19:48

Minor cleanups

c5e00b4

DarthMax force-pushed the sampling_endpoints branch from ee3e02f to c5e00b4 on August 28, 2025 20:05

DarthMax mentioned this pull request Aug 28, 2025

Improve docstrings #944

Merged

Fix tests

c24762a

DarthMax merged commit 6902aea into main Aug 28, 2025
8 checks passed

DarthMax deleted the sampling_endpoints branch August 28, 2025 20:32
