DarthMax
Contributor

@DarthMax DarthMax commented Aug 27, 2025

ref GDSA-75


netlify bot commented Aug 27, 2025

Deploy Preview for neo4j-graph-data-science-client canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | c24762a |
| 🔍 Latest deploy log | https://app.netlify.com/projects/neo4j-graph-data-science-client/deploys/68b0b7f323e2f300080f6b48 |

@DarthMax DarthMax mentioned this pull request Aug 28, 2025
@Mats-SX Mats-SX self-assigned this Aug 28, 2025
Contributor

@Mats-SX Mats-SX left a comment

gonna continue after lunch

Contributor

@Mats-SX Mats-SX left a comment

will continue later

start_nodes : list of int, optional
A list of node IDs to start the random walk from. If not provided, all
nodes are used as potential starting points.
restart_probability : float, optional
Contributor

Why is it `float, optional` here but `Optional[List[str]], default=None` below?
Shouldn't it be `Optional[float], default=0.44` (or whatever the default value is)?

Contributor Author

We do not actually specify the GDS-defined default value here. `None` essentially means that the default value defined by GDS will be used. I made that decision to avoid differences between GDS and the API.

Contributor

I think we should include all default values or none. It is strange to me that we include some, such as `None` in `default=None`, but not others, such as in `float, optional`.

Contributor Author

Ah, the `None` is there so that Python allows you to omit this parameter. It has nothing to do with documenting an actual GDS default. The default is set on the GDS side.
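
The convention described in this thread can be sketched roughly as follows. This is a minimal illustration and not the client's actual code: `drop_unset` and `random_walk_config` are hypothetical names, and only the parameter names `start_nodes` and `restart_probability` come from the docstring under review. A parameter left at `None` is removed from the config, so the server side is free to apply its own default.

```python
from typing import Any, Dict, List, Optional


def drop_unset(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config without the entries whose value is None."""
    return {key: value for key, value in config.items() if value is not None}


def random_walk_config(
    start_nodes: Optional[List[int]] = None,
    restart_probability: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the config to send (hypothetical helper, for illustration only).

    Parameters
    ----------
    start_nodes : list of int, optional
        Node IDs to start from. If omitted, the server-side default applies.
    restart_probability : float, optional
        If omitted, the server-side default applies.
    """
    # None is only a marker for "not provided"; it is never sent onwards.
    return drop_unset(
        {
            "start_nodes": start_nodes,
            "restart_probability": restart_probability,
        }
    )


print(random_walk_config())  # {}
print(random_walk_config(restart_probability=0.2))  # {'restart_probability': 0.2}
```

Under this assumption the signature's `= None` only makes the argument omittable in Python, which is why the docstring can say `float, optional` without naming a concrete default.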

Contributor

@Mats-SX Mats-SX left a comment

OK, I will stop commenting on the documentation. I think we should copy the content as close to verbatim as we can from the GDS Manual. I started doing that, but it is inefficient to do as review comments. I will push a commit instead.

Contributor

@Mats-SX Mats-SX left a comment

OK, I immediately got stuck on the documentation deviation. Frankly, some of the doc content here was plain wrong. Most of it was correct, but reworded from the manual.

I pushed some commits to fix the docstrings to my liking. Happy to receive your thoughts on them, too. The actual code looks good, except for the odd mix of camelCase and snake_case in the config converter, but that could be correct.

Comment on lines +49 to +51
graph_name=graph_name,
from_graph_name=G.name(),
config=config,
Contributor

I guess it doesn't matter functionally here, but we could try to be consistent if we like

Suggested change
- graph_name=graph_name,
- from_graph_name=G.name(),
- config=config,
+ graphName=graph_name,
+ fromGraphName=G.name(),
+ config=config,
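
One way both spellings could end up functionally equivalent is if the converter normalizes keys to camelCase before sending them. The following is a minimal sketch under that assumption; `to_camel_case` and `convert_config` are hypothetical names and do not claim to match the client's converter.

```python
import re
from typing import Any, Dict


def to_camel_case(name: str) -> str:
    """Convert snake_case to camelCase; names without underscores are unchanged."""
    return re.sub(r"_([a-zA-Z0-9])", lambda match: match.group(1).upper(), name)


def convert_config(**kwargs: Any) -> Dict[str, Any]:
    """Normalize keyword names to camelCase and drop entries that are None."""
    return {
        to_camel_case(key): value
        for key, value in kwargs.items()
        if value is not None
    }


# Both spellings yield the same keys under this assumption.
snake = convert_config(graph_name="sample", from_graph_name="g", config=None)
camel = convert_config(graphName="sample", fromGraphName="g", config=None)
print(snake == camel)  # True
```

If the converter behaves like this, the choice between `graph_name=` and `graphName=` at the call site is a matter of consistency rather than correctness.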

  node_label_stratification : bool, optional
- If True, the algorithm tries to preserve the label distribution of the original graph in the sampled graph.
+ If true, preserves the node label distribution of the original graph.
+ Default is False.
Contributor Author

I do not like specifying the default values here, for several reasons:
- This is a lot of work to get right, so it is error-prone.
- These values can change, which risks inconsistency.
- We do not specify default values anywhere so far.

Contributor Author

I will take your commits and push them to another branch.
So far we have treated the docs as a draft, knowing that they need a structured overhaul.
I like most of your suggestions, but this PR is not a good place to start changing things midway, imho. This needs to be a separate task.

Contributor

@Mats-SX Mats-SX Sep 2, 2025

Okay, but then maybe we should have no docs at all. Having 50% correct docs doesn't help very much, since we need to come back later anyway to fix them.

@DarthMax DarthMax force-pushed the sampling_endpoints branch from 1c6b3d0 to e197129 Compare August 28, 2025 19:48
@DarthMax DarthMax force-pushed the sampling_endpoints branch from ee3e02f to c5e00b4 Compare August 28, 2025 20:05
@DarthMax DarthMax mentioned this pull request Aug 28, 2025
Contributor Author

DarthMax commented Aug 28, 2025

I moved your doc improvements to a new PR (#944); let's discuss them there.

@DarthMax DarthMax merged commit 6902aea into main Aug 28, 2025
8 checks passed
@DarthMax DarthMax deleted the sampling_endpoints branch August 28, 2025 20:32