-
Notifications
You must be signed in to change notification settings - Fork 54
AV2 - graph.sample endpoints #942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for neo4j-graph-data-science-client canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gonna continue after lunch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will continue later
start_nodes : list of int, optional | ||
A list of node IDs to start the random walk from. If not provided, all | ||
nodes are used as potential starting points. | ||
restart_probability : float, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it float, optional
here but Optional[List[str]], default=None
below?
shouldn't it be Optional[float], default=0.44
(or whatever the default value is)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not actually specify the GDS defined default value here. None essentially means that the default value defined by GDS will be used. That was a decision I made in order to avoid differences between GDS and the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should include all default values, or no default values. it is strange to me that we include some default values, such as None
in default=None
but we don't for others, such as float, optional
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah the none is there, so that Python allows you to omit this parameter. It has nothing to do with documenting an actual GDS default. The default will be set on the GDS side
graphdatascience/procedure_surface/api/graph_sampling_endpoints.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I will stop commenting on the documentation. I think we should copy the content as close as verbatim as we can from the GDS Manual. I started doing it but it is inefficient to do as review comments. I will push a commit instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I immediately got stuck on the documentation deviation. Frankly, some of the doc content here was plain wrong. Most of it was correct, but reworded from the manual.
I pushed some commits to fix the doc strings according to my liking. Happy to receive your thoughts on it, too. The actual code looks good, except the odd mix of camelCase and snake_case in the config converter ? but it could be correct.
graphdatascience/procedure_surface/api/graph_sampling_endpoints.py
Outdated
Show resolved
Hide resolved
graphdatascience/procedure_surface/arrow/graph_sampling_arrow_endpoints.py
Show resolved
Hide resolved
graph_name=graph_name, | ||
from_graph_name=G.name(), | ||
config=config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it doesn't matter functionally here, but we could try to be consistent if we like
graph_name=graph_name, | |
from_graph_name=G.name(), | |
config=config, | |
graphName=graph_name, | |
fromGraphName=G.name(), | |
config=config, |
...tascience/tests/integrationV2/procedure_surface/arrow/test_graph_sampling_arrow_endpoints.py
Show resolved
Hide resolved
...science/tests/integrationV2/procedure_surface/cypher/test_graph_sampling_cypher_endpoints.py
Show resolved
Hide resolved
node_label_stratification : bool, optional | ||
If True, the algorithm tries to preserve the label distribution of the original graph in the sampled graph. | ||
If true, preserves the node label distribution of the original graph. | ||
Default is False. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not like specifying the default values here for several reasons.
This is a lot of work to get right -> error prone
These values can change -> risking the chance of inconsistency
We do not have default values specified anywhere so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will take your commits and push them to another branch.
So far we treated the docs as draft, knowing that they need a structured overhaul.
I like most of your suggetions but starting to change things here in the middle is not a good place imho, this needs to be a separate task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, but then maybe we should have no docs at all. having 50% correct docs doesn't help very much, since we need to come back later anyway to fix it.
1c6b3d0
to
e197129
Compare
ee3e02f
to
c5e00b4
Compare
I moved your doc improvements to a new PR, let's discuss them there |
ref GDSA-75