Commit 122c599

Update docstring for CNARW
- Content copied as much as possible from GDS Manual
- Add default values
1 parent c0ad179 commit 122c599

1 file changed (+32 -29 lines)

graphdatascience/procedure_surface/api/graph_sampling_endpoints.py

Lines changed: 32 additions & 29 deletions
@@ -41,8 +41,7 @@ def rwr(
         Parameters
         ----------
         G : GraphV2
-            The input graph on which the Random Walk with Restart (RWR) will be
-            performed.
+            The input graph to be sampled.
         graph_name : str
             The name of the new graph that is stored in the graph catalog.
         start_nodes : list of int, optional
@@ -106,55 +105,59 @@ def cnarw(
         job_id: Optional[str] = None,
     ) -> GraphWithSamplingResult:
         """
-        Computes a set of Random Walks with Restart (RWR) for the given graph and stores the result as a new graph in the catalog.
+        Common Neighbour Aware Random Walk (CNARW) samples the graph by taking random walks from a set of start nodes
 
-        This method performs a random walk, beginning from a set of nodes (if provided),
-        where at each step there is a probability to restart back at the original nodes.
-        The result is turned into a new graph induced by the random walks and stored in the catalog.
+        CNARW is a graph sampling technique that involves optimizing the selection of the next-hop node. It takes into
+        account the number of common neighbours between the current node and the next-hop candidates. On each step of a
+        random walk, there is a probability that the walk stops, and a new walk from one of the start nodes starts
+        instead (i.e. the walk restarts). Each node visited on these walks will be part of the sampled subgraph. The
+        resulting subgraph is stored as a new graph in the Graph Catalog.
 
         Parameters
         ----------
         G : GraphV2
-            The input graph on which the Random Walk with Restart (RWR) will be
-            performed.
+            The input graph to be sampled.
         graph_name : str
-            The name of the new graph in the catalog.
+            The name of the new graph that is stored in the graph catalog.
         start_nodes : list of int, optional
-            A list of node IDs to start the random walk from. If not provided, all
-            nodes are used as potential starting points.
+            IDs of the initial set of nodes in the original graph from which the sampling random walks will start.
+            By default, a single node is chosen uniformly at random.
         restart_probability : float, optional
-            The probability of restarting back to the original node at each step.
-            Should be a value between 0 and 1. If not specified, a default value is used.
+            The probability that a sampling random walk restarts from one of the start nodes.
+            Default is 0.1.
         sampling_ratio : float, optional
-            The ratio of nodes to sample during the computation. This value should
-            be between 0 and 1. If not specified, no sampling is performed.
+            The fraction of nodes in the original graph to be sampled.
+            Default is 0.15.
         node_label_stratification : bool, optional
-            If True, the algorithm tries to preserve the label distribution of the original graph in the sampled graph.
+            If true, preserves the node label distribution of the original graph.
+            Default is False.
         relationship_weight_property : str, optional
-            The name of the property on relationships to use as weights during
-            the random walk. If not specified, the relationships are treated as
-            unweighted.
+            Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted.
         relationship_types : list of str, optional
-            The relationship types used to select relationships for this algorithm run.
+            Filter the named graph using the given relationship types. Relationships with any of the given types will be
+            included.
         node_labels : list of str, optional
-            The node labels used to select nodes for this algorithm run.
+            Filter the named graph using the given node labels. Nodes with any of the given labels will be included.
         sudo : bool, optional
-            Override memory estimation limits. Use with caution as this can lead to
-            memory issues if the estimation is significantly wrong.
+            Bypass heap control. Use with caution.
+            Default is False.
         log_progress : bool, optional
-            If True, logs the progress of the computation.
+            Turn `on/off` percentage logging while running procedure.
+            Default is True.
         username : str, optional
-            The username to attribute the procedure run to
+            Use Administrator access to run an algorithm on a graph owned by another user.
+            Default is None.
         concurrency : int, optional
-            The number of concurrent threads used for the algorithm execution.
+            The number of concurrent threads used for running the algorithm.
+            Default is 4.
         job_id : str, optional
-            An identifier for the job that can be used for monitoring and cancellation
+            An ID that can be provided to more easily track the algorithm’s progress.
+            By default, a random job id is generated.
 
         Returns
         -------
         GraphSamplingResult
-            Tuple of the graph object and the result of the Random Walk with Restart (RWR), including the sampled
-            nodes and their scores.
+            Tuple of the graph object and the result of the Common Neighbour Aware Random Walk (CNARW), including the dimensions of the sampled graph.
         """
         pass
 
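The new docstring says CNARW chooses the next hop by taking into account the common neighbours of the current node and each candidate, but it does not give the rule. The sketch below assumes the weighting described in the paper that introduced CNARW: a step from `u` to neighbour `v` gets weight `1 - |N(u) ∩ N(v)| / min(deg(u), deg(v))`, normalised over all neighbours of `u`. The adjacency-set graph and the function names are hypothetical; this is not the GDS implementation, and it ignores relationship weights.

```python
from typing import Dict, Set

# Hypothetical representation: an undirected simple graph (no self-loops) as adjacency sets.
Graph = Dict[int, Set[int]]


def cnarw_weight(graph: Graph, u: int, v: int) -> float:
    """Unnormalised weight of stepping from u to its neighbour v.

    Assumption: the rule from the original CNARW paper,
    1 - |N(u) & N(v)| / min(deg(u), deg(v)).
    The fewer neighbours v shares with u, the more attractive v is.
    Without self-loops the shared count is at most min(deg) - 1, so the weight stays positive.
    """
    common = len(graph[u] & graph[v])
    return 1.0 - common / min(len(graph[u]), len(graph[v]))


def transition_probabilities(graph: Graph, u: int) -> Dict[int, float]:
    """Normalise the weights of all neighbours of u into a probability distribution."""
    weights = {v: cnarw_weight(graph, u, v) for v in graph[u]}
    if not weights:
        return {}  # dead end: the caller has to restart the walk
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
g: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(transition_probabilities(g, 2))
```

From node 2, node 3 (the only way out of the triangle) gets probability 0.5, while nodes 0 and 1 get 0.25 each. That is the intended effect of common-neighbour awareness: the walk is nudged out of tightly clustered regions instead of circling inside them.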

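The restart behaviour and the defaults documented in the new docstring (`restart_probability=0.1`, `sampling_ratio=0.15`, and a single start node chosen uniformly at random) can be sketched as a loop that keeps walking until enough distinct nodes have been visited. Several details are assumptions rather than documented GDS behaviour: the target size `ceil(sampling_ratio * node_count)`, restarting on dead ends, the hypothetical `seed` and `max_steps` parameters, and counting every relationship between two visited nodes as part of the sample.

```python
import math
import random
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: an undirected simple graph (no self-loops) as adjacency sets.
Graph = Dict[int, Set[int]]


def _next_hop(graph: Graph, u: int, rng: random.Random) -> int:
    # Assumed CNARW weighting: neighbours sharing fewer neighbours with u get more weight.
    candidates = sorted(graph[u])
    weights = [
        1.0 - len(graph[u] & graph[v]) / min(len(graph[u]), len(graph[v]))
        for v in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


def cnarw_sample(
    graph: Graph,
    start_nodes: Optional[List[int]] = None,
    restart_probability: float = 0.1,
    sampling_ratio: float = 0.15,
    seed: Optional[int] = None,
    max_steps: int = 100_000,
) -> Tuple[Set[int], int]:
    """Return the visited nodes and the relationship count of the subgraph they induce."""
    rng = random.Random(seed)
    if start_nodes is None:
        # Documented default: a single start node chosen uniformly at random.
        start_nodes = [rng.choice(sorted(graph))]
    target = math.ceil(sampling_ratio * len(graph))  # assumed rounding
    visited: Set[int] = set(start_nodes)
    current = rng.choice(start_nodes)
    steps = 0
    while len(visited) < target and steps < max_steps:  # max_steps guards unreachable targets
        steps += 1
        if not graph[current] or rng.random() < restart_probability:
            # Restart: a new walk begins from one of the start nodes.
            current = rng.choice(start_nodes)
            continue
        current = _next_hop(graph, current, rng)
        visited.add(current)
    # Assumption: keep every relationship whose endpoints were both visited; u < v counts each once.
    rel_count = sum(1 for u in visited for v in graph[u] if v in visited and u < v)
    return visited, rel_count


ring = {i: {(i - 1) % 20, (i + 1) % 20} for i in range(20)}
nodes, rels = cnarw_sample(ring, sampling_ratio=0.5, seed=42)
print(len(nodes), rels)
```

On a 20-node ring with `sampling_ratio=0.5`, every step moves one position left or right, and every walk begins at the same start node, so in practice the visited nodes always form a contiguous arc of 10 nodes with 9 relationships between them, whatever the seed. The returned pair loosely mirrors the "dimensions of the sampled graph" mentioned under Returns.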
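For `node_label_stratification`, "preserves the node label distribution of the original graph" can be read as giving each label a share of the sample proportional to its share of the original graph. The hypothetical helper below only computes such per-label target counts. The one-label-per-node mapping and the `math.ceil` rounding are assumptions, and the helper says nothing about how GDS actually enforces the distribution during the walks.

```python
import math
from collections import Counter
from typing import Dict


def stratified_quotas(node_labels: Dict[int, str], sampling_ratio: float) -> Dict[str, int]:
    """Hypothetical per-label node quotas that mirror the original label distribution.

    Assumes each node carries exactly one label and rounds each quota up.
    """
    counts = Counter(node_labels.values())
    return {label: math.ceil(sampling_ratio * count) for label, count in counts.items()}


labels = {i: "Person" for i in range(80)}
labels.update({i: "Movie" for i in range(80, 100)})
print(stratified_quotas(labels, 0.5))
```

With 80 `Person` and 20 `Movie` nodes and a ratio of 0.5, the quotas keep the original 4:1 proportion in the sample.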