
Add some 'shared' KGE utils #233


Draft · wants to merge 2 commits into main
Conversation

nshah171 (Contributor):

These utils are used in KGE experimentation and are likely to be generically useful for other graph applications (outside KGE).

Where is the documentation for this feature? N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review? YES

@svij-sc (Collaborator) left a comment:

Still working through this, but approving to unblock

from torchrec.streamable import Pipelineable


class BatchBase(Pipelineable, abc.ABC):
Collaborator:

Does this base class get used anywhere except DataclassBatch?
If not, it might be easier for us to just maintain DataclassBatch and have it subclass torchrec.datasets.utils.Batch.
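For reference, a minimal sketch of what a Pipelineable batch base class typically provides; the field-handling approach and names here are illustrative assumptions, not this PR's implementation:

```python
import abc
from dataclasses import dataclass, fields
from typing import Dict

import torch
from torchrec.streamable import Pipelineable


class BatchBase(Pipelineable, abc.ABC):
    """Generic batch: subclasses expose their tensor fields via as_dict()."""

    @abc.abstractmethod
    def as_dict(self) -> Dict[str, torch.Tensor]:
        ...

    def to(self, device: torch.device, non_blocking: bool = False) -> "BatchBase":
        # Rebuild the batch with every field moved to the target device.
        moved = {k: v.to(device=device, non_blocking=non_blocking) for k, v in self.as_dict().items()}
        return type(self)(**moved)

    def record_stream(self, stream: torch.cuda.Stream) -> None:
        for v in self.as_dict().values():
            v.record_stream(stream)


@dataclass
class DataclassBatch(BatchBase):
    """Concrete batch whose fields come from the dataclass definition."""

    def as_dict(self) -> Dict[str, torch.Tensor]:
        return {f.name: getattr(self, f.name) for f in fields(self)}
```

If DataclassBatch instead subclassed torchrec.datasets.utils.Batch as suggested above, that class already implements the equivalent to()/record_stream() plumbing, but only for its fixed dense/sparse/label fields.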



class LargeEmbeddingLookup(nn.Module):
    def __init__(self, embeddings_config: List[torchrec.EmbeddingBagConfig]):
Collaborator:

nit: should embeddings_config be called tables?

        super().__init__()
        self.ebc = torchrec.EmbeddingBagCollection(
            tables=embeddings_config,
            device=torch.device("meta"),
Collaborator:

For my own knowledge, trying to understand: will this always be "meta"?

        )
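For context, a small sketch of the usual torchrec pattern this appears to follow: tables are declared on the "meta" device (no memory is allocated), and the real, sharded parameters are only materialized when the module is later wrapped in DistributedModelParallel. The table name and sizes below are made up for illustration:

```python
import torch
import torchrec

# Hypothetical table config, just to show the shape of the API.
tables = [
    torchrec.EmbeddingBagConfig(
        name="node_emb",
        embedding_dim=128,
        num_embeddings=1_000_000,
        feature_names=["node_id"],
    )
]

# "meta" tensors carry only shape/dtype; DistributedModelParallel later decides
# where each shard actually lives and allocates it there.
ebc = torchrec.EmbeddingBagCollection(tables=tables, device=torch.device("meta"))
```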


class GcsIterableDataset(torch.utils.data.IterableDataset):
Collaborator:

I like these abstractions!
Albeit we probably don't need so many classes - the right one can be auto-inferred from the file suffix, etc. We can revisit these though.
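A sketch of the suffix-based dispatch idea, using plain local files and stdlib readers for simplicity (a GCS path would go through a GCS client, but the dispatch mechanism is the same):

```python
import csv
import json
from typing import Callable, Dict, Iterator

import torch


def _read_csv_rows(path: str) -> Iterator[dict]:
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def _read_jsonl_rows(path: str) -> Iterator[dict]:
    with open(path) as f:
        for line in f:
            yield json.loads(line)


class SuffixIterableDataset(torch.utils.data.IterableDataset):
    """One dataset class; the row reader is picked from the file suffix."""

    READERS: Dict[str, Callable[[str], Iterator[dict]]] = {
        ".csv": _read_csv_rows,
        ".jsonl": _read_jsonl_rows,
    }

    def __init__(self, path: str):
        super().__init__()
        suffix = path[path.rfind(".") :] if "." in path else ""
        if suffix not in self.READERS:
            raise ValueError(f"No reader registered for suffix {suffix!r}")
        self.path = path
        self._reader = self.READERS[suffix]

    def __iter__(self) -> Iterator[dict]:
        yield from self._reader(self.path)
```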

otherwise the model itself.
"""

if torch.distributed.is_initialized():
Collaborator:

I'm assuming this also only works on the NCCL backend?
i.e. both CUDA needs to be available and the NCCL backend enabled?
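For reference, a generic sketch (not this PR's init code) of how that assumption is usually made explicit: NCCL requires CUDA, so a CPU-only run falls back to Gloo.

```python
import torch
import torch.distributed as dist

if not dist.is_initialized():
    # NCCL requires CUDA; Gloo works on CPU-only hosts (e.g. local tests).
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    # Rank/world size come from the usual env:// variables (MASTER_ADDR, RANK, ...).
    dist.init_process_group(backend=backend)
```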

# Build a sharding plan
logger.info("***** Wrapping in DistributedModelParallel *****")
logger.info(f"Model before wrapping: {model}")
model = DistributedModelParallel(
Collaborator:

QQ: I see there are a lot more configurable params here.
Curious why we only chose to parameterize sharding_plan?

return model
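For comparison, DistributedModelParallel exposes several other knobs besides the plan; a hedged sketch of the commonly used ones (the wrapper function and defaults below are illustrative, not this PR's code):

```python
import torch
from torchrec.distributed.model_parallel import DistributedModelParallel


def wrap_model(model: torch.nn.Module, sharding_plan=None) -> DistributedModelParallel:
    device = (
        torch.device("cuda", torch.cuda.current_device())
        if torch.cuda.is_available()
        else torch.device("cpu")
    )
    return DistributedModelParallel(
        module=model,        # unsharded model whose tables live on the meta device
        device=device,       # where non-sharded modules are materialized
        plan=sharding_plan,  # if None, DMP builds a default plan internally
        # other knobs: env=..., sharders=..., init_data_parallel=...
    )
```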


def get_sharding_plan(
Collaborator:

I am surprised DistributedModelParallel doesn't do this for us already?
Is there something here that I am not seeing that would be model/business-logic specific?
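For context on the question above: DistributedModelParallel will generate a default plan when plan=None, but building one explicitly lets you pin the topology and constraints. A rough sketch of what such a helper often looks like (topology values are placeholders; this is not necessarily the PR's version):

```python
import torch
import torch.distributed as dist
from torchrec.distributed.embeddingbag import EmbeddingBagCollectionSharder
from torchrec.distributed.planner import EmbeddingShardingPlanner, Topology


def get_sharding_plan(model: torch.nn.Module, world_size: int, local_world_size: int):
    planner = EmbeddingShardingPlanner(
        topology=Topology(
            world_size=world_size,
            local_world_size=local_world_size,
            compute_device="cuda",
        )
    )
    # collective_plan syncs the plan across ranks so every process shards identically.
    return planner.collective_plan(
        model, [EmbeddingBagCollectionSharder()], dist.GroupMember.WORLD
    )
```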

        if self.optimizer and state_dict.get(self.OPTIMIZER_KEY):
            self.optimizer.load_state_dict(state_dict[self.OPTIMIZER_KEY])

    def to_state_dict(self) -> STATE_DICT_TYPE:
Collaborator:

This function doesn't seem to follow the traditional Stateful protocol.
Curious where it is used?
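For reference, the conventional Stateful protocol (as used by torch.distributed.checkpoint) expects state_dict()/load_state_dict() method names, so a to_state_dict() method would have to be called explicitly by the caller. A minimal sketch of the conventional shape (attribute names are assumptions):

```python
class TrainState:
    """Follows the Stateful duck-typed protocol: state_dict / load_state_dict."""

    def __init__(self, model, optimizer):
        self.model = model
        self.optimizer = optimizer

    def state_dict(self) -> dict:
        return {
            "model": self.model.state_dict(),
            "optimizer": self.optimizer.state_dict(),
        }

    def load_state_dict(self, state_dict: dict) -> None:
        self.model.load_state_dict(state_dict["model"])
        self.optimizer.load_state_dict(state_dict["optimizer"])
```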

local_uri = (
    checkpoint_id
    if isinstance(checkpoint_id, LocalUri)
    else LocalUri(tempfile.mkdtemp(prefix="checkpoint"))
Collaborator:

Very much a nit.

The user of mkdtemp() is responsible for deleting the temporary directory and its contents when done with it.

One thing we can do is always write to a tmpdir, then copy to either the local or GCS dir, then delete the local dir.
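A sketch of that write-to-tmp-then-copy idea using the stdlib's TemporaryDirectory, which cleans up even on errors (this is a generic pattern, not the PR's checkpoint writer; a gs:// destination would use the GCS client for the copy):

```python
import shutil
import tempfile
from pathlib import Path

import torch


def save_then_publish(state_dict: dict, destination: str) -> None:
    """Write to a throwaway tmp dir first, copy to the final location, then clean up."""
    with tempfile.TemporaryDirectory(prefix="checkpoint") as tmp:
        torch.save(state_dict, Path(tmp) / "state.pt")
        # For a gs:// destination this copy would go through the GCS client instead.
        shutil.copytree(tmp, destination, dirs_exist_ok=True)
        # TemporaryDirectory removes tmp (and its contents) on exit, even on error.
```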
