RFC-0042-aecf-multimodal-fusion.md #76

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open · wants to merge 5 commits into master

Conversation

@leochlon

Summary

We propose adding Adaptive Entropy-Gated Contrastive Fusion (AECF) to PyTorch as a core multimodal fusion layer that addresses a critical production problem: missing modalities in real-world deployments.

The Problem

Current multimodal models fail catastrophically when sensors break, data is incomplete, or modalities are unavailable at inference time. This is a major barrier to deploying multimodal AI in production environments.

The Solution

AECF uses entropy-driven curriculum learning to train models that are robust to missing modalities (see the sketch after this list):

  • High attention entropy → Less masking → Easier learning
  • Low attention entropy → More masking → Robustness training
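
A minimal sketch of this gating rule, assuming per-sample attention weights over modalities; the function name, the linear entropy-to-mask-rate mapping, and the `base_rate` parameter are illustrative assumptions, not the RFC's exact formulation:

```python
import torch

def curriculum_mask(attn: torch.Tensor, base_rate: float = 0.3) -> torch.Tensor:
    """Entropy-gated masking sketch (illustrative, not the RFC's exact rule).

    attn: (batch, num_modalities) attention weights over modalities; rows sum to 1.
    Returns a (batch, num_modalities) binary keep-mask: low-entropy (over-confident)
    samples are masked more aggressively, high-entropy samples less.
    """
    eps = 1e-8
    entropy = -(attn * (attn + eps).log()).sum(dim=-1)            # (batch,)
    max_entropy = torch.log(torch.tensor(float(attn.size(-1))))   # entropy of uniform dist
    norm_entropy = (entropy / max_entropy.clamp_min(eps)).clamp(0.0, 1.0)
    mask_rate = base_rate * (1.0 - norm_entropy)                  # low entropy -> more masking
    keep_prob = (1.0 - mask_rate).unsqueeze(-1).expand_as(attn)
    return torch.bernoulli(keep_prob)
```

During training, such a mask would drop modality tokens before fusion, so over-confident samples are forced to cope with missing inputs.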

Key Results

  • +18 percentage-point mAP improvement when modalities are missing
  • 2× reduction in calibration error
  • Only 1% runtime overhead
  • Drop-in replacement for existing fusion layers

Implementation

A complete reference implementation (5,337 lines of production-ready code), comprehensive tests, and MS-COCO benchmarks are included in the RFC.

Why This Matters

Multimodal AI is rapidly expanding (vision-language models, robotics, autonomous vehicles), but robustness to missing modalities remains an unsolved problem. AECF provides a principled, efficient solution that PyTorch users need today.

Request: Please route to multimodal/vision experts for technical review.

… Multimodal Learning

This RFC proposes adding AECF as a standard multimodal fusion layer in PyTorch.

Key features:
- Adaptive entropy-driven curriculum masking for robust multimodal learning
- Drop-in replacement for existing fusion approaches
- Built-in robustness to missing modalities at inference time
- Improved calibration, alongside an 18 pp mAP gain under missing modalities
- Minimal runtime overhead (<3%)

The implementation includes (usage sketched below):
- torch.nn.CurriculumMasking for entropy-based adaptive masking
- torch.nn.MultimodalAttentionPool for attention-based multimodal fusion
- Factory functions and functional interfaces for ease of use
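
For orientation, usage of the proposed modules might look like the sketch below; the constructor arguments, tensor shapes, and call pattern are assumptions based on the feature list above, not a finalized PyTorch API:

```python
import torch
import torch.nn as nn

# Hypothetical usage of the proposed modules; argument names, shapes,
# and the call pattern are assumptions, not a finalized API.
masking = nn.CurriculumMasking(base_mask_rate=0.3)   # entropy-adaptive masking
pool = nn.MultimodalAttentionPool(embed_dim=512)     # attention-based fusion

# One embedding per modality: (batch, embed_dim)
vision, text, audio = (torch.randn(8, 512) for _ in range(3))

tokens = torch.stack([vision, text, audio], dim=1)   # (batch, num_modalities, dim)
fused = pool(masking(tokens))                        # (batch, dim) fused representation
```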

Based on 'Robust Multimodal Learning via Entropy-Gated Contrastive Fusion'
(Chlon et al., 2025) - https://arxiv.org/abs/2505.15417

This document provides:
- High-level explanation of what AECF does and why it matters
- Technical implementation details and architecture
- Experimental results and validation
- Integration plan for PyTorch core
- Comprehensive test coverage overview

Serves as supplementary material to the main RFC document.

This commit adds:
- Complete working implementation (5,337 lines of Python code)
- Comprehensive test suite (765 lines of unit tests)
- Real-world MS-COCO benchmarking experiments
- Performance validation showing +18pp mAP improvement
- Production-ready features (gradient checkpointing, numerical stability)
- Multiple fusion layer comparisons and architectures

The reference implementation demonstrates:
✅ Drop-in compatibility with existing PyTorch code
✅ Superior performance under missing modality scenarios
✅ Robust numerical stability under all tested conditions
✅ <3% runtime overhead compared to standard attention
✅ Easy integration with vision-language, medical, and robotics models

Reviewers can immediately test the implementation:
  cd reference-implementation/
  pip install -r requirements.txt
  python -m pytest test_suite/ -v
  python -m aecf.coco_tests.test_organized

This strengthens the RFC proposal by providing concrete evidence
of AECF's benefits and demonstrating implementation feasibility.

Added complete submission guide with step-by-step instructions for submitting the AECF RFC to PyTorch.

The RFC is now complete with:
✅ 20KB+ comprehensive RFC document following PyTorch template
✅ 5,337 lines of reference implementation code
✅ 765 lines of comprehensive unit tests
✅ Real-world MS-COCO benchmarking experiments
✅ Performance validation showing +18pp mAP improvement
✅ Production-ready optimizations and numerical stability
✅ Complete documentation and usage examples

Ready for submission to pytorch/rfcs repository
@facebook-github-bot
Contributor

Hi @leochlon!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@mikaylagawarecki

Hey @leochlon, thanks for the request! Note that we maintain a very high bar for inclusion of new modules within torch.nn, as each comes with a substantial maintenance cost on our end. In general, we will accept new modules if the underlying techniques have already achieved widespread adoption and there is a broad expectation that PyTorch will provide such a module. It's also beneficial if there are performance reasons why the module should be provided by PyTorch itself rather than in a third-party repo.

From what I can tell, this is a new technique (https://arxiv.org/html/2505.15417v1) that needs time to establish user acceptance. I'd encourage you to maintain an implementation of this technique in a separate GitHub repo to make it available to users. We can leave this issue open to gauge user interest over time and revisit it in the future if the technique becomes ubiquitous. Please let us know if there is some technical reason why it is not possible to maintain this in a separate repo so we can evaluate the extension mechanisms we provide within PyTorch.

I'll also tag @NicolasHug here from torchvision in case he has any thoughts.
