    Repositories list

    • [IJCAI 2025] Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
      Python
      01110 Updated Aug 4, 2025
    • ScanDiff

      Public
      This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV 2025
      01010 Updated Aug 4, 2025
    • LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
      Python
      814120 Updated Aug 2, 2025
    • Official PyTorch implementation for "Zero-Shot Styled Text Image Generation, but Make It Autoregressive" (CVPR25)
      Python
      0510 Updated Jul 31, 2025
    • Official codebase of "Update Your Transformer to the Latest Release: Re-Basin of Task Vectors" - ICML 2025
      Python
      01400 Updated Jul 30, 2025
    • pacscore

      Public
      [CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
      Python
      86240 Updated Jul 29, 2025
    • Repository for the CAIP2025 paper "Tracing Information Flow in LLaMA Vision: A Step Toward Multimodal Understanding"
      Python
      0200 Updated Jul 15, 2025
    • [CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
      Python
      04120 Updated Jul 14, 2025
    • mammoth

      Public
      An Extendible (General) Continual Learning Framework based on PyTorch - official codebase of Dark Experience for General Continual Learning
      Python
      12371310 Updated Jul 11, 2025
    • [TPDL 2025] Generating Synthetic Data with Large Language Models for Low-Resource Sentence Retrieval
      0000 Updated Jul 9, 2025
    • MAD

      Public
      Official PyTorch implementation for "Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas", presenting the Merge-Attend-Diffuse operator (ECCV24)
      Python
      11300 Updated Jul 9, 2025
    • DICE

      Public
      [ICCV 2025] What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
      0500 Updated Jul 8, 2025
    • CoDE

      Public
      [ECCV'24] Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
      Python
      04200 Updated Jul 2, 2025
    • MaPeT

      Public
      Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
      Python
      11620 Updated Jul 1, 2025
    • Official implementation of "Augmenting and Mixing Transformers with Synthetic Data for Image Captioning"
      Python
      0000 Updated Jun 22, 2025
    • Python
      0500 Updated Jun 13, 2025
    • Python
      0100 Updated Jun 10, 2025
    • DitHub

      Public
      HTML
      0300 Updated May 27, 2025
    • FourBi

      Public
      Binarizing Documents by Leveraging both Space and Frequency. (ICDAR 2024)
      Python
      31300 Updated May 15, 2025
    • This repository contains a curated list of research papers and resources focusing on saliency and scanpath prediction, human attention, and human visual search.
      25500 Updated May 9, 2025
    • General Federated Continual Learning Framework
      Python
      2500 Updated Apr 20, 2025
    • COGT

      Public
      [ICLR 2025] Causal Graphical Models for Vision-Language Compositional Understanding
      Python
      0900 Updated Apr 15, 2025
    • HySAC

      Public
      Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025
      Python
      02210 Updated Apr 8, 2025
    • ITSERR WP8 - Code for Latin embeddings semantic search
      Python
      0100 Updated Apr 1, 2025
    • ReT

      Public
      [CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
      Python
      01800 Updated Mar 29, 2025
    • Python
      1000 Updated Mar 10, 2025
    • HWD

      Public
      Python
      12500 Updated Mar 7, 2025
    • VATr

      Public
      Python
      78230 Updated Mar 7, 2025
    • cvcs2025

      Public
      0000 Updated Feb 28, 2025
    • LAM

      Public
      The Ludovico Antonio Muratori (LAM) dataset is the largest line-level HTR dataset to date and contains 25,823 lines from Italian ancient manuscripts edited by a single author over 60 years. The dataset comes in two configurations: a basic splitting and a date-based splitting which takes into account the age of the author. The first setting is in…
      0500 Updated Feb 26, 2025