AImageLab

94 repositories

awesome-captioning-evaluation
Public
[IJCAI 2025] Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
Python
•0•11•1•0•Updated Aug 4, 2025
ScanDiff
Public
This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV 2025
0•10•1•0•Updated Aug 4, 2025
LLaVA-MORE
Public
LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
vision-and-language llms llava siglip multimodal-llms llama3 llava-llama3 llama3-vision gemma-2 llama3-1
Python
Apache License 2.0
•8•141•2•0•Updated Aug 2, 2025
Emuru-autoregressive-text-img
Public
Official PyTorch implementation for "Zero-Shot Styled Text Image Generation, but Make It Autoregressive" (CVPR25)
Python
MIT License
•0•5•1•0•Updated Jul 31, 2025
TransFusion
Public
Official codebase of "Update Your Transformer to the Latest Release: Re-Basin of Task Vectors" - ICML 2025
machine-learning deep-learning pytorch
Python
•0•14•0•0•Updated Jul 30, 2025
pacscore
Public
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
computer-vision cvpr captioning-images captioning captioning-videos vision-and-language cvpr2023
Python
•8•62•4•0•Updated Jul 29, 2025
MLLMs-FlowTracker
Public
Repository for the CAIP2025 paper "Tracing Information Flow in LLaMA Vision: A Step Toward Multimodal Understanding"
Python
•0•2•0•0•Updated Jul 15, 2025
ReflectiVA
Public
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
vqa knowledge-base vlm multimodal mllm
Python
Apache License 2.0
•0•41•2•0•Updated Jul 14, 2025
mammoth
Public
An Extendible (General) Continual Learning Framework based on Pytorch - official codebase of Dark Experience for General Continual Learning
deep-learning knowledge-distillation neurips2020 dark-experience-replay pytorch der continual-learning experience-replay
Python
MIT License
•123•713•1•0•Updated Jul 11, 2025
biblical-retrieval-synthesis
Public
[TPDL 2025] Generating Synthetic Data with Large Language Models for Low-Resource Sentence Retrieval
retrieval language-model synthetic-data
0•0•0•0•Updated Jul 9, 2025
MAD
Public
Official PyTorch implementation for "Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas", presenting the Merge-Attend-Diffuse operator (ECCV24)
Python
•1•13•0•0•Updated Jul 9, 2025
DICE
Public
[ICCV 2025] What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
0•5•0•0•Updated Jul 8, 2025
CoDE
Public
[ECCV'24] Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
deepfake-detection global-local
Python
MIT License
•0•42•0•0•Updated Jul 2, 2025
MaPeT
Public
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
Python
•1•16•2•0•Updated Jul 1, 2025
synthcap_pp
Public
Official implementation of "Augmenting and Mixing Transformers with Synthetic Data for Image Captioning"
Python
Apache License 2.0
•0•0•0•0•Updated Jun 22, 2025
mammoth-lite
Public
Python
•0•5•0•0•Updated Jun 13, 2025
Sanctuaria-Gaze
Public
Python
MIT License
•0•1•0•0•Updated Jun 10, 2025
DitHub
Public
HTML
Apache License 2.0
•0•3•0•0•Updated May 27, 2025
FourBi
Public
Binarizing Documents by Leveraging both Space and Frequency. (ICDAR 2024)
Python
•3•13•0•0•Updated May 15, 2025
awesome-human-visual-attention
Public
This repository contains a curated list of research papers and resources focusing on saliency and scanpath prediction, human attention, human visual search.
2•55•0•0•Updated May 9, 2025
fed-mammoth
Public
General Federated Continual Learning Framework
Python
MIT License
•2•5•0•0•Updated Apr 20, 2025
COGT
Public
[ICLR 2025] Causal Graphical Models for Vision-Language Compositional Understanding
Python
•0•9•0•0•Updated Apr 15, 2025
HySAC
Public
Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025
Python
•0•22•1•0•Updated Apr 8, 2025
itserr-wp8-latin-embeddings
Public
ITSERR WP8 - Code for Latin embeddings semantic search
information-retrieval latin embeddings
Python
Apache License 2.0
•0•1•0•0•Updated Apr 1, 2025
ReT
Public
[CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
information-retrieval recurrent-neural-networks embeddings multimodal-retrieval rag multimodal-embeddings
Python
Apache License 2.0
•0•18•0•0•Updated Mar 29, 2025
Latin-Document-Search-Engine
Public
Python
•1•0•0•0•Updated Mar 10, 2025
HWD
Public
Python
Other
•1•25•0•0•Updated Mar 7, 2025
VATr
Public
Python
MIT License
•7•82•3•0•Updated Mar 7, 2025
cvcs2025
Public
0•0•0•0•Updated Feb 28, 2025
LAM
Public
The Ludovico Antonio Muratori (LAM) dataset is the largest line-level HTR dataset to date and contains 25,823 lines from Italian ancient manuscripts edited by a single author over 60 years. The dataset comes in two configurations: a basic splitting and a date-based splitting which takes into account the age of the author. The first setting is in…
0•5•0•0•Updated Feb 26, 2025