Index of code for CVPR 2024 papers

Click the "Not reproducible" label to see the error log. 😜
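
Each entry below appears to follow the pattern "title, then an optional repository language, then an optional status label". As a rough sketch of how the list could be processed, the snippet below splits one entry into those three fields. The `parse_entry` helper and the `LANGUAGES` tuple are hypothetical: they are not part of this index, and the language list is an assumption based only on the labels visible here. It is a suffix heuristic, so a title that happens to end in a language name would be misread.

```python
from collections import Counter

# Assumption: labels inferred from the entries in this index, not an official schema.
LANGUAGES = ("Jupyter Notebook", "JavaScript", "Python", "C++", "HTML", "CSS")
STATUSES = ("Not reproducible", "No code")


def parse_entry(line):
    """Split one index line into title, language, and status (hypothetical helper)."""
    text = line.strip()
    status = None
    # A status label, if present, is the last thing on the line.
    for label in STATUSES:
        if text.endswith(" " + label):
            status = label
            text = text[: -len(label)].rstrip()
            break
    language = None
    # A language label, if present, comes right before the status.
    for lang in LANGUAGES:
        if text.endswith(" " + lang):
            language = lang
            text = text[: -len(lang)].rstrip()
            break
    return {"title": text, "language": language, "status": status}


def tally_languages(lines):
    """Count entries per language label; entries without one are counted under None."""
    return Counter(parse_entry(line)["language"] for line in lines if line.strip())
```

For example, `parse_entry("Fully Geometric Panoramic Localization Python Not reproducible")` separates the title from the `Python` and `Not reproducible` labels, and `tally_languages` gives a quick per-language count over any subset of the list.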

Can Biases in ImageNet Models Explain Generalization? Python
HINTED: Hard Instance Enhanced Detector with Mixed-Density Feature Fusion for Sparsely-Supervised 3D Object Detection Python
Mitigating Motion Blur in Neural Radiance Fields with Events and Frames Python
Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains Jupyter Notebook
Unmixing before Fusion: A Generalized Paradigm for Multi-Source-based Hyperspectral Image Synthesis
UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement No code
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding Python
Fully Geometric Panoramic Localization Python Not reproducible
Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion No code
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding Python Not reproducible
DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
G3DR: Generative 3D Reconstruction in ImageNet Python
Insights from the Use of Previously Unseen Neural Architecture Search Datasets No code
Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation Python
ShapeWalk: Compositional Shape Editing through Language-Guided Chains JavaScript
ICP-Flow: LiDAR Scene Flow Estimation with ICP Python
Generative Proxemics: A Prior for 3D Social Interaction from Images Python
How to Train Neural Field Representations: A Comprehensive Study and Benchmark Python
eTraM: Event-based Traffic Monitoring Dataset
TUMTraf V2X Cooperative Perception Dataset
3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surfaces Python
Towards Co-Evaluation of Cameras, HDR, and Algorithms for Industrial-Grade 6DoF Pose Estimation No code
Efficient Privacy-Preserving Visual Localization Using 3D Ray Clouds Python
Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation Python
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications Python
Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction Python
Geometry Transfer for Stylizing Radiance Fields No code
Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning
From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation Python
SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction Python Not reproducible
En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data Python
SPAD: Spatially Aware Multiview Diffusers
Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios Python
PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor Python Not reproducible
FastMAC: Stochastic Spectral Sampling of Correspondence Graph Python
UniDepth: Universal Monocular Metric Depth Estimation Python Not reproducible
Label Propagation for Zero-shot Classification with Vision-Language Models Python
D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection Python
Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling Python
MonoCD: Monocular 3D Object Detection with Complementary Depths Python
LEAD: Learning Decomposition for Source-free Universal Domain Adaptation Python
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation Python
DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets Python
Neural Implicit Morphing of Face Images Python Not reproducible
SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects Python
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos Python
Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis Python
Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation No code
Novel View Synthesis with View-Dependent Effects from a Single Image JavaScript
A2XP: Towards Private Domain Generalization Python
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning Python
Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning Python
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection No code
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos Python
NICE: Neurogenesis Inspired Contextual Encoding for Replay-free Class Incremental Learning Python
A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift
From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior Python
Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation Python
Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks Python
Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention, Alignment and Prompt Tuning Python
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? Python
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning Python Not reproducible
Privacy-Preserving Face Recognition Using Trainable Feature Subtraction
LightIt: Illumination Modeling and Control for Diffusion Models
NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation Python
FreeMan: Towards benchmarking 3D human pose estimation under Real-World Conditions HTML
Instance-aware Contrastive Learning for Occluded Human Mesh Reconstruction Python
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content Python
SVDTree: Semantic Voxel Diffusion for Single Image Tree Reconstruction C++
Unbiased Estimator for Distorted Conic in Camera Calibration C++
Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation C++
Brain Decodes Deep Nets Jupyter Notebook
Seamless Human Motion Composition with Blended Positional Encodings Python
GenZI: Zero-Shot 3D Human-Scene Interaction Generation Python Not reproducible
PREGO: online mistake detection in PRocedural EGOcentric videos No code
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting Python
InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion Python Not reproducible
Open Vocabulary Semantic Scene Sketch Understanding Python
Unsupervised 3D Structure Inference from Category-Specific Image Collections
MonoHair: High-Fidelity Hair Modeling from a Monocular Video Python
CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion Python
Deep-TROJ: An Inference Stage Trojan Insertion Algorithm through Efficient Weight Replacement Attack Python
Data Poisoning based Backdoor Attacks to Contrastive Learning Python
SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation Python
Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models Python
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians Python
Revamping Federated Learning Security from a Defender's Perspective: A Unified Defense with Homomorphic Encrypted Data Space No code
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models Python
Understanding Video Transformers via Universal Concept Discovery Jupyter Notebook
StyLitGAN: Image-based Relighting via Latent Control Jupyter Notebook
Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers Jupyter Notebook
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis Python Not reproducible
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving Python Not reproducible
Generative Unlearning for Any Identity No code
Holodeck: Language Guided Generation of 3D Embodied AI Environments Python Not reproducible
3D Neural Edge Reconstruction
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness Python Not reproducible
NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding JavaScript
Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions
Human Gaussian Splatting: Real-time Rendering of Animatable Avatars No code
MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling No code
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
Structure-Aware Sparse-View X-ray 3D Reconstruction Python
Language-Driven Anchors for Zero-Shot Adversarial Robustness Python
Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation Python
Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation Python
Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model No code
UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes Python
LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning No code
PoNQ: a Neural QEM-based Mesh Representation Jupyter Notebook
Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features Python
URHand: Universal Relightable Hands
Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining Python Not reproducible
Generating Human Motion in 3D Scenes from Text Descriptions Python
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting No code
Super-Resolution Reconstruction from Bayer-Pattern Spike Streams No code
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting Python Not reproducible
In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging Python Not reproducible
MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation Python
From a Bird’s Eye View to See: Joint Camera and Subject Registration without the Camera Calibration Python
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations Python
Monocular Identity-Conditioned Facial Reflectance Reconstruction JavaScript
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Transcriptomics-guided Slide Representation Learning in Computational Pathology Python
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
MindBridge: A Cross-Subject Brain Decoding Framework
Diversified and Personalized Multi-rater Medical Image Segmentation Python
Learned Trajectory Embedding for Subspace Clustering
DART: Implicit Doppler Tomography for Radar Novel View Synthesis Python
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models Python Not reproducible
Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention JavaScript
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding Python
DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model No code
Permutation Equivariance of Transformers and Its Applications Python
Rolling Shutter Correction with Intermediate Distortion Flow Estimation No code
Fully Convolutional Slice-to-Volume Reconstruction for Single-Stack MRI Python
Neural 3D Strokes: Creating Stylized 3D Scenes with Vectorized 3D Strokes Python
Describing Differences in Image Sets with Natural Language Jupyter Notebook
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation CSS
Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration Python Not reproducible
Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation Python
DPHMs: Diffusion Parametric Head Models for Depth-based Tracking Python
EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Grounded Question-Answering in Long Egocentric Videos Python
Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation Python Not reproducible
Rotation-Agnostic Image Representation Learning for Digital Pathology Python
Exploring Orthogonality in Open World Object Detection Python
Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling No code
Seeing Motion at Nighttime with an Event Camera Python
Synergistic Global-space Camera and Human Reconstruction from Videos
The Manga Whisperer: Automatically Generating Transcriptions for Comics Python
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion Python
UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures
EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation Python
LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection Python
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models Python
Dual Prior Unfolding for Snapshot Compressive Imaging No code
Generalizable Novel-View Synthesis using a Stereo Camera Python
KVQ: Kwai Video Quality Assessment for Short-form Videos Python
Improving Spectral Snapshot Reconstruction with Spectral-Spatial Rectification No code
MPOD123: One Image to 3D Content Generation Using Mask-enhanced Progressive Outline-to-Detail Optimization JavaScript
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance Python Not reproducible
SignGraph: A Sign Sequence is Worth Graphs of Nodes Python Not reproducible
Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion Python
OneFormer3D: One Transformer for Unified Point Cloud Segmentation Python
MCNet: Rethinking the Core Ingredients for Accurate and Efficient Homography Estimation No code
Optimizing Diffusion Noise Can Serve As Universal Motion Priors Python
M&M VTO: Multi-Garment Virtual Try-On and Editing
EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection No code
AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond Python
A Simple Baseline for Efficient Hand Mesh Reconstruction Python
Boosting Flow-based Generative Super-Resolution Models via Learned Prior Python
Latent Modulated Function for Computational Optimal Continuous Image Representation Python
FedAS: Bridging Inconsistency in Personalized Federated Learning No code
ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention Python
Utility-Fairness Trade-Offs and How to Find Them Python
Zero-Reference Low-Light Enhancement via Physical Quadruple Priors
Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance
Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery Python Not reproducible
Noisy-Correspondence Learning for Text-to-Image Person Re-identification Python
Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity Python
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs Python Not reproducible
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation Python Not reproducible
Circuit Design and Efficient Simulation of Quantum Inner Product and Empirical Studies of Its Effect on Near-Term Hybrid Quantum-Classic Machine Learning
DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans JavaScript
WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation Python
BrainWash: A Poisoning Attack to Forget in Continual Learning Python
FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning Python Not reproducible
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction Python
Matching Anything by Segmenting Anything Python
Video-Based Human Pose Regression via Decoupled Space-Time Aggregation Python Not reproducible
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation
RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception Python
MemFlow: Optical Flow Estimation and Prediction with Memory
Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective Python Not reproducible
SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction Python
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation guided by the Characteristic Dance Primitives Python
SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology Python
Learned representation-guided diffusion models for large-image generation
SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization No code
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications Python Not reproducible
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld Python
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting Python
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment Python
Steganographic Passport: An Owner and User Verifiable Credential for Deep Model IP Protection Without Retraining Python
3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation Python
HIPTrack: Visual Tracking with Historical Prompts
Convolutional Prompting meets Language Models for Continual Learning Python Not reproducible
Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline Jupyter Notebook
Lane2Seq: Towards Unified Lane Detection via Sequence Generation Python
Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow
MoST: Motion Style Transformer between Diverse Action Contents
DMR: Decomposed Multi-Modality Representations for Frames and Events Fusion in Visual Reinforcement Learning Python
X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition Python
NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models Python
MedBN: Robust Test-Time Adaptation against Malicious Test Samples
Mask Grounding for Referring Image Segmentation
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval Python
iKUN: Speak to Trackers without Retraining Python
VidToMe: Video Token Merging for Zero-Shot Video Editing
A noisy elephant in the room: Is your out-of-distribution detector robust to label noise? Jupyter Notebook
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
PIGEON: Predicting Image Geolocations Python
LEOD: Label-Efficient Object Detection for Event Cameras Python
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
Language Models as Black-Box Optimizers for Vision-Language Models Python
Improved Implicit Neural Representation with Fourier Reparameterized Training Python Not reproducible
Efficient Solution of Point-Line Absolute Pose C++
In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing Python Not reproducible
The Neglected Tails of Vision-Language Models Python Not reproducible
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Anchor-based Robust Finetuning of Vision-Language Models No code
Improved Visual Grounding through Self-Consistent Explanations Python
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models Python
IReNe: Instant Recoloring of Neural Radiance Fields JavaScript
Video Interpolation with Diffusion Models
Prompt Highlighter: Interactive Control for Multi-Modal LLMs Python
Efficient Meshflow and Optical Flow Estimation from Event Cameras No code
Language-only Training of Zero-shot Composed Image Retrieval Python
Adversarial Score Distillation: When score distillation meets GAN Python Not reproducible
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation Python
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles Python Not reproducible
AV-RIR: Audio-Visual Room Impulse Response Estimation Python
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners Python
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning No code
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Python
Segment and Caption Anything Python Not reproducible
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction Python Not reproducible
See, Say, and Segment: Correcting False Premises with LMMs No code
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation JavaScript
Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning Python
MMA-Diffusion: MultiModal Attack on Diffusion Models Python
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous and Instruction-guided Driving
Partial-to-Partial Shape Matching with Geometric Consistency
Towards Robust Learning to Optimize with Theoretical Guarantees Python
3D Facial Expressions through Analysis-by-Neural-Synthesis Python Not reproducible
DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion Python
Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs
Enhancing Video Super-Resolution via Implicit Resampling-based Alignment Python
Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking Python
CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs Jupyter Notebook
DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models Python
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding Python
Improving Out-of-Distribution Generalization in Graphs via Hierarchical Semantic Environments Python
G-FARS: Gradient-Field-based Auto-Regressive Sampling for 3D Part Grouping Python
Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach No code
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation JavaScript
Learned Scanpaths Aid Blind Panoramic Video Quality Assessment No code
Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening Python Not reproducible
Plug-and-Play Diffusion Distillation
MoCha-Stereo: Motif Channel Attention Network for Stereo Matching No code
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation Python
A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint
LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering
FMA-Net: Flow Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
RegionGPT: Towards Region Understanding Vision Language Model JavaScript
Material Palette: Extraction of Materials from a Single Image
PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness Python Not reproducible
DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting Python
Residual Denoising Diffusion Models Python
JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients Python Not reproducible
Single Domain Generalization for Crowd Counting Python Not reproducible
Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences Python
Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation Python
Prompt3D: Random Prompt Assisted Weakly-Supervised 3D Object Detection
PanoContext-Former: Panoramic Total Scene Understanding with a Transformer
PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor Python
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly Python
Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering Python
PlatoNeRF: 3D Reconstruction in Plato’s Cave via Single-View Two-Bounce Lidar Python
MeaCap: Memory-Augmented Zero-shot Image Captioning Python Not reproducible
Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network No code
Shadows Don’t Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now
Real-Time Exposure Correction via Collaborative Transformations and Adaptive Sampling Python
Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer Python
Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection Python
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning Python
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Multiview Aerial Visual RECognition (MAVREC) Dataset: Can Multi-view Improve Aerial Visual Perception? Jupyter Notebook
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation Python
Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection Python
Disentangled Pre-training for Human-Object Interaction Detection Python
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding Python
MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models Python
Hierarchical Histogram Threshold Segmentation – Auto-terminating High-detail Oversegmentation C++
Absolute Pose from One or Two Scaled and Oriented Features C++
DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
MICap: A Unified Model for Identity-aware Movie Descriptions
SAI3D: Segment Any Instance in 3D Scenes
Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains Python Not reproducible
ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning Python
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Python Not reproducible
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos Python
High-Quality Facial Geometry and Appearance Capture at Home Python
Consistent Prompting for Rehearsal-Free Continual Learning Python Not reproducible
Frequency-Adaptive Dilated Convolution for Semantic Segmentation Python
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation Python
Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning Python
Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching Python
ViewDiff: 3D-Consistent Image Generation with Text-To-Image Models Python Not reproducible
Simple Semantic-Aided Few-Shot Learning Python
LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry Python
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation Python
Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion Python Not reproducible
MoDE: CLIP Data Experts via Clustering
Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training Python
Continual Forgetting for Pre-trained Vision Models Python
Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts Python Not reproducible
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Instance-based Max-margin for Practical Few-shot Recognition No code
SpecNeRF: Gaussian Directional Encoding for Specular Reflections JavaScript
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models Python
KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation Python
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory Python
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection Python
Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements Jupyter Notebook
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data No code
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance Python Not reproducible
LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion Jupyter Notebook
Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms Python
Relightable and Animatable Neural Avatar from Sparse-View Video
SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
Commonsense Prototype for Outdoor Unsupervised 3D Object Detection Python
On Exact Inversion of DPM-Solvers Python
Differentiable Point-based Inverse Rendering Python
Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers Python Not reproducible
PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF Python
HashPoint: Accelerated Point Searching and Sampling for Neural Rendering No code
LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example Python
3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling
Control4D: Efficient 4D Portrait Editing with Text
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation Python Not reproducible
Looking 3D: Anomaly Detection with 2D-3D Alignment Python
A Unified and Interpretable Emotion Representation and Expression Generation
GenesisTex: Adapting Image Denoising Diffusion to Texture Space JavaScript
ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association Python
Space-time Diffusion Features for Zero-shot Text-driven Motion Transfer Python
Point Transformer V3: Simpler, Faster, Stronger Python
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields Python
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding No code
Dense Optical Tracking: Connecting the Dots Python
LAENeRF: Local Appearance Editing for Neural Radiance Fields C++
MVCPS-NeuS: Multi-view Constrained Photometric Stereo for Neural Surface Reconstruction No code
RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation No code
3D LiDAR Mapping in Dynamic Environments using a 4D Implicit Neural Representation Python
LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis Python
CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates Python
Intrinsic Image Diffusion for Indoor Single-view Material Estimation Python
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
Producing and Leveraging Online Map Uncertainty in Trajectory Prediction Python Not reproducible
RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction
Towards 3D Vision with Low-Cost Single-Photon Cameras No code
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer Python Not reproducible
AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings Python
CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images No code
Continuous Pose for Monocular Cameras in Neural Implicit Representation Python
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering C++
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior Python Not reproducible
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding Jupyter Notebook
DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks Python
REACTO: Reconstructing Articulated Objects from a Single Video No code
DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes JavaScript
Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI Python
Efficient Test-Time Adaptation of Vision-Language Models Python
Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis Python
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback Python
Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline Python
Making Vision Transformers Truly Shift-Equivariant
Boosting Neural Representations for Videos with a Conditional Decoder Python Not reproducible
Exploiting Diffusion Prior for Generalizable Dense Prediction
Logit Standardization in Knowledge Distillation Jupyter Notebook
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers Python
FedHCA²: Towards Hetero-Client Federated Multi-Task Learning Python
ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks
TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis Python
RMT: Retentive Networks Meet Vision Transformers Python
Efficient Dataset Distillation via Minimax Diffusion Python
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network Jupyter Notebook
Building Optimal Neural Architectures using Interpretable Knowledge Python
Learning Structure-from-Motion with Graph Attention Networks
MaxQ: Multi-Axis Query for N:M Sparsity Network Python
State Space Models for Event Cameras Python
Learning Inclusion Matching for Animation Paint Bucket Colorization Python
FedUV: Uniformity and Variance for Heterogeneous Federated Learning Python
Training-free Pretrained Model Merging Python
Learning Vision from Models Rivals Learning Vision from Data
RepViT: Revisiting Mobile CNN From ViT Perspective Jupyter Notebook
Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts Python
Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
Data Valuation and Detections in Federated Learning
6-DoF Pose Estimation with MultiScale Residual Correlation Python
BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics No code
WaveMo: Learning Wavefront Modulations to See Through Scattering Python Not reproducible
Wired Perspectives: Multi-View Wire Art Embraces Generative AI No code
Frozen Feature Augmentation for Few-Shot Image Classification JavaScript
MonoNPHM: Dynamic Head Reconstruction from Monocular Videos No code
Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-training via Differentiable Rendering of Line Segments JavaScript
DemoFusion: Democratising High-Resolution Image Generation With No $$$ Jupyter Notebook
Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models Python
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models JavaScript
Don’t drop your samples! Coherence-aware training benefits Conditional diffusion Python
CapHuman: Capture Your Moments in Parallel Universes
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer No code
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis Jupyter Notebook
MACE: Mass Concept Erasure in Diffusion Models Python
One-Shot Structure-Aware Stylized Image Synthesis Python
Robust Self-calibration of Focal Lengths from the Fundamental Matrix Python
Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning Python Not reproducible
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Optimal Transport Aggregation for Visual Place Recognition Python
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion Python Not reproducible
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos Python
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning Python
Self-correcting LLM-controlled Diffusion Python Not reproducible
MTLoRA: Low-Rank Adaptation Approach for Efficient Multi-Task Learning Python
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos Python Not reproducible
WaveFace: Authentic Face Restoration with Efficient Frequency Recovery Python
LEMON: Learning 3D Human-Object Interaction Relation from 2D Images Python
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation No code
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation Python
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models Python
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
Face2Diffusion for Fast and Editable Face Personalization Jupyter Notebook
MaGGIe: Masked Guided Gradual Human Instance Matting Python
RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses Python Not reproducible
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution Python
Task-Driven Wavelets using Constrained Empirical Risk Minimization Python
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
Learning to Count without Annotations Python
Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework Python Not reproducible
AM-RADIO: Agglomerative Models - Reduce All Domains Into One Python
EgoGen: An Egocentric Synthetic Data Generator Python
RoHM: Robust Human Motion Reconstruction via Diffusion Python
An N-Point Linear Solver for Line and Motion Estimation with Event Cameras C++
Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes JavaScript
FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models Python
LAMP: Learn A Motion Pattern for Few-Shot Video Generation Python
TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing Python
Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples No code
Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data Python Not reproducible
You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval
SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System No code
AVID: Any-Length Video Inpainting with Diffusion Model No code
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence
BigGait: Learning Gait Representation You Want by Large Vision Models Python
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
Language-conditioned Detection Transformer Python Not reproducible
AdaShift: Learning Discriminative Self-Gated Neural Feature Activation With an Adaptive Shift Factor No code
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection Python
Object Recognition as Next Token Prediction Python
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation Python
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training Python Not reproducible
Multi-View Attentive Contextualization for Multi-View 3D Object Detection
Scaling Laws of Synthetic Images for Model Training ... for Now
RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection Python Not reproducible
HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video No code
BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models Python
Discriminative Probing and Tuning for Text-to-Image Generation Python
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval Python
InceptionNeXt: When Inception Meets ConvNeXt Python
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization JavaScript
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
LiDAR-based Person Re-identification Python Not reproducible
Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition Python
Active Generalized Category Discovery Python
Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration Python
Relation Rectification in Diffusion Model Python
Diffusion Handles: Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D
Image Sculpting: Precise Object Editing with 3D Geometry Control Python Not reproducible
Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences Python
FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features
Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting Python
Towards Generalizing to Unseen Domains with Few Labels Python
Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships Python Not reproducible
MS-DETR: Efficient DETR Training with Mixed Supervision Python
StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
ParamISP: Learned Forward and Inverse ISPs using Camera Parameters Python
Improved Baselines with Visual Instruction Tuning Python
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
Towards Realistic Scene Generation with LiDAR Diffusion Models Python
Single Mesh Diffusion Models with Field Latents for Texture Generation
Riemannian Multinomial Logistics Regression for SPD Neural Networks
Learning Multi-dimensional Human Preference for Text-to-Image Generation HTML
IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification Python Not reproducible
Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation Python
Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
Style Aligned Image Generation via Shared Attention
StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On Python
Transfer CLIP for Generalizable Image Denoising Python
Holistic Features are almost Sufficient for Text-to-Video Retrieval Python Not reproducible
Video Harmonization with Triplet Spatio-Temporal Variation Patterns Python
Unified Entropy Optimization for Open-Set Test-Time Adaptation Python
Diversity-aware Channel Pruning for StyleGAN Compression
Edit One for All: Interactive Batch Image Editing
SFOD: Spiking Fusion Object Detector Python
Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration
Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping Python Not reproducible
FaceLift: Semi-supervised 3D Facial Landmark Localization JavaScript
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval Python
Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design Python
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval Python
A Generative Approach for Wikipedia-Scale Visual Entity Recognition
EGTR: Extracting Graph from Transformer for Scene Graph Generation No code
TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods
ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding Jupyter Notebook
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models JavaScript
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion Python
SURE: SUrvey REcipes for building reliable and robust deep networks Python
Bootstrapping SparseFormers from Vision Foundation Models Python
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation Python Not reproducible
Vlogger: Make Your Dream A Vlog Python Not reproducible
Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement Python
NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis Python
InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning Python
Novel Class Discovery for Ultra-Fine-Grained Visual Categorization Python
AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring No code
Fourier-basis functions to bridge augmentation gap: Rethinking frequency augmentation in image classification Python
Single-Model and Any-Modality for Video Object Tracking No code
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image
Unifying Top-down and Bottom-up Scanpath Prediction using Transformers Python Not reproducible
GSVA: Generalized Segmentation via Multimodal Large Language Models No code
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation Python
Task-conditioned adaptation of visual features in multi-task policy learning HTML
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Amodal Completion via Progressive Mixed Context Diffusion
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model Python
Language-driven Grasp Detection
Named Entity Driven Zero-Shot Image Manipulation No code
Doubly Abductive Counterfactual Inference for Text-based Image Editing Python
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing Python Not reproducible
Image Restoration by Denoising Diffusion Models With Iteratively Preconditioned Guidance Python
Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation CSS
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models Python
TULIP: Transformer for Upsampling of LiDAR Point Cloud Python
MemoNav: Working Memory Model for Visual Navigation No code
Restoration by Generation with Constrained Priors JavaScript
SVGDreamer: Text Guided SVG Generation with Diffusion Model Python
Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation Python Not reproducible
Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households Python
SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes No code
TIM: A Time Interval Machine for Audio-Visual Action Recognition Python
Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features Python
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space Python
MAGICK: A Large-scale Captioned Dataset from Matting Generated Images using Chroma Keying
Makeup Prior Models for 3D Facial Makeup Estimation and Applications No code
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives Python
Readout Guidance: Learning Control from Diffusion Features Jupyter Notebook
MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections Python
Learning the 3D Fauna of the Web JavaScript
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction Python
Realigning Confidence with Temporal Saliency Information for Point-Level Weakly-Supervised Temporal Action Localization Python
Low-power, Continuous Remote Behavioral Localization with Event Cameras Python
Action Scene Graphs for Long-Form Understanding of Egocentric Videos Jupyter Notebook
Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes Python
Grid Diffusion Models for Text-to-Video Generation No code
Selective, Interpretable and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation JavaScript
ControlRoom3D: Room Generation using Semantic Controls JavaScript
vid-TLDR: Training Free Token merging for Light-weight Video Transformer Python
Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition No code
CSTA: CNN-based Spatiotemporal Attention for Video Summarization Python
Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation Python Not reproducible
MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection Python
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection No code
Object Pose Estimation via the Aggregation of Diffusion Features Python
SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction
Context-Aware Integration of Language and Visual References for Natural Language Tracking No code
Purified and Unified Steganographic Network Python
Depth-aware Test-Time Training for Zero-shot Video Object Segmentation Python
VCoder: Versatile Vision Encoders for Multimodal Large Language Models Python
SketchINR: A First Look into Sketches as Implicit Neural Representations
DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction Python
Transferable Structural Sparse Adversarial Attack Via Exact Group Sparsity Training Python
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes No code
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training Python
Implicit Event-RGBD Neural SLAM
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Python
Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects Python
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving Python Not reproducible
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis Python
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching Python
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling Python
VILA: On Pre-training for Visual Language Models Python
GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting Cuda
COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction Python Not reproducible
MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion Python
NEAT: Distilling 3D Wireframes from Neural Attraction Fields Python
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition Python
4K4D: Real-Time 4D View Synthesis at 4K Resolution
Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
MuRF: Multi-Baseline Radiance Fields Python Not reproducible
NECA: Neural Customizable Human Avatar Python
General Object Foundation Model for Images and Videos at Scale Python
A Simple Recipe for Language-guided Domain Generalized Segmentation Python
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models Python
Federated Online Adaptation for Deep Stereo Python
Collaborating Foundation models for Domain Generalized Semantic Segmentation Python
Generalized Predictive Model for Autonomous Driving Python
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition Python
GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians Python
Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds Python
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images Python
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing No code
DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis Python
BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image Python
OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning Python
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes JavaScript
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding Jupyter Notebook
3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Robust Depth Enhancement via Polarization Prompt Fusion Tuning
StraightPCF: Straight Point Cloud Filtering HTML
Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation No code
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior Python Not reproducible
DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization Python
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding Python
3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow
Towards Memorization-Free Diffusion Models No code
Gradient Alignment for Cross-domain Face Anti-Spoofing Python
CNC-Net: Self-Supervised Learning for CNC Machining Operations No code
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors Python Not reproducible
TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models Jupyter Notebook
Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi Python Not reproducible
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation Python Not reproducible
Osprey: Pixel Understanding with Visual Instruction Tuning Python
CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images Python
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM Python Not reproducible
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D Python Not reproducible
Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D No code
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM Python Not reproducible
Text-to-3D using Gaussian Splatting Python
Non-rigid Structure-from-Motion: Temporally-smooth Procrustean Alignment and Spatially-variant Deformation Modeling
When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation Python Not reproducible
ReconFusion: 3D Reconstruction with Diffusion Priors
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
CoGS: Controllable Gaussian Splatting
Motion Blur Decomposition with Cross-shutter Guidance Python
Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields C++
Learning to Produce Semi-dense Correspondences for Visual Localization Python Not reproducible
Compact 3D Gaussian Representation for Radiance Field Python
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Generalizable Face Landmarking Guided by Conditional Face Warping
SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
Close Imitation of Expert Retouching for Black-and-White Photography Python
Compositional Chain-of-Thought Prompting for Large Multimodal Models Python
Visual Point Cloud Forecasting enables Scalable Autonomous Driving Python
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object Python Not reproducible
VBench: Comprehensive Benchmark Suite for Video Generative Models
Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting Python
MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors
Learning from Synthetic Human Group Activities
Instance Tracking in 3D Scenes from Egocentric Videos
PoseGPT: Chatting about 3D Human Pose No code
Three Pillars improving Vision Foundation Model Distillation for Lidar Python
Cloud-Device Collaborative Learning for Multimodal Large Language Models
LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels No code
3D-LFM: Lifting Foundation Model Jupyter Notebook
MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction Python
Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It Python
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning No code
LangSplat: 3D Language Gaussian Splatting
EFHQ: Multi-purpose ExtremePose-Face-HQ dataset
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World Python
Learning from Observer Gaze: Zero-shot Attention Prediction Oriented by Human-Object Interaction Recognition JavaScript
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback JavaScript
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images Python
C²RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction Python
Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions Jupyter Notebook
Posterior Distillation Sampling
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding Python
GLACE: Global Local Accelerated Coordinate Encoding Python
Neural Markov Random Field for Stereo Matching Python
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation No code
Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration Python
Contrasting intra-modal and ranking cross-modal hard negatives to enhance visio-linguistic compositional understanding Python
Instance-level Expert Knowledge and Aggregate Discriminative Attention for Radiology Report Generation No code
READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning Python
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies No code
VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts Python
Mosaic-SDF for 3D Generative Models JavaScript
Learning Continuous 3D Words for Text-to-Image Generation
DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars Jupyter Notebook
Situational Awareness Matters in 3D Vision Language Reasoning
Gaussian Shell Maps for Efficient 3D Human Generation Jupyter Notebook
Neural Clustering based Visual Representation Learning Python
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders Python Not reproducible
Learning Large-Factor EM Image Super-Resolution with Generative Priors Python
Score-Guided Diffusion for 3D Human Recovery Python
Distributionally Generative Augmentation for Fair Facial Attribute Classification Python
NAPGuard: Towards Detecting Naturalistic Adversarial Patches Python
Unleashing Network Potentials for Semantic Scene Completion Python
NeRF Director: Revisiting View Selection in Neural Volume Rendering No code
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models Python
YOLO-World: Real-Time Open-Vocabulary Object Detection Python
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation Jupyter Notebook
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning
DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension Python Not reproducible
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition Python
ASAM: Boosting Segment Anything Model with Adversarial Tuning Python
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation Python
BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection Python Not reproducible
Inversion-Free Image Editing with Language-Guided Diffusion Models Python
RoMa: Robust Dense Feature Matching Python
DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling
360+x: A Panoptic Multi-modal Scene Understanding Dataset Python
DYSON: Dynamic Feature Space Self-Organization for Online Task-Free Class Incremental Learning Python
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning Python
Rethinking Boundary Discontinuity Problem for Oriented Object Detection Jupyter Notebook
BiPer: Binary Neural Networks using a Periodic Function Python
CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers Python Not reproducible
Global and Local Prompts Cooperation via Optimal Transport for Federated Learning Python
Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition No code
Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning Python
EscherNet: A Generative Model for Scalable View Synthesis Python
Revisiting Adversarial Training under Long-Tailed Distributions Python Not reproducible
DiffusionLight: Light Probes for Free by Painting a Chrome Ball Python Not reproducible
MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video Python Not reproducible
PEM: Prototype-based Efficient MaskFormer for Image Segmentation Python
ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images Python
Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models HTML
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models Python
GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement
Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes Python
Do Vision and Language Encoders Represent the World Similarly? Jupyter Notebook
PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation No code
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation Python
Instruct-Imagen: Image Generation with Multi-modal Instruction
CAMixerSR: Only Details Need More "Attention" Python
CrossKD: Cross-Head Knowledge Distillation for Dense Object Detection Python
Atom-Level Optical Chemical Structure Recognition with Limited Supervision Python
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications Python
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

If you like what you see here, please consider buying me a coffee 🤗

Brothers, sisters, uncles, and aunties: tip me 5 US dollars and show me some support!