Coupled Diffusion Sampling for Training-Free Multi-View Image Editing
📅2025-10-16
🆔2510.14981
👥 Hadi Alzayer, Yunzhi Zhang, Chen Geng et al.
cs.CV cs.AI
We present an inference-time diffusion sampling method to perform multi-view
consistent image editing using pre-trained 2D image editing models. These
models can independently produce high-quality edits for each image in a set
of multi-view images of a 3D scene or object, but they do not maintain
consistency across views. Existing methods typically address this by
optimizing an explicit 3D representation, but they suffer from a lengthy
optimization process and instability under sparse-view settings. We propose
an implicit 3D regularization approach by constraining the generated 2D
image sequences to adhere to...
From Pixels to Words -- Towards Native Vision-Language Primitives at
Scale
📅2025-10-16
🆔2510.14979
👥 Haiwen Diao, Mingxuan Li, Silei Wu et al.
cs.CV cs.AI
The edifice of native Vision-Language Models (VLMs) has emerged as a rising
contender to typical modular VLMs, shaped by evolving model architectures and
training paradigms. Yet, two lingering clouds ...
The design of complex machines stands as both a marker of human intelligence
and a foundation of engineering practice. Given recent advances in large
language models (LLMs), we ask whether they, too, ...
Learning an Image Editing Model without Image Editing Pairs
📅2025-10-16
🆔2510.14978
👥 Nupur Kumari, Sheng-Yu Wang, Nanxuan Zhao et al.
cs.CV cs.LG
Recent image editing models have achieved impressive results while following
natural language editing instructions, but they rely on supervised fine-tuning
with large datasets of input-target pairs. T...
Ponimator: Unfolding Interactive Pose for Versatile Human-human
Interaction Animation
📅2025-10-16
🆔2510.14976
👥 Shaowei Liu, Chuan Guo, Bing Zhou et al.
cs.CV cs.GR cs.RO
Close-proximity human-human interactive poses convey rich contextual
information about interaction dynamics. Given such poses, humans can
intuitively infer the context and anticipate possible past and...
Terra: Explorable Native 3D World Model with Point Latents
📅2025-10-16
🆔2510.14977
👥 Yuanhui Huang, Weiliang Chen, Wenzhao Zheng et al.
cs.CV cs.AI cs.LG
World models have garnered increasing attention for comprehensive modeling of
the real world. However, most existing methods still rely on pixel-aligned
representations as the basis for world evolution, neglecting the inherent 3D
nature of the physical world. This can undermine 3D consistency and reduce
the modeling efficiency of world models. In this paper, we propose Terra, a
native 3D world model that represents and generates explorable environments
in an intrinsic 3D latent space. Specifically, we propose a novel
point-to-Gaussian variational autoencoder (P2G-VAE) that encodes 3D input...
WithAnyone: Towards Controllable and ID Consistent Image Generation
📅2025-10-16
🆔2510.14975
👥 Hengyuan Xu, Wei Cheng, Peng Xing et al.
cs.CV cs.AI
Identity-consistent generation has become an important focus in text-to-image
research, with recent models achieving notable success in producing images
aligned with a reference identity. Yet, the sca...
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
📅2025-10-16
🆔2510.14974
👥 Hansheng Chen, Kai Zhang, Hao Tan et al.
cs.LG cs.AI cs.CV
Few-step diffusion or flow-based generative models typically distill a
velocity-predicting teacher into a student that predicts a shortcut towards
denoised data. This format mismatch has led to comple...
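The velocity-vs-shortcut format mismatch can be illustrated with standard rectified-flow algebra (a toy sketch under that assumed parameterization, not pi-Flow itself): with interpolant x_t = (1-t)·x0 + t·x1, a teacher predicts the velocity v = x1 - x0, while a few-step student must output the denoised data x0 directly, so the two output formats are related by x0 = x_t - t·v.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rectified-flow toy setup (illustrative only): x_t interpolates between
# data x0 and noise x1, and the teacher predicts the constant velocity.
x0 = rng.normal(size=4)          # "denoised" data sample
x1 = rng.normal(size=4)          # noise sample
t = 0.7                          # diffusion time in [0, 1]
x_t = (1 - t) * x0 + t * x1      # noisy interpolant

teacher_velocity = x1 - x0       # output format of a velocity teacher

# A few-step "shortcut" student outputs denoised data directly; converting
# between the two formats requires the algebra below, which is the mismatch
# distillation has to bridge.
shortcut_from_velocity = x_t - t * teacher_velocity

print(np.allclose(shortcut_from_velocity, x0))  # prints True
```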
Attention Is All You Need for KV Cache in Diffusion LLMs
📅2025-10-16
🆔2510.14973
👥 Quan Nguyen-Tri, Mukul Ranjan, Zhiqiang Shen
cs.CL cs.AI cs.LG
This work studies how to adaptively recompute key-value (KV) caches for
diffusion large language models (DLMs) to maximize prediction accuracy while
minimizing decoding latency. Prior methods' decoder...
TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar
📅2025-10-16
🆔2510.14972
👥 Yinxi Li, Yuntian Deng, Pengyu Nie
cs.CL cs.AI cs.LG cs.PL cs.SE
Large language models (LLMs) for code rely on subword tokenizers, such as
byte-pair encoding (BPE), learned from mixed natural language text and
programming language code but driven by statistics rath...
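The tension TokDrift studies — subword merges driven by corpus statistics rather than by the code grammar — can be seen in a minimal greedy-BPE sketch (the merge table below is hypothetical, chosen purely for illustration): a statistically frequent merge can fuse the tail of an identifier with the punctuation that follows it, crossing the boundary a grammar tokenizer would draw.

```python
# Toy illustration: statistics-driven BPE merges vs. grammar tokens.
code = "getValue()"

# Grammar-level tokens: identifier, then punctuation.
grammar_tokens = ["getValue", "(", ")"]

def bpe_encode(text, merges):
    """Greedy BPE: repeatedly merge the highest-priority adjacent pair."""
    tokens = list(text)
    while True:
        best = None
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            if pair in merges and (best is None or merges[pair] < merges[best[1]]):
                best = (i, pair)
        if best is None:
            return tokens
        i, pair = best
        tokens[i:i + 2] = ["".join(pair)]

# Hypothetical merge priorities "learned" from statistics (lower = earlier).
# The merge ("e", "(") fires before ("get", "Valu"), fusing the identifier
# tail with the opening parenthesis.
merges = {("g", "e"): 0, ("ge", "t"): 1, ("V", "a"): 2, ("Va", "l"): 3,
          ("Val", "u"): 4, ("e", "("): 5, ("get", "Valu"): 6}

subword_tokens = bpe_encode(code, merges)
print(subword_tokens)   # ['getValu', 'e(', ')'] -- 'e(' straddles the boundary
print(grammar_tokens)   # ['getValue', '(', ')']
```

The subword token `e(` spans the `getValue` / `(` grammar boundary, so two programs that are identical under the grammar can receive different subword segmentations depending on surrounding statistics.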