H-Adapter: Pose-Robust Hairstyle Transfer via Attention-Derived, Source-Aligned Hair Masks

SNOW Corp.
*Work done during an internship at SNOW Corp.  Corresponding author.

ECCV 2026

Qualitative Results & Applications

Abstract

Hairstyle transfer has practical applications such as virtual try-on, yet remains challenging when the source and reference exhibit large head-pose discrepancies. We propose H-Adapter, which improves pose robustness by training with a region-specific loss that disentangles hair and non-hair objectives. This loss induces spatially disentangled cross-attention, from which a source-aligned hair edit mask is derived to guide diffusion-based inpainting. Experiments on pose-agnostic and pose-different subsets demonstrate strong quantitative results, including the best FID, FID_CLIP, and CLIP-I under pose differences, while maintaining competitive non-hair preservation and improving qualitative fidelity to fine-grained reference hairstyle details. Beyond source-conditioned transfer, H-Adapter supports practical extensions including text-to-image generation, auxiliary prompt-based hair color control, and compatibility with an identity-preserving IP-Adapter variant. We also introduce a VLM-as-a-judge protocol and observe consistent gains in hairstyle faithfulness, non-hair preservation, and artifact quality.

Method Overview

We train an H-Adapter on top of IP-Adapter with a region-specific loss that disentangles hair and non-hair objectives. At inference, the resulting cross-attention yields a source-aligned coarse hair mask, which guides a two-stage diffusion inpainting pipeline (warm-up → intermediate → pixel-level segmentation) to produce pose- and shape-consistent hairstyle transfer.
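The idea of splitting the training objective by region can be illustrated with a small sketch. The exact loss used by H-Adapter is not given on this page, so the form below is an assumption: a mean-squared error evaluated separately over hair and non-hair pixels, with each term normalized by its own region size. The function name `region_specific_loss` and the weights `w_hair` and `w_non_hair` are hypothetical, and plain Python lists stand in for tensors to keep the example dependency-free.

```python
def region_specific_loss(pred, target, hair_mask, w_hair=1.0, w_non_hair=1.0):
    """Illustrative masked MSE split into hair and non-hair terms.

    pred, target: 2D lists of floats (e.g. predicted vs. true noise).
    hair_mask:    2D list of 0/1 values, where 1 marks a hair pixel.
    Returns (total, hair_term, non_hair_term).
    NOTE: an assumed form for illustration, not the paper's exact loss.
    """
    hair_sum = 0.0
    non_hair_sum = 0.0
    hair_n = 0
    non_hair_n = 0
    for p_row, t_row, m_row in zip(pred, target, hair_mask):
        for p, t, m in zip(p_row, t_row, m_row):
            err = (p - t) ** 2
            if m:
                hair_sum += err
                hair_n += 1
            else:
                non_hair_sum += err
                non_hair_n += 1
    # Normalize each term by its own region size so that a small hair
    # region is not drowned out by a large background.
    hair_term = hair_sum / hair_n if hair_n else 0.0
    non_hair_term = non_hair_sum / non_hair_n if non_hair_n else 0.0
    total = w_hair * hair_term + w_non_hair * non_hair_term
    return total, hair_term, non_hair_term


if __name__ == "__main__":
    pred = [[1.0, 0.0], [0.0, 2.0]]
    target = [[0.0, 0.0], [0.0, 0.0]]
    mask = [[1, 0], [0, 1]]
    print(region_specific_loss(pred, target, mask))
```

Keeping the two terms separate makes it possible to weight or supervise hair and non-hair regions differently, which is the kind of disentanglement the overview describes.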

(a) Training. H-Adapter is optimized on image-mask pairs with a region-specific objective that confines reference-hairstyle learning to the hair region.
(b) Inference. Two-stage, coarse-to-fine inpainting is guided by an attention-derived, source-aligned hair mask: a coarse attention-derived mask (green) guides intermediate inpainting; a pixel-level segmentation mask (blue) is then used for final inpainting (purple).
(c) Auxiliary prompt-based control of hair color via the reference-injection weight.
(d) Reference-guided text-to-image generation without a source-image condition.
(e) Identity-preserving hair transfer by composing H-Adapter with an IP-Adapter FaceID Plus variant.
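
As a rough illustration of the inference idea in panel (b), the sketch below turns a stack of cross-attention maps into a coarse binary mask and then uses that mask to restrict an edit to the masked region. The aggregation rule (averaging the maps, min-max normalizing, thresholding at 0.5) and the helper names `attention_to_mask` and `blend_with_mask` are assumptions made for this example; the page does not specify how H-Adapter derives its mask or blends results.

```python
def attention_to_mask(attn_maps, threshold=0.5):
    """Derive a coarse binary mask from cross-attention maps.

    attn_maps: list of equally sized 2D lists, e.g. one per head or layer.
    Assumed recipe (not from the source): average the maps, min-max
    normalize to [0, 1], then threshold.
    """
    n = len(attn_maps)
    height = len(attn_maps[0])
    width = len(attn_maps[0][0])
    mean = [
        [sum(m[i][j] for m in attn_maps) / n for j in range(width)]
        for i in range(height)
    ]
    lo = min(min(row) for row in mean)
    hi = max(max(row) for row in mean)
    span = hi - lo
    if span == 0:
        # A flat attention map carries no spatial signal: return an empty mask.
        return [[0] * width for _ in range(height)]
    return [
        [1 if (v - lo) / span >= threshold else 0 for v in row]
        for row in mean
    ]


def blend_with_mask(source, generated, mask):
    """Keep source values outside the mask and generated values inside it."""
    return [
        [g if m else s for s, g, m in zip(s_row, g_row, m_row)]
        for s_row, g_row, m_row in zip(source, generated, mask)
    ]


if __name__ == "__main__":
    maps = [
        [[0.0, 0.2], [0.8, 1.0]],
        [[0.0, 0.4], [0.6, 1.0]],
    ]
    mask = attention_to_mask(maps)
    print(mask)
    print(blend_with_mask([[1, 2], [3, 4]], [[9, 9], [9, 9]], mask))
```

In a coarse-to-fine setup like the one described, a mask of this kind would only guide the intermediate pass; the final pass would rely on a pixel-level segmentation mask instead.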

BibTeX

@inproceedings{jeong2026hadapter,
  title     = {H-Adapter: Pose-Robust Hairstyle Transfer via Attention-Derived, Source-Aligned Hair Masks},
  author    = {Jeong, Seulgi and Cho, Yunseong and Park, Sanghun},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2026}
}