Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval

Published at the International Conference on Computer Vision (ICCV), 2017

Jifei Song*, Qian Yu*, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales

Human sketches are unique in their ability to capture both the spatial topology of a visual object and its subtle appearance details. Fine-grained sketch-based image retrieval (FG-SBIR) exploits these fine-grained characteristics of sketches to conduct instance-level retrieval of photos. In this paper, we propose a novel deep FG-SBIR model that differs significantly from existing models in that: (1) it is spatially aware, achieved by introducing an attention module that is sensitive to the spatial position of visual details; (2) it combines coarse and fine semantic information via a shortcut-connection fusion block; and (3) it models feature correlation and is robust to misalignment between the features extracted across the two domains, by introducing a novel loss based on a higher-order learnable energy function (HOLEF).
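To make the three components above more concrete, here is a minimal PyTorch-style sketch of (a) a spatial attention module with a shortcut fusion of coarse (unattended) and fine (attended) features, and (b) a higher-order triplet-style energy loss with a learnable weighting matrix. Class and parameter names (`SpatialAttentionFusion`, `HOLEFTripletLoss`, `margin`, the 1x1 scoring convolution) are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttentionFusion(nn.Module):
    """Sketch: soft spatial attention over a conv feature map, fused with
    the unattended (coarse) features via a shortcut connection."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 conv scores each spatial location of the feature map (assumed design)
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat):                                   # feat: (B, C, H, W)
        attn = F.softmax(self.score(feat).flatten(2), dim=-1)  # (B, 1, H*W)
        attn = attn.view(feat.size(0), 1, *feat.shape[2:])     # (B, 1, H, W)
        attended = feat * attn                # fine, spatially weighted features
        fused = feat + attended               # shortcut fusion of coarse + fine
        return F.adaptive_avg_pool2d(fused, 1).flatten(1)      # (B, C) descriptor


class HOLEFTripletLoss(nn.Module):
    """Sketch of a higher-order energy triplet loss: element-wise feature
    differences are compared through a learnable weight matrix rather than
    a plain first-order Euclidean distance."""
    def __init__(self, dim, margin=0.3):
        super().__init__()
        self.W = nn.Parameter(torch.eye(dim))  # learnable second-order weighting
        self.margin = margin                   # margin value is an assumption

    def energy(self, a, b):
        d = torch.abs(a - b)                   # element-wise difference
        return torch.einsum('bi,ij,bj->b', d, self.W, d)

    def forward(self, sketch, photo_pos, photo_neg):
        pos = self.energy(sketch, photo_pos)
        neg = self.energy(sketch, photo_neg)
        return F.relu(pos - neg + self.margin).mean()
```

In this reading, the learnable matrix lets the loss weight correlations between feature dimensions, which is what gives it some robustness to misaligned features across the sketch and photo branches; the shortcut in the fusion block preserves the coarse semantics that pure attention would otherwise suppress.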

Paper and Supplementary