Abstract
Pedestrian attribute recognition is a challenging task because of variations in appearance, illumination, and other imaging conditions across pedestrian images. We observe that two typical kinds of relations, i.e., relations among regions and relations among attributes, are beneficial to this task. In this paper, we explore the potential of Transformers for pedestrian attribute recognition for the first time and propose a Transformer framework called the Dual-Relations Transformer (DRFormer). A Vision Transformer (ViT) is adopted as the feature extractor because it naturally models long-range relations among regions. Furthermore, an Attribute Relation Module (ARM) built on a Transformer encoder is designed to capture relations among attributes. In the ARM, we encode the spatial and semantic information of attributes into vector embeddings. Equipped with spatial information, DRFormer is able to localize attribute-related regions, while semantic information enables it to learn underlying semantic relations among attributes. Extensive experiments on three popular datasets, PETA, PA-100K, and RAP, demonstrate the superiority of the proposed DRFormer over state-of-the-art methods.
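As a rough illustration of the pipeline outlined above, the following is a minimal PyTorch sketch of an ARM-style module: a Transformer encoder over learned per-attribute embeddings that attend jointly with ViT patch tokens, so the encoder can relate attributes to each other and to image regions. All names, dimensions, and wiring here (e.g., `AttributeRelationModule`, one learned query per attribute, concatenating attribute and region tokens) are assumptions made for illustration; the paper specifies only a ViT backbone and a Transformer-encoder ARM, not this exact implementation.

```python
import torch
import torch.nn as nn


class AttributeRelationModule(nn.Module):
    """Hypothetical ARM sketch: a Transformer encoder over attribute embeddings.

    Assumption: one learned (semantic) embedding per attribute, jointly encoded
    with ViT patch tokens (the spatial/region information) so self-attention
    can capture both attribute-attribute and attribute-region relations.
    """

    def __init__(self, num_attrs: int, dim: int = 768, depth: int = 2, heads: int = 8):
        super().__init__()
        # Learned semantic embedding per attribute (illustrative choice).
        self.attr_embed = nn.Parameter(torch.randn(num_attrs, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.classifier = nn.Linear(dim, 1)  # one binary logit per attribute

    def forward(self, region_feats: torch.Tensor) -> torch.Tensor:
        # region_feats: (B, N, dim) patch tokens from a ViT backbone.
        b = region_feats.size(0)
        attrs = self.attr_embed.unsqueeze(0).expand(b, -1, -1)
        # Joint encoding of attribute tokens and region tokens.
        tokens = torch.cat([attrs, region_feats], dim=1)
        out = self.encoder(tokens)[:, : attrs.size(1)]  # keep attribute tokens
        return self.classifier(out).squeeze(-1)         # (B, num_attrs) logits


# Usage with dummy ViT-B/16-shaped tokens (196 patches of dim 768 for a
# 224x224 input); 35 attributes as in the common PETA evaluation protocol.
arm = AttributeRelationModule(num_attrs=35)
patch_tokens = torch.randn(4, 196, 768)
logits = arm(patch_tokens)  # shape: (4, 35)
```

One design note on this sketch: feeding region tokens into the same encoder is one plausible way to give the attribute embeddings access to spatial information; the paper's actual mechanism for injecting spatial information into the embeddings may differ.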