Enhancing Transformer-based Object Detection with Novel Encoders and Matching Strategies

Authors

  • Leszek Ziora, CUT, Poland

DOI:

https://doi.org/10.64758/rz4dgg90

Keywords:

Transformer-based object detection, Similarity-based Deduplication Encoder (SDE), Hybrid Multi-object Encoder (HMoE)

Abstract

This paper seeks to improve transformer-based object detectors by addressing several recurring issues: costly large-scale feature fusion, redundant encoder tokens, and a scale bias toward large objects. The proposed innovations include a Similarity-based Deduplication Encoder (SDE) to remove redundant tokens, a Hybrid Multi-object Encoder (HMoE) for robust cross-scale attention, and a one-to-many positive matching strategy for stable training. The study used quantitative methods to evaluate detection accuracy, training convergence speed, and related performance metrics on benchmark datasets such as COCO and VOC2007. Results show significant improvements in accuracy, training efficiency, and overall performance, reducing training time by 66% while maintaining or even improving detection accuracy. These innovations balance the competing demands of transformer-based object detection and form a basis for further advances in object detection technologies.
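The core idea behind similarity-based deduplication encoding, pruning encoder tokens that are near-duplicates of already-kept tokens, can be illustrated with a minimal sketch. This is not the paper's implementation: the cosine-similarity criterion, the greedy keep-first policy, and the `threshold` value are all assumptions made for illustration.

```python
import numpy as np

def deduplicate_tokens(tokens: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    """Greedily keep tokens whose cosine similarity to every previously
    kept token stays below `threshold` (illustrative sketch only).

    tokens: (N, D) array of encoder token embeddings.
    Returns the indices of the kept tokens.
    """
    # L2-normalise rows so a plain dot product equals cosine similarity.
    normed = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    kept: list[int] = []
    for i in range(len(normed)):
        sims = normed[kept] @ normed[i] if kept else np.array([])
        if sims.size == 0 or sims.max() < threshold:
            kept.append(i)
    return np.array(kept)

# The second token is a near-duplicate of the first and is dropped.
toks = np.array([[1.0, 0.0], [1.0, 0.001], [0.0, 1.0]])
print(deduplicate_tokens(toks))
```

A greedy first-kept policy is the simplest choice; a production version would likely score tokens (e.g. by attention mass) before deciding which duplicate to keep.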


Published

2025-07-07