Please use this identifier to cite or link to this item: http://localhost/handle/Hannan/617399
Full metadata record
DC Field | Value | Language
dc.contributor.author | Kaihua Zhang | en_US
dc.contributor.author | Qingshan Liu | en_US
dc.contributor.author | Yi Wu | en_US
dc.contributor.author | Ming-Hsuan Yang | en_US
dc.date.accessioned | 2020-05-20T09:18:04Z | -
dc.date.available | 2020-05-20T09:18:04Z | -
dc.date.issued | 2016 | en_US
dc.identifier.issn | 1057-7149 | en_US
dc.identifier.issn | 1941-0042 | en_US
dc.identifier.other | 10.1109/TIP.2016.2531283 | en_US
dc.identifier.uri | http://localhost/handle/Hannan/155292 | en_US
dc.identifier.uri | http://localhost/handle/Hannan/617399 | -
dc.description.abstract | Deep networks have been successfully applied to visual tracking by learning a generic representation offline from numerous training images. However, offline training is time-consuming, and the learned generic representation may be less discriminative for tracking specific objects. In this paper, we show that, even without offline training on a large amount of auxiliary data, simple two-layer convolutional networks can be powerful enough to learn robust representations for visual tracking. In the first frame, we extract a set of normalized patches from the target region as fixed filters, which integrate a series of adaptive contextual filters surrounding the target to define a set of feature maps in the subsequent frames. These maps measure similarities between each filter and useful local intensity patterns across the target, thereby encoding its local structural information. Furthermore, all the maps together form a global representation that also preserves the inner geometric layout of the target. A simple soft-shrinkage method that suppresses noisy values below an adaptive threshold is employed to denoise the global representation. Our convolutional networks have a lightweight structure and perform favorably against several state-of-the-art methods on a recent tracking benchmark data set of 50 challenging videos. | en_US
dc.publisher | IEEE | en_US
dc.relation.haspart | 7410052.pdf | en_US
dc.subject | Convolutional Networks; Visual tracking; Deep learning | en_US
dc.title | Robust Visual Tracking via Convolutional Networks Without Training | en_US
dc.type | Article | en_US
dc.journal.volume | 25 | en_US
dc.journal.issue | 4 | en_US
dc.journal.title | IEEE Transactions on Image Processing | en_US
Appears in Collections: 2016

Files in This Item:
File | Description | Size | Format
7410052.pdf | - | 6.17 MB | Adobe PDF
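The abstract above describes a tracker built from fixed, data-free convolution filters: normalized patches sampled from the target in the first frame serve as filters, their responses on later frames form feature maps, and a soft-shrinkage step denoises the stacked maps into a global representation. Below is a minimal Python/NumPy sketch of that idea. The function names, patch size, filter count, and the median-based adaptive threshold are all illustrative assumptions, not the paper's actual implementation (it omits, for example, the adaptive contextual filters mentioned in the abstract).

# Minimal sketch, assuming illustrative sizes and a median-based threshold.
import numpy as np

def extract_filters(target, patch_size=6, num_filters=100, seed=0):
    """Sample zero-mean, unit-norm patches from the first-frame target region."""
    rng = np.random.default_rng(seed)
    h, w = target.shape
    filters = []
    for _ in range(num_filters):
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        p = target[y:y + patch_size, x:x + patch_size].astype(np.float64)
        p -= p.mean()                            # normalize: zero mean ...
        n = np.linalg.norm(p)
        filters.append(p / n if n > 0 else p)    # ... and unit l2 norm
    return filters

def feature_maps(candidate, filters):
    """Correlate each fixed filter with a candidate region (valid mode)."""
    k = filters[0].shape[0]
    h, w = candidate.shape
    maps = np.empty((len(filters), h - k + 1, w - k + 1))
    for i, f in enumerate(filters):
        for y in range(h - k + 1):
            for x in range(w - k + 1):
                maps[i, y, x] = np.sum(candidate[y:y + k, x:x + k] * f)
    return maps

def soft_shrink(maps):
    """Suppress small (noisy) responses below an adaptive threshold."""
    tau = np.median(np.abs(maps))                # assumed threshold rule
    return np.sign(maps) * np.maximum(np.abs(maps) - tau, 0.0)

# Usage: compare shrunken representations of candidate windows to the
# first-frame template to score tracking candidates.
target = np.random.rand(32, 32)                  # stand-in for the target region
filters = extract_filters(target)
candidate = np.random.rand(32, 32)               # stand-in for a later-frame window
rep = soft_shrink(feature_maps(candidate, filters))  # global representation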