Abstract:
This paper proposes an unsupervised learning framework in which models of the appearance classes of multiple objects are learned from video. These models are then used to detect objects of different classes in the scene. The proposed technique combines appearance and motion features in a weighted combination framework to build the object class models. As a result, it achieves better detection results than foreground-based tracking and than results obtained in a supervised way. Because the technique is unsupervised, a good detection rate is achieved without the manual effort of data collection and labelling. Experimental results confirm that the proposed framework offers a promising solution for detection in unfamiliar scenes.