On-the-fly learning for visual search of large-scale image and video datasets.

Chatfield K, Arandjelović R, Parkhi O, Zisserman A - Int J Multimed Inf Retr (2015)

Bottom Line: We describe three classes of queries, each with its associated visual search method: object instances (using a bag of visual words approach for matching); object categories (using a discriminative classifier for ranking key frames); and faces (using a discriminative classifier for ranking face tracks). We also sketch the architecture of the real-time on-the-fly system. Quantitative results are given on a number of large-scale image and video benchmarks (e.g. TRECVID INS, MIRFLICKR-1M), and we further demonstrate the performance and real-world applicability of our methods over a dataset sourced from 10,000 h of unedited footage from BBC News, comprising 5M+ key frames.

View Article: PubMed Central - PubMed

Affiliation: Visual Geometry Group, Department of Engineering Science, University of Oxford, Oxford, UK.

ABSTRACT

The objective of this work is to visually search large-scale video datasets for semantic entities specified by a text query. The paradigm we explore is constructing visual models for such semantic entities on-the-fly, i.e. at run time, by using an image search engine to source visual training data for the text query. The approach combines fast and accurate learning and retrieval, and enables videos to be returned within seconds of specifying a query. We describe three classes of queries, each with its associated visual search method: object instances (using a bag of visual words approach for matching); object categories (using a discriminative classifier for ranking key frames); and faces (using a discriminative classifier for ranking face tracks). We discuss the features suitable for each class of query, for example Fisher vectors or features derived from convolutional neural networks (CNNs), and how these choices impact on the trade-off between three important performance measures for a real-time system of this kind, namely: (1) accuracy, (2) memory footprint, and (3) speed. We also discuss and compare a number of important implementation issues, such as how to remove 'outliers' in the downloaded images efficiently, and how to best obtain a single descriptor for a face track. We also sketch the architecture of the real-time on-the-fly system. Quantitative results are given on a number of large-scale image and video benchmarks (e.g. TRECVID INS, MIRFLICKR-1M), and we further demonstrate the performance and real-world applicability of our methods over a dataset sourced from 10,000 h of unedited footage from BBC News, comprising 5M+ key frames.
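To make the on-the-fly paradigm concrete, the following is a minimal sketch of query-time category retrieval, not the authors' implementation: descriptors of images downloaded for the text query act as positives, a fixed pool of generic negatives is reused across queries, and a linear classifier ranks the dataset key frames. The descriptor dimensionality, dataset sizes, and the use of scikit-learn's LinearSVC are illustrative assumptions; random arrays stand in for real CNN descriptors.

    # Hedged sketch of on-the-fly category retrieval; sizes and classifier
    # choice are assumptions, and random arrays stand in for CNN descriptors.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    D = 128  # descriptor dimensionality, kept small for the sketch

    # Placeholders: in the real system, positives are descriptors of images
    # returned by an image search engine for the text query, and key-frame
    # descriptors are pre-computed for the whole video dataset.
    pos = rng.normal(loc=0.5, size=(200, D))    # downloaded query images
    neg = rng.normal(loc=0.0, size=(2000, D))   # fixed generic negative pool
    key_frames = rng.normal(size=(10000, D))    # dataset key frames

    # Train a linear classifier at query time.
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    clf = LinearSVC(C=1.0).fit(X, y)

    # Rank every key frame by its signed distance to the decision boundary.
    scores = key_frames @ clf.coef_.ravel() + clf.intercept_[0]
    ranking = np.argsort(-scores)
    print("top 10 key frames:", ranking[:10])

Because the negative pool and the key-frame descriptors are fixed, only the positives and the classifier solve change per query, which is what makes the seconds-scale response claimed in the abstract plausible.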

No MeSH data available.


Fig. 12: Face retrieval attribute search examples. Top search results for facial attribute queries 'beard' and 'black spectacles', respectively, on the BBC News dataset.

Mentions: Facial attribute search. On-the-fly face classification can also be used to retrieve face tracks with specific attributes such as a moustache, a beard, glasses, or gender, simply by using these as the text query rather than specifying a person by name, as in identity retrieval. This simple technique enables users to explore the content along other dimensions. Figure 12 shows several facial attribute examples on the BBC News dataset.
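The same query-time machinery applies to attributes, with one extra step: each face track must first be reduced to a single descriptor before scoring. The sketch below assumes simple mean pooling with L2 normalisation over per-frame face descriptors; the paper discusses how best to obtain such a track descriptor, so treat the pooling choice and all sizes as illustrative rather than as the authors' method.

    # Hedged sketch: one descriptor per face track via mean pooling (an
    # assumed choice, not necessarily the paper's), then scoring tracks with
    # the weight vector of an on-the-fly attribute classifier ('beard', say).
    import numpy as np

    def track_descriptor(per_frame: np.ndarray) -> np.ndarray:
        """Average a track's per-frame face descriptors, then L2-normalise."""
        d = per_frame.mean(axis=0)
        return d / (np.linalg.norm(d) + 1e-12)

    rng = np.random.default_rng(1)
    D = 128
    tracks = [rng.normal(size=(n, D)) for n in (30, 12, 55)]  # 3 face tracks
    w = rng.normal(size=D)  # linear classifier weights learned at query time

    descs = np.stack([track_descriptor(t) for t in tracks])
    scores = descs @ w  # one score per track
    print("tracks ranked for the attribute:", np.argsort(-scores))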

