The CarLogo-51 dataset is an image corpus for large-scale near-duplicate image search. It was collected from the Internet and comprises 51 categories of images containing famous car logos (see the figure). It can be combined with any set of distractor images in Web image search tasks. It also simulates the Web environment, in which every concept has many instances. Methods based on affinity propagation, such as ImageWeb, can therefore be used to improve image search quality significantly.

Frequently Asked Questions

  • What is this dataset used for?
  • It is used for evaluating large-scale near-duplicate image search systems.

  • How can I construct the distractor image sets as mentioned in the paper/technical report?
  • Please download the specified number of images from the Internet. To keep the experiments accurate, make sure that no image in the distractor set contains any of the car logos in the dataset.
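
The selection step described above can be sketched as follows. This is only an illustration, not the procedure used in the paper: the function name `select_distractors`, the idea of a manually prepared `flagged` list (images known to contain one of the dataset's car logos), and the fixed random seed are all assumptions introduced here.

```python
import random


def select_distractors(candidates, flagged, count, seed=0):
    """Pick `count` distractor images from `candidates`.

    `flagged` lists candidates known to contain a car logo from the
    dataset (hypothetical: e.g. found by manual inspection). Those are
    excluded before sampling.
    """
    # Remove flagged images; sort so the result depends only on the seed.
    clean = sorted(set(candidates) - set(flagged))
    if count > len(clean):
        raise ValueError(
            f"need {count} distractors but only {len(clean)} clean images"
        )
    rng = random.Random(seed)
    return rng.sample(clean, count)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    pool = ["web_0001.jpg", "web_0002.jpg", "web_0003.jpg", "web_0004.jpg"]
    with_logo = ["web_0002.jpg"]
    print(select_distractors(pool, with_logo, count=2))
```

Fixing the seed keeps the distractor set reproducible across runs, which makes results on different search systems comparable.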

  • How is the performance plotted?
  • We have implemented four baseline systems: the Hierarchical Visual Vocabulary Tree (HVVT) [Nister, CVPR06], Hamming Embedding (HE) for spatial verification [Jegou, ECCV08], Soft Assignment (SA) for descriptor quantization [Philbin, CVPR08], and Scalar Quantization (SQ) for a training-free codebook [Zhou, ACMMM12]. We apply the ImageWeb algorithm [Xie, CVIU14] to the results of SQ and report the post-processed results as ImageWeb. ImageWeb is verified to work very well on the constructed dataset.
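
This page does not state which metric lies behind the performance plot. Mean average precision (mAP) is a common choice for retrieval benchmarks of this kind, so the sketch below assumes it. The function names and the dictionary-based input format are illustrative and are not taken from the paper or the technical report.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list.

    `ranked_ids` is the system output, best match first.
    `relevant_ids` are the ground-truth matches for the query.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
            # Precision measured at the rank of each relevant result.
            precision_sum += hits / rank
    # Relevant images that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(results, ground_truth):
    """Mean of the per-query average precision values.

    `results` maps query id -> ranked list of image ids.
    `ground_truth` maps query id -> iterable of relevant image ids.
    """
    scores = [
        average_precision(ranked, ground_truth.get(query, ()))
        for query, ranked in results.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Under this assumption, each system (HVVT, HE, SA, SQ, and ImageWeb) would be scored with the same function on the same queries, and the resulting values plotted against the size of the distractor set.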

  • If I want to use the dataset for my experiments, which paper should I cite?
  • Please cite our CVIU 2014 paper.


  • For detailed information about this dataset, please refer to the technical report.
  • NEW! [RAR package][ZIP package] Version 1.0, released in January, 2014.
  • If you find any mistakes or bugs in the dataset, please contact me.

Related Publications

  • Lingxi Xie, Qi Tian, Wengang Zhou and Bo Zhang, "Fast and Accurate Large-Scale Web Image Search with Affinity Propagation on the ImageWeb", accepted to Computer Vision and Image Understanding (CVIU) Special Issue on Large Scale Multimedia Semantic Indexing (LSMSI), 2014. [PDF (draft)] [BibTeX]
  • Lingxi Xie, Qi Tian, Wengang Zhou and Bo Zhang, "The CarLogo-51 Dataset", Technical Report, Tsinghua University, 2014. [PDF] [BibTeX]


  • [Zhou, ACMMM12] Wengang Zhou, Yijuan Lu, Houqiang Li and Qi Tian, "Scalar Quantization for Large Scale Image Search", in ACM International Conference on Multimedia (ACMMM), 2012.
  • [Philbin, CVPR08] James Philbin, Ondrej Chum, Michael Isard, Josef Sivic and Andrew Zisserman, "Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases", in IEEE Conference on Computer Vision and Pattern Recognition, 2008.
  • [Jegou, ECCV08] Herve Jegou, Matthijs Douze and Cordelia Schmid, "Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search", in European Conference on Computer Vision, 2008.
  • [Nister, CVPR06] David Nister and Henrik Stewenius, "Scalable Recognition with a Vocabulary Tree", in IEEE Conference on Computer Vision and Pattern Recognition, 2006.