DOI: 10.1145/3626772.3657898
short-paper
Open access

BRB-KMeans: Enhancing Binary Data Clustering for Binary Product Quantization

Published: 11 July 2024

Abstract

In Binary Product Quantization (BPQ), where product quantization is applied to binary data, the traditional k-majority method is used for clustering, with centroids determined based on Hamming distance and majority vote for each bit. However, this approach often leads to a degradation in clustering quality, negatively impacting BPQ's performance. To address these challenges, we introduce Binary-to-Real-and-Back K-Means (BRB-KMeans), a novel method that initially transforms binary data into real-valued vectors, performs k-means clustering on these vectors, and then converts the generated centroids back into binary data. This innovative approach significantly enhances clustering quality by leveraging the high clustering quality of k-means in the real-valued vector space, thereby facilitating future quantization for binary data. Through extensive experiments, we demonstrate that BRB-KMeans significantly enhances clustering quality and overall BPQ performance, notably outperforming traditional methods.
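
The two clustering procedures contrasted in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how binary data is mapped to real values or how centroids are mapped back, so the sketch assumes the simplest choices (cast bits to 0.0/1.0 floats, threshold centroids at 0.5). The function names `hamming_assign`, `k_majority`, and `brb_kmeans`, the random initialization, and the fixed iteration count are all hypothetical.

```python
import numpy as np


def hamming_assign(bits, centroids):
    """Assign each binary row to its nearest binary centroid by Hamming distance."""
    # Count differing bits between every point and every centroid: shape (n, k).
    dists = (bits[:, None, :] != centroids[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)


def k_majority(bits, k, iters=20, seed=0):
    """Baseline: Hamming-distance assignment plus a per-bit majority vote."""
    rng = np.random.default_rng(seed)
    centroids = bits[rng.choice(len(bits), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = hamming_assign(bits, centroids)
        for j in range(k):
            members = bits[labels == j]
            if len(members):  # keep the old centroid if the cluster is empty
                centroids[j] = (members.mean(axis=0) >= 0.5).astype(np.uint8)
    return centroids


def brb_kmeans(bits, k, iters=20, seed=0):
    """Binary -> real -> k-means -> binary, using assumed conversions."""
    rng = np.random.default_rng(seed)
    real = bits.astype(np.float64)  # ASSUMPTION: bits become 0.0 / 1.0
    centroids = real[rng.choice(len(real), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance to each real-valued centroid: shape (n, k).
        d2 = ((real[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = real[labels == j]
            if len(members):  # keep the old centroid if the cluster is empty
                centroids[j] = members.mean(axis=0)
    # ASSUMPTION: centroids are binarized once, at the end, by a 0.5 threshold.
    return (centroids >= 0.5).astype(np.uint8)
```

The structural difference in this sketch is where rounding happens: `k_majority` rounds the centroids to bits after every iteration, while `brb_kmeans` keeps them real-valued throughout and binarizes only once at the end. To compare the two on your own data, pass the same `uint8` bit matrix to both functions and measure the mean Hamming distance from each point to its centroid as chosen by `hamming_assign`.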



    Information

    Published In

    SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2024
    3164 pages
    ISBN: 9798400704314
    DOI: 10.1145/3626772
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. binary clustering
    2. binary data
    3. binary vector
    4. product quantization

    Qualifiers

    • Short-paper

    Funding Sources

    • This research was supported by the Regional Innovation Strategy (RIS) through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (MOE).
    • This research was partially supported by the Regional Innovation Strategy (RIS) through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (MOE), and by the NRF grant.

    Conference

    SIGIR 2024

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
