{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T01:38:22Z","timestamp":1768700302875,"version":"3.49.0"},"reference-count":27,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2009,8]]},"abstract":"<jats:p>\n            There is a growing realization that modern database management systems (DBMSs) must be able to manage data that contains\n            <jats:italic>uncertainties<\/jats:italic>\n            that are represented in the form of probabilistic relations. Consequently, the design of each core DBMS component must be revisited in the presence of uncertain and probabilistic information. In this paper, we study how to build histogram synopses for probabilistic relations, for the purposes of enabling both DBMS-internal decisions (such as indexing and query planning), and (possibly, user-facing) approximate query processing tools. In contrast to initial work in this area, our\n            <jats:italic>probabilistic histograms<\/jats:italic>\n            retain the key\n            <jats:italic>possible-worlds semantics<\/jats:italic>\n            of probabilistic data, allowing for more accurate, yet concise, representation of the uncertainty characteristics of data and query results. We present a variety of techniques for building optimal probabilistic histograms, each one tuned to a different choice of approximation-error metric. We show that these can be incorporated into a general Dynamic Programming (DP) framework, which generalizes that used for existing histogram constructions. The end result is a histogram where each \"bucket\" is approximately represented by a compact probability distribution function (PDF), which can be used as the basis for query planning and approximate query answering. We present novel, polynomial-time algorithms to find optimal probabilistic histograms for a variety of PDF-error metrics (including variation distance, sum squared error, max error and EMD\n            <jats:sub>1<\/jats:sub>\n            ). Our experimental study shows that our probabilistic histogram synopses can accurately capture the key statistical properties of uncertain data, while being much more compact to store and work with than the original uncertain relations.\n          <\/jats:p>","DOI":"10.14778\/1687627.1687687","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"526-537","source":"Crossref","is-referenced-by-count":17,"title":["Probabilistic histograms for probabilistic data"],"prefix":"10.14778","volume":"2","author":[{"given":"Graham","family":"Cormode","sequence":"first","affiliation":[{"name":"AT&amp;T Labs--Research"}]},{"given":"Antonios","family":"Deligiannakis","sequence":"additional","affiliation":[{"name":"Technical University of Crete"}]},{"given":"Minos","family":"Garofalakis","sequence":"additional","affiliation":[{"name":"Technical University of Crete"}]},{"given":"Andrew","family":"McGregor","sequence":"additional","affiliation":[{"name":"University of Massachusetts, Amherst"}]}],"member":"320","published-online":{"date-parts":[[2009,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/1513922"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/237814.237823"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497507"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218195906001951"},{"issue":"1","key":"e_1_2_1_5_1","first-page":"5","article-title":"An introduction to ULDBs and the Trio system","volume":"29","author":"Benjelloun O.","year":"2006","unstructured":"O. Benjelloun , A. D. Sarma , C. Hayworth , and J. Widomn . An introduction to ULDBs and the Trio system . IEEE Data Engineering Bulletin , 29 ( 1 ): 5 -- 16 , Mar. 2006 . O. Benjelloun, A. D. Sarma, C. Hayworth, and J. Widomn. An introduction to ULDBs and the Trio system. IEEE Data Engineering Bulletin, 29(1):5--16, Mar. 2006.","journal-title":"IEEE Data Engineering Bulletin"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066277"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247513"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.74"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/1316689.1316764"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1265530.1265531"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944973"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87744-8_37"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/800057.808675"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1132863.1132873"},{"key":"e_1_2_1_15_1","volume-title":"International Conference on Very Large Data Bases","author":"Guha S.","year":"2004","unstructured":"S. Guha , K. Shim , and J. Woo . REHIST: Relative error histogram construction algorithms . In International Conference on Very Large Data Bases , 2004 . S. Guha, K. Shim, and J. Woo. REHIST: Relative error histogram construction algorithms. In International Conference on Very Large Data Bases, 2004."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/1315451.1315455"},{"key":"e_1_2_1_17_1","volume-title":"International Conference on Very Large Data Bases","author":"Jagadish H. V.","year":"1999","unstructured":"H. V. Jagadish , N. Koudas , and S. Muthukrishnan . Mining deviants in a time-series database . In International Conference on Very Large Data Bases , 1999 . H. V. Jagadish, N. Koudas, and S. Muthukrishnan. Mining deviants in a time-series database. In International Conference on Very Large Data Bases, 1999."},{"key":"e_1_2_1_18_1","volume-title":"Int. Conf. on Very Large Data Bases","author":"Jagadish H. V.","year":"1998","unstructured":"H. V. Jagadish , N. Koudas , S. Muthukrishnan , V. Poosala , K. Sevcik , and T. Suel . Optimal histograms with quality guarantees . In Int. Conf. on Very Large Data Bases , 1998 . H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. Sevcik, and T. Suel. Optimal histograms with quality guarantees. In Int. Conf. on Very Large Data Bases, 1998."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376686"},{"key":"e_1_2_1_20_1","volume-title":"ACM-SIAM Symposium on Discrete Algorithms","author":"Jayram T. S.","year":"2007","unstructured":"T. S. Jayram , S. Kale , and E. Vee . Efficient aggregation algorithms for probabilistic data . In ACM-SIAM Symposium on Discrete Algorithms , 2007 . T. S. Jayram, S. Kale, and E. Vee. Efficient aggregation algorithms for probabilistic data. In ACM-SIAM Symposium on Discrete Algorithms, 2007."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1265530.1265565"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/645503.656259"},{"key":"e_1_2_1_23_1","volume-title":"International Conference on Very Large Data Bases","author":"Poosala V.","year":"1997","unstructured":"V. Poosala and Y. E. Ioannidis . Selectivity estimation without the attribute value independence assumption . In International Conference on Very Large Data Bases , 1997 . V. Poosala and Y. E. Ioannidis. Selectivity estimation without the attribute value independence assumption. In International Conference on Very Large Data Bases, 1997."},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4612-1098-6","volume-title":"Computational Geometry: An Introduction","author":"Preparata F. P.","year":"1985","unstructured":"F. P. Preparata and M. Shamos . Computational Geometry: An Introduction . Springer-Verlag , 2 nd edition, 1985 . F. P. Preparata and M. Shamos. Computational Geometry: An Introduction. Springer-Verlag, 2nd edition, 1985.","edition":"2"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the Workshop on Management of Uncertain Data (MUD)","author":"Sarma A. D.","year":"2008","unstructured":"A. D. Sarma , P. Agrawal , S. Nabar , and J. Widom . Towards special-purpose indexes and statistics for uncertain data . In Proceedings of the Workshop on Management of Uncertain Data (MUD) , 2008 . A. D. Sarma, P. Agrawal, S. Nabar, and J. Widom. Towards special-purpose indexes and statistics for uncertain data. In Proceedings of the Workshop on Management of Uncertain Data (MUD), 2008."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-69497-7_7"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497514"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1687627.1687687","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:29:17Z","timestamp":1672226957000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1687627.1687687"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,8]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,8]]}},"alternative-id":["10.14778\/1687627.1687687"],"URL":"https:\/\/doi.org\/10.14778\/1687627.1687687","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2009,8]]}}}