{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T20:00:15Z","timestamp":1759694415692},"reference-count":54,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2013,8,18]],"date-time":"2013-08-18T00:00:00Z","timestamp":1376784000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2013,8,18]],"date-time":"2013-08-18T00:00:00Z","timestamp":1376784000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Inf Retrieval"],"published-print":{"date-parts":[[2014,6]]},"DOI":"10.1007\/s10791-013-9230-7","type":"journal-article","created":{"date-parts":[[2013,8,17]],"date-time":"2013-08-17T07:27:03Z","timestamp":1376724423000},"page":"203-228","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Discover hidden web properties by random walk on bipartite graph"],"prefix":"10.1007","volume":"17","author":[{"given":"Yan","family":"Wang","sequence":"first","affiliation":[]},{"given":"Jie","family":"Liang","sequence":"additional","affiliation":[]},{"given":"Jianguo","family":"Lu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2013,8,18]]},"reference":[{"key":"9230_CR1","volume-title":"Handbook of capture\u2013recapture analysis","author":"S. Amstrup","year":"2005","unstructured":"Amstrup, S., McDonald, T. & Manly, B. (2005). Handbook of capture\u2013recapture analysis. Princeton, NJ: Princeton University Press."},{"key":"9230_CR2","doi-asserted-by":"crossref","unstructured":"Bar-Yossef, Z. & Gurevich, M. (2006). 
Random sampling from a search engine\u2019s index. In Proceedings of the 15th international conference on World Wide Web (pp. 367\u2013376) Edinburgh, Scotland: ACM.","DOI":"10.1145\/1135777.1135833"},{"issue":"5","key":"9230_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1411509.1411514","volume":"55","author":"Z. Bar-Yossef","year":"2008","unstructured":"Bar-Yossef, Z. & Gurevich, M. (2008). Random sampling from a search engine\u2019s index. Journal of the ACM, 55(5), 1\u201374.","journal-title":"Journal of the ACM"},{"issue":"4","key":"9230_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2019643.2019645","volume":"5","author":"Z. Bar-Yossef","year":"2011","unstructured":"Bar-Yossef, Z. & Gurevich, M. (2011). Efficient search engine measurements. ACM Transactions on the Web (TWEB), 5(4), 1\u201348.","journal-title":"ACM Transactions on the Web (TWEB)"},{"key":"9230_CR5","doi-asserted-by":"crossref","unstructured":"Bergman, M. K. (2001). White paper: The deep web: Surfacing hidden value. Journal of Electronic Publishing, 7(1).","DOI":"10.3998\/3336451.0007.104"},{"issue":"1\u20137","key":"9230_CR6","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1016\/S0169-7552(98)00127-5","volume":"30","author":"K. Bharat","year":"1998","unstructured":"Bharat, K., & Broder, A. (1998). A technique for measuring the relative size and overlap of public web search engines. Computer Networks and ISDN Systems, 30(1\u20137), 379\u2013388.","journal-title":"Computer Networks and ISDN Systems"},{"key":"9230_CR7","doi-asserted-by":"crossref","unstructured":"Broder, A., et\u00a0al. (2006). Estimating corpus size via queries. In CIKM (pp. 594\u2013603). ACM.","DOI":"10.1145\/1183614.1183699"},{"issue":"2","key":"9230_CR8","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1145\/382979.383040","volume":"19","author":"J. Callan","year":"2001","unstructured":"Callan, J., & Connell, M. (2001). Query-based sampling of text databases. 
ACM Transactions on Information Systems (TOIS), 19(2), 97\u2013130.","journal-title":"ACM Transactions on Information Systems (TOIS)"},{"issue":"2","key":"9230_CR9","doi-asserted-by":"publisher","first-page":"479","DOI":"10.1145\/304181.304224","volume":"28","author":"J. Callan","year":"1999","unstructured":"Callan, J., Connell, M., & Du, A. (1999). Automatic discovery of language models for text databases. ACM SIGMOD Record, 28(2), 479\u2013490.","journal-title":"ACM SIGMOD Record"},{"issue":"1","key":"9230_CR10","doi-asserted-by":"publisher","first-page":"201","DOI":"10.2307\/2532750","volume":"48","author":"A Chao","year":"1992","unstructured":"Chao, A., Lee, S. & Jeng, S. (1992). Estimating population size for capture\u2013recapture data when capture probabilities vary by time and individual animal. Biometrics, 48(1), 201\u2013216.","journal-title":"Biometrics"},{"key":"9230_CR11","volume-title":"Sampling techniques","author":"W. Cochran","year":"1977","unstructured":"Cochran, W. (1977). Sampling techniques. New York: Wiley."},{"issue":"3\/4","key":"9230_CR12","doi-asserted-by":"publisher","first-page":"343","DOI":"10.2307\/2333183","volume":"45","author":"J. Darroch","year":"1958","unstructured":"Darroch, J. (1958). The multiple-recapture census: I. Estimation of a closed population. Biometrika, 45(3\/4), 343\u2013359.","journal-title":"Biometrika"},{"key":"9230_CR13","doi-asserted-by":"crossref","unstructured":"Dasgupta, A., Das, G. & Mannila H. (2007). A random walk approach to sampling hidden databases. In SIGMOD (pp. 629\u2013640). ACM.","DOI":"10.1145\/1247480.1247550"},{"key":"9230_CR14","doi-asserted-by":"crossref","unstructured":"Dasgupta, A., Jin, X., Jewell, B., Zhang, N. & Das, G. (2010). Unbiased estimation of size and other aggregates over hidden web databases. In SIGMOD (pp. 855\u2013866). ACM.","DOI":"10.1145\/1807167.1807259"},{"key":"9230_CR15","unstructured":"Gjoka, M., Kurant, M., Butts, C. & Markopoulou A. (2009). 
A walk in facebook: Uniform sampling of users in online social networks. Arxiv preprint [arXiv:0906.0060]."},{"issue":"9","key":"9230_CR16","doi-asserted-by":"publisher","first-page":"1872","DOI":"10.1109\/JSAC.2011.111011","volume":"29","author":"M. Gjoka","year":"2011","unstructured":"Gjoka, M., Kurant, M., Butts, C., & Markopoulou A. (2011). Practical recommendations on crawling online social networks. IEEE Journal on Selected Areas in Communications, 29(9), 1872\u20131892.","journal-title":"IEEE Journal on Selected Areas in Communications"},{"key":"9230_CR17","doi-asserted-by":"crossref","unstructured":"Gulli, A., & Signorini, A. (2005). The indexable web is more than 11.5 billion pages. In Special interest tracks and posters of the 14th international conference on World Wide Web (pp 902\u2013903). ACM.","DOI":"10.1145\/1062745.1062789"},{"key":"9230_CR18","unstructured":"Haas P. J., Naughton J.F., Seshadri S., & Stokes L. (1995). Sampling-Based estimation of the number of distinct values of an attribute. In VLDB (pp. 311\u2013322)."},{"issue":"4","key":"9230_CR19","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1214\/aoms\/1177731356","volume":"14","author":"M. Hansen","year":"1943","unstructured":"Hansen, M. & Hurwitz, W. (1943). On the theory of sampling from finite populations. The Annals of Mathematical Statistics, 14(4), 333\u2013362.","journal-title":"The Annals of Mathematical Statistics"},{"issue":"1\u20136","key":"9230_CR20","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1016\/S1389-1286(00)00055-4","volume":"33","author":"M. Henzinger","year":"2000","unstructured":"Henzinger, M., Heydon, A., Mitzenmacher, M., Najork, M. (2000). On near-uniform URL sampling. Computer Networks, 33(1\u20136), 295\u2013308.","journal-title":"Computer Networks"},{"key":"9230_CR21","doi-asserted-by":"crossref","unstructured":"Ipeirotis, P. G., Gravano, L., & Sahami, M. (2001). Probe, count, and classify: categorizing hidden web databases. 
In SIGMOD (pp. 67\u201378). ACM.","DOI":"10.1145\/376284.375671"},{"key":"9230_CR22","doi-asserted-by":"crossref","unstructured":"Katzir, L., Liberty, E., & Somekh, O. (2011). Estimating sizes of social networks via biased sampling. In WWW (pp. 597\u2013606). ACM.","DOI":"10.1145\/1963405.1963489"},{"issue":"9","key":"9230_CR23","doi-asserted-by":"publisher","first-page":"1799","DOI":"10.1109\/JSAC.2011.111005","volume":"29","author":"M. Kurant","year":"2011","unstructured":"Kurant, M., Markopoulou, A., & Thiran, P. (2011). Towards unbiased bfs sampling. IEEE Journal on Selected Areas in Communications, 29(9), 1799\u20131809.","journal-title":"IEEE Journal on Selected Areas in Communications"},{"issue":"5360","key":"9230_CR24","first-page":"98","volume":"280","author":"S. Lawrence","year":"1998","unstructured":"Lawrence, S., & Giles, C. L. (1998). Searching the world wide web. Science, 280(5360), 98\u2013100.","journal-title":"Science"},{"key":"9230_CR26","doi-asserted-by":"crossref","unstructured":"Leskovec, J., & Faloutsos, C. (2006). Sampling from large graphs. In SIGKDD pp. 631\u2013636. ACM.","DOI":"10.1145\/1150402.1150479"},{"key":"9230_CR27","volume-title":"Monte Carlo strategies in scientific computing","author":"J. Liu","year":"2008","unstructured":"Liu, J. (2008). Monte Carlo strategies in scientific computing. New York: Springer."},{"issue":"1","key":"9230_CR28","first-page":"1","volume":"2","author":"L. Lov\u00e1sz","year":"1993","unstructured":"Lov\u00e1sz, L. (1993). Random walks on graphs: A survey. Combinatorics, Paul Erdos is Eighty, 2(1), 1\u201346.","journal-title":"Combinatorics, Paul Erdos is Eighty"},{"key":"9230_CR29","doi-asserted-by":"crossref","unstructured":"Lu, J. (2008). Efficient estimation of the size of text deep web data source. In Proceedings of the 17th ACM conference on Information and knowledge management (pp. 1485\u20131486). 
ACM.","DOI":"10.1145\/1458082.1458346"},{"issue":"8","key":"9230_CR30","doi-asserted-by":"publisher","first-page":"866","DOI":"10.1016\/j.datak.2010.03.007","volume":"69","author":"J. Lu","year":"2010","unstructured":"Lu, J. (2010). Ranking bias in deep web size estimation using capture recapture method. Data & Knowledge Engineering, 69(8), 866\u2013879.","journal-title":"Data & Knowledge Engineering"},{"issue":"1","key":"9230_CR31","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1007\/s10791-009-9107-y","volume":"13","author":"J. Lu","year":"2010","unstructured":"Lu, J., & Li, D. (2010). Estimating deep web data source size by capture\u2013recapture method. Information Retrieval, 13(1), 70\u201395.","journal-title":"Information Retrieval"},{"key":"9230_CR32","doi-asserted-by":"crossref","unstructured":"Lu, J., & Li, D. (2012). Sampling online social networks by random walk. In ACM SIGKDD workshop on hot topics in online social networks (pp. 33\u201340). ACM.","DOI":"10.1145\/2392622.2392628"},{"key":"9230_CR33","doi-asserted-by":"crossref","unstructured":"Lu, J. & Li, D. (2013, in press). Bias correction in small sample from big data. IEEE Transactions of Knowledge and Data Engineering, TKDE.","DOI":"10.1109\/TKDE.2012.220"},{"key":"9230_CR34","doi-asserted-by":"crossref","unstructured":"Lu, J., Wang, Y., Liang, J., Chen, J., & Liu, J. (2008). An approach to deep web crawling by sampling. In IEEE\/WIC\/ACM international conference on web intelligence and intelligent agent technology, 2008. WI-IAT\u201908(Vol. 1, pp. 718\u2013724).","DOI":"10.1109\/WIIAT.2008.392"},{"issue":"2","key":"9230_CR35","doi-asserted-by":"crossref","first-page":"1241","DOI":"10.14778\/1454159.1454163","volume":"1","author":"J. Madhavan","year":"2008","unstructured":"Madhavan, J., Ko, D., Kot, L., Ganapathy, V., Rasmussen, A., & Halevy, A. (2008). Google\u2019s deep web crawl. 
Proceedings of the VLDB Endowment 1(2), 1241\u20131252.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"9230_CR36","doi-asserted-by":"publisher","first-page":"1087","DOI":"10.1063\/1.1699114","volume":"21","author":"N. Metropolis","year":"1953","unstructured":"Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., & Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21, 1087.","journal-title":"The Journal of Chemical Physics"},{"issue":"3","key":"9230_CR37","doi-asserted-by":"publisher","first-page":"567","DOI":"10.1016\/S0378-4371(01)00355-7","volume":"300","author":"M. Montemurro","year":"2001","unstructured":"Montemurro, M. (2001). Beyond the Zipf\u2013Mandelbrot law in quantitative linguistics. Physica A: Statistical Mechanics and Its Applications, 300(3), 567\u2013578.","journal-title":"Physica A: Statistical Mechanics and Its Applications"},{"key":"9230_CR38","doi-asserted-by":"publisher","DOI":"10.1093\/acprof:oso\/9780199206650.001.0001","volume-title":"Networks: An introduction","author":"M. Newman","year":"2010","unstructured":"Newman, M. (2010). Networks: An introduction. Oxford: Oxford University Press."},{"issue":"3","key":"9230_CR39","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1561\/1500000017","volume":"4","author":"C. Olston","year":"2010","unstructured":"Olston, C., & Najork, M. (2010). Web Crawling. Foundations and Trends in Information Retrieval, 4(3), 175\u2013246.","journal-title":"Foundations and Trends in Information Retrieval"},{"key":"9230_CR40","first-page":"1","volume":"99","author":"M. Papagelis","year":"2011","unstructured":"Papagelis, M., Das, G., & Koudas, N. (2011). Sampling online social networks. IEEE Transactions on Knowledge and Data Engineering, 99, 1\u20131.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"9230_CR41","unstructured":"Raghavan, S., & Garcia-Molina, H. (2001). Crawling the hidden web. 
In VLDB (pp. 129\u2013138). Morgan Kaufmann Publishers Inc."},{"key":"9230_CR42","doi-asserted-by":"crossref","unstructured":"Rasti, A., Torkjazi, M., Rejaie, R., Duffield, N., Willinger, W. & Stutzbach, D. (2009) Respondent-driven sampling for characterizing unstructured overlays. In INFOCOM, IEEE (pp. 2701\u20132705).","DOI":"10.1109\/INFCOM.2009.5062215"},{"key":"9230_CR43","unstructured":"Reuters, T. (2008). Reuters corpus. http:\/\/about.reuters.com\/researchandstandards\/corpus\/, December 2008."},{"issue":"1","key":"9230_CR44","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1111\/j.0081-1750.2004.00152.x","volume":"34","author":"M. Salganik","year":"2004","unstructured":"Salganik, M., & Heckathorn, D. (2004). Sampling and estimation in hidden populations using respondent-driven sampling. Sociological methodology, 34(1), 193\u2013240.","journal-title":"Sociological Methodology"},{"key":"9230_CR45","doi-asserted-by":"crossref","unstructured":"Shokouhi, M., & Si, L. (2011). Federated search. Hanover, MA: Now Publishers.","DOI":"10.1561\/9781601984234"},{"key":"9230_CR46","doi-asserted-by":"crossref","unstructured":"Shokouhi, M., Zobel, J., Scholer, F., & Tahaghoghi, S. M. M. (2006). Capturing collection size for distributed non-cooperative retrieval. In SIGIR (pp. 316\u2013323). ACM.","DOI":"10.1145\/1148170.1148227"},{"key":"9230_CR47","doi-asserted-by":"crossref","unstructured":"Si, L., Jin, R., Callan, J., & Ogilvie P. (2002). A language modeling framework for resource selection and results merging. In Proceedings of the 11th CIKM (pp. 391\u2013397). ACM.","DOI":"10.1145\/584792.584856"},{"key":"9230_CR48","doi-asserted-by":"publisher","DOI":"10.1002\/9781118162934","volume-title":"Sampling","author":"S. Thompson","year":"2012","unstructured":"Thompson, S. (2012). Sampling. New York: Wiley."},{"issue":"1","key":"9230_CR49","doi-asserted-by":"crossref","first-page":"75","DOI":"10.3233\/WIA-2012-0232","volume":"10","author":"Y. 
Wang","year":"2012","unstructured":"Wang, Y., Lu, J., Liang, J., Chen, J. & Liu, J. (2012). Selecting queries from sample to crawl deep web data sources. Web Intelligence and Agent Systems, 10(1), 75\u201388.","journal-title":"Web Intelligence and Agent Systems"},{"issue":"1","key":"9230_CR50","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1177\/0049124108318333","volume":"37","author":"C. Wejnert","year":"2008","unstructured":"Wejnert, C., & Heckathorn, D. (2008). Web-based network sampling. Sociological Methods & Research, 37(1), 105\u2013134.","journal-title":"Sociological Methods & Research"},{"key":"9230_CR51","unstructured":"Wu, P., Wen, J., Liu, H., & Ma, W. (2006). Query selection techniques for efficient crawling of structured web sources. In ICDE, IEEE."},{"issue":"2","key":"9230_CR52","doi-asserted-by":"publisher","first-page":"160","DOI":"10.1504\/IJSCCPS.2011.044172","volume":"1","author":"S. Ye","year":"2011","unstructured":"Ye, S., & Wu, S. (2011). Estimating the size of online social networks. International Journal of Social Computing and Cyber-Physical Systems, 1(2), 160\u2013179.","journal-title":"International Journal of Social Computing and Cyber-Physical Systems"},{"key":"9230_CR53","doi-asserted-by":"crossref","unstructured":"Zhang, M., Zhang, M. N., & Das, G. (2011). Mining a search engine\u2019s corpus: efficient yet unbiased sampling and aggregate estimation. In SIGMOD (pp. 793\u2013804). ACM.","DOI":"10.1145\/1989323.1989406"},{"key":"9230_CR54","doi-asserted-by":"crossref","unstructured":"Zhou, J., Li, Y., Adhikari, V., & Zhang, Z. (2011). Counting youtube videos via random prefix sampling. In SIGCOMM (pp. 371\u2013380). ACM.","DOI":"10.1145\/2068816.2068851"},{"key":"9230_CR55","unstructured":"Zipf, G. (1949). 
Human behavior and the principle of least effort."}],"container-title":["Information Retrieval"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10791-013-9230-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10791-013-9230-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10791-013-9230-7","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10791-013-9230-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,2]],"date-time":"2024-01-02T14:10:47Z","timestamp":1704204647000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10791-013-9230-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,8,18]]},"references-count":54,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2014,6]]}},"alternative-id":["9230"],"URL":"https:\/\/doi.org\/10.1007\/s10791-013-9230-7","relation":{},"ISSN":["1386-4564","1573-7659"],"issn-type":[{"value":"1386-4564","type":"print"},{"value":"1573-7659","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,8,18]]},"assertion":[{"value":"24 October 2012","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 July 2013","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 August 2013","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"This content has been made available to 
all.","name":"free","label":"Free to read"}]}}