Big Data Indexing
Abstract. Big data analytics is one of the best ways of extracting value and benefits from the hugely accumulated data. The rate at which global data is accumulating, driven by the rapid and continuous interconnection of people and devices, is overwhelming. This poses the additional challenge of finding ever faster techniques for analyzing and mining big data, despite the emergence of dedicated big data tools. Indexing and indexing data structures have played an important role in providing faster and improved data processing, mining and retrieval in relational database management systems; indexes have aided data mining by reducing the time needed to process and retrieve data. Indexing techniques and data structures have the potential of bringing the same benefits to big data analytics if properly integrated into big data analytical platforms. A lot of research has been conducted in that direction, and this paper attempts to show how indexing techniques have been used to benefit big data mining and analytics, and hence how the impact that indexing has had on RDBMSs can be brought to the fold of big data mining and analytics.
Keywords: Big data · Big data analytics · Data mining · Indexing tech-
niques · Index.
1 Introduction
The exponentially growing nature of global data was collated and presented by Statista in its report titled "Volume of data/information created worldwide from 2010 to 2025". The report indicated that the world's overall volume of created and copied data would reach 50.5 ZB in 2020. This volume is expected to more than triple within five years, to an estimated 175 ZB by 2025 [40]. This fact gives a sense of the hugeness of the data that is termed Big Data [10], [9]. This global data is accumulated in various forms, comprising structured, semi-structured and unstructured masses of data that dearly need analysis. Since the data are collated and stored in datasets, those enormous datasets are also referred to as Big Data/Datasets [36], [5], [46].
According to Chen et al. [10] and Chen et al. [9], the increase in Volume indicates how big the generation, collection and scaling of data masses have become, while the increase in Velocity indicates the need for timely and rapid collection and analysis of big data in order to maximally utilize its commercial value. The increase in Variety indicates that big data comprises various forms, which may include unstructured and semi-structured data besides the usual structured data [4].
The IDC, for its part, presented a different opinion on big data. In its 2011 report, IDC viewed big data from a technological and architectural angle, as data designed for the extraction of economic value. The economic value comes from very large volumes of widely varied data enabling high-velocity discovery, capture, and analysis [46].
The IDC definition added another 'V', making it a 4Vs model with the fourth 'V' being Value. Therefore, a broader definition of big data can be derived from all the earlier ones, as the term describing enormous data sets whose contents are characterized by the four Vs: i. Volume: a very large amount of data; ii. Variety: different forms of data, i.e. structured, semi-structured, and unstructured, gathered from different sources including images, documents and complex records; iii. Velocity: constantly changing contents, which come from complementary data collection, and from archived and streamed data [10] [6] [47]; iv. Value: very high value and very low density [10].
In addition, recent literature includes Veracity as the fifth 'V' characteristic of big data, on the conviction that none of the earlier described characteristics covers it. Veracity refers to the uncertainty and the effect of accuracy on the quality of the collected data [5] [4] [44] [46].
The value in big data is extracted by proper and efficient retrieval of the data from the big datasets. Thus, fast processing of the retrieved data is a determinant of faster and more timely analysis of, and access to, the required data. Indexing is one of the most useful techniques for faster data retrieval during processing and accessing, and the data-driven processes of Industry 4.0 are in need of such faster retrieval. Once the retrieval and access processes involving big data are made faster, many benefits follow, including energy/power savings, improved hardware durability and reduced heat generation.
The main objective of this chapter is to present the indexing techniques used for searching and effective retrieval of data during data mining. The strategy of all indexing techniques is to restrict the amount of input data to be processed during the mining of any dataset. The chapter highlights those indexing techniques that have the potential of working better with big data.
1.2 Taxonomy of the Chapter
The chapter draws on the relevant literature searched and downloaded from the major publication databases, including IEEE Xplore, DBLP, Scopus, Springer and others. A review of the titles, abstracts, introductions and conclusions of the papers covering the application of indexing in big data mining and analytics was then conducted. This was done using selection criteria designed to pick the papers that matched, and that highlighted the structure of, the various indexing approaches used for big data mining and analytics. In addition, the chapter identifies the indexing approaches that have better potential for big data mining and those that have less. Fig. 1 depicts a diagrammatic sketch of the taxonomy used for the chapter.
An index has one of two architectures: Non-Clustered or Clustered, as shown in Figure 3. The Non-Clustered architecture presents data in an arbitrary order but maintains a logical ordering, in which rows may be spread out in a file/table without regard to the indexed column expression. This architecture uses tree-based indexing, with sorted index keys and, at its leaf-node level, pointers to the records. The Non-Clustered index architecture is characterized by the order of the physical rows of the indexed data differing from the order in the index [34]. The Non-Clustered index sketch is displayed in Figure 3.
On the other hand, the Clustered index architecture rearranges the blocks of data into a distinct order that matches the index, which results in the ordering of the row data. It can greatly increase the overall speed of retrieval when data are accessed sequentially, in the reverse order of the index, or among a selected range of items. The major characteristic of the Clustered index architecture is that the ordering of the physical data rows accords with that of the index blocks that point to them [31]. The Clustered index sketch is displayed in Figure 4.
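To make the contrast concrete, the sketch below builds both arrangements over a toy table (hypothetical names and values; a minimal illustration, not how any particular RDBMS lays out its pages):

```python
# Minimal sketch of the two index architectures over a toy "employees" table.
# Non-Clustered: rows stay in arbitrary (insertion) order; the index is a
# separately sorted list of (key, row-position) pairs pointing into the heap.
# Clustered: the rows themselves are physically reordered by the index key.

rows = [("Eve", 31), ("Adam", 45), ("Carol", 28)]  # arbitrary physical order

# Non-clustered index on the name column: sorted keys plus pointers (row numbers)
nonclustered = sorted((name, pos) for pos, (name, _) in enumerate(rows))

# Clustered arrangement: the data blocks are rewritten in key order
clustered_rows = sorted(rows, key=lambda r: r[0])

def lookup_nonclustered(key):
    # find the key in the sorted index, then follow the pointer
    # back into the unsorted heap file
    for k, pos in nonclustered:
        if k == key:
            return rows[pos]
    return None

print(lookup_nonclustered("Carol"))  # ('Carol', 28)
print(clustered_rows)                # rows now physically ordered by name
```

The non-clustered lookup pays one extra indirection per match, while the clustered layout makes range scans over the key a sequential read.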
The Bitmap index uses bit arrays, called bitmaps, to store the bulk of its data. A bitmap index is a special type of index that uses bitwise logical operations on the bitmaps to answer most of the queries run against it. The Bitmap index works best in situations where index values are repeated very frequently, unlike the other commonly used index types, which are most efficient when indexed values are not repeated at all or are repeated only a small number of times. The gender field of a database table is a good example for a bitmap index: no matter how many tuples there are in the table, the field will only have two possible values, male or female. A typical bitmap index data structure is shown in Figure 5.
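A minimal sketch of how such a bitmap index answers queries with bitwise operations is shown below (the toy data and column names are assumptions for illustration):

```python
# Sketch of a bitmap index on low-cardinality columns (toy data).
# One bitmap per distinct value; bit i is set iff row i holds that value.
# Python integers serve as arbitrary-length bit arrays, so predicates
# become bitwise AND/OR operations on the bitmaps.

genders = ["male", "female", "female", "male", "female"]
depts   = ["hr",   "it",     "hr",     "it",   "hr"]

def build_bitmaps(column):
    maps = {}
    for i, v in enumerate(column):
        maps[v] = maps.get(v, 0) | (1 << i)   # set bit i for this value
    return maps

gender_idx = build_bitmaps(genders)
dept_idx = build_bitmaps(depts)

def matching_rows(bitmap, n):
    return [i for i in range(n) if bitmap >> i & 1]

# WHERE gender = 'female' AND dept = 'hr' -> a single bitwise AND
hits = gender_idx["female"] & dept_idx["hr"]
print(matching_rows(hits, len(genders)))  # [2, 4]
```

Note that only one bitmap per distinct value is stored, which is why the structure pays off when values repeat frequently and balloons when they do not.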
The bitmap index has been used in a wide variety of areas, including big data analytics. For instance, bitmap indexing has been applied to index compression, where it helps compression work better with high-cardinality attribute data. Wu et al. [41] present a study and analysis of some of the compression techniques that use bitmap indexing, namely Byte-aligned Bitmap Compression (BBC) and Word-Aligned Hybrid (WAH). These techniques were able to reduce compressed data sizes and improve performance. The authors' motivation was the fact that most empirical studies do not include a comparative analysis of these different techniques; their own results showed that compressed bitmap indexes are smaller than a B+-Tree, with WAH occupying half the space of the B+-Tree while BBC occupies half the space of WAH.
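The intuition behind word-aligned schemes such as WAH can be conveyed with a much-simplified run-length sketch; the token format below is an illustrative assumption, not the actual BBC or WAH word encodings:

```python
# Simplified word-aligned run-length compression of a bitmap, in the spirit
# of WAH: the bitmap is cut into fixed-width groups; maximal runs of all-0
# or all-1 groups collapse into a single "fill" token, while mixed groups
# are stored verbatim as "literal" tokens.

WIDTH = 31  # payload bits per word, echoing WAH's 31-bit groups

def compress(bits):
    words = [bits[i:i + WIDTH] for i in range(0, len(bits), WIDTH)]
    out = []
    for w in words:
        if len(w) == WIDTH and len(set(w)) == 1:       # all-0 or all-1 group
            if out and out[-1][0] == "fill" and out[-1][1] == w[0]:
                out[-1] = ("fill", w[0], out[-1][2] + 1)  # extend the run
            else:
                out.append(("fill", w[0], 1))
        else:
            out.append(("literal", w))
    return out

def decompress(tokens):
    bits = []
    for t in tokens:
        if t[0] == "fill":
            bits.extend([t[1]] * (WIDTH * t[2]))
        else:
            bits.extend(t[1])
    return bits

# A sparse bitmap: four all-zero words followed by one mixed word
sparse = [0] * 31 * 4 + [1, 0, 1] + [0] * 28
assert decompress(compress(sparse)) == sparse
print(len(compress(sparse)))  # 2 tokens instead of 5 uncompressed words
```

Sparse or dense bitmaps collapse into a handful of fill tokens, which is exactly the regime where compressed bitmap indexes beat tree indexes on space.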
Another work, by Fusco et al. [13], used bitmap indexing as a compression approach to minimize CPU workload and disk consumption. The platform used for the work was streaming network data, which requires real-time indexing. The authors introduced COMPAX, a variant of the compressed bitmap index that supersedes the Word-Aligned Hybrid (WAH) in terms of indexing throughput, retrieval time and compression rate. They built the NETwork Flow Index (NET-FLI), which highly optimizes real-time indexing as well as data retrieval from large-scale network repositories. NET-FLI combines COMPAX with Locality Sensitive Hashing (LSH), used for stream reordering in an online setup, to achieve the target of the research. This combination results in insertion rates of up to 1 million flows per second, many folds over what is obtainable in typical commercial network flows. The system also allows administrators to perform complex analysis jobs.
The Bitmap technique has been tried as a candidate for indexing in big data, and in some specific situations it gives improved results. However, the Bitmap index generates a large volume of data as its data structure, alongside the volume of the big data itself. Hence, it cannot be used for general data retrieval in big data, because big data contains a wide variety of data values.
The Dense indexing approach uses a file/table with a pair of key and pointer for each record in the data file/table, which is sorted. Every key in the dense index is associated with a specific pointer to one of those records. If the underlying architecture of the index is a clustered one with duplicate keys, the dense index just points to the first record with the said key [15]. Figure 6 displays a sample of the Dense index. In the dense index, each index entry consists of a search-key and a pointer. The search-key holds the value to be searched, while the pointer stores the identifier of the disk location containing the corresponding record, together with the offset identifying the point where the record starts within the block.
Fig. 6: A Dense Index on Employee Table
As an ordered index, the dense index either stores an index entry for every record, when the approach in use is Non-Clustered, or stores an index entry only for the first search-key, when using the Clustered approach [37]. The dense index uses a file to store its index data structure, with a dedicated pointer to each record, and the data have to be ordered. These two facts suggest that the index will grow so big that its size approaches that of the stored data. Hence, there are no studies using the dense index in big data retrieval.
The sparse indexing method also uses a file/table that contains pairs of search-keys and pointers, but the pointers point to blocks instead of individual records in the data file/table, which is sorted in the order of the search-keys. In a sparse index, index entries appear only for some of the search-key values. Each index entry is associated with a specific pointer to one of the blocks. If the underlying architecture of the index is a clustered one with duplicate keys, then the index just points to the lowest search-key in each block [15]. To locate any search-key's record, the index entry with the largest value less than or equal to the given search-key value is found; the search then starts at the record pointed to by that index entry and moves forward until the target is found [34], [37]. Figure 7 depicts this procedure for the sparse index.
Fig. 7: A Sparse Index on Employee Table
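The lookup procedure above can be sketched as follows (the toy blocks and records are assumed for illustration):

```python
import bisect

# Sketch of a sparse index: one (search-key, block-pointer) entry per block
# of the sorted data file. Lookup finds the entry with the largest key <= the
# target, then scans forward inside that block.

blocks = [  # sorted data file, grouped into blocks of (key, record) pairs
    [(10, "Ada"), (12, "Ben"), (15, "Cy")],
    [(20, "Dee"), (22, "Eli"), (27, "Flo")],
    [(30, "Gus"), (33, "Hal")],
]
index_keys = [b[0][0] for b in blocks]   # lowest key per block: 10, 20, 30

def lookup(key):
    i = bisect.bisect_right(index_keys, key) - 1   # largest entry <= key
    if i < 0:
        return None                                # key below first block
    for k, rec in blocks[i]:                       # scan within the block
        if k == key:
            return rec
    return None

print(lookup(22))  # 'Eli'
```

A dense index would instead keep one entry per record; the sparse variant trades one short in-block scan for an index that is smaller by a factor of the block size.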
The dense and sparse indexes are the most common types of ordered index that relational databases use for generating query execution plans. However, for both types, the use of files/tables to keep the pairs of search-keys and pointers may make them very unsuitable for big data indexing, for two reasons: 1. The records and blocks of a big data file will be distributed over different clusters, which makes it very difficult to maintain such files. 2. The volume of big data will make such index files unnecessarily large, which may lead to unreasonable costs in space and maintenance time.
3 Online Indexes
The last three indexing methods, also referred to as oblivious indexes, work by creating the index on-the-fly and automatically. These indexes monitor every issued query and then build the index as a side effect of the query's execution. However, all the above-discussed indexing techniques work with an RDBMS in an OLTP setting. OLTP workloads usually consist of short queries frequently posed to the database as the main operational system, so there is a steady stream of arriving queries with which to increment and/or optimize an index if the need arises. On the contrary, the situation is different in the case of OLAP, especially in batch-oriented settings, which are the most common form of analysis when it comes to big data. These cases also mostly differ when MapReduce is to be used for such analysis.
The drawbacks of the RDBMS in processing big data imply that its index types share the same drawbacks. This has prompted big data researchers to customize, and in some cases develop, different indexing strategies for big data analysis, as mentioned earlier. These indexing strategies for big data information retrieval systems are now being used by big data analytics.
On the other hand, the Per-Term indexing strategy uses the map function to emit tuples of the form <term, (doc-ID, tf)>. This reduces the number of emit operations, as only one tuple per unique term per document is emitted. The reduce function in this strategy only sorts the instances by document to obtain the final posting list, sorted by ascending doc-ID. The Ivory information retrieval system uses this approach. Also, a combiner function can be used to generate the tfs by performing a localized merge on each map task's output.
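A minimal sketch of the per-term strategy, with plain Python functions standing in for the map and reduce stages of an actual MapReduce runtime, could look like this (toy documents are assumptions for illustration):

```python
from collections import defaultdict

# Sketch of the per-term strategy: map emits one <term, (doc-ID, tf)> tuple
# per unique term in a document; reduce sorts each term's postings by doc-ID.

def map_phase(doc_id, text):
    tf = defaultdict(int)
    for term in text.split():          # trivial tokenizer for the sketch
        tf[term] += 1
    for term, freq in tf.items():      # one emit per unique term
        yield term, (doc_id, freq)

def reduce_phase(emitted):
    postings = defaultdict(list)
    for term, posting in emitted:
        postings[term].append(posting)
    # final posting lists, sorted by ascending doc-ID
    return {t: sorted(p) for t, p in postings.items()}

docs = {1: "big data big index", 2: "index data"}
emitted = [kv for d, text in docs.items() for kv in map_phase(d, text)]
index = reduce_phase(emitted)
print(index["big"])   # [(1, 2)]
print(index["data"])  # [(1, 1), (2, 1)]
```

In a real job, the framework's shuffle would group the emitted pairs by term between the two phases; here a flat list plays that role.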
This is the inverted indexing technique used by the Nutch platform on Hadoop to index documents for faster search. Nutch tokenizes the documents during the map phase, and the map function emits tuples of the form <document, doc-ID>, while the reduce phase writes all index structures. Though this strategy emits fewer tuples, the value of each emit tends to carry more data, and the intermediate results are reduced; it thus achieves higher levels of compression than single terms. Documents are easily indexed on the same reduce task due to the sorting of document names [27].
In the single-pass indexing strategy, to avoid the build-up of intermediate results in the reduce task, the flushed partial indexes are sorted and stored on disk, first by map number and then by flush number. In order to achieve a globally correct ordering of the posting list for each term, the posting lists are merged by map number and flush number. The per-term posting lists are merged together by the reduce function to form the standard index comprising full posting lists. The standard index is compressed using the Elias-Gamma technique, storing only the distances between doc-IDs [27].
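The gap-plus-Elias-Gamma compression of a posting list can be sketched as follows (a simplified string-of-bits illustration, not the byte-level layout used in [27]):

```python
# Sketch of compressing a posting list by storing Elias-Gamma codes of the
# gaps (distances) between consecutive doc-IDs rather than the IDs themselves.

def gamma_encode(n):                  # n >= 1
    b = bin(n)[2:]                    # binary without the '0b' prefix
    return "0" * (len(b) - 1) + b     # unary length prefix + binary value

def gamma_decode_all(bits):
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i + zeros] == "0":  # count the unary length prefix
            zeros += 1
        out.append(int(bits[i + zeros:i + 2 * zeros + 1], 2))
        i += 2 * zeros + 1
    return out

doc_ids = [3, 7, 8, 20]
gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]  # 3,4,1,12
code = "".join(gamma_encode(g) for g in gaps)

# decoding: cumulative sums of the decoded gaps restore the doc-IDs
decoded, acc = [], 0
for g in gamma_decode_all(code):
    acc += g
    decoded.append(acc)
print(decoded)  # [3, 7, 8, 20]
```

Because gaps in a sorted posting list are typically small, their gamma codes are short, which is where the compression comes from.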
In all four strategies of inverted indexing in MapReduce discussed above, the index is the main task focused on and carried out by the MapReduce job. This contrasts with the primary function of indexing in an RDBMS, which is to speed up access to stored data in order to improve the performance of other processes. Thus, the aim of indexing is not only to improve the parallel processing of the document contents; rather, its primary aim is to improve the performance of the parallel processor itself. This improvement is to be achieved in addition to the underlying parallel processing that MapReduce is programmed to accomplish. Thus, there is a need for an additional indexing scheme that works with the parallel processor and serves the same purpose that indexes do in an RDBMS.
For user-defined indexing in MapReduce and big data analytics, Yang and Parker [42] employed HDFS file components as B-Tree nodes to achieve indexing. In their approach, each file contains data and pointers to lower files in the tree hierarchy, which are considered its children. During query processing, the tree is traversed to locate the required segment of data, which is then processed using an improved Map-Reduce-Merge-Traverse version of MapReduce. After the data are located, the map, reduce and merge tasks are performed on them to return the record set that answers the given query.
Also, An, Wang and Wang [3] used blockIds from HDFS as the search keys of their B+-Tree based index. When a given query is to be processed, the B+-Tree based index is first searched to determine the start and the end of the contiguous blocks that form the index, and the result of the search forms the input data to be scanned. Then, only the blockIds returned from this search are used by MapReduce for the main query processing, thereby preventing a full scan of the input data. In addition, Richter et al. [32] used the replica copies stored by HDFS to index different data attributes that are likely to be used as incoming queries' predicates. When a MapReduce query arrives, their library checks the fields contained in the query's predicates and uses the clustered index built on those fields to return the blockIds of the data required to answer the given query.
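The idea shared by these block-level approaches can be sketched as follows, with in-memory dictionaries standing in for HDFS blocks and the index (all names are illustrative assumptions, not the authors' implementations):

```python
from collections import defaultdict

# Sketch of block-level index filtering: an index maps predicate values to
# the blockIds that contain them, so the job scans only those blocks instead
# of the whole input. Toy in-memory stand-ins replace HDFS blocks here.

blocks = {  # blockId -> records, as hypothetical (key, payload) rows
    "blk_1": [(1, "a"), (2, "b")],
    "blk_2": [(3, "c"), (4, "d")],
    "blk_3": [(4, "e"), (5, "f")],
}

# Build an index on the key field: value -> set of blockIds containing it
index = defaultdict(set)
for bid, recs in blocks.items():
    for key, _ in recs:
        index[key].add(bid)

def query(key):
    hits = []
    for bid in sorted(index[key]):      # only the relevant blocks...
        for k, payload in blocks[bid]:  # ...are scanned by the job
            if k == key:
                hits.append(payload)
    return hits

print(query(4))  # ['d', 'e'] -- two blocks scanned, one skipped entirely
```

The saving grows with the number of blocks: the index lookup is cheap, and every block it rules out is a block the map tasks never read.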
Furthermore, in all the mentioned studies, the authors used indexing data structures that scale logarithmically, thereby improving data processing and retrieval. This is done by preventing MapReduce from performing a full scan of the input data, guiding the process to scan and process only the data that correspond to the output of the indexes. Moreover, many other studies on indexing have been conducted using different types of big data; however, such research is closely tied to the particular type of big data, as the indexing data structure and the index implementation are determined by the nature of the data itself [21]. Table 2 displays a summary of the index approaches, their memory requirements and their big data potentials.
6 Conclusion
It can be deduced from the above reviews that user-defined indexes and content indexing as tools for optimizing the performance of information retrieval have been successful. It can also be added that there is high potential for improving big data analytics through the use of advanced indexing techniques and data structures. In particular, if these techniques and data structures are hybridized, improved and customized to work with MapReduce, they will surely improve the performance of big data analytics.
It has been highlighted that MapReduce is one of the most popular tools for big data analytics. However, the low-level nature of its implementations has given rise to the development of HLQLs, which ease the programmer's task of handling the analysis. The review also highlighted the different types of index in the RDBMS and those used in big data analytics on the different big data analytics platforms, including MapReduce and its index approaches. Improvement can be achieved by changing the indexing approach or by using a more efficient data structure.
Table 2: Comparison Between Big Data Analytics Approaches
S/No | Studies | Indexing Approach | Data Structure | Memory/Storage | Potential for Big Data Mining
1 | Wu et al. [41], Fusco et al. [13] | Bitmap | Tabular | Requires a large space of memory and storage | Has good potential for big data mining
2 | Garcia-Molina, Ullman and Widom [15], Silberschatz et al. [37] | Dense | Tabular | Requires a large space of memory and storage | Not a good approach for big data mining
3 | Garcia-Molina, Ullman and Widom [15], Rys [34], Silberschatz et al. [37] | Sparse | Tabular | Requires a large space of memory and storage | Not a good approach for big data mining
4 | Chaudhuri and Narasayya [8] | Online Indexing | Vectors | Requires a small space of memory and storage | A good approach for big data mining
5 | Idreos et al. [22] | Database Cracking | Arrays | Requires a small space of memory and storage | A good approach for big data mining
6 | Graefe and Kuno [18] | Adaptive Merge | Tree-based | Requires an average space of memory and storage | A very good approach for big data mining
7 | Yang and Parker [42] | B-Tree | Tree-based | Requires an average space of memory and storage | A good approach for big data mining
8 | An, Wang and Wang [3] | B+-Tree | Tree-based | Requires a small space of memory and storage | A good approach for big data mining
9 | Richter et al. [32] | HAIL | File-based | Requires a large space of memory and storage | A good approach for big data mining
References
1. Alsubaiee, S., Altowim, Y., Altwaijry, H., Behm, A., Borkar, V., Bu, Y., Carey, M.,
Cetindil, I., Cheelangi, M., Faraaz, K., et al.: Asterixdb: A scalable, open source
bdms. Proceedings of the VLDB Endowment 7(14), 1905–1916 (2014)
2. Amir, A., Franceschini, G., Grossi, R., Kopelowitz, T., Lewenstein, M., Lewenstein,
N.: Managing unbounded-length keys in comparison-driven data structures with
applications to online indexing. SIAM Journal on Computing 43(4), 1396–1416
(2014)
3. An, M., Wang, Y., Wang, W.: Using index in the mapreduce framework. In:
Web Conference (APWEB), 2010 12th International Asia-Pacific. pp. 52–58. IEEE
(2010)
4. Bachlechner, D., Leimbach, T.: Big data challenges: Impact, potential responses
and research needs. In: Emerging Technologies and Innovative Business Practices
for the Transformation of Societies (EmergiTech), IEEE International Conference
on. pp. 257–264. IEEE (2016)
5. Bajaber, F., Elshawi, R., Batarfi, O., Altalhi, A., Barnawi, A., Sakr, S.: Big data
2.0 processing systems: Taxonomy and open challenges. Journal of Grid Computing
14(3), 379–405 (2016)
6. Berman, J.J.: Principles of big data: preparing, sharing, and analyzing complex
information. Newnes (2013)
7. Boncz, P.A., Zukowski, M., Nes, N.: Monetdb/x100: Hyper-pipelining query exe-
cution. In: Cidr. vol. 5, pp. 225–237 (2005)
8. Chaudhuri, S., Narasayya, V.: Self-tuning database systems: a decade of progress.
In: Proceedings of the 33rd international conference on Very large data bases. pp.
3–14. VLDB Endowment (2007)
9. Chen, C.P., Zhang, C.Y.: Data-intensive applications, challenges, techniques and
technologies: A survey on big data. Information Sciences 275, 314–347 (2014)
10. Chen, M., Mao, S., Liu, Y.: Big data: A survey. Mobile Networks and Applications
19(2), 171–209 (2014)
11. Dean, J., Ghemawat, S.: Mapreduce: a flexible data processing tool. Communica-
tions of the ACM 53(1), 72–77 (2010)
12. Dias, J., Ogasawara, E., de Oliveira, D., Porto, F., Valduriez, P., Mattoso, M.:
Algebraic dataflows for big data analysis. In: Big Data, 2013 IEEE International
Conference on. pp. 150–155. IEEE (2013)
13. Fusco, F., Vlachos, M., Stoecklin, M.P.: Real-time creation of bitmap indexes on
streaming network data. The VLDB Journal 21(3), 287–307 (2012)
14. Gani, A., Siddiqa, A., Shamshirband, S., Hanum, F.: A survey on indexing tech-
niques for big data: taxonomy and performance evaluation. Knowledge and Infor-
mation Systems 46(2), 241–284 (2016)
15. Garcia-Molina, H., Ullman, J.D., Widom, J.: Database system implementation,
vol. 654. Prentice Hall, Upper Saddle River, NJ, 2nd edn. (2014)
16. Glombiewski, N., Seeger, B., Graefe, G.: Waves of misery after index creation.
BTW 2019 (2019)
17. Graefe, G., Idreos, S., Kuno, H., Manegold, S.: Benchmarking adaptive indexing,
pp. 169–184. Springer (2011)
18. Graefe, G., Kuno, H.: Self-selecting, self-tuning, incrementally optimized indexes.
In: Proceedings of the 13th International Conference on Extending Database Tech-
nology. pp. 371–381. ACM (2010)
20. Hong, Z., Xiao-Ming, W., Jie, C., Yan-Hong, M., Yi-Rong, G., Min, W.: A opti-
mized model for mapreduce based on hadoop. TELKOMNIKA (Telecommunica-
tion Computing Electronics and Control) 14(4) (2016)
21. Ibrahim, H., Sani, N.F.M., Yaakob, R., et al.: Analyses of indexing techniques on
uncertain data with high dimensionality. IEEE Access 8, 74101–74117 (2020)
22. Idreos, S., Kersten, M.L., Manegold, S.: Database cracking. In: CIDR. vol. 7, pp.
7–10 (2017)
23. Idreos, S., Manegold, S., Kuno, H., Graefe, G.: Merging what’s cracked, cracking
what’s merged: adaptive indexing in main-memory column-stores. Proceedings of
the VLDB Endowment 4(9), 586–597 (2011)
24. John, S.: Indexing in apache hive - facebook (2011), https://www.facebook.com/note.php?note_id=1015016842773390
25. Khasawneh, T.N., AL-Sahlee, M.H., Safia, A.A.: Sql, newsql, and nosql databases:
A comparative survey. In: 2020 11th International Conference on Information and
Communication Systems (ICICS). pp. 013–021 (2020)
26. Lee, S., Jo, J.Y., Kim, Y.: Performance improvement of mapreduce process by
promoting deep data locality. In: Data Science and Advanced Analytics (DSAA),
2016 IEEE International Conference on. pp. 292–301. IEEE (2016)
27. McCreadie, R., Macdonald, C., Ounis, I.: On single-pass indexing with mapreduce.
In: Proceedings of the 32nd international ACM SIGIR conference on Research and
development in information retrieval. pp. 742–743. ACM (2009)
28. McCreadie, R., Macdonald, C., Ounis, I.: Mapreduce indexing strategies: Studying
scalability and efficiency. Information Processing and Management 48(5), 873–888
(2012)
29. Nang, J., Park, J.: An efficient indexing structure for content based multimedia
retrieval with relevance feedback. In: Proceedings of the 2007 ACM symposium on
Applied computing. pp. 517–524. ACM (2007)
30. Pirk, H., Petraki, E., Idreos, S., Manegold, S., Kersten, M.: Database cracking:
fancy scan, not poor man’s sort! In: Proceedings of the Tenth International Work-
shop on Data Management on New Hardware. p. 4. ACM (2014)
31. Ramakrishnan, R., Gehrke, J., Gehrke, J.: Database management systems, vol. 3.
McGraw-Hill New York (2010)
32. Richter, S., Quiané-Ruiz, J.A., Schuh, S., Dittrich, J.: Towards zero-overhead static
and adaptive indexing in hadoop. The VLDB Journal 23(3), 469–494 (2014)
33. Roy, S., Mitra, R.: A survey of data structures and algorithms used in the context
of compression upon biological sequence. Sustainable Humanosphere 16(1), 1951–
1963 (2020)
34. Rys, M.: Xml and relational database management systems: inside microsoft sql
server 2005. In: Proceedings of the 2005 ACM SIGMOD international conference
on Management of data. pp. 958–962. ACM (2005)
35. Sevugan, P., Shankar, K.: Spatial data indexing and query processing in geocloud.
Journal of Testing and Evaluation 47(6) (2019)
36. Shireesha, R., Bhutada, S.: A study of tools, techniques, and trends for big data
analytics. IJACTA 4(1), 152–158 (2016)
37. Silberschatz, A., Korth, H.F., Sudarshan, S., et al.: Database system concepts,
vol. 4. McGraw-Hill New York (1997)
38. Silva, Y.N., Almeida, I., Queiroz, M.: Sql: From traditional databases to big data.
In: Proceedings of the 47th ACM Technical Symposium on Computing Science
Education. pp. 413–418. ACM (2016)
39. Sozykin, A., Epanchintsev, T.: Mipr-a framework for distributed image processing
using hadoop. In: Application of Information and Communication Technologies
(AICT), 2015 9th International Conference on. pp. 35–39. IEEE (2015)
40. Statista: Volume of data worldwide from 2010-2025.
https://www.statista.com/statistics/871513/worldwide-data-created/ (2020)
41. Wu, K., Otoo, E., Shoshani, A.: On the performance of bitmap indices for high
cardinality attributes. In: Proceedings of the Thirtieth international conference on
Very large data bases-Volume 30. pp. 24–35. VLDB Endowment (2004)
42. Yang, H.C., Parker, D.S.: Traverse: simplified indexing on large map-reduce-merge
clusters. In: International Conference on Database Systems for Advanced Applica-
tions. pp. 308–322. Springer (2009)
43. Ydraios, E., et al.: Database cracking: towards auto-tuning database kernels. SIKS
(2010)
44. Zakir, J., Seymour, T., Berg, K.: Big data analytics. Issues in Information Systems
16(2), 81–90 (2015)
45. Zhang, Q., He, A., Liu, C., Lo, E.: Closest interval join using mapreduce. In: Data
Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on.
pp. 302–311. IEEE (2016)
46. Zhang, Y., Ren, J., Liu, J., Xu, C., Guo, H., Liu, Y.: A survey on emerging com-
puting paradigms for big data. Chinese Journal of Electronics 26(1) (2017)
47. Zikopoulos, P., Eaton, C.: Understanding big data: Analytics for enterprise class
hadoop and streaming data. McGraw-Hill Osborne Media (2011)