Big Data Infrastructure, Data Visualisation and Challenges: Ramanathan Venkatraman Sitalakshmi Venkatraman
providers offer resources in the cloud environment for organisations to reap the benefits of their business data.

Popular Big Data technology providers offer substantial processing power and the ability to manage large quantities of data through platforms such as Apache Hadoop and Google MapReduce. Big Data technology is growing, and its vision is that relevant, large and fast-growing data can be captured from any source and analysed to give the organisation useful insight towards its overall goals. Businesses look to Big Data to gain competitive advantage and to achieve business goals such as increasing revenue, improving customer satisfaction and enhancing productivity [14]. As businesses continue to store large volumes of data, they look for more sophisticated tools to mine and analyse the data in a meaningful way. Organisations are starting to realise that Big Data is more about business transformation and making the changes needed to exploit data. Big Data allows businesses to gain a deeper understanding of the dynamics of their business by analysing and visualising the data and integrating the results with traditional information, so as to gain a new perspective on their day-to-day operations.

However, Big Data poses various challenges [15][16]. The first and foremost challenge is the lack of scalability due to poor infrastructure management, whether the infrastructure is on-premise or in the cloud [4]. Organisations do not want to maintain and pay for substantially more Big Data infrastructure than they currently use. However, if the infrastructure does not grow as the business data grows, they will not be able to gain any value from Big Data. Another issue is that many applications and data analytics software tools do not make use of optimal data transformation, efficient analysis and appropriate data visualisation [3][6]. If data quality is lost during data transformation, the result will not meet the organisation’s Big Data needs, whatever infrastructure is used. Above all these challenges, security and privacy is the most compelling issue, since any data hosted by a third party can raise questions about the security and privacy of an organisation’s confidential information [17][18].

With new technology emerging for Big Data, organisations must be prepared to face challenges in supporting their dependence on Big Data: high costs, technological complexity, data availability, and privacy and integrity concerns. Adopting Big Data infrastructure without understanding these issues may not be the right approach for an organisation, because Big Data forms an essential component of management decision-making that requires new capabilities as well as organisational and cultural change [19]. The next section describes the industry standards and tools in Big Data infrastructure that can help organisations create an implementation plan.

3. BIG DATA INFRASTRUCTURE
The first and foremost requirement of an organisation before plunging into the Big Data landscape is to understand the infrastructural tools and technologies: what they are, how they operate and what each is best used for. Some of the popular technologies for Big Data architecture are described below.

3.1 Hadoop
Hadoop is a readily available open-source framework that uses a cost-effective programming model to allow distributed processing of big datasets by efficiently breaking them up and distributing the smaller parts for parallel or concurrent processing and analysis. Hadoop permits distributed parallel processing of gigantic amounts of data through economical, industry-standard servers that store and process the data. The Hadoop Distributed File System (HDFS) is used as the storage layer, and the MapReduce component executes a variety of analytic functions to analyse the data efficiently. Hadoop uses YARN for cluster management and for scheduling user applications. In-depth analysis of data using machine learning algorithms can be done using Spark on top of HDFS. However, such open-source technologies carry additional risks around ongoing maintenance and support.

3.2 NoSQL
NoSQL stands for ‘Not Only SQL’ and refers to non-relational database technologies such as Cassandra, Neo4j, Redis and MongoDB, which are also effective and economical choices for Big Data infrastructure. NoSQL databases are better tailored to handling dynamic and semi-structured data with low latency. While NoSQL is better suited to operational and analytical tasks that process selective, criteria-based data in real time, Hadoop is more often employed for harnessing all the data and for in-depth analysis with high throughput. Since Hadoop and NoSQL have different advantages and purposes, both can be used together, as in the case of HBase. However, security is one of the major concerns with NoSQL.

3.3 In-Memory Database (IMDB)
An IMDB, also known as a main memory database system (MMDB), is popularly used in high-volume environments where response time is critical. Since data resides in the memory of the system rather than on disk storage, data access and processing are very fast. Hence, IMDBs have become popular in recent years for handling High-Performance Computing (HPC) and Big Data applications. Related to this is the in-memory data grid (IMDG), a real-time analytics engine that applies changes to data in real time and provides smart grid features. Using such technologies in Big Data applications, huge quantities of data for processing can be stored in memory, while the original and persistent data resides on an external disk.

3.4 Massively Parallel Processing (MPP)
MPP technology is a form of collective processing of massive amounts of data using several processors working on different parts of the same program. Each processor takes up different threads of the program and uses its own operating system and memory. A messaging interface is necessary to organise and manage the thread handling among the different processes involved in the MPP architecture. Many MPP technologies have partnerships with other major players among the Big Data technology providers. Hence, MPP technologies also have a crossover with other Big Data technologies.

3.5 Cloud Computing
Major Big Data infrastructure providers offer cloud computing covering a range of products, technologies and services that help organisations jump-start their Big Data ventures. All resources and applications are hosted in the cloud, which is considered to have minimal cost implications because organisations pay according to the infrastructure, platform or software services used. Amazon, Microsoft, Oracle and IBM are some of the big players offering cost-effective Big Data architectures in the cloud. While cloud computing can deliver data insights seamlessly, the security and privacy of confidential and sensitive data are of great concern.

The rapid technological developments in Big Data could overwhelm traditional computing frameworks in businesses. Hence, the National Institute of Standards and Technology (NIST) has provided a high-level conceptual framework, as shown in Figure 1 [20]. The purpose of this framework is to serve as a reference model to facilitate understanding of the operational intricacies, design structures and requirements in Big Data. The advantage of the model is that it can be adopted by any organisation, as it is not tied to any specific vendor products, services or reference implementation.

4. DATA VISUALISATION
With the fast developments in Big Data technologies and application solutions, organisations are gaining meaningful data insights that can transform their businesses by utilising large volumes of data for efficient decision-making and management. Organisations can use different analytical strategies, such as predictive analytics, to reveal patterns and support effective decision-making. However, Big Data can add value only if the following key elements are planned well:
•    data collection,
•    data storage,
•    data analysis, and
•    data visualisation/output.
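
The four elements listed above can be sketched as a small end-to-end pipeline. The following is only a minimal illustration under stated assumptions: the function names (collect, store, analyse, visualise) and the monthly case counts are hypothetical and are not taken from this paper, and an in-memory SQLite table and a text bar chart stand in for a real Big Data store and a visualisation platform.

```python
import sqlite3

# Hypothetical monthly flu case counts (illustrative values only,
# not taken from the paper's figures).
RAW_RECORDS = [
    ("2016", "Jun", 120), ("2016", "Jul", 310), ("2016", "Aug", 540),
    ("2017", "Jun", 200), ("2017", "Jul", 650), ("2017", "Aug", 1400),
]


def collect():
    """Data collection: return records from a source (hard-coded here)."""
    return list(RAW_RECORDS)


def store(records):
    """Data storage: load the records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flu (year TEXT, month TEXT, cases INTEGER)")
    conn.executemany("INSERT INTO flu VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def analyse(conn):
    """Data analysis: total cases per year, ordered by year."""
    rows = conn.execute(
        "SELECT year, SUM(cases) FROM flu GROUP BY year ORDER BY year"
    ).fetchall()
    return dict(rows)


def visualise(totals, width=40):
    """Data visualisation/output: render the totals as a text bar chart."""
    peak = max(totals.values())
    lines = []
    for year, cases in totals.items():
        # Scale each bar relative to the largest total.
        bar = "#" * max(1, round(cases / peak * width))
        lines.append(f"{year} {bar} {cases}")
    return "\n".join(lines)


if __name__ == "__main__":
    totals = analyse(store(collect()))
    print(visualise(totals))
```

Keeping each element as a separate function mirrors the point made in the list: a weakness in any one stage (for example, poor collection or an unreadable output) limits the value of the whole pipeline.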
decision-making (based on varying parameters and different criteria for analysis from the chosen models). The output of the analysis must be visually comprehensible.

Data visualisation is the key to the success of Big Data. Unless the final output is in a form acceptable to the people who need the data analysed, the whole Big Data venture is of no value. Huge reports or complicated graphics that few people understand will result in no meaningful decision-making or action. Various visualisation tools, including management dashboards and commercial data visualisation platforms, output attractive charts and graphs for clear and concise communication of data insights.

Figure 4 gives a simple data visualisation chart showing that, in data collected over 6 years, the peak flu season in Australia did not occur until August. The chart also shows a dramatic increase in the number of patients affected by flu in 2017 compared with previous years. Hence, hospitals across Australia can plan to increase their workforce accordingly.

Figure 4. Data visualisation of time series data of flu patients.

Figure 5. Data visualisation of movie genres prediction.

Figure 5 shows a visualisation of the predicted movie genres for box office hits in a particular year, produced by a forecasting model. It shows that while action movies have an increasing trend, the horror genre performs worst, with the steepest decreasing trend. Another example, in Figure 6, provides rich data about influenza demographics in a country. Various sensitive information and decision parameters behind the data visualisation are used by the data analytics model to provide Big Data insights that aid drill-down analysis and decision-making. However, these are just a click away for anyone who has access to the visual tools and artefacts. Hence, visual security is also an important concern in Big Data.

Figure 6. Data insights - multiple views/decision parameters.

5. CONCLUSIONS AND FUTURE WORK
Big Data plays a critical role in industry and continues to grow exponentially. The data-driven business revolution adds new levels of complexity, because data analysis must keep pace with the velocity at which data is generated from diverse sources. Big Data technology developments have driven the transformation of organisations that look to leverage Big Data for competitive advantage and to help achieve business goals. However, understanding Big Data technology, and modelling data visualisation so that the captured data is analysed in a meaningful and intelligent way, are important for the planning and management of Big Data in an organisation. This paper provided guidelines for Big Data infrastructure using the NIST framework and showed the importance of data visualisation for effective decision-making using illustrations from industry scenarios.

This paper has made a modest initial step towards bringing out the opportunities and challenges of Big Data. Future work will focus on security and privacy concerns, in particular with reference to the proliferation of IoT and blockchain technologies.

6. REFERENCES
[1] Frizzo-Barker J., Chow-White P. A., Mozafari M., Ha D. An empirical study of the rise of big data in business scholarship. International Journal of Information Management, 36(3), (2016), 403–413.
[2] Chen M. et al. Big Data: A Survey. Mobile Networks and Applications, 19(2), (2014), 171–209.
[3] Gorodov E. Y. and Gubarev V. V. Analytical review of data visualization methods in application to big data. Journal of Electrical and Computer Engineering, (4), (2013), 1–7.
[4] Tian W. and Zhao Y. Big data technologies and cloud computing. Optimized Cloud Resource Management and Scheduling Theory and Practice, (2015), 17–49.
[5] McNeely C. L., Hahm J. The big (data) bang: policy, prospects, and challenges. Review of Policy Research, 31(4), (2014), 304–310.
[6] Gandomi A., Haider M. Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), (2015), 137–144.
[7] Xindong W., Xingquan Z., Gong-Qing W., Wei D. Data Mining with Big Data. IEEE Transactions on Knowledge and Data Engineering, 26(1), (2014), 97–107.
[8] Chang V. A. A model to compare cloud and non-cloud storage of Big Data. Future Generation Computer Systems, 57, (2016), 56–76.
[9] Goli-Malekabadi Z., Sargolzaei-Javan M., Akbari M. K. An effective model for store and retrieve big health data in cloud computing. Computer Methods and Programs in Biomedicine, 132, (2016), 75–82.
[10] Kumar N., Vasilakos A. V. and Rodrigues J. J. A multi-tenant cloud-based DC nano grid for self-sustained smart buildings in smart cities. IEEE Communications Magazine, 55(3), (2017), 14–21.
[11] Laney D. 3D Data Management: Controlling Data Volume, Velocity and Variety. Tech. rep., META Group, (2001).
[12] Gronwald K.-D. Big Data Analytics. In: Integrated Business Information Systems: A Holistic View of the Linked Business Process Chain ERP-SCM-CRM-BI-Big Data, (2017), 127–157.
[13] Huang T., Lan L., Fang X., An P., Min J., and Wang F. Promises and challenges of big data computing in health sciences. Big Data Research, 2(1), (2015), 2–11.
[14] Kshetri N. The emerging role of Big Data in key development issues: Opportunities, challenges, and concerns. Big Data & Society, 1(2), (2014), 1–20.
[15] Jing P. A new model of data protection on cloud storage. Journal of Networks, 9(3), (2014), 666–671.
[16] Nelson B., Olovsson T. Security and privacy for big data: A systematic literature review. In: 2016 IEEE International Conference on Big Data (Big Data), IEEE, (2016), 3693–3702.
[17] Li-chuan M., Qing-qi P., Hao L., Hong-ning L. Survey of Security Issues in Big Data. Radio Communications Technology, 41(1), (2015), 1–7.
[18] Deng-Guo F., Min Z., Hao L. Big Data Security and Privacy Protection. Chinese Journal of Computers, 37(1), (2014), 246–258.
[19] Jin X., Wah B., Cheng X., and Wang Y. Significance and challenges of big data research. Big Data Research, 2, (2015), 59–64.
[20] NIST. Big Data Interoperability Framework: Volume 6, Reference Architecture. NIST, USA, (2018).
[21] Subashini S. and Kavitha V. A survey on security issues in service delivery models of cloud computing. Journal of Network and Computer Applications, 34(1), (2011), 1–11.
[22] Cheng H., Wang W., and Rong C. Privacy protection beyond encryption for cloud big data. In: Proceedings of the 2nd International Conference on Information Technology and Electronic Commerce (ICITEC ’14), IEEE, Dalian, China, (2014), 188–191.