International Journal of Computer Engineering and Applications,
Volume XII, Issue III, March 18, www.ijcea.com ISSN 2321-3469
BIG DATA SECURITY AND PRIVACY IN SMART INDUSTRY
Alice Joseph1, Dr. Mathew Cherian2
1Department of Computer Science and Engineering, Cochin University College of Engineering
Kuttanad, India
2Department of Mechanical Engineering, Cochin University College of Engineering Kuttanad, India
ABSTRACT
Big data is commonly defined as a huge volume of varied data arriving at erratic velocity. Industry
deals with the production or manufacture of items, and big data analytics provides enormous
benefits to industry. While big data has been adopted in many industries, concern about big data
security and privacy has increased. In this paper, we first introduce certain features of smart
industry. We then review the benefits and challenges of big data usage in smart industry. Finally,
we discuss various security and privacy threats of big data usage in smart industries and mention
solution methods proposed by various researchers.
[1] INTRODUCTION
Successive industrial revolutions have made industrial processes significantly more productive. In
the first industrial revolution, from the end of the 18th century to the beginning of the 19th century,
mechanization emerged and replaced agriculture with industry as the foundation of the economic
structure of society. Mass extraction of coal along with the invention of the steam engine created a
new type of energy. The second industrial revolution, at the end of the 19th century and the
beginning of the 20th century, came into existence through the emergence of new sources of energy
- electricity, gas and oil. In the second half of the 20th century, a third industrial revolution
appeared with the emergence of a new type of energy whose potential surpassed its predecessors -
nuclear energy. The fourth revolution arrived with the emergence of the Internet and is rooted in
new digitalization technology. This digitalization leads to virtualization of the world.
The industry of today and tomorrow connects all production resources to enable their interaction in
real time. Smart industries thus build on new technologies such as Cloud, Big Data Analytics,
Cognitive Computing and the Internet of Things. The current industrial revolution is based on
automation of production. Business models built on emerging IT technologies become more
customer-focused and help ensure future profitability. Smart industries are industries with the
following key characteristics:
Production
Specifications, quality and design the product needs
What volume is needed
When the product is needed
What is required to improve resource efficiency and reduce cost
Ability to fine-tune to customer needs and make use of the entire supply chain for value creation
These smart industries follow a network-centric approach, making use of the value of information,
driven by ICT and the latest available proven manufacturing techniques to enable their success [1].
The benefits of smart industry, up to the level of providing intelligent internet-based services to
the customer, can be achieved only if data integrity is guaranteed.
[2] BENEFITS AND CHALLENGES OF BIG DATA
The Internet of Things enables modern industries to adopt new data-driven policies and handle
global competitive stress more easily. However, the adoption of the Internet of Things increases
the total volume of data generated by industries, thus transforming the data into industrial big
data. With the advent of the fourth industrial revolution, manufacturing systems are transformed
into digital ecosystems. In this transformation, the Internet of Things and Big Data play a major
role. To this end, industrial enterprises have entered a new age of “Big Data”, where the volume,
velocity and variety of the data they manage are growing at very high rates. More and more devices,
manufacturing tools, plants, vehicles and other manufacturing equipment are equipped with sensors.
The digital transformation of industry enabled by the Internet of Things, big data, and machine
learning adoption allows new ways for businesses to connect and create value [2]. New data-driven
steps will help industries optimize their performance by gathering and analyzing data through the
whole product lifecycle.
With the introduction of big data in industrial applications, fault-free and cost-efficient running
of the process can be achieved while maintaining the desired performance levels, especially with
respect to quality. Moreover, forward-looking industries can use the generated data to create
benefit and increase their competitiveness through predictive analysis. One of the main tasks of
Big Data analytics in industries is the visualization of the results. New approaches for
context-aware visualization should be followed to sufficiently support decision-making at the
different levels of the industry.
The manufacturing industry and researchers continually make efforts to improve the industry’s
competitiveness with a recent focus on smart manufacturing. Smart Manufacturing has been
defined as integrating “network-based data and information that comprises the real-time
understanding, reasoning, design, planning and management of all aspects of the manufacturing and
supply chain enterprise, i.e., manufacturing intelligence. This is achieved through pervasive,
comprehensive and orchestrated use of advanced sensor-based data analytics, modeling and
simulation, and integrated performance metrics constructed for real-time action” [3].
[3] BIG DATA PRIVACY AND SECURITY
Figure 1: Stages of the big data life cycle
Big data from smart industry contains diverse types of data, e.g., text, audio, image, or video.
This diversity of data is denoted by variety. In order to ensure big data privacy and security,
various mechanisms have been developed in recent years. These mechanisms can be grouped by the
stages of the big data life cycle (Figure 1), i.e., data generation, storage, and processing [4].
3.1 IN DATA GENERATION AND DATA ACQUISITION STAGE
Large-scale, highly assorted, and multifaceted datasets are generated by smart industries. These
data are collected, transmitted, and pre-processed in the data acquisition phase. An efficient wired
or wireless transmission mechanism should be used to send the data to a reliable storage management
system that supports different analytical applications. The majority of the data generated and
collected in smart industries comes from sensors connected to various machines and tools, so the
transmitted datasets may contain redundant, useless or erroneous data. Such data takes up more
storage space and reduces the efficiency of the data analysis process. Data compression techniques
can be applied to reduce the redundancy. To capture precise information, smart filters are required.
These filters capture useful information and discard useless data that contains imprecise or
inconsistent values. Efficient analytical algorithms are then required to understand the origin of
data, to process the vast
streaming data and to reduce data before storing. Because Big Data is noisy, dynamic, assorted,
correlated and erratic, data mining, cleansing and analysis prove to be very challenging.
Therefore, data pre-processing operations are indispensable to ensure efficient data storage and
exploitation.
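As a minimal sketch of the pre-processing described above, the following Python snippet drops readings that are missing or outside a plausible range and collapses consecutive repeats of the same value. The function name `smart_filter`, the `(timestamp, value)` tuple layout and the bounds used in the example are illustrative assumptions, not part of any cited system.

```python
from typing import Iterable, List, Optional, Tuple

# (timestamp in seconds, sensor value or None when the reading is missing)
Reading = Tuple[float, Optional[float]]


def smart_filter(readings: Iterable[Reading],
                 low: float, high: float) -> List[Tuple[float, float]]:
    """Keep plausible readings and drop consecutive duplicates.

    low/high are hypothetical plausibility bounds for the sensor.
    """
    kept: List[Tuple[float, float]] = []
    for timestamp, value in readings:
        # Discard missing or out-of-range (inconsistent) values.
        if value is None or not (low <= value <= high):
            continue
        # Discard redundant readings that repeat the last kept value.
        if kept and kept[-1][1] == value:
            continue
        kept.append((timestamp, value))
    return kept


raw = [(0.0, 21.5), (1.0, 21.5), (2.0, None), (3.0, 480.0), (4.0, 21.7), (5.0, 21.7)]
filtered = smart_filter(raw, low=-40.0, high=125.0)
print(filtered)  # [(0.0, 21.5), (4.0, 21.7)]
```

Dropping repeated values is only one simple form of redundancy reduction; a real deployment would likely combine it with a general-purpose compression scheme.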
For the data generation stage, the first challenge is equipment and process complexity. The various
processes carried out in smart industry often cannot be described concisely with chemical or
physical equations. A considerable number of interactions between components exist. Large data
archives are needed to capture and characterize all forms of processes and process interactions.
A large number of unknown or unmeasured quantities complicate the characterization of the operation
of the machines, tools and processes. Scientific sensor systems and wireless sensor network
applications produce a variety of enormous data sets in real time through various monitored
activities. With the dramatic increase in the information produced by smart industries, finding and
locating errors in big data sets becomes difficult with conventional computing and network systems.
Hence, many new models and approaches have been proposed by researchers, based on neural network
techniques [5], statistical methods, etc. Acquiring data from diverse sources in smart industry,
such as machines and tools, machine operators and users, and storing it for data analytics is a
real challenge.
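To illustrate what a simple statistical error check on sensor data might look like, the sketch below flags readings whose robust z-score, based on the median absolute deviation (MAD), exceeds a threshold. This is one common textbook technique chosen for illustration, not the method of [5]; the function name, the sample data and the 3.5 cut-off are assumptions.

```python
import statistics
from typing import List, Sequence


def flag_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return indices of values whose robust z-score exceeds the threshold.

    Uses the median absolute deviation (MAD); the 3.5 cut-off is a common
    rule of thumb and is an assumption here.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # At least half of the values equal the median; this sketch flags nothing.
        return []
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - median) / mad) > threshold]


pressure = [101.2, 101.3, 101.1, 101.4, 250.0, 101.2, 101.3]
suspect = flag_outliers(pressure)
print(suspect)  # [4]
```

The median-based score is used here because a single extreme reading distorts the mean and standard deviation far more than it distorts the median.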
Most often the sensors produce analog signals, which are then converted into digital data for
processing and storage. Sensory data may be categorized as vibration, voice, electric waves,
pressure, sound waves, meteorological conditions, and temperature. Sensed data is transferred to a
collection point through wired or wireless networks. A wired sensor network is easy to deploy,
obtains the related information conveniently, and is suitable for management applications such as
video surveillance systems in smart industry [6]. When the exact position is uncertain, when a
specific phenomenon is unknown, or when power and communication infrastructure have not been set
up in the environment, wireless communication can enable data transmission within limited
capabilities. The wireless sensor network has gained significant attention and has also been
applied in smart industries. In such applications, data is gathered at the various sensor nodes
and sent back to the base location for further handling [7].
In the data generation phase, privacy is protected through access restriction and data
falsification techniques. The conventional security mechanisms to protect data can be divided into
four categories: file-level data security schemes, database-level data security schemes,
media-level security schemes and application-level encryption schemes [4].
3.2 IN DATA STORAGE
Industrial big data is stored on several nodes of many clusters which are distributed all over the
world. Public and/or private networks are used for communication between these nodes and clusters.
If someone intercepts or modifies the data during inter-node communication, valuable industrial
information may be extracted or corrupted. Therefore, big data tools should use secure network
protocols to protect interactions between different nodes and clusters [8].
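One building block of such protocols is message authentication, which lets the receiving node detect that a message was modified in transit. The following is a minimal sketch using HMAC-SHA256 from the Python standard library. The shared key, the function names and the sample payload are assumptions made for illustration; a real deployment would typically rely on an established protocol such as TLS together with proper key management, and note that an HMAC tag detects modification but does not hide the message content.

```python
import hashlib
import hmac
import os

# Hypothetical shared secret, assumed to be distributed to both nodes out of band.
SHARED_KEY = os.urandom(32)


def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Return an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Compare tags in constant time; False means message and tag do not match."""
    return hmac.compare_digest(sign(message, key), tag)


payload = b'{"machine": "press-07", "temp_c": 71.4}'
tag = sign(payload)

untouched_ok = verify(payload, tag)                           # sent as-is
tampered_ok = verify(payload.replace(b"71.4", b"17.4"), tag)  # altered in transit
print(untouched_ok, tampered_ok)  # True False
```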
When data are stored locally or in the cloud, data security has three main objectives:
confidentiality, integrity and availability [9]. Confidentiality and integrity have to be
maintained to preserve data privacy. Availability of information means ensuring that authorized
parties or machines are able to access the information when needed.
Most industries use cloud computing for big data storage, so industry loses direct control over its
data. Cloud providers should offer certain security and privacy features: authentication,
authorization, confidentiality, integrity, availability, identity checks for data access,
enforcement of rules for certain actions, continuous monitoring to guarantee compliance with
consumer security policies and auditing requirements, and protection of personal identity
information [10]. If these basic features are not provided, the data on the cloud server is at
risk. The owner of the cloud server should protect the data properly according to the service level
contract.
Trust chain in clouds, loss of control and multi-tenancy are the main security issues in cloud
computing. Smart industries use cloud storage and also run complex computations that need cloud
processing power. Because the cloud provider acts as a black box that is not transparent to the
user, the user cannot ascertain whether the integrity of the computation is intact. Problems can
also arise from vulnerable code or misconfiguration in the systems owned by the provider. Loss of
control is another security issue; it occurs when all the industrial data and applications are
hosted in a cloud that is under the full control of the cloud provider. The provider could run any
data analytics operation on this data or otherwise manipulate it, which can lead to security
issues. Multi-tenancy is the backbone architecture of cloud computing: multiple users share the
computing resources in a public or private cloud. This sharing of resources can make it easier for
an attacker to gain access to the target’s data. SecCloud is one model for data security in the
cloud that jointly considers both data storage security and computation auditing security [11].
A basic requirement for a big data storage system is to protect the privacy of individuals.
Existing mechanisms such as encryption and obfuscation [12] help to fulfill that requirement.
These approaches safeguard the privacy of the user when data are stored in the cloud. The
integrity of data stored in cloud systems can be verified in a number of ways, e.g., Proof of
Retrievability (POR), Provable Data Possession (PDP), Dynamic Provable Data Possession
(DPDP), and Fuzzy Identity-Based Data Integrity Auditing for Reliable Cloud Storage Systems
[13, 14]. Integrity verification should be conducted regularly to provide the highest level of
data protection.
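The idea of auditing stored data by challenge and response can be illustrated with a deliberately simplified sketch. It is not an implementation of POR, PDP or DPDP: those schemes let the verifier check possession without retrieving the data, whereas this toy version fetches the challenged blocks and recomputes a keyed hash for each. The block size, key handling and all names are assumptions made for illustration.

```python
import hashlib
import hmac
import os
import random
from typing import Dict, List

BLOCK_SIZE = 4  # tiny block size, chosen only to keep the example readable


def split_blocks(data: bytes) -> List[bytes]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def tag_block(key: bytes, index: int, block: bytes) -> bytes:
    # Binding the index into the tag prevents one valid block standing in for another.
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def audit(key: bytes, tags: Dict[int, bytes], server_blocks: List[bytes],
          sample_size: int, rng: random.Random) -> bool:
    """Challenge a random sample of block indices and verify the returned blocks."""
    challenged = rng.sample(range(len(tags)), min(sample_size, len(tags)))
    return all(
        hmac.compare_digest(tag_block(key, i, server_blocks[i]), tags[i])
        for i in challenged
    )


key = os.urandom(32)  # kept by the data owner
blocks = split_blocks(b"industrial sensor archive")
tags = {i: tag_block(key, i, b) for i, b in enumerate(blocks)}  # kept by the owner

rng = random.Random(0)
# Challenging every block makes the outcome deterministic for this demonstration.
intact = audit(key, tags, blocks, sample_size=len(blocks), rng=rng)

corrupted = list(blocks)
corrupted[2] = b"XXXX"  # simulate silent corruption on the storage server
detected = not audit(key, tags, corrupted, sample_size=len(blocks), rng=rng)
print(intact, detected)  # True True
```

With a smaller `sample_size` each audit is cheaper but only detects corruption with some probability, which is one reason to repeat the check regularly.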
3.3 IN DATA PROCESSING
Big data processing is usually carried out to extract useful insights, correlations or patterns by
performing specific computations. The computation involves the movement of big data over the
network which, if not carefully managed, might impair the performance of the computation itself.
Security and protection of these computations are vital to prevent any attempt to alter or distort
the extracted results. Shielding the systems from any attempt to spy on the processing or on the
number of computations performed is also very important.
Smart industries perform three types of analytical operations. Descriptive analytics is the
summarization and description of knowledge patterns using simple statistical methods, such as mean,
median, mode, standard deviation, variance, and frequency measurement of specific events in
industrial big data streams. Descriptive analytics is considered backward-looking and reveals what
has already occurred. Forecasting and statistical modeling to determine the future possibilities
based on supervised, unsupervised, and semi-supervised learning models are called predictive
analytics. Prescriptive analytics determines the cause-effect relationships among analytic results
and business process optimization policies.
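The descriptive statistics named above can be computed with Python's standard library, as in the following sketch. The temperature readings and event log are made-up sample data, and the variable names are assumptions chosen for illustration.

```python
import statistics
from collections import Counter

# Hypothetical spindle temperatures (degrees Celsius) and event log from one shift.
temperatures = [70, 72, 71, 72, 75, 72, 90, 71]
events = ["ok", "ok", "overheat", "ok", "jam", "overheat", "ok"]

summary = {
    "mean": statistics.mean(temperatures),
    "median": statistics.median(temperatures),
    "mode": statistics.mode(temperatures),
    "stdev": statistics.stdev(temperatures),        # sample standard deviation
    "variance": statistics.variance(temperatures),  # sample variance
}
# Frequency measurement of specific events in the stream.
event_frequency = Counter(events)

print(summary)
print(event_frequency.most_common())
```

Here the mean (74.125) sits above the median (72) because of the single 90-degree reading, which is the kind of backward-looking observation descriptive analytics is meant to surface.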
A Denial-of-Service (DoS) attack prevents the legitimate use of these services. A distributed
Denial-of-Service (DDoS) attack involves an overwhelming quantity of packets sent from multiple
attack sites to a victim site. These packets arrive in such high quantity that some key resource at
the victim is quickly exhausted. The victim either crashes or spends so much time handling the
attack traffic that it cannot attend to its real work. Denial-of-Service attacks occur at two
levels: infrastructure-level attacks and application-level attacks.
[4] CONCLUSION
This paper has presented how the adoption of big data and the Internet of Things in manufacturing
generates industrial big data. New monitoring services and the Internet of Things tend to transform
machine tools into smart tools that generate a high volume and variety of data. The Internet of
Things paradigm and big data turn industries into “smart industries” that are flexible, adaptive
and fully aware of the production conditions. However, new ways of filtering and processing the
data should be considered in order to reduce the amount of data produced and transmitted.
REFERENCES
[1] BlueCielo, an Accruent Company, https://www.bluecieloecm.com/smart-industry/
[2] D. Mourtzis, E. Vlachou, N. Milas, “Industrial Big Data as a result of IoT adoption in Manufacturing”,
ScienceDirect, 5th CIRP Global Web Conference Research and Innovation for Future Production,
Procedia CIRP 55 ( 2016 ) 290 – 295.
[3] Sanjay Jain, Guodong Shao, “Virtual Factory Revisited for Manufacturing Data Analytics”,
Proceedings of the 2014 Winter Simulation Conference
[4] Priyank Jain, Manasi Gyanchandani and Nilay Khare, “Big data privacy: a technological perspective
and review”, Journal of Big Data, (2016) 3:25
[5] Vishakha Bajad, Jayshree R.Pansare, “Effective Error Detection System”, Volume 6, Issue 12,
December 2016
[6] Mark A. Perillo and Wendi B. Heinzelman, “Wireless Sensor Network Protocols”, Handbook of
Algorithms for Wireless Networking and Mobile Computing, 2005.
[7] Feng Wang and Jiangchuan Liu, “Networked Wireless Sensor Data Collection: Issues, Challenges, and
Approaches”, IEEE Communications Surveys & Tutorials, Vol. 13, No. 4, Fourth Quarter 2011
[8] Youssef Gahi, Mouhcine Guennoun, Hussein T. Mouftah, “Big Data Analytics: Security and Privacy
Challenges”, IEEE Symposium on Computers and Communication (ISCC), 2016
[9] Ann Cavoukian, “The Security–Privacy Paradox: Issues, Misconceptions, and Strategies”, A Joint
Report by The Information and Privacy Commissioner/Ontario and Deloitte & Touche, August 2003
[10] Min Li, Wanyu Zang, Kun Bai, Meng Yu, “MyCloud – Supporting User-Configured Privacy
Protection in Cloud Computing”, ACSAC ’13 Dec. 9-13, 2013
[11] Sultan Aldossary, William Allen, “Data Security, Privacy, Availability and Integrity in Cloud
Computing: Issues and Current Solutions”, (IJACSA) International Journal of Advanced Computer
Science and Applications, Vol. 7, No. 4, 2016
[12] Dr. L. Arockiam, S. Monikandan, “Efficient Cloud Storage Confidentiality to Ensure Data Security”,
2014 International Conference on Computer Communication and Informatics (ICCCI -2014), Jan. 03 –
05, 2014, Coimbatore, INDIA
[13] Zhai Guanghui, Li Juan, “Research on the Security of Massive Data Storage”, Proceedings of the 2nd
International Symposium on Computer, Communication, Control and Automation (ISCCCA-13)
[14] Yannan Li, Yong Yu, Geyong Min, Willy Susilo, Jianbing Ni and Kim-Kwang Raymond Choo,
“Fuzzy Identity-Based Data Integrity Auditing for Reliable Cloud Storage Systems”, JOURNAL OF
LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015