Keywords: IBE, ABE, KGC, HE, RSA, SHA, MD5, IP, SQL, XSS
INTRODUCTION
Introduction to Big Data
Big data analytics extracts knowledge by identifying patterns,
associations and trends. The data referred to as big data can be either
structured or unstructured, and quality is given more importance than
quantity: the point of gathering data, processing it and performing
analysis is to obtain useful information about the patterns in the data,
and big data analysis is performed to that end. Analysts commonly define
big data in terms of three V's: volume, velocity and variety. Volume
indicates the quantity of data collected by different organizations and
companies, including transaction data and data from machines and other
devices; frameworks such as Hadoop are used in industry to handle such
volumes. Velocity indicates the rate at which data is transferred from
one source to another; data-handling and security measures must be
incorporated to control the data flow and thereby minimize unauthorized
access by third parties. Variety refers to the different formats the
data may take, such as structured numerical data, text documents, video,
audio and email.
Big Data Challenges
A central challenge with big data [3] is that the methods and techniques
framed for securing small data sets are not practical for large ones.
The most relevant challenges of big data in modern society are volume,
variety and the combination of many datasets, velocity, relevance and
quality, security, privacy, and scalability.

The volume of data has increased geometrically in recent years, which is
why scientists speak of a data explosion; total data volume is expected
to reach zettabytes, with social media and mobile phones as the main
sources.

Regarding variety, the most common types of data available for analysis
are in unstructured or semi-structured formats, and well-structured data
is hard to obtain. The complexity of data increases with its quantity,
and the data comes from different sources such as web pages, text
documents, audio and video, emails and other multimedia.

Regarding velocity, one of the most common challenges is to analyze the
source of data and control the flow of information: data flows in excess
and can be accessed by anyone.

In terms of quality, data analysis can be performed effectively if and
only if the data has clarity. Machine learning algorithms, whether
supervised, unsupervised or reinforcement learning, can be applied only
when the data being analyzed is of high quality; if quality is
compromised, the performance of the analysis suffers.

Regarding security and privacy, data once released should be accessible
only to authenticated users, so encryption should be performed to
prevent unauthorized access by third parties. To maintain data
integrity, hashing should be implemented using hash algorithms such as
the Secure Hash Algorithms (SHA-1, SHA-512) or Message Digest (MD5).

Scalability is another challenge faced by big data today: scaling large
volumes of data up and down dynamically on user demand is crucial. Big
data projects should be able to anticipate the direction in which the
data grows, so that the other resources included in the project have
room to expand.
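The hashing-based integrity check mentioned above can be sketched with Python's standard `hashlib` module. This is a minimal illustration, not an implementation from this work; SHA-256 is used for the check, and MD5 is shown only for comparison since it is no longer considered collision-resistant:

```python
import hashlib

def digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of `data` under the named hash algorithm."""
    h = hashlib.new(algorithm)
    h.update(data)
    return h.hexdigest()

record = b"transaction: id=42, amount=100"

# The sender publishes the digest alongside the record ...
published = digest(record)

# ... and the receiver recomputes it to verify integrity.
assert digest(record) == published                # record is intact
assert digest(record + b"!") != published         # any change is detected

# MD5 still runs, but is considered broken for integrity protection.
legacy = digest(record, "md5")
```

Publishing the digest over a separate, trusted channel is what makes the check meaningful; a digest shipped alongside tampered data can simply be recomputed by the attacker.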
Existing General Solutions to Overcome Big Data Challenges
Big data and its storage have created multiple privacy and security
threats. Interruption by unauthorized third parties, and the server
viewing the data each time it processes it, create a high chance of
vulnerability and open a way for the data to be hacked. The resulting
lack of privacy and security leads to various ethical issues, which must
be handled legally. Big data faces many security challenges when stored
on a cloud server, since there is a high chance of the data being viewed
both by hackers and by the server while it performs operations. To
provide a high level of confidentiality and integrity, a newer
encryption technique known as ABE was introduced: instead of using the
receiver's public key for encryption, it uses various attributes. To
provide further privacy, a web log analyzer is used to find loopholes
and unauthorized access to web data. Based on prior research on securing
data effectively, two solutions are put forward: first, encryption,
which provides confidentiality while data is stored and accessed by
cloud users; and second, web log analysis, which enables real-time
monitoring of web data access and identification of attack types.
There are also several general potential solutions to the privacy and
security challenges. Cloud providers must be examined properly: the
cloud is one of the most appropriate places to store large volumes of
data, but it should be secured so that only authorized users are
permitted to access the data, which requires that providers be monitored
regularly and that sufficient security audits be conducted. The
necessary access-control policies should be in place to maintain the
integrity and confidentiality of data. The data should be protected with
encryption mechanisms, so that every stage, from data collection through
data analytics, remains secure, and data in transit from source to
destination should likewise be protected. Data access should be
monitored in real time, and frequent threat analysis should be performed
to make sure no data is leaked. Data anonymization should be applied so
that the most confidential information in a dataset is hidden from
ordinary users. A threat-intelligence mechanism should be included so
that security monitoring is performed in real time, and authentication
and authorization methods should prevent illegal logins and access to
applications and their associated data. Effective key management is
needed so that, whether symmetric or public-key encryption is used, keys
can be shared properly; in the usual case, for ease of access, keys are
stored on local disk drives. Activity logs should be analyzed on a
regular basis, so that unwanted login attempts and unusual data access
are recorded in the log and can be identified. Secure data communication
requires a secure network: Secure Sockets Layer (SSL) or Transport Layer
Security (TLS) protocols should be used when building a network for
secure communication. Restricting data access and anonymizing data are
two of the most effective approaches to protecting big data.
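The data-anonymization approach described above can be illustrated with a small sketch. The field names and the masking rule are hypothetical, chosen only for the example; real anonymization pipelines are considerably more involved:

```python
import hashlib

# Fields to hide from ordinary users (hypothetical choice for this example).
SENSITIVE = {"name", "email"}

def pseudonym(value: str) -> str:
    """Replace a value with a stable, irreversible token."""
    return hashlib.sha256(value.encode()).hexdigest()[:10]

def anonymize(record: dict) -> dict:
    """Mask sensitive fields, pass the rest through unchanged."""
    return {k: (pseudonym(v) if k in SENSITIVE else v)
            for k, v in record.items()}

row = {"name": "Alice", "email": "alice@example.com", "age": 34}
public_row = anonymize(row)

assert public_row["age"] == 34           # non-sensitive fields pass through
assert public_row["name"] != "Alice"     # sensitive fields are masked
```

Using a deterministic hash keeps the pseudonyms stable across records, so analysts can still join on the masked column without learning the underlying identity.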
Of these different mechanisms for protecting big data, the research work
in this paper focuses on two: a web log analyzer that examines log files
to analyze the kinds of attacks that have occurred, giving a better
understanding of existing attacks and their patterns and helping develop
remedies against their recurrence; and ABE, implemented to maintain the
confidentiality of data broadcast to a group of users by restricting
access to the genuine, authorized group. Section 5 of this paper
illustrates different automated web log analyzer tools for detecting
attacks from log files. Section 6 describes how ABE provides data
confidentiality through encryption. Section 7 gives a detailed
description of the implementation code of ABE.
LITERATURE SURVEY
Data volume is increasing sharply due to the rapid growth of the
Internet of Things, cloud computing and the mobile internet, and
industries generate huge volumes of data. To manage industrial data
[1], the enterprise must handle it efficiently, and cloud computing
provides a good solution for managing industrial data: configuration is
highly flexible, capacity is purchased on demand, and maintaining data
in the cloud is easy. By using cloud computing, the enterprise can
concentrate on its business rather than spending time on data
management. To keep the data confidential, it should be encrypted before
it is uploaded to the cloud. Although cloud storage is convenient, it
raises many privacy and security issues. ABE is used to provide a secure
access mechanism by encrypting the data; it makes use of attributes,
which can be any string of information related to the user, and it
avoids the need to maintain a database of individual public keys.
Privacy protection at key-generation time is a concern in ABE, because
the Key Generation Center (KGC) knows all the attributes and the secret
keys generated for each user; if the KGC is hacked, every generated
secret key is compromised. The cited research work presents an ABE
system that is more secure in terms of key generation: the
attribute-auditing and key-generation modules are separated, so that the
KGC no longer learns both the key it generates for each user and the
results of the audit mechanism.
More and more data is stored and shared on the internet by third
parties, so there is no guarantee that the data is secure and accessed
only by genuine users. Hence an efficient encryption technique [2] must
be incorporated to ensure confidentiality, with hashing techniques to
maintain integrity. A drawback of ordinary encryption is that data can
be shared only on a selective basis at a coarse-grained level. The cited
research work illustrates a variant of ABE known as key-policy ABE,
which enables sharing of ciphertexts on a fine-grained basis: a set of
attributes labels each ciphertext, while the access structures are
linked with the private keys, and users decrypt the data using those
private keys. The scheme developed supports sharing of audit-log
contents and encryption of broadcast data. The paper also illustrates
the details of hierarchical IBE.
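The fine-grained sharing idea can be sketched at the policy level: in key-policy ABE, each ciphertext is labelled with attributes, and a key embeds an access structure that must be satisfied before decryption is possible. The sketch below checks only this policy logic; the actual pairing-based cryptography is omitted, and the AND/OR tree format is an assumption made for illustration:

```python
# A key's access structure: an AND/OR tree over attribute names
# (illustrative format, not a real ABE library's representation).
policy = ("AND", "doctor", ("OR", "cardiology", "radiology"))

def satisfies(node, attributes: set) -> bool:
    """Evaluate an AND/OR policy tree against a ciphertext's attribute set."""
    if isinstance(node, str):            # leaf: one required attribute
        return node in attributes
    op, *children = node
    results = [satisfies(c, attributes) for c in children]
    return all(results) if op == "AND" else any(results)

# Attribute sets labelling two ciphertexts:
ct1 = {"doctor", "cardiology"}
ct2 = {"nurse", "cardiology"}

assert satisfies(policy, ct1)        # this key holder could decrypt ct1
assert not satisfies(policy, ct2)    # ... but not ct2
```

In a real KP-ABE scheme the same decision is enforced cryptographically: a key whose policy is unsatisfied simply cannot reconstruct the message, rather than being refused by an access check.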
Homomorphic encryption (HE) [4] provides a secure way of outsourcing
data: the third party that processes the data to generate results for
the genuine users never sees it in human-readable form, because all
manipulation of the data to produce the outcome is performed on the
ciphertext. Four types of single-key HE algorithms used in the cloud
environment are abstracted, and HE as a whole is divided into fully HE
and partially HE, where the common supported operations are addition and
multiplication. Five different fully homomorphic schemes are discussed
in detail, and a comparative table explains how fully HE improves on
ordinary encryption schemes. Hence, as future work, attribute-based
encryption can be combined with HE so that the server or third party can
perform the operations instructed by the user on the ciphertext without
ever seeing the actual plaintext, preventing any loss of data
confidentiality.
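The additive case of partially homomorphic encryption can be demonstrated with a toy Paillier implementation in pure Python. The primes are tiny for readability only; this is a sketch of the principle, not a usable cryptosystem:

```python
import math
import random

def lcm(a: int, b: int) -> int:
    return a * b // math.gcd(a, b)

def keygen(p: int, q: int):
    """Paillier key pair with the common simplification g = n + 1."""
    n = p * q
    lam = lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)          # valid because g = n + 1
    return (n,), (lam, mu, n)

def encrypt(pub, m: int) -> int:
    (n,) = pub
    n2 = n * n
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m*n.
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(priv, c: int) -> int:
    lam, mu, n = priv
    n2 = n * n
    x = pow(c, lam, n2)
    return ((x - 1) // n) * mu % n

pub, priv = keygen(61, 53)        # toy primes; real keys need n >= 2048 bits
c1, c2 = encrypt(pub, 12), encrypt(pub, 30)

# Multiplying ciphertexts adds the underlying plaintexts:
c_sum = c1 * c2 % (pub[0] ** 2)
assert decrypt(priv, c_sum) == 42
```

This is exactly the property the survey highlights: the server can compute on `c1` and `c2` to produce a ciphertext of the sum without ever learning 12, 30 or 42.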
WEB LOG ANALYZER
Web log analyzers are tools that are used to scan for vulnerabilities in
web sites and their servers through the use of log files. It is a sort
of log file analyzing mechanism that checks the log reports generated
from the sites and visualizes them for the admin to look through and
make decisions based on the reports. Traditionally, without these tools,
people would had needed to go through tones of log records from a file
and figure out the anomaly which sometimes might be too hard to find if
done manually. But with the help of these log file analyzer tools, this
has become easy as it is just a few step process and it can interpret
various types of information just from one log dump file that can be
found in the main root folder of the web server that is hosting the
site. These are applications that are used on the server side of the
system and needs proper access to the root directory of the servers that
are hosting so that they can access the log files from within the
servers. We have styled our solution with two applications where each
does their own unique stuff and helps together to provide a better
secured solution. The two tools that have been used for the web log
analysis:
- Snort
- AWStats
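Before turning to the individual tools, the core idea of a web log analyzer, scanning access-log lines for attack signatures such as SQL injection or XSS, can be sketched as follows. The log format assumed is the common Apache "combined" style, and the signature list is a small illustrative sample, far from exhaustive:

```python
import re
from urllib.parse import unquote

# Simplified pattern for Apache "combined" access-log lines.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

# Illustrative attack signatures (SQL injection, XSS, path traversal).
SUSPICIOUS = [r"union\s+select", r"<script", r"\.\./"]

def flag_suspicious(lines):
    """Return (client IP, raw path) for requests matching a signature."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        decoded = unquote(m.group("path")).lower()  # undo %xx encoding
        if any(re.search(p, decoded) for p in SUSPICIOUS):
            hits.append((m.group("ip"), m.group("path")))
    return hits

sample = [
    '10.0.0.5 - - [12/Mar/2021:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.9 - - [12/Mar/2021:10:00:02 +0000] '
    '"GET /item?id=1%20union%20select%20*%20from%20users HTTP/1.1" 200 77',
]
print(flag_suspicious(sample))   # flags only the second request
```

Real analyzers add many refinements (rate analysis, GeoIP, status-code statistics), but signature matching over decoded request paths is the common starting point.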
Snort is an open-source intrusion detection application used on the
server side of the system to detect any intrusions that occur and to
notify the system administrator of the situation, so that proper action
can be taken and data loss prevented. It is a small-scale, console-based
application that runs on the networking device, so no additional GUI is
provided to the user. It uses a rule-based technique to prevent attacks
or alert the user: when a rule's conditions are met, it prompts the
administrator to take action. Snort keeps track of all traffic, incoming
and outgoing, and checks whether any data transfer contains anything
that might trigger one of the rule policies set by the administrator.
These rules perform the following five types of action:
- Alert: Generates an alert message
- Log: Logs the specified IP packet
- Pass: Ignores the specified IP packet
- Activate: Sends an alert message and then activates a dynamic rule
- Dynamic: Activated by an activate rule, this rule then acts as a log
rule
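As an illustration of the rule syntax (a generic example, not a rule taken from this research), an alert rule that flags a possible SQL injection attempt in HTTP traffic directed at a local web server might look like:

```
alert tcp any any -> $HOME_NET 80 (msg:"Possible SQL injection attempt"; content:"union select"; nocase; sid:1000001; rev:1;)
```

The header names the action, protocol, source and destination; the options in parentheses give the alert message, the payload content to match (case-insensitively, via `nocase`), and a unique rule identifier (`sid` values from 1000000 upward are reserved for local rules).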
Snort does not only alert on intrusion detection; it also helps create
log files, which come in handy in later processing. It logs all records
of the incidents arising from intrusions and saves them in a log file,
which can later be used to support further action decided by the
administrator.