Keywords: IBE, ABE, KGC, HE, RSA, SHA, MD5, IP, SQL, XSS
INTRODUCTION
Introduction to Big Data
Big data helps organizations derive knowledge by uncovering patterns, associations and trends. The data referred to as big data can be either structured or unstructured. In big data, quality is given more importance than quantity. The main idea behind gathering data, processing it and performing analysis is to extract meaningful information about the patterns in the data, and big data analysis is performed for this purpose. Analysts define big data in terms of three V's: volume, velocity and variety. Volume indicates the quantity of data collected by different organizations and companies, including transaction data and data from machines and other devices; in order to handle such huge volumes of data, Hadoop is used in industry. Velocity indicates the rate at which data is transferred from one source to another. In order to control the data flow, thereby minimizing unauthorized access by third parties, data-handling and security measures have to be incorporated. Variety refers to the different formats of data, which can be structured numerical data, text documents, video, audio or email.
Big Data Challenges
The challenge with big data [3] is that the methods and techniques framed for securing small-scale data are not practical for large-scale data. The most relevant challenges of big data in modern society are volume, variety and the combination of many datasets, velocity, relevance and quality, security, privacy, and scalability. The volume of data in recent years has been increasing in geometric progression, hence scientists use the term data explosion; the data is expected to reach zettabytes, with social media and mobile phones being the main sources. The most common types of data available for analysis are in unstructured or semi-structured formats, and data is rarely obtained in structured form. The complexity of data increases with the increase in its quantity. The data comes from different sources such as web pages, documents in text, audio and video formats, emails and multimedia. One of the most common challenges is to analyze the source of data and to control the flow of information, since data flows in excess and can be accessed by any person. In terms of quality, data analysis can be performed effectively if and only if the data has clarity. Machine learning algorithms such as supervised, unsupervised and reinforcement learning can be applied only if the data being analyzed is of high quality; if data quality is compromised, the performance of the analysis suffers. Once released, data should be accessible only to authenticated users, hence encryption should be performed to prevent unauthorized access by third parties. In order to maintain the integrity of data, hashing is to be implemented using hash algorithms such as the Secure Hash Algorithms (SHA-1, SHA-512), Message Digest (MD5), etc., as illustrated below. The scalability of data is another challenge faced by big data today: scaling large volumes of data up and down dynamically on demand is highly crucial. Projects based on big data should be able to anticipate the direction in which the data grows, so that the other resources included in the project have room to scale as well.
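As a minimal illustration of hash-based integrity checking, the following Python sketch uses only the standard hashlib module (the record contents are hypothetical). SHA-512 is used rather than MD5 or SHA-1, since the latter two are no longer considered collision-resistant:

import hashlib

def sha512_digest(data: bytes) -> str:
    # Compute the SHA-512 digest of the given bytes.
    return hashlib.sha512(data).hexdigest()

# Hypothetical record released to users, with its published digest.
record = b"transaction-id=42;amount=100.00"
published_digest = sha512_digest(record)

# A consumer re-hashes the received copy and compares digests;
# any tampering changes the digest and is therefore detected.
received = b"transaction-id=42;amount=100.00"
assert sha512_digest(received) == published_digest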
Existing General Solutions to Overcome Big Data Challenges
Big data and its storage have created multiple privacy and security threats. Interruption by unauthorized third parties, and the server viewing the data each time it processes it, lead to a high chance of vulnerability and create a way for the data to get hacked. Without privacy and security for the data, various ethical issues arise that need to be handled legally. Big data faces many security challenges while stored on a cloud server: there is a high chance of the data being viewed by a hacker, or by the server itself while performing various operations. In order to provide a high level of confidentiality and integrity, an encryption technique known as Attribute-Based Encryption (ABE) is introduced, which, instead of making use of the receiver's public key for encryption, uses various attributes. To provide further privacy of data, a web log analyzer is used to find loopholes and unauthorized access to web data. Based on prior research on how to secure data effectively, two solutions are put forward: first, encryption is performed to provide security while data is stored and accessed by cloud users, giving confidentiality of data; second, web log analysis provides real-time monitoring of web data access and identifies the types of attack.
There are certain general potential solutions to these privacy and security challenges. Cloud providers have to be examined properly: the cloud is one of the most appropriate places to store large volumes of data, but it should be secured so that only authorized users are permitted to access the data, cloud providers should be monitored regularly, and sufficient security audits should be conducted. The necessary access-control policies should be put in place to maintain the integrity and confidentiality of data. The data should be protected using encryption mechanisms, so that every stage, from data collection to data analytics, remains secure; data in transit from source to destination should likewise be protected. Access to data should be monitored in real time, and frequent threat analysis should be performed to make sure data is not leaked. Data anonymization should be applied so that the most confidential information in a dataset is hidden from common users. A threat-intelligence mechanism should be included to ensure real-time security monitoring. Authentication and authorization methods should be enforced to prevent illegal logins and unauthorized access to applications and their data. Effective key management should be in place, so that when symmetric or public-key encryption is used, the keys are shared in a proper manner; in the usual case, for ease of access, keys are stored on local disk drives. Activity logs should be analyzed on a regular basis, so that unwanted login attempts and unusual data access are registered in the log and can be identified. A secure network is essential for secure data communication: Secure Sockets Layer (SSL) or Transport Layer Security (TLS) protocols should be used when creating a network for secure communication. Restricting data access and data anonymization are the two most effective approaches for protecting big data.
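As a minimal sketch of the encryption-at-rest idea described above (assuming the third-party Python cryptography package is available; the payload is hypothetical), symmetric authenticated encryption can be applied before data ever reaches cloud storage:

from cryptography.fernet import Fernet

# Key management is the hard part in practice; here the key is simply
# generated in memory. In a real deployment it would live in a key
# management service rather than on a local disk drive.
key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt before upload: the cloud provider only ever sees ciphertext.
plaintext = b"patient-id=17;diagnosis=confidential"
ciphertext = cipher.encrypt(plaintext)

# Decrypt after download. Fernet also authenticates, so tampering with
# the ciphertext raises an exception instead of returning garbage.
assert cipher.decrypt(ciphertext) == plaintext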
Out of these different mechanisms for protecting big data, the research work described in this paper focuses on a web log analyzer that examines log files to analyze the different kinds of attacks made on big data. This gives a better understanding of existing attacks and their patterns, which helps in developing remedies that prevent the same attacks from happening again. In order to maintain the confidentiality of data while it is broadcast to a group of users, restricting access to the genuine authorized group only, ABE is implemented. Section 5 of this paper illustrates the different automated web log analyzer tools used to detect attacks from the log file. Section 6 describes ABE for providing data confidentiality through encryption. Section 7 gives a detailed description of the implementation code of ABE.
LITERATURE SURVEY
The data volume is increasing rapidly due to the excessive growth of the Internet of Things, cloud computing and the mobile internet. Huge volumes of data are generated by industries, and in order to manage industrial data [1], the enterprise manager should handle it efficiently. Cloud computing provides a better solution for managing industrial data: its main advantages for effective data management are highly flexible configuration, on-demand purchase, and easy maintenance of the data in the cloud. By making use of cloud computing, the enterprise is able to concentrate more on business rather than spending time on data management. In order to provide confidentiality, the data should be encrypted before it is uploaded to the cloud; even though data storage in the cloud is easy, there are many issues related to privacy and security. ABE is used to provide a secure access mechanism by encrypting the data. ABE makes use of attributes, which can be any strings of information related to the user, and unlike conventional public-key encryption it does not require maintaining a database of per-user public keys. Privacy protection at the time of key generation must also be taken care of in ABE, because the Key Generation Center (KGC) knows all the attributes and the secret keys generated for each user; if the KGC is hacked, the generated secret keys are compromised. This research work presents an ABE system that is more secure in terms of key generation: the attribute-auditing process and the key-generation module are separated, in order to make the KGC accountable for the key it generates for each user and for the audit mechanisms performed.
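To make the ABE workflow concrete, the following sketch uses the open-source charm-crypto library's implementation of the Bethencourt-Sahai-Waters ciphertext-policy ABE scheme. This is an illustrative stand-in, not the scheme proposed in [1]; the attribute names and policy string are hypothetical:

from charm.toolbox.pairinggroup import PairingGroup, GT
from charm.schemes.abenc.abenc_bsw07 import CPabe_BSW07

group = PairingGroup('SS512')          # pairing-friendly group
cpabe = CPabe_BSW07(group)

# The KGC runs setup and issues a key for a user's attribute set.
(pk, mk) = cpabe.setup()
sk = cpabe.keygen(pk, mk, ['DOCTOR', 'CARDIOLOGY'])

# Encryption binds an access policy to the ciphertext,
# not a specific recipient's public key.
msg = group.random(GT)                 # random group element as payload
ct = cpabe.encrypt(pk, msg, '(DOCTOR and CARDIOLOGY)')

# Only keys whose attributes satisfy the policy can decrypt.
assert cpabe.decrypt(pk, sk, ct) == msg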
More and more data is stored and shared on the internet through third parties, so there is no guarantee that the data is secure and is accessed only by genuine users. Hence an efficient encryption technique [2] has to be incorporated to ensure data confidentiality, while integrity is maintained using hashing techniques. A drawback of encrypting data is that it can then be shared only on a selective, coarse-grained basis. The research work illustrates a new ABE technique known as Key-Policy ABE (KP-ABE). This helps to share the ciphertext on a fine-grained basis: a set of attributes labels each generated ciphertext, while the access structures are linked with the private keys, and using these private keys the users can decrypt the data. The scheme developed supports sharing of audit-log content and encryption of broadcast data. The paper also illustrates the details of Hierarchical Identity-Based Encryption (HIBE).
Homomorphic Encryption (HE) [4] indicates a secure way of transferring data such that the third party who processes the data to generate results for genuine users never gets to see the data in human-readable form: the manipulation of the data to produce the outcome is carried out on the ciphertext itself. In the cloud environment, four types of HE, each based on a single encryption algorithm, are abstracted. HE as a whole is divided into fully HE and partially HE, where the common operations performed are addition and multiplication. Five different types of fully HE are discussed in detail, and a comparative table illustrates how fully HE is better than an ordinary encryption scheme. Hence, as future work, attribute-based encryption can be combined with HE so that the integrity of data is not lost to the server or a third party, and the server can perform the operations instructed by the user on the ciphertext without ever seeing the actual plaintext.
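As a self-contained illustration of the partially homomorphic idea, the sketch below uses textbook RSA with tiny, insecure toy parameters chosen only for readability: multiplying two ciphertexts yields a ciphertext of the product, so a server can combine values it cannot read.

# Textbook RSA toy parameters: p = 61, q = 53 (insecure, illustration only).
n, e, d = 3233, 17, 2753   # n = p*q; e public exponent; d private exponent

def encrypt(m): return pow(m, e, n)
def decrypt(c): return pow(c, d, n)

c1, c2 = encrypt(7), encrypt(11)

# Multiplicative homomorphism: E(m1) * E(m2) mod n == E(m1 * m2).
product_ct = (c1 * c2) % n
assert decrypt(product_ct) == 7 * 11   # the server never saw 7 or 11

Fully HE schemes extend this idea to support both addition and multiplication on ciphertexts, at much higher computational cost.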
WEB LOG ANALYZER
Web log analyzers are tools used to scan for vulnerabilities in websites and their servers through the use of log files. They are a form of log file analysis mechanism that checks the log reports generated by the sites and visualizes them so the admin can look through them and make decisions based on the reports. Traditionally, without these tools, people would have needed to go through tons of log records in a file and figure out the anomaly, which can be too hard to find manually. With log file analyzer tools this has become easy: it is just a few-step process, and the tool can interpret various types of information from a single log dump file found in the main root folder of the web server hosting the site. These applications run on the server side of the system and need proper access to the root directory of the hosting servers so that they can read the log files from within. We have built our solution with two applications, each of which does its own unique job, and together they provide a better-secured solution. The two tools used for the web log analysis are:
Snort
AWStats
A. Snort
Snort is an open-source intrusion detection application used on the server side of the system to detect any sort of intrusion that may occur and to alert the system admin about the situation, so that they can take proper action and prevent the loss of any data. This is a small-scale application that runs on the networking device; it is console-based, so no additional GUI is provided to the user. It uses a rule-based technique to prevent intrusions or alert the user: if the conditions of a rule are met, an alert prompt is raised for the admin to take action. Snort keeps track of all traffic, whether incoming or outgoing, so that it can check whether any data transfer contains anything that might trigger one of the rule policies set by the admin. A rule performs one of the following five types of action (an example rule is shown below):
alert - generate an alert and then log the packet
log - log the packet
pass - ignore the packet
activate - alert and then turn on another dynamic rule
dynamic - remain idle until activated by an activate rule, then act as a log rule
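For illustration, a minimal alert rule in Snort's standard rule syntax might look as follows (the subnet, message text and SID are hypothetical):

alert tcp any any -> 192.168.1.0/24 80 (msg:"Possible SQL injection attempt"; content:"union select"; nocase; sid:1000001; rev:1;)

This rule tells Snort to raise an alert and log the packet whenever TCP traffic from any source to port 80 on the protected subnet contains the string "union select" in any letter case.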
Snort does not just alert on intrusion detection; it also helps create log files, which come in handy in later processes. It logs all records of the incidents that occur from the intrusions and saves them in a log file, which can later be used for further action decided by the admin.
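As a minimal sketch of the kind of after-the-fact analysis such logs enable (the log path and attack signatures are hypothetical, and real analyzers use far richer rule sets), a script can scan a web server access log for attack patterns such as SQL injection and XSS:

import re

# Hypothetical signatures for two common web attacks.
SIGNATURES = {
    "sql_injection": re.compile(r"union\s+select|or\s+1=1", re.IGNORECASE),
    "xss": re.compile(r"<script|javascript:", re.IGNORECASE),
}

def scan_log(path):
    # Yield (line_number, attack_type, line) for suspicious entries.
    with open(path, errors="replace") as log:
        for lineno, line in enumerate(log, start=1):
            for attack, pattern in SIGNATURES.items():
                if pattern.search(line):
                    yield lineno, attack, line.rstrip()

# Hypothetical log location; on Apache it is often under /var/log/apache2/.
for lineno, attack, line in scan_log("access.log"):
    print(f"line {lineno}: possible {attack}: {line}")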