Automata Based String Analysis in JSAI

Abstract:The Automata Based String Analysis in JSAI revolves around the idea of interfacing the Lib Stranger tool with the JSAI implementation. LibStranger stands for STRing AutomatoN GEneratoR Library (Li 2012). The LibStranger tool implemented in C language. LibStranger is an Automata-Based Symbolic String Analysis Library. You can use LibStranger to solve string constraints and/or compute pre and post-images of string manipulation operations such as concatenation and replacement. It can handle complex regular-expression based replace operations such as PHP’s preg replace and approximate these operations in the presence of unbounded loops with high precision and smooth performance. In addition, LibStranger provides fast and precise modeling for common string functions such as trim, substring, toUpperCase and toLowerCase and complex sanitization functions such as PHP’s addslashes and html specialchars. The Lib Stranger is built on top of the MONA (Klarlund 2001, Møller 2001, Brabrand 2000, Møller 2000, Klarlund 2000, Elgaard 2000). MONA is a tool that translates formulas to finite-state automata. The formulas may express search patterns, temporal properties of reactive systems, parse tree constraints, etc. MONA analyses the automaton resulting from the compilation and prints out “valid” or a counter-example. MONA implements decision procedures for the Weak Second-order Theory of One or Two successors (WS1S/WS2S). The theory of one successor, known as WS1S, is a fragment of arithmetic augmented with second-order quantification over finite sets of natural numbers. Its first-order terms denote just natural numbers. The theory has no addition, since that would make it undecidable, but it has a unary operation +1, known as the successor function. WS2S is a generalization to tree structures. Since the theories are monadic second-order logics, we call our tool MONA. We interface the already available functionality of the tool with their specifications to do a backward string analysis of the java-script code to make it void of any code injection.

Introduction

Web applications are getting more an more popular these days with the advent of even replacing many of the existing desk top applications. With the development of the web-based solution it is notorious to have many hackers to get in-between the user and the system and exploit the vulnerabilities of the application. The Common Vulnerabilities Exposure (CVE) generally documents the top vulnerabilities that a web-application is exposed to, they have listed that the vulnerabilities has increased from 2 percent in 2001 to 45 percent in 2007 and is till increasing on a large scale, when the web-appliications are invloved. There were 3 main categories of vulnerabilities, HTML injection, Database code injection and Malicious File Execution. The HTML injection is that when the user injects a malicious content as a part of a user defined field. The database injection happens when the hacker injects a query along with a form data so as to introduce credentials of the hacker, which can in future aid him to login and make use of the functios provided. The Malicious File Execution is when the developer executes a file without validating its credentials, this will introduce bug inn the hosted platform there by crashing the entire system and teh data associated with the application. This was the list of vulnerabilities early in 2001, however the first two categories of the vulnerabilites are still in existince and proves threatful to the web application hosting platform. LibStranger tool is developed with a view to sliminate these kind of vulnerabilities existing in the web application development. The LibStranger implemented in C provided a rich set of functionalities for functioning with string automaton. The main focus on performing a backward analysis can be demonstrated with the help of the implementation available in these library. Complex sanitization functions available in the library aids in a concrete realization of the expected concept. The functions that forms the bridge for our analysis is discussed in the sections 4-5. Section 2-3 generalizes the specialities of LibStranger tool and focuses on the utility of that over our need, It also includes utilizing JNI (Java Native Interface) and SWIG (Smplified Wrapper Interface Generator). The main Implemetation of this existing module invloves identification of the appropriate functions for interfacing. The end result is interfacing of the approximated functions to the scala code, by using the interfacing tool. The final implementation generated will be effective to perform the backward analysis of a javascript code to verify its vulnerability. This can make the server end service provider be prepared of the situation and reject the vulnerable requests to avoid code injection (Shahriar 2011).

Tools and Implementation Overview

The library has function implementation for generating a DFA to accept strings, these strings are defined in the function specification as a parameter. There are functions for generating an automata from another automata. Functions to describe automata transformations and many more. The base definition and data-structure for the constructing the DFA is derived from MONA library.The Mona Library provides the data-structures for the DFA construction and primary modeling with DFA based operations. The MONA library is linked along with the stranger library during the execution of the code. The linking process will bring together both the tool libraries and operational specifics of both MONA and LibStranger to work in-line with each other. This linking pose a challenge during our interfacing step, which will require us to write java wrapper classes for every function involving in the working of the code (when JNI is used). Luckily, this can be eliminated using the SWIG. The future work of this implementation will describe the details of the interfacing of the LibStrnager tool with the scala code.

LibStranger

The stranger.h file in LibStranger library provides the list of all the functions available at our disposal. The letters are represented in a 8-bit ascii format with 1-bit reserved for replacement. The basic string operations can be performed by constructing DFA. There are specific functions that are required for the complete implementation of the scala code which will be discussed in the following sections. These functions can be utilized for performing a backward analysis of the javascript under consideration. an example scenario for the backward analysis would be considering a <script>

Functions under consideration

CharAt():

Returns the character at the “n” position within the string.
//charAt(x)
var myString = 'jQuery FTW!!!';
console.log(myString.charAt(7));
//output: F
This has no precise module in the LibStranger application, however necessary study is undertaken to find an equivalent functionality available. While dealing with the DFA and automata which generalizes the concept of the having a regular expression for the construction, it is difficult to extract a particularly specific value of a generalized solution.

CharCodeAt():

Returns the Unicode value of the character at position “n” within the string.
//charAt(position)
var message="jquery4u"
//alerts "q"
alert(message.charAt(1))
This again expects us to provide a specific extraction of a regular expression operating on an automata. Study reveals that this kind of operation is considered ineffective when we have a generalized domain to operate with . However, this kind of function is necessary in analysing a java script code as this forms an elemental component of the need.

Concat():

Combines one or more strings (arguments v1, v2 etc) into the existing one and returns the combined string. Original string is not modified.
//concat(v1, v2,..)
var message="Sam"
var final=message.concat(" is a"," hopeless romantic.")
//alerts "Sam is a hopeless romantic."
alert(final)
The solution offered by the LibStranger tool for the concatenation is  dfa_concat .This returns a DFA that will accept the concatenated value of the provided DFA, the arguments for this function are 2 DFA’s the var, which denotes the number of bits we use to represent our ASCII format, that we are making use of an UTF-8 format.
 DFA *dfa_concat(DFA *M1, DFA *M2, int var, int *indices);

FromCharCode():

Returns a string created by using the specified sequence of Unicode values (arguments n1, n2 etc). Method of String object, not String instance. For example: String.fromCharCode().
//fromCharCode(c1, c2,...)
console.log(String.fromCharCode(97,99,98,120,121,122))
//output: abcxyz
console.log(String.fromCharCode(72,69,76,76,79))
//output: HELLO
This again expects us to provide a specific extraction of a regular expression operating on an automata. Study reveals that this kind of operation is considered ineffective when we have a generalized domain to operate with . However, this kind of function is necessary in analysing a java script code as this forms an elemental component of the need.

Indexof():

Searches and (if found) returns the index number of the searched character or substring within the string. If not found, -1 is returned. “Start” is an optional argument specifying the position within string to begin the search. Default is 0.
//indexOf(char/substring)
var sentence="Hi, my name is Sam!"
if (sentence.indexOf("Sam")!=-1)
alert("Sam is in there!")
This again expects us to provide a specific extraction of a regular expression operating on an automata. Study reveals that this kind of operation is considered ineffective when we have a generalized domain to operate with . However, this kind of function is necessary in analysing a java script code as this forms an elemental component of the need.

LastIndexof():

Searches and (if found) returns the index number of the searched character or substring within the string. Searches the string from end to beginning. If not found, -1 is returned. “Start” is an optional argument specifying the position within string to begin the search. Default is string.length-1.
//lastIndexOf(substr, [start])
var myString = 'javascript rox';
console.log(myString.lastIndexOf('r'));
//output: 11
This again expects us to provide a specific extraction of a regular expression operating on an automata. Study reveals that this kind of operation is considered ineffective when we have a generalized domain to operate with . However, this kind of function is necessary in analysing a java script code as this forms an elemental component of the need.

Match():

Executes a search for a match within a string based on a regular expression. It returns an array of information or null if no match is found.
//match(regexp) //select integers only
var intRegex = /[0-9 -()+]+\$/;
var myNumber = '999';
var myInt = myNumber.match(intRegex);
console.log(isInt);
//output: 999
var myString = '999 JS Coders';
var myInt = myString.match(intRegex);
console.log(isInt);
//output: null
The solution offered by the LibStranger tool is  checkMembership . This function performs the way we expect it in matching the components of the code with the essential part of the DFA. The input argument of the code is the DFA in which the match is to be made and the string to be matched for it’s containment, the last two arguments are 8 and the indices parameter.
 int checkMembership(DFA* M, char* string, int var, int* indices);

Replace():

Searches and replaces the regular expression (or sub string) portion (match) with the replaced text instead.
//replace(substr, replacetext)
var myString = '999 JavaScript Coders';
console.log(myString.replace(/JavaScript/i, "jQuery"));
//output: 999 jQuery Coders
//replace(regexp, replacetext)
var myString = '999 JavaScript Coders';
console.log(myString.replace(new RegExp( "999", "gi" ), "The"));
//output: The JavaScript Coders
The solution offered by the LibStranger tool is  dfa_replace_char_with_string . This function performs the way we expect with the components and the essential part of the DFA. The input argument of the code is the DFA in which the match is to be made and the string to be matched for it’s containment, the last two arguments are 8 and the indices parameter.
 DFA *dfa_replace_char_with_string(DFA *M, int var, int *oldIndices, char replacedChar, char *string);

Search():

Tests for a match in a string. It returns the index of the match, or -1 if not found.
//replace(substr, replacetext)
var myString = '999 JavaScript Coders';
console.log(myString.replace(/JavaScript/i, "jQuery"));
//output: 999 jQuery Coders
//replace(regexp, replacetext)
var myString = '999 JavaScript Coders';
console.log(myString.replace(new RegExp( "999", "gi" ), "The"));
//output: The JavaScript Coders
The solution offered by the LibStranger tool is  checkMembership . This function performs the way we expect it in matching the components of the code with the essential part of the DFA. The input argument of the code is the DFA in which the match is to be made and the string to be matched for it’s containment, the last two arguments are 8 and the indices parameter.
 int checkMembership(DFA* M, char* string, int var, int* indices);

Slice():

Returns a substring of the string based on the “start” and “end” index arguments, NOT including the “end” index itself. “End” is optional, and if none is specified, the slice includes all characters from “start” to end of string.
//slice(start, end)
var text="excellent"
text.slice(0,4) //returns "exce"
text.slice(2,4) //returns "ce"
There is no specific solution, however we can use normal C string operations do the necessary functionality and then to construct DFA’s with the processed values. This will be of help in an adverse situation when we want to generalise the solution.

Split():

Splits a string into many according to the specified delimiter, and returns an array containing each element. The optional “limit” is an integer that lets you specify the maximum number of elements to return.
//split(delimiter)
var message="Welcome to jQuery4u"
//word[0] contains "We"
//word[1] contains "lcome to jQuery4u"
var word=message.split("l")
There is no specific solution, however we can use normal C string operations do the necessary functionality and then to construct DFA’s with the processed values. This will be of help in an adverse situation when we want to generalise the solution.

SubString():

Returns the characters in a string beginning at “start” and through the specified number of characters, “length”. “Length” is optional, and if omitted, up to the end of the string is assumed.
//substring(from, [to])
var myString = 'javascript rox';
myString = myString.substring(0,10);
console.log(myString)
//output: javascript

Returns the characters in a string between “from” and “to” indexes, NOT including “to” inself. “To” is optional, and if omitted, up to the end of the string is assumed.
//substring(from, to)
var text="excellent"
text.substring(0,4) //returns "exce"
text.substring(2,4) //returns "ce"
There is no specific solution, however we can use normal C string operations do the necessary functionality and then to construct DFA’s with the processed values. This will be of help in an adverse situation when we want to generalise the solution.

Tolowercase():

Returns the string with all of its characters converted to lowercase.
//toLowerCase()
var myString = 'JAVASCRIPT ROX';
myString = myString.toLowerCase();
console.log(myString)
//output: javascript rox
The solution offered by the LibStranger tool is  dfaToLowerCase . This function performs the way we expect it in matching the components of the code with the essential part of the DFA. The input argument of the code is the DFA in which the match is to be made and the string to be matched for it’s containment, the last two arguments are 8 and the indices parameter.
 DFA* dfaToLowerCase(DFA* M, int var, int* indices);

Touppercase():

Returns the string with all of its characters converted to uppercase.
//toUpperCase()
var myString = 'javascript rox';
myString = myString.toUpperCase();
console.log(myString)
//output: JAVASCRIPT ROX
The solution offered by the LibStranger tool is  dfaToUpperCase . This function performs the way we expect it in matching the components of the code with the essential part of the DFA. The input argument of the code is the DFA in which the match is to be made and the string to be matched for it’s containment, the last two arguments are 8 and the indices parameter.
 DFA* dfaToUpperCase(DFA* M, int var, int* indices);

Fundamental Concept

The fundamental concept behind the functioning of the entire system is based on the evaluation of the existing field entries in an application. The error pattern is recoginzed and the application entries are combined wit the error patterns. This is then evaluated for the intersection of the error pattern , if the intersection is evaluated to true then the proper steps are taken for sanitization of the operation. Sanitization can be done in 2 possible ways one is by match and block and the other one is by match and sanitize. The first category of match and block takes place for the most vulnerable category of input fields where the threat can be maximum intense. This way of sanitization will negotiate the application to cease it’s function when the particular category of input vulnerability is spotted. Te other kind of sanitization takes place when the attack pattern involved is responsible for a lower impact pattern and the avoidance can be done by merely removing the occuring attack pattern from the field involved. This occurence of the attack pattern is usually reported for the learning module which can enrich the data set by adding new pattern to the existing attack patters it has.

Conclusion

This work is continued to deliver fruitful result in making use of the automata structure to provide the best possible use of the system to configure the scala code to perform at it’s best in analsing the javascript application code.The SWIG which is the simple wrapper class Interface Generator is the one that will be used in the step to follow to complete the integration of the existing tool with that of the scala code which is more of the interfacing task.