Automata Based String Analysis in JSAI

Abstract:The Automata Based String Analysis in JSAI revolves around the idea of interfacing the Lib Stranger tool with the JSAI implementation. LibStranger stands for STRing AutomatoN GEneratoR Library (Li 2012). The LibStranger tool implemented in C language. LibStranger is an Automata-Based Symbolic String Analysis Library. You can use LibStranger to solve string constraints and/or compute pre and post-images of string manipulation operations such as concatenation and replacement. It can handle complex regular-expression based replace operations such as PHP’s preg replace and approximate these operations in the presence of unbounded loops with high precision and smooth performance. In addition, LibStranger provides fast and precise modeling for common string functions such as trim, substring, toUpperCase and toLowerCase and complex sanitization functions such as PHP’s addslashes and html specialchars. The Lib Stranger is built on top of the MONA (Klarlund 2001, Møller 2001, Brabrand 2000, Møller 2000, Klarlund 2000, Elgaard 2000). MONA is a tool that translates formulas to finite-state automata. The formulas may express search patterns, temporal properties of reactive systems, parse tree constraints, etc. MONA analyses the automaton resulting from the compilation and prints out “valid” or a counter-example. MONA implements decision procedures for the Weak Second-order Theory of One or Two successors (WS1S/WS2S). The theory of one successor, known as WS1S, is a fragment of arithmetic augmented with second-order quantification over finite sets of natural numbers. Its first-order terms denote just natural numbers. The theory has no addition, since that would make it undecidable, but it has a unary operation +1, known as the successor function. WS2S is a generalization to tree structures. Since the theories are monadic second-order logics, we call our tool MONA. We interface the already available functionality of the tool with their specifications to do a backward string analysis of the java-script code to make it void of any code injection.

Introduction

Web applications are getting more an more popular these days with the advent of even replacing many of the existing desk top applications. With the development of the web-based solution it is notorious to have many hackers to get in-between the user and the system and exploit the vulnerabilities of the application. The Common Vulnerabilities Exposure (CVE) generally documents the top vulnerabilities that a web-application is exposed to, they have listed that the vulnerabilities has increased from 2 percent in 2001 to 45 percent in 2007 and is till increasing on a large scale, when the web-appliications are invloved. There were 3 main categories of vulnerabilities, HTML injection, Database code injection and Malicious File Execution. The HTML injection is that when the user injects a malicious content as a part of a user defined field. The database injection happens when the hacker injects a query along with a form data so as to introduce credentials of the hacker, which can in future aid him to login and make use of the functios provided. The Malicious File Execution is when the developer executes a file without validating its credentials, this will introduce bug inn the hosted platform there by crashing the entire system and teh data associated with the application. This was the list of vulnerabilities early in 2001, however the first two categories of the vulnerabilites are still in existince and proves threatful to the web application hosting platform. LibStranger tool is developed with a view to sliminate these kind of vulnerabilities existing in the web application development. The LibStranger implemented in C provided a rich set of functionalities for functioning with string automaton. The main focus on performing a backward analysis can be demonstrated with the help of the implementation available in these library. Complex sanitization functions available in the library aids in a concrete realization of the expected concept. The functions that forms the bridge for our analysis is discussed in the sections 4-5. Section 2-3 generalizes the specialities of LibStranger tool and focuses on the utility of that over our need, It also includes utilizing JNI (Java Native Interface) and SWIG (Smplified Wrapper Interface Generator). The main Implemetation of this existing module invloves identification of the appropriate functions for interfacing. The end result is interfacing of the approximated functions to the scala code, by using the interfacing tool. The final implementation generated will be effective to perform the backward analysis of a javascript code to verify its vulnerability. This can make the server end service provider be prepared of the situation and reject the vulnerable requests to avoid code injection (Shahriar 2011).

Tools and Implementation Overview

The library has function implementation for generating a DFA to accept strings, these strings are defined in the function specification as a parameter. There are functions for generating an automata from another automata. Functions to describe automata transformations and many more. The base definition and data-structure for the constructing the DFA is derived from MONA library.The Mona Library provides the data-structures for the DFA construction and primary modeling with DFA based operations. The MONA library is linked along with the stranger library during the execution of the code. The linking process will bring together both the tool libraries and operational specifics of both MONA and LibStranger to work in-line with each other. This linking pose a challenge during our interfacing step, which will require us to write java wrapper classes for every function involving in the working of the code (when JNI is used). Luckily, this can be eliminated using the SWIG. The future work of this implementation will describe the details of the interfacing of the LibStrnager tool with the scala code.

LibStranger

The stranger.h file in LibStranger library provides the list of all the functions available at our disposal. The letters are represented in a 8-bit ascii format with 1-bit reserved for replacement. The basic string operations can be performed by constructing DFA. There are specific functions that are required for the complete implementation of the scala code which will be discussed in the following sections. These functions can be utilized for performing a backward analysis of the javascript under consideration. an example scenario for the backward analysis would be considering a <script>

Functions under consideration

CharAt():

Returns the character at the “n” position within the string.
//charAt(x)
var myString = 'jQuery FTW!!!';
console.log(myString.charAt(7));
//output: F
This has no precise module in the LibStranger application, however necessary study is undertaken to find an equivalent functionality available. While dealing with the DFA and automata which generalizes the concept of the having a regular expression for the construction, it is difficult to extract a particularly specific value of a generalized solution.

CharCodeAt():

Returns the Unicode value of the character at position “n” within the string.
//charAt(position)
var message="jquery4u"
//alerts "q"
alert(message.charAt(1))
This again expects us to provide a specific extraction of a regular expression operating on an automata. Study reveals that this kind of operation is considered ineffective when we have a generalized domain to operate with . However, this kind of function is necessary in analysing a java script code as this forms an elemental component of the need.

Concat():

Combines one or more strings (arguments v1, v2 etc) into the existing one and returns the combined string. Original string is not modified.
//concat(v1, v2,..)
var message="Sam"
var final=message.concat(" is a"," hopeless romantic.")
//alerts "Sam is a hopeless romantic."
alert(final)
The solution offered by the LibStranger tool for the concatenation is dfa_concat .This returns a DFA that will accept the concatenated value of the provided DFA, the arguments for this function are 2 DFA’s the var, which denotes the number of bits we use to represent our ASCII format, that we are making use of an UTF-8 format.
DFA *dfa_concat(DFA *M1, DFA *M2, int var, int *indices);

FromCharCode():

Returns a string created by using the specified sequence of Unicode values (arguments n1, n2 etc). Method of String object, not String instance. For example: String.fromCharCode().
//fromCharCode(c1, c2,...)
console.log(String.fromCharCode(97,99,98,120,121,122))
//output: abcxyz
console.log(String.fromCharCode(72,69,76,76,79))
//output: HELLO
This again expects us to provide a specific extraction of a regular expression operating on an automata. Study reveals that this kind of operation is considered ineffective when we have a generalized domain to operate with . However, this kind of function is necessary in analysing a java script code as this forms an elemental component of the need.

Indexof():

Searches and (if found) returns the index number of the searched character or substring within the string. If not found, -1 is returned. “Start” is an optional argument specifying the position within string to begin the search. Default is 0.
//indexOf(char/substring)
var sentence="Hi, my name is Sam!"
if (sentence.indexOf("Sam")!=-1)
alert("Sam is in there!")
This again expects us to provide a specific extraction of a regular expression operating on an automata. Study reveals that this kind of operation is considered ineffective when we have a generalized domain to operate with . However, this kind of function is necessary in analysing a java script code as this forms an elemental component of the need.

LastIndexof():

Searches and (if found) returns the index number of the searched character or substring within the string. Searches the string from end to beginning. If not found, -1 is returned. “Start” is an optional argum