ROUGH DRAFT authorea.com/92843
Main Data History
Export
Show Index Toggle 0 comments
  •  Quick Edit
  • Automata Based String Analysis in JSAI

    Abstract:The Automata Based String Analysis in JSAI revolves around the idea of interfacing the Lib Stranger tool with the JSAI implementation. LibStranger stands for STRing AutomatoN GEneratoR Library (Li 2012). The LibStranger tool implemented in C language. LibStranger is an Automata-Based Symbolic String Analysis Library. You can use LibStranger to solve string constraints and/or compute pre and post-images of string manipulation operations such as concatenation and replacement. It can handle complex regular-expression based replace operations such as PHP’s preg replace and approximate these operations in the presence of unbounded loops with high precision and smooth performance. In addition, LibStranger provides fast and precise modeling for common string functions such as trim, substring, toUpperCase and toLowerCase and complex sanitization functions such as PHP’s addslashes and html specialchars. The Lib Stranger is built on top of the MONA (Klarlund 2001, M 2001, Brabrand 2000, M 2000, Klarlund 2000, Elgaard 2000). MONA is a tool that translates formulas to finite-state automata. The formulas may express search patterns, temporal properties of reactive systems, parse tree constraints, etc. MONA analyses the automaton resulting from the compilation and prints out “valid” or a counter-example. MONA implements decision procedures for the Weak Second-order Theory of One or Two successors (WS1S/WS2S). The theory of one successor, known as WS1S, is a fragment of arithmetic augmented with second-order quantification over finite sets of natural numbers. Its first-order terms denote just natural numbers. The theory has no addition, since that would make it undecidable, but it has a unary operation +1, known as the successor function. WS2S is a generalization to tree structures. Since the theories are monadic second-order logics, we call our tool MONA. We interface the already available functionality of the tool with their specifications to do a backward string analysis of the java-script code to make it void of any code injection.

    Introduction

    Web applications are getting more an more popular these days with the advent of even replacing many of the existing desk top applications. With the development of the web-based solution it is notorious to have many hackers to get in-between the user and the system and exploit the vulnerabilities of the application. The Common Vulnerabilities Exposure (CVE) generally documents the top vulnerabilities that a web-application is exposed to, they have listed that the vulnerabilities has increased from 2 percent in 2001 to 45 percent in 2007 and is till increasing on a large scale, when the web-appliications are invloved. There were 3 main categories of vulnerabilities, HTML injection, Database code injection and Malicious File Execution. The HTML injection is that when the user injects a malicious content as a part of a user defined field. The database injection happens when the hacker injects a query along with a form data so as to introduce credentials of the hacker, which can in future aid him to login and make use of the functios provided. The Malicious File Execution is when the developer executes a file without validating its credentials, this will introduce bug inn the hosted platform there by crashing the entire system and teh data associated with the application. This was the list of vulnerabilities early in 2001, however the first two categories of the vulnerabilites are still in existince and proves threatful to the web application hosting platform. LibStranger tool is developed with a view to sliminate these kind of vulnerabilities existing in the web application development. The LibStranger implemented in C provided a rich set of functionalities for functioning with string automaton. The main focus on performing a backward analysis can be demonstrated with the help of the implementation available in these library. Complex sanitization functions available in the library aids in a concrete realization of the expected concept. The functions that forms the bridge for our analysis is discussed in the sections 4-5. Section 2-3 generalizes the specialities of LibStranger tool and focuses on the utility of that over our need, It also includes utilizing JNI (Java Native Interface) and SWIG (Smplified Wrapper Interface Generator). The main Implemetation of this existing module invloves identification of the appropriate functions for interfacing. The end result is interfacing of the approximated functions to the scala code, by using the interfacing tool. The final implementation generated will be effective to perform the backward analysis of a javascript code to verify its vulnerability. This can make the server end service provider be prepared of the situation and reject the vulnerable requests to avoid code injection (Shahriar 2011).

    Tools and Implementation Overview

    The library has function implementation for generating a DFA to accept strings, these strings are defined in the function specification as a parameter. There are functions for generating an automata from another automata. Functions to describe automata transformations and many more. The base definition and data-structure for the constructing the DFA is derived from MONA library.The Mona Library provides the data-structures for the DFA construction and primary modeling with DFA based operations. The MONA library is linked along with the stranger library during the execution of the code. The linking process will bring together both the tool libraries and operational specifics of both MONA and LibStranger to work in-line with each other. This linking pose a challenge during our interfacing step, which will require us to write java wrapper classes for every function involving in the working of the code (when JNI is used). Luckily, this can be eliminated using the SWIG. The future work of this implementation will describe the details of the interfacing of the LibStrnager tool with the scala code.

    LibStranger

    The stranger.h file in LibStranger library provides the list of all the functions available at our disposal. The letters are represented in a 8-bit ascii format with 1-bit reserved for replacement. The basic string operations can be performed by constructing DFA. There are specific functions that are required for the complete implementation of the scala code which will be discussed in the following sections. These functions can be utilized for performing a backward analysis of the javascript under consideration. an example scenario for the backward analysis would be considering a <script>

    Functions under consideration

    CharAt():

    Returns the character at the “n” position within the string.
    aureplacedverbaa
    aureplacedverbaaa
    aureplacedverbaaaa
    aureplacedverbaaaaa
    This has no precise module in the LibStranger application, however necessary study is undertaken to find an equivalent functionality available. While dealing with the DFA and automata which generalizes the concept of the having a regular expression for the construction, it is difficult to extract a particularly specific value of a generalized solution.

    CharCodeAt():

    Returns the Unicode value of the character at position “n” within the string.
    aureplacedverbaaaaaa
    aureplacedverbaaaaaaa
    aureplacedverbaaaaaaaa
    aureplacedverbaaaaaaaaa
    This again expects us to provide a specific extraction of a regular expression operating on an automata. Study reveals that this kind of operation is considered ineffective when we have a generalized domain to operate with . However, this kind of function is necessary in analysing a java script code as this forms an elemental component of the need.

    Concat():

    Combines one or more strings (arguments v1, v2 etc) into the existing one and returns the combined string. Original string is not modified.
    aureplacedverbaaaaaaaaaa
    aureplacedverbaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaa
    The solution offered by the LibStranger tool for the concatenation is aureplacedverbaaaaaaaaaaaaaaa .This returns a DFA that will accept the concatenated value of the provided DFA, the arguments for this function are 2 DFA’s the var, which denotes the number of bits we use to represent our ASCII format, that we are making use of an UTF-8 format.
    aureplacedverbaaaaaaaaaaaaaaaa

    FromCharCode():

    Returns a string created by using the specified sequence of Unicode values (arguments n1, n2 etc). Method of String object, not String instance. For example: String.fromCharCode().
    aureplacedverbaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaa
    This again expects us to provide a specific extraction of a regular expression operating on an automata. Study reveals that this kind of operation is considered ineffective when we have a generalized domain to operate with . However, this kind of function is necessary in analysing a java script code as this forms an elemental component of the need.

    Indexof():

    Searches and (if found) returns the index number of the searched character or substring within the string. If not found, -1 is returned. “Start” is an optional argument specifying the position within string to begin the search. Default is 0.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaa
    This again expects us to provide a specific extraction of a regular expression operating on an automata. Study reveals that this kind of operation is considered ineffective when we have a generalized domain to operate with . However, this kind of function is necessary in analysing a java script code as this forms an elemental component of the need.

    LastIndexof():

    Searches and (if found) returns the index number of the searched character or substring within the string. Searches the string from end to beginning. If not found, -1 is returned. “Start” is an optional argument specifying the position within string to begin the search. Default is string.length-1.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    This again expects us to provide a specific extraction of a regular expression operating on an automata. Study reveals that this kind of operation is considered ineffective when we have a generalized domain to operate with . However, this kind of function is necessary in analysing a java script code as this forms an elemental component of the need.

    Match():

    Executes a search for a match within a string based on a regular expression. It returns an array of information or null if no match is found.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    The solution offered by the LibStranger tool is aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa . This function performs the way we expect it in matching the components of the code with the essential part of the DFA. The input argument of the code is the DFA in which the match is to be made and the string to be matched for it’s containment, the last two arguments are 8 and the indices parameter.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

    Replace():

    Searches and replaces the regular expression (or sub string) portion (match) with the replaced text instead.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    The solution offered by the LibStranger tool is aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa . This function performs the way we expect with the components and the essential part of the DFA. The input argument of the code is the DFA in which the match is to be made and the string to be matched for it’s containment, the last two arguments are 8 and the indices parameter.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

    Search():

    Tests for a match in a string. It returns the index of the match, or -1 if not found.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    The solution offered by the LibStranger tool is aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa . This function performs the way we expect it in matching the components of the code with the essential part of the DFA. The input argument of the code is the DFA in which the match is to be made and the string to be matched for it’s containment, the last two arguments are 8 and the indices parameter.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

    Slice():

    Returns a substring of the string based on the “start” and “end” index arguments, NOT including the “end” index itself. “End” is optional, and if none is specified, the slice includes all characters from “start” to end of string.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    There is no specific solution, however we can use normal C string operations do the necessary functionality and then to construct DFA’s with the processed values. This will be of help in an adverse situation when we want to generalise the solution.

    Split():

    Splits a string into many according to the specified delimiter, and returns an array containing each element. The optional “limit” is an integer that lets you specify the maximum number of elements to return.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    There is no specific solution, however we can use normal C string operations do the necessary functionality and then to construct DFA’s with the processed values. This will be of help in an adverse situation when we want to generalise the solution.

    SubString():

    Returns the characters in a string beginning at “start” and through the specified number of characters, “length”. “Length” is optional, and if omitted, up to the end of the string is assumed.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

    Returns the characters in a string between “from” and “to” indexes, NOT including “to” inself. “To” is optional, and if omitted, up to the end of the string is assumed.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    There is no specific solution, however we can use normal C string operations do the necessary functionality and then to construct DFA’s with the processed values. This will be of help in an adverse situation when we want to generalise the solution.

    Tolowercase():

    Returns the string with all of its characters converted to lowercase.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    The solution offered by the LibStranger tool is aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa . This function performs the way we expect it in matching the components of the code with the essential part of the DFA. The input argument of the code is the DFA in which the match is to be made and the string to be matched for it’s containment, the last two arguments are 8 and the indices parameter.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

    Touppercase():

    Returns the string with all of its characters converted to uppercase.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    The solution offered by the LibStranger tool is aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa . This function performs the way we expect it in matching the components of the code with the essential part of the DFA. The input argument of the code is the DFA in which the match is to be made and the string to be matched for it’s containment, the last two arguments are 8 and the indices parameter.
    aureplacedverbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

    Fundamental Concept

    The fundamental concept behind the functioning of the entire system is based on the evaluation of the existing field entries in an application. The error pattern is recoginzed and the application entries are combined wit the error patterns. This is then evaluated for the intersection of the error pattern , if the intersection is evaluated to true then the proper steps are taken for sanitization of the operation. Sanitization can be done in 2 possible ways one is by match and block and the other one is by match and sanitize. The first category of match and block takes place for the most vulnerable category of input fields where the threat can be maximum intense. This way of sanitization will negotiate the application to cease it’s function when the particular category of input vulnerability is spotted. Te other kind of sanitization takes place when the attack pattern involved is responsible for a lower impact pattern and the avoidance can be done by merely removing the occuring attack pattern from the field involved. This occurence of the attack pattern is usually reported for the learning module which can enrich the data set by adding new pattern to the existing attack patters it has.

    Conclusion

    This work is continued to deliver fruitful result in making use of the automata structure to provide the best possible use of the system to configure the scala code to perform at it’s best in analsing the javascript application code.The SWIG which is the simple wrapper class Interface Generator is the one that will be used in the step to follow to complete the integration of the existing tool with that of the scala code which is more of the interfacing task.