How to extract and use constraints from source code?

Most static analysis techniques over source code start with constructing an Abstract Syntax Tree (AST), or constructing call graphs.  Eclipse JDT has tools necessary to do these tasks. It is straightforward to traverse through the AST to collect information about the lines, methods, loops, conditions, variables, etc., used in the source code. Here’s a simple example where you can just insert your code whenever Eclipse JDT finds a method declaration node in the AST. If this code template looks unfamiliar to you, you must read more about Visitor pattern (design patterns). A full example is available at the Vogella site.
public class MethodVisitor extends ASTVisitor {
    List methods = new ArrayList();

    @Override
    public boolean visit(MethodDeclaration node) {
        ...your code here...
    }
    
}
Features thus extracted from source code can be used for a variety of purposes. Here, we discuss the process of converting them to First Order Logic (FOL) constraints. For example, consider we extracted a predicate IdentifierLength(id, l) where id is the identifier (consider an identifier as a fancy term for the name of a variable) and l is the number of characters in the identifier. With such predicates and the standard logical operators, it is trivial to define FOL constraints.
Satisfiability Modulo Theories (SMT) problem is in the intersection of computer science and mathematics which deals with such FOL formulae. Visualize an SMT instance as an FOL formula, and our problem at hand is to find whether such a set of formulae is satisfiable. Although SMT provides a much richer set of tools to model decision problems, to keep things simple, let us discuss about boolean satisfiability problems (SAT). All we need to do is create boolean SAT instances and feed them to an SMT solver.
There are many SMT solvers readily available. Z3 is probably a heavily used SMT solver. There are even simpler solvers built on top of Z3 such as Boogie. The input to Z3 is a simple script where you specify the FOL and the predicates. For example,
(assert (> IdentifierLength 10))
specifies that for this FOL to be true the IdentifierLength must be greater than 10.  Let another formula be
(assert (< IdentifierCaseChanges 2))
which means that the variable IdentifierCaseChanges should be less than 2. Of course, these IdentifierLength and IdentifierCaseChanges are what we define as functions with information extracted from source code. I talk about source code here. You may apply the same idea over text as well. Once you have the predicates, just apply
(check-sat)
and Z3 will find if there is at least one interpretation such that all asserted formulae are true. Full tutorial is available here.
So, next time, when you are hunting for a class project, try this out! It should work for courses like Artificial Intelligence, Intelligent Systems, Program Analysis, Information Retrieval, Software Engineering, and so on! Of course, talk to your instructor though 🙂
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s