Most static analysis techniques over source code start with constructing an Abstract Syntax Tree (AST), or constructing call graphs. Eclipse JDT has tools necessary to do these tasks. It is straightforward to traverse through the AST to collect information about the lines, methods, loops, conditions, variables, etc., used in the source code. Here’s a simple example where you can just insert your code whenever Eclipse JDT finds a method declaration node in the AST. If this code template looks unfamiliar to you, you must read more about Visitor pattern (design patterns). A full example is available at the Vogella site.

`public class MethodVisitor extends ASTVisitor { List methods = new ArrayList(); @Override public boolean visit(MethodDeclaration node) { ...your code here... } }`

Features thus extracted from source code can be used for a variety of purposes. Here, we discuss the process of converting them to First Order Logic (FOL) constraints. For example, consider we extracted a predicate

**where***IdentifierLength(id, l)***is the identifier (consider an identifier as a fancy term for the name of a variable) and***id***is the number of characters in the identifier. With such predicates and the standard logical operators, it is trivial to define FOL constraints.***l*Satisfiability Modulo Theories (SMT) problem is in the intersection of computer science and mathematics which deals with such FOL formulae. Visualize an SMT instance as an FOL formula, and our problem at hand is to find whether such a set of formulae is satisfiable. Although SMT provides a much richer set of tools to model decision problems, to keep things simple, let us discuss about boolean satisfiability problems (SAT). All we need to do is create boolean SAT instances and feed them to an SMT solver.

There are many SMT solvers readily available. Z3 is probably a heavily used SMT solver. There are even simpler solvers built on top of Z3 such as Boogie. The input to Z3 is a simple script where you specify the FOL and the predicates. For example,

`(assert (> IdentifierLength 10))`

specifies that for this FOL to be true the

*must be greater than 10. Let another formula be***IdentifierLength**`(assert (< IdentifierCaseChanges 2))`

which means that the variable

*should be less than 2. Of course, these***IdentifierCaseChanges***and***IdentifierLength***are what we define as functions with information extracted from source code. I talk about source code here. You may apply the same idea over text as well. Once you have the predicates, just apply***IdentifierCaseChanges**`(check-sat)`

and Z3 will find if there is at least one interpretation such that all asserted formulae are true. Full tutorial is available here.

So, next time, when you are hunting for a class project, try this out! It should work for courses like Artificial Intelligence, Intelligent Systems, Program Analysis, Information Retrieval, Software Engineering, and so on! Of course, talk to your instructor though 🙂

Advertisements