Getting started with Seal

Introduction

Seal is a research prototype that can infer the potential side-effects of methods in a C# program. The side-effects of a method are the set of heap locations in the program state right before the invocation of the method (referred to as prestate) that are modified (or written) by the method. Intuitively, these are the locations on which the method has an effect and hence can be expected to change after the method invocation. A method with no side-effects is said to be pure. Being a sound static analysis, Seal over-approximates the side-effects of a method but will never miss a true side-effect. However, there are few scenarios in which seal could be unsound (i.e, miss a real side-effect) which are discussed at the end of the tutorial. In the current state, Seal supports many sophisticated (and higher-order) C# features such as LINQ, delegates, event-handler, exceptions and so on.

The theory and algorithms behind are discussed in detail in the following technical papers: Purity Analysis : An Abstract Interpretation Formulation, Modular Heap Analysis For Higher Order Programs

Setting up Seal

To set up seal for the first time in your system follow the instructions given in the READEME.txt file that is distributed with the source code.

The setup script  creates and stores summaries for (parts of) the 3 core .NET libraries namely, mscorlib.dll, system.dll and system.core.dll. These are the only libraries that are implicitly linked to the inputs during the analysis. Other libraries (.NET or user libraries) that may be required by the analysis have to be provided along with the input (see the tutorial on inputs to Seal).

Running Seal (see this tutorial also)

Seal takes as input DLL(s) obtained by compiling C# code bases. The main executable that starts the analysis is called "Checker.exe" and can be found in the "$SEALHOME\Bin" directory.  E.g:  To analyze a dll named input.dll stored in the directory "C:\" use the following command:

$SEALHOME\Bin\Checker.exe /in C:\input.dll

Seal supports analysis of multiple DLLs. Moreover, It also takes as input several parameters for configuring the analysis. To know more about the inputs to Seal, see this tutorial.

Understanding the output

Representation of side-effects

Seal outputs for each method in the input program its side-effects in the form of accesspaths which is a handle to the modified prestate memory location from the variables live at the entry of the method (namely, parameters and static fields).

using System;
class Test {
    Test f;
    String g;
    public void ImpureMethod(Test param){       
        var x = param.f;
        x.g = "Hello";
    }            
}

The above method writes into the g field of the object referred to by the f field of param. In this case, Seal would emit param.f.g as the side-effect of the method. (Try making the param variable a static field !).

In some cases, particularly in the presence of recursive data-structures, a method can modify unbounded number of prestate objects. In such cases, Seal emits only a finite number of accesspaths. For example, try running the following sample program.

using System;
class Test {
    Test f;
    String g;
    public void ImpureMethod(Test param){              
        var x = param.f;
        while (x != null)
        {
            x.g = "Hello";
            x = x.f;
        }        
    }            
}

Handling of method invocations

In general, the side-effects of a method depends on the behavior of the methods called by it. For example, consider the following program in which the methods C and D calls the methods  A and D, respectively. 

using System;
class Test
{
    private int f = 0;
    public void C(Test param)
    {
        param.A().f = 3;
    }
    public void D(Test param)
    {
        param.B().f = 3;
    }
    public Test A()
    {
        return new Test();
    }
    public Test B()
    {
        return this;
    }
}

The method A returns a new object and hence writing its f field will not result in a side-effect whereas writing to the return value of B is basically a write to the parameter object param.

Seal correctly handles such scenarios: it would classify the method C as pure and D as having side-effect param.f

Handling of call-backs

The main issue in dealing with open programs is that it is not possible to determine all the callees of a method  without knowing the calling context (due to the presence of call-backs). Informally, a call-back is an indirect call (which could be a virtual method call, function pointer call or delegate call) whose targets become known only in the context of the callers of the method containing the indirect call. Eg: in the following code, the indirect call param.Equals(temp) is a call-back.

using System;
class Test
{
    //Object::Equals() method is a call-back which can potentially do anything
    public bool ConditionallyPureMethod(Object param)
    {
        var temp = new Test();
        return param.Equals(temp);
    }
}

Note that the actual target of param.Equals(temp) depends on the concrete type of param which would be known only when the callers of the above method are known.

The side-effects of a method having call-backs can be completely inferred only when the targets of the call-backs are known. For such methods, Seal emits all the side-effects inferred for the method independent of the call-back and also all the call-backs made by the method to indicate that the side-effects list is not complete. In some cases, due to the imprecision in the analysis, Seal can report more call-backs than what actually exist (but it would not miss a call-back).

For the above example, Seal would report that the method is conditionally pure and has a call-back param.Equals(temp).

More Examples and Test cases

Many simple examples are available in the test suites distributed with the source code.

Reasons for unsoundness

Sometimes you might find that seal is not able to find the side-effects in your code, it may be because of the following reasons:

  • Concurrency: Seal does not consider the interleavings between the different threads in the program.
  • Reflection: Seal treats reflection unsoundly.
  • Passing fields by reference: C# supports pass by reference mechanism (via the ref or out keyword), Seal, in its current state, does not correctly model the passing of fields by reference (nevertheless it handles most common usage scenarios).  
  • Referencing libraries other than the 3 standard .NET libraries: mscorlib.dll, system.dll and system.core.dll and not specifying them as input

Making Seal scale to large applications:

  • Patch "checker.exe" to use 4G of virtual memory: Unfortunately, Seal is a 32bit application as it depends on Phoenix which is 32bit. So by default it can use only 2GB of memory. However, you can make it use 4GB by using this patch: http://www.ntcore.com/4gb_patch.php. Apply this patch on the "checker.exe" executable before running the tool. But this is the limit and cannot be pushed any further. We will consider porting seal to a better front end in the future.
     
  • Try splitting the application in to multiple DLLs if possible: At present, Seal stores the data (such as method summaries, type informations etc.) of the DLL being analysed completely in memory. However, the information about previously analysed DLLs are serialised to the disk. So Seal has a good chance of scaling better on a bunch of DLLs of the same size compared to a single monolithic DLL. See https://seal.codeplex.com/wikipage?title=Inputs%20to%20Seal for information on analyzing multiple DLLs together.
     
  • Set context-bound to 0 and flow-sensitivity to false in the config file:
    This will make seal imprecise but is the last resort

Last edited Jan 20, 2014 at 8:51 AM by Ravi87, version 14

Comments

No comments yet.