
Protect Your Java Code — Through Obfuscators And Beyond

Last update: 09-Jul-2014

By Dmitry LESKOV

Reverse engineering of your proprietary applications by unfair competition or malicious hackers may result in highly undesirable exposure of your algorithms and ideas, proprietary data formats, licensing and security mechanisms, and, most importantly, your customers' data. Here is why Java is particularly weak in this respect compared to C++:

Target Instruction Set

C++: Compiles to a low-level instruction set that operates on raw binary data and is specific to the target hardware, such as x86, ARM, or PowerPC.

Java: Compiles to a higher-level portable bytecode that operates on typed values: objects, primitive types, and arrays thereof.

Compiler Optimizations

C++: Numerous code optimizations are performed at compile time. Inline substitution results in copies of the given (member) function being scattered around the binary image; use of the preprocessor combined with compile-time evaluation of expressions may leave no trace of the constants defined in the source code; and so on.

Java: Relies on dynamic (Just-In-Time) compilation for performance improvement. The standard javac compiler is straightforward: it performs virtually none of the compile-time optimizations commonly found in C++ compilers. The idea is to let the JIT compiler perform all optimizations at run time, taking the execution profile into account.

Linkage

C++: Programs are statically linked, and metaprogramming facilities (reflection) are absent in the core language. So the names of classes, members, and variables need not be present in the compiled and linked program, except for names exported from dynamic libraries (DLLs/shared objects.)

Java: Dependencies are resolved at run time, when classes are loaded. So the name of the class and names of its methods and fields must be present in a class file, as well as names of all imported classes, called methods, and accessed fields.

Delivery Format

C++: An application is delivered as a monolithic executable (maybe with a few dynamic libraries), so it is not easy to identify all member functions of a given class or reconstruct the class hierarchy.

Java: An application is typically delivered as a set of jar files, which are just non-encrypted archives containing individual class files.

As a result, decompiling Java programs is a much simpler task than decompiling C++ and can therefore be fully automated. Class hierarchy, high-level statements, names of classes, methods and fields - all this can be retrieved from class files emitted by the standard javac compiler. Anyone with ordinary programming skills can download a Java decompiler, run your application through it, and read the source code almost as if it were open source.

Let's see what can be done to prevent that.

Bytecode Encryption - Straightforward But Totally Flawed

The solution that first comes to mind is to encrypt the class files. Unfortunately, this approach is fundamentally flawed, because the JVM cannot load and execute encrypted classes, period. A class must be decrypted before the JVM can load it, and it is fairly easy to intercept the original non-encrypted bytecode at that point. This technique is described in detail in [1] and [2].

The java.lang.instrument API provides hackers with yet another way to circumvent class file encryption mechanisms.
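To illustrate the java.lang.instrument route, here is a minimal sketch of a Java agent that writes every class the JVM defines to disk. The class name DumpAgent, the dump() helper, and the "dumped-classes" directory are hypothetical names chosen for this example. The sketch assumes the protected application does not block agents, which some tools attempt to do.

```java
import java.io.IOException;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.ProtectionDomain;

// Hypothetical agent; all names here are illustrative, not taken from any real tool.
public class DumpAgent {

    // Writes one class to <dir>/<internal/class/name>.class and returns the file path.
    static Path dump(Path dir, String className, byte[] bytecode) throws IOException {
        Path target = dir.resolve(className + ".class");
        Files.createDirectories(target.getParent());
        return Files.write(target, bytecode);
    }

    // Called by the JVM before the application's main() when the agent is attached.
    public static void premain(String agentArgs, Instrumentation inst) {
        final Path dir = Paths.get("dumped-classes");
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                // classfileBuffer holds the bytes handed to the JVM for definition,
                // that is, the bytecode after any custom loader has decrypted it.
                try {
                    if (className != null) {
                        dump(dir, className, classfileBuffer);
                    }
                } catch (IOException e) {
                    // A sketch only: a failed dump is silently skipped.
                }
                return null; // null tells the JVM to keep the class unchanged
            }
        });
    }
}
```

Such an agent would be packaged into a jar whose manifest names it in the Premain-Class attribute and then attached with the -javaagent command-line option.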

Moreover, any protection scheme based on bytecode encryption can be defeated without reverse engineering of the decryption routines. As long as someone has both the encrypted application and the decryption key, they can obtain the original classes fairly easily regardless of how they were encrypted.

Yes, the authors of contemporary bytecode encryptors know about those standard mechanisms and try to disable them. However, as Java is now open source, one may simply download the OpenJDK source code, patch it to dump loaded classes to disk and force the -XX:+CompileTheWorld option.

To top it off, not so long ago a security engineer, frustrated by false claims of vendors whose tools implement bytecode encryption, put together an article [3] showing how easily OpenJDK can be modified to defeat any bytecode encryption scheme.

Two refinements:

  1. Of course, strong encryption does its job when a malicious competitor or hacker gets hold of the encrypted class files only, so it may help reduce exposure of server-side application code running in a controlled environment to some extent. (That is, until the system administrator account gets hacked. :) ) But any code designed to run on third-party systems has to include or be accompanied by the respective decryption key and hence cannot be protected through encryption.
  2. The already oversized heading of this section should have read "Software-only Bytecode Encryption...", because with the help of a small piece of silicon Java bytecode may be encrypted very securely.

    Validy SoftNaOS

    The tradeoff? Performance impact several orders of magnitude deep…

Okay, so software-only bytecode encryption makes little sense, and the use of hardware has its own issues and is not always possible. How about making the bytecode less comprehensible? This is what obfuscation is all about — change the binary so that it produces the same results when run, but is much harder to understand when decompiled.

Name Obfuscation

Name obfuscation is the process of replacing the identifiers you have carefully chosen to your company's coding standards, such as com.mycompany.TradeSystem.Security.checkFingerprint(), with meaningless sequences of characters, e.g. a.a0(). (Of course, a name obfuscator must process the entire application to ensure consistency of name changes across all classes and jars.)

The more advanced obfuscators take this one step further. As you surely know, a Java class may have more than one method with the same name if their signatures are different. Utilizing that fact, an obfuscator can rename setPos(int x, int y) and setColor(int color) to, say, a(int a, int b) and a(int a).
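The overloading trick can be written out in source form. In this sketch the Sprite class and its fields are hypothetical; only the method names come from the paragraph above. A real obfuscator would make the equivalent change directly in the bytecode.

```java
public class OverloadDemo {

    // Before obfuscation (hypothetical class; method names as in the text).
    static class Sprite {
        int x, y, color;
        void setPos(int x, int y) { this.x = x; this.y = y; }
        void setColor(int color) { this.color = color; }
    }

    // After obfuscation: the class, its fields and both methods carry short names.
    // The two methods share the name "a" and differ only in their parameter lists.
    static class a {
        int a, b, c;
        void a(int a, int b) { this.a = a; this.b = b; }
        void a(int a) { this.c = a; }
    }

    public static void main(String[] args) {
        a s = new a();
        s.a(3, 4);      // was setPos(3, 4)
        s.a(0xFF0000);  // was setColor(0xFF0000)
        System.out.println(s.a + "," + s.b + "," + s.c);
    }
}
```

Java keeps separate namespaces for types, fields, and methods, which is why a field, a method, and the class itself may all be called a.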

A nice side effect of name obfuscation is a substantial reduction in class file size, which results in somewhat smaller downloads and faster cold starts of desktop Java applications, and lets your Android smartphone hold more apps and games. But just like any other technique, name obfuscation has its limitations and downsides:

  • You may not obfuscate the names of standard Java API classes that are part of the JRE, so all uses of those classes remain clearly seen in the decompiled code.
  • Entities accessed via reflection or JNI at run time may not be renamed. Problem is, you cannot say for sure whether this particular class or method may be accessed dynamically, especially if it belongs to a third-party library, component, or framework, or to a part of your application that was written by someone else.

    Many frameworks and tools rely heavily on reflection. One notable example is the JavaBeans component architecture and the respective visual programming tools. The EJB specification requires (and the container enforces) specific signatures of the callback methods such as ejbCreate().

  • Names of serializable classes may not be obfuscated. Most obfuscators would automatically exclude classes that implement the java.io.Serializable interface. Similarly, RMI suffixes _Stub and _Skel, and classes that extend java.rmi.Remote must trigger the name obfuscator's exclude mechanism.
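The reflection limitation is easy to reproduce. In the sketch below, the nested classes Security and a stand for the same class before and after renaming. Both are hypothetical, as is the callByName() helper. The lookup string is ordinary data that a name obfuscator cannot reliably recognize as an identifier.

```java
import java.lang.reflect.Method;

public class ReflectionRenameDemo {

    // Original class (hypothetical).
    static class Security {
        public static boolean checkFingerprint() { return true; }
    }

    // The same class as a name obfuscator might leave it (hypothetical).
    static class a {
        public static boolean a0() { return true; }
    }

    // Looks up a static no-argument method by name and invokes it.
    static boolean callByName(Class<?> owner, String methodName) throws Exception {
        Method m = owner.getDeclaredMethod(methodName);
        m.setAccessible(true);
        return (Boolean) m.invoke(null);
    }

    public static void main(String[] args) throws Exception {
        // Works against the original class.
        System.out.println(callByName(Security.class, "checkFingerprint"));
        try {
            // The string literal was not renamed, so the lookup now fails.
            callByName(a.class, "checkFingerprint");
        } catch (NoSuchMethodException e) {
            System.out.println("Lookup failed after renaming: " + e.getMessage());
        }
    }
}
```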

String Encryption

String encryption is another feature commonly found in Java obfuscators. Replacing string literals with calls to a method that decrypts its parameter makes a hacker's life more interesting, but, unfortunately, not much.

Problem is, the strings must be decrypted at run time, so the respective code must be included in the application. Moreover, in most tools string encryption is so straightforward that the hacker does not even need to reverse-engineer that code! All they have to do is write a program that would call the decrypting method(s) for all the strings.
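A toy version of such a scheme makes the weakness visible. The XOR cipher, the key, and all names below are assumptions made for illustration and are not the algorithm of any particular product. The point is only that decrypt() ships with the application, so anyone can call it.

```java
public class StringCryptDemo {

    // Hypothetical fixed key; it has to be embedded in the shipped application.
    private static final int KEY = 0x5A;

    // Applied by the obfuscator at build time to every string literal.
    static String encrypt(String plain) {
        return xor(plain);
    }

    // Inserted into the application and called at run time for each literal.
    static String decrypt(String encrypted) {
        return xor(encrypted);
    }

    // XOR with a constant is its own inverse, so one routine serves both directions.
    private static String xor(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] ^= KEY;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String hidden = encrypt("Add-Some-Salt");
        // An attacker needs no knowledge of the cipher, only the ability to call decrypt().
        System.out.println(decrypt(hidden));
    }
}
```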

Code and Data Flow Obfuscation

Simply put, flow obfuscation is about modifying the program so that it yields the same result when run, but is impossible to decompile into a well-structured Java source and/or is more difficult to understand.

Most code obfuscators would replace instructions produced by a Java compiler with gotos and other instructions that may not be decompiled into valid Java source. A decompiler expecting conventional javac output would either fail or produce pseudocode with lots of labels and goto statements. However, not all decompilers are that dumb.
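Goto-based rewrites cannot be expressed in Java source, but one related flow obfuscation building block can: the opaque predicate, a condition whose outcome is fixed but not obvious to the reader. The sketch below is purely illustrative, with hypothetical class and method names, and shows the idea at source level although real tools work on bytecode.

```java
public class OpaquePredicateDemo {

    // Straightforward version.
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Same result, but the control flow is cluttered with a branch that never runs.
    static int sumObfuscated(int[] values) {
        int total = 0;
        int i = 0;
        while (true) {
            // i*i + i equals i*(i+1), a product of two consecutive integers,
            // so it is always even and this condition is always true.
            if ((i * i + i) % 2 == 0) {
                if (i >= values.length) {
                    break;
                }
                total += values[i];
            } else {
                total -= 31 * i; // dead code that only exists to mislead the reader
            }
            i++;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(sum(data) == sumObfuscated(data));
    }
}
```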

An interesting yet obscure offshoot of the Soot bytecode analysis and optimization framework, developed by the Sable group at McGill University, is the Dava decompiler project [4]. It aims at decompiling Java bytecode produced by any tool, not just the javac compiler, into readable source, so it is effectively an attempt to create a deobfuscator.

(The funny part is that other people in the same group are working on a Java bytecode obfuscator called JBCO. I wonder if they hold an internal "obfuscate-decompile" tournament.)

However, even if you use a code obfuscator that forces all decompilers to fail completely, a bytecode disassembler would still work. Remember that the JVM instruction set consists of higher-level instructions compared to real CPUs such as x86 or ARM, so disassembled Java is easier to understand than disassembled C++. It would therefore make sense to also "distort" the overall structure of the program. The more advanced obfuscation techniques include class hierarchy changes, method inlining and outlining, loop unrolling, array folding/flattening, etc.

Embedding a custom virtual machine into the application and translating the most sensitive methods to its instruction set is perhaps the most effective but at the same time one of the most expensive transformations.

Finally, sophisticated algorithms may also be protected through mathematical transformations. The transformed code computes the same results using different data types. However, such tools are much more expensive and often available only as part of a custom risk management solution. Also, such transformations may easily slow down the code by an order of magnitude or more, so you had better apply them only to the most sensitive pieces of code, provided they are not performance-critical.

Limitations:

  • Overobfuscated classes may not pass stricter bytecode verification anticipated in future JVM implementations.
  • Code flow obfuscation has a negative impact on performance.
  • Field engineering may be difficult.
  • As I mentioned above, the standard Java API classes may not be obfuscated, so references to them would be present in the decompiled/disassembled code. An obfuscator may however be capable of replacing some of the basic Java APIs with custom, obfuscated implementations.

Extra capabilities

A few things to keep in mind when choosing an obfuscator.

Incremental obfuscation

If you plan to issue incremental updates to your obfuscated application, you have to ensure that the names of classes in the new version of your application are consistent with the version originally shipped to end users. When choosing an obfuscator make sure it can reproduce the renames it made during the previous obfuscation session.
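The requirement boils down to seeding the renamer with the mapping saved from the previous session. Here is a minimal sketch of that idea. The IncrementalRenamer class and its "c" + counter naming scheme are assumptions for illustration, and real obfuscators keep the mapping in their own log or mapping file formats.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical renamer that reuses the names assigned in an earlier session.
public class IncrementalRenamer {

    private final Map<String, String> mapping = new LinkedHashMap<>();
    private int next = 0;

    // 'previous' is the original-to-obfuscated mapping saved when the last version shipped.
    IncrementalRenamer(Map<String, String> previous) {
        mapping.putAll(previous);
    }

    String rename(String originalName) {
        String existing = mapping.get(originalName);
        if (existing != null) {
            return existing; // same name as in the version already in the field
        }
        String candidate;
        do {
            candidate = "c" + next++;
        } while (mapping.containsValue(candidate)); // never reuse a name already taken
        mapping.put(originalName, candidate);
        return candidate;
    }

    public static void main(String[] args) {
        Map<String, String> shipped = new LinkedHashMap<>();
        shipped.put("com.mycompany.Foo", "c0");
        IncrementalRenamer renamer = new IncrementalRenamer(shipped);
        System.out.println(renamer.rename("com.mycompany.Foo")); // keeps its old name
        System.out.println(renamer.rename("com.mycompany.Bar")); // gets a fresh one
    }
}
```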

Class file optimizations

Many obfuscators can optionally optimize class files for size by removing the unused methods, fields, and strings, design-time metadata, etc. However, care must be taken when using this feature, because a method or field may also be accessed using JNI or reflection, and it is not possible to reliably detect all such accesses even by analyzing the running program.

Bytecode optimizations supported by some obfuscators include constant expression evaluation, assignment of static and final attributes, inlining of simple methods such as getters and setters, peephole optimizations, and so on. However, the benefits of such optimizations are only substantial in constrained Java ME CLDC environments. The more sophisticated JVMs, such as Sun/Oracle HotSpot, IBM J9, and BEA (now also Oracle) JRockit, would apply these and many other optimizations during JIT compilation. You had better not stand in their way.

Debug info obfuscation

By default, the javac compiler writes source file names and, optionally, line number information (with -g option) to the resulting class files. Those are required to get meaningful stack traces. An obfuscator may remove that information altogether, or change file names to meaningless strings. If you rely on stack traces when resolving customer issues, make sure your obfuscator comes with a reverse mapping utility that can reconstruct the original stack trace with unobfuscated names of classes and source files.
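A reverse mapping utility is conceptually simple. The sketch below rewrites the "at class.method(" part of each stack frame using two lookup tables. The RetraceDemo class, the table layout, and the sample names are hypothetical, and real utilities also have to handle overloaded methods and line numbers, which this sketch ignores.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RetraceDemo {

    // Matches the "at <class>.<method>(" prefix of a stack frame.
    private static final Pattern FRAME =
            Pattern.compile("at ([\\w.$]+)\\.([\\w$<>]+)\\(");

    // classMap:  obfuscated class name            -> original class name
    // methodMap: obfuscated "class.method" string -> original method name
    static String retrace(String trace,
                          Map<String, String> classMap,
                          Map<String, String> methodMap) {
        Matcher m = FRAME.matcher(trace);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String cls = m.group(1);
            String method = m.group(2);
            String originalClass = classMap.getOrDefault(cls, cls);
            String originalMethod = methodMap.getOrDefault(cls + "." + method, method);
            m.appendReplacement(out, Matcher.quoteReplacement(
                    "at " + originalClass + "." + originalMethod + "("));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> classes = new HashMap<>();
        classes.put("a", "com.mycompany.Security");
        Map<String, String> methods = new HashMap<>();
        methods.put("a.a0", "checkFingerprint");
        System.out.println(retrace("\tat a.a0(Unknown Source)", classes, methods));
    }
}
```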

Note also that certain third-party libraries and frameworks require stack trace information to function properly. One example is Apache log4j.

Watermarking

Some obfuscators may embed a hidden customer or distributor ID into your class files, just like in digital media, enabling you to track down software pirates.

Source code obfuscation

Suppose your proprietary Java source triggers an annoying bug in your favorite IDE, and you have decided to reduce your source code to a test case. Before sending it to the IDE vendor, you may wish to run it through a source code obfuscator in order to replace identifiers with nonsense and remove comments.

Strengthening Protection with Ahead-Of-Time Compilation

As you may see, all three main approaches to Java code obfuscation have certain drawbacks and limitations, and don't solve the fundamental problems listed in the introduction. Fortunately, there exists a class of tools originally developed with the goal of improving performance of Java applications. These tools are Ahead-Of-Time native code compilers, which take your jars and classes as input, compile them to optimized native code, and produce a conventional executable.

Remember the C++ to Java comparison at the top of the article? Most statements from the C++ column apply to AOT-compiled Java:

  • native, low-level instruction set
  • highly optimized code
  • static linking into a monolithic executable

This naturally leads us to the idea of a two-step approach to Java code protection:

  1. Obfuscate names and encrypt strings using the tools not relying on the application being delivered in bytecode form. Make sure to disable control/data flow obfuscations.
  2. Compile the obfuscated application down to optimized native code.

Refer to my other article for more information about AOT compilers.

Popular Obfuscators

An Internet search for "Java obfuscator" would return way too many results. For your convenience, I have reduced the list to a handful of actively maintained products, both commercial and free.

Product             License       Price
Allatori            Proprietary   $
Dash-O-Pro          Proprietary   $$$?
GuardIT for Java    Proprietary   $$$?
Proguard            Open Source   -
Stringer            Proprietary   $
yGuard              Freeware      -
Zelix Klassmaster   Proprietary   $

There are two products that stand out of the above crowd:

Stringer does not only encrypt strings as its name might suggest. It can also encrypt resource files — images, media clips, etc. — found in your jar files, hide (standard) library calls, and inject integrity checks in your application, making sure that only the original, unmodified versions of your class files can execute properly.

GuardIT for Java does code/name obfuscation and string encryption, but then goes above and beyond to wrap your application code into an active protection system capable of detecting tampering in real-time. These advanced techniques operate on bytecode thus making GuardIT for Java non-interoperable with AOT compilers.

If you need to obfuscate your Java source, Semantic Designs' Thicket™ family of source code formatters includes a Java formatter with obfuscation capability.

Drop me an email if you know of a tool worth adding to this list.

As I have already mentioned above, the Sable group at McGill University has among its research projects a Java bytecode obfuscator called JBCO. It is not usable commercially, and will likely never be, but is worth looking at if you aim at Building a Better Obfuscator.

Further Reading

Books

The links to publishers and stores are not affiliate links.

Despite its title, Decompiling Java by Godfrey Nolan has a chapter on code protection, most of which is in turn devoted to obfuscation.

Update 22-May-2013: Godfrey has recently published a new book, Decompiling Android, which also has a section on protection.

Alex Kalinovsky in his Covert Java: Techniques for Decompiling, Patching, and Reverse Engineering again mostly covers the topics listed in the book title, but has also included a chapter on obfuscation and cracking obfuscated code. By coincidence, that particular chapter is available online, so I have just saved you twenty dollars.

Reversing: Secrets of Reverse Engineering by Eldad Eilam is not Java-specific, so it offers a broader view.

Popular articles

If you want to learn more about code and data flow obfuscation techniques and how they rank against each other in terms of potency, resilience and cost, the three-part series by Sonali Gupta, which appeared in Palisade Magazine in Aug-Oct 2005, would make a good start:

Research publications

A group of Rowan University researchers led by Prof. Ravi P. Ramachandran studied commercial Java obfuscators. Their findings are summarized in two articles:

If you want some theory and controversy, the paper "On the (Im)possibility of Obfuscating Programs" by Boaz Barak et al, found in the 21st Annual International Cryptology Conference Proceedings, proves that obfuscation is impossible. It sparked quite some discussion and confusion, so Boaz later wrote an essay explaining what that result really means, in his opinion.

For more information about software protection in general, refer to Watermarking, Tamper-Proofing, and Obfuscation - Tools for Software Protection by C. Collberg and C. Thomborson, or the more recent Revisiting Software Protection by P.C. van Oorschot of Carleton University, Canada. Dozens of works listed in their References sections may keep you busy reading for many hours.

Vendor publications

PreEmptive Solutions, the maker of the Dash-O-Pro Java obfuscator, maintains a collection of whitepapers.

References

  1. A. Sundararajan. Retrieving .class files from a running app, 2007.
  2. V. Roubtsov. Cracking Java byte-code encryption, JavaWorld.com, 09-May-2003
  3. I. Kinash. A Gun is a Great Equalizer: OpenJDK Hack vs. Class Encryption, Java.DZone.com, 25-Jun-2012
  4. N. A. Naeem, L. Hendren. Programmer-friendly Decompiled Java, 2006 (PDF)

Obfuscation Examples

Let's consider a fictional application that stores user passwords as SHA digests:

import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

public class Authentication {

    public static byte[] encryptPassword(String password) 
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        String saltedPassword = password + "Add-Some-Salt";
        byte[] digestive = saltedPassword.getBytes("ISO-8859-1");
        MessageDigest md = MessageDigest.getInstance("SHA");
        md.update(digestive);
        return md.digest();
    }

    public static boolean checkPassword(String password, byte[] digest) 
        throws UnsupportedEncodingException, NoSuchAlgorithmException {
        
        if (Arrays.equals(encryptPassword(password),digest)) return true;
        System.out.println("Wrong password");
        return false;
    }
}

Compiling this class with the standard javac compiler and decompiling the resulting class file using one of the freely available decompilers yields:

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class Authentication
{
    public static byte[] encryptPassword(String s)
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        String s1 = (new StringBuilder()).append(s).append("Add-Some-Salt").toString();
        byte abyte0[] = s1.getBytes("ISO-8859-1");
        MessageDigest messagedigest = MessageDigest.getInstance("SHA");
        messagedigest.update(abyte0);
        return messagedigest.digest();
    }

    public static boolean checkPassword(String s, byte abyte0[])
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        if(Arrays.equals(encryptPassword(s), abyte0))
        {
            return true;
        } else
        {
            System.out.println("Wrong password");
            return false;
        }
    }
}

As you may see, the only major differences in the decompiled source code are automatically generated names of parameters and local variables.

Let's run the above sample through a name obfuscator and decompile the resulting class:

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class a
{
    public static byte[] a(String a)
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        String s = (new StringBuilder()).append(a).append("Add-Some-Salt").toString();
        byte abyte0[] = s.getBytes("ISO-8859-1");
        MessageDigest messagedigest = MessageDigest.getInstance("SHA");
        messagedigest.update(abyte0);
        return messagedigest.digest();
    }
    public static boolean a(String a, byte a[])
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        if(Arrays.equals(a(a), a))
        {
            return true;
        } else
        {
            System.out.println("Wrong password");
            return false;
        }
    }
}

Even though the obfuscator has replaced the public identifiers Authentication, encryptPassword() and checkPassword() with a meaningless, overloaded a, it is clear that these methods deal with the Security API and use the SHA algorithm. The salt string is also exposed.

Now, enabling string encryption makes the decompiled code a little bit more, well, cryptic:

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class b
{
    public static byte[] a(String a)
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        String s = (new StringBuilder()).append(a).append(a.a("X|~4Nws|<Ksu!")).toString();
        byte abyte0[] = s.getBytes(a.a("]FX9(-&-1d"));
        MessageDigest messagedigest = MessageDigest.getInstance(a.a("E_\024"));
        messagedigest.update(abyte0);
        return messagedigest.digest();
    }
    public static boolean a(String a, byte a[])
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        if(Arrays.equals(a(a), a))
        {
            return true;
        } else
        {
            System.out.println(a.a("Cgxzw5cuofh{j1"));
            return false;
        }
    }
}

Okay, strings are now encrypted, but the import list is still there and it is perfectly clear that these methods use the Java Security API. So a hacker would still have little doubt over where to look for sensitive code, and they don't even need to reverse the encryption algorithm. All they need to do is extract the calls of the decryption method from the decompiled source:

// crack.java

public class crack {

    public static void main( String[] args ) {
        System.out.println(a.a("X|~4Nws|<Ksu!"));
        System.out.println(a.a("]FX9(-&-1d"));
        System.out.println(a.a("E_\024"));
        System.out.println(a.a("Cgxzw5cuofh{j1"));
    }
}

... and then compile and run the resulting code:

$ javac crack.java
$ java crack
Add-Some-Salt
ISO-8859-1
SHA
Wrong password
$

Voila!

That said, the more advanced tools such as Stringer take serious countermeasures against the above attack.

Let's try code flow obfuscation now. At first sight, there is not much code to obfuscate here, and code and data flows are pretty simple: just a bunch of standard API calls without any loops or exception handling. Indeed, the obfuscator I was using could only make one change after I enabled code obfuscation:

        // Code obfuscation disabled
        String s = (new StringBuilder()).append(a).append(a.a("X|~4Nws|<Ksu!")).toString();
        byte abyte0[] = s.getBytes(a.a("]FX9(-&-1d"));
        // Code obfuscation enabled
        byte abyte0[] = (new StringBuilder()).append(a).append(a.a("X|~4Nws|<Ksu!")).toString().getBytes(a.a("]FX9(-&-1d"));

Perhaps that is just a weakness of the code obfuscation features implemented in a particular product? Indeed, it is possible to make the result of decompilation much less readable with JBCO. But before I move on, a word of caution:

DO NOT TRY THIS AT HOME!

or, more seriously, do not try to use JBCO in a production environment. It is a research project, and as such is aimed at enabling researchers to try their ideas. It is not meant to be scalable, robust, and well documented.

Update 09-Jul-2014: I've upgraded to the latest version of JBCO and found it substantially more stable. I only had to disable one transformation, bb.jbco_dcc (Disobey Constructor Conventions) to avoid JBCO crashes.

I was able to push JBCO to the limits on the original version of Authentication.class (with a main() method containing a couple of unit tests added) using the following command line:

java -Xmx384m ^
  -cp sootclasses-2.5.0.jar;polyglotclasses-1.3.5.jar;jasminclasses-2.5.0.jar;. ^
  soot.jbco.Main ^
  -cp .;"%JAVA_HOME%\lib\rt.jar";"%JAVA_HOME%\lib\jce.jar" ^
  -t:9:wjtp.jbco_cr ^
  -t:9:wjtp.jbco_mr ^
  -t:9:wjtp.jbco_fr ^
  -t:9:wjtp.jbco_bapibm ^
  -t:9:wjtp.jbco_blbc ^
  -t:9:jtp.jbco_gia ^
  -t:9:jtp.jbco_adss ^
  -t:9:jtp.jbco_cae2bo ^
  -t:9:bb.jbco_cb2ji ^
  -t:9:bb.jbco_rds ^
  -t:9:bb.jbco_riitcb ^
  -t:9:bb.jbco_iii ^
  -t:9:bb.jbco_plvb ^
  -t:9:bb.jbco_rlaii ^
  -t:9:bb.jbco_ctbcb ^
  -t:9:bb.jbco_ecvf ^
  -t:9:bb.jbco_ptss ^
  -main-class Authentication ^
  Authentication

Here are the messages emitted by the confused decompiler:

Couldn't fully decompile method main
Couldn't resolve all exception handlers in method main
Couldn't fully decompile method $S5
Couldn't resolve all exception handlers in method $S5
Couldn't fully decompile method I1l
Couldn't resolve all exception handlers in method I1l
Couldn't fully decompile method <clinit>
Couldn't resolve all exception handlers in method <clinit>

And here is the source it produced:

import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class Authentication
{

    public Authentication()
    {
    }

    public static void main(String args[])
    {
_L3:
        args = args;
_L1:
        JVM INSTR pop ;
        return;
        if(!S$5)
        {
            args = $S5(S5$);
            if(!I1l(S5$, args))
                throw new AssertionError();
        }
        JVM INSTR pop ;
        if(!S$5)
        {
            args = $S5(I1l);
            if(I1l($$5S, args))
                throw new AssertionError();
        }
        JVM INSTR pop ;
          goto _L1
        args;
        if(true) goto _L3; else goto _L2
_L2:
        args;
        if((byte)0x2073a663 % 3 == 0)
            break MISSING_BLOCK_LABEL_98;
        null;
        throw ;
        throw args;
    }

    public static byte[] $S5(String s)
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
_L2:
        StringBuilder stringbuilder;
        I1l(stringbuilder, stringbuilder = main($S5));
        stringbuilder = I1l(stringbuilder);
        return stringbuilder;
        stringbuilder = JVM INSTR new #94  <Class StringBuilder>;
        stringbuilder.StringBuilder();
        stringbuilder = I1l(s, stringbuilder);
        stringbuilder = main(I1l($$$S, stringbuilder));
        stringbuilder = main(l1I, stringbuilder);
        if(true) goto _L2; else goto _L1
_L1:
        throw ;
    }

    public static boolean I1l(String s, byte abyte0[])
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        if(main($S5(s), abyte0))
            return SS$5;
        JVM INSTR pop ;
        I1l(I1I, System.out);
        ll1.booleanValue();
        JVM INSTR ifge 39;
           goto _L1 _L2
_L1:
        break MISSING_BLOCK_LABEL_37;
_L2:
        break MISSING_BLOCK_LABEL_39;
        null;
        throw ;
        return ____;
    }

    public static boolean $S5(Class class1)
    {
        return class1.desiredAssertionStatus();
    }

    public static StringBuilder I1l(String s, StringBuilder stringbuilder)
    {
        return stringbuilder.append(s);
    }

    public static String main(StringBuilder stringbuilder)
    {
        return stringbuilder.toString();
    }

    public static byte[] main(String s, String s1)
    {
        return s1.getBytes(s);
    }

    public static MessageDigest main(String s)
    {
        return MessageDigest.getInstance(s);
    }

    public static void I1l(byte abyte0[], MessageDigest messagedigest)
    {
        // Placing a breakpoint here would reveal the salt string
        messagedigest.update(abyte0);
    }

    public static byte[] I1l(MessageDigest messagedigest)
    {
        return messagedigest.digest();
    }

    public static boolean main(byte abyte0[], byte abyte1[])
    {
        return Arrays.equals(abyte0, abyte1);
    }

    public static void I1l(String s, PrintStream printstream)
    {
        printstream.println(s);
    }

    static final boolean S$5;
    public static Boolean ll1;
    public static boolean ___;
    public static int ____;
    public static int SS$5;
    public static String $$$S;
    public static String l1I;
    public static String $S5;
    public static String I1I;
    public static String S5$;
    public static String I1l;
    public static String $$5S = "foo";

    static 
    {
          goto _L1
_L3:
        Boolean boolean1;
        S$5 = boolean1;
        return;
_L1:
        I1l = "bar";
        S5$ = "pazzw0rd";
        I1I = "Wrong password";
        $S5 = "SHA";
        l1I = "ISO-8859-1";
        // JBCO does not encrypt strings
        $$$S = "Add-Some-Salt";
        SS$5 = 1;
        boolean1 = JVM INSTR new #68  <Class Boolean>;
        if((byte)0x3bbe5594 % 3 == 0)
            break MISSING_BLOCK_LABEL_61;
        null;
        throw ;
        boolean1.Boolean(true);
        ll1 = boolean1;
        if(!$S5(Authentication))
        {
            boolean1 = 1;
            continue; /* Loop/switch isn't completed */
        }
        JVM INSTR pop ;
        boolean1 = 0;
        if(true) goto _L3; else goto _L2
_L2:
        throw ;
    }
}

Reverting the above to a piece of Java source resembling the original Authentication.java is a very time-consuming task. But that is not necessarily what an attacker might want to achieve. Running the application under a debugger with a breakpoint set on MessageDigest.update() may reveal enough information about the password encryption scheme used in this fictional app.

Note also that JBCO does not encrypt strings.

Impact of Flow Obfuscation on Performance

You may now wonder how much such extensive transformations affect application performance. So did I, so my next step was to run a well-known benchmark suite through JBCO.

I have selected the SciMark 2.0a benchmark. It measures the performance of numerical computations typically found in scientific and engineering applications. These are the types of applications one may wish to protect against decompilation.

Another good thing about SciMark is that it validates the result of each test, which is useful for checking whether the transformations made by the obfuscator preserved the semantics of the original code. (Strictly speaking, I also had to disable obfuscation of the validation code so that it could serve as 100% proof.)

Update 09-Jul-2014: I've updated to the latest version of JBCO and re-run the tests on the Oracle JRE 7. (JBCO seems to be non-interoperable with Java 8.)

Here is the JBCO command line that I used:

java -Xmx384m ^
  -cp sootclasses-2.5.0.jar;polyglotclasses-1.3.5.jar;jasminclasses-2.5.0.jar;. ^
  soot.jbco.Main ^
  -cp .;scimark2lib.jar;"%JAVA_HOME%\jre\lib\rt.jar";"%JAVA_HOME%\jre\lib\jce.jar" ^
  -t:9:wjtp.jbco_cr -t:9:wjtp.jbco_mr -t:9:wjtp.jbco_fr ^
  -t:9:wjtp.jbco_bapibm -t:9:wjtp.jbco_blbc ^
  -t:9:jtp.jbco_gia -t:9:jtp.jbco_adss -t:9:jtp.jbco_cae2bo ^
  -t:9:bb.jbco_cb2ji -t:9:bb.jbco_rds -t:9:bb.jbco_riitcb ^
  -t:9:bb.jbco_iii -t:9:bb.jbco_plvb -t:9:bb.jbco_rlaii ^
  -t:9:bb.jbco_ctbcb -t:9:bb.jbco_ecvf -t:9:bb.jbco_ptss ^
  -app jnt.scimark2.commandline >log 2>err

SciMark reports measurement results in terms of scores. A higher score is better. The original, non-obfuscated version produced the following output on 64‑bit Oracle HotSpot Server 7 Update 55:

SciMark 2.0a

Composite Score: 800.1861429549165
FFT (1024): 489.95521101714513
SOR (100x100):   779.5125051960619
Monte Carlo : 340.00691152439737
Sparse matmult (N=1000, nz=5000): 809.8362710126169
LU (100x100): 1581.6198160243612
   .  .  .

Compared to that, the obfuscated version crawls:

SciMark 2.0a

Composite Score: 243.4813684046077
FFT (1024): 126.6880218231721
SOR (100x100):   549.6164050131032
Monte Carlo : 54.273241127590865
Sparse matmult (N=1000, nz=5000): 286.80964716134747
LU (100x100): 200.01952689782485
   .  .  .

The slowdown ranges from 1.4x for the SOR test to 7.9x for LU. The composite score is 3.3x lower for the obfuscated version!
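
If you want to check the arithmetic, the slowdown factor is simply the original score divided by the obfuscated one. The small sketch below recomputes it for every line of the two reports above; the class and method names are mine, and the scores are copied from the output verbatim.

```java
import java.util.Locale;

public class Slowdown {
    // Slowdown factor: how many times lower the obfuscated score is.
    public static double factor(double original, double obfuscated) {
        return original / obfuscated;
    }

    public static void main(String[] args) {
        String[] names = {"Composite", "FFT", "SOR", "Monte Carlo", "Sparse matmult", "LU"};
        double[] original = {800.1861429549165, 489.95521101714513, 779.5125051960619,
                             340.00691152439737, 809.8362710126169, 1581.6198160243612};
        double[] obfuscated = {243.4813684046077, 126.6880218231721, 549.6164050131032,
                               54.273241127590865, 286.80964716134747, 200.01952689782485};
        for (int i = 0; i < names.length; i++) {
            // Locale.ROOT keeps the decimal point regardless of system locale.
            System.out.printf(Locale.ROOT, "%-15s %.1fx%n",
                              names[i], factor(original[i], obfuscated[i]));
        }
    }
}
```

The three tests not quoted above land in between: roughly 3.9x for FFT, 6.3x for Monte Carlo, and 2.8x for sparse matrix multiplication.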

In fact, this is a huge improvement over J2SE 5.0, on which I had run the tests for the original version of this article. Back then, the slowdown factors ranged from 3.3x for the Monte Carlo test to over 30x for SOR and LU, and the composite score was 22.8x worse for the obfuscated version.

On the one hand, this means you have to be careful when obfuscating performance-sensitive code. On the other hand, if flow obfuscation has little or no impact on performance, it may be an indicator of obfuscation weakness. Whatever an optimizing compiler such as HotSpot can figure out can also be figured out by a person of ordinary programming skills, especially if equipped with something like Understand for Java.
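
As a small illustration of that last point, take the condition `(byte)0x3bbe5594 % 3 == 0` from the decompiled static initializer shown earlier. Reading the decompiler's rendering as ordinary Java (an assumption on my part, and the class name below is made up), it is a constant expression that anyone can evaluate by hand or with a three-line program:

```java
public class OpaquePredicate {
    // The condition copied from the decompiled listing. The cast binds
    // tighter than %, so this is ((byte) 0x3bbe5594) % 3 == 0.
    public static boolean predicate() {
        return (byte) 0x3bbe5594 % 3 == 0;
    }

    public static void main(String[] args) {
        // Narrowing to byte keeps only the low 8 bits, 0x94, which is
        // -108 as a signed byte; -108 is divisible by 3.
        System.out.println((byte) 0x3bbe5594); // prints -108
        System.out.println(predicate());       // prints true
    }
}
```

The branch is therefore always taken and the `throw` behind it is dead code. An optimizer can fold it away, and so can a patient reader.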

All that being said, I claim again that ahead-of-time compilation to native code is way better than flow obfuscation, and invite you to try Excelsior JET, a Java SE 7 compliant JVM with an AOT compiler that my company makes. What else would you have expected to find at the end of an article on a vendor site, anyway?

An open-source alternative to Excelsior JET is the GNU Compiler for Java (GCJ). GCJ supports more platforms, but is way behind in terms of standards compliance, and has seen little activity since the OpenJDK announcement. Please refer to my other article for a head-to-head comparison.

Let me reiterate that AOT compilers are interoperable with Java code protection tools that do not rely on the protected application remaining in bytecode form. For maximum protection, use such tools to obfuscate class/field/method names and encrypt strings before native compilation.

I update this article regularly, so if you have any comments or questions, or know of any resource/tool URLs which I should have added, please send them to me.

© 1999-2014 Excelsior LLC. All Rights Reserved.