Protect Your Java Code - Through Obfuscators And Beyond

Last update: 19-Feb-2008

Reverse engineering of your proprietary applications by unscrupulous competitors or malicious hackers may result in highly undesirable exposure of your algorithms and ideas, proprietary data formats, licensing and security mechanisms, and, most importantly, your customers' data. Here is why Java is particularly weak in this respect compared to C++:
As a result, decompiling Java programs is a much simpler task than decompiling C++, and may therefore be fully automated. Class hierarchy, high-level statements, names of classes, methods and fields: all of this can be retrieved from class files emitted by the standard javac compiler. Anyone with ordinary programming skills can download a Java decompiler, run your program through it and read the source code almost as if it were open source. Let's see what can be done to prevent that.

Bytecode Encryption - Straightforward But Totally Flawed

The solution that first comes to mind is to encrypt the class files. However, this approach is fundamentally flawed, because the JVM cannot load and execute encrypted classes: they have to be decrypted before being handed to the JVM, and it is fairly easy to intercept the decrypted classes upon load. This technique is described in detail in [1]. Moreover, as Java is now open source, one may simply download OpenJDK and patch the ClassLoader.defineClass() method. For the sake of completeness, I may add that the introduction of the java.lang.instrument API in Java 5 has provided hackers with one more way to circumvent any and all class file encryption mechanisms.

Okay, so encrypting the bytecode makes little sense. How about making it less comprehensible? This is what obfuscation is all about: changing the program so that it produces the same results when run on the JVM, but its decompiled source is substantially harder to understand.

Name Obfuscation

Name obfuscation is the process of replacing the identifiers you have carefully chosen to your company's coding standards, such as com.mycompany.TradeSystem.Security.checkFingerprint(), with meaningless sequences of characters, e.g. a.a0(). The obfuscator must process the entire application to ensure consistency of name changes across all classes and jars. The more advanced obfuscators go one step further.
As you surely know, a Java class may have more than one method with the same name if their signatures are different. Utilizing that fact, an obfuscator can rename setPos(int x, int y) and setColor(int color) to, say, a(int a, int b) and a(int a).
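To make the overloading trick concrete, here is a minimal hand-written sketch of what such an obfuscator might leave behind. The original class (a hypothetical Sprite with fields x, y, color and a describe() method) is an assumption made up for this illustration; only setPos and setColor come from the text above.

```java
// Hand-written illustration of aggressive name obfuscation.
// Assumed original: class Sprite { int x, y, color; setPos(..), setColor(..), describe() }
class a {
    private int a; // was: x
    private int b; // was: y
    private int c; // was: color

    // was: setPos(int x, int y)
    void a(int a, int b) { this.a = a; this.b = b; }

    // was: setColor(int color)
    void a(int a) { this.c = a; }

    // was: describe()
    String a() { return a + "," + b + "/" + c; }
}

class OverloadDemo {
    public static void main(String[] args) {
        a a = new a();   // a variable named a of type a is legal Java
        a.a(3, 4);       // resolves to the two-argument overload (former setPos)
        a.a(255);        // resolves to the one-argument overload (former setColor)
        System.out.println(a.a());
    }
}
```

The compiler and the JVM tell the three methods apart by their signatures, so the program behaves exactly as before, while a human reader has to track argument counts and types to work out which a is which.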
A nice side effect of name obfuscation is a substantial reduction in class file size, which results in somewhat smaller downloads and faster cold starts of desktop Java applications, and lets your cellphone hold more of all those fancy Java ME apps.
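Returning for a moment to class file encryption: the java.lang.instrument route mentioned earlier can be sketched in a few dozen lines. The following is a minimal illustration, not a description of any particular tool. The class name DumpAgent, the default output directory and the jar name in the comment are assumptions; in a real agent jar the class would normally be declared public and named in the Premain-Class manifest attribute.

```java
import java.io.IOException;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.ProtectionDomain;

// Sketch of an agent that saves every class exactly as the JVM receives it,
// i.e. after any custom class loader has already decrypted it.
class DumpAgent implements ClassFileTransformer {
    private final Path outDir;

    DumpAgent(Path outDir) { this.outDir = outDir; }

    // Invoked by the JVM when started with, e.g., -javaagent:dumpagent.jar=<dir>
    // (the jar name is hypothetical).
    public static void premain(String agentArgs, Instrumentation inst) {
        String dir = (agentArgs == null || agentArgs.isEmpty()) ? "dumped-classes" : agentArgs;
        inst.addTransformer(new DumpAgent(Paths.get(dir)));
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className != null) {
            try {
                // className uses slashes, e.g. com/mycompany/Security
                Path target = outDir.resolve(className + ".class");
                Files.createDirectories(target.getParent());
                Files.write(target, classfileBuffer);
            } catch (IOException e) {
                // A sketch: failures to write a dump are silently ignored.
            }
        }
        return null; // null tells the JVM to load the class unchanged
    }
}
```

The dumped files are ordinary class files that can be fed to any decompiler, which is why encryption alone buys so little.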
String Encryption

String encryption is another feature commonly found in Java obfuscators. Replacing string literals with calls to a method that decrypts its parameter makes the hacker's life more interesting, but unfortunately not by much. The problem is that the strings must be decrypted at run time, so the respective code must be included in the application. Moreover, the hacker does not even need to reverse-engineer that code: all he or she has to do is write a program that calls the decrypting method(s) for all the strings.

Code and Data Flow Obfuscation

Simply put, flow obfuscation is about modifying the program so that it yields the same result when run, but is impossible to decompile into well-structured Java source and/or is more difficult to understand. Most code obfuscators replace instructions produced by a Java compiler with gotos and other instructions that cannot be decompiled into valid Java source. A decompiler expecting conventional javac output would either fail or produce pseudocode with lots of labels and goto statements.

However, not all decompilers are that dumb. An interesting yet obscure offshoot of the Soot bytecode analysis and optimization framework, developed by the Sable group at McGill University, is the Dava decompiler project [2]. It aims at decompiling Java bytecode produced by any tool, not just the javac compiler, into readable source, so it is effectively an attempt to create a deobfuscator. (The funny part is that other people in the same group are working on a Java bytecode obfuscator called JBCO. I wonder if they hold an internal "obfuscate-decompile" tournament.)

However, even if you use a code obfuscator that forces all decompilers to fail completely, a bytecode disassembler would still work. Remember that the JVM instruction set includes high-level instructions, as opposed to real CPUs such as x86 or ARM, so disassembled Java is easier to understand than disassembled C++.
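To give a feel for what flow obfuscation does to program structure, here is a small hand-made sketch of one common transformation, control flow flattening, expressed at the source level. Real obfuscators work on bytecode and can emit jumps that have no Java equivalent at all; the method names and state numbers below are made up for this illustration.

```java
class FlowDemo {
    // Straightforward version: sum of 1..n.
    static int sum(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // The same computation after manual flattening: the loop is dissolved
    // into a dispatcher driven by an arbitrary state variable.
    static int sumFlattened(int n) {
        int total = 0, i = 0, state = 2;
        while (true) {
            switch (state) {
                case 2: i = 1; state = 7; break;          // loop initialization
                case 7: state = (i <= n) ? 4 : 9; break;  // loop condition
                case 4: total += i; state = 5; break;     // loop body
                case 5: i++; state = 7; break;            // loop increment
                case 9: return total;                     // loop exit
                default: throw new IllegalStateException();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(10) + " " + sumFlattened(10)); // prints "55 55"
    }
}
```

Both methods return the same value for every n, but the second one no longer contains anything a decompiler could recognize as a for loop. It also hints at the price: every step of the original loop now goes through the dispatcher.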
Since disassembly always remains possible, it makes sense to also "distort" the overall structure of the program. The more advanced obfuscation techniques include class hierarchy changes, method inlining and outlining, loop unrolling, array folding/flattening, etc. Embedding a custom virtual machine into the application and translating the most sensitive methods to its instruction set is perhaps the most effective, but at the same time one of the most expensive, transformations.

Finally, sophisticated algorithms may also be protected through mathematical transformations. The transformed code would compute the same results using different data types. However, such tools are much more expensive and often available only as part of a custom risk management solution. Another point is that such transformations may easily slow down the code by an order of magnitude or more, so you had better apply them only to the most sensitive pieces of code, provided they are not performance-critical.

Limitations:
Extra capabilities

A few things to keep in mind when choosing an obfuscator.

Incremental obfuscation

If you plan to issue updates to your obfuscated application, you have to ensure that the names of classes in the new version of your application are consistent with the version originally shipped to end users. When choosing an obfuscator, make sure it can reproduce the renames made during the previous obfuscation session.

Class file optimizations

Many obfuscators can optionally optimize the class files for size by removing unneeded elements such as unused methods, fields, and strings, design-time metadata, etc. However, care must be taken when using this feature, because a method or field may also be accessed using JNI or reflection, and it is not possible to reliably detect all such accesses even by analyzing the running program.

Bytecode optimizations supported by some obfuscators include constant expression evaluation, assignment of static and final attributes, inlining of simple methods such as getters and setters, peephole optimizations, and so on. However, the benefits of such optimizations are only substantial in constrained Java ME CLDC environments. The more sophisticated JVMs, such as Sun HotSpot, IBM J9, and BEA (now Oracle?) JRockit, would apply these and many other optimizations during JIT compilation. You had better not get in their way.

Debug info obfuscation

By default, the javac compiler writes source file names and line number information to the resulting class files (the -g option adds local variable information as well). Those are required to get meaningful stack traces. An obfuscator may remove that information altogether, or change file names to meaningless strings. If you rely on stack traces when resolving customer issues, make sure your obfuscator comes with a reverse mapping utility that can reconstruct the original stack trace with unobfuscated names of classes and source files.
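The idea behind such a reverse mapping utility can be sketched as follows. This is a simplified illustration, not the format or API of any actual obfuscator: the class Retrace, its methods and the sample names are all assumptions. In particular, it ignores the fact that overloaded obfuscated names may map back to several original methods, which real tools resolve using signatures or line numbers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Restores original class and method names in stack trace lines,
// given the rename map recorded during obfuscation.
class Retrace {
    // Matches frames such as "at a.b(Unknown Source)":
    // group 1 = class name, group 2 = method name.
    private static final Pattern FRAME =
            Pattern.compile("at ([\\w.$]+)\\.([\\w$<>]+)\\(");

    private final Map<String, String> classes = new LinkedHashMap<>();
    private final Map<String, String> methods = new LinkedHashMap<>();

    void mapClass(String obfuscated, String original) {
        classes.put(obfuscated, original);
    }

    // Methods are keyed by "obfuscatedClass.obfuscatedMethod".
    void mapMethod(String obfClass, String obfMethod, String original) {
        methods.put(obfClass + "." + obfMethod, original);
    }

    String restore(String line) {
        Matcher m = FRAME.matcher(line);
        if (!m.find()) {
            return line; // not a stack frame, leave untouched
        }
        String cls = m.group(1);
        String method = m.group(2);
        String origClass = classes.getOrDefault(cls, cls);
        String origMethod = methods.getOrDefault(cls + "." + method, method);
        return line.substring(0, m.start(1)) + origClass + "." + origMethod
                + line.substring(m.end(2));
    }

    public static void main(String[] args) {
        Retrace r = new Retrace();
        r.mapClass("a", "Authentication");
        r.mapMethod("a", "b", "checkPassword");
        System.out.println(r.restore("\tat a.b(Unknown Source)"));
    }
}
```

The same recorded map is what an obfuscator needs for incremental obfuscation: feeding it back in lets the next build reuse the names already shipped to customers.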
Note also that certain third-party libraries and frameworks require stack trace information to function properly. One example is Apache log4j.

Watermarking

Some obfuscators can embed a hidden customer or distributor ID into your class files, just like in digital media, enabling you to track down software pirates.

Source code obfuscation

Suppose your proprietary Java source triggers an annoying bug in your favorite IDE, and you have decided to reduce your source code to a test case. Before sending it to the IDE vendor, you may wish to run it through a source code obfuscator in order to replace identifiers with nonsense and remove comments.

Strengthening Protection with Ahead-Of-Time Compilation

As you can see, all three main approaches to Java code obfuscation have certain drawbacks and limitations, and do not solve the fundamental problems listed in the introduction. Fortunately, there exists a class of tools originally developed with the goal of improving the performance of Java applications. These tools are Ahead-Of-Time (AOT) native code compilers, which take your jars and classes as input, compile them to optimized native code, and produce a conventional executable. Remember the C++ to Java comparison at the top of the article? Most statements from the C++ column apply to AOT-compiled Java:
This naturally leads us to the idea of a two-step approach to Java code protection:
Refer to my other article for more information about AOT compilers.

Popular Obfuscators

An Internet search for "Java obfuscator" would return way too many results. For your convenience, I have reduced the list to a handful of actively maintained products, both commercial and free.
If you need to obfuscate your Java source, Semantic Designs' Thicket™ family of source code formatters includes a Java formatter with obfuscation capability. Drop me an email if you know of a tool worth adding to this list.

As I have already mentioned above, the Sable group at McGill University has among its research projects a Java bytecode obfuscator called JBCO. It is not usable commercially, and will likely never be, but is worth looking at if you aim at Building a Better Obfuscator.

Further Reading

Books

Despite its title, Decompiling Java by Godfrey Nolan has a chapter on code protection, most of which is in turn devoted to obfuscation. (That particular chapter is featured on Apress' page for that book, so I have just saved you $20 or so.)

Alex Kalinovsky in his Covert Java: Techniques for Decompiling, Patching, and Reverse Engineering again mostly covers the topics listed in the book title, but has also included a chapter on obfuscation and cracking obfuscated code. By coincidence, that particular chapter is also available online - another $20+ in savings. :)

Reversing: Secrets of Reverse Engineering by Eldad Eilam is not Java-specific, so it offers a broader view. In case you are wondering, I could not find a legal copy online, but, as you might expect, warez sites do carry it, so the malicious hackers waiting for your next release must have already read that book. :)

Popular articles

If you want to learn more about code and data flow obfuscation techniques and how they rank against each other in terms of potency, resilience and cost, the three-part series by Sonali Gupta, which appeared in Palisade Magazine in Aug-Oct 2005, would make a good start:
Code Obfuscation - Part 2: Obfuscating Data Structures
Code Obfuscation - Part 3: Hiding Control Flows

Research publications

A group of Rowan University researchers led by Prof. Ravi P. Ramachandran has recently studied commercial Java obfuscators. Their findings are summarized in two articles:
A Qualitative Analysis of Java Obfuscation (Allatori, Dash-O-Pro, SmokeScreen, and Zelix Klassmaster)

If you want some theory and controversy, the paper "On the (Im)possibility of Obfuscating Programs" by Boaz Barak et al, found in the 21st Annual International Cryptology Conference Proceedings, proves that obfuscation is impossible. It sparked quite some discussion and confusion, so Boaz later wrote an essay explaining what that result really means, in his opinion.

For more information about software protection in general, refer to Watermarking, Tamper-Proofing, and Obfuscation - Tools for Software Protection by C. Collberg and C. Thomborson, or the more recent Revisiting Software Protection by P.C. van Oorschot of Carleton University, Canada. Dozens of works listed in their References sections may keep you busy reading for many hours.

Vendor publications

PreEmptive Solutions, the maker of the Dash-O-Pro Java obfuscator, maintains a nice collection of whitepapers, presentations, articles, demos, reference information, and links related to obfuscation and software protection in general.

References
Obfuscation Examples

Let's consider a fictional application that stores user passwords as SHA digests:
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

public class Authentication {

    public static byte[] encryptPassword(String password)
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        String saltedPassword = password + "Add-Some-Salt";
        byte[] digestive = saltedPassword.getBytes("ISO-8859-1");
        MessageDigest md = MessageDigest.getInstance("SHA");
        md.update(digestive);
        return md.digest();
    }

    public static boolean checkPassword(String password, byte[] digest)
        throws UnsupportedEncodingException, NoSuchAlgorithmException {
        if (Arrays.equals(encryptPassword(password), digest)) return true;
        System.out.println("Wrong password");
        return false;
    }
}
Compiling this class with the standard javac compiler and decompiling the resulting class file using one of the freely available decompilers yields:
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class Authentication
{
    public static byte[] encryptPassword(String s)
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        String s1 = (new StringBuilder()).append(s).append("Add-Some-Salt").toString();
        byte abyte0[] = s1.getBytes("ISO-8859-1");
        MessageDigest messagedigest = MessageDigest.getInstance("SHA");
        messagedigest.update(abyte0);
        return messagedigest.digest();
    }

    public static boolean checkPassword(String s, byte abyte0[])
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        if(Arrays.equals(encryptPassword(s), abyte0))
        {
            return true;
        } else
        {
            System.out.println("Wrong password");
            return false;
        }
    }
}
As you may see, the only major differences in the decompiled source code are automatically generated names of parameters and local variables. Let's run the above sample through a name obfuscator and decompile the resulting class:
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class a
{
    public static byte[] a(String a)
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        String s = (new StringBuilder()).append(a).append("Add-Some-Salt").toString();
        byte abyte0[] = s.getBytes("ISO-8859-1");
        MessageDigest messagedigest = MessageDigest.getInstance("SHA");
        messagedigest.update(abyte0);
        return messagedigest.digest();
    }

    public static boolean a(String a, byte a[])
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        if(Arrays.equals(a(a), a))
        {
            return true;
        } else
        {
            System.out.println("Wrong password");
            return false;
        }
    }
}
Even though the obfuscator has replaced the public identifiers Authentication, encryptPassword() and checkPassword() with the meaningless, overloaded a, it is clear that these methods deal with the Security API and use the SHA algorithm. The salt string is also exposed. Now, enabling string encryption makes the decompiled code a little bit more, well, cryptic:
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class b
{
    public static byte[] a(String a)
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        String s = (new StringBuilder()).append(a).append(a.a("X|~4Nws|<Ksu!")).toString();
        byte abyte0[] = s.getBytes(a.a("]FX9(-&-1d"));
        MessageDigest messagedigest = MessageDigest.getInstance(a.a("E_\024"));
        messagedigest.update(abyte0);
        return messagedigest.digest();
    }

    public static boolean a(String a, byte a[])
        throws UnsupportedEncodingException, NoSuchAlgorithmException
    {
        if(Arrays.equals(a(a), a))
        {
            return true;
        } else
        {
            System.out.println(a.a("Cgxzw5cuofh{j1"));
            return false;
        }
    }
}
Okay, strings are now encrypted, but the import list is still there and it is perfectly clear that these methods use the Java Security API. So the hacker would still have little doubt over where to look for sensitive code, and s/he does not even need to reverse the encryption algorithm. All s/he needs is to extract the calls of the decryption method from the decompiled source:
// crack.java
public class crack {
    public static void main( String[] args ) {
        System.out.println(a.a("X|~4Nws|<Ksu!"));
        System.out.println(a.a("]FX9(-&-1d"));
        System.out.println(a.a("E_\024"));
        System.out.println(a.a("Cgxzw5cuofh{j1"));
    }
}
... and then compile and run the resulting code:

$ javac crack.java
$ java crack
Add-Some-Salt
ISO-8859-1
SHA
Wrong password
$

Voila!

Let's try code flow obfuscation now. At first sight, there is not much code to obfuscate here, and code and data flows are pretty simple: just a bunch of standard API calls without any loops or exception handling. Indeed, the obfuscator I was using could only make one change after I enabled code obfuscation:
// Code obfuscation disabled
String s = (new StringBuilder()).append(a).append(a.a("X|~4Nws|<Ksu!")).toString();
byte abyte0[] = s.getBytes(a.a("]FX9(-&-1d"));

// Code obfuscation enabled
byte abyte0[] = (new StringBuilder()).append(a).append(a.a("X|~4Nws|<Ksu!")).toString().getBytes(a.a("]FX9(-&-1d"));
Perhaps that is just a weakness of the code obfuscation features implemented in a particular product? Indeed, it is possible to make the result of decompilation much less readable with JBCO. But before I move on, a word of caution: DO NOT TRY THIS AT HOME! Or, more seriously, do not try to use JBCO in a production environment. It is a research project, and as such is aimed at enabling researchers to try their ideas. It is not meant to be scalable, robust, and well documented.

Anyway, I was able to push JBCO to the limits on the original version of Authentication.class using the following command line:

java -Xmx256m -cp sootclasses-2.2.4.jar;polyglotclasses-1.3.4.jar;jasminclasses-2.2.4.jar;. soot.jbco.Main -cp .;scimark2lib.jar;"C:\Program Files\Java\jre1.5.0_13\lib\rt.jar";"C:\Program Files\Java\jre1.5.0_13\lib\jce.jar" -t:9:wjtp.jbco_cr -t:9:wjtp.jbco_mr -t:9:wjtp.jbco_fr -t:9:wjtp.jbco_bapibm -t:9:wjtp.jbco_blbc -t:9:jtp.jbco_gia -t:9:jtp.jbco_adss -t:9:jtp.jbco_cae2bo -t:9:bb.jbco_cb2ji -t:9:bb.jbco_dcc -t:9:bb.jbco_rds -t:9:bb.jbco_riitcb -t:9:bb.jbco_iii -t:9:bb.jbco_plvb -t:9:bb.jbco_rlaii -t:9:bb.jbco_ctbcb -t:9:bb.jbco_ecvf -t:9:bb.jbco_ptss Authentication

Here is the output produced by the decompiler:
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
public class Authentication
{
// JavaClassFileOutputException: Stack underflow
public static byte[] II1(String s)
throws UnsupportedEncodingException, NoSuchAlgorithmException
{
_L2:
StringBuilder stringbuilder;
S$$(stringbuilder, s);
stringbuilder = S$$(s);
return stringbuilder;
stringbuilder = JVM INSTR new #121
Turning the above back into a piece of Java source resembling the original Authentication.java is a very time-consuming task. But that is not necessarily what the attacker wants to achieve. Running the application under a debugger with a breakpoint set on MessageDigest.update() may reveal enough information about the password encryption scheme used in this fictional app.

Impact of Flow Obfuscation on Performance

You may now wonder how much such extensive transformations affect application performance. So did I, so my next step was running a well-known benchmark suite through JBCO.

I selected the SciMark 2.0a benchmark. It measures the performance of numerical computations typically found in scientific and engineering applications. These are the types of applications one may wish to protect against decompilation. Another good thing about SciMark is that it validates the result of each test, which is useful for checking whether the transformations made by the obfuscator preserved the semantics of the original code. (Strictly speaking, I also had to disable obfuscation of the validation code so that it could serve as 100% proof.)

Here is the JBCO command line that I used:

java -Xmx384m -cp sootclasses-2.2.4.jar;polyglotclasses-1.3.4.jar;jasminclasses-2.2.4.jar;. soot.jbco.Main -cp .;scimark2lib.jar;"C:\Program Files\Java\jre1.5.0_13\lib\rt.jar";"C:\Program Files\Java\jre1.5.0_13\lib\jce.jar" -t:9:wjtp.jbco_cr -t:9:wjtp.jbco_mr -t:9:wjtp.jbco_fr -t:9:wjtp.jbco_bapibm -t:9:wjtp.jbco_blbc -t:9:jtp.jbco_gia -t:9:jtp.jbco_cae2bo -t:9:bb.jbco_cb2ji -t:9:bb.jbco_rds -t:9:bb.jbco_riitcb -t:9:bb.jbco_iii -app jnt.scimark2.commandline

(As you can see, I had to disable some of the transformations until I found a combination that does not cause JBCO to crash or hang on SciMark classes and results in emission of correct, verifiable bytecode. As I said above, JBCO is not meant to be production-ready.)
SciMark reports measurement results in terms of scores. A higher score is better. The original, non-obfuscated version produced the following output on Sun HotSpot 1.5.0_13:

SciMark 2.0a

Composite Score: 235.0564919826097
FFT (1024): 95.00221034941461
SOR (100x100): 457.27459020598775
Monte Carlo : 40.94500403786447
Sparse matmult (N=1000, nz=5000): 217.5829952655074
LU (100x100): 364.4776600542744
. . .

Compared to that, the obfuscated version crawls like a worm:

SciMark 2.0a

Composite Score: 10.297235073178708
FFT (1024): 4.571244265099369
SOR (100x100): 14.39229057848735
Monte Carlo : 12.282002422350816
Sparse matmult (N=1000, nz=5000): 8.355773238345918
LU (100x100): 11.884864861610085
. . .

The slowdown ranges from 3.3x for the Monte Carlo test to over 30x for SOR and LU. The composite score is 22.8x lower for the obfuscated version!

On the one hand, this means you have to be careful when obfuscating performance-sensitive code. On the other hand, if flow obfuscation has little or no impact on performance, it may be an indicator of obfuscation weakness. What an optimizing compiler such as Sun HotSpot can figure out can also be figured out by a person of ordinary programming skills, especially if equipped with something like Understand for Java.

All that being said, I claim again that ahead-of-time compilation to native code is way better than flow obfuscation, and invite you to try Excelsior JET, a certified Java SE 6 AOT compiler that my company makes. What else would you have expected to find at the end of an article on a vendor site, anyway?

I update this article regularly, so if you have any comments or questions, or know of resource/tool URLs which I should have added, please send them to me.
© 1999-2008 Excelsior LLC. All Rights Reserved.