Developers are often interested in learning about the software industry’s best practices, so that they can improve the robustness and efficiency of their code. The best way to learn is by reading the source code of programs that are in production and running heavy workloads.

While reading the source is an option for open source projects, most companies' code is closed source and proprietary, and companies go to great lengths to prevent unauthorized access to it. If the source code is nevertheless stolen or compromised, what options are available to keep an attacker from reading it and finding loopholes or bugs that could be exploited?

One commonly used approach is to make the code very difficult to understand and trace, so that even if someone tries to work out the logic, doing so is a time-consuming process. This approach is called obfuscation.

What is Obfuscation?

Wikipedia defines obfuscation as “the obscuring of intended meaning in communication, making the message confusing, willfully ambiguous, or harder to understand.”

Example of websites using obfuscated code to protect their business logic:

Is Obfuscation a synonym for Encryption?

No. Although obfuscation and encryption are both used to hide or secure information, they apply in different scenarios. Encryption locks a document with a key, and the document can then be opened only with that key. A program or virtual machine, however, usually needs code to adhere to a certain format, and it cannot read encrypted code because that format is no longer recognizable to the machine. Obfuscation instead jumbles code into a form that the program or VM can still read but that is hard for a human to follow. An execution engine can run obfuscated code directly, whereas encrypted code must be decrypted before execution.

Encryption vs. Obfuscation

How is Obfuscation done?

Obfuscation can be applied to code in any language, and there are several ways to do it. Some of them are:

  1. Changing identifier names
    1. Using lots of similar identifiers
    2. Using characters and digits that are easily mistaken for one another, such as capital O and zero (O, 0)
    3. Using shorter names for classes, methods, and fields
  2. Adding noise to the code
    1. Adding misleading comments
    2. Adding unnecessary code to the logic
  3. Using indentation and spacing to obscure the code, sometimes even in an artistic way, as shown below
How Obfuscation is done
Source: mtgomes.files

Tools used for Obfuscation:

Code can be obfuscated manually or with tools available on the market. Different tools are available depending on the language.

A few commonly used obfuscators (for both server-side and client-side languages) are:

Java       : ProGuard, JODE, JavaGuard
C / C++    : Stunnix
PHP        : PHPProtect
JavaScript : Jscrambler
VBA        : CrunchCode

Example Hello world Program in C

Code before Obfuscation

Example Hello world Program - Code before Obfuscation
Source: A Simple Introduction to Obfuscated Code

After Obfuscation

Example Hello world Program in C - After Obfuscation
Source: A Simple Introduction to Obfuscated Code

The above program is a very basic example of obfuscation.

Things to remember before obfuscating the code:

  • Obfuscation is not 100% secure: given enough time and patience, someone can reverse engineer the logic
  • When the cost of reverse engineering the code is already greater than the cost of developing the logic from scratch, obfuscation is generally not recommended

This blog is written by Vishnu Narayanan from BRIDGEi2i