When you are developing exploits, you will sometimes come across restrictive character sets that limit the characters you are able to use in your shellcode.
Fortunately, there are ways to overcome this restriction. You are probably familiar with the msfvenom tool that comes with the Metasploit Framework. This tool allows you to use various encoders to mitigate restrictive character sets and obfuscate your shellcode. There are also many custom encoding scripts and tools available online if you search around.
The goal of this post is to explain one method of encoding shellcode by hand so that you understand what is going on, instead of relying on a tool to do the work.
Below I will detail one approach to circumventing character set limitations by using SUB instructions to “carve” the desired shellcode to a location in memory.
I have intentionally left out some details in this guide so that you can do a bit of research and testing and learn from the exercise.
Preparing a register to perform math operations
The process of encoding shellcode using SUB instructions by hand can be tedious, but after doing a few rounds of the encoding it becomes much quicker.
Zeroing a register to use for the math
First you need to zero out a register to use for the SUB math. Typically EAX is used for this, but you can use whatever you want as long as you stay within character and size restrictions and the register isn’t storing anything that you need.
The XOR method is simple. You simply XOR a register with itself. For example,
This method is great if the required bytes are within your good character set.
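As a quick sketch of both points, written in Python rather than assembly: XORing a value with itself gives zero, and you can check whether the instruction's bytes fit your character set. The bytes 31 C0 are the standard encoding of XOR EAX, EAX. The good-character range below is only an assumed example, so substitute your own.

```python
# XORing a value with itself always yields zero.
eax = 0xDEADBEEF
eax ^= eax

# XOR EAX, EAX assembles to the two bytes 31 C0.
xor_eax_eax = bytes([0x31, 0xC0])

# Hypothetical good-character set for illustration: 0x01 through 0x7F.
good_chars = set(range(0x01, 0x80))

# The instruction is only usable if every one of its bytes is allowed.
xor_is_usable = all(b in good_chars for b in xor_eax_eax)

print(f"EAX after XOR: {eax:08X}")
print(f"XOR EAX, EAX usable with this set: {xor_is_usable}")
```

With this assumed set the byte C0 is not allowed, so a different zeroing approach would be needed.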
The AND method is a little more difficult, but it’s not too bad. For this to work, you need two 4-byte values that equal zero when you AND them together. If you aren’t familiar with logical operations, it’s worth taking some time to read about them. A starting point for research would be truth tables for AND, OR, and XOR.
Here is how AND logic works:
1 AND 1 = 1
1 AND 0 = 0
0 AND 1 = 0
0 AND 0 = 0
If both values equal 1, then the result is 1, otherwise the result is 0.
The following example should make this clear:
The first and second digits are both ones, so the value equals 1, while the rest result in a zero.
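One small instance of that description can be sketched in Python. The two 4-bit values are chosen here purely for illustration and are not taken from the original example.

```python
# Two illustrative 4-bit values (assumed for this sketch).
a = 0b1100
b = 0b1101

# AND works bit by bit: a result bit is 1 only if both input bits are 1.
result = a & b

print(f"{a:04b}")
print(f"{b:04b}")
print("----")
print(f"{result:04b}")
```

Only the first two positions hold a 1 in both values, so only those positions survive in the result.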
With an understanding of how AND logic works, you can find two 4-byte values to AND that will result in 00000000 in your desired register.
Here is an example of some values to test:
Take each of these values and use a calculator to convert them from hex to binary.
The result will be two 32-bit values that you can AND to test. You have to convert these to binary because logical operations work on individual 1s and 0s.
Using AND logic from left to right, you can see that the final value is zero.
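The same check can be sketched in Python. The pair below is an assumed example made up of printable ASCII bytes, not necessarily the values you will end up using. Because the two values share no set bits, ANDing a register with both of them clears it no matter what it held before.

```python
# Assumed example pair; every byte is a printable ASCII character.
first = 0x554E4D4A
second = 0x2A313235

eax = 0x12345678   # arbitrary starting value in the register
eax &= first       # AND EAX, 0x554E4D4A
eax &= second      # AND EAX, 0x2A313235

# Line the two values up in binary to compare them bit by bit.
print(f"{first:032b}")
print(f"{second:032b}")
print(f"{first & second:032b}")
print(f"EAX = {eax:08X}")
```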
Other methods to zero out a register
There are plenty of guides on how to zero out a register, just search around! A first stop could be this StackOverflow post that I found on the first page after searching.
Now that you have a zeroed register to work with, it’s time to start the manual encoding.
First, we need some shellcode to encode.
For the example, I will use:
Breaking down the shellcode
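This shellcode has to be broken down into 4-byte chunks, starting from the end. You start from the end because the stack grows toward lower memory addresses: the last chunk of the shellcode has to be pushed first so that the chunks end up in the right order in memory.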
This breaks down to:
Now it’s time to start doing the math.
The first set of bytes that we have is \x75\xe7\xff\xe7.
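Reverse the order of these bytes to get \xe7\xff\xe7\x75. The reversal is needed because x86 stores values in little-endian order, so the register value e7ffe775 is written to memory as 75 e7 ff e7.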
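The chunking and byte reversal can be sketched in Python. Only the two chunks discussed in this post are used as input, and the helper name chunks_from_end is made up for this example. It assumes the shellcode length is a multiple of four.

```python
import struct

# The two 4-byte chunks discussed in this post, in their original order.
shellcode = b"\xaf\x75\xea\xaf\x75\xe7\xff\xe7"

def chunks_from_end(data: bytes, size: int = 4) -> list[bytes]:
    """Split data into size-byte chunks, last chunk first."""
    if len(data) % size != 0:
        raise ValueError("length must be a multiple of the chunk size")
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    return parts[::-1]

for chunk in chunks_from_end(shellcode):
    # "<I" reads the bytes as a little-endian 32-bit integer, which is
    # the same as reversing the byte order by hand.
    (value,) = struct.unpack("<I", chunk)
    print(chunk.hex(), "->", f"{value:08X}")
```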
The goal here is to use SUB instructions to wrap around the register to reach the value that you want for your shellcode.
To make this clear, try doing the following math with a calculator (make sure you are in hex mode):
As you can see, the register wraps around, which is a cool trick.
With the register at 00000000, let’s calculate 0 – e7ffe775, which is the first of our 4-byte chunks, reversed.
The result is: 1800188B.
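If you don't have a hex calculator handy, the wraparound can be reproduced in Python by masking to 32 bits. This is only a sketch of the arithmetic; the MASK name is an assumption of this example.

```python
MASK = 0xFFFFFFFF  # keep results within 32 bits, like a 32-bit register

target = 0xE7FFE775

# Subtracting from zero wraps around instead of going negative.
wrapped = (0 - target) & MASK
print(f"0 - {target:08X} = {wrapped:08X}")

# Subtracting that result from zero lands back on the target.
eax = (0 - wrapped) & MASK
print(f"0 - {wrapped:08X} = {eax:08X}")
```

This is why subtracting values that add up to 1800188B from a zeroed register leaves e7ffe775 behind.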
Let’s break this down into a table to make visualizing this easy. I’ve color coded the chart for readability.
This table is designed to use three SUB instructions, but it can be modified to use more or fewer. The key here is avoiding null bytes in your final shellcode, so if you can do the encoding with only two SUBs, that could save you valuable shellcode space.
Column A has the values that need to be encoded. You can create a chart that works for you, but what I’ve done here is added in the bytes from 1800188B starting at the bottom of column A and worked my way to the top row. So 18, 00, 18, 8B have been populated.
Next, you have to do the math for each row. The equation is A = B + C + D. The key here is to take the first value and subtract three other values with an end goal of hitting zero.
At this point, I find it easiest to have a list of known good shellcode characters to easily reference when doing the math.
Using a calculator in hex mode, take the value in the first column, which is 8B, and start subtracting known-good characters. Unless you are great at hex math, just use a little guesswork.
The values that I subtracted from 8B are 42, 42, and 07. You can use any combination of good characters, as long as the math is correct and you use known-good characters.
8B – 42 – 42 – 07 = 0.
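If the guesswork gets tedious, a small search can confirm that a valid combination exists. The sketch below assumes a good-character set of 0x01 through 0x7F purely for illustration, and the helper name find_three is made up for this example. It returns the first combination it finds, which will usually differ from the one chosen by hand.

```python
# Hypothetical good-character set for illustration: 0x01 through 0x7F.
GOOD_CHARS = list(range(0x01, 0x80))

def find_three(target: int, good=GOOD_CHARS):
    """Return (b, c, d) taken from good with b + c + d == target, or None."""
    allowed = set(good)
    for b in good:
        for c in good:
            d = target - b - c
            if d in allowed:
                return (b, c, d)
    return None

# The combination used in the text is one valid answer among many.
assert 0x42 + 0x42 + 0x07 == 0x8B

combo = find_three(0x8B)
print([f"{x:02X}" for x in combo])
```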
Now, continue doing the math for each row in the chart.
There is something very important to note once you reach the row with 00.
You can’t use 00 in the shellcode you send because it is a null byte, which acts as a string terminator. The trick here is to change the 00 to 100 and then do the math.
So: 100 – 5E – 50 – 52 = 0.
If you look at the table above, you may notice that the final row starting with 18 seems to be off.
The math shows 18 – 15 – 01 – 01 = 01.
This math works because changing 00 to 100 on the previous row carries an extra 1 down to this row, so you have to take that into account.
The equation is as follows:
18 – 01 – 15 – 01 – 01 = 0
The first 01 is the value carried over from the row above.
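The row math, including the carried 1, can be checked with a short Python sketch. The helper name check_row is made up for this example, and the rows are the three quoted in the text.

```python
def check_row(a: int, b: int, c: int, d: int, carry_in: int = 0):
    """Check one chart row.

    b + c + d + carry_in must equal a in its low byte. Anything above
    0xFF is returned as the carry for the next row.
    """
    total = b + c + d + carry_in
    ok = (total & 0xFF) == a
    carry_out = total >> 8
    return ok, carry_out

# 8B row: no carry in, no carry out.
print(check_row(0x8B, 0x42, 0x42, 0x07))

# 00 row: treated as 100, so it produces a carry of 1.
print(check_row(0x00, 0x5E, 0x50, 0x52))

# 18 row: only balances once the carried 1 is included.
print(check_row(0x18, 0x15, 0x01, 0x01, carry_in=1))
```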
You may be wondering what happens when the bottom row has 00. You still have to change this to 100 and do the math, but you can just ignore the extra 1 in this case.
Creating assembly instructions from the completed chart
The first instructions that you need are the ones used to zero out the register that is being used for the SUB math. I am using EAX.
Next, you need to convert the chart values into SUB instructions. This part is a little tricky with how I have my chart set up.
Starting from the column of your first math value, take the value in the bottom row and work your way up. This will give you a 4-byte value for the first SUB instruction.
Referencing the chart, the values in the correct order are:
55, 4E, 4D, and 41
As a hex string, this is 554E4D41. This is the value used in the first SUB instruction:
Repeat these steps for the other two columns.
Putting it all together
First, add the instructions to zero out the register. Next, add in the three SUB instructions. Finally, you need to push the value of EAX to the stack. At that point EAX will hold the reversed 4-byte value that you started with (e7ffe775 in this example).
You will have the following instructions:
When you execute the PUSH EAX instruction, the value will be pushed to the stack. Keep this in mind when setting up your final shellcode. You may have to make some modifications to place the decoded shellcode in the correct spot.
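Before moving to a debugger, the register math can be simulated in Python. The three SUB operands below are hypothetical values picked only because they sum to 1800188B; they are not read from the chart, so substitute your own.

```python
import struct

MASK = 0xFFFFFFFF  # emulate a 32-bit register

# Hypothetical SUB operands chosen so that they sum to 0x1800188B.
sub_values = [0x155E0842, 0x01500842, 0x01520807]

eax = 0  # register already zeroed
for value in sub_values:
    eax = (eax - value) & MASK  # SUB EAX, value
    print(f"SUB EAX, 0x{value:08X} -> EAX = {eax:08X}")

# PUSH EAX stores the register in little-endian order on the stack.
pushed = struct.pack("<I", eax)
print("bytes in memory:", pushed.hex())
```

The bytes that land in memory are back in the original shellcode order, which is the reason for reversing them before doing the math.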
I’ll let you figure out how to do this and why it is important.
Now that you have the first set of bytes encoded, it’s time to move on to the next set.
The next row of bytes that need to be encoded is:
As seen in the previous example, first you reverse the order of the bytes and then subtract that value from zero.
0 – afea75af = 50158A51
Next, create and populate a chart to help visualize the problem.
Now, start doing the math for each row and populate the table.
With the table populated, it’s time to start writing the SUB instructions. Remember, start with the bottom row of column B, then do column C and D.
The final results are:
If you have more shellcode that needs to be encoded, just continue this process until all 4-byte segments are complete.
When doing manual encoding, it is important to test the code in a debugger to ensure you are getting the desired results. Make sure that you understand what is happening at each step of execution, or you may have a difficult time correcting errors along the way.
The method that works for me when doing manual encoding and decoding is to read the instructions that I am executing and step through them one line at a time. Paying attention to what is happening with the registers and stack pointer is critical.
Here is an example using the instructions from Example #2 above:
First, the EAX register is being zeroed out using two AND instructions. Is this working properly? Does EAX hold the value 00000000? If the answer is no, then there is a problem. Maybe you will have to adjust the values of the AND instructions, or maybe you can manually tweak the register to get the desired result.
Next, three SUB instructions are executed. After stepping through these, does EAX hold the correct value?
Finally, EAX is pushed to the stack. Is the correct value being pushed to the stack? Where is the value being pushed in memory?