I’ve been away from web browsers for some time now, but after the recent CSS Prime+Probe, SMASH, and Stephen Röttger’s Spectre PoC I couldn’t help myself and had to spend some time catching up. Kudos to everyone involved! :)
First, I’m super happy to see that my work on replacement policies was justified: both SMASH and the Spectre PoC make super cool use of them. That’s something I’ve been expecting for some time now, and I’m pretty sure we’ll see more of these tricks to squeeze caches.
A sexy idea (probably more weird science than really practical) that I love is using synchronizing (or reset) and distinguishing sequences to encode/decode information into the cache state. In fact, I think I should write a separate post about this, with error-correcting codes, caches as automata, and assembly, but for an appetizer see Section 4 of my Flushgeist paper.
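To make the idea of a reset sequence a bit more concrete, here is a toy sketch. It assumes a single cache set with a true LRU policy, and the names (accessLRU, run, WAYS) are made up for illustration. Real CPUs use other policies, so an actual reset sequence would look different.

```javascript
// Toy model of one LRU cache set; index 0 is the most recently used way.
// Assumption: true LRU, which real hardware rarely implements exactly.
function accessLRU(set, ways, addr) {
  const next = set.filter((a) => a !== addr); // drop addr if already cached
  next.unshift(addr);                         // addr becomes most recent
  return next.slice(0, ways);                 // evict the least recent if full
}

// Apply a whole access sequence to an initial set state.
function run(set, ways, sequence) {
  return sequence.reduce((s, addr) => accessLRU(s, ways, addr), set);
}

const WAYS = 4;
const reset = ["a", "b", "c", "d"]; // WAYS distinct addresses

// Whatever the set contained before, it ends up in the same known state.
const fromEmpty = run([], WAYS, reset);
const fromDirty = run(["x", "b", "y", "z"], WAYS, reset);
```

Under this model, accessing as many distinct addresses as there are ways works as a synchronizing sequence: both starting states converge to the same content and ordering.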
Ok, that said, let’s “cut the cod”.
I first heard about this one on Twitter, and I quickly went from a “WTF! these people are insane lol” moment to “yeah… I should probably have done it”. To be honest, I had thought about it, but after failing at simpler endeavours it seemed an unnecessary amount of suffering to deal with.
While speculating with some colleagues I realized that these were the same authors from 1811.07153, so I started suspecting that what they might actually be doing is a whole “cache occupancy” channel with CSS, and after reading the paper it indeed is the case. What I found a bit puzzling is that they first highlight the distinction between Prime+Probe and cache occupancy, to later call “CSS Prime+Probe” an occupancy channel (marketing? :P).
This makes things easier, as there’s no need to find eviction sets and maintain a complex backend logic with stylesheet recursion or HTML redirections, but the trick they use (veeeery long class names + selectors) is still pretty cool, in fact as cool as it is simple, so twice cool! :)
The part I liked less (and this is a personal opinion) is the use cases and the whole page fingerprinting business; just not my cup of tea. I did something similar with Loophole, which honestly took a lot of effort, and in the end there was a much simpler yet deadlier example that I completely missed. At that time page fingerprinting seemed the way to go, but I’m pretty sure that had I spent thinking half the time it took me to learn all that, I would have come up with something similar. Still, life works in mysterious ways, and yesterday I just finished reviewing a security-with-ML paper, so maybe it was worth it in the end.
In any case, I’m still waiting for a crazy PoC, and based on Yossi’s “yet” I hope we can see one soon. If not, it will take another brave soul to pull that off…
This reminds me, have there been any XS-Leaks that exploit cache attacks (not the browser’s cache)? :)
I just added this for completeness. To be honest, there’s little of value I can say (and I already forgot most of the details after skimming through the paper anyway), but the techniques they use are extremely cool, and these exploits are reaching levels of sophistication that blow my mind.
This somehow relates to my last entry about computing with codes, maybe we should add redundancy into JITs to have VMs resistant to bit-flips ;)
performance.now() it’s something that can be done in just a few seconds.
Shortly after Stephen’s PoC, Dougall J shared another technique to exploit Spectre in the browser with just a few general assumptions about the microarchitecture, making it very portable. In contrast, in this method the Spectre gadget and the probing are intertwined: there is a pointer-chasing loop that also contains the Spectre gadget, and this gadget speeds up the loop (by speculatively fetching upcoming addresses in the linked list) depending on the value of the bit to leak. The probing, as I understand it, is reduced to measuring the performance (or timing) of the loop. The fundamental benefit of this approach is that we do not need to care about cache sets at all.
I think the key to understanding all this is to distinguish between the gadget (the code snippet that speculatively accesses a secret and sends a signal) and the probing (the method used to decode this signal, usually involving time). The exciting thing is that these are just two concrete examples, and there are plenty of similar gadgets and probing techniques to be discovered :)
That said, there’s a limitation in Dougall’s approach: it only works if the attacker fully controls the gadget. And this brings me to another point I wanted to discuss, namely cross-process Spectre attacks in the browser.
Spectacularity aside, we already knew Spectre in the browser was possible, so how much should we worry? We have site isolation (i.e., each site runs in its own separate memory space), therefore even if Spectre turns out to be “not that hard” to exploit, the impact is an unpatchable speculative arbitrary read primitive. We are safe. Right?
Date.now() (with a noisy resolution of 1 ms), and I ended up giving up.
However, after Stephen’s PoC I thought I could give it another try. In fact, why not exploit Spectre from a PDF? Interestingly, as far as I know, cross-site PDFs are still not isolated :)
Update: Jun Kokatsu pointed out that this is not really the case; PDFs are isolated in their own PDF process, they only share the viewer extension.
A_n refers to line
0 <= n < 16), we can construct a linked list as:
0_0 -> 1_0 -> 2_0 -> 4_0 -> 3_0 -> 5_0 -> 6_0 -> 4_1 -> 7_0 -> 0_1 -> 1_1 -> 4_2 -> 2_1 -> 3_1 -> 5_1 -> 4_3 -> 6_1 -> 7_1 -> 0_2 -> 4_4 -> 1_2 -> 3_2 -> 5_2 -> 4_5 -> 5_2 -> 0_2 -> 7_2 -> 4_6 -> 0_0
With that we only need to repeat p = buffer[p], where buffer is a Uint32Array, to avoid indirect derefs and generate clean enough native code to probe the cache set. I ran tests on my i7-8550U, and performance-wise there’s little difference between using this and the original Wasm implementation.
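A minimal sketch of the p = buffer[p] pointer chase over a Uint32Array. The helper names (buildChain, chase), the example order, and the 16-elements-per-line layout (64-byte lines, 4-byte elements) are assumptions for illustration, not code from the PoC. A real probe would place the nodes on lines that map to the target cache set.

```javascript
// Hypothetical helper: link the given indices into a cycle inside a Uint32Array.
// buffer[i] holds the index of the node that follows node i.
function buildChain(buffer, order) {
  for (let i = 0; i < order.length; i++) {
    buffer[order[i]] = order[(i + 1) % order.length];
  }
  return order[0]; // head of the cycle
}

// Hypothetical helper: follow the chain for `steps` dependent loads.
function chase(buffer, head, steps) {
  let p = head;
  for (let i = 0; i < steps; i++) {
    p = buffer[p]; // each load depends on the result of the previous one
  }
  return p; // returned so the loop has an observable result
}

// Assumption: 64-byte cache lines and 4-byte elements, so 16 elements per line.
const ELEMS_PER_LINE = 16;
const buffer = new Uint32Array(8 * ELEMS_PER_LINE);
// Arbitrary example order of lines; one node per line.
const order = [0, 3, 1, 7, 2].map((line) => line * ELEMS_PER_LINE);
const head = buildChain(buffer, order);
```

Because every load feeds the next index, the traversal time is dominated by memory latency, which is what makes it usable as a probe.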
The second problem is that we have no access to performance.now() from inside a PDF, only to Date.now(). This means that we need to amplify the signal by an order of magnitude (by increasing the number of loop iterations). This works well enough, kudos again to Stephen, and I was able to distinguish between a cache hit and a cache miss, but unfortunately it becomes slow and unreliable. I spent some time trying to improve it, but without a more reliable and faster primitive the rest of the PoC just breaks. I think Dougall’s method would work better in this scenario.
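The amplification idea can be sketched as follows: with a 1 ms clock a single access is invisible, so the same operation is repeated many times and only the total is timed. The names (timeAmplified, work, rounds) are hypothetical, and waiting for a clock edge before starting is an extra step added for illustration, not something taken from Stephen’s PoC.

```javascript
// Hypothetical helper: time `rounds` repetitions of `work` with a coarse clock.
function timeAmplified(work, rounds, now = Date.now) {
  // Spin until the clock ticks, so the measurement starts right after an edge.
  let start = now();
  let t;
  while ((t = now()) === start) {}
  start = t;

  let sink = 0; // accumulate results so the work is not dead code
  for (let i = 0; i < rounds; i++) {
    sink += work(i);
  }
  return { elapsedMs: now() - start, sink };
}

// Example: a trivial stand-in for the real probing step.
const result = timeAmplified((i) => i & 1, 1000);
```

In a real measurement, rounds has to be large enough that the hit/miss difference spans several clock ticks, which is exactly why this gets slow.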
As I didn’t want to spend too much time re-implementing everything from scratch, the next thing I looked for was a higher-resolution clock inside the PDF. Maybe we could create a covert channel between the parent page and the PDF (e.g., via open parameters) and simply pass the page’s
Unfortunately I had no luck either (we can pass data only once per load), but at this point I realized that this had been just a waste of time :P
PS: I commented some of Stephen’s code while modifying it.
Trojan cross-process attacks
Why do we need to implement the probing, which is the only part with high requirements, inside the PDF? We can simply embed a PDF with a Spectre gadget that leaks bits through the cache, and use a regular web page with high-resolution clocks and Wasm to probe the cache and extract the information. Think of it as a Spectre trojan: we insert a simple gadget into the victim’s memory space (which in this case can contain cross-site data), and start exfiltrating. We still need some synchronization between sender (the PDF’s Spectre gadget) and receiver (the web page doing the probing), but this is a covert-channel problem. Make sure you read the last sections of Dougall’s post for some more insights on this!
In this specific case, it might still be possible to use Dougall’s amplification gadget and measure the delay from another context (i.e., another PDF), because they all share the same main thread and we can observe the event loop contention. But in general, for cross-process attacks, we need a way to communicate a signal without relying on shared memory, and cache covert channels (both occupancy and set based) are probably the best option. Note that in this case L3 is preferable to L1, as we do not need core colocation.
The remaining part is what happens when we do not control the gadget. Interestingly, we only have proper hardware countermeasures against branch injection (i.e., Spectre v2), but this is unlikely to be a threat in the browser due to its dynamic nature. Not that it is going to be impossible, I just think it’s not worth it.
For other variants I’m still not sure. Can we find Spectre v1/v4 gadgets in JS, CSS, or even in native parts of the browser that are easily triggerable from another process?
The postMessage API is a clear candidate, but so are most event handlers (e.g., hash change, load, etc.). Fortunately, we have COOP and many other mechanisms to make this even more difficult. What about the host process? Most JS APIs end up interacting with it.
There’s been a lot of research about finding Spectre gadgets in the Linux kernel, and apparently even a real exploit triggered via an ioctl. Likewise, researchers have already explored in-process vs. cross-process gadgets for a while, namely, how to control and trigger the transient execution. Will we see similar things in the browser? Or is it just too noisy and fast-changing?
What I’m sure about is that it’s not that difficult to implement an artificial cross-process Spectre PoC in JS…