I'm wondering, for example, if the branch predictor could take the page table/TLB into account, thus preventing one process from poisoning its predictions for another process.
(Also, I know bupkis about CPU design. I actually managed to fail a college class on it. True story.)
@joeyh
it would be fairly easy to do the hardware equivalent of a CLFLUSH over the whole cache on every context switch.
performance could be retained by windowing the caches (something like what SPARC does with registers).
but this isn't a simple patch to the CPU design; we'll probably have to wait a couple of microarchs before it's done
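To make the three options in this thread concrete, here is a toy model in Python. It is only a sketch, not how any real CPU works: the `ToyPredictor` class, the `asid` (address-space ID) tag, and the policy names `shared`, `flush`, and `tagged` are all made up for illustration, and real predictors index by hashes of address and branch history rather than a plain dictionary. "tagged" stands in for both the page-table-aware idea and the windowing idea, since both amount to giving each context its own slice of the state.

```python
from dataclasses import dataclass, field

TAKEN_THRESHOLD = 2  # 2-bit counter: 0-1 predict "not taken", 2-3 predict "taken"
DEFAULT_COUNTER = 1  # untrained entries start as weakly "not taken"


@dataclass
class ToyPredictor:
    """Hypothetical branch predictor with three isolation policies."""
    policy: str = "shared"  # "shared", "flush", or "tagged"
    counters: dict = field(default_factory=dict)
    current_asid: int = 0

    def _key(self, pc):
        # "tagged": entries are private to the running address space.
        if self.policy == "tagged":
            return (self.current_asid, pc)
        return pc

    def context_switch(self, asid):
        # "flush": throw away all predictor state when the context changes.
        if self.policy == "flush" and asid != self.current_asid:
            self.counters.clear()
        self.current_asid = asid

    def predict(self, pc):
        return self.counters.get(self._key(pc), DEFAULT_COUNTER) >= TAKEN_THRESHOLD

    def train(self, pc, taken):
        key = self._key(pc)
        value = self.counters.get(key, DEFAULT_COUNTER)
        self.counters[key] = min(3, value + 1) if taken else max(0, value - 1)


def victim_view_after_attack(policy):
    """Return (prediction, stored counter) the victim sees after an attacker ran."""
    predictor = ToyPredictor(policy)
    pc = 0x400123  # arbitrary branch address used by both processes

    predictor.context_switch(1)  # victim trains the branch as "not taken"
    for _ in range(4):
        predictor.train(pc, taken=False)

    predictor.context_switch(2)  # attacker trains the same address as "taken"
    for _ in range(4):
        predictor.train(pc, taken=True)

    predictor.context_switch(1)  # back to the victim
    stored = predictor.counters.get(predictor._key(pc))
    return predictor.predict(pc), stored


if __name__ == "__main__":
    for policy in ("shared", "flush", "tagged"):
        print(policy, victim_view_after_attack(policy))
```

In this model `shared` lets the attacker steer the victim's prediction, `flush` stops that but also discards the victim's own training on every switch (the performance cost mentioned above), and `tagged` blocks the poisoning while keeping each process's history.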