A page load could possibly be done by DMA via the P-Box and M-Box of the chipset, without involving the CPU. The bottleneck of such an approach could be serving multiple cores: what if all of them have a page fault at once and each needs a page load? I'm not sure whether the P-Boxes and M-Boxes can work truly in parallel. So maybe the future isn't as bright as I thought.
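To make the bottleneck concern concrete, here is a minimal back-of-the-envelope sketch. It assumes, hypothetically, that the chipset offers some number of independent DMA paths (`channels`), that each path loads one page in a fixed time, and that simultaneous faults are served in batches. The function name and all numbers are illustrative, not taken from any Intel document.

```python
import math


def worst_case_fault_latency(cores: int, channels: int, load_time_us: float) -> float:
    """Time until the last of `cores` simultaneous page faults is served.

    Hypothetical model: `channels` independent DMA paths, each loading one
    page in `load_time_us` microseconds; faults are served in batches of
    `channels`, so the last core waits ceil(cores / channels) load times.
    """
    if cores < 0 or channels < 1 or load_time_us < 0:
        raise ValueError("need cores >= 0, channels >= 1, load_time_us >= 0")
    rounds = math.ceil(cores / channels)
    return rounds * load_time_us


# Hypothetical numbers: 8 cores fault at once, 10 us per page load.
for channels in (1, 2, 8):
    print(channels, worst_case_fault_latency(8, channels, 10.0))
```

If the boxes cannot work in parallel (`channels=1`), the last core waits eight times as long as with fully parallel paths, which is exactly the worry above.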
But who knows what future chipset and memory designs will deliver. The cores already compete with each other for normal memory access and can only sustain their performance through the faster memory caches. So a normal memory access that misses the caches is already a kind of page fault. So maybe the trend is nevertheless towards no difference between on-board memory and peripheral memory.
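The point that an uncached memory access already behaves like a small page fault can be illustrated with the usual single-level average memory access time formula, AMAT = hit time + miss rate × miss penalty. The latencies below are hypothetical round figures chosen only to show the effect, not measurements of the Xeon 7500.

```python
def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time for a single cache level:
    hit time plus the miss rate weighted by the miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be in [0, 1]")
    return hit_time_ns + miss_rate * miss_penalty_ns


# Illustrative (hypothetical) figures: 1 ns cache hit, 100 ns trip to DRAM.
print(amat(1.0, 0.02, 100.0))  # 2% of accesses miss the cache
print(amat(1.0, 0.20, 100.0))  # 20% of accesses miss the cache
```

With these figures the slow path dominates as soon as misses become common, much as page faults dominate when paging becomes frequent.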
See also the nice picture on page 9 (section 1.1, Introduction) of:
Xeon Processor Series Uncore Programming Guide
Intel Xeon Processor 7500 Series Block Diagram