Your post reminds me that the original Unix idea was that the whole system could easily be ported to new hardware. The first thing ported to new hardware was the compiler!
IMHO, this is what DEC, and DOS for that matter, delivered: a kernel minimally dependent on the actual hardware it was running on. Kind of like the idea behind retroforth, too.
The conflict between a simple kernel and speed seems to me to be a big part of the issue. Consider 'ls'. Does it matter whether it is written in efficient machine-level code or is interpreted? Does it matter whether it is part of the kernel or a standalone program? Of course it matters, but not very much, since it isn't executed very often.
Only the inner core of the kernel makes any real difference to speed. Meanwhile, as extraneous 'stuff' gets added to the kernel, the job of getting a new port running becomes VERY MUCH bigger.
So from where I sit, we need to remove as much 'stuff' as we can from the kernel itself and make such 'stuff' into loadable modules or regular programs.
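To make that a bit more concrete, here is a rough sketch in C of the shape I have in mind: a small core that only keeps a table of registered modules and dispatches to them by name. Every name in it (core_register, core_unregister, core_call, double_it, MAX_MODULES) is made up for illustration. None of it is taken from LiteBSD, RetroBSD or any real kernel, and a real loadable-module system would also have to deal with loading code, locking and memory protection, which this ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MODULES 8            /* hypothetical limit, chosen arbitrarily */

/* A "module" here is just a name plus one entry point (an assumption). */
struct module {
    const char *name;
    int (*handler)(int arg);
};

static struct module table[MAX_MODULES];
static size_t nmodules;

/* Add a module to the core's table.
   Returns 0 on success, -1 if the table is full or the name is taken. */
int core_register(const char *name, int (*handler)(int))
{
    assert(name != NULL && handler != NULL);
    if (nmodules == MAX_MODULES)
        return -1;
    for (size_t i = 0; i < nmodules; i++)
        if (strcmp(table[i].name, name) == 0)
            return -1;
    table[nmodules].name = name;
    table[nmodules].handler = handler;
    nmodules++;
    return 0;
}

/* Remove a module by name. Returns 0 on success, -1 if not found. */
int core_unregister(const char *name)
{
    for (size_t i = 0; i < nmodules; i++) {
        if (strcmp(table[i].name, name) == 0) {
            table[i] = table[nmodules - 1];   /* fill the hole with the last entry */
            nmodules--;
            return 0;
        }
    }
    return -1;
}

/* Dispatch to a module by name. Stores the handler's return value in
   *result and returns 0, or returns -1 if no such module is registered. */
int core_call(const char *name, int arg, int *result)
{
    for (size_t i = 0; i < nmodules; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *result = table[i].handler(arg);
            return 0;
        }
    }
    return -1;
}

/* Example "module": lives outside the core and can come and go. */
int double_it(int x)
{
    return 2 * x;
}
```

The point of the sketch is that the core never mentions double_it by name. Something like core_register("double", double_it) adds it and core_unregister("double") removes it, so the part that has to be ported and kept fast stays small no matter how much 'stuff' is registered.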
I suspect the LiteBSD virtual memory capability is also needed, so that huge user programs can be written more simply.
Should we do this? How should we begin?
Lots of fun