November 20th, 2008


EXA and kernel memory manager (the DFS saga).

So I've been going over the accel in my radeon EXA code, particularly how it relates to buffer management.

One thing I noticed is that I've implemented a really sub-optimal DFS.

DownloadFromScreen (DFS) is meant to accelerate reading back pixmaps from offscreen. However, with driver-managed pixmaps, all pixmaps are considered to be "offscreen". Now my great implementation created two scratch buffers and executed blit commands moving parts of the pixmap into each scratch, then copied each part into the supplied destination pointer.
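To make the scratch-buffer scheme concrete, here is a minimal CPU-only sketch of it. This is not the actual radeon code: `dfs_via_scratch`, `fake_blit_to_scratch` and `SCRATCH_SIZE` are made-up names, and the "blit" is just a memcpy standing in for a GPU copy into a scratch BO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_SIZE 4096 /* hypothetical size of each scratch buffer */

/* Stand-in for a GPU blit: packs `lines` rows of `row_bytes` bytes from a
 * pitched source into a tightly packed scratch buffer. */
static void fake_blit_to_scratch(uint8_t *scratch, const uint8_t *src,
                                 size_t src_pitch, size_t row_bytes,
                                 size_t lines)
{
    for (size_t i = 0; i < lines; i++)
        memcpy(scratch + i * row_bytes, src + i * src_pitch, row_bytes);
}

/* Read back `height` rows through two alternating scratch buffers.
 * Returns 0 on success, -1 if a single row does not fit in a scratch. */
static int dfs_via_scratch(const uint8_t *src, size_t src_pitch,
                           uint8_t *dst, size_t dst_pitch,
                           size_t row_bytes, size_t height)
{
    static uint8_t scratch[2][SCRATCH_SIZE];
    size_t lines_per_chunk, y = 0;
    int cur = 0;

    assert(src != NULL && dst != NULL);
    if (row_bytes == 0 || height == 0)
        return 0;
    if (row_bytes > SCRATCH_SIZE)
        return -1;

    lines_per_chunk = SCRATCH_SIZE / row_bytes;
    while (y < height) {
        size_t lines = height - y;
        if (lines > lines_per_chunk)
            lines = lines_per_chunk;

        /* "Blit" the next slice of the pixmap into the current scratch. */
        fake_blit_to_scratch(scratch[cur], src + y * src_pitch,
                             src_pitch, row_bytes, lines);

        /* CPU copy from scratch to the caller's destination. With real
         * hardware, the blit into the other scratch could be in flight
         * while this copy runs. */
        for (size_t i = 0; i < lines; i++)
            memcpy(dst + (y + i) * dst_pitch,
                   scratch[cur] + i * row_bytes, row_bytes);

        y += lines;
        cur ^= 1; /* alternate between the two scratch buffers */
    }
    return 0;
}
```

Every byte gets moved twice here, once by the blit and once by the memcpy, which is only worth it when the blit is doing something the CPU can't do cheaply.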

However this sucks when you think about the use cases:

1. backing BO for the pixmap is actually in VRAM - not a bad solution
2. backing BO is actually in GTT and wasn't in VRAM - really crappy solution - we end up doing a load of GTT->GTT blits followed by memcpys.
3. backing BO never used in hw - worse still, we bind it to GTT, blit it, then copy it.

So clearly I was deluded when I wrote the initial implementation... however it led me to wonder what the right answer is and where in the stack to effectively code it.

What we have now is EXA -> driver -> driver bufmgr -> kernel layers. The buffer manager abstracts away the kernel internals and lets us just reference buffer objects in the driver code without worrying about where they are located. The bufmgr and kernel don't have any acceleration functionality, except that the kernel has fast buffer copy and buffer clear functions, used for moving buffers around for eviction etc.

So the problem is that the driver bufmgr and everything above it really don't know where a buffer object is currently located. Only the kernel knows for certain, as objects may be evicted by the kernel on a whim, after a suspend/resume, etc.

So ideally the driver would be able to say: this pixmap is in VRAM, so I'll use the cool method, but if it's in GTT or was never in hw, I'll just memcpy it and save the overhead of doing the other stuff. However, since it doesn't know where the buffer is currently, it can't really do this. I suspect I might need a kernel interface to optimally copy data to a userspace pointer from wherever the buffer is currently located, but that smacks of having more acceleration in the kernel, especially since DFS allows for rectangular blits. If for future cards like r600 I don't have a 2D engine to do this with, I don't really want a load of 3D engine code in the kernel.
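Here is a rough sketch of what that ideal dispatch would look like if the driver could somehow ask where the BO lives. Nothing here is a real interface: `enum bo_placement`, `dfs_choose_path` and `dfs_memcpy_rect` are hypothetical names, and the memcpy path assumes the BO is mapped linearly (untiled) into the CPU's address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical placement values. The whole problem is that userspace
 * cannot reliably know this today; only the kernel does. */
enum bo_placement { BO_NEVER_IN_HW, BO_IN_GTT, BO_IN_VRAM };

enum dfs_path { DFS_PATH_MEMCPY, DFS_PATH_BLIT };

/* Only a VRAM-resident BO justifies the scratch-buffer blit dance;
 * for GTT or never-bound BOs a plain CPU copy is cheaper. */
static enum dfs_path dfs_choose_path(enum bo_placement where)
{
    return where == BO_IN_VRAM ? DFS_PATH_BLIT : DFS_PATH_MEMCPY;
}

/* CPU path: copy a w x h pixel rectangle at (x, y) out of a mapped,
 * linear pixmap. `cpp` is bytes per pixel; pitches are in bytes. */
static void dfs_memcpy_rect(const uint8_t *map, size_t src_pitch, size_t cpp,
                            size_t x, size_t y, size_t w, size_t h,
                            uint8_t *dst, size_t dst_pitch)
{
    const uint8_t *src;

    assert(map != NULL && dst != NULL);
    src = map + y * src_pitch + x * cpp;
    for (size_t row = 0; row < h; row++)
        memcpy(dst + row * dst_pitch, src + row * src_pitch, w * cpp);
}
```

The rectangle handling is the awkward part: even the "just memcpy it" path is a per-row copy with two pitches, which is exactly the kind of logic I'd rather not grow inside a kernel interface.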

So for now I think I'll just pull more of my hair out.