November 1st, 2007



So I posted last week to jinx myself and I certainly did: the 945 performed as expected, which led to further head scratching, hair pulling and numerous other indescribable acts.

So I found out that Intel chipsets have a Global Write Buffer (GWB) where writes can be held. The integrated GPU doesn't participate in any sort of coherency with it, so writes can hide in this buffer before they become visible to the GPU, particularly if we are using cacheable CPU mappings.

So I had to spend a bit of time with Intel getting info out of them on this GWB and how to flush its ass to memory, so I have this figured out for 915->965, and will need to check on the older hw...

Just in case anyone is wondering what I was doing:

So the integrated graphics chipsets have access to main memory via the GART, and pages in the Intel GART can be snooped (cache coherent) or unsnooped (main memory). However, the snooped pages are restricted in their uses, and I had a lot of crashes with them even within what the docs claimed were valid use cases (particularly with the blit engine). So I wanted to avoid using these snooped objects but still retain the cached memory option. Using uncached memory sucks, as you have to allocate a big chunk of RAM, change all the page table bits and flush the TLB on all processors, which causes an interprocessor interrupt, which causes a stall, which causes pain, slowness and bad things... Trying to do things with smaller objects and uncached memory ends up in large overheads from IPIs on SMP machines. However, with a lot of objects we aren't sure at startup whether we want them cached or uncached (using uncached objects from the CPU sucks really badly; using cached objects from the GPU sucked worse).

So I added support for objects that are cached on the CPU but uncached on the GPU, with the kernel flushing the CPU caches and, once I discovered their existence, the chipset buffers as well. I've heard certain AMD CPUs could cause me trouble (this is an Intel-only solution so far, as the docs for certain chipset aspects were very necessary), and I'm hoping Intel take note and don't implement something as bad as the AMD aggressive caching stuff.
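To make the ordering concrete, here is a minimal sketch of the hand-off to the GPU: flush every CPU cache line the object covers, then flush the chipset write buffer. Everything named here (`flush_ops`, `flush_for_gpu`, the counting stub hooks) is hypothetical and only for illustration; it is not the actual kernel code, and the real chipset flush is hardware-specific and not shown. A real implementation would presumably also need memory barriers around the cache-line flushes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks. A real driver would plug in a CPU cache-line flush
 * and a chipset-specific write-buffer flush here. */
struct flush_ops {
    void (*flush_cpu_line)(const void *addr);
    void (*flush_chipset_buffer)(void);
};

/* Stand-in hooks that only count calls, so the sketch runs anywhere. */
static unsigned cpu_lines_flushed;
static unsigned chipset_flushes;

static void stub_flush_cpu_line(const void *addr)
{
    (void)addr;
    cpu_lines_flushed++;
}

static void stub_flush_chipset_buffer(void)
{
    chipset_flushes++;
}

/* Flush every cache line overlapping [addr, addr + len), then flush the
 * chipset buffer once. Returns the number of cache lines flushed. */
static size_t flush_for_gpu(const struct flush_ops *ops, const void *addr,
                            size_t len, size_t line_size)
{
    /* line_size must be a non-zero power of two */
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    if (len == 0)
        return 0;

    /* Round the start down to a cache-line boundary. */
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(line_size - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;

    for (uintptr_t p = start; p < end; p += line_size) {
        ops->flush_cpu_line((const void *)p);
        lines++;
    }

    /* CPU caches first, chipset buffer second: the order described above. */
    ops->flush_chipset_buffer();
    return lines;
}
```

The point of the sketch is just the order: a write has to leave the CPU cache before flushing the chipset buffer can push it out to memory.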

Why did Red Hat pay me to do this? We require fast shareable objects for compositing desktop pixmaps to talk to compiz.

What about non-integrated GPUs? Well, AGP will cause more fun, as AGP is also not cache coherent. PCI and PCIE GPUs are meant to be cache coherent, so we should be able to avoid a lot of the issues on those.

What about AMD integrated GPUs? I've no idea, and considering how much effort Intel have put in helping me, I'm guessing it would require a good bit of help from their engineering ppl.

glxgears on the powerpc g5 + nouveau

So this happened earlier

07:03 < IronPeter> hi.
07:04 < IronPeter> I have stable vertex and fragment processing on RSX. Both in
big and little endiannes.
07:04 < darktama> what was the fix?
07:05 < IronPeter> the data for frag progs must be prepared in ths next way: to
words in dword must be swapped.

So darktama put the code into the DDX for nv40 composite on powerpc, which we had disabled, and I tested it on getting home this evening and it worked!!

So I added code to Mesa to do the same thing for nv40 fragment programs, and gears suddenly got all its colors!!!
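For anyone wondering what the fix looks like, the swap IronPeter describes (exchanging the two 16-bit words inside each 32-bit dword) can be sketched roughly like this. The function names are made up for illustration and this is not the actual DDX or Mesa code; whether the swap should depend on host endianness is also left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the upper and lower 16-bit words of one 32-bit dword. */
static inline uint32_t swap_words32(uint32_t v)
{
    return (v << 16) | (v >> 16);
}

/* Hypothetical helper: copy fragment program data into an upload buffer,
 * swapping the two 16-bit words of every dword on the way. */
static void upload_fragprog_swapped(uint32_t *dst, const uint32_t *src,
                                    size_t ndwords)
{
    assert(ndwords == 0 || (dst != NULL && src != NULL));

    for (size_t i = 0; i < ndwords; i++)
        dst[i] = swap_words32(src[i]);
}
```

Note this is a word swap, not a full byte swap: the bytes inside each 16-bit half keep their order.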

Bigups to the PS3 RSX project who despite the best attempts of the hypervisor are getting some basic 3D stuff going on the PS3 nvidia chip :-)