airlied

So we've had an -ati DDX with KMS support in a branch for quite a while, but it was starting to grow into a very big mess, and some of the hacks in it were quite unmaintainable.

So I started cleaning it up and pushing the bits to master.

Step 1 was adding macros in all the places in the accel code to abstract away the different command submission methods, and adding some ifs to do KMS specific things. Once this was done, in theory the accel code wouldn't functionally regress and wouldn't require any more changes.

Step 2 was bringing over the kms and DRI2 support files from the branch.

Step 3 was deciding between having if (kms) / if (!kms) blocks all over radeon_driver.c in all the various functions, or having a nearly completely separate KMS/DRI2 driver file. The original code took the if approach and it was an unmaintainable nightmare, so I opted for the second approach and it definitely is the best. At driver probe time in radeon_probe.c, I now do the KMS check if the pciaccess probe is called (kms without pciaccess is probably not going to matter). If KMS is supported, the driver picks a nearly completely different set of functions for PreInit/ScreenInit etc. I now have a separate radeon_kms.c file which has all the DDX interface in it. Of course any changes may need to be done in two places, but it's a lot cleaner than it was in the other codebase.
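To make the probe-time split concrete, here is a minimal sketch in Python rather than the driver's C. The table and function names are hypothetical stand-ins, not the real symbols in radeon_probe.c or radeon_kms.c; the point is only that the KMS decision is made once and selects a whole table of entry points, instead of being re-tested inside every function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ScreenFuncs:
    """Entry points the X server calls; the bodies here are stubs."""
    pre_init: Callable[[], str]
    screen_init: Callable[[], str]


# Hypothetical stand-ins for the legacy path (radeon_driver.c in the post).
LEGACY_FUNCS = ScreenFuncs(lambda: "legacy PreInit", lambda: "legacy ScreenInit")
# Hypothetical stand-ins for the separate KMS/DRI2 path (radeon_kms.c in the post).
KMS_FUNCS = ScreenFuncs(lambda: "kms PreInit", lambda: "kms ScreenInit")


def probe(pciaccess_probe: bool, kernel_has_kms: bool) -> ScreenFuncs:
    """Pick the whole function table once, at probe time."""
    if pciaccess_probe and kernel_has_kms:
        return KMS_FUNCS
    return LEGACY_FUNCS


funcs = probe(pciaccess_probe=True, kernel_has_kms=True)
print(funcs.pre_init())
```

The cost is the one the post already admits: a fix that applies to both paths has to be made in both tables.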

I also ported the KMS code to use the libdrm_radeon buffer management code which is shared with mesa, instead of the DDX having its own buffer manager code base. This code is well tested via mesa, however it hasn't got all the features/optimisations that I've added to the DDX bufmgr over the last while.

So what's left:
Missing optimisation from old buffer manager:
1. buffer in VRAM? Has the buffer ever explicitly been validated to VRAM? This allows for an optimisation on download from screen.
2. Download from screen: with driver pixmaps we don't know if the buffer is in VRAM or GTT. With (1) we can either blit if it is in VRAM, or just memcpy if it is in GART.
3. force a buffer to validate in GTT, or force a buffer to stay in fast CPU access space - this was really useful for some sw fallbacks where a buffer would end up in VRAM and then get used by the CPU from there. It probably only really takes the place of fixing EXA properly so the pixmap scoring is separate from the offscreen memory, and having a XA that works with driver pixmaps a lot better.
4. bugs and crashes: I appear to be hitting a realloc crash at some point where glibc reenters itself and fails.
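Items 1 and 2 above boil down to one flag and one branch. A rough Python model, assuming a hypothetical `validated_to_vram` flag on the buffer object (the real bufmgr fields are not shown in this post), with the blit path reduced to a label:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BufferObject:
    data: bytearray
    # Hypothetical flag: set when the driver explicitly validates the BO to VRAM.
    validated_to_vram: bool = False


def download_from_screen(bo: BufferObject) -> Tuple[str, bytes]:
    """Return (path_taken, pixels); only the choice of path is modelled."""
    if bo.validated_to_vram:
        # Real driver: blit VRAM -> GTT scratch, then copy out of the scratch.
        return "blit", bytes(bo.data)
    # Never placed in VRAM, so CPU reads are cheap: plain copy.
    return "memcpy", bytes(bo.data)


print(download_from_screen(BufferObject(bytearray(b"abc")))[0])
```

The flag only records what the driver asked for; it says nothing about evictions the kernel did later, so it is a hint rather than the truth.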

So there is a kernel conference on in Brisbane next month, being run by Sun.

Now when this was announced initially it was proposed as an all kernel hackery type get together for folks in the region, no matter which kernel they cared about.

(I did propose to talk about kernel graphics at this and got refused - so maybe I'm just being bitchy, however this is my blog).

The 20 speakers break down as follows:
12 Sun
1 Intel - Sun manager
1 RH
1 OpenBSD
1 FreeBSD
4 misc.

2 of the misc talks are in some way OSOL related,

the OpenBSD talk is about networking and pf, the RH talk is about security, FreeBSD about storage.

Now really, if you aren't into OSOL or ZFS (3 slots are OSOL FS related), why would you go? This conference is local to me and I still couldn't justify paying the signup fee or taking the time to my manager at all. Now if one of the main kernel and X.org hackers who lives in Brisbane can't be bothered to go, I do wonder why anyone who isn't into OSOL kernels might be tempted.

There was talk of a .au unconference at one point, which maybe when the whole swine flu escapade is over might actually be a useful meetup for the aussie open source community.

Okay, radeon TTM/KMS has landed in Linus's tree under staging.

To enable it you need to enable CONFIG_DRM_RADEON_KMS, which relies on CONFIG_STAGING being set.

please read the CONFIG_STAGING warnings, esp the
"Please note that these drivers are under heavy
development, may or may not work, and may contain userspace
interfaces that most likely will be changed in the near
future."

Now to get a userspace that can use this code you need to get

git://git.freedesktop.org/git/mesa/libdrm master branch
and build it with --enable-radeon-experimental-api and install that.

git://git.freedesktop.org/git/xorg/driver/xf86-video-ati kms-support branch
build that second

git://git.freedesktop.org/git/mesa/mesa.git master branch
build this with libdrm_radeon somewhere that pkgconfig can find it.

You should either have KMS + DRI2, or a pile of smoking trash.

Please report any mode type issues on #radeon or dri-devel mailing list.

If you can't compile or configure your system to use this please wait until you have a distro do it for you.

If you are using Fedora 11, grab the latest xf86-video-ati from koji and all you need is the new kernel bits.

Known issues:
My DDX reports something about DRM 2.0.0 and wanting 1.2.x or something like that: you messed up setting up the DDX or are still using the system DDX.
Xv might be broken on resize (or normally)
r600/r700 doesn't work (no surprise, it's not ready yet)

So I merged the rewrite of the radeon/r200/r300 to the mesa master today.

On a non-KMS DRI1 system we are hoping these patches are mostly regression free in terms of lockups and possibly fix a fair few lockups thanks to Maciej and Jerome's work.

Of course on a KMS/DRI2 system, this work enables GL on r100-r500 families of radeon cards and we hope to add more features to the ones we gain. We should now at least have FBOs where we never did before (in DRI2/KMS mode).

R600 work is ongoing in a branch still so this announcement has no effect on r600 or above.

I've also sent Linus a drm pull request with all of Intels drm-intel-next branch + all the patches I had sitting in my queue.

Early next week I'll send him the KMS patches for radeon, which will be in the radeon driver but with the enable switch hidden in staging while we stabilise it, since it's such a huge amount of code. So don't expect miracles out of it anytime soon.

On a PCI ID table comparison, Intel KMS has 28 IDs to support and radeon KMS has ~350. Granted, this code has no memory manager for r600 and up yet (the kms code should all be there), so it's probably close to 200 GPU variants (+/- 50 for IDs not seen in the wild, maybe).

So Gia, Isabel and I are off to Ireland for 3 weeks from the 7th -> 31st of May.

Radeon KMS is all I've been doing lately and my feeling is the F11 driver is in a fairly good state, a lot better than F10. 3D seems to be working okay, however we have some speed regressions that I will be attempting to track down in earnest on my return. AGP cards got hit a bit lately, as a fix for a real bug slowed down an optimisation I previously made.

For mesa I'd really like to merge radeon-rewrite to master already, but I expect I should wait until I get back as I need to be more reactive to fixing regressions in it.

So I've been slowly getting radeon-rewrite into a useful state, and I diverted myself on Friday evening to take a quick look at FBOs on KMS/DRI2. I'd already laid down most of the basis for the FBO code so I thought it might not take too long to get working.

So today I merged the FBO code into the radeon-rewrite branch. It's not perfect but it's a good place to start, and it's only enabled for KMS/DRI2 setups of course.

So there are a few things left to get radeon-rewrite into master:

1. Fix hangs on r100/r200 that some people have reported. I'm not having a huge amount of success in reproducing this, but I'll try and give it a larger effort asap. It's hanging in the buffer manager aging code somewhere. It smells like a GPU hang but the GPU hasn't actually hung: killing the app and starting it again keeps things going.

2. r200 depth buffer is broken.

3. DRI2 single buffered rendering isn't drawing to the window where it should.

4. Have to run piglit regression tests on r100/r200/r300/r400/r500, under current mesa master, radeon-rewrite, and radeon-rewrite DRI2, with swtcl and hwtcl paths. So I think it works out at around 30 test cases to try, and then fix up the regressions!!

5. lots of other things I've forgotten.
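The "around 30" in item 4 is just the product of the three axes listed there (5 chip families, 3 trees, 2 TCL paths). A small Python sketch that enumerates the matrix, using the labels from the item as plain strings:

```python
from itertools import product

chips = ["r100", "r200", "r300", "r400", "r500"]
trees = ["mesa master", "radeon-rewrite", "radeon-rewrite DRI2"]
tcl_paths = ["swtcl", "hwtcl"]

# Every (chip, tree, tcl) combination is one piglit run.
test_matrix = list(product(chips, trees, tcl_paths))

for chip, tree, tcl in test_matrix[:3]:
    print(chip, tree, tcl)
print(len(test_matrix))  # 5 * 3 * 2
```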

Post merging, I need to add support for two things back:
1. Color buffer tiling under KMS/DRI2. Works fine under DRI1.
2. Texture tiling under *.

previous quote by radeonhd developer: "And this is real cutting edge stuff, we're doing things no-one else is doing"

mailing list today:
"A question the for the radeonhd developers: Why are you not using the register access routines provided in compiler.h for the purpose? By not using those you break portability."

These two things may not be equal.

In order to further Red Hat's commitment to KMS on radeons, I've been adding support for a bufmgr to all 3 radeon 3D drivers.

The bufmgr essentially abstracts the lowlevel buffer management from the state machine of the driver, so we can implement a simple userspace buffer manager which operates on the current system, and a memory managed buffer manager which operates on the GEM kernel API we defined for radeon.

Jerome Glisse and Nicolai Haehnle wrote the initial r300 bufmgr, texturing and mipmap tree code. Instead of trying to port it to r100/r200, I decided to take the opportunity to merge as much of the common code in the radeon/r200/r300 drivers as I could.

So armed with piglit to fight regressions and a lack of sleep, I started merging.

So I've pushed the results to the radeon-rewrite branch of mesa, it works for me on all the radeons I've played with in legacy mode.

My future plans for the codebase are:
1. get libdrm_radeon on modesetting-gem autodetected
2. Implement radeon/r200 userspace clear code. This is in-kernel at the moment but kms needs it out.
3. Play with DRI2 - should already be supported on KMS/GEM stack. I suspect I need to fix some state handling for this.
4. make it go fast on kms - work with compiz and a few games for F11.

Oh I forgot to mention the best bit:
100 files changed, 11289 insertions(+), 15487 deletions(-)

4000-5000 less lines of code ftw. I have a few more license headers in new files :)

(meant to post this pre-baby)

When I originally started to work for Red Hat I relocated to Brisbane so I could be based in an office and for other personal reasons. I was the first kernel/X.org/OS engineer based in the Brisbane office and the rest of the X.org team was based in Westford, MA mostly. I didn't think at the time this would change much, I mainly wanted to have the office as an option to work in as I can get demotivated working from home, changing scenery helps.

When RH moved offices in Brisbane last year, a new lab was built in the new office, and I got assigned a nice desk + rack + remote power + remote KVM over IP. This has proven really useful for doing development on all my different machines.

So mid-last year RH desktop team got the opportunity to hire Peter Hutterer who was based in Adelaide at the time, and we persuaded him to relocate to Brisbane once he finished a few things in Adelaide. He finally relocated a couple of months ago and is hacking lots on getting X.org input to a better place.

Late last year we also had a replacement position open up in the X group, and after some searching, Ben Skeggs, one of the top two nouveau developers (nvidia reverse engineering project) was interviewed and accepted the position. Ben was based in Tasmania, and agreed to relocate to Brisbane and started this week in the office. Hopefully this will mean good things for Fedora + nouveau development.

It's nice to have a team in one place for a few reasons,
a) people to talk technical to in the same timezone!!
b) easier to keep a hw library in one place. Since we mainly do hardware enablement on Intel/AMD/nvidia hw, having access to as many cards/systems as possible in one place is very winning. With more people in the office we can work on getting more hw directly shipped here and build up a better development environment.
c) talking technical can involve drinking.
d) easier to attract more people to the sunniest place to work :)

update from between the baby naps :)

So back in 2002 while trekking in Nepal I got a dog bite from a possibly rabid dog, and hilarity ensued (well, actually scariness). Recently the story got picked up again by an Irish medical communications company, and has now appeared in the Irish Evening Herald and Longford Leader, with at least one other paper possibly running an article.

http://www.herald.ie/national-news/city-news/warning-after---irish-trekker-in-rabies-ordeal-1587917.html
http://www.longfordleader.ie/news/Longford-man-relives-rabies-horror.4848488.jp

update: rapid->rabid :-)

Isabel was delivered this morning, at 11:54am, at 8lb13oz (4.07kg).

I'll be slow on anything non-baby related for a while :)

AMD today pushed the initial code to support acceleration on the r600/r700 range of GPUs.

This consists of r6xx-r7xx-support branches in the drm, radeonhd and a new r600_demo repo.

This code is really only for developers at this point but it's great to see AMD finally get things lined up to allow this code to be released.

I've only been barely involved in the r600 code so far, I wrote the original drm over a few days and handed it over to AMD to continue on with, as I wanted to concentrate on the kms work. Hopefully I can get some time to look at it over the next while (yeah right, new baby still not here).

So I sent a drm pull request that includes the kernel modesetting core + intel i915 driver supporting it.

This is a major milestone for a project I started working on in a previous job, and I barely remember burning through the initial code for the initial prototype in a week of little sleep.

To enable the code you need to set CONFIG_DRM_I915_KMS. This isn't enabled by default, as we don't have a userspace that supports it available yet for general consumption. If you enable kms now, you will more than likely get a broken X for your trouble, as the kernel drivers aren't compatible with having userspace drivers trample the hardware.

So where is ATI at?

Although we are shipping radeon code in F10, the code is based on the TTM memory manager, which isn't really in an upstreamable state in its current form. Hopefully a newer TTM codebase might become available that can be used upstream. If that doesn't happen, we might rearchitect the core memory manager code of the radeon system now that we have the API mostly proven. So I'm not sure when we will upstream it; it all depends on how much time I can work on it.

Baby status: due today, no sign yet, will severely impact amount of time I spend on this stuff :)

So we merged r500/r600 TV-out support via atombios, but we haven't enabled it by default.

You need to add

Option "ATOMTvOut" "true" to xorg.conf to enable it.

We'll hopefully get to fix up some of the remaining corner cases and issues or enable it at least on a card by card basis.

Okay I've just spent a few days doing output enablement on -ati, since I got an rv730. I decided to try and get tv-out up and going again.

I've pushed a branch to the main xf86-video-ati repo called "atom-tvout"

Please test this on any r500 -> rv7xx cards if you are interested in TV-out support on this range in open source drivers.

It mostly uses atombios to set the cards up, with one exception for the scaler on the R500 cards which I'm discussing with ATI.

I've also added and tested DCE3.2 output support to master in the past day or two, I just need to add the official PCI IDs for all the rv7xx variants.

As Alex mentioned http://www.botchco.com/agd5f/?p=33 the radeon community developers have been tearing *pardon the pun* along making radeon do cool things on top of the cool things it already does!!

Alex came up with an idea a while back to try and get tear free EXA and Xv rendering. Tear free Xv means we hopefully can get textured video on (r100->r500) without ugly tears in the middle. Pierre Ossman took on the proof-of-concept Alex hacked together and spent some time making it all work properly. While he was there he picked up the r3xx/r4xx bicubic shader two of the other developers (Maciej and Corbin) had started but never had time to finish debugging, and pushed it forward so it works on those chips.

It's always good to know there are developers out there scratching itches who, with a small bit of input from Alex or myself, can push the project to do what they want. There have been many accusations levelled at the radeon codebase over the years but the number of really useful community contributions has risen a lot lately, so it can't all be bad!!

So if you want tearfree Xv video on your r300/r400/r500, radeon is now the only driver that can provide it.

No we don't have any r600 support for Xv yet, stay tuned hopefully :)

Okay I'm hoping I've found the bug that was causing KMS + r300 AGP systems to be so buggy.

I've just found a couple of bugs in the AGP codebase that might mean I can reenable AGP mode by default instead of the PCI fallbacks. You can try radeon.agpmode= 1,2,4,8 to enable AGP speeds, or -1 to fall back.

I've just kicked kernel-2.6.27.7-132.fc10 into koji so it might help a few people.

Also AGP r500/r600 cards are an abomination, they are just a PCIE chip with an AGP->PCIE bridge on them. The old AGP fallback code used to hit PCI fallbacks, but it should use PCIE codepaths on these chips by the looks of it.

So I've also fixed that.

So I've been going over the accel in my radeon EXA code, particularly how we relate to buffer management.

One thing I noticed, is I've implemented a really sub-optimal DFS.

DownloadFromScreen is meant to accelerate reading back pixmaps from offscreen. However with driver managed pixmaps, all pixmaps are considered to be "offscreen". Now my great implementation created two scratch buffers, and executed blit commands moving parts of the pixmap into each scratch and copying it into the supplied destination pointer.

However this sucks when you think about the use cases:

1. backing BO for the pixmap is actually in VRAM - not a bad solution
2. backing BO is actually in GTT and wasn't in VRAM - really crappy solution - we end up doing a load of GTT->GTT blits followed by memcpys.
3. backing BO never used in hw - worse, we bind it to GTT, blit it, copy it.

So clearly I was deluded when I wrote the initial implementation... however it led me to wonder what the right answer is and where in the stack to effectively code it.

What we have now is EXA->driver->driver bufmgr->kernel layers. The buffer manager abstracts away the kernel internals and lets us just reference buffer objects in the driver code without worrying about where they are located. The bufmgr/kernel don't have any acceleration functionality, except the kernel has fast buffer copy and buffer clear functionality, used for moving buffers around for eviction etc. So the problem is the driver bufmgr and above really don't know where a buffer object is currently located; only the kernel knows for certain, as objects may be evicted by the kernel on a whim or after a suspend/resume etc.

So ideally the driver would think: that pixmap is in VRAM, I'll use the cool method, but if it's in GTT or was never in hw I'll just memcpy it and save the overhead of doing the other stuff. However since it doesn't know where the buffer is currently, it can't really do this. I'm suspecting I might need a kernel interface to optimally copy data to a userspace pointer from wherever the buffer is currently located, but that smacks of having more acceleration in the kernel, esp since DFS allows for rectangular blits. If for future cards like r600 I don't have a 2D engine to do this with, I don't really want a load of 3D engine code in the kernel.

So for now I think I'll just pull more of my hair out.

1. whatever corner case you never expect could ever happen - it will 5 secs after release (this is a generic sw lesson)

2. X does rendering without the vtSema, including hw calls. So if you invalidate the 3D state flags in EnterVT it's too late, X has already sent commands to the card without resending the state. So invalidate your state in LeaveVT as well.

3. Kernel memory management is a messy problem. The GPU has a finite amount of addressable memory it can use. On modern GPUs, this is either a single GART (like Intel) or VRAM + GART (like everyone else). So userspace applications like X or 3D apps submit command buffers to the kernel, and along with a single command buffer there is a list of referenced data buffers. These data buffers can be pixmaps src/dst/mask, textures, vbos, fbos, whateva. The userspace gives the kernel this list along with acceptable placement parameters (the GEM API uses a set of read domains and a single write domain). If the buffer is to be written to it needs to end up in the write domain; for reading it might be acceptable in a few places. So on my radeon driver, for example, it is acceptable to read the buffer from either GART or VRAM, but only write to VRAM buffers.

So when the kernel gets this list of buffers it tries to fit them in as well as it can. Now if the kernel can't fit these buffers in, we are in a bind. The naive person would just say fall back to software, and I spit on them. Because building the command buffers happens over time, the operation list and codepaths that generated the command stream have all already run. So we can't just go back and redo the operations in software fallbacks. So we run into the problem that userspace cannot reference more buffers in the command stream than the kernel can relocate into memory at once.

Rule 1: The kernel cannot fail to complete the command stream under any circumstances.

The two ways around this are to insert places in the command stream where it's legal to break it up, or to have userspace flush the command stream when it hits a limit on the buffers.

For radeon I've done the latter. At startup the kernel tells X the amount of dynamic VRAM and GART it has to play with.

Then before each operation in the 2D driver, I sum up all the buffers it references for read and write, and then compare the write with the amount of VRAM and read with the amount of GART. If a single operation cannot fit at all in VRAM/GART, then sw fallback it. If the current op + total of ops in the list doesn't fit, flush before the current op and try again.
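The check described above can be modelled in a few lines. This is a simplified Python sketch, not the DDX code: the class and field names are made up for illustration, and it sums sizes per operation without noticing when two operations reference the same buffer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    read_bytes: int   # total size of the buffers this op reads
    write_bytes: int  # total size of the buffers this op writes


@dataclass
class CommandStream:
    vram_limit: int   # dynamic VRAM the kernel reported at startup
    gart_limit: int   # dynamic GART the kernel reported at startup
    pending: List[Op] = field(default_factory=list)
    flushes: int = 0

    def flush(self) -> None:
        """Hand the queued ops to the kernel (modelled as a counter)."""
        if self.pending:
            self.flushes += 1
            self.pending.clear()

    def submit(self, op: Op) -> str:
        # An op that can never fit on its own goes to software.
        if op.write_bytes > self.vram_limit or op.read_bytes > self.gart_limit:
            return "sw_fallback"
        reads = sum(o.read_bytes for o in self.pending)
        writes = sum(o.write_bytes for o in self.pending)
        if (writes + op.write_bytes > self.vram_limit
                or reads + op.read_bytes > self.gart_limit):
            self.flush()  # flush before the current op, then queue it
        self.pending.append(op)
        return "queued"


MB = 1 << 20
cs = CommandStream(vram_limit=20 * MB, gart_limit=64 * MB)
print(cs.submit(Op(read_bytes=4 * MB, write_bytes=12 * MB)))
```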

Now this works up until it falls over in a heap.

Why? Well firstly, if a buffer was just written to in a previous cycle, it will be in VRAM. However X doesn't know or care where the buffer lives, so when it submits a read for it, it totals it against the GART read space, and when the kernel goes to fit everything in, it has already left that buffer in VRAM, taking away from the write space. So to solve that I have the kernel do two passes: it validates all the writeable buffers first (which, if there is not enough space, will kick out the readable buffer), then validates all the readable buffers (which will pull it back into the GART).
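Here is a toy Python model of that two-pass idea, under heavy assumptions: placement is tracked by total size only (no offsets, so no fragmentation), only buffers in the current list exist, and every name is invented for the sketch rather than taken from the kernel code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Buf:
    name: str
    size: int
    written: bool                    # True if the command stream writes to it
    placement: Optional[str] = None  # "vram", "gart" or None (not resident)


def used(bufs: List[Buf], where: str) -> int:
    return sum(b.size for b in bufs if b.placement == where)


def two_pass_validate(bufs: List[Buf], vram_size: int, gart_size: int) -> bool:
    # Pass 1: writable buffers must land in VRAM; evict read-only residents if needed.
    for b in [b for b in bufs if b.written]:
        if b.placement == "vram":
            continue
        for victim in [v for v in bufs if not v.written and v.placement == "vram"]:
            if used(bufs, "vram") + b.size <= vram_size:
                break
            victim.placement = None
        if used(bufs, "vram") + b.size > vram_size:
            return False
        b.placement = "vram"
    # Pass 2: read-only buffers are fine where they are; evicted ones go to GART.
    for b in [b for b in bufs if not b.written]:
        if b.placement is None:
            if used(bufs, "gart") + b.size > gart_size:
                return False
            b.placement = "gart"
    return True


src = Buf("src", 8, written=False, placement="vram")  # left in VRAM by an earlier write
dst = Buf("dst", 16, written=True)
print(two_pass_validate([src, dst], vram_size=20, gart_size=32), src.placement, dst.placement)
```

In the example the 8MB source would block a 16MB destination in 20MB of VRAM, so pass 1 evicts it and pass 2 brings it back in GART.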

So this works great until it falls over in a heap.

So you submit the kernel a load of command streams over time, and VRAM gets a bit fragmented. Say you have 20MB of dynamic VRAM, laid out as 5MB of 1MB buffers, then 10MB free, then 5MB of 1MB buffers. The next command stream references 13MB in 3 buffers (1MB, 1MB, 11MB), and the two 1MB buffers are already resident just before the 10MB hole and just after it. So the command stream validates those two buffers at those two points, then goes to validate the 11MB buffer, and goes all wtf? and it fails. See Rule 1.

So now I needz defragmentation. Simple defragmentation is to kick out all the dynamic buffers from VRAM and revalidate them in order so they all fit in. It's messy but it should mean I get a system that doesn't violate Rule 1.
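The 20MB example and the kick-everything-out fix can be replayed with a tiny first-fit allocator. This is a Python model in whole megabytes with invented names (A and B are the two resident 1MB buffers, C is the 11MB one); it only illustrates the failure and the recovery, not the real kernel code.

```python
from typing import Dict, List, Optional, Tuple

VRAM_MB = 20
Layout = Dict[str, Tuple[int, int]]  # name -> (start offset, size), in MB


def first_fit(size: int, resident: Layout) -> Optional[int]:
    """Lowest offset where `size` MB fits between the resident ranges."""
    cursor = 0
    for start, length in sorted(resident.values()):
        if start - cursor >= size:
            return cursor
        cursor = max(cursor, start + length)
    return cursor if VRAM_MB - cursor >= size else None


def validate(wanted: List[Tuple[str, int]], resident: Layout) -> Optional[Layout]:
    """Validate buffers in order; already validated buffers are never moved."""
    resident = dict(resident)
    pinned = set()
    for name, size in wanted:
        if name not in resident:
            # Evict whatever this command stream has not pinned yet.
            resident = {n: r for n, r in resident.items() if n in pinned}
            offset = first_fit(size, resident)
            if offset is None:
                return None  # would violate Rule 1
            resident[name] = (offset, size)
        pinned.add(name)
    return resident


def validate_with_defrag(wanted: List[Tuple[str, int]], resident: Layout) -> Optional[Layout]:
    placed = validate(wanted, resident)
    if placed is None:
        placed = validate(wanted, {})  # kick everything out, revalidate in order
    return placed


# 5MB of 1MB buffers, 10MB free, 5MB of 1MB buffers; A and B border the hole.
layout: Layout = {f"o{i}": (i, 1) for i in range(4)}
layout.update({"A": (4, 1), "B": (15, 1)})
layout.update({f"o{i + 4}": (16 + i, 1) for i in range(4)})
wanted = [("A", 1), ("B", 1), ("C", 11)]

print(validate(wanted, layout))
print(validate_with_defrag(wanted, layout))
```

With A pinned at 4MB and B at 15MB the largest hole is 10MB even after evicting everything else, so C fails; after the full kick-out the three buffers pack from offset 0 and fit.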

Now I'll probably like to think about inserting scheduling points in the stream, but I'm not sure how well that wins, and whether I'd still have to worry about the fragmentation issue somehow.

Of course per-context GART tables and page table addressable VRAM make all of this stuff a lot easier, or at least push the problem out to a much harder to hit boundary, however see the first point I made.

So if you are ever in the enviable position of writing a kernel memory manager for a graphics card, allow me to buy you some spirits.

So we've had this bug on modesetting on radeon for a while where random strings of glyphs in a pixmap were getting corrupted when scrolling in xchat or gnome-terminal or firefox etc... I've been looking into it on and off and keeping it in the back of my mind trying to figure out what could be causing it.

So I came into the office on Friday feeling a bit tired and decided to just go and do some random hacking on the radeon code that I'd contemplated for a while. Radeon GPUs have a memory controller which gives a 32-bit space to the GPU consumers, and in which you map the VRAM and GART. However accesses to the address space that fall outside the VRAM/GART mappings end up getting passed through to the PCI DMA controller and accessing main memory at that address. On the PCI/AGP variants this feature cannot be disabled, however on the PCIE variants you can discard and error on accesses outside the GART/VRAM. I've wanted to enable this for a while but never found the time.

So I patched my kernel to enable it and straight away on starting X in kms, I got an NMI and the GART logged an error at address 0 being accessed for a read. This intrigued me as I was sure it wasn't on purpose. I added a bunch of debugging to track it down to a buffer migration into VRAM. Upon migrating an untouched buffer to VRAM I was doing a solid fill using the 2D engine to 0 the VRAM area. This is done using a radeon type-3 CP packet called PAINT MULTI (it's explained in the r500 accel docs). The fill operation of course is a destination fill so it shouldn't try to read from any source, and I wasn't specifying any source in the packet. However clearly it was reading from 0, which was causing the NMI to trigger. The packet format looked fine, and eventually I got to reading the ROP3. ROP3s are raster operations defined for 2D accel, mostly from the Windows GDI days (at least I still use MSDN to get at the defines). The packet was set to ROP_S, which is a source copy ROP, not the solid pattern ROP I wanted. Changing it to ROP_P gave me the solid pattern fill ROP and got rid of my NMI, and X worked fine with the out of bounds checking enabled.
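For anyone who hasn't met ROP3s: the 8-bit code is a truth table indexed by the (pattern, source, destination) bits. The Python sketch below uses the GDI code values (0xCC for source copy, 0xF0 for pattern copy); how those bits are packed into the radeon packet is not shown here and the function name is made up.

```python
def rop3(code: int, pattern: int, source: int, dest: int, bits: int = 8) -> int:
    """Apply an 8-bit ROP3 code bitwise to pattern/source/destination values."""
    result = 0
    for bit in range(bits):
        p = (pattern >> bit) & 1
        s = (source >> bit) & 1
        d = (dest >> bit) & 1
        index = (p << 2) | (s << 1) | d  # row of the truth table
        result |= ((code >> index) & 1) << bit
    return result


ROP_S = 0xCC  # result = source (GDI SRCCOPY)
ROP_P = 0xF0  # result = pattern (GDI PATCOPY)

fill_colour = 0x00  # the pattern: what the fill is supposed to write
stray_source = 0xAB  # stands in for whatever happened to live at address 0
old_dest = 0xFF

print(hex(rop3(ROP_S, fill_colour, stray_source, old_dest)))  # depends on the source
print(hex(rop3(ROP_P, fill_colour, stray_source, old_dest)))  # depends only on the pattern
```

With ROP_S the "fill" is really a copy from an unspecified source, which is why the engine went reading address 0; with ROP_P the source never enters the result.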

I then noticed that my transient glyph corruption had disappeared, as sometimes the source must have had data at 0 and it was introducing crap in to the VRAM instead of zeroing it. So it just goes to show sometimes slacking off and not trying to fix bugs is the best way to fix bugs!!
