You are viewing jglisse

Weeks in the life of a GPU driver developer

Sometimes you feel the need to cry out loud about how painful GPU driver development is. Over the last few weeks I have been working on the DP->VGA bridge (named nutmeg) for AMD A-series integrated GPUs. Up until now the VGA output of anything using this nutmeg bridge never lit up. Of course all the documentation we had told us that we were doing things properly, but when the monitor desperately stays black, you have to face the truth: it doesn't work.

So I went on a journey; when working on GPUs, a journey always turns into some kind of heroic quest. But first, a bit of background on how you get a picture on the monitor:

framebuffer -> crtc -> encoder -> transmitter -> connector -> monitor

Of course things can go wrong in any or all of those stages (see  if you want a longer description). DisplayPort adds more fun to the mix. The main idea behind DisplayPort is that you have to train the link between the source, your GPU, and the sink, usually your monitor but in this case the nutmeg bridge that converts DP to VGA. Link training is one of those things that never goes quite right. It's as if it was designed to be able to fail in an unlimited number of ways; creativity in failure is a slogan that would fit DisplayPort. Don't get me wrong, I prefer DisplayPort over DVI or HDMI any time of the day. DisplayPort is almost perfect, but the people designing it must have had a bad day when they came up with link training.

So DisplayPort link training between the integrated GPU and the nutmeg VGA bridge was failing reliably for me. As usual I spent a few days trying all sorts of incantations to make it work, bending the DisplayPort specification in every possible way, interpreting it in the most illogical and backward ways. I was brave and discarded no possibility.

Because a GPU never surrenders to your will easily, all my attempts at fixing the link training were in vain. So it was time for me to go look at what the good old VESA BIOS was doing, because the hard truth is that VESA was successful at bringing up VGA. Running the VESA BIOS in an emulator and catching register reads/writes was something we usually did back in the day, so we already have a tool for that. But nowadays register reads/writes are too verbose: there are too many of them and it's way too tedious to figure out why a given register has a given value.

Before going further I must quickly describe atombios. The poor folks at AMD who work on the video BIOS decided to come up with a simple language (the opcodes are almost limited to jump, nop, delay, mov, test, add, sub, shift, mask) that could be reused by the driver to perform common tasks that an OEM might need to tweak for its specific design. You see, each OEM has some freedom in what kind of components it pairs the GPU with. For instance, different OEMs can choose different DDR chips; each chip has its own timings and needs its own special initialization. For the sake of simplicity, anything that is specific to an OEM should be hidden behind atombios data or code. Why not use x86 directly? Well, nowadays running x86 real mode code can be tricky, and x86 is not the only beast in town. Atombios offers a simple language for which one can easily write an interpreter.
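
To give an idea of how little such an interpreter needs, here is a minimal sketch in Python. The opcode names, the tuple encoding and the register names are all made up for illustration; they do not match the real atombios bytecode, which is a packed binary format.

```python
# Toy bytecode interpreter. Opcodes and encoding are invented for
# illustration and are NOT the real atombios format.

def run(program, regs=None, max_steps=10_000):
    """Execute a list of (opcode, *operands) tuples and return the registers."""
    regs = dict(regs or {})
    pc = 0
    steps = 0
    while pc < len(program):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded (possible infinite loop)")
        op, *args = program[pc]
        pc += 1
        if op == "nop":
            pass
        elif op == "mov":    # mov reg, immediate
            regs[args[0]] = args[1] & 0xFFFFFFFF
        elif op == "add":    # add reg, immediate
            regs[args[0]] = (regs.get(args[0], 0) + args[1]) & 0xFFFFFFFF
        elif op == "sub":    # sub reg, immediate
            regs[args[0]] = (regs.get(args[0], 0) - args[1]) & 0xFFFFFFFF
        elif op == "shl":    # shift reg left by immediate
            regs[args[0]] = (regs.get(args[0], 0) << args[1]) & 0xFFFFFFFF
        elif op == "mask":   # keep only the bits set in the immediate
            regs[args[0]] = regs.get(args[0], 0) & args[1]
        elif op == "jnz":    # jump to program index if reg is non-zero
            if regs.get(args[0], 0) != 0:
                pc = args[1]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs


program = [
    ("mov", "r0", 3),     # loop counter
    ("mov", "r1", 1),
    ("shl", "r1", 1),     # index 2: r1 <<= 1
    ("sub", "r0", 1),
    ("jnz", "r0", 2),     # loop back while r0 != 0
    ("mask", "r1", 0xF),
]
print(run(program))
```

A real interpreter also has to read and write GPU registers and video BIOS data tables, which is where all the interesting behaviour lives; the sketch only shows the dispatch loop.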

One of the things that can be, and should be, shared is the modesetting code. This code might need some special tweaks by each OEM depending on the voltage and frequency of the board, and possibly an associated external bridge. So atombios is the perfect place for the modesetting code, and it allows the video BIOS and the driver (whether closed or open source) to reuse the same code.

So here I am: I don't want to trace the register reads/writes of the VESA BIOS, but I would like to trace its atombios execution. You see, the x86 VESA BIOS is almost nothing more than an atombios interpreter that calls and executes various atombios functions. What is interesting to know is the arguments the VESA BIOS gives to those atombios functions. Because if there is a difference between the open source driver and the VESA BIOS, it must be in the arguments to an atombios function, or in an atombios function we don't call, or in one we should not call.

So I needed to trace the VESA atombios execution and catch the arguments it gave to the atombios functions. The thing about the video BIOS is that there are no symbols; it's just pure x86 real mode assembly. Nevertheless there must be an entry point to the atombios interpreter in the VESA BIOS, and the entry point offset will be the target of some of the x86 call opcodes. So here I am editing x86emu, tracing every call opcode and printing the register values and the stack at each call. You would think that a video BIOS is small and that there aren't many functions in it, but of course that's not the case. I ended up with several thousand calls, and among those about 400 different functions (offsets). That's where you know you need to think a bit and find a way to beat the machine.

I went over all the atombios function arguments and looked for some whose values I could easily guess, for instance the video mode size. One more thing about atombios functions: each is identified by a unique index. A given function has a fixed index, and you look up its offset inside a table in the video BIOS using that index. The atombios interpreter entry point must take as arguments the index of the atombios function and the arguments to that function. Now I knew what kind of numbers I was looking for. After a few (if you ever wanted to meet a euphemism, you now have) false positives I finally spotted the interpreter offset and identified on the stack the arguments to the atombios function.
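
The search for the entry point can be sketched roughly like this in Python. The trace format (one call target offset plus a list of stack words per call) and all the numbers are assumptions made for the example, not what x86emu actually prints, and in reality the values may be packed inside larger words rather than sitting alone on the stack.

```python
from collections import defaultdict

# Hypothetical trace format: one (call_target_offset, stack_words) tuple
# per x86 call executed by the emulator.
def candidate_entry_points(trace, expected_values):
    """Return call targets whose stacks showed every expected value."""
    expected = set(expected_values)
    seen = defaultdict(set)
    for target, stack in trace:
        for value in expected:
            if value in stack:
                seen[target].add(value)
    # A target that saw all the guessed values at some point is a
    # promising candidate for the interpreter entry point.
    return sorted(t for t, values in seen.items() if values == expected)


# Made-up trace: we guess a 1024x768 mode must show up in some argument.
trace = [
    (0x1A2B, [0x0005, 1024, 768]),
    (0x0F00, [1024, 0x0000]),
    (0x1A2B, [0x0007, 0x0001]),
    (0x2C00, [0x0000, 0x0001]),
]
print([hex(t) for t in candidate_entry_points(trace, {1024, 768})])
```

This only narrows the list down; each surviving candidate still has to be checked by hand, which is where the false positives come from.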

It wasn't all downhill from that point. I now had a full trace of VESA setting a mode with atombios. I first replayed the trace to make sure that I had captured things properly and that I got everything, and of course it worked. But this trace was way too big; it had too many calls to too many atombios functions. I had to trim it down, and so I did. This is the usual routine: start from a non-working condition, remove some atombios function call from the trace, and run it; if it still brings up the VGA output, that atombios call is not crucial and can be dropped. Well, you get it: refine and repeat over and over until you come up with a minimal trace.
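
The trimming loop is simple enough to sketch in Python. This is only an illustration under assumptions: `works` stands in for "replay the trace on the hardware and look at the monitor", and the call names are made up rather than taken from the actual trace.

```python
def minimize_trace(calls, works):
    """Greedily drop calls one at a time while works(remaining) stays true."""
    if not works(calls):
        raise ValueError("the full trace must work before it can be minimized")
    kept = list(calls)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if works(trial):
            kept = trial   # call i was not needed; retry the same index
        else:
            i += 1         # call i is required; move on to the next one
    return kept


# Stand-in for a hardware replay: the output "lights up" only if these
# three made-up calls appear in this order.
REQUIRED = ["SetClock", "SetupEncoder", "EnableOutput"]

def fake_replay(calls):
    remaining = iter(calls)
    return all(name in remaining for name in REQUIRED)


full = ["Init", "SetClock", "Delay", "SetupEncoder", "Blank", "EnableOutput", "Delay"]
print(minimize_trace(full, fake_replay))
```

With real hardware each `works` check means resetting to the non-working state and replaying, so a single pass like this already costs one replay per call in the trace.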

Of course, a GPU is a twisted thing, and nothing goes as planned with it. So I started comparing the minimal trace with the open source driver's atombios execution and parameters and found nothing, or so I thought. You see, most atombios functions take 1 or 2 dwords as arguments; everything is tightly packed, and each field takes only the minimum number of bits needed for it. And there was my downfall: I only paid attention to the first 2 dwords, but the tiny single-bit difference between the open source driver and the VESA BIOS was in the third dword, hidden from my scrutiny.
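
Letting the machine do the comparison over every dword would have saved me here. A minimal sketch in Python, with made-up argument values (the real parameter blocks are not reproduced here):

```python
from itertools import zip_longest

def diff_args(driver_dwords, vbios_dwords):
    """Return (dword_index, bit_index) for every bit that differs."""
    diffs = []
    # zip_longest pads the shorter list with 0, so a missing trailing
    # dword is compared as if it were zero (an assumption of this sketch).
    pairs = zip_longest(driver_dwords, vbios_dwords, fillvalue=0)
    for index, (a, b) in enumerate(pairs):
        changed = (a ^ b) & 0xFFFFFFFF
        for bit in range(32):
            if (changed >> bit) & 1:
                diffs.append((index, bit))
    return diffs


# Made-up argument blocks: identical except for bit 4 of the third dword.
driver = [0x00010320, 0x02580000, 0x00000001]
vbios = [0x00010320, 0x02580000, 0x00000011]
print(diff_args(driver, vbios))
```

Reporting the bit position, rather than just "dword 2 differs", matters because the fields are packed and one dword can hold several unrelated parameters.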

Of course I can only blame myself for not being thorough enough, but when you spend hours looking at hexadecimal strings, you just can't help being lazy; at least I can't.

This was the tale of how to fix the VGA output of the AMD A-series integrated GPU (also known as llano) in 3 weeks. You should not worry about my next journey; it will once again prove tedious and frustrating. I also hope that this little story sheds some light on the difficulties of modesetting. Many people believe that the 3D engine is the most complicated piece of the GPU; well, modesetting is the most unwilling and failure-creative piece of the GPU (in all fairness, sometimes the monitor helps with a broken EDID, narrow-minded working frequencies, limited tolerances, ...). One last thing about modesetting: without it there is nothing on the screen ... which could disappoint people who want to use their computer.

Road to upstream

I have been working on porting radeon kernel modesetting to the new ttm developed at VMware by Thomas Hellstrom. Just to avoid confusion: ttm is used inside the driver; the API we expose to userspace is the GEM API. I cleaned up the radeon code along the way and I feel it's now ready for a wider range of testers! A few words on the code itself: there is now a split between the old drm path and the new kernel modesetting path. I did so in order to avoid breaking the old path while also being able to have a clean design and clean code for the new kernel modesetting path. I am quite happy with how the code looks now. Of course I am not the only one guilty of all this work: Dave Airlie and Alex Deucher did a lot of the original modesetting work, and Dave also worked hard over the last few months on the mesa radeon rewrite, which has its roots in Nicolai Haehnle's initial work. Big kudos to Maciej Cencora (aka Osiris) for all the improvements he has made to the code. So here is a screenshot of what you can get with kernel modesetting, a decent ddx and proper mesa (radeon-rewrite branch):

[Image: radeon newttm screenshot]

So if you feel adventurous, here are the things you need:

git clone git://
cd drm-next
git branch drm-next-radeon origin/drm-next-radeon
git checkout drm-next-radeon
Then do the usual kernel configuration; just enable fbcon, ttm and radeon kernel modesetting.

git clone git://
cd drm
git branch modesetting-gem origin/modesetting-gem
git checkout modesetting-gem
./ --prefix=/usr --libdir=/usr/lib64
(libdir is only needed if on x86-64)
sudo make install

git clone git://
cd xf86-video-ati
git branch radeon-gem-cs3 origin/radeon-gem-cs3
git checkout radeon-gem-cs3
./ --prefix=/usr --libdir=/usr/lib64
(libdir is only needed if on x86-64)
You also need the Xorg dev packages from your distribution: sudo yum-builddep xorg-x11-drv-ati.x86_64 (on Fedora)
sudo make install

git clone git://
cd mesa
git branch radeon-rewrite origin/radeon-rewrite
git checkout radeon-rewrite
./ --prefix=/usr --libdir=/usr/lib64 --with-dri-drivers=radeon,r200,r300
(libdir is only needed if on x86-64)
sudo make install

Now once you reboot under the new kernel you need to load the radeon module with modeset=1; then your userspace should be all set. I know this needs many things to build, but hopefully we can soon start merging things upstream little by little, so it should end up in your distro. Anyway, if you own an x200 (igp) or x1250/x1200 with an Intel CPU, I would love to know if you have any issues with this code. Early testers are always appreciated, and I would like to thank spstarr, dilex, osiris, ... for their early testing and reporting on this code.

I forgot to mention that suspend/resume should work flawlessly even if you are in the middle of a quake3 game while spinning the compiz cube (I don't think it's a normal use case, but you never know).

Of course performance is not yet what it used to be, but here is my todo list, which should also bring us back somewhere near the old performance and maybe even outperform the old path:
-erase memory on bo allocation (security, needed before upstreaming)
-more checks on the command stream (security, needed before upstreaming)
-port & clean up Dave's page allocator to avoid heavy cache flushing (performance)
-buffer swapping (performance)
-buffer tiling (performance)