okay another braindump (still nothing working).
The git repo mentioned in previous post has all the code I've hacked up so far.
I finished writing the HDCP protocol stages, and sending all the msgs and getting replies from the device.
So I've successfully reached a point where I've negotiated a HDCP session key with the device, and we are both happy about it. Unfortunately I've no idea what I'm meant to be encrypting to send to the device. The next packet the USB traces contain is 384-bytes of encrypted data.
Now HDCP v2 had a vulnerabilty in its key neg, and I've written code to try and use this fact. So I've taken a trace I made from Windows, and extracted the necessary bits, and using that I've managed to derive the master key used in that trace, and subsequently managed to derived the session key for it. So I've replayed the first encrypted packet from the trace to the device and got an encrypted response the same as in the trace.
I've tried changing a bit in the session key, riv value and data I'm sending, and doing that causes the device not to reply with the answer. This to me implies that the device is using the HDCP cipher to encode the control channel. Now HDCP does say you should only do this for video streams, but maybe DisplayLink forgot to read that bit.
Now where does this leave me, in theory I should be able to replay the full trace (haven't had time yet) and I should see the same picture on screen as I did (though I can't remember what monitor/device I used, so I might have to retrace and restage my tests before then).
However I really need to decrypt the encrypted data in the trace, and from reading the HDCP spec the only values I need to feed the AES engine are ks ^ lc128, riv, streamctr, inputctr. I'm assuming streamctr and inputctr are 0 for the first packet (I could be wrong, maybe they use some wacky streamctr to avoid messing with hdcp), riv and ks I've captured. So lc128 is possibly the crux.
Now what is lc128? Its a secret 128-bit value in the HDCP world given only to HDCP adopters. Its normally something you'd store in hw on the GPU etc as an input to the hw cipher. But in displaylink there is no GPU encrypting the data. Now its possible that displaylink don't use the same lc128 as the HDCP people, unlikely but possible. Maybe they cipher their streams with their own lc128, and only use the offical hdcp lc128 for actual HDCP streams.
I don't think lc128 has leaked, I'm not sure what the consequences of it leaking would be, but hey its just a magic number, and if displaylink are using as an input to their AES code, it must be in RAM at some point, now I need to figure out ways to work that out. I'm not sure how long it would take to brute force as 128-bit key space, probably impossible.
At any point if someone from DisplayLink wants to talk, you know where to find me :-)
So for some reason I decided to look at the displaylink usb3 adaptors today. (no good news).
This blog post is so I don't forget all of this when I page it out. Notes, HDCP1.0 being broken doesn't matter to this, maybe HDCPv2.0 being a bit broken could be used, but I'm not sure how!
The displaylink USB3 protocol is based on HDCP protocol. I've traced the first few packets and it clearly
looks like the host sends two packets
and the device sends back
AKE_Send_Cert contains a 522 byte certificate, containing a receiver id, public key, some misc bytes and a signature generated with the DCP LLC private key, that you have to verify.
so the HDCP v2.2 spec contains the DP LLC public key, and I've written some code to verify the spec using openssl, but it totally fails to work. This is probably due to me doing something stupid, or not understanding what I'm doing, if you are openssl knowledgeable and want to look, the hack fest is
It might be the DisplayLink devices use a different signing key than the DP LLC one.
That repo contains some code to talk to the device (currently disabled) and do the initial sequence, along with an attempt to verify the cert.
Now once I get past this hurdle, the larger one seems to remain, the HDCP 2.0 spec has a global secret 128-bit value called LC128, that everyone who implements HDCP gets and hides somewhere. Its probably sitting in the displaylink driver in hex, but I'd hope they at least hide it better than that. It may also be possibly supplied by the OS, Windows or OSX. (I've no clue yet). That value is used in the key negotiation.
Now it might be possible that Displaylink allow non-HDCP encrypted data to be sent to the device, in which case win if I can find out where/how to do that, or it might be the device requires HDCP and decrypts non-HDCP content before sending it over VGA/DVI. I've no ideas yet on that front either.
Ah well probably enough learning for today, I knew nothing about HDCP this morning, so I can't say it made my life any better learning about it :-P
or why publishing code is STEP ZERO.
If you've been developing code internally for a kernel contribution, you've probably got a lot of reasons not to default to working in the open from the start, you probably don't work for Red Hat or other companies with default to open policies, or perhaps you are scared of the scary kernel community, and want to present a polished gem.
If your company is a pain with legal reviews etc, you have probably spent/wasted months of engineering time on internal reviews and stuff, so think all of this matters later, because why wouldn't it, you just spent (wasted) a lot of time on it, so it must matter.
So you have your polished codebase, why wouldn't those kernel maintainers love to merge it.
Then you publish the source code.
Oh, look you just left your house. The merging of your code is many many miles distant and you just started walking that road, just now, not when you started writing it, not when you started legal review, not when you rewrote it internally the 4th time. You just did it this moment.
You might have to rewrite it externally 6 times, you might never get it merged, it might be something your competitors are also working on, and the kernel maintainers would rather you cooperated with people your management would lose their minds over, that is the kernel development process.
step zero: publish the code. leave the house.
(lately I've been seeing this problem more and more, so I decided to write it up, and it really isn't directed at anyone in particular, I think a lot of vendors are guilty of this).
So I've created a copr
With a kernel + intel driver that should provide support for DisplayPort MST on Intel Haswell hardware. It doesn't do any of the fancy Dell monitor stuff yet, it primarily for people who have Lenovo or Dell docks and laptops that can't currently multihead.
The kernel source is from this branch which backports a chunk of stuff to v3.14 to support this.
It might still have some bugs and crashes, but the basics should in theory work.
DisplayPort 1.2 Multi-stream Transport is a feature that allows daisy chaining of DP devices that support MST into all kinds of wonderful networks of devices. Its been on the TODO list for many developers for a while, but the hw never quite materialised near a developer.
At the start of the week my local Red Hat IT guy asked me if I knew anything about DP MST, it turns out the Lenovo T440s and T540s docks have started to use DP MST, so they have one DP port to the dock, and then dock has a DP->VGA, DP->DVI/DP, DP->HDMI/DP ports on it all using MST. So when they bought some of these laptops and plugged in two monitors to the dock, it fellback to using SST mode and only showed one image. This is not optimal, I'd call it a bug :)
Now I have a damaged in transit T440s (the display panel is in pieces) with a dock, and have spent a couple of days with DP 1.2 spec in one hand (monitor), and a lot of my hair in the other. DP MST has a network topology discovery process that is build on sideband msgs send over the auxch which is used in normal DP to read/write a bunch of registers on the plugged in device. You then can send auxch msgs over the sideband msgs over auxch to read/write registers on other devices in the hierarchy!
Today I achieved my first goal of correctly encoding the topology discovery message and getting a response from the dock:
[ 2909.990743] link address reply: 4
[ 2909.990745] port 0: input 1, pdt: 1, pn: 0
[ 2909.990746] port 1: input 0, pdt: 4, pn: 1
[ 2909.990747] port 2: input 0, pdt: 0, pn: 2
[ 2909.990748] port 3: input 0, pdt: 4, pn: 3
There are a lot more steps to take before I can produce anything, along with dealing with the fact that KMS doesn't handle dynamic connectors so well, should make for a fun tangent away from the job I should be doing which is finishing virgil.
I've ordered another DP MST hub that I can plug into AMD and nvidia gpus that should prove useful later, also for doing deeper topologies, and producing loops.
Also some 4k monitors using DP MST as they are really two panels, but I don't have one of them, so unless one appears I'm mostly going to concentrate on the Lenovo docks for now.
So to enable OpenGL 3.3 on radeonsi required some patches backported to llvm 3.4, I managed to get some time to do this, and rebuilt mesa against the new llvm, so if you have an AMD GPU that is supported by radeonsi you should now see GL 3.3.
For F20 this isn't an option as backporting llvm is a bit tricky there, though I'm considering doing a copr that has a private llvm build in it, it might screw up some apps but for most use cases it might be fine.
So I've been looking into how I can do some buffer passing with EGL and OpenGL with a view to solving my split renderer/viewer problem for qemu.
contains the hacks I've been playing with so far.
The idea is to have a rendernode + gbm using server side renderer, that creates textures and FBOs attached to them, renders into them, then sends them to a client side, which renders the contents to the screen using GL rendering.
This code reuses keithp's fd passing demo code and some of dvdhrm's simple dma-buf code.
Firstly the server uses GBM and rendernodes to create a texture, that it binds to a FBO. It generates an EGLImage from the texture using EGL_GL_TEXTURE_2D_KHR, then uses EGL_MESA_drm_image to get a handle for it, then uses libdrm drmPrimeHandleToFD to create an fd to pass to the server. It passes the fd using the fdpassing code. It then clears the texture, sends the texture info to the client, along with a dirty rect, clears it again, and sends another dirty rect.
The client side, uses EGL + GLES2 with EXT_image_dma_buf_import to create an EGLImage from the dma-buf, then uses GL_OES_EGL_image to create a 2D texture from the EGLImage then just renders the texture to a window.
Shortcomings I've noticed in the whole stack so far:
a) asymmetric interfaces abound:
1) we have an EGLImage importer for dma-buf EXT_image_dma_buf_import, but we have no EGLImage dma-buf exporter yet - hence the MESA_drm_image + libdrm hack.
2) we have an EGLImage exported for Desktop OpenGL, EGL_KHR_gl_image works fine. But we only have EGLImage importers for GLES, GL_OES_EGL_image - hence why the client is using GLES2 to render not GL like I'd like.
b) gallium is missing dma-buf importing via EXT_image_dma_buf_import, I have a quick patch, since we have the ability to import from fds just not from dma-bufs, I should send out my first hack on this.
The demo also has color reversing issues I need to sort out, due to gallium code needing a few more changes I think, but I've gotten this to at least run on my machine with nouveau and the hacked up dma-buf importer patch.
So one of the stumbling blocks on my road to getting 3D emulation in a VM is how most people use qemu in deployed situations either via libvirt or GNOME boxes frontends.
If you use are using libvirt and have VMs running they have no connection to the running user session or user X server, they run as the qemu user and are locked down on what they can access. You can restart your user session and the VM will keep trucking. All viewing off the VM is done using SPICE or VNC. GNOME Boxes is similar except it runs things as the user, but still not tied to the user session AFAIK (though I haven't confirmed).
So why does 3D make this difficult?
Well in order to have 3D we need to do two things.
a) talk to the graphics card to render stuff
b) for local users, show the user the rendered stuff without reading it back into system RAM, and sticking it in a pipe like spice or vnc, remote users get readback and all the slowness it entails.
No in order to do a), we face a couple of would like to have scenarios:
1. user using open source GPU drivers via mesa stack
2. user using closed source binary drivers like NVIDIA or worse fglrx.
How to access the graphics card normally is via OpenGL and its window APIs like GLX. However this requires a connection to your X server, if your X server dies your VM dies, if your session restarts your VM dies.
For scenario 1, where we have open source kms based drivers, the upcoming render nodes support in the kernel will allow process outside the X server control to use the capabilities of the graphics card via the EGL API. This means we can render in a process offscreen. This mostly solves problem (a) how to talk to the graphics card at all.
Now for scenario 2, so far NVIDIA has mostly got no EGL support for its desktop GPUs, so in this case we are kinda out in the cold, until they have at least EGL support, in terms of completely disconnecting the rendering process from the running user X server lifecycle.
This leaves problem (b), how do we get the stuff rendered using EGL back to the user session to display it. My first initial hand-wave in this area involved EGL images and dma-buf, but I get the feeling on subsequent reads that this might not be sufficient enough for my requirements. It looks like something like the EGLStream extension might be more suitable, however EGLstream suffers from only being implemented in the nvidia tegra closed source drivers from what I can see. Another option floated was to somehow use an embedded wayland client/server somewhere in the mix, I really haven't figured out the architecture for this yet (i.e. which end has the compositor and which end is the client, perhaps we have both a wayland client and compositor in the qemu process, and then a remote client to display the compositor output, otherwise I wonder about lifetime and disconnect issues). So to properly solve the problem for open source drivers I need to either get EGLstream implemented in mesa, or figure out what the wayland hack looks like.
Now I suppose I can assume at some stage nvidia will ship EGL support with the necessary bits for wayland on desktop x86 and I might not have to do anything special and it will all work, however I'm not really sure how to release anything in the stopgap zone.
So I suspect initially I'll have to live with typing the VM lifecycle to the logged in user lifecycle, maybe putting the VM into suspend if the GPU goes away, but again figuring out to integrate that with the libvirt/boxes style interfaces is quite tricky. I've done most of my development using qemu SDL and GTK+ support for direct running VMs without virt-manager etc. This just looks ugly, though I suppose you could have an SDL window outside the virt-manager screen and virt-manager could still use spice to show you the VM contents slower, but again it seems sucky. Another crazy idea I had was to have the remote viewer open a socket to the X server and pass it through another socket to the qemu process, which would build an X connection on top of the pre opened socket,
therefore avoiding it having to have direct access to the local X server. Again this seems like it could be a largely ugly hack, though it might also work on the nvidia binary drivers as well.
Also as a side-note I discovered SDL2 has OpenGL support and EGL support, however it won't use EGL to give you OpenGL only GLES2, it expects you to use GLX for OPENGL, this is kinda fail since EGL with desktop OpenGL should work fine, so that might be another thing to fix!
Okay its been a while, so where is virgil3d up to now I hear you ask?
Initially I wrote a qemu device and a set of guest kernel drivers in order to construct a research platform on which to investigate and develop the virgil protocol, renderer and guest mesa drivers based on Gallium3D and TGSI. Once I got the 3D renderer and guest driver talking I mostly left the pile of hacks in qemu and kernel alone. So with this in mind I've split development into two streams moving forward:
1) the virgil3d renderer and 3D development:
This is about keeping development of the renderer and guest driver continuing, getting piglit tests passing and apps running. I've been mostly focused on this so far, and there has been some big issues to solve that have taken a lot of the time, but as of today I got xonotic to play inside the VM, and I've gotten the weston compositor to render the right way up. Along with passing ~5100/5400 piglit gpu.tests.
The biggest issues in the renderer development have been
a) viewport setup - gallium and OpenGL have different viewport directions, and you can see lots of info on Y=0=TOP and Y=0=BOTTOM in the mesa state tracker, essentially this was more than my feeble brain could process so I spent 2 days with a whiteboard, and I think I solved it. This also has interactions with GL extensions like GL_ARB_fragment_coord_conventions, and FBOs vs standard GL backbuffer rendering.
b) Conditional rendering - due to the way the GL interface for this extension works I had to revisit my assumption that the renderer could be done with a single GL context, I had to rewrite things to use a GL context per guest context in order to give conditional rendering any chance of working. The main problem was using multiple GL queries for one guest query didn't work at all with the cond rendering interface provided by GL.
c) point sprites - these involved doing shader rewrites to stick gl_PointCoord in the right places, messy, but the renderer now has shader variants, however it needs some better reference counting and probably leaks like a sieve for long running contexts.
2) a new virtio-gpu device
The plan is to create a simple virtio based GPU, that can layer onto a PCI device like the other virtio devices, along with another layer for a virtio-vga device. This virtio based gpu would provide a simple indirect multi-headed modesetting interface for use by any qemu guests, and allow the guest to upload/download data from the host side scanouts. The idea would be then to give this device capabilities that the host can enable when it detects the 3d renderer is available and qemu is started correctly. So then the guest can use the virtio gpu as a simple GPU with no 3D, then when things are ready the capability is signalled and it can enable 3D. This seems like the best upstreaming plan for this work, and I've written the guts of it.
In order to test the virtio-gpu stuff I've had to start looking at porting qemu to SDL 2.0 as SDL 1.2 can't do multi-window and can't do argb cursors, but SDL 2.0 can. So I'm hoping with SDL 2.0 and virtio-gpu you can have multiple outputs per the vgpu show up in multiple SDL windows.
I'll be speaking about virgil3d at the KVM Forum in Edinburgh in a couple of weeks and also be attending Kernel Summit.