Sunday, 1 May 2016

Using the Windows Subsystem for Linux to simplify CUDA builds

Recently, Microsoft announced the Windows Subsystem for Linux, aka Bash on Ubuntu on Windows. This is currently pre-release, and allows you to run linux tools inside Windows. In particular, this opens up a whole range of development tools and options. First, a caveat: this stuff is early release and currently only available on the “Fast Ring” of insider builds. If it works: great. If it doesn’t work: oh well. If it melts your PC, stops it booting, or causes skynet to launch the missiles: that’s tough too.

Now, it just so happens that I’ve been playing a lot with CUDA lately – it turns out that it works quite nicely for our “tag engine” back-end service (and I shall no doubt be blogging about my journey here very soon). If you’ve done any CUDA, you will know that there are two options for compiling your CUDA kernel source code into something that can run on your GPU – runtime (NVRTC) and build-time (NVCC). The runtime compiler is very convenient for fast iterations – edit a text file, reload your app – but it doesn’t currently support one very important feature: dynamic parallelism. I’m hoping that the CUDA 8.0 Toolkit (due some time around August) might fix this, but until then, it means that I need to use NVCC for my kernels, since I happen to need this feature.

The next thing you’ll know if you’ve done any CUDA work is that NVCC on Windows is fussy. Not just a little fussy, but really fussy. Not only does it require specific versions of Visual Studio (that are several major versions behind what I usually have installed), but it also requires your command-line C++ environment to be configured perfectly, and even then (as I found out the hard way), it might still turn around and laugh in your face. Possibly for reasons, possibly just for the giggles. I eventually gave up on making NVCC work in Windows, so I configured an Ubuntu VM in Hyper-V, added an SMB share, and used the VM to build my kernels. It worked, but it was unsatisfactory. So I thought: can I use the new Windows goodies to fix this? And would you believe it: yes… ish.

Scope: note that for the purposes of this article I’m not going to try to explain what CUDA kernels are, or how to use NVCC. If you don’t happen to know these things, it will suffice to know that NVCC is a tool you need, it is really awkward to get it working on Windows, and quite a bit easier to get it working on linux.

It turns out that actually this was pretty easy; first we need to get Bash working:

1. enable the fast ring updates
2. be patient for a day or so until it deigns to offer to let you install the new Windows build
3. install the new Windows build and once again learn the true meaning of patience
4. enable developer mode in Windows settings:

image

5. turn on the new system in Windows features:

image

6. find and run the new tool

image

The first time you run this, it downloads some pieces and prompts for your new linux credentials. When it has finished installing itself, you get an Ubuntu 14.04 terminal:

image

Conveniently, it also mounts your Windows drives for you – so “C:” is “/mnt/c” etc.
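
As a rough illustration of that mapping, here is a small shell sketch that turns a Windows-style path into its /mnt equivalent. The function name win_to_wsl_path is made up for this example (nothing ships with it), and it assumes a plain drive-letter path such as C:\work\kernels rather than a network path.

```shell
# Hypothetical helper: map a Windows drive path (e.g. C:\work\kernels)
# to the /mnt/<drive> location that Bash on Windows exposes.
# Assumes the input starts with a single drive letter followed by ":".
win_to_wsl_path() {
    # First character is the drive letter; the mount point uses lower case.
    drive=$(printf '%s' "$1" | cut -c1 | tr '[:upper:]' '[:lower:]')
    # Everything after "X:" is the rest of the path; swap \ for /.
    rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
    printf '/mnt/%s%s\n' "$drive" "$rest"
}
```

For example, win_to_wsl_path 'C:\work\kernels' prints /mnt/c/work/kernels. Single quotes matter here, otherwise the shell may eat the backslashes.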

Important note: ctrl-v to paste doesn’t currently work, but don’t worry: you don’t need to re-type lines – you can successfully paste via the window menu:

image

Right. So we have a vanilla (if minimal) Ubuntu installation. From my Hyper-V install, I know that NVCC on linux requires gcc 4.9, so we can follow a few lines from Ask Ubuntu to install this. We also need to install NVCC itself; I recommend not using the version from apt-get, as that is very old – I installed 7.5 using the notes on nvidia.com, which really just meant choosing my preferred install from here (I went for the network-enabled deb install):

image

And remember: once you have downloaded the file, you can access it under /mnt/c/ or similar. The website tells you the full commands to run, so it isn’t a challenge even if you don’t know much linux. The installer takes quite a while to download and complete, so again: patience. It is also worth noting that when installing this way, no symlink or similar is added to the path – so you can either do that yourself, or just use the tool in the installation location of “/usr/local/cuda-7.5/bin/nvcc”:

image 
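
If you would rather put the tool on the path yourself, something like the following could go in your ~/.bashrc. This is a minimal sketch that assumes the install location mentioned above; the function name add_cuda_to_path is purely illustrative.

```shell
# Install location of the CUDA 7.5 tools, as noted in the text.
CUDA_BIN="/usr/local/cuda-7.5/bin"

# Hypothetical helper: prepend the CUDA bin directory to PATH,
# but only if it is not already there (so re-sourcing is harmless).
add_cuda_to_path() {
    case ":$PATH:" in
        *":$CUDA_BIN:"*) ;;                          # already present: do nothing
        *) PATH="$CUDA_BIN:$PATH"; export PATH ;;    # otherwise put it first
    esac
}

add_cuda_to_path
```

After this, plain "nvcc" should resolve in new shells, assuming the toolkit really did land in that directory.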

After that, compiling to cubin, fatbin, or ptx should just work. The actual commands aren’t very interesting and are described in the documentation. I just wrap my NVCC commands in a bash file (remembering to ensure newlines are LF only), and run that. Here we can see no “fatbin” before the script, executing the script, and the “fatbin” existing after the script:

image
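
For the curious, a wrapper along these lines might look like the sketch below. It is not the actual script from the screenshot: the function names strip_cr and build_fatbin and any file names are hypothetical, and it assumes NVCC lives in the install location mentioned earlier. The -fatbin and -o options are standard nvcc options; -ptx or -cubin could be swapped in for the other output types.

```shell
# Path to nvcc; defaults to the CUDA 7.5 install location, but can be
# overridden by setting NVCC in the environment before calling build_fatbin.
NVCC="${NVCC:-/usr/local/cuda-7.5/bin/nvcc}"

# Hypothetical helper: remove carriage returns from a file edited on
# Windows, so that bash sees LF-only line endings. The result is written
# back into the original file, which keeps its permissions.
strip_cr() {
    tr -d '\r' < "$1" > "$1.tmp" && cat "$1.tmp" > "$1" && rm -f "$1.tmp"
}

# Hypothetical helper: compile one kernel source file ($1) to a fatbin ($2).
build_fatbin() {
    "$NVCC" -fatbin -o "$2" "$1"
}
```

Usage would be something like running strip_cr on the script itself once after editing it in Windows, then build_fatbin kernel.cu kernel.fatbin from a directory under /mnt/c, where both file names are placeholders.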

So what have we done?

  • we have installed the linux subsystem
  • we have installed some linux tools and utilities
  • we have used those tools as part of a development build process, accessing our windows file system
  • all without needing a separate VM

The fact that it involved NVCC and CUDA is largely incidental; the same techniques could be used with many tools.

Minor confessions

1. Life doesn’t often run smoothly. I’m not much of a linux person, and it took me very little time to completely trash my linux install. Fortunately the nuclear reset option exists.

2. The fast ring installs are … temperamental. As it happens, it installed just fine on my travel laptop (hence @TRAVEL in the screens), but it silently failed to install on my main development desktop, and has now disappeared from the update UI … so I can’t find any way to get it installed. Unfortunately, my laptop doesn’t have a CUDA-enabled GPU, so actually this hasn’t saved me anything – I still need to use my Hyper-V VM for real work. But: it was still fun to get it working!