With the latest SDK release, the Kinect platform has become pretty robust – skeleton tracking, gesture support, and access to the raw infrared stream all come “out-of-the-box”. Which is great, because if you’ve ever used any of the opensource equivalents, things can get pretty hairy.
Hands and skeletons are cool and all, but unfortunately there isn’t an *easy* way to do blob or generic object detection.. Right now, we’re working on a fun project and ultra fast blob detection is one of the major tech hurdles we’re working through.
After googlin’ around for blob-tracking with Kinect I came up empty handed. So I decided to look into OpenCV for doing blob detection. Many curse words later, I finally got things working the way I wanted. Along the way I noticed a handful of forums with people looking to do similar object tracking with Kinect, so I figured I would share how I cobbled things together.
Demo Overview & Source Code
The demo below is showing a couple of key things:
- Consuming the raw Depth image from the Kinect sdk
- Slicing the Depth image to a particular range
- Using OpenCV to perform object/blob tracking
- Rendering OpenCV output with WPF
Before We Get Started
Be sure to download the latest Kinect SDK. There are also a bunch of handy samples that come with the toolkit. You can get it here
EMGU (.Net Wrapper for OpencV)
Emgu is a .net wrapper for OpenCV. It is very complete and does a good job of providing .net hooks into the common OpenCV functions. Unfortunately, getting it to work with WPF is a little tricky (More below ). Also, I wasn’t able to get the lastest (2.4) version of the code to work with .net 4.5 and Visual Studio 2012, but the 2.3 version works well. Get it here
This codeplex project is similar, but it’s using another computer vision library called AForge and only compiles with a beta version of the Kinect sdk. Regardless, there are some handy tid-bits of code in there. You won’t need it to build my sample but its a good resource to look through.
Ok, now that we have the dependencies installed, its time to start setting up our WPF project. There are similar instructions on the Emgu site, but I found them a little confusing, so I’ll give you the cliffnotes version here.
After you’ve installed Emgu you need to add some .dlls to your project. This is the standard Solution->right-click->Add Reference and then browse to the Emgu installation path to add the following .dlls
The next step is a little odd. You need to add 3 .dlls to the root of your project. Right-click the Project fill and select “add existing item”, then navigate to the Emgu installation folder and add the following:
The last thing, and this is important, is to set the build action to “copy always”. This will dump those dlls into your build folder when you build the project. When you’ve got all that done you’re solution should look something like this:
Getting OpenCV to Play Nice with WPF
Ok, now that we have all the references in place we need to make openCV (Emgu) play nice with WPF. The biggest thing to note is that OpenCV uses it’s own type of image objects (IImage) which don’t work by default with the native WPF image controls.
Luckily, there is a little helper class that comes with Emgu that will convert the IIMage to a BimapSource, which works nicely with the WPF image control. By adding this helper method, we now have a convenient way to plug an OpenCV image into a WPF image. See the helper class here
The Fun Part
Now we can get to the good stuff. When you do blob tracking it is very important to have a “clean” grayscale image that has a high contrast between the object you are tracking and the background. This is one of the best parts about Kinect – it provides a grayscale depth image that we can analyze and extract out the regions of interest relatively easily. Let’s looking at the key pieces of the solution.
Depth Image Slicing
Like I mentioned above, when you are doing generic object detection you want to have high contrast between your object and the background. In other words, you want to isolate the object the best you can and ignore everything else. By doing so, you can ensure the vision library, like OpenCV, does a reliable job of finding the blobs.
The beauty of using the Kinect for blob tracking is that it gives us a depth image that contains all of the information we need to “slice” the image into the parts that we’re interested in. While I was trying to figure out the best way to slice the depth image I came upon a Kinect depth data video that really opened my eyes to working directly with the depth byte stream coming from Kinect. Check it out here
By simply ignoring (coloring pixels black) data that is out of the defined depth range, I was able to get a nice high-contrast image of the objects the depth camera was picking up. You can imagine this being something like an invisible plane that excludes everything in front and in back of it, which coincidentally creates a nice interaction space where we can track objects. You can see the slicing code here. This is the output:
OpenCV Contour Finding
The last thing to do is to get the blob tracking working. At this point we have a very clean image that we can feed into OpenCV’s “Find Contours” method. The FindContours method analyzes a grayscale image and returns the number of blobs as well as their bounding rectangles. It was a little tricky getting this to work in WPF, you can see the main piece of the code here.
Once I was able to figure out some of the cryptic tracking mechanisms in OpenCV everything came together. The cool thing about OpenCV is that not only can you get blobs, but you can blobs within blobs..in a tree. Crazy..I know. And lastly, we use our handy helper method from above (ToBitmapSource(OpenCVImg)) and plug it into the Image control in the main window xaml. Which ends up looking something like this:
The last thing I added were a few sliders to the UI to adjust the slicing parameters and the max/min Blob size. The image below shows the finished application.
All said and done, I am happy with the performance and excited about the opportunities to use this slew of technologies in some upcoming installation work.
Ok, so what?
I know what you’re thinking…so you have some blobs, big deal. And you might be wondering what else you can do with blobs. Well, let me tell you. By using generic blob detection you can not only detect when objects are present, you can also track their size, shape and rotation. For example, its possible to track an open hand transitioning to a closed fist for a grab gesture, or a user holding a wand or prop. Or in our case, we want to very efficiently capture the position of small, fast moving objects that penetrate an “invisible” IR plane. There are a ton of uses for blob detection including surveillance, card counting, and even real-time projection mapping.
While I’m pretty happy with the performance of this solution, I’m curious if it would be better/faster if I rewrote everything with the C++ flavor of the Kinect sdk and coupled it with DirectWrite or XAML to render the UI. Anybody out there do anything like this yet? If so, shoot me a tweet or email.
We’re currently rolling this stuff into a fun side project that I’ll post about shortly…stay tuned.
Best191 comments , permalink