For me, one of the coolest new features added in Silverlight 4 is support for capturing video and audio using webcams and microphones. As we saw during Scott Guthrie’s keynote demonstration, it is now possible to capture image frames from the video stream and apply some interesting effects to them. On top of that, we can even process the video and audio streams directly on the client (i.e. inside the browser).
Because the Silverlight 4 Beta was already available at PDC I could try the webcam support right away, so I jumped in and refreshed my computer vision projects. Unfortunately I never found time to finish this, but I think it’s still worth publishing the code as it is.
One thing I was playing with before was the Touchless SDK. It is an open source .NET library for visual object tracking created by Mike Wasserman. In short, it identifies objects with specific color characteristics and reports their positions within the image scene, enabling you to use these objects as input devices. This means you can control your applications without touching the screen or keyboard (hence the name touchless :-)
The original project page is here: http://touchless.codeplex.com/

Over a year ago I successfully used this library in WPF, so it was quite easy to port it to Silverlight 4. You can check it out and download the source code on the demo page. Below I summarize some notes on things I learned about the Silverlight 4 API, in hopes that the Silverlight team can address these issues before the final release.
As a starting point I used this excellent summary by Mike Taulty. The main class that handles video processing is the CameraTracker. It extends VideoSink in order to get access to incoming video samples. Each sample frame is represented by a byte array whose format is specified by the VideoFormat structure. To get the current VideoFormat you have to override the OnFormatChange method (a call to videoSource.VideoCaptureDevice.DesiredFormat will throw an exception).
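To make the shape of this API concrete, here is a minimal sketch of a VideoSink subclass (the class and field names are my own; only the overridden members come from the Silverlight 4 API):

public class SampleVideoSink : VideoSink
{
    private VideoFormat _format;

    protected override void OnCaptureStarted() { }
    protected override void OnCaptureStopped() { }

    protected override void OnFormatChange(VideoFormat videoFormat)
    {
        // The only reliable place to learn the sample format.
        _format = videoFormat;
    }

    protected override void OnSample(long sampleTime, long frameDuration, byte[] sampleData)
    {
        // sampleData is laid out as described by _format
        // (PixelFormat, PixelWidth, PixelHeight, Stride).
    }
}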
I learned that although VideoFormat specifies the PixelFormatType, currently it is an enum with only two values: Format32bppArgb and Unspecified. Its WPF counterpart – the PixelFormat structure – offers much more, including masks and bytes per pixel. And instead of an enum we get the static PixelFormats class with a number of common formats declared as properties. I hope this will be changed in the Silverlight 4 RTM.
Another thing to be aware of is that the video frame buffer is flipped vertically – so you need to read the image lines from bottom to top. I handle this in the GetPixel and SetPixel methods of the RawBitmapAdapter helper class:
public RgbColor GetPixel(int x, int y)
{
    // _scan0 accounts for the bottom-up line order, _stride for the line length.
    int p = _scan0 + (y * _stride) + (x * _bytesPerPixel);
    byte b = _imageData[p];
    byte g = _imageData[p + 1];
    byte r = _imageData[p + 2];
    return new RgbColor(r, g, b);
}

public void SetPixel(int x, int y, RgbColor color)
{
    int p = _scan0 + (y * _stride) + (x * _bytesPerPixel);
    _imageData[p] = color.Blue;
    _imageData[p + 1] = color.Green;
    _imageData[p + 2] = color.Red;
}
In both cases I first calculate the offset of the pixel in the byte buffer using the scan0 and stride values. Stride is specified in VideoFormat, but we have to guess the scan0 value (it specifies the offset of the first line):
if (_stride < 0)
    _scan0 = -_stride * (_height - 1);
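A quick worked example of that calculation (the frame size here is just an assumption, but 640×480 in Format32bppArgb is a typical case):

// Hypothetical 640x480 frame in Format32bppArgb (4 bytes per pixel).
int width = 640, height = 480, bytesPerPixel = 4;
int stride = -(width * bytesPerPixel);              // negative stride: bottom-up buffer
int scan0 = (stride < 0) ? -stride * (height - 1) : 0;
// scan0 = 2560 * 479 = 1226240, i.e. the start of the last line in memory,
// which holds the top line of the image.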
It would be helpful to get it from VideoFormat too. One last comment regarding VideoFormat: I think its constructor should be made public.
Before initializing the camera, the app displays a list of available devices. I tried to bind these collections from my ViewModel, however it turns out that calls to GetAvailableVideoCaptureDevices and GetDefaultVideoCaptureDevice return different instances of CaptureDevice, so you can’t do something like this:
CaptureDevices = CaptureDeviceConfiguration.GetAvailableVideoCaptureDevices();
SelectedCaptureDevice = CaptureDeviceConfiguration.GetDefaultVideoCaptureDevice();
Also I noticed that none of the devices returned by GetAvailableVideoCaptureDevices has the IsDefaultDevice flag set, so I had to use FriendlyName to match the default device.
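The workaround I ended up with looks roughly like this (a sketch, assuming the ViewModel properties from the snippet above and a using directive for System.Linq):

CaptureDevices = CaptureDeviceConfiguration.GetAvailableVideoCaptureDevices();
VideoCaptureDevice defaultDevice = CaptureDeviceConfiguration.GetDefaultVideoCaptureDevice();
// Match by FriendlyName, because the two calls return different instances
// and IsDefaultDevice is never set on the enumerated devices.
SelectedCaptureDevice = CaptureDevices
    .FirstOrDefault(d => d.FriendlyName == defaultDevice.FriendlyName);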
Because CaptureSource is used in a similar way to MediaElement, I think it should expose a similar API. In particular it would be helpful to get events when its state changes. We should also get some notification when a camera is connected or disconnected.
In order to show the markers recognized by Touchless I paint these areas on an overlay. In the current version I tried to implement this using a custom MediaStreamSource with double buffering, as demonstrated by Pete Brown. It works, but you can see the overlay lag significantly behind the camera video. So I’m going to switch back to WriteableBitmap instead, though I’m not sure how it will perform either. In particular I’m concerned that WriteableBitmap doesn’t have a way to indicate that we are going to begin updating the buffer. In the WPF version you first need to call Lock to get access to the BackBuffer. I was a bit surprised that this isn’t required in the Silverlight version, and I’m concerned how efficient it is without it.
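For reference, the WriteableBitmap approach I’m switching to would look roughly like this (the _overlay field and the marker rectangle are assumptions of this sketch; Pixels and Invalidate are the actual Silverlight WriteableBitmap members):

// _overlay is a WriteableBitmap set as the Source of an Image element
// stacked on top of the video. Paint a marker rectangle as opaque green.
for (int y = marker.Top; y < marker.Bottom; y++)
{
    for (int x = marker.Left; x < marker.Right; x++)
    {
        // Pixels is a flat array of premultiplied ARGB32 values.
        _overlay.Pixels[y * _overlay.PixelWidth + x] = unchecked((int)0xFF00FF00);
    }
}
_overlay.Invalidate(); // no Lock/Unlock pair as in WPF - just invalidate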
Altogether the camera API looks very promising and definitely opens up many possibilities. However, I think the most common scenario people will try to implement is some sort of online IM client, and for that we would also need generic video codecs for streaming. Right now Silverlight doesn’t even have codecs for encoding JPEG/PNG images, so I think those should be added first. In the meantime you can try these as an alternative: http://imagetools.codeplex.com/
Update: Please see this great post by René Schulte on saving webcam snapshots to JPEG using the FJCore library.
One last thing, unrelated to the camera API, regards the support for commands that was also added in Silverlight 4. I found that the command’s CanExecute method won’t be re-queried automatically on user input as happens in WPF, so you need to manually raise the CanExecuteChanged event for each command.
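A minimal command implementation that exposes a method for this might look like the following sketch (DelegateCommand and RaiseCanExecuteChanged are my own names, not part of the Silverlight API; only ICommand is):

public class DelegateCommand : ICommand
{
    private readonly Action<object> _execute;
    private readonly Func<object, bool> _canExecute;

    public event EventHandler CanExecuteChanged;

    public DelegateCommand(Action<object> execute, Func<object, bool> canExecute)
    {
        _execute = execute;
        _canExecute = canExecute;
    }

    public bool CanExecute(object parameter)
    {
        return _canExecute == null || _canExecute(parameter);
    }

    public void Execute(object parameter)
    {
        _execute(parameter);
    }

    // Silverlight won't call CanExecute for you on user input,
    // so raise this whenever the command's state may have changed.
    public void RaiseCanExecuteChanged()
    {
        EventHandler handler = CanExecuteChanged;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}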