Geeks With Blogs
Ulterior Motive Lounge UML Comics and more from Martin L. Shoemaker (The UML Guy),
Offering UML Instruction and Consulting for your projects and teams.
In Part 1, we saw how the process of building a grammar is similar to the Decorator or Composite patterns, building a larger structure out of smaller pieces. In Part 2, we'll build and recognize a grammar to see how to define and identify parts of a command.

In some ways, I wish I had chosen a different example for my first speech application. I think Dee Jay is a really cool app, and I use it every day on my drive to work; but the Media Player programming is complex enough to be worthy of a few blog posts on its own, and that's really not what I'm trying to explain here. So I'll show some Media Player code here and there, but it won't be the main point of this post. If I get questions on the Media Player side, maybe I can delve into more detail at another time; but for now, I'll leave those details as Media Player Magic (MPM).

I wrap most of the Media Player work in two classes, MediaDescriptor and MediaPhrase:

[Figure: Media Classes (class diagram)]

I started with a single, simple command in mind: "Dee Jay, play Has Been." But "Has Been" denotes both a song and an album. If I asked you to play Has Been, you wouldn't know which I meant. How could Dee Jay know?

So I realized that any given phrase might match a song title, an album title, or an artist. Also, a given song or album might be identified by many different phrases: title, artist, album, genre, etc. These concerns led me to create MediaPhrase, a class which links a given phrase to one or more MediaDescriptors:


using System.Collections.Generic;

/// <summary>
/// Represents a phrase that maps to one or more media descriptors.
/// </summary>
public class MediaPhrase
{
    /// <summary>
    /// The phrase.
    /// </summary>
    private string mPhrase;

    /// <summary>
    /// The phrase.
    /// </summary>
    public string Phrase
    {
        get { return mPhrase; }
    }

    /// <summary>
    /// The descriptors.
    /// </summary>
    private List<MediaDescriptor> mDescriptors = new List<MediaDescriptor>();

    /// <summary>
    /// The descriptors.
    /// </summary>
    public List<MediaDescriptor> Descriptors
    {
        get { return mDescriptors; }
    }

    /// <summary>
    /// Construct.
    /// </summary>
    /// <param name="phrase">The phrase.</param>
    public MediaPhrase(string phrase)
    {
        mPhrase = phrase;
    }
}


Looking ahead, the plan will be simple: if a recognized phrase maps to exactly one MediaDescriptor, Dee Jay will just play the corresponding media; but if the phrase maps to multiple MediaDescriptors, then you and Dee Jay will have to identify which media you want.
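
That plan can be sketched as a small dispatch on the number of descriptors. This is only an illustration under stated assumptions, not Dee Jay's actual code: SimpleDescriptor, SimplePhrase, PhraseResolver, and Resolve are hypothetical names, descriptors are reduced to a description string, and "playing" is reduced to returning a message so that the example runs without Media Player.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for MediaDescriptor: a description only, no Media Player calls.
public class SimpleDescriptor
{
    private readonly string mDescription;
    public SimpleDescriptor(string description) { mDescription = description; }
    public string Describe() { return mDescription; }
}

// Simplified version of MediaPhrase that holds the stand-in descriptors.
public class SimplePhrase
{
    private readonly string mPhrase;
    private readonly List<SimpleDescriptor> mDescriptors = new List<SimpleDescriptor>();
    public SimplePhrase(string phrase) { mPhrase = phrase; }
    public string Phrase { get { return mPhrase; } }
    public List<SimpleDescriptor> Descriptors { get { return mDescriptors; } }
}

public static class PhraseResolver
{
    // Returns a message describing what would happen for a recognized phrase.
    // (Assumed behavior: the real application would call Play or start a dialog.)
    public static string Resolve(SimplePhrase phrase)
    {
        if (phrase.Descriptors.Count == 0)
            return "Nothing matches \"" + phrase.Phrase + "\".";

        // Exactly one match: just play it.
        if (phrase.Descriptors.Count == 1)
            return "Playing " + phrase.Descriptors[0].Describe() + ".";

        // Several matches: the user has to pick one.
        List<string> options = new List<string>();
        foreach (SimpleDescriptor descriptor in phrase.Descriptors)
            options.Add(descriptor.Describe());
        return "Which one? " + string.Join(" or ", options.ToArray());
    }
}

public static class Program
{
    public static void Main()
    {
        SimplePhrase hasBeen = new SimplePhrase("Has Been");
        hasBeen.Descriptors.Add(new SimpleDescriptor("the song Has Been"));
        hasBeen.Descriptors.Add(new SimpleDescriptor("the album Has Been"));
        Console.WriteLine(PhraseResolver.Resolve(hasBeen));
    }
}
```

The point of the sketch is only that the phrase, not the descriptor, is the unit the recognizer hands back, so ambiguity shows up as a count greater than one.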

The other major class is MediaDescriptor, an abstract base class which represents one or more media items:


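using System.Collections.Generic;

/// <summary>
/// Describes a song or song collection.
/// </summary>
public abstract class MediaDescriptor
{
    /// <summary>
    /// Play the media.
    /// </summary>
    /// <param name="player">Target player.</param>
    public abstract void Play(IWMPPlayer4 player);

    /// <summary>
    /// List the songs in the descriptor.
    /// </summary>
    /// <returns>The media items in the descriptor.</returns>
    public abstract List<IWMPMedia3> GetMediaList();

    /// <summary>
    /// Describe the descriptor.
    /// </summary>
    /// <returns>A description of the descriptor.</returns>
    public abstract string Describe();
}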


The Play method plays the media on an IWMPPlayer4 object, which (as of this writing) is the latest, most powerful interface to Windows Media Player. The GetMediaList method returns a list of all IWMPMedia3 objects within the descriptor (where IWMPMedia3 is the interface to a single media item). The Describe method returns a text description of the descriptor.

Of course, you don't want to play "descriptors"; you want to play songs, or albums, or artists. This leads to the three concrete subclasses of MediaDescriptor. SongDescriptor describes a single song, while AlbumDescriptor describes an entire album. CollectionDescriptor describes a collection of related songs, such as all songs by a particular artist or all songs in a particular genre. The details of these classes are all MPM, so we won't delve into them here.
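
The real subclasses stay in MPM territory, but the shape of the hierarchy can be shown without Media Player. The following is a rough sketch under loud assumptions: every name in it (SketchDescriptor, SketchSong, SketchCollection) is hypothetical, Play is left out, media items are plain strings instead of IWMPMedia3 objects, and one collection class stands in for both albums and collections.

```csharp
using System;
using System.Collections.Generic;

// Simplified base class. Play is omitted, and media items are plain strings
// (the real GetMediaList returns IWMPMedia3 objects).
public abstract class SketchDescriptor
{
    public abstract List<string> GetMediaList();
    public abstract string Describe();
}

// Describes a single song.
public class SketchSong : SketchDescriptor
{
    private readonly string mTitle;
    public SketchSong(string title) { mTitle = title; }

    public override List<string> GetMediaList()
    {
        List<string> list = new List<string>();
        list.Add(mTitle);
        return list;
    }

    public override string Describe() { return "Song: " + mTitle; }
}

// Describes a named group of songs (album, artist, genre, ...).
public class SketchCollection : SketchDescriptor
{
    private readonly string mName;
    private readonly List<SketchSong> mSongs = new List<SketchSong>();
    public SketchCollection(string name) { mName = name; }

    public void Add(SketchSong song) { mSongs.Add(song); }

    // The collection's media list is the concatenation of its songs' lists.
    public override List<string> GetMediaList()
    {
        List<string> list = new List<string>();
        foreach (SketchSong song in mSongs)
            list.AddRange(song.GetMediaList());
        return list;
    }

    public override string Describe()
    {
        return "Collection: " + mName + " (" + mSongs.Count + " songs)";
    }
}

public static class Program
{
    public static void Main()
    {
        SketchCollection album = new SketchCollection("Has Been");
        album.Add(new SketchSong("Track A"));
        album.Add(new SketchSong("Track B"));

        Console.WriteLine(album.Describe());
        foreach (string title in album.GetMediaList())
            Console.WriteLine("  " + title);
    }
}
```

Because callers only see the abstract base class, a phrase can point at a song or at a whole collection without the calling code caring which.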

So given a phrase, we can find media; but now we need to pull the phrases from Media Player. This is the role of the JukeBoxPhraseMap class. There's a lot of MPM in this class, but the skeleton is shown here:


using System;
using System.Collections.Generic;

/// <summary>
/// Represents a map of phrase strings to media phrases.
/// </summary>
public class JukeBoxPhraseMap : SortedDictionary<string, MediaPhrase>
{
    /// <summary>
    /// Add a song to the phrase map.
    /// </summary>
    /// <param name="song">The song.</param>
    public void AddSong(IWMPMedia3 song)
    {
        // MPM here...
    }

    /// <summary>
    /// The phrases in the map.
    /// </summary>
    public IEnumerable<string> Phrases
    {
        get { return this.Keys; }
    }

    /// <summary>
    /// Event fired when a media descriptor is scanned.
    /// </summary>
    public event EventHandler<MediaScanArgs> MediaScanned;

    /// <summary>
    /// Add a playlist to the map.
    /// </summary>
    /// <param name="playlist">The playlist.</param>
    public void AddPlaylist(IWMPPlaylist playlist)
    {
        // MPM here...
    }

    // Lots more MPM here...
}

/// <summary>
/// Describes a scanned item.
/// </summary>
public class MediaScanArgs : EventArgs
{
    /// <summary>
    /// The descriptor.
    /// </summary>
    private MediaDescriptor mDescriptor;

    /// <summary>
    /// The descriptor.
    /// </summary>
    public MediaDescriptor Descriptor
    {
        get { return mDescriptor; }
    }

    /// <summary>
    /// Construct.
    /// </summary>
    /// <param name="descriptor">Source</param>
    public MediaScanArgs(MediaDescriptor descriptor)
    {
        // Store the parameter (lowercase d), not the property.
        mDescriptor = descriptor;
    }
}


This class is a SortedDictionary that maps strings to MediaPhrases. You can add songs to it, and you can also add IWMPPlaylist objects (where IWMPPlaylist is the Media Player interface to standard and custom playlists). You can get the list of Phrases as a property; and the class fires a MediaScanned event for each new descriptor added. (This is useful for displaying progress as you scan your Media Player library.)
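
To make the map-plus-event idea concrete without any Media Player calls, here is a stripped-down sketch. It is an assumption-laden illustration, not the real JukeBoxPhraseMap: SketchPhraseMap, SketchScanArgs, and the AddPhrase helper are hypothetical, descriptors are reduced to description strings, and the event is raised once per description added.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event arguments: carries a description instead of a MediaDescriptor.
public class SketchScanArgs : EventArgs
{
    private readonly string mDescription;
    public SketchScanArgs(string description) { mDescription = description; }
    public string Description { get { return mDescription; } }
}

// Hypothetical simplified map: phrase string -> list of descriptions.
public class SketchPhraseMap : SortedDictionary<string, List<string>>
{
    // Fired each time a description is added.
    public event EventHandler<SketchScanArgs> MediaScanned;

    // The phrases, in the dictionary's sorted key order.
    public IEnumerable<string> Phrases
    {
        get { return this.Keys; }
    }

    // Hypothetical helper standing in for the MPM inside AddSong/AddPlaylist.
    public void AddPhrase(string phrase, string description)
    {
        List<string> descriptions;
        if (!TryGetValue(phrase, out descriptions))
        {
            descriptions = new List<string>();
            Add(phrase, descriptions);
        }
        descriptions.Add(description);

        // Copy the delegate before the null check so a late unsubscribe is harmless.
        EventHandler<SketchScanArgs> handler = MediaScanned;
        if (handler != null)
            handler(this, new SketchScanArgs(description));
    }
}

public static class Program
{
    public static void Main()
    {
        SketchPhraseMap map = new SketchPhraseMap();

        // A progress display would subscribe like this.
        map.MediaScanned += delegate(object sender, SketchScanArgs e)
        {
            Console.WriteLine("Scanned: " + e.Description);
        };

        map.AddPhrase("Some Artist", "all songs by Some Artist");
        map.AddPhrase("Has Been", "the song Has Been");
        map.AddPhrase("Has Been", "the album Has Been");

        foreach (string phrase in map.Phrases)
            Console.WriteLine(phrase + ": " + map[phrase].Count + " match(es)");
    }
}
```

Deriving from SortedDictionary keeps the phrases in key order for free, which is handy when the phrase list is later handed to something that displays or enumerates it.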

The rest of this class is lots and lots of MPM, and not important for our topic. (That's speech recognition, in case you've forgotten...) These elements are enough for us to populate a phrase map using the following code excerpt:


/// <summary>
/// Map of phrases to media
/// </summary>
private JukeBoxPhraseMap _Map = new JukeBoxPhraseMap();

...

// Show the progress form.
using (MediaRescanForm frm = new MediaRescanForm())
{
    frm.Map = _Map;
    frm.Show();

    // Start empty.
    _Map.Clear();

    // Loop over the media. Exit if stopped.
    IWMPPlaylist playlist = wmp.mediaCollection.getAll();
    for (int idx = 0; (idx < playlist.count) && (!frm.Stopped); idx++)
    {
        // Add the song to the map.
        try
        {
            IWMPMedia3 media = playlist.get_Item(idx) as IWMPMedia3;
            _Map.AddSong(media);
        }
        catch { } // Ignore any item that fails and keep scanning.
    }

    // Loop over the playlists. Exit if stopped.
    IWMPPlaylistArray playlists = wmp.playlistCollection.getAll();
    for (int idx = 0; (idx < playlists.count) && (!frm.Stopped); idx++)
    {
        // Add the playlist to the map.
        try
        {
            IWMPPlaylist list = playlists.Item(idx);
            _Map.AddPlaylist(list);
        }
        catch { } // Ignore any playlist that fails and keep scanning.
    }

    // Done.
    frm.Close();
}


MediaRescanForm is a simple class which subscribes to the MediaScanned event of a JukeBoxPhraseMap and displays descriptors as they're scanned. The rest of this code should be obvious: it loops over songs and then playlists, adding them to the map.

So alllllll of this MPM is prolog, simply to get us a list of phrases and a map from the phrases to media descriptors. Now we want to turn those into commands in a grammar. This will be the point of Part 3.

Posted on Saturday, November 15, 2008 4:32 PM. Filed under: .NET, M-SAPI


Comments on this post: Dee Jay, Part 2: MPM, and more MPM



Copyright © Martin L. Shoemaker | Powered by: GeeksWithBlogs.net