Geeks With Blogs
Ulterior Motive Lounge UML Comics and more from Martin L. Shoemaker (The UML Guy),
Offering UML Instruction and Consulting for your projects and teams.
In Part 2, we dug a little bit into MPM (Media Player Magic) to build a JukeBoxPhraseMap, mapping phrases from the Media Player to songs, albums, and collections. Now we need to turn those phrases into M-SAPI commands.

In concept, we want a Choices object, which represents a choice between two or more alternate phrases. We could turn the whole map into one giant Choices, and we will; but that Choices would be pretty unusable. No user is going to remember and correctly speak some of the song titles in my Media Player library:


  • The "Jamestown" Homeward Bound
  • "Krankenmal" Theme
  • Adagio (from Toccata, Adagio and Fugue in C major)
  • After All [Love Theme from Chances Are]
  • Parece Mentira
Users will probably only remember parts of these names, so we need partial matching. There are two approaches to the partial matching problem: the obvious way, and the lazy way...

The obvious way is to decide that this is my problem, and I have to split every one of these phrases into its component pieces, and make those into phrases, and then combine those into larger phrases, and so on, and so on, and so on, and the phrase map gets incredibly cumbersome and pretty much impossible for me to ever manage.

The lazy way is to let Microsoft spend I-don't-know-how-many millions of dollars on speech recognition technology and programmability, and solve the problem for me. After all, how many problem domains include complex phrases which can be difficult for users to speak? No, scratch that: how many problem domains don't include complex phrases which can be difficult for users to speak? The answer is: not many interesting domains. So M-SAPI includes a built-in partial match capability in one of the GrammarBuilder constructors:


public GrammarBuilder (
    string phrase,
    SubsetMatchingMode subsetMatchingCriteria
)


The SubsetMatchingMode describes how the speech recognizer will recognize partial matches within the specified phrase. The options are:


  • OrderedSubset: Matches one or more words in the phrase if those words are spoken in the same order as in the phrase. "Same order" does not mean sequential, necessarily: the spoken phrase "dog cat" has the same order as "dog bird cat", even though there's a word missing in the middle.
  • OrderedSubsetContentRequired: Matches one or more words in the phrase if those words are spoken in the same order as in the phrase; but ignores simple articles and prepositions.
  • Subsequence: Matches one or more words in the phrase if those words form a contiguous subsequence of the phrase: no gaps allowed. The spoken phrase "dog cat" is not a subsequence of "dog bird cat", because there's a word missing in the middle.
  • SubsequenceContentRequired: Matches one or more words in the phrase if those words form a subsequence in the target phrase; but ignores simple articles and prepositions.
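If the distinction between "ordered subset" and "subsequence" seems slippery, here's a rough sketch of the two rules as plain word-list checks. This is just an illustration of the matching semantics, not the recognizer's actual algorithm, and it ignores the ContentRequired treatment of articles and prepositions:

```csharp
using System;

// Illustration only: approximate the two matching rules over word lists.
public static class MatchSketch
{
    // Every spoken word must appear in the phrase, in order; gaps are allowed.
    public static bool IsOrderedSubset(string spoken, string phrase)
    {
        string[] s = spoken.Split(' ');
        string[] p = phrase.Split(' ');
        int i = 0;
        foreach (string word in p)
            if (i < s.Length && s[i] == word) i++;
        return i == s.Length;
    }

    // The spoken words must appear as one contiguous run within the phrase.
    public static bool IsSubsequence(string spoken, string phrase)
    {
        return (" " + phrase + " ").Contains(" " + spoken + " ");
    }
}

// MatchSketch.IsOrderedSubset("dog cat", "dog bird cat")  -> true  (gap allowed)
// MatchSketch.IsSubsequence("dog cat", "dog bird cat")    -> false (gap not allowed)
// MatchSketch.IsSubsequence("bird cat", "dog bird cat")   -> true  (contiguous run)
```
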


So I used SubsequenceContentRequired to turn each phrase into a partial matching grammar; and then I composed those into a Choices:


// Build the music key grammar by looping over map phrases.
Choices chcPhrases = new Choices();
foreach (string phrase in _Map.Phrases)
{
    GrammarBuilder gbPhrase = new GrammarBuilder(phrase,
        SubsetMatchingMode.SubsequenceContentRequired);
    chcPhrases.Add(gbPhrase);
}


So now I have a Choices of music phrases, and the speech recognizer can recognize them. (Well, it will when I get to that code...) So when I say, "Dee Jay, play Has Been," all I have to do is pull the recognized text apart, find the music phrase, and look it up in the map. And once again, there are two ways to pull the recognized text apart: the obvious way (do it myself) or the lazy way (trust Microsoft to do it for me). Which one do you think I'm going to pick? (If you said "obvious", you don't know me very well...) M-SAPI includes the SemanticResultKey class, which lets you attach a semantic tag to a GrammarBuilder so that the speech recognizer can parse the string for you. All you have to do is create a new SemanticResultKey and add it to a GrammarBuilder:


private const string _MusicKey = "MusicKey";

...

// Assign the semantic result to _MusicKey.
GrammarBuilder gbMusic = new GrammarBuilder(new SemanticResultKey(_MusicKey, chcPhrases));


This GrammarBuilder can now be used to build commands that will include phrases from the Media Player library. "Play" is one music command, but not the only one. So I combine these all into a Choices:


/// <summary>
/// The set of keyed commands.
/// </summary>
private string[] mKeyedCommands;

private const string _Play = "Play";
private const string _PlaySome = "Play Some";
private const string _PlayAny = "Play Any";
private const string _PlayAll = "Play All";
private const string _Add = "Add";
private const string _AddSome = "Add Some";
private const string _AddAny = "Add Any";
private const string _AddAll = "Add All";

private const string _Command = "Command";

...

mKeyedCommands = new string[] {_Play, _PlaySome, _PlayAny, _PlayAll, _Add, _AddSome, _AddAny, _AddAll};

...

// Build the keyed command grammar by appending the music key
// to each command.
Choices chcKeyedCommands = new Choices();
foreach (string cmd in mKeyedCommands)
{
    GrammarBuilder gbKeyed = new GrammarBuilder(new SemanticResultKey(_Command, cmd));
    gbKeyed.Append(gbMusic);
    chcKeyedCommands.Add(gbKeyed);
}


Note how I again used a SemanticResultKey to identify each of the phrases in the Choices as a command. Then note how, after each command, I appended the gbMusic GrammarBuilder. So "Play" is a Command, and "Has Been" is a MusicKey.
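When one of these keyed commands is recognized, the recognizer hands both pieces back through the result's Semantics dictionary. Here's a minimal sketch of what that retrieval looks like; the handler name and the recognizer wiring are assumptions, since the recognition code comes later in the series:

```csharp
using System.Speech.Recognition;

// Sketch of retrieving the parsed pieces of a keyed command.
// (Hypothetical handler; the recognizer wiring comes in a later part.)
void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    SemanticValue semantics = e.Result.Semantics;
    string command = (string)semantics[_Command].Value;  // e.g. "Play"
    if (semantics.ContainsKey(_MusicKey))
    {
        // The partially-matched music phrase, ready to look up in the map.
        string musicPhrase = (string)semantics[_MusicKey].Value;
    }
}
```
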

I also defined a number of commands that don't require a MusicKey:


/// <summary>
/// The set of unkeyed commands.
/// </summary>
private string[] mUnkeyedCommands;

...

private const string _Pause = "Pause";
private const string _Resume = "Resume";
private const string _Skip = "Next";
private const string _Back = "Back";
private const string _5Stars = "5 Stars";
private const string _4Stars = "4 Stars";
private const string _3Stars = "3 Stars";
private const string _2Stars = "2 Stars";
private const string _1Star = "1 Star";
private const string _Louder = "Louder";
private const string _Softer = "Softer";
private const string _Shh = "Hush";
private const string _Shout = "Shout";
private const string _About = "About";
private const string _Exit = "Exit";
private const string _Hello = "Hello";
private const string _Rescan = "Rescan";
private const string _WhatsPlaying = "What's playing?";
private const string _ResetName = "Reset Name";
private const string _WhatCanISay = "What can I say?";
private const string _Help = "Help";

...

mUnkeyedCommands = new string[] {_Pause, _Resume, _Skip, _Back, _5Stars, _4Stars, _3Stars, _2Stars, _1Star, _Louder, _Softer, _Shh, _Shout, _WhatCanISay, _Help, _About, _Exit, _Hello, _Rescan, _ResetName, _WhatsPlaying};

// Build the unkeyed command grammar.
Choices chcUnkeyedCommands = new Choices();
foreach (string cmd in mUnkeyedCommands)
{
    GrammarBuilder gbUnkeyed = new GrammarBuilder(new SemanticResultKey(_Command, cmd));
    chcUnkeyedCommands.Add(gbUnkeyed);
}


I also wanted a command to let the user rename Dee Jay. Users love personalization, and this is an obvious one. So that required a special command, because I couldn't include a list of all possible names. Instead, I need a dictation, an element that matches any spoken phrase:


// Build the rename grammar. Set Command to the rename command,
// and Name to the dictation contents. (_Rename and _Name are
// string constants defined like the others above.)
GrammarBuilder gbRenameRoot = new GrammarBuilder(_Rename);
GrammarBuilder gbDictation = new GrammarBuilder();
gbDictation.AppendDictation();
GrammarBuilder gbName = new GrammarBuilder(new SemanticResultKey(_Name, gbDictation));
GrammarBuilder gbRename = new GrammarBuilder(new SemanticResultKey(_Command, gbRenameRoot));
gbRename.Append(gbName);


The AppendDictation method adds a dictation to a GrammarBuilder. Note again how I used SemanticResultKeys to identify the elements of the command.
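The dictated name comes back the same way: the text matched by the dictation element arrives as the value under the _Name key. A sketch of retrieving it, with the handler name and wiring as assumptions since the recognition code comes later:

```csharp
using System.Speech.Recognition;

// Sketch: pulling the dictated name out of a recognized rename command.
// (Hypothetical handler; the recognizer wiring comes in a later part.)
void OnRenameRecognized(object sender, SpeechRecognizedEventArgs e)
{
    SemanticValue semantics = e.Result.Semantics;
    if ((string)semantics[_Command].Value == _Rename && semantics.ContainsKey(_Name))
    {
        // The free-form dictation result: whatever name the user spoke.
        string newName = (string)semantics[_Name].Value;
        // Save newName as the new Dee Jay name...
    }
}
```
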

So now I have three kinds of commands: keyed, unkeyed, and rename. I want to combine these into a single element, so that I can precede them with the current name:


// Build the commands.
Choices chcCommands = new Choices(chcKeyedCommands, chcUnkeyedCommands, gbRename);

// Build the DJ name.
GrammarBuilder gbDJNameOnly = new GrammarBuilder(new SemanticResultKey(_DJ, mDeeJayName));
GrammarBuilder gbDJ = new GrammarBuilder(gbDJNameOnly, 1, 1);
gbDJ.Append(chcCommands);


Finally, I need one special command: "Reset Name". Unlike the other commands, this one shouldn't require the Dee Jay name, because the user might have forgotten it. So this one stands alone:


// Build the nameless commands.
GrammarBuilder gbResetName = new GrammarBuilder(new SemanticResultKey(_Command, _ResetName));


And now, finally, we can build a Grammar from all of these GrammarBuilders:


/// <summary>
/// The current grammar.
/// </summary>
private Grammar mGrammar;

...

// Build the top-level grammar.
GrammarBuilder gbTop = new GrammarBuilder(new Choices(gbResetName, gbDJ));
mGrammar = new Grammar(gbTop);


So now we have a Grammar that represents commands we can speak to Dee Jay. In the next part, we'll start to listen for and recognize those commands.
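To give a sense of where that's headed, loading this Grammar into a recognizer takes only a few lines. This is a preview sketch, with OnSpeechRecognized as a placeholder for a handler that doesn't exist yet:

```csharp
using System.Speech.Recognition;

// Preview: load the grammar and start listening continuously.
// OnSpeechRecognized is a placeholder for the handler built in the next part.
SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine();
recognizer.SetInputToDefaultAudioDevice();
recognizer.LoadGrammar(mGrammar);
recognizer.SpeechRecognized += OnSpeechRecognized;
recognizer.RecognizeAsync(RecognizeMode.Multiple);
```
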
Posted on Saturday, November 15, 2008 4:35 PM, filed under .NET and M-SAPI.


Comments on this post: Dee Jay, Part 3: Building a Media Player Grammar
