So in Part 4, I said that recognizing the music key would be tricky.
But why? Didn't I spend most of Part 3 explaining how cleverly I used M-SAPI so that users only had to say partial names to be recognized?
Well, yes; but I've long said that programming has a Conservation of Complexity law: the less complex for the users, the more complex for the programmers. (Be glad: that's the short version. My long discussion on Conservation of Complexity would take up the rest of this post.)
This flexibility leads to complexity because one short phrase can match multiple long phrases. For instance, one album in my collection is Forever Gold by B.B. King. It includes these songs:
2. How Blue Can You Get?
3. Every Day I Have the Blues
10. Catfish Blues
14. Other Night Blues
I also have some sample music provided with Windows Vista, including one track from Aaron Goldberg's Worlds: OAM's Blues.
From Sports by Huey Lewis and the News, I have Honkytonk Blues.
From Jonathan Richman's self-titled album, I have Blue Moon.
From Celebrating the Best of Jazz by Louis Armstrong, there's St. Louis Blues and Black and Blue.
From Am I Cool or What? (yes, that's a Garfield CD; go ahead, laugh, but it has The Temptations, Patti LaBelle, Carl Anderson, Natalie Cole, The Pointer Sisters, Lou Rawls, Diane Schuur, Valerie Pinkston, Desiree Goyette, and B.B. King), there's Monday Morning Blues.
From True Blue by Madonna, there's True Blue.
From Cargo by Men at Work, there's Blue for You.
From All-Time Top 100 TV Themes, there's Hill Street Blues.
From Tropico, there's Outlaw Blues.
From another Forever Gold title with Ray Charles, there's Sentimental Blues.
From my fellow Duelist Geoff Nostrant (a.k.a. Silvercord), there's blueshift.
From Who's Next by The Who, there's Behind Blue Eyes.
So if all I say to Dee Jay is "Dee Jay, Play Blue", Dee Jay will be really confused. Seventeen different songs in the list above have "Blue" somewhere in the title. Now that's my fault as the user; but we can't blame the users if we want happy users. We want to cope with what real users do, not just force them to do what we want.
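To make the ambiguity concrete, here is a minimal sketch that checks a spoken fragment against a few of the titles above. It uses a plain case-insensitive substring test, which is only an illustration of why "Blue" is ambiguous; it is not how M-SAPI matches phrases against a grammar. The class name PartialTitleMatch and the method FindMatches are made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only. This is NOT the M-SAPI
// matching logic; it is a plain substring test over song titles.
public static class PartialTitleMatch
{
    // Returns every title that contains the fragment, ignoring case.
    public static List<string> FindMatches(string fragment, IEnumerable<string> titles)
    {
        List<string> matches = new List<string>();
        foreach (string title in titles)
        {
            if (title.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                matches.Add(title);
            }
        }
        return matches;
    }
}
```

Called with "Blue" and the titles "How Blue Can You Get?", "Catfish Blues", "Blue Moon", and "Behind Blue Eyes", FindMatches returns all four, so the fragment alone cannot identify one song.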
So how do we make Dee Jay understand all these potential matches? As in Part 3, there's the obvious way and the lazy way. And once again, the lazy way (relying on Microsoft to solve the problem) is the smart way. When M-SAPI returns a RecognizedPhrase (or the subclass, RecognitionResult), it can include a list of equally good partial matches, called Homophones. Now we could quibble about that term: in grammar, homophones are words which sound the same but have different meanings. Here, the homophone phrases likely don't sound alike at all; but the recognized words form part of each phrase. But ignoring the terminology, the concept is easy: every phrase in the Homophones list is just as good a match as the top-level phrase.
So remember from Part 2 that Dee Jay is designed to select one or more songs or albums or artists (i.e., media descriptors) that match a given phrase. Well, now we want the media descriptors that match the phrase and its Homophones. So the code for selecting all the matches looks something like this:
// Music commands may include a specifier.
string specifier = "";
if (e.Result.Semantics.ContainsKey(_Specifier))
{
    SemanticValue valSpecifier = e.Result.Semantics[_Specifier];
    if (valSpecifier.Confidence >= 0.8)
    {
        specifier = valSpecifier.Value.ToString();
    }
}
// Add the best match to the media phrase list.
List<RecognizedPhrase> testedPhrases = new List<RecognizedPhrase>();
List<MediaPhrase> phrases = new List<MediaPhrase>();
AddRecognizedMediaPhrase(command, e.Result, testedPhrases, phrases);
...
/// <summary>
/// Add a recognized phrase to a list of music phrases.
/// </summary>
/// <param name="command">The command being built.</param>
/// <param name="reco">The recognized phrase.</param>
/// <param name="testedPhrases">The phrases which have already been tested.</param>
/// <param name="phrases">The current list of music phrases.</param>
private void AddRecognizedMediaPhrase(string command,
    RecognizedPhrase reco, List<RecognizedPhrase> testedPhrases, List<MediaPhrase> phrases)
{
    // Avoid infinite recursion: skip phrases we have already visited.
    if (testedPhrases.Contains(reco))
    {
        return;
    }
    testedPhrases.Add(reco);

    // Only confident items with music.
    if ((reco.Confidence >= 0.8) && (reco.Semantics.ContainsKey(_MusicKey)))
    {
        // Only matching commands.
        if ((reco.Semantics.ContainsKey(_Command)) && (reco.Semantics[_Command].Value.ToString() == command))
        {
            // Add the key. Don't duplicate.
            string key = reco.Semantics[_MusicKey].Value.ToString();
            if (!phrases.Contains(_Map[key]))
            {
                phrases.Add(_Map[key]);
            }
        }
    }

    // If we have homophones, add those, too.
    if ((reco.Homophones != null) && (reco.Homophones.Count > 0))
    {
        foreach (RecognizedPhrase phrase in reco.Homophones)
        {
            // Recurse on the homophone itself, not on reco.
            AddRecognizedMediaPhrase(command, phrase, testedPhrases, phrases);
        }
    }
}
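The "avoid infinite recursion" guard matters because two phrases may each list the other as a homophone. Here is a minimal, self-contained sketch of the same walk that runs without M-SAPI. FakePhrase is a hypothetical stand-in for RecognizedPhrase, and HomophoneWalk and CollectKeys are names invented for this example. It swaps the tested-phrases List for a HashSet, which is just one possible design choice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for RecognizedPhrase; not the System.Speech type.
public class FakePhrase
{
    public string MusicKey;
    public float Confidence;
    public List<FakePhrase> Homophones = new List<FakePhrase>();

    public FakePhrase(string musicKey, float confidence)
    {
        MusicKey = musicKey;
        Confidence = confidence;
    }
}

public static class HomophoneWalk
{
    // Collects the distinct music keys reachable from 'start' through
    // Homophones, keeping only phrases at or above minConfidence.
    public static List<string> CollectKeys(FakePhrase start, float minConfidence)
    {
        List<string> keys = new List<string>();
        Visit(start, minConfidence, new HashSet<FakePhrase>(), keys);
        return keys;
    }

    private static void Visit(FakePhrase phrase, float minConfidence,
        HashSet<FakePhrase> visited, List<string> keys)
    {
        // HashSet.Add returns false if the phrase was already visited,
        // which stops cycles such as A -> B -> A.
        if (!visited.Add(phrase))
        {
            return;
        }

        if ((phrase.Confidence >= minConfidence) && !keys.Contains(phrase.MusicKey))
        {
            keys.Add(phrase.MusicKey);
        }

        // Low-confidence phrases are skipped, but their homophones are still visited.
        foreach (FakePhrase homophone in phrase.Homophones)
        {
            Visit(homophone, minConfidence, visited, keys);
        }
    }
}
```

If phrase A lists B as a homophone and B lists A back, CollectKeys visits each exactly once and returns instead of recursing forever.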
So now we have a richer list of possible matches, based on the top phrase and its Homophones. But we could potentially make it richer still. While any RecognizedPhrase can have Homophones, a RecognitionResult can also have Alternates, a list of lower confidence matches, each possibly including Homophones. So I could conceivably add code like this:
// If we have alternates, add those, too.
if ((e.Result.Alternates != null) && (e.Result.Alternates.Count > 0))
{
    foreach (RecognizedPhrase alt in e.Result.Alternates)
    {
        AddRecognizedMediaPhrase(command, alt, testedPhrases, phrases);
    }
}
But so far, I'm not very happy with the results when I do that. I need to experiment with different Confidence thresholds, and maybe tolerance on individual SemanticValues (as discussed in Part 4), to see if there's a good way to separate "good" alternates from "bad".
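One way to set up that experiment is to make the threshold a parameter instead of the hard-coded 0.8. The sketch below assumes the candidates have already been reduced to (music key, confidence) pairs; AlternateFilter and Filter are hypothetical names, and any confidence values fed to it in a trial run would be made up, not measured from M-SAPI.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for trying out different confidence thresholds.
public static class AlternateFilter
{
    // Keeps the music keys whose confidence meets or exceeds the threshold.
    // Each candidate pairs a music key (Key) with a confidence (Value).
    public static List<string> Filter(
        IEnumerable<KeyValuePair<string, float>> candidates, float threshold)
    {
        List<string> kept = new List<string>();
        foreach (KeyValuePair<string, float> candidate in candidates)
        {
            if (candidate.Value >= threshold)
            {
                kept.Add(candidate.Key);
            }
        }
        return kept;
    }
}
```

Running the same candidate list through Filter at several thresholds shows how quickly the list grows as the bar is lowered, which is the trade-off I still need to tune.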
So now we have a great big list of possible media phrases that the user might have meant. How is Dee Jay to know which one is correct? Well, the same way any M-SAPI application should clarify user intentions: it's going to ask. But I have other commitments and some flaky hardware, so it will be a while before I can get to that.