Geeks With Blogs
Chris Falter .NET Design and Best Practices

We have a web service that serves as a gateway to a remote service provider.  Part of the design is a log of all interactions between our service and the remote service.  We keep the log in a SQL Server table with this definition:

CREATE TABLE [dbo].[Request](
    [RequestId] [int] IDENTITY(1,1) NOT NULL,
    [TxCode] [char](2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [ResponseText] [varchar](max) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [ResponseTime] [datetime] NULL,
    [CompanyId] [varchar](25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
 CONSTRAINT [PK_Request] PRIMARY KEY CLUSTERED
(
    [RequestId] ASC
) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

The TxCode field identifies what type of transaction is taking place.  If the transaction is an AddressServer report, the TxCode will be 'AV'; if the transaction is a Location report, the TxCode will be 'L'; and so forth. The ResponseText field holds the XML document provided by the remote service (if any).  The CompanyId field holds a unique identifier within our systems, usually a quote number or a policy number.

The XML in the AddressServer reports identifies the algorithm used for address matching and the algorithm used for "geocoding" (determining the latitude and longitude of an address).  Recently I had to determine the frequency of the various AddressServer algorithms reported within a certain time period.  My first inclination was to write a query that operated on every AddressServer report within the desired range, like so:

Select count(*) from dbo.[Request]
where requestid between 1000000 and 1100000
and txcode = 'AV'
and (charindex('MatchLevel_StreetInCity', ResponseText) > 0)
and (charindex('GeocodeMatch="Postal"', ResponseText) > 0)

However, I realized that this query did not operate over the correct set of records, because for each unique CompanyId, our gateway might order several AddressServer reports--and the only report we were interested in was the final one ordered. This situation might occur, for example, when a user orders a report for a particular address, discovers that the address contains a typo, then corrects the address and orders another report.  To discover which algorithm the service used for valid addresses, I had to find a way to exclude the reports for invalid addresses.  So I needed to restructure the query so it would not operate on all the 'AV' reports, but instead only on the final report for a particular CompanyId (quote or policy).

I was able to come up with a clever query, but before I share it with you, I want to give my readers the opportunity to figure out the correct query for themselves.  If you want to show the world you're a SQL guru, leave a comment with a query that operates on the correct set of records.  For those who do not have the time or inclination to work on this puzzler,  I will share my own query in a comment to this post on Monday afternoon, December 17, 2007.

EDIT: 17 December, 2007  - Since no one wanted to claim SQL guru status by leaving a comment, I will provide my answer in the post, where I can apply helpful formatting.

The fundamental task is this: for each CompanyId you have to find the record (with txCode 'AV') that is most recent.  Since the RequestId field is auto-incremented, the most recent record is also the one with the greatest RequestId.  Thus we can reduce our set of records by restricting it to those with the Max(RequestId) for any particular CompanyId:

First draft of T-SQL solution:

Select count(*)
 from dbo.[Request]
 where requestid in
   (select max(requestId) from dbo.[Request]
    where requestid between 1000000 and 1100000
    and txcode = 'AV'
    group by companyId)
 and (charindex('MatchLevel_StreetInCity', ResponseText) > 0)
 and (charindex('GeocodeMatch="Postal"', ResponseText) > 0)

Notice that once we have sub-selected the correct request IDs based on TxCode and requestId range, we can remove those conditions from the (outer) select's predicate. 

However, we still have a problem: what if the last 'AV' transaction for a particular companyId is not within the range?  For example, the rquestId for the first 'AV' record for a companyId might be 1099999, and the final one might be 1100001 (which is not between 100000 and 1100000).  In this condition, the only request associated with a valid address lies outside our range, so all the requests within the desired range are invalid and should be excluded.  How do we deal with this problem?  We simply have to add another condition:

Final T-SQL solution:

Select count(*)
 from dbo.[Request]
 where requestid in
   (select max(requestId) from dbo.[Request]
    where requestid between 1000000 and 1100000
    and txcode = 'AV'
    group by companyId)
and companyid not in (select companyId from dbo.[Request] where txcode = 'AV' and requestid > 1100000)
 and (charindex('MatchLevel_StreetInCity', ResponseText) > 0)
 and (charindex('GeocodeMatch="Postal"', ResponseText) > 0)

Now we have excluded any 'AV' records which are not truly the final one for a particular companyId.  The task is finished! 

But why does it have to be so hard? :)  Comments welcome!

Posted on Saturday, December 15, 2007 5:20 PM Database Considerations | Back to top


Comments on this post: A T-SQL Stumper

No comments posted yet.
Your comment:
 (will show your gravatar)


Copyright © Chris Falter | Powered by: GeeksWithBlogs.net