Background:
Out of first few tasks that were assigned to me in my new project, I was asked to help testing team with promoting an existing partially tested BizTalk interface. After little bit of code re-factoring and ensuring that the design has been simplified (removed some unwanted logging etc.), we put code to the test. Functionality wise there were not many issues which surprised us and we quickly got over it. However, it wasn’t for long that we realized that there was some obvious issue hidden somewhere beneath the layers of the code, which ultimately got highlighted during our stress testing of the interface.
Interface Scenario:
Customers make a call into BizTalk - HTTP receive-location hosted under IIS (normal setup). This in turn instantiate an orchestration that makes a call into a backend ERP application (slow in nature) using custom-written socket code. In case of positive response from ERP application it simply maps a result back on to the HTTP response body. In case of an exception; orchestration tries to fetch a custom response configured in one of the custom SQL Server tables and maps this to the body of HTTP response.
(I believe that SQL adapter was not used as the .net code to manage configuration held in SQL Server was presented as a library code and was been used from various other locations of the solution/s)
Crunch Time:
All was going reasonably well while we stressed the interface with the gush of inbound HTTP requests though we felt that it was slightly below our expectations. It was expected performance though. Then we attempted to perform a little bit of negative testing. One of the guys (always happy chap) here, kindly obliged by switching ERP into the mode that started to refuse any request. We thought that this should reduce overall latency time greatly as ERP processing time has simply been discarded out of the equation. To our surprise, interface slowed down heavily, BizTalk started to queue up messages, overall thread usage shot up, spool table started growing (and was very resistant to SQL purge jobs) etc etc..
All this behavior at first came as a shock to us and drove us mad for the good few minutes. I started looking for bottleneck/s. Surprisingly, BizTalk was not throttling, which indicated to me that thresholds haven’t been broken yet. So what the hell was slowing interface down? Well, it soon became obvious when I looked into the HAT and noticed that the call to the .Net code that pulls custom response message back from the custom table was taking horribly long (in the region of 15-20 seconds). That was the first time I looked into that part of the solution and found that the way configuration was interrogated was by using inline “Select” statements which were fired against the SQL Server. No need to say that this style of coding has a major affect on the performance. Anyways, I quickly modified the code by pushing the logic into the stored procedure and do another round of stress testing. The performance gains were astonishing. We gained 80% in performance and all quickly fell in place.
The bit that surprised me was that this is a well known fact in the industry and still we sometimes manage to overlook it and end up paying performance penalties. Good that we picked this up in our test rounds; otherwise on production it could have been really difficult (with lot more and types of interfaces) to isolate the performance bottlenecks.
Therefore, the moral of a story is “Never-ever use inline SQL statements in your .Net code or for that matter any front end production ready application code”.
Ps: One of the reasons that I wrote this blog is; on numerous occasions I have come across situations when people bash BizTalk for all sorts of issues without really understanding the underline design and real bottlenecks in the solutions.
Below are some numbers that backs up the story above:
HTTP-Sync message processing benchmarking
|
|
|
|
|
Inputs
|
|
|
|
|
|
|
|
|
Number of threads
|
|
150
|
|
Submission batch size
|
|
150
|
|
Sleep interval between each submission (ms)
|
|
1
|
|
Total num of messages
|
|
600
|
|
|
|
|
Result
|
|
|
|
|
|
|
|
|
Old Code (With inline SQL execution)
|
|
|
|
|
|
|
RA1
|
Start Time
|
|
25/11/2009 13:07
|
RA2
|
Stop Time
|
|
25/11/2009 13:14:28
|
RA3
|
Total time taken (seconds)
|
|
430.9
|
RA4
|
Number of messages processed /sec
|
|
1.4
|
|
|
|
|
|
|
|
|
|
|
|
|
|
New Code (Using stored procedure)
|
|
|
|
|
|
|
RB1
|
Start Time
|
|
25/11/2009 13:19
|
RB2
|
Stop Time
|
|
25/11/2009 13:21
|
RB3
|
Total time taken (seconds)
|
|
80.85
|
RB4
|
Number of messages processed /sec
|
|
7.42
|
|
|
|
|
|
|
|
|
|
Performance gain (secs)
|
|
350.05
|
|
Performance gain (%)
|
|
81.23694593
|