Those of you who subscribe to the various microsoft.public.biztalk.* newsgroups may have seen a number of posts from me about convoy processing and the perils of correlation set subscriptions. The end result has been quite dramatic in terms of our understanding of convoys, and has led to us removing them from our design. This is obviously quite a serious step, so I thought it might be worth passing on some of our recent experience.
First, a quick recap on the messaging architecture. The pub-sub model is described pretty well here, amongst others, so I won't repeat it; suffice to say that messages are delivered to the messagebox through receive locations, and then matched to subscriptions using a combination of message type (as defined by the schema), and any applied filters. Subscribers include send ports (for content-based routing), orchestrations, and orchestration instances (in the case of correlation, and therefore convoys.)
If a subscription exists, then any matching message that arrives in the messagebox will be consumed.
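To make the matching rule concrete, here is a toy Python model of it. This is only a sketch of the pub-sub idea described above, not how the messagebox is actually implemented; the names `MessageBox`, `Subscription`, `publish` and `unmatched` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Message = Dict[str, str]


@dataclass
class Subscription:
    subscriber: str                       # e.g. a send port or an orchestration
    message_type: str                     # stands in for the schema-defined type
    predicate: Callable[[Message], bool]  # stands in for any applied filter


class MessageBox:
    """Hypothetical, heavily simplified message box."""

    def __init__(self) -> None:
        self.subscriptions: List[Subscription] = []
        self.delivered: Dict[str, List[Message]] = {}
        self.unmatched: List[Message] = []

    def subscribe(self, sub: Subscription) -> None:
        self.subscriptions.append(sub)

    def publish(self, msg: Message) -> List[str]:
        # A message is consumed by every subscription whose type and filter match.
        matches = [
            s for s in self.subscriptions
            if s.message_type == msg["type"] and s.predicate(msg)
        ]
        if not matches:
            self.unmatched.append(msg)
        for s in matches:
            self.delivered.setdefault(s.subscriber, []).append(msg)
        return [s.subscriber for s in matches]


box = MessageBox()
box.subscribe(Subscription("SendPort1", "Order", lambda m: m.get("region") == "EU"))
print(box.publish({"type": "Order", "region": "EU"}))  # ['SendPort1']
print(box.publish({"type": "Order", "region": "US"}))  # []
```

The point to take from the sketch is that matching depends only on whether a subscription exists at the moment the message arrives, not on what the subscriber happens to be doing at the time.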
So - how are subscriptions created, and when? The easiest way to check this is to use the BTSSubscriptionViewer utility, found in the SDK\Utilities directory. Using this you can see that in the case of send ports and orchestrations, the subscriptions exist all the time that the artefact is enlisted. For correlation sets things are a little more complex, and this is where it started to unravel for us.
Correlation subscriptions are created when the correlation set is initialized - through a receive or send shape within an orchestration. In a common sequential convoy scenario, the set is initialized, and then a Listen shape is used to pick up the correlated messages (see Alan Smith's sample here.) Without thinking about it, it's easy to assume that messages arriving before the Listen shape is reached will be discarded, regardless of correlation matches. This is not so! Once the subscription is created, matching messages will be consumed by the orchestration instance whether or not it has reached the Listen shape.
There are two scenarios that we have come across where this causes problems:
1. When there is a delay of some sort after the initialisation, during which new messages might reasonably be expected to be discarded, and not consumed.
2. When receiving correlated messages at a faster rate than the orchestration is capable of processing them.
We hit the first scenario when using an orchestration to manage a publication schedule. We were receiving an initial message, sleeping until a given date, then sending the first message on and entering a Listen-Loop, which was being used to hoover up updates to the initial message. We found that updates received during the initial delay were being published rather than discarded. There is a workaround for this: use a fake Send shape just before the Listen-Loop to initialise the correlation at the last minute. It's ugly, but it works.
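The effect of moving the initialisation can be shown with a few lines of Python. This is a deliberately crude sketch: it assumes the only thing that matters is whether an update arrives before or after the correlation subscription is created, and the function name `route_updates` and the time values are made up for the example.

```python
from typing import List, Tuple


def route_updates(
    arrivals: List[int], subscription_created_at: int
) -> Tuple[List[int], List[int]]:
    """Split update arrival times into those the instance's correlation
    subscription consumes and those that arrive before it exists."""
    consumed = [t for t in arrivals if t >= subscription_created_at]
    not_consumed = [t for t in arrivals if t < subscription_created_at]
    return consumed, not_consumed


# Updates arrive at t=3, t=7 and t=12; the orchestration sleeps until t=10.
updates = [3, 7, 12]

# Correlation initialised by the first receive (t=0): every update is consumed,
# including the two that arrive while the orchestration is still sleeping.
print(route_updates(updates, subscription_created_at=0))   # ([3, 7, 12], [])

# Correlation initialised by a fake Send just before the Listen-Loop (t=10):
# only the update that arrives after the delay is consumed.
print(route_updates(updates, subscription_created_at=10))  # ([12], [3, 7])
```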
The second scenario is much more serious, and has proved a show-stopper for us. Consider the situation where an orchestration is not only using a convoy to batch up messages, but processing the messages as well. As in Alan's sample, the batch limits ("completeness conditions") are set by one of two parameters - a batch size, and a timeout value. This means that either the number of messages processed reaches a set limit, at which point the batch is delivered and the orchestration dies, or there is a sufficiently long delay between individual incoming messages for the orchestration to deliver the batch as it currently stands and then die. (For example: if the convoy picks up 10 messages, output them; if it picks up 5 messages and then sits for 10 minutes waiting for the next one, output the batch of 5 only.)
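As a rough illustration of the two completeness conditions, the following Python sketch groups message arrival times into batches. It is an idealised model, not the orchestration itself: it assumes processing is instantaneous and that a new batch starts as soon as the previous one closes, which is exactly the assumption that breaks down in practice. The function name and parameters are invented for the example.

```python
from typing import List


def batch_by_size_or_timeout(
    arrival_times: List[float], batch_size: int, timeout: float
) -> List[List[float]]:
    """Close a batch when it reaches batch_size messages, or when the gap
    since the previous message exceeds timeout."""
    batches: List[List[float]] = []
    current: List[float] = []
    for t in arrival_times:
        # Timeout condition: too long since the last message in this batch.
        if current and t - current[-1] > timeout:
            batches.append(current)
            current = []
        current.append(t)
        # Size condition: the batch is full.
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # the final partial batch would go out on timeout
    return batches


# Five messages a minute apart, then a 16-minute gap (times in minutes).
print(batch_by_size_or_timeout([0, 1, 2, 3, 4, 20, 21], batch_size=10, timeout=10))
# [[0, 1, 2, 3, 4], [20, 21]]
```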
In our test orchestration we had the following setup:
- A receive location delivering messages at a rate of 2/second.
- An orchestration picking up the messages in a convoy, and processing each message.
- The processing of each message takes 2 seconds (simulated using a delay.)
- A batch limit of 10 messages, output to a flat file, after which the orchestration dies.
We then sent 100 messages in to the receive location, expecting to see 10 flat files appear.
What we actually saw was 3 files appearing, with no sign of the missing messages. The explanation appears to be as follows:
The correlation set is initialised when the first message is received, at which point a subscription is created for all further messages (all 100 messages matched the correlation.)
The orchestration takes 20 seconds to process the ten messages it requires. However, as the messages are being received at a rate of 2/sec, 40 messages have been delivered to the messagebox in this time, and all of them match the subscription of the correlation set. They are therefore consumed, and not discarded (yet). Neither is a new orchestration instance created. This means that a net 30 messages are consumed, but NOT processed. At the end of the 20 seconds, the orchestration dies, and the outstanding 30 messages are discarded.
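The arithmetic can be checked with a small Python simulation. It is a back-of-the-envelope model of the explanation above rather than of the engine itself, and it assumes that messages arrive evenly spaced, that each instance lives for exactly batch size times processing time, and that any message arriving while an instance is alive is consumed by that instance. The function name and parameters are made up for the example.

```python
from typing import Tuple


def simulate_convoy(
    total_messages: int,
    arrival_rate: float,        # messages per second
    processing_seconds: float,  # time spent processing each message
    batch_size: int,
) -> Tuple[int, int]:
    """Return (files_written, messages_discarded) under the simplified model."""
    arrivals = [i / arrival_rate for i in range(total_messages)]
    files = 0
    discarded = 0
    i = 0
    while i < len(arrivals):
        # The message at index i activates a new orchestration instance.
        start = arrivals[i]
        end = start + batch_size * processing_seconds
        # Everything arriving while the instance is alive matches its subscription.
        consumed = sum(1 for t in arrivals[i:] if t < end)
        processed = min(consumed, batch_size)
        files += 1  # one output file per instance (a partial batch goes out on timeout)
        discarded += consumed - processed
        i += consumed
    return files, discarded


print(simulate_convoy(100, arrival_rate=2, processing_seconds=2, batch_size=10))
# (3, 70)
```

Under these assumptions, the first two instances each swallow 40 messages and the third takes the remaining 20, giving 3 files and 70 discarded messages in total, which matches the 3 files we observed.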
If you then look at the services report in HAT, you should see that the orchestration is marked as "Completed with discarded messages". The missing messages should be visible in the messages report, again with the status "Suspended", "Completed with discarded messages". You could, of course, save these messages and resubmit them manually, but in a production environment this is obviously not an option.
The lesson from all of this seems to be that you should always think of convoys in light of the subscriptions that they use to consume messages, and understand when these subscriptions are created, and what might happen to the messages that fall in between the gaps.
Caveat convoy, as they say.
UPDATE: see this for a very informative posting on the background to this problem.