
Cattaskv2009 Communication: Difference between revisions

From Catglobe Wiki
Wikicatglobe (talk | contribs)
(139 intermediate revisions by 2 users not shown)
= Communication in Cattask v2009 =
<accesscontrol>Main:MyGroup</accesscontrol>
[[Category:Miscellaneous]]


== <br>What kind of communication do we need?<br> ==
The new CatTask model discussed in&nbsp;[[Cattaskv2009 overview of the new system|Cattaskv2009_overview_of_the_new_system]]&nbsp;requires a lot of communication among the 3 CatTask instances. Experience with the current CatTaskService also tells me that communication is the most error-prone part in a production environment. That is why we have paid so much attention to building a good communication component.


In the real production environment, because of network load balancing, one CatGlobe site is deployed on three separate servers. We also decided that there will be one "Cattask" for each deployed instance of a site. The running production environment should look like this:
[[Image:Cattask deployment-overview.JPG]]&nbsp;


= So what communication technology should we use?  =
 
We have investigated 3 communication techniques so far: remoting, WCF and MSMQ.&nbsp;Besides, we found some interesting tricks.&nbsp;You can find the whole story&nbsp;in [[Remoting,WCF and MSMQ for CatTask|Remoting,WCF and MSMQ&nbsp;for CatTask]].
 
At the moment, we are designing the module using MSMQ with the help of Rhino Service Bus.
 
= Rhino Service Bus  =
 
[http://ayende.com/Blog/archive/2008/12/17/rhino-service-bus.aspx Rhino Service Bus] (RSB) is an [http://en.wikipedia.org/wiki/Enterprise_service_bus ESB] which is built on top of MSMQ. Since the bus behaviour is mainly specified by its configuration file, the best way to learn how the bus works is to look at the configuration:
 
<source lang="XML">
<facility id="rhino.esb" >
<bus threadCount="1" numberOfRetries="5" endpoint="msmq://localhost/ownqueue" />
<messages>
<add name="CatGlobe.Messages.WebShop" endpoint="msmq://web/WebShop"/>
<add name="CatGlobe.Messages.CatTask" endpoint="msmq://catmaxb/CatTask"/>
</messages>
</facility>
</source>&nbsp;


<br>
<br>


The problem is that the three CatTasks won't run independently. Instead, they must contact each other to share information about scheduled tasks, task execution...
[[Image:CatTask A simple bus.JPG]]&nbsp;
 
In short, a Rhino service bus:
 
*Monitors an end point, the queue named in the &lt;bus&gt; element, for incoming messages. When messages arrive, the bus receives them from the queue and invokes the appropriate consumers&nbsp;to process them. For example, in the image below we have a consumer called CatGlobeMessageController which implements the IConsumerOf&lt;HelloCatGlobe&gt; interface. When messages of that type arrive, the bus invokes CatGlobeMessageController to process them.
 
[[Image:CatTask A simple consumer.JPG]]
 
*Can send messages to other queues (of course!!!). The point here is that it has two Send APIs:
 
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; - Send with an explicitly specified end point (queue).  


&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- Send without a specified endpoint. In this case the queue (message owner) for each message type must be configured. Notice the &lt;messages&gt; section in the configuration block above: it says that all messages whose&nbsp;types are defined in the&nbsp;CatGlobe.Messages.WebShop namespace will be&nbsp;sent to the&nbsp;"msmq://web/WebShop" end point. The second setting does the same for CatTask.


*Publish/Notify: another feature of RSB is the ability to publish/notify messages to all the buses which are interested in them. In order to receive published messages, a consumer must subscribe&nbsp;itself to the&nbsp;producer bus. For example,&nbsp;a bus can tell the CatTask bus that it is interested in&nbsp;messages of&nbsp;the type CatGlobe.Messages.CatTask.TaskCompleted. After the subscription is done, whenever the&nbsp;CatTask bus&nbsp;publishes&nbsp;a&nbsp;TaskCompleted message, a copy will be sent to the subscriber bus.
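The "Send without a specified endpoint" rule described above can be illustrated with a small lookup from namespace to owner queue. This is only a sketch of the idea, not RSB's actual implementation: the MessageOwners class and the OrderPlaced message type are hypothetical names invented for the example, while the namespaces and endpoints come from the configuration block above.

```csharp
using System;
using System.Collections.Generic;

namespace CatGlobe.Messages.WebShop { public class OrderPlaced { } }   // hypothetical message type
namespace CatGlobe.Messages.CatTask { public class TaskCompleted { } }

namespace RoutingSketch
{
    // Hypothetical stand-in for the <messages> section: namespace -> owner queue.
    public class MessageOwners
    {
        private readonly Dictionary<string, string> owners = new Dictionary<string, string>();

        public void Add(string namespaceName, string endpoint)
        {
            owners[namespaceName] = endpoint;
        }

        // Returns the endpoint configured for the namespace of the message type.
        public string EndpointFor(object message)
        {
            string typeName = message.GetType().FullName;
            foreach (KeyValuePair<string, string> owner in owners)
            {
                if (typeName.StartsWith(owner.Key + ".", StringComparison.Ordinal))
                    return owner.Value;
            }
            throw new InvalidOperationException("No owner configured for " + typeName);
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var owners = new MessageOwners();
            owners.Add("CatGlobe.Messages.WebShop", "msmq://web/WebShop");
            owners.Add("CatGlobe.Messages.CatTask", "msmq://catmaxb/CatTask");

            Console.WriteLine(owners.EndpointFor(new CatGlobe.Messages.WebShop.OrderPlaced()));
            Console.WriteLine(owners.EndpointFor(new CatGlobe.Messages.CatTask.TaskCompleted()));
        }
    }
}
```

A message whose namespace has no configured owner makes the lookup throw, which mirrors the need to declare an owner for every message type that is sent without an explicit endpoint.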


=== Remoting and WCF ===
== Should we use RSB? ==


Remoting and WCF are much the same in terms of how we send data from one CatTask to another: CatTask01, for example, sends data *directly* to CatTask02 using a known, configured port. However,&nbsp;WCF is newer and easier to use than remoting.
At the moment, my answer is YES. Let's consider pros&nbsp;and cons of it:


- Pros:  


&nbsp;&nbsp;&nbsp; + We can be sure that they meet our requirements, i.e. they can be used to develop the new CatTask.
*It was created and is maintained by many good developers, and it is well unit-tested.
*It solves many issues which I&nbsp;ran into when I tried to write my own MSMQ code. Well, I'm not a giant, but I can stand on the shoulders of giants.
*The built-in logger is very good. It helps us track down problems easily.
 
- Cons:
 
*RSB&nbsp;is designed for distributed contexts where there may be&nbsp;a delay between when a message is sent and when it is received. This problem is discussed in the Message lifetime section below.
*The documentation is not good. Yeah, as I just said, we can stand on the shoulders of giants, but we have to start from the ground and&nbsp;there are no stairs for us to climb to the shoulders! (On the contrary, with Microsoft frameworks, we often have more than one staircase to use, but some of them have a gap in the middle and some others lead you to a dead end!!!)
 
= Buses design for CatTask  =
 
*We are using the option #3 for Controller.
*The real implementation may vary a little bit: it is possible to use one queue for both LD and Controller. We will decide&nbsp;it later.&nbsp;In this design, we will use one queue for each. In my opinion, it may help us understand the system more easily.
 
== Buses diagram  ==
 
- A CatTask instance has two buses:
 
*One, which is LD_Bus in the&nbsp;image below,&nbsp;for the local part of it: LD and Worker.
*The other is Controller_Bus, which is used for&nbsp;the Controller.
 
[[Image:CatTask Bus design overview.JPG]]&nbsp;
 
== Message namespaces  ==
 
(Not finished yet)
 
At&nbsp;the lowest level, all messages which are sent to MSMQ are of type System.Messaging.Message. However, at the RSB level, messages are strongly typed. For example, the producer may send a TaskInstanceInfo message, and the consumer will receive it as a TaskInstanceInfo object.
 
*CatTask.Messages.LocalDispatcher
*CatTask.Messages.Worker
*CatTask.Messages.Controller
 
== Buses configuration  ==
 
There will be images + configuration for buses here
 
= Behaviours  =
 
Term definition:
 
*TaskMessage: messages which are related to a task, such as scheduling and status report messages
*ControllerMessage: messages whose main actor is the Controller, such as Controller announcement, Controller election and "query for the active Controller" messages
 
 
 
Local Dispatcher, Worker, Controller: there may be some confusion about the usage of these terms
 
*From the communication point of view: Local Dispatcher and Worker are the same END&nbsp;POINT.
*From the functional point of view: they are two different modules.
 
 
 
In this section, you will find how the CatTask's sub-modules should behave. There are some interesting&nbsp;things here:
 
*The behaviours are written in plain language, which makes&nbsp;them understandable to everyone.
*They are unittest cases!!! One behaviour must have at least one test case. For example, with the behaviour below
<blockquote>When an inactive Controller receives an AreYouController message </blockquote><blockquote>It will reply a ControllerReport{IsActiveController = No} message to the sender &nbsp; </blockquote>
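Unittest: <source lang="csharp">
[Test]
public void When_An_Inactive_Controller_Receives_An_AreYouController_Message_It_Should_Reply_No()
{
    // Arrange an inactive Controller, send it an AreYouController message,
    // then assert that it replied ControllerReport { IsActiveController = false }.
}
</source>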
 
<br>Yeah, after we finish designing these behaviours and the interfaces for the relevant classes, we can start writing unittests '''''before''''' writing the actual code.
 
*They are also very close to the real implementation
 
&nbsp;<source lang="csharp">
public void Consume(AreYouController message)
{
    // An inactive Controller answers "No" and stops here.
    if (!this.isActiveController)
    {
        this.bus.Reply(new ControllerReport { IsActiveController = false, CorrelationId = message.Id });
        return;
    }

    // else, do it here
}
</source>
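The empty unit test and the Consume method above can be combined into a self-contained check. This is a sketch under assumptions: IReplyBus and RecordingBus are hypothetical stand-ins for the real bus so that the example runs without MSMQ or NUnit, and the Controller class is reduced to the single behaviour being tested. The body of Main could be moved into the [Test] method once the real interfaces exist.

```csharp
using System;
using System.Collections.Generic;

public class AreYouController { public Guid Id = Guid.NewGuid(); }

public class ControllerReport
{
    public bool IsActiveController;
    public Guid CorrelationId;
}

// Hypothetical minimal bus abstraction; only the Reply call is modelled.
public interface IReplyBus { void Reply(object message); }

// Test double that records replies instead of sending them to a queue.
public class RecordingBus : IReplyBus
{
    public readonly List<object> Replies = new List<object>();
    public void Reply(object message) { Replies.Add(message); }
}

public class Controller
{
    private readonly IReplyBus bus;
    private readonly bool isActiveController;

    public Controller(IReplyBus bus, bool isActiveController)
    {
        this.bus = bus;
        this.isActiveController = isActiveController;
    }

    // Replies with the current role, correlated to the question.
    public void Consume(AreYouController message)
    {
        bus.Reply(new ControllerReport
        {
            IsActiveController = isActiveController,
            CorrelationId = message.Id
        });
    }
}

public static class Program
{
    public static void Main()
    {
        var bus = new RecordingBus();
        var controller = new Controller(bus, false);   // inactive Controller
        var question = new AreYouController();

        controller.Consume(question);

        if (bus.Replies.Count != 1)
            throw new Exception("Expected exactly one reply");
        var report = (ControllerReport)bus.Replies[0];
        if (report.IsActiveController || report.CorrelationId != question.Id)
            throw new Exception("Inactive Controller did not reply No");
        Console.WriteLine("ok");
    }
}
```

Injecting the bus through the constructor is what makes the behaviour testable in isolation: the test never touches a queue.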
 
<br>
 
== Messages look up table  ==
 
*The fact that Rhino Service Bus' messages are strongly typed is great. I&nbsp;have a message type; I make a producer and a consumer for it. The bus takes care of getting them to work together.
*However, on the one hand, having one message type for each communication will&nbsp;end&nbsp;up in&nbsp;an explosion of types. Each type needs to be handled&nbsp;by a consumer. For example:
 
<source lang="csharp">
public class Worker : IConsumerOf<CanYouExecuteTask>, IConsumerOf<PleaseExecuteTask>, IConsumerOf<TaskExecutionStatus>
</source>
 
That is why I'd like to keep the number of message types manageable by grouping similar message types&nbsp;into one type with a property which can be used to distinguish them.
 
*On the other hand, I want to&nbsp;make the&nbsp;behaviour&nbsp;descriptions below&nbsp;as understandable as possible. Having multi-meaning messages would be a step backward.
*Thus, in the behaviour descriptions below I will use single-meaning messages. Some of them are used for ease of understanding only. The mapping between them and the actual message types can be found in the [[CatTask v2009 Message types mapping table|Message types mapping table]].
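The grouping idea mentioned above (one message type with a distinguishing property instead of several types) might look like the following sketch. TaskRequest and TaskRequestKind are hypothetical names chosen for the example; the actual grouped types are the ones listed in the mapping table. The handler returns a description string only so that the example can be run and checked without a bus.

```csharp
using System;

// Hypothetical discriminator replacing three separate message types.
public enum TaskRequestKind { CanYouExecute, PleaseExecute, ReportStatus }

public class TaskRequest
{
    public TaskRequestKind Kind;
    public int TaskInstanceId;
}

public class Worker
{
    // One Consume method stands in for three IConsumerOf<...> implementations.
    public string Consume(TaskRequest message)
    {
        switch (message.Kind)
        {
            case TaskRequestKind.CanYouExecute:
                return "reply CanExecuteTaskResult for task " + message.TaskInstanceId;
            case TaskRequestKind.PleaseExecute:
                return "start task " + message.TaskInstanceId;
            case TaskRequestKind.ReportStatus:
                return "reply TaskExecutionStatus for task " + message.TaskInstanceId;
            default:
                throw new ArgumentOutOfRangeException("message", "Unknown kind " + message.Kind);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var worker = new Worker();
        var request = new TaskRequest { Kind = TaskRequestKind.PleaseExecute, TaskInstanceId = 7 };
        Console.WriteLine(worker.Consume(request));
    }
}
```

The trade-off is the one described in the list: fewer types and consumers, at the cost of a switch inside the consumer and messages whose meaning depends on a property.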
 
== Tasks scheduling (Use case 1)  ==
 
=== Main path  ===
<blockquote>When a Task is scheduled</blockquote><blockquote>The LD will send a '''''TaskInstanceInfo'''''{TaskInstanceId, DeliveryStatus = Unknown}&nbsp;message to the Controller and also put the message in local storage to keep track of it. </blockquote>
----
<blockquote>When an active Controller receives a '''TaskMessage'''</blockquote><blockquote>It will reply a '''TaskReceived '''message to the sender.</blockquote>
----
<blockquote>When an LD receives a&nbsp;'''TaskReceived''' reply message</blockquote><blockquote>It will&nbsp;delegate the work to the [[CatTask v2009 ProcessControllerReport|ProcessControllerReport]] function, which marks the status of the locally stored copy of the message&nbsp;as DELIVERED</blockquote>
=== Exceptional path  ===
<blockquote>When an inactive Controller receives a '''TaskMessage'''</blockquote><blockquote>It will reply an '''Error_IAmNotTheActiveController '''message to the sender.</blockquote>
----
<blockquote>When an LD receives an '''Error_IAmNotTheActiveController '''message</blockquote><blockquote>It will move the source message to the '''delaySendingList '''and ask for the&nbsp;new active Controller ('''ProcessControllerReport''')</blockquote>
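The bookkeeping that the LD does in this use case can be sketched as a small in-memory ledger. The class and method names (LocalDispatcherLedger, RecordSent, OnTaskReceived, OnNotTheActiveController) are hypothetical, and a real LD would persist the stored copies rather than keep them in a dictionary; only TaskInstanceInfo, DeliveryStatus and the delaySendingList come from the behaviours above.

```csharp
using System;
using System.Collections.Generic;

public enum DeliveryStatus { Unknown, Delivered }

public class TaskInstanceInfo
{
    public int TaskInstanceId { get; set; }
    public DeliveryStatus DeliveryStatus { get; set; }   // defaults to Unknown
}

// Hypothetical in-memory version of the LD's local storage.
public class LocalDispatcherLedger
{
    private readonly Dictionary<int, TaskInstanceInfo> sent = new Dictionary<int, TaskInstanceInfo>();
    public readonly List<TaskInstanceInfo> DelaySendingList = new List<TaskInstanceInfo>();

    // Called when the LD sends a TaskInstanceInfo to the Controller.
    public void RecordSent(TaskInstanceInfo message)
    {
        sent[message.TaskInstanceId] = message;
    }

    // Called when a TaskReceived reply arrives: mark the stored copy as delivered.
    public void OnTaskReceived(int taskInstanceId)
    {
        TaskInstanceInfo message;
        if (sent.TryGetValue(taskInstanceId, out message))
            message.DeliveryStatus = DeliveryStatus.Delivered;
    }

    // Called on Error_IAmNotTheActiveController: park the message for a later resend.
    public void OnNotTheActiveController(int taskInstanceId)
    {
        TaskInstanceInfo message;
        if (sent.TryGetValue(taskInstanceId, out message))
        {
            sent.Remove(taskInstanceId);
            DelaySendingList.Add(message);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var ledger = new LocalDispatcherLedger();
        var first = new TaskInstanceInfo { TaskInstanceId = 1 };
        var second = new TaskInstanceInfo { TaskInstanceId = 2 };
        ledger.RecordSent(first);
        ledger.RecordSent(second);

        ledger.OnTaskReceived(1);             // main path
        ledger.OnNotTheActiveController(2);   // exceptional path

        Console.WriteLine(first.DeliveryStatus);
        Console.WriteLine(ledger.DelaySendingList.Count);
    }
}
```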
== Task Execution (Use case 2)  ==
 
=== Main path  ===
<blockquote>When the start time of a task comes</blockquote><blockquote>The Controller will broadcast a '''CanYouExecuteTask '''message to all Workers.</blockquote>
----
<blockquote>When a&nbsp;Worker receives a '''CanYouExecuteTask '''message and it can execute the task<br>It will reply a '''CanExecuteTaskResult'''{'''Yes'''} message </blockquote>
----
<blockquote>When the active Controller receives a '''CanExecuteTaskResult'''{'''Yes'''} message from an LD</blockquote><blockquote>It will send a '''PleaseExecuteTask '''message</blockquote>
----
<blockquote>When&nbsp;a Worker receives a '''PleaseExecuteTask '''message</blockquote><blockquote>It will reply a '''TaskExecutionStatus'''{'''AboutToStart'''} message and start the task.</blockquote>
----
<blockquote>When&nbsp;a Worker receives a '''TaskExecutionStatusRequest '''message</blockquote><blockquote>It will reply a '''TaskExecutionStatus'''{CurrentStatus, CurrentProgression} message </blockquote>
----
<blockquote>When the Controller receives a '''TaskExecutionStatus'''</blockquote><blockquote>&nbsp;It will update its data about the task.</blockquote>
----
<blockquote>When a SystemTask is finished</blockquote><blockquote>The Controller will send a '''TaskExecutionResult''' message back (reply) to the sender</blockquote>
 
=== Alternative path<br> ===
 
*If the same task is currently running:&nbsp;what does "the same" mean?
*Determine if the task depends on currently scheduled or running tasks:&nbsp;perhaps the Controller doesn't need to ask all Workers about this. It can determine this by itself.
 
=== Exceptional path  ===
<blockquote>When a&nbsp;Worker receives a '''CanYouExecuteTask '''message and it '''cannot''' execute the task<br>It will reply a '''CanExecuteTaskResult'''{'''No'''} message </blockquote>
----
<blockquote>When all the '''CanExecuteTaskResult '''messages the active Controller receives are '''NO'''</blockquote><blockquote>It will pick one LD at random</blockquote>
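The choice the Controller makes between the main path and this exceptional path can be sketched as a single selection function. It assumes, for illustration only, that the Controller has already collected the replies into a list and that each reply carries the endpoint of its sender; the WorkerEndpoint field and the WorkerSelection class are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class CanExecuteTaskResult
{
    public string WorkerEndpoint;   // hypothetical: where the reply came from
    public bool CanExecute;         // Yes / No
}

public static class WorkerSelection
{
    // Returns the endpoint that should receive the PleaseExecuteTask message.
    public static string Choose(IList<CanExecuteTaskResult> replies, Random random)
    {
        if (replies.Count == 0)
            throw new ArgumentException("No replies received", "replies");

        // Main path: the first Worker that answered Yes gets the task.
        foreach (CanExecuteTaskResult reply in replies)
        {
            if (reply.CanExecute)
                return reply.WorkerEndpoint;
        }

        // Exceptional path: every answer was No, so pick one at random.
        return replies[random.Next(replies.Count)].WorkerEndpoint;
    }
}

public static class Program
{
    public static void Main()
    {
        var replies = new List<CanExecuteTaskResult>
        {
            new CanExecuteTaskResult { WorkerEndpoint = "msmq://server1/worker", CanExecute = false },
            new CanExecuteTaskResult { WorkerEndpoint = "msmq://server2/worker", CanExecute = true }
        };
        Console.WriteLine(WorkerSelection.Choose(replies, new Random()));
    }
}
```

The endpoint names in Main are made up for the demonstration.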
 
== CatTask&nbsp;starts up&nbsp;(Use case 4)  ==
 
In this section, the verb phrase "subscribe to the Controller" means "subscribe to the Controller for message types of which it is the consumer". As a result, when the Controller&nbsp;publishes messages of those types, one is sent to the subscriber.
 
----
<blockquote>When an LD starts</blockquote><blockquote>&nbsp;It will subscribe to all Controllers</blockquote>
----
<blockquote>When an LD receives the IAmTheNewController announcement from&nbsp;a Controller</blockquote><blockquote>It will subscribe to the Controller </blockquote>
----
<blockquote>When a Controller starts</blockquote><blockquote>It will send a PleaseElectANewActiveController message to all Controllers </blockquote>
 
== Cache Invalidation<br> ==
<blockquote>When a CacheInvalidation system task is scheduled</blockquote><blockquote>&nbsp;The LD will send a CacheInvalidation message to the active Controller </blockquote>
----
<blockquote>When the active Controller receives a CacheInvalidation message</blockquote><blockquote>It will broadcast a RemoveCacheItem message to all Workers </blockquote>
----
<blockquote>When a Worker receives a RemoveCacheItem message</blockquote><blockquote>It will remove the relevant cache item from the Cache. </blockquote>
== Looking&nbsp;for Controller's endpoint (Use case 4) ==
<blockquote>When an LD wants to know where the active Controller is</blockquote><blockquote>It will broadcast an&nbsp;AreYouController message to all Controllers</blockquote>
----
<blockquote><span>When an LD has a message to send but it doesn't know where the active Controller is</span></blockquote><blockquote><span>It will put the message to the delaySendList</span></blockquote>
----
<blockquote>When an LD&nbsp;updates the end point of the active Controller</blockquote><blockquote>It will send all messages in the delaySendList.</blockquote>
----
<blockquote>When&nbsp;the active&nbsp;Controller receives an AreYouController message</blockquote><blockquote>It will reply an&nbsp;IAmTheActiveController message&nbsp;to the sender </blockquote>
----
<blockquote>When an inactive Controller receives an AreYouController message</blockquote><blockquote>It will reply an IAmNotTheActiveController message to the sender </blockquote>
 
== Controller election (Use case 4, 5)<br> ==
 
We presume that there is a place where a Controller can ask whether it may become the active one. Obviously, one and&nbsp;only one Controller must get a YES answer. An election is held when:
 
#The Controllers receive&nbsp;a '''PleaseElectANewActiveController''' message from an LD.
#The Controllers receive a PleaseElectANewActiveController message from a newly started Controller.
 
----
<blockquote>When an Active&nbsp;Controller figures out that it should still be the Controller</blockquote><blockquote>It will send an '''IAmTheNewController''' message to all LDs and Controllers </blockquote>
----
<blockquote>When&nbsp;an inactive Controller figures out that it&nbsp;has no chance to be a Controller at the moment </blockquote><blockquote>Do nothing/waiting for confirmation from the Active one/send a message to the active one to confirm? </blockquote>
----
<blockquote>When a Controller becomes the Active Controller</blockquote><blockquote>It will publish an '''IAmTheNewController '''message to everyone</blockquote>
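The "place where a Controller can ask" presumed at the start of this section could behave like the sketch below. It is an in-process illustration only: ElectionOffice is a hypothetical name, and since the three Controllers run on separate servers, a real implementation would need a shared store (a database row, for instance) with the same "first one wins" guarantee.

```csharp
using System;

// Hypothetical arbiter: grants the active role to exactly one Controller at a time.
public class ElectionOffice
{
    private readonly object gate = new object();
    private string activeController;   // null means the seat is free

    // Returns true only for the Controller that holds (or just obtained) the seat.
    public bool TryBecomeActive(string controllerId)
    {
        lock (gate)
        {
            if (activeController == null)
                activeController = controllerId;
            return activeController == controllerId;
        }
    }

    // Frees the seat, but only if the caller is the current holder.
    public void Release(string controllerId)
    {
        lock (gate)
        {
            if (activeController == controllerId)
                activeController = null;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var office = new ElectionOffice();
        Console.WriteLine(office.TryBecomeActive("Controller1"));   // first asker gets YES
        Console.WriteLine(office.TryBecomeActive("Controller2"));   // everyone else gets NO
        office.Release("Controller1");
        Console.WriteLine(office.TryBecomeActive("Controller2"));   // seat is free again
    }
}
```

An active Controller that asks again still gets YES, which matches the behaviour "figures out that it should still be the Controller" above.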
== Message lifetime&nbsp;  ==
 
In the context of CatTask, message lifetime is a big issue. By default, a message stays in a queue as long as no one picks it up. Let's look at an example:
 
-&nbsp;At 7h00 AM: a local dispatcher wakes up and wants to know where the active controller&nbsp;is. It broadcasts an AreYouTheActiveController message to all controllers. If it receives no positive reply, it broadcasts a PleaseElectANewController message to all controllers' queues.
 
- We can assume that the #3 site (thus, controller and dispatcher) is not running at the moment. Therefore, all the messages which are sent to it are not picked up in time.
 
- At 8h00 AM: site&nbsp;#3 is started and processes the messages in its queue. Oops, the PleaseElectANewController message is 1 hour old, and one of the other two controllers has probably taken the active role already.
 
We have two choices:
 
#Messages should have a receive timeout. If the timeout passes, they should be removed from the queue. MSMQ does support this, but Rhino Service Bus doesn't.
#Add a Send-date property to messages so that the system (CatTask) can check whether they are out of date and ignore them. We should take into account that the clocks (current time) of the servers may differ slightly.
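The second choice can be sketched as a freshness check that allows for clock differences between servers. The names (Freshness, IsStale) and the concrete values of maximum age and clock-skew tolerance are assumptions made for the example, not decisions taken in this design.

```csharp
using System;

public static class Freshness
{
    // A message is stale when its age exceeds the allowed age plus the
    // tolerance we grant for clock differences between the servers.
    public static bool IsStale(DateTime sentAtUtc, DateTime nowUtc,
                               TimeSpan maxAge, TimeSpan clockSkewTolerance)
    {
        return nowUtc - sentAtUtc > maxAge + clockSkewTolerance;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical limits: messages live 5 minutes, clocks may differ by 30 seconds.
        TimeSpan maxAge = TimeSpan.FromMinutes(5);
        TimeSpan skew = TimeSpan.FromSeconds(30);

        // The example above: sent at 7h00, picked up at 8h00.
        DateTime sentAt = new DateTime(2009, 3, 26, 7, 0, 0, DateTimeKind.Utc);
        DateTime pickedUpAt = new DateTime(2009, 3, 26, 8, 0, 0, DateTimeKind.Utc);

        Console.WriteLine(Freshness.IsStale(sentAt, pickedUpAt, maxAge, skew));
    }
}
```

Using UTC timestamps avoids time-zone differences; the tolerance only has to cover genuine clock drift.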


- Cons:<br>
&nbsp;


&nbsp;&nbsp;&nbsp; + Writing stable cross domain code is not an easy job. We had some annoying and strange remoting bugs&nbsp;with the current CatTaskService.
== Sequence diagrams  ==


=== MSMQ  ===
*This diagram illustrates the flow when a task is scheduled:


Although we can use WCF to send MSMQ messages, I don't group them to one category because the criteria here is the directness of communication.  
[[Image:Cattaskv2009-Dispatcher sends tasks.JPG]]&nbsp;


With MSMQ, there is no direct communication between the two CatTasks. One will send data (messages) to a queue, the others will monitor the queue, receive messages and process them.
*Controller receives a task from the queue:


- Pros:  
[[Image:Cattaskv2009-Controller receives message.JPG]]&nbsp;
 
*Controller sends task to a worker:
 
[[Image:Cattaskv2009-Controller send task.JPG]]&nbsp;
 
*LocalDispatcher receives messages from&nbsp;its queue: it may be a report of the controller or a result of executing a SystemTask:
 
[[Image:Cattaskv2009-LocalDispatcher receives message.JPG]]&nbsp;
 
*Worker receives a message from its queue:
 
[[Image:Cattaskv2009-worker receives message.JPG]]&nbsp;
 
== Place holder  ==
 
== Place holder  ==


&nbsp;&nbsp;&nbsp; + With MSMQ, the coupling between the sender and the receiver applications is very loose. Both the sender and the receiver work with the queue; neither needs to care about the other.
== Place holder  ==


&nbsp;&nbsp; &nbsp;+ Writing unittests, including&nbsp;isolation test and integration test,&nbsp;is easier (than remoting or WCF).
== Place holder  ==


- Cons:
= Active controller election  =


&nbsp;&nbsp;&nbsp; + We don't have experience in using MSMQ. Can it be used for CatTask or not?
When is a controller not available?  


&nbsp;&nbsp;&nbsp; + MSMQ is designed for distributed systems, where the term which you usually hear when a message is sent is "Fire and Forget". Checking if a message is delivered (comes to a *remote* destination queue) is a pain.  
*The server/IIS is down.  
*The site is stopped to be disted or upgraded.


=== What should we choose?  ===
<br>


A brief history&nbsp;:D:
When is a controller not '''running'''?


1. At first, I tried to use MSMQ. Everything seemed to be fine until I found out that if I&nbsp;send a message to a non-existing *remote* queue, I get no exception (BTW, you will get one in the case of a local queue).&nbsp;The MSDN&nbsp;documentation of the Send method doesn't mention this!!!&nbsp;There is a way to check for such undelivered messages: monitoring a special local administrative queue. Yeah, I ran into a big problem because I was going to write a message consumer to use&nbsp;inside a message producer. I was building&nbsp;the module&nbsp;from scratch and the amount of code which I needed to write and maintain was growing rapidly!
*The site is recycled by IIS and there has been no request to it since then.  
*Unhandled exception. The site crashes.
 
<br>


2. Ok, so coming back to WCF seemed to be a good choice. I&nbsp;wrote some real code (not finished yet) and unittests for it. What I don't like about this (compared to MSMQ) is that when I was writing the client code (sender), I had to think about the server code. If the interface changed, we would need to regenerate the proxy file and change both the client code and the server code. Note that I wanted to use the event-based&nbsp;asynchronous model, so I&nbsp;had no choice but to generate a proxy using the provided utility tool.
When does a controller start up?


3. MSMQ again!!! I got an assignment from&nbsp;Dennis to investigate Rhino Service Bus to figure out if it can be used for our CatTask. The framework looks promising. The&nbsp;bus is well unit-tested and provides a lot of the functions we need. In other words, by using it I won't need to write as much code as I mentioned in history #1.
*Recover from crash.  
*Start after disting/upgrading.  
*Recover from recycling.

Latest revision as of 10:37, 17 October 2013

<accesscontrol>Main:MyGroup</accesscontrol>

Communication in Cattask v2009

The new CatTask model discussed in Cattaskv2009_overview_of_the_new_system requires a lot of communication among the 3 CatTask instances. Experience with the current CatTaskService also tells me that communication is the most error-prone part in a production environment. That is why we have paid so much attention to building a good communication component.

 

So what communication technology should we use?

We have investigated 3 communication techniques so far: remoting, WCF and MSMQ. Besides, we found some interesting tricks. You can find the whole story in Remoting,WCF and MSMQ for CatTask.

At the moment, we are designing the module using MSMQ with the help of Rhino Service Bus.

Rhino Service Bus

Rhino Service Bus (RSB) is an ESB which is built on top of MSMQ. Since the bus behaviour is mainly specified by its configuration file, the best way to learn how the bus works is to look at the configuration:

<facility id="rhino.esb" >
 <bus threadCount="1" numberOfRetries="5" endpoint="msmq://localhost/ownqueue" />
 <messages>
 <add name="CatGlobe.Messages.WebShop" endpoint="msmq://web/WebShop"/>
 <add name="CatGlobe.Messages.CatTask" endpoint="msmq://catmaxb/CatTask"/>
 </messages>
</facility>

 


 

In short, a Rhino service bus:

  • Monitors an end point, the queue named in the <bus> element, for incoming messages. When messages arrive, the bus receives them from the queue and invokes the appropriate consumers to process them. For example, in the image below we have a consumer called CatGlobeMessageController which implements the IConsumerOf<HelloCatGlobe> interface. When messages of that type arrive, the bus invokes CatGlobeMessageController to process them.

  • Can send messages to other queues (of course!!!). The point here is that it has two Send APIs:

       - Send with an explicitly specified end point (queue).

       - Send without a specified endpoint. In this case the queue (message owner) for each message type must be configured. Notice the <messages> section in the configuration block above: it says that all messages whose types are defined in the CatGlobe.Messages.WebShop namespace will be sent to the "msmq://web/WebShop" end point. The second setting does the same for CatTask.

  • Publish/Notify: another feature of RSB is the ability to publish/notify messages to all the buses which are interested in them. In order to receive published messages, a consumer must subscribe itself to the producer bus. For example, a bus can tell the CatTask bus that it is interested in messages of the type CatGlobe.Messages.CatTask.TaskCompleted. After the subscription is done, whenever the CatTask bus publishes a TaskCompleted message, a copy will be sent to the subscriber bus.

Should we use RSB?

At the moment, my answer is YES. Let's consider pros and cons of it:

- Pros:

  • It was created and is maintained by many good developers, and it is well unit-tested.
  • It solves many issues which I ran into when I tried to write my own MSMQ code. Well, I'm not a giant, but I can stand on the shoulders of giants.
  • The built-in logger is very good. It helps us track down problems easily.

- Cons:

  • RSB is designed for distributed contexts where there may be a delay between when a message is sent and when it is received. This problem is discussed in the Message lifetime section below.
  • The documentation is not good. Yeah, as I just said, we can stand on the shoulders of giants, but we have to start from the ground and there are no stairs for us to climb to the shoulders! (On the contrary, with Microsoft frameworks, we often have more than one staircase to use, but some of them have a gap in the middle and some others lead you to a dead end!!!)

Buses design for CatTask

  • We are using the option #3 for Controller.
  • The real implementation may vary a little bit: it is possible to use one queue for both LD and Controller. We will decide it later. In this design, we will use one queue for each. In my opinion, it may help us understand the system more easily.

Buses diagram

- A CatTask instance has two buses:

  • One, which is LD_Bus in the image below, for the local part of it: LD and Worker.
  • The other is Controller_Bus, which is used for the Controller.

 

Message namespaces

(Not finished yet)

At the lowest level, all messages which are sent to MSMQ are of type System.Messaging.Message. However, at the RSB level, messages are strongly typed. For example, the producer may send a TaskInstanceInfo message, and the consumer will receive it as a TaskInstanceInfo object.

  • CatTask.Messages.LocalDispatcher
  • CatTask.Messages.Worker
  • CatTask.Messages.Controller

Buses configuration

There will be images + configuration for buses here

Behaviours

Term definition:

  • TaskMessage: messages which are related to a task, such as scheduling and status report messages
  • ControllerMessage: messages whose main actor is the Controller, such as Controller announcement, Controller election and "query for the active Controller" messages


Local Dispatcher, Worker, Controller: there may be some confusion about the usage of these terms

  • From the communication point of view: Local Dispatcher and Worker are the same END POINT.
  • From the functional point of view: they are two different modules.


In this section, you will find how the CatTask's sub-modules should behave. There are some interesting things here:

  • The behaviours are written in plain language, which makes them understandable to everyone.
  • They are unittest cases!!! One behaviour must have at least one test case. For example, with the behaviour below

When an inactive Controller receives an AreYouController message

It will reply a ControllerReport{IsActiveController = No} message to the sender

Unittest:

[Test]

public void When_An_Inactive_Controller_Receives_An_AreYouController_Message_It_Should_Reply_No()

{

}


Yeah, after we finish designing these behaviours and the interfaces for the relevant classes, we can start writing unittests before writing the actual code.

  • They are also very close to the real implementation

 

public void Consume(AreYouController message)
{
    // An inactive Controller answers "No" and stops here.
    if (!this.isActiveController)
    {
        this.bus.Reply(new ControllerReport { IsActiveController = false, CorrelationId = message.Id });
        return;
    }

    // else, do it here
}


Messages look up table

  • The fact that Rhino Service Bus' messages are strongly typed is great. I have a message type; I make a producer and a consumer for it. The bus takes care of getting them to work together.
  • However, on the one hand, having one message type for each communication will end up in an explosion of types. Each type needs to be handled by a consumer. For example:
public class Worker : IConsumerOf<CanYouExecuteTask>, IConsumerOf<PleaseExecuteTask>, IConsumerOf<TaskExecutionStatus>

That is why I'd like to keep the number of message types manageable by grouping similar message types into one type with a property which can be used to distinguish them.

  • On the other hand, I want to make the behaviour descriptions below as understandable as possible. Having multi-meaning messages would be a step backward.
  • Thus, in the behaviour descriptions below I will use single-meaning messages. Some of them are used for ease of understanding only. The mapping between them and the actual message types can be found in the Message types mapping table.

Tasks scheduling (Use case 1)

Main path

When a Task is scheduled

The LD will send a TaskInstanceInfo{TaskInstanceId, DeliveryStatus = Unknown} message to the Controller and also put the message in local storage to keep track of it.


When an active Controller receives a TaskMessage

It will reply a TaskReceived message to the sender.


When an LD receives a TaskReceived reply message

It will delegate the work to the ProcessControllerReport function, which marks the status of the locally stored copy of the message as DELIVERED

Exceptional path

When an inactive Controller receives a TaskMessage
It will reply an Error_IAmNotTheActiveController message to the sender.


When an LD receives an Error_IAmNotTheActiveController message

It will move the source message to the delaySendingList and ask for the new active Controller (ProcessControllerReport)

Task Execution (Use case 2)

Main path

When the start time of a task comes

The Controller will broadcast a CanYouExecuteTask message to all Workers.


When a Worker receives a CanYouExecuteTask message and it can execute the task
It will reply a CanExecuteTaskResult{Yes} message


When the active Controller receives a CanExecuteTaskResult{Yes} message from an LD

It will send a PleaseExecuteTask message


When a Worker receives a PleaseExecuteTask message

It will reply a TaskExecutionStatus{AboutToStart} message and start the task.


When a Worker receives a TaskExecutionStatusRequest message

It will reply a TaskExecutionStatus{CurrentStatus, CurrentProgression} message


When the Controller receives a TaskExecutionStatus

 It will update its data about the task.


When a SystemTask is finished

The Controller will send a TaskExecutionResult message back (reply) to the sender

Alternative path

  • If the same task is currently running: what does "the same" mean?
  • Determine if the task depends on currently scheduled or running tasks: perhaps the Controller doesn't need to ask all Workers about this. It can determine this by itself.

Exceptional path

When a Worker receives a CanYouExecuteTask message and it cannot execute the task
It will reply a CanExecuteTaskResult{No} message


When all the CanExecuteTaskResult messages the active Controller receives are NO

It will pick one LD at random

CatTask starts up (Use case 4)

In this section, the verb phrase "subscribe to the Controller" means "subscribe to the Controller for message types of which it is the consumer". As a result, when the Controller publishes messages of those types, one is sent to the subscriber.


When an LD starts

It will subscribe to all Controllers


When an LD receives the IAmTheNewController announcement from a Controller

It will subscribe to the Controller


When a Controller starts

It will send a PleaseElectANewActiveController message to all Controllers
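
What "subscribe to the Controller" amounts to can be sketched as a small in-memory model. This is an illustration in Python rather than Rhino Service Bus code: `Publisher`, the subscriber names and the list-based inboxes are hypothetical stand-ins for the bus and its queues.

```python
from collections import defaultdict


class Publisher:
    """Stand-in for a Controller that publishes typed messages (hypothetical)."""

    def __init__(self):
        # message type -> {subscriber name -> inbox (a plain list here)}
        self._subscribers = defaultdict(dict)

    def subscribe(self, name, inbox, consumed_types):
        # Register the subscriber for every message type it consumes.
        # Subscribing twice under the same name does not duplicate deliveries.
        for message_type in consumed_types:
            self._subscribers[message_type][name] = inbox

    def publish(self, message_type, body):
        # Every subscriber of this type receives its own copy of the message.
        for inbox in self._subscribers.get(message_type, {}).values():
            inbox.append((message_type, body))
```

For example, an LD that consumes IAmTheNewController would call `subscribe("ld-1", inbox, ["IAmTheNewController"])` on each Controller at start-up, and would then see every later announcement of that type in its inbox.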

Cache Invalidation

When a CacheInvalidation system task is scheduled

The LD will send a CacheInvalidation message to the active Controller


When the active Controller receives a CacheInvalidation message

It will broadcast a RemoveCacheItem message to all Workers


When a Worker receives a RemoveCacheItem message

It will remove the relevant cache item from the Cache.
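
A minimal sketch of this fan-out, in Python and with direct method calls standing in for bus messages. The class and method names are hypothetical, and making the removal idempotent is an assumption, chosen so that a repeated RemoveCacheItem message is harmless.

```python
class Worker:
    def __init__(self):
        self.cache = {}

    def on_remove_cache_item(self, key):
        # Removing a key that is already gone is not an error.
        self.cache.pop(key, None)


class ActiveController:
    def __init__(self, workers):
        self.workers = list(workers)

    def on_cache_invalidation(self, key):
        # Broadcast: every Worker gets its own RemoveCacheItem message.
        for worker in self.workers:
            worker.on_remove_cache_item(key)
```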

Looking for Controller's endpoint (Use case 4)

When an LD wants to know where the active Controller is

It will broadcast an AreYouController message to all Controllers


When an LD has a message to send but it doesn't know where the active Controller is

It will put the message to the delaySendList


When an LD updates the endpoint of the active Controller

It will send all messages in the delaySendList.


When the active Controller receives an AreYouController message

It will reply with an IAmTheActiveController message to the sender


When an inactive Controller receives an AreYouController message

It will reply with an IAmNotTheActiveController message to the sender
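
The LD side of these rules, together with the Error_IAmNotTheActiveController rule described earlier on this page, can be sketched as follows. This is an illustrative Python model only: `LocalDispatcher`, its method names and the `transport` callable are hypothetical, and messages are plain strings.

```python
class LocalDispatcher:
    """Hypothetical LD that parks messages until the active Controller is known."""

    def __init__(self, transport, controller_endpoints):
        self.transport = transport                  # callable(endpoint, message)
        self.controller_endpoints = list(controller_endpoints)
        self.active_controller = None               # unknown until a Controller answers
        self.delay_send_list = []

    def ask_for_active_controller(self):
        # Broadcast AreYouController to every known Controller.
        for endpoint in self.controller_endpoints:
            self.transport(endpoint, "AreYouController")

    def send(self, message):
        if self.active_controller is None:
            self.delay_send_list.append(message)    # no endpoint yet: park it
        else:
            self.transport(self.active_controller, message)

    def on_i_am_the_active_controller(self, endpoint):
        # Endpoint updated: flush everything that was waiting, in arrival order.
        self.active_controller = endpoint
        pending, self.delay_send_list = self.delay_send_list, []
        for message in pending:
            self.transport(endpoint, message)

    def on_error_i_am_not_the_active_controller(self, source_message):
        # The endpoint we used is stale: park the message and ask again.
        self.active_controller = None
        self.delay_send_list.append(source_message)
        self.ask_for_active_controller()
```

Swapping the list out before flushing means a message parked during the flush is not sent twice.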

Controller election (Use case 4, 5)

Assume that we have a place where a Controller can ask whether it can be the active one. Obviously, one and only one Controller must get a YES answer. An election is held when:

  1. The Controllers receive a PleaseElectANewActiveController message from an LD.
  2. The Controllers receive a PleaseElectANewActiveController message from a newly started Controller.

When an Active Controller figures out that it should still be the Controller

It will send an IAmTheNewController message to all LDs and Controllers


When an inactive Controller figures out that it has no chance to be a Controller at the moment

Open question: should it do nothing, wait for confirmation from the active one, or send a message to the active one to confirm?


When a Controller becomes the Active Controller

It will publish an IAmTheNewController message to everyone
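
The "place where a Controller can ask" can be sketched as a small arbiter that gives out at most one YES. This is a single-process Python illustration under stated assumptions: `ElectionArbiter` and its methods are hypothetical names, and in a real deployment the shared place would have to be visible to all three servers, which an in-memory object is not.

```python
import threading


class ElectionArbiter:
    """Hypothetical shared place that answers YES to exactly one Controller."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = None          # id of the current active Controller, if any

    def may_i_be_active(self, controller_id):
        # The lock makes "check and claim" one atomic step, so two Controllers
        # asking at the same moment cannot both get YES.
        with self._lock:
            if self._active is None:
                self._active = controller_id
            return self._active == controller_id

    def resign(self, controller_id):
        # Only the current holder can give the role up.
        with self._lock:
            if self._active == controller_id:
                self._active = None
```

A Controller that already holds the role gets YES again when it asks, which corresponds to the case where the active Controller finds out that it should still be the Controller.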

Message lifetime 

In the context of CatTask, message lifetime is a big issue. By default, a message stays in a queue as long as no one picks it up. Let's look at an example:

- At 7h00 AM: a local dispatcher wakes up and wants to know where the active controller is. It broadcasts an AreYouController message to all controllers. If it receives no positive reply, it broadcasts a PleaseElectANewActiveController message to all controllers' queues.

- Assume that site #3 (and thus its controller and dispatcher) is not running at the moment. None of the messages sent to it are picked up in time.

- At 8h00 AM: site #3 is started and processes the messages in its queue. By now the PleaseElectANewActiveController message is 1 hour old, and one of the other two controllers has probably already taken the active role.

We have two choices:

  1. Messages should have a receive timeout. Once the timeout expires, they should be removed from the queue. MSMQ does support this, but Rhino Service Bus doesn't.
  2. Add a Send-date property to messages so that the system (CatTask) can check whether they are out of date and ignore them. We should take into account that the clocks (current time) of the servers may differ slightly.
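
The second choice can be sketched as a staleness check that allows for clock differences between servers. This is a Python illustration only: the function name `is_stale` and the values of `MAX_AGE` and `CLOCK_SKEW` are assumptions, not figures from the design.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)       # hypothetical lifetime of a message
CLOCK_SKEW = timedelta(seconds=30)   # hypothetical tolerance for differing clocks


def is_stale(sent_at, now, max_age=MAX_AGE, clock_skew=CLOCK_SKEW):
    """Return True when a message should be ignored because it is too old.

    The skew allowance widens the window, so a message is never dropped
    merely because the sender's clock runs slightly behind the receiver's.
    """
    return now - sent_at > max_age + clock_skew


# The example above: sent at 7h00, picked up at 8h00.
sent = datetime(2009, 1, 1, 7, 0, tzinfo=timezone.utc)
picked_up = datetime(2009, 1, 1, 8, 0, tzinfo=timezone.utc)
stale = is_stale(sent, picked_up)    # the 1-hour-old message is ignored
```

Using UTC timestamps on both sides avoids mixing time zones into the comparison.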

 

Sequence diagrams

  • This diagram illustrates the flow when a task is scheduled:

 

  • Controller receives a task from the queue:

 

  • Controller sends task to a worker:

 

  • LocalDispatcher receives messages from its queue: it may be a report of the controller or a result of executing a SystemTask:

 

  • Worker receives a message from its queue:

 

Active controller election

When is a controller not available?

  • The server/IIS is down.
  • The site is stopped so that it can be disted or upgraded.


When is a controller not running?

  • The site is recycled by IIS and there has been no request to it since then.
  • An unhandled exception crashes the site.


When does a controller start up?

  • Recover from crash.
  • Start after disting/upgrading.
  • Recover from recycling.