Messaging overview

A Structs and Nodes Development (SAND) application is composed of nodes that get work done using synchronous (call/response) and/or asynchronous (publish/subscribe) messaging. The messages are created from data structs, and are either the Query / Collection / Update forms of the struct, or the transmittable form of the struct message itself. Message transmission is handled via a Messager implementation, typically combining direct in-memory transfer between nodes in the same process space with other protocols (JMS, WS, ESB etc) for inter-server comms. All inter-server comms require authorization to filter access to information. Communications between nodes are established in the deployment configuration. These system aspects are described below.


Table of Contents:



Messaging Declarations

Categories of messages:

For every struct definition, the build process generates a SandStructMessage ("struct message") that inherits the struct data definitions and adds accessor/mutator/utilty methods. For messages to be sent between nodes they must also be between nodes they must also be transmittable, which is accomplished by either:

  1. Adding a transmit tag, indicating that the struct message can sent between nodes directly by itself.
  2. Adding one or more verb form declarations, resulting in the creation of SandVerbMessages such as Update, Query, or Collection.

The Update message is used to create new message instances, update existing instances, or delete instances. To update a message instance, a node makes a synchronous call with an update message, and receives the result in the returned update message. So for example if a node calls with a TaskUpdate, it gets a TaskUpdate in return containing the written information. In most cases update messages are used for persistent messages, which are ultimately saved by the DataManager node.

Retrieving information is done by calling with a Query message and receiving a Collection message in return. So for example if a node calls with a TaskQuery, it gets a TaskCollection in return.

Updates and queries typically pass through one or more other nodes in the call chain before reaching the DataManager,but this process is transparent to the original calling node. The original calling node simply makes the call and receives the result. Updates to persistent messages are version checked ACID transactions: a synchronous message call is atomic (a single operation that either succeeds or fails), consistent (leaves the system as a whole in a sensible state), and isolated (independent of any other message), so an update call to the DataManager causing a durable change is transactionally safe.

Because updates to one message instance may require updates to other message instances (to enforce referential integrity and application logic), update messages are typically sent as part of an AggregateUpdate, to be processed collectively as a single transaction.

TOC



Input and output declarations:

A node definition results in the creation of messaging methods that define possible input and output for that node. The messaging topology for the application as a whole is then defined by the node instances in the deployment configuration. There are two kinds of messaging:

  1. synchronous messaging refers to one node making a call to another node, and waiting for a result to be returned.
  2. asynchronous messaging refers to one node posting a notification, which is then delivered to zero or more other nodes.

A node definition supports the following message declarations:

Subscriptions and other messaging initialization are handled automatically at startup, based on the deployment configuration.

An application may rely on dependable messaging, meaning it may assume that communication has succeeded unless a SandException is thrown during processing. When making a synchronous call, the application must also check the sandTransmitStatus of the returned message.

At any given point in time, the reference state of a running deployment is defined to be its persistent state. So if all processing suddenly ceased, and the system was subsequently rebooted, only temporary state (such as uncommitted transactions and cached references) would be lost. It is up to the application architecture, and the Messager implementation used for the deployment, to ensure this is the case. There are variety of strategies such as "fire and forget", "subscriber sync", and "durable subscriber" (guaranteed messaging) to map performance and reliability into business requirements. For help with approaches, or for consultaion on more advanced architectural considerations like non-repudiation or auditing, contact Structs And Nodes Development Services.

ALL MESSAGING CALLS ARE MULTI-THREADED. The same method may (and in most cases will) have several threads running through it at the same time. The application is responsible for ensuring access to state information is properly synchronized.

TOC



Messaging configuration:

Message flow between all node instances is set up in the deployment configuration. The types of messaging are:

Communications outside the application can be accomplished by:

  1. an adaptor node instance which handles the external connection.
  2. specifying the name of the node in the external configuration as the target. This is called a logical node and works when the underlying messaging system can rely on the messaging implementation to resolve the name.

For optimum speed, SAND messaging is done through direct method calls where possible, provided this default behavior has not been disabled in the messaging configuration. This behavior is regulated by the optimize flag for each messaging route being set to IF_POSSIBLE or NEVER.

SAND messaging can go directly from one node to another (this is typical for helper nodes within a single server), or it can be subject to security considerations. This behavior is regulated by the mode flag for each messaging route being set to DIRECT or SECURE.

All SECURE messaging is done via Authorizer nodes. So a message which would normally have travelled from node A to node B directly:

instead travels through Authorizer nodes:

as configured in the gateway messaging parameters at each end.

The Authorizer for a node should always be running on the same server as the node, or secure messaging will not function and the node will be unreachable. Primary uses for Authorizer nodes include:

Authorizers can also be used for application-specific security, and/or as a hook point for advanced messaging like dynamic reconfiguration, failover processing etc). The best way to get a feel for how this works in practice is to look at some examples.

TOC



Authorizer messaging configuration:

In a fully trusted environment where all nodes are behind a shared firewall, there is no need for Authorizer nodes. However in systems that are geographically distributed, or which have more than moderate security concerns, multiple firewalls may be used. An Authorizer node serves as a bridge for secure communications between nodes on different sides of the firewall. To bridge communications, an Authorizer is installed on either side to provide encryption/decryption, authentication, and authorization services. This includes logon, access control, and information filtering based on authorization.

Authorizer nodes are responsible only for secure, authorized communications and do not directly create message traffic on their own. They do not have their own independent message routing. When an Authorizer requires additional data, it will retrieve this via direct calls to other local nodes as configured. How an Authorizer functions is dependent on the application requirements, Messager technology implementation, and other supporting techology.

Typical example processing:

Unfortunately there is no one-size-fits-all Authorizer. An application may even define multiple classes of Authorizers to handle specific communication scenarios. For more information on authorizers related to your application requirements, contact Structs And Nodes Development Services

TOC



General message processing:

There are some general purposes nodes which accept a large portion of all the defined messages. Rather than declare each method individually, these nodes declare the general SandMessage interface as the message type for input or output. When a node does this, it assumes the responsibility of handling every defined message type itself, and it loses the message type checking provided by the framework.

The common cases where general messaging is appropriate include:

The query and subscription targets for any node are set or changed by editing the deployment configuration.

TOC



Messaging implementation:

SandBoss ships with a DirectCallMessager which handles delivery of messages through direct java method calls if possible. This much is a common need for all system deployments we have been involved with. After that it gets significantly more complicated.

SandBoss will probably include a default JMS implementation of the Messager interface at some point in the future. This will be adequate to support inter-server communications for full-JMS systems, and for systems where the JMS implementation provides necessary bridging to all other protocols needed. In practice however, were finding that:

In these, and other cases, the application will want to provide their own Messager implementation(s). For more information on Messager implementations related to your application requirements, contact Structs And Nodes Development Services

TOC



Node examples:

For more information on configuration samples and patterns, contact Structs And Nodes Development Services

TOC



Messaging Transactions

Transactional messaging is part of the Messager implementation, and not something that a SAND application programmer needs to be concerned with. The SAND programmer simply overrides the onReceive or onDelivery methods to handle incoming messages, and makes use of the query* and send methods for outgoing messages. The transactional processing is handled automatically.

synchronous messaging (query/receive):

Synchronous messaging begins with a call to a query method. This call will either return a message, or throw a MessagerException. The exception can be due to an error in processing, or from waiting too long for a result. Timeout errors result in an indeterminate global state (true success or failure of the call is impossible to determine locally, and communications may be down) so they require manual intervention to ensure system integrity.

To avoid timeout problems:

  1. Keep the systemwide query timeout value high enough to avoid timeouts under normal processing conditions, but within the tolerances of an interactive user. The default value that ships with the system is our best guess for a generally good value.
  2. Monitor the node statistics. If a node is starting to get too slow, you may need to add more processing power and/or alert users through the UI.
  3. Listen/watch for timeout errors through the notification mechanisms and the system log.

If a timeout can be predicted, it is better to return a determinate failure (e.g. "XyzNode is too busy to accept this query for processing"), than return an indeterminate timeout requiring manual intervention.

It is not necessary for a programmer developing within the SAND environment ("SAND programmer") to be concerned with exception handling in the messaging process, unless they want to explicitely trap particular situations. By default, exceptions are wrapped and returned back through to the original query call. The result is a trace of the entire call structure, which can be used for analysis. The UI will typically provide a way to submit this contextual information automatically for incident tracking purposes.

A SAND programmer should never catch a MessagingTimeoutException, or interfere with the timeToLive value in a query, as this may interfere with timing and collection of the full trace information between nodes.

To process an incoming query:

  1. The node receive method is invoked (code for this method is autogenerated as specified by the Messageable interface).
  2. Determine the incoming message type, and check its timeToLive against the expected time required for processing (table lookup of rolling averages per onReceive method). If we don't have enough time, throw a MessagingNodeOverloadedException.
  3. Insert the incoming message into the node instance queryContexts member (a collection of messages and related info indexed by Thread objects) using the current Thread.
  4. Call the appropriate onReceive method as generated from the @receive declarations. If the method throws an exception, wrap the contents, create an appropriate return message, set the error information in it (by definition it's a SandTransmitMessage), and return it. Otherwise return the result of the onReceive method.

To process an outgoing query:

  1. The node query method is invoked (code for this method is autogenerated according to the node @query declarations).
  2. Lookup up our source query message in the queryContexts. If found:
    1. add a line to the status message that we are making this query.
    2. take the timeToLive, subtract the elapsed time since we began processing the query, subtract the processing buffer value, and set the timeToLive value in the outgoing message.
  3. Lookup the destination node instance name for this message type from the corresponding generated configuration parameter.
  4. Call the generalized query method (code for this method is autogenerated as specified by the Messager interface).
  5. The query method makes the call, waiting at most timeToLive millis. If no result is returned, throw a MessagingTimeoutException. If a result is returned, but it is a message with an error status, then pull the error information and throw a MessagerException with the contents. Otherwise return the result.

TOC



asynchronous messaging (send/subscribe):

In asynchronous messaging, a node will send a message as general output, which is then received by zero or more subscriber nodes. A subscriber node is guaranteed to receive the message (provided it is running) but there are absolutely no timing guarantees as to when. Best efforts on timely delivery are made, and updates to the same persistent message instance are guaranteed to arrive in order.

Except for adaptor nodes to specific technologies, there is no concept of a persistent (durable) subscriber in SAND. If message instances need to persist, that is accomplished via the DataManager. Nodes that monitor persistent message instances fall into two categories:

  1. Need the most recent message instance information: In this case the node should make use of a MessageCache, specifying the unique ID of the message. The cache will automatically retrieve the instance from the db if necessary, and handle all updates through subscription to the DataManager or local update distribution channel.
  2. Need all instances since they were last shutdown/suspended: In this case the node will need to perform a historical query on resume/startup to retrieve all instances they missed. The node will have to subscribe, and then query, and check for any overlap in the interim.

A SAND programmer does not need to handle exceptions in asynchronous message processing unless there are specific situations they wish to catch. There are no timout issues. Exceptions are due to a send failure or a delivery processing failure. In both cases a runtime error is logged, and the node fails. If the error can safely be handled, then the node developer must explicitely trap the error.

To subscribe to incoming messages:

  1. When a node switches to a RUNNING state, autogenerated code calls each subscribe method autogenerated for the node from the @subscribe declarations.
  2. Each subscribe method:
    1. Looks up the node it will be subscribing to
    2. Calls the generalized subscribe method (code for this method is autogenerated as specified by the Messager interface) passing the message type and the source node.
  3. The generalized subscribe method makes the call to start asynchronous delivery of the specified messages from the specified source. If any error occurs, a MessagerException is thrown.

To unsubscribe from incoming messages:

  1. When a node switches from a RUNNING state to any other state, autogenerated code calls each unsubscribe method autogenerated for the node from the @subscribe declarations.
  2. Each unsubscribe method:
    1. Looks up the node it is unsubscribing from
    2. Calls the generalized unsubscribe method (code for this method is autogenerated as specified by the Messager interface) passing the message type and the source node.
  3. The generalized unsubscribe method makes the call to stop asynchronous delivery of the specified messages from the specified source. If any error occurs, a MessagerException is thrown.
  4. Autogenerated code waits for any outstanding onDelivery processing to complete before returning.

To receive an incoming message:

  1. The generalized deliver method is called (code for this method is autogenerated as specified by the Messageable interface).
  2. Determine the incoming message type, and insert the incoming message into the node instance deliveryContexts member (a collection of messages and related info indexed by Thread objects) using the current Thread
  3. Call the appropriate onDelivery method as generated from the @subscribe declarations. If the method throws an exception, log the error, remove our deliveryContext, and fail the node. Otherwise just remove our deliveryContext.

To send an outgoing message:

  1. The node send method is invoked (code for this method is autogenerated according to the node @send declarations).
  2. Lookup up our source query message in the queryContexts. If found:
    1. add a line to the status message that we are sending this message.
  3. Lookup our unique node instance name as configured.
  4. Call the generalized send method (code for this method is autogenerated as specified by the Messager interface).
  5. If the send fails, throw a MessagerException. This will fall through to the specific send method, which will fall through to the enclosing context (onReceive, onDelivery, onStartup etc), which if not caught will fall through to the enclosing context, which will trigger a node failure.

TOC



Relationship to control:

Each node instance is Controllable and can transition between a variety of runtime states. Messaging state reactions:

Any specific node instance may override any behavior as required.

TOC



Differences from original architecture:

Wording can be ambiguous or misleading based on context, and some questions may not be adequately addressed:

  1. The architecture specification states "If a call to an onReceive method completes normally, then message delivery was complete. Otherwise the message will be redelivered to the node". The first sentence should read something like "If a call to an onReceive method terminates, then message delivery was complete". Throwing an exception does not cause message redelivery.

TOC











© 2002-2005 SAND Services Inc.
All Rights Reserved.