Comms 101: Troubleshooting MTA Message Queues

I've been working with two of our Messaging Server experts to come up with an article on Messaging Server's Message Transfer Agent (MTA) and how it handles the build-up of messages in its message queues. This has been a fairly frequent topic of inquiry from the Comms community, and it usually takes the form, 'I'm noticing a build-up of messages in my queues. Should I be worried about it?'

The short answer is: not necessarily, because the MTA is designed to work as a store-and-forward message system. The long answer is: maybe, but there is more to consider than just the number of messages in the queue.

So, here's part one (of two) of my take on this situation, based on the experiences of two of our support engineers. I'd be interested in hearing from others about their experiences, to see if this article should be expanded or corrected.

Troubleshooting Sun Java System Messaging Server MTA Message Queues

Part I

This article describes how to troubleshoot the Sun Java System Messaging Server Message Transfer Agent (MTA), specifically message build-up in channels, including the TCP channels (tcp_local and tcp_intranet) and ims-ms channel.

Products covered by this article are:

  • Sun Java System Messaging Server 6

  • iPlanet Messaging Server 5


Note - This technical note assumes you are running Sun Java System Messaging Server 6. Where appropriate, iPlanet Messaging Server 5 commands are also mentioned.


This article contains the following topics:

  • About the MTA and Channels

  • When Is a Message Queue Having a Problem?

  • Main Causes of Backed-up Message Queues

  • Before You Begin

  • Troubleshooting TCP Channels

  • Destination Host Problems

  • Troubleshooting the ims-ms Channel

  • Configuration Issues

  • Additional Information


About the MTA and Channels

To begin understanding why messages build up in a channel queue, and whether an actual problem might be occurring on your system, you must first understand how mail messages flow through the MTA.

The MTA can be thought of as Messaging Server's central brain, responsible for message routing. The MTA takes in messages via SMTP sessions from other systems and decides what to do with those messages. The first stop for a message is the MTA SMTP server, which executes programs to handle the SMTP session. Based on numerous configuration possibilities, the SMTP server processes the message, which could include message blocking, address changing, or channel enqueueing.

Actually, the sequence of events is a bit more complex. The dispatcher listens on port 25 (or whichever port is defined for the SMTP server in the dispatcher.cnf file). When the dispatcher detects an attempt to connect to that port, it starts an SMTP server process to handle the incoming connection. The SMTP server typically decides whether to accept or reject the message based on numerous configuration possibilities. During the SMTP dialogue, the MTA machinery kicks in to decide what to do with the message. Note that the PORT_ACCESS mapping table works with the dispatcher, not the SMTP server, to allow or block access to certain ports such as the SMTP port (port 25).

The focus of this technical note is the decision to route the message to a channel where the message is to be enqueued. A channel is a message connection with another system or destination. Once enqueued to a channel, the message's destination could be another server (either on the Internet or on your company's intranet), a remote message store, a specific domain name, a channel for extra processing, such as virus filtering, or a local message store.

When a channel contains messages but is not delivering them, the messages build up in the channel's message queue. The message queues are directories, located by default in the msg-svr-base/data/queue/channel/ directory. In this way the MTA holds the messages for future delivery, once whatever is preventing their delivery is resolved.
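Because the queues are ordinary directories, a rough per-channel count can be taken by walking the queue tree. The following is a minimal sketch, not a Messaging Server tool: it assumes each queued message is stored as one regular file somewhere below its channel's directory, and the helper names (count_queued_messages, build_demo_queue) and the demo file names are hypothetical. The demo runs against a throwaway directory; on a real system you would pass your own queue directory instead.

```python
import os
import tempfile
from pathlib import Path


def count_queued_messages(queue_root):
    """Return {channel_name: file_count} for each channel directory.

    Assumption: every queued message is one regular file located
    somewhere below its channel's directory (nested subdirectories
    are included in the count).
    """
    counts = {}
    for channel_dir in sorted(Path(queue_root).iterdir()):
        if not channel_dir.is_dir():
            continue  # ignore stray files at the top level
        total = 0
        for _dirpath, _dirnames, filenames in os.walk(channel_dir):
            total += len(filenames)
        counts[channel_dir.name] = total
    return counts


def build_demo_queue(root):
    """Create a throwaway tree that mimics a queue layout (demo only)."""
    layout = {"tcp_local": 3, "tcp_intranet": 0, "ims-ms": 5}
    for channel, n in layout.items():
        subdir = Path(root) / channel / "000"  # hypothetical subdirectory
        subdir.mkdir(parents=True)
        for i in range(n):
            (subdir / f"msg{i}.00").write_text("demo")
    return layout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_queue(tmp)
        for channel, n in count_queued_messages(tmp).items():
            print(f"{channel}: {n}")
```

A raw count like this only says how many messages are waiting; as the following sections explain, the number alone does not tell you whether there is a problem.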

The specific channels discussed in this technical note are the TCP channels (tcp_local and tcp_intranet) and the ims-ms channel. The tcp_local channel is responsible for routing messages to the Internet, while the tcp_intranet channel is responsible for delivering messages to remote message stores on your company's intranet. The tcp_intranet channel also routes messages to any intermediary internal systems on their way to another system. The ims-ms channel is responsible for delivering messages to the local message store.

For a complete description of the MTA architecture and message flow, see MTA Architecture and Message Flow Overview in Sun Java System Messaging Server 6.3 Administration Guide.


When Is a Message Queue Having a Problem?

Though this might seem counter-intuitive, in general it is not a problem for messages to build up in a message queue. Indeed, the MTA is designed to handle this situation. Internet mail (SMTP) is a store-and-forward mail system, and Internet mailers are designed to store messages that cannot yet be delivered. Keep in mind that it is perfectly normal to be unable to deliver outgoing SMTP messages immediately. Network problems, problems on remote hosts, problems with remote users' mailboxes, and so forth, are common reasons for the MTA to hold messages rather than deliver them. MTAs, including Messaging Server's, can therefore store and retry outgoing messages. An often-encountered case involves message store users who are over quota, so that their messages wait for a retry. From the MTA design point of view, this is exactly the same case as being unable to deliver a message because of a network problem. The MTA handles the over-quota situation with exactly the same underlying mechanisms: the messages are stored in queues, where they will be retried for delivery. Thus, a build-up of messages in the queues is not necessarily cause for alarm.
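The store-and-retry idea can be illustrated with a toy model. This is only a sketch of the general store-and-forward pattern, not the MTA's actual implementation: the Message class, the run_delivery_pass function, and the over_quota set are all made up for the illustration. The point it shows is that a failed delivery leaves the message in the queue, whatever the reason for the failure, and a later pass delivers it once the cause is gone.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    recipient: str
    attempts: int = 0  # how many delivery attempts have been made


def run_delivery_pass(queue, can_deliver):
    """Attempt each queued message once.

    Messages that cannot be delivered stay in the queue for a later
    retry; they are stored, not lost. Returns the delivered messages.
    """
    still_queued = deque()
    delivered = []
    while queue:
        msg = queue.popleft()
        msg.attempts += 1
        if can_deliver(msg):
            delivered.append(msg)
        else:
            still_queued.append(msg)
    queue.extend(still_queued)
    return delivered


if __name__ == "__main__":
    over_quota = {"bob"}  # hypothetical user who is over quota
    queue = deque([Message("alice"), Message("bob")])

    # First pass: the message for the over-quota user remains queued.
    first = run_delivery_pass(queue, lambda m: m.recipient not in over_quota)
    print("delivered:", [m.recipient for m in first], "queued:", len(queue))

    # The quota problem is resolved, so the retry succeeds.
    over_quota.clear()
    second = run_delivery_pass(queue, lambda m: m.recipient not in over_quota)
    print("delivered:", [m.recipient for m in second], "queued:", len(queue))
```

Swapping the over-quota check for a "network is down" check would not change the queue logic at all, which is the sense in which the two cases are the same from the MTA's point of view.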

The real question about backed-up queues is when you should worry and start troubleshooting the source of the problem. The next sections explain the main causes of queue backups and how to go about troubleshooting such situations.


Main Causes of Backed-up Message Queues

In general, four situations can cause messages to back up in the MTA message queues:

  • Performance problems. Simply put, this occurs when your system is unable to keep up. Performance problems manifest themselves as high system loads and processes running flat out. In such situations, you need to take an in-depth look at the system so that you can tune it and reduce the load.

  • Destination host problems. Regardless of what is causing problems on the destination host (networking problems, system problems, and so on), this situation can cause all delivery threads to become “stuck,” stopping valid email from getting through, or causing a lot of queued emails. The “stuck” problem presents itself as a lot of active messages waiting to be processed but with a low load. The “queued” problem shows as lots of queued email with long delivery attempt history.


    Tip - To contact the administrator of a domain that is causing you problems, use the whois command or http://www.netsol.com/cgi-bin/whois/whois.


  • Problems with Messaging Server itself. An example of a Messaging Server problem is when the stored process has an orphan lock for a user account, resulting in message build up for users in the ims-ms channel.

  • Configuration issues. Common configuration issues include the job controller ignoring queued email, email queued because of over-quota accounts, and email queued because of slow directory response, which is common for big mailing lists.
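The symptoms described in the list above can be turned into a rough first-pass triage. The sketch below is purely illustrative: the ChannelSnapshot fields, the suggest_cause function, and especially the threshold values are hypothetical and are not taken from Messaging Server. It only encodes the patterns named above (high load, many active messages under low load, many queued messages with a long delivery attempt history).

```python
from dataclasses import dataclass


@dataclass
class ChannelSnapshot:
    system_load: float            # for example, a load average
    active_messages: int          # messages waiting to be processed now
    queued_messages: int          # messages waiting for a retry
    avg_delivery_attempts: float  # average attempts per queued message


# Hypothetical thresholds; tune them for your own site.
HIGH_LOAD = 8.0
MANY_MESSAGES = 1000
MANY_ATTEMPTS = 5.0


def suggest_cause(s):
    """Map a snapshot to the most likely cause from the list above."""
    if s.system_load >= HIGH_LOAD:
        return "performance problem"
    if s.active_messages >= MANY_MESSAGES:
        # Many active messages but low load: the "stuck" pattern.
        return "destination host problem (stuck delivery threads)"
    if (s.queued_messages >= MANY_MESSAGES
            and s.avg_delivery_attempts >= MANY_ATTEMPTS):
        # Many queued messages with a long attempt history.
        return "destination host problem (repeated failed deliveries)"
    return "no clear pattern; check Messaging Server itself and configuration"


if __name__ == "__main__":
    print(suggest_cause(ChannelSnapshot(0.5, 5000, 0, 1.0)))
```

A heuristic like this only narrows down where to look first; problems with Messaging Server itself and configuration issues do not show a single numeric signature and still need manual investigation.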

Part Two of this article will discuss courses of action for the above situations, and provide more information on troubleshooting. Stay tuned.
Comments:

Hi,

This information is very helpful for troubleshooting messaging server-related problems. However, I found no clues how to identify an orphan lock by store process.

In our system, we have deployed iPlanet 5, but every Monday (or the day after a holiday), our system encounters a message build-up on the ims-ms channel. We checked the server's resources but nothing seemed to be unusual (resources such as memory and CPU are more than enough).

We've tried to find out the root problem, but yet have found nothing. Therefore, I would like to ask some questions as stated below:

Is this the problem of ims-master or an orphan lock by store process as you mentioned? Can we increase the number of ims-master which is now set to 2 in ims-ms channel? How many messages can be processed by one ims-master at given life time?

Best regards,
vanny

Posted by vanny soeun on February 22, 2008 at 02:39 AM MST #

Vanny,

I checked with a Messaging Server expert, and he recommended that you upgrade. iMS5.x is now in limited support and newer versions have improved job_controller behaviour (the job_controller is responsible for handing emails to various processes, for example, ims_master) and overall performance.

Having said that (and that may not be what you wanted to hear), it's hard to determine a more specific answer to your particular situation. Emails could queue up in the ims-ms channel for any number of reasons -- there isn't enough information here to say one way or another.

In troubleshooting iMS5.2 and ims-ms, you may want to read through the following email thread:

http://lists.balius.com/pipermail/info-ims-archive/2005-March/021296.html

HTH,

Joe

Posted by Joe Sciallo on February 25, 2008 at 07:08 AM MST #
