qmail bounces email sent to the existing mailbox email@example.com
with the error message "user's homedir not found":
Hi. This is the qmail-send program at mail.provider.com.
I am afraid I was not able to deliver your message to the following addresses.
This is a permanent error; I have given up. Sorry it did not work out.
In the /var/log/maillog file on the qmail MTA server, the following error messages may be found:
Apr 12 22:52:28 mta1 splogger: 1333399948.387424 status: local 1/10 remote 0/120
Apr 12 22:52:28 mta1 qmail-lspawn: LDAPWARN: searchEmailInLDAP_: didn't find email firstname.lastname@example.org by filter '(&(objectclass=mailname)(|(email@example.com)(firstname.lastname@example.org)(email@example.com))(!(homedir=#)))': exit code -1
Apr 12 22:52:28 mta1 qmail-lspawn: LDAPWARN: searchEmailInLDAP_: didn't find email CATCH-ALL@customer.com by filter '(&(objectclass=mailname)(|(mailname=CATCH-ALL@customer.com)(mailalias=CATCH-ALL@costomer.com)(mailalias=CATCH-ALL@customer.com))(!(homedir=#)))': exit code -1
Apr 12 22:52:28 mta1 qmail-lspawn: LDAPWARN: lookupEmailInLDAP: email 'firstname.lastname@example.org' wasn't found and searchEmailInLDAP_(catchall=CATCH-ALL@customer.com) returns No such object
Apr 12 22:52:28 mta1 splogger: 1333399948.388056 delivery 82194: failure: user's_homedir_not_found/
At first glance, it looks as if the customer's email address is absent from the LDAP database.
However, the entry for the problem mailbox exists in the LDAP database and appears to be correct:
The mailbox homedir /usr/local/qmail/shared/mailnames/eb/l/-66/66/ exists; its file system permissions are correct; the NFS server is available; and the NFS share /usr/local/qmail/shared is mounted on all MTA servers in the qmail cluster.
Network monitoring tools do not show any packet loss on any servers in the qmail cluster, including the MTA, LDAP, WHOSON, and NFS servers. Routing is configured correctly as well, and all hosts are available to each other.
The problem is intermittent and cannot be reproduced consistently: some emails are delivered to the mailbox correctly, while others bounce. The problem appears randomly and may affect different mailboxes hosted in the POA-managed qmail cluster.
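Because the bounces appear at random, it can help to list every affected delivery recorded in the maillog. The following is a minimal sketch that assumes the log lines have the same shape as the excerpt above; the function name find_ldap_bounces and the pairing heuristic (a failure line is attributed to the most recent LDAPWARN line) are illustrative assumptions, and interleaved deliveries on a busy server may be paired incorrectly.

```python
import re

# splogger line that records the bounced delivery (format taken from the excerpt above).
FAILURE_RE = re.compile(r"delivery (\d+): failure: user's_homedir_not_found")
# qmail-lspawn warning written when the LDAP lookup returned nothing.
LDAPWARN_RE = re.compile(
    r"qmail-lspawn: LDAPWARN: lookupEmailInLDAP: email '([^']+)' wasn't found"
)


def find_ldap_bounces(lines):
    """Return (delivery_id, email) pairs for failures that follow an LDAPWARN line.

    Heuristic (an assumption): each failure is attributed to the most recent
    unmatched LDAPWARN line seen before it.
    """
    results = []
    last_email = None
    for line in lines:
        warn = LDAPWARN_RE.search(line)
        if warn:
            last_email = warn.group(1)
            continue
        fail = FAILURE_RE.search(line)
        if fail and last_email is not None:
            results.append((fail.group(1), last_email))
            last_email = None
    return results
```

To run it against a real log, pass an open file object, for example find_ldap_bounces(open("/var/log/maillog", errors="replace")).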
A possible reason for the problem is an incorrectly configured LDAP server. The server may close an established connection from qmail after an idle timeout while qmail still considers the connection open and sends its next request over it. Because the connection is already closed on the LDAP server side, qmail receives no reply, incorrectly treats this as an attempt to send email to a nonexistent mailbox, and bounces the message.
The reason for the problem in this case is the idletimeout parameter in the LDAP server configuration file: its value is set too low, so the LDAP server closes the connection too early.
The full picture of the problem is demonstrated in the two screenshots below, which show the packets exchanged between the qmail and LDAP servers, captured by tcpdump and processed by Wireshark.
The sequence of actions in the "good" case is as follows:
qmail server 10.184.18.6 sends a bindRequest request to LDAP server 10.184.19.23 (packets #18, #19)
LDAP server accepts the connection (packets #20, #21)
qmail server performs search requests (packets #22, #23, #24, #25)
qmail sends an unbindRequest request to the LDAP server to close the connection (packets #26, #27)
Finally, the LDAP server closes the connection correctly (packets #28, #29)
The sequence of actions in the "bad" case is as follows:
qmail server 10.184.18.6 sends a bindRequest request to LDAP server 10.184.19.23 to open the connection (packets #23194, #23195)
LDAP server accepts the connection (packets #23196, #23197)
qmail server performs search requests in LDAP database; they are successful (packets #23198-23515)
After qmail performs the needed searches in the LDAP database, it keeps the connection open and does not send any requests for a certain period of time (because there is no need to query the LDAP database)
LDAP server closes the connection from its side after idle timeout (packet #24033)
qmail server receives a new incoming email and tries to send one more request to the LDAP database over the existing connection (packets #24034, #24658, #24659)
The last request from the qmail server to the LDAP server fails because the LDAP server already closed the connection from its side due to inactivity and idle timeout (packets #24660, #24661)
qmail bounces the incoming email message because the attempt to find the mailbox in the LDAP database failed
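The "bad" sequence above comes down to a client confusing "the connection is gone" with "the entry does not exist". The sketch below illustrates that distinction with plain sockets. It is not qmail's code and not a patch for it; the names ConnectionLost, query, and lookup_with_reconnect are hypothetical, and a real LDAP client would speak the LDAP protocol rather than exchange raw bytes.

```python
import socket


class ConnectionLost(Exception):
    """The connection was closed; this says nothing about whether the mailbox exists."""


def query(conn, request):
    """Send one request and return the reply, or raise ConnectionLost."""
    try:
        conn.sendall(request)
        reply = conn.recv(4096)
    except OSError as exc:  # e.g. BrokenPipeError or ConnectionResetError
        raise ConnectionLost(str(exc)) from exc
    if not reply:
        # Zero bytes means the peer closed the connection, not "no such entry".
        raise ConnectionLost("peer closed the connection")
    return reply


def lookup_with_reconnect(conn, request, reconnect):
    """Retry once on a fresh connection instead of reporting 'not found'.

    `reconnect` is a caller-supplied function (an assumption of this sketch)
    that returns a newly connected socket.
    """
    try:
        return conn, query(conn, request)
    except ConnectionLost:
        conn.close()
        fresh = reconnect()
        return fresh, query(fresh, request)
```

A client written this way would see the idle-timeout close as a transport error and repeat the search on a new connection, whereas the behavior described above turns the same event into a permanent bounce.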
Set the idletimeout parameter in the LDAP server configuration file /etc/openldap/slapd.conf to 0 and restart the LDAP service.
From the slapd.conf man page:
Specify the number of seconds to wait before forcibly closing an idle client connection.
A idletimeout of 0 disables this feature. The default is 0.
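To check which value is currently in effect before and after the change, the configuration file can be inspected with a short script. This is a minimal sketch with a hypothetical function name, effective_idletimeout. It assumes that the last idletimeout directive in the file wins and that the keyword can be matched case-insensitively, and it does not follow include directives.

```python
def effective_idletimeout(slapd_conf_text):
    """Return the idletimeout value in seconds; 0 means the feature is disabled."""
    value = 0  # slapd.conf man page: "The default is 0."
    for raw in slapd_conf_text.splitlines():
        # A line that begins with white space continues the previous line,
        # so it cannot start a new directive.
        if not raw or raw[0].isspace():
            continue
        # Comment lines begin with '#'.
        if raw.startswith("#"):
            continue
        parts = raw.split()
        if parts[0].lower() == "idletimeout" and len(parts) >= 2:
            value = int(parts[1])  # assumption: the last occurrence wins
    return value
```

For example, effective_idletimeout(open("/etc/openldap/slapd.conf").read()) should return 0 once the resolution above has been applied.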