Updated 12th December 2011. (Scroll to the bottom)
We’ve found here in Oz with the UX that occasionally a random failure code from the carrier is enough to block the entire gateway.
The NET faithfully interprets the applicable standards, mapping the ISDN failure messages to their SIP equivalents, then sends the resulting message to Lync.
Unfortunately, Lync (itself also dogmatically sticking to the standards) then decides that the message indicates a failure of the entire route/gateway, not just a single call, and temporarily blocks the gateway for all further outgoing call attempts.
Evidence of this problem will be logged in the SBA’s Event Log (Lync Events 46009, 46046, etc) but the user complaints of “can’t call out” should be enough to clue you up.
Event 46009. LS Outbound routing. An attempt to route to a gateway failed:
Event 46046. LS Outbound routing. A call to a PSTN number failed due to non-availability of gateways:
NET has recommended we create a SIP Override Table to translate these errors into a “harmless failure”. Note that there are several more entries in this list than are automatically created by the UX’s Wizard (as at 1.3.2 v83):
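As a rough illustration of what such an override table does, here's a minimal Python sketch. It is not the UX's configuration format. The default mappings shown follow RFC 3398 for those few causes, but the override entries (apart from cause 47, which comes up later in this post) are made up for the example and are not NET's recommended list. The function name and the fallback of 500 are likewise my own assumptions.

```python
# Default ISDN (Q.850) cause -> SIP status, per RFC 3398 for these few causes.
DEFAULT_CAUSE_TO_SIP = {
    17: 486,  # User busy -> Busy Here
    34: 503,  # No circuit/channel available -> Service Unavailable
    41: 503,  # Temporary failure -> Service Unavailable
    47: 503,  # Resource unavailable, unspecified -> Service Unavailable
}

# Hypothetical override entries: illustrative only, NOT NET's actual list.
# Each one rewrites a "scary" cause into a 480 (Temporarily Unavailable).
OVERRIDES = {34: 480, 41: 480, 47: 480}


def sip_status_for_cause(cause, overrides=OVERRIDES, fallback=500):
    """Return the SIP status sent to Lync for a given ISDN cause code.

    Overrides win over the default mapping; unknown causes get `fallback`
    (an arbitrary choice for this sketch).
    """
    if cause in overrides:
        return overrides[cause]
    return DEFAULT_CAUSE_TO_SIP.get(cause, fallback)
```

Swapping the 480s for another code is then a one-line change to the override entries, which is essentially what the rest of this post experiments with.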
This resolves the WORST of our problems, but Lync’s not so easily fooled. It sees too many 480s and decides all’s not well.
Event 46009. LS Outbound routing. A PBX gateway has been marked as less-preferred:
What being marked as “less preferred” means isn’t immediately apparent, but I’m sure it can’t be a good thing. If you only have one gateway you might be OK, but if you have several you might find your calls start routing through your backup gateway when you’re not aware of a problem with the first. Here in Oz that would typically manifest itself as complaints that outbound calls are showing the wrong CallerID (if you’ve routed to a different exchange area or state, for example).
The obvious suggestion is to try a different SIP code. I’ve revised my customer’s site to translate all of the above ISDN Cause Codes to good ol’ “busy here” (486), and I’ll come back to you if Lync sees through that cheap trick.
Added Dec 12th: OK, so it turns out Busy is fine if you’re running a single gateway, but if you have multiple gateways it’s going to break your failover to the second gateway. “Busy” is of course a valid, healthy response to an outgoing call, so Lync treats the call as finished at that point and never tries the next gateway.
I was testing failover at a site I was in the process of commissioning. The UX was online, but with no ISDNs plugged in yet. I was watching Lync send the SIP INVITE to the gateway, which then failed with an ISDN 47 (“Resource Unavailable Unspecified”), which in turn came back to my Lync client as “<called user> is busy” rather than failing over with an INVITE to the UX in the other site.
I’ve now changed all these codes to return a 503 (“Service Unavailable”), so I’ll soak that for a while and let you know how well that works. I’ll get back to you… (We may yet end up with a hybrid config in the table: some returning 486 and others 480 or 503…)
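To make the failover reasoning concrete, here's a toy Python model. It is emphatically not Lync's real routing logic: the function name is hypothetical, and the assumption that 503 is the only response that triggers a retry on the next gateway is exactly the bet being soaked above, not something confirmed yet.

```python
def route_call(gateway_responses, retryable=frozenset({503})):
    """Try each gateway in order and return (gateway_index, sip_status).

    `gateway_responses` holds the SIP status each gateway would return.
    A status in `retryable` moves on to the next gateway (an ASSUMPTION
    about Lync's behaviour); anything else, including 200 and 486, is
    taken as the final answer for the call.
    """
    result = None
    for index, status in enumerate(gateway_responses):
        result = (index, status)
        if status not in retryable:
            break  # final answer: no further gateways are tried
    return result


# Gateway 0 has no ISDN plugged in; gateway 1 is healthy.
print(route_call([486, 200]))  # cause mapped to Busy: the call stops at gateway 0
print(route_call([503, 200]))  # cause mapped to 503: the call fails over to gateway 1
```

In this model the first call never reaches the second gateway, which matches the “<called user> is busy” result seen in testing.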