Updated 12th December 2011. (Scroll to the bottom)
We’ve found here in Oz with the UX that occasionally a random failure code from the carrier is enough to block the entire gateway.
The NET faithfully interprets the applicable standards, mapping the ISDN failure messages to their SIP equivalents, then sends the resulting message to Lync.
Unfortunately then Lync – itself also dogmatically sticking to the standards – decides that the message indicates a failure of the entire route/gateway, not just a single call, and temporarily blocks the gateway for all further outgoing call attempts.
Evidence of this problem will be logged in the SBA’s Event Log (Lync Events 46009, 46046, etc) but the user complaints of “can’t call out” should be enough to clue you up.
Event 46009. LS Outbound routing. An attempt to route to a gateway failed:
Event 46046. LS Outbound routing. A call to a PSTN number failed due to non-availability of gateways:
NET has recommended we create a SIP Override Table to translate these errors into a “harmless failure”. Note that there are several more entries in this list than are automatically created by the UX’s Wizard (as at 1.3.2 v83):
This resolves the WORST of our problems, but Lync’s not so easily fooled. It sees too many 480s and decides all’s not well.
Event 46009. LS Outbound routing. A PBX gateway has been marked as less-preferred:
What being marked as “less preferred” means isn’t immediately apparent, but I’m sure it can’t be a good thing. If you only have one gateway you might be OK, but if you have several you might find your calls start routing through your backup gateway when you’re not aware of a problem with the first. Here in Oz that would typically manifest itself as complaints that outbound calls are showing the wrong CallerID (if you’ve routed to a different exchange area or state, for example).
The obvious suggestion is to try a different SIP code. I’ve revised my customer’s site to translate all of the above ISDN Cause Codes to good ol’ “busy here” (486), and I’ll come back to you if Lync sees through that cheap trick.
Added Dec 12th: OK, so it turns out Busy is fine if you’re running a single gateway, but if you have multiple gateways it’s going to break your failover to the second gateway. “Busy” is of course a valid, healthy response to an outgoing call, so Lync treats the call as finished at that point and never tries the next gateway.
I was testing failover at a site I was in the process of commissioning. The UX was online, but with no ISDNs plugged in yet. I was watching Lync send the SIP INVITE to the gateway, which then failed with an ISDN 47 (“Resource Unavailable Unspecified”), which in turn came back to my Lync client as “<called user> is busy” rather than failing over with an INVITE to the UX in the other site.
I’ve now changed all these codes to return a 503 (“Service Unavailable”), so I’ll soak that for a while and let you know how well that works. I’ll get back to you… (We may yet end up with a hybrid config in the table: some returning 486 and others 480 or 503…)
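To make the idea of an override table concrete, here is a minimal sketch in Python of how a set of overrides can sit on top of a default Q.850-to-SIP mapping. It is not the UX's configuration format, and the function names are made up for illustration. The few default pairs shown are the ones I understand RFC 3398 to list. The set of "troublesome" cause codes, the catch-all value and the hybrid split are assumptions, not the actual entries from the table above.

```python
# A few default ISDN (Q.850) cause -> SIP status pairs, as listed in RFC 3398.
DEFAULT_CAUSE_TO_SIP = {
    1: 404,   # unallocated number
    17: 486,  # user busy
    34: 503,  # no circuit/channel available
    41: 503,  # temporary failure
    47: 503,  # resource unavailable, unspecified
}

# Assumed catch-all for causes that appear in neither table.
FALLBACK_SIP = 500


def build_override(causes, sip_code):
    """Map every cause in `causes` to the same SIP status code."""
    return {cause: sip_code for cause in causes}


def translate(cause, override, default=DEFAULT_CAUSE_TO_SIP):
    """Return the SIP status for an ISDN cause: the override wins, then the default."""
    if cause in override:
        return override[cause]
    return default.get(cause, FALLBACK_SIP)


# Hypothetical list of causes to override; substitute the ones you actually see.
TROUBLESOME = [34, 41, 47]

override_480 = build_override(TROUBLESOME, 480)  # first attempt in the post
override_486 = build_override(TROUBLESOME, 486)  # "busy here": breaks failover
override_503 = build_override(TROUBLESOME, 503)  # current attempt
# A hybrid table; which cause gets which code here is purely illustrative.
hybrid = {**build_override([34, 41], 503), 47: 486}

if __name__ == "__main__":
    for name, table in [("480", override_480), ("486", override_486),
                        ("503", override_503), ("hybrid", hybrid)]:
        print(name, {cause: translate(cause, table) for cause in TROUBLESOME})
```

Keeping the overrides separate from the defaults means you can swap strategies (480, 486, 503 or a mix) without touching the entries that already behave.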
Experiencing exactly the same issues with a bunch of VX and UX gateways.
As you did, I implemented the Q.850 to SIP Cause Code Table entries.
What I did however with specific Q.850 codes is add “Unassigned” in the SIP end and that cleared 99.9% of events in the logs.
Bookmarked for further follow-up.
We are experiencing this as well, though not with ISDN: ours is an upstream SIP trunk provider. Do you know if this would apply in the case of a SIP trunk?
Hi Josh. I think it’d certainly apply. At some stage there’s going to be a translation of the signalling messages from Telco/ISDN protocols to SIP, and thus the standards will prevail – sometimes with undesirable outcomes.
Was the original issue being experienced (random failure code from carrier) specific to one site? Or have you experienced this issue across multiple deployments?
Hi Damo. We’ve seen this on MANY deployments (starting from our *first*), and across multiple carriers here in AU. Given that the carriers are all interconnected, it could still be a problem with just one of them that is simply echoed through the others.
Try it: check an SBA for instances of 46046 or 46009. If they’re not there, it’s probably because the default Q.850 to SIP table (created by the Wizard) is successfully protecting you from them!
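If you'd rather not eyeball the log, one rough way to run that check is to export the SBA's Lync event log to CSV and count the two event IDs. The Python sketch below assumes the export has a header row with an "Event ID" column; that column name is an assumption about your export, and the sample rows are invented purely to show the shape of the input.

```python
import csv
import io
from collections import Counter

# The two outbound-routing events discussed in the post.
WATCHED_EVENT_IDS = {"46009", "46046"}


def count_routing_events(csv_text, id_column="Event ID"):
    """Count occurrences of the watched event IDs in exported CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        event_id = (row.get(id_column) or "").strip()
        if event_id in WATCHED_EVENT_IDS:
            counts[event_id] += 1
    return counts


# Invented sample data, not taken from a real log.
SAMPLE = """Level,Date and Time,Source,Event ID
Warning,01/01/2012 09:00:00,LS Outbound Routing,46009
Error,01/01/2012 09:00:05,LS Outbound Routing,46046
Information,01/01/2012 09:01:00,LS Other,12345
Warning,01/01/2012 09:05:00,LS Outbound Routing,46009
"""

if __name__ == "__main__":
    counts = count_routing_events(SAMPLE)
    for event_id in sorted(WATCHED_EVENT_IDS):
        print(event_id, counts[event_id])
```

A count of zero for both IDs suggests the existing table is doing its job. Anything else is worth a closer look at the event text to see which gateway and which response code triggered it.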