Adventures with Import- and Update-CsUserData Pt.2

I’ve spent a lot of time this year working with the <verb>-CsUserData commandlets in SfB, and it’s been an interesting time.

This is part 2 in a series where I document the traps I’ve fallen into with the Import-CsUserData and Update-CsUserData commands in particular.

Part 1 – Update-CsUserData fails on bad data.
Part 2 – Your Front-End won’t start after Import-CsUserData imports bad data.
Part 3 – Update-CsUserData throws false red and runs slow if your paired pool is offline.
Part 4 – You can’t trust Import-CsUserData to tell you if it fails.

In Part 1 I showed how Update-CsUserData will throw an error if you try to import “bad data” – and in that example this boiled down to two occurrences of the same “buddy” in a contact file exported from Lync 2010.

What do you think happens if you try to import the same XML/ZIP file using Import-CsUserData? I’m glad you asked.

Part 2 – Your Front-End won’t start after Import-CsUserData imports bad data

TL;DR: It looks like there’s not a lot of validity checking in “Import-CsUserData”. Provided the file is valid XML, the DLL(?) that does the work just copies the Buddy data – as XML – directly into the back-end SQL database. The problems arise later when the Front-End SQL database parses that XML and tries to plug it into its own tables where each Buddy is a Primary Key and MUST be unique. Now you’re in strife.

I believe you’ll only be impacted if your source is formatted as Lync 2010 data.

Let’s say you’ve found yourself in this situation.

If you’ve followed the official Microsoft documentation, you might have done something like this:

  1. Stop-CsWindowsService on the user’s Standard Edition Front-End (or stopped all if this is an EE pool)
  2. Import-CsUserData
  3. Reset-CsPoolRegistrarState -PoolFqdn -ResetType FullReset
  4. <Things go pear-shaped about here>

(Why the full reset? Import-CsUserData is pushing this new contact information into the back-end SQL database, and we need to ensure all of the Front-End replicas are flushed and reloaded with this new “single source of truth” that you’ve just Imported. More on this in subsequent parts.)

If the Pool now won’t start, you’ll probably encounter these three events:

57006, LS User Store Sync Agent:

RtcDb Sync Agent sproc failed
#CTX#{ctx:{traceld:1323457082, activityld:"2d456267-8ee5-4114-a4c5-a63917191aca"}}#CTX# Sproc: [XdsUpdateReplicaData]
Exception: [System.Data.SqlClient.SqlException (0x80131904): Violation of PRIMARY KEY constraint 'PK_#B9F7D4D_668421806484F2B3‘. Cannot insert duplicate key in object 'dbo. @Contact'. The duplicate key value is (rocky@contoso.net).
###63020:ReppOnDataReceived:Failed to import lcd for user greig@contoso.net.
Duplicate key was ignored.
Duplicate key was ignored.
The statement has been terminated.

57008, LS User Store Sync Agent:

Pull Sync cycle detected an issue preventing data processing from making progress.
#CTX#{ctx:{traceld:1391487082, activityld:"2dc46267-8415-4104-a4b5-a63947d99aca"}}#CTX# Routing Group Id: [ECE7ED4D-BDAF-5BCD-B702-21D10EC0BB96]
Batch Ids:[5bc62ef7-1bc1-433c-9c3a-b21c83bb3bf1 ]
Cause: Possible issues with the pulled data causing it to not be updated in the local db properly. Resolution:
Check Sync Agent logs and resolve any issues with any documents not being applied properly.

32178, LS User Services:

Failed to sync data for Routing group {ECE7ED4D-BDAF-5BCD-B702-21D10EC0BB96} from backup store.
Cause: This may indicate a problem with connectivity to backup database or some unknown product issue.
Resolution:
Ensure that connectivity to backup database is proper. If the error persists, please contact product support with server traces.

An EE-based deployment might reveal a routing group stuck in an infinite loop.

This is an extract of a Start-CsPool:

Status of the Servers:
Server                           Status                   Stage                            Details
-------                          -------                  ------                           --------
sfb2015ee1.azgislab.com          InProgress               RoutingGroup Placement           None
sfb2015ee2.azgislab.com          InProgress               RoutingGroup Placement           None
sfb2015ee3.azgislab.com          InProgress               RoutingGroup Placement           None

WARNING: No progress is being made for the following routing groups:
Routing Group                            Status                       Total Documents      Progress
--------------                           ------                       ----------------     ---------
B489B907-5B01-5883-BF09-1FD8EA118CD4     Infinite Loop                0                    No

Status of Routing Group Placement:
Total Successful RGs             Total Failing RGs        Total Documents                  Total Batches
---------------------            ------------------       ----------------                 --------------
4                                1                        0                                0
Please run following command again with given option.
Start-CsPool -PoolFqdn eepool.azgislab.com -SkipRoutingGroup @("B489B907-5B01-5883-BF09-1FD8EA118CD4")
**********************************************************************************************************************

2019-12-07T15:39:37
The final state of servers:
Server                           Status
-------                          -------
sfb2015ee1.azgislab.com          StartPending
sfb2015ee2.azgislab.com          StartPending
sfb2015ee3.azgislab.com          StartPending

PS C:\>

That last bit of guidance is GOLD – it’s suggesting you discard that particular routing group in an effort to get the pool up. (“The needs of the many …” etc.)

The Quick Fix

OK, let’s recap:

Your problem is that there’s bad data in the back-end Database, and that data is not compatible with the front-end database’s requirement that you only have one occurrence of a Buddy in your Contacts list, and this results in an impasse that’s holding up the restart of the pool.

Now we could go bashing commands into SQL to try and zap the rogue entry, but that’s not somewhere I’m comfortable playing. If the heat’s on and you’re not keen on plugging random commands you found on the internet into a critical production SQL instance, I don’t blame you.

Try this: just capture the offending user (named in the 57006 event) with Get-CsUser, then Disable-CsUser them! That’s the quickest way to “fix” the database. After all, you already have a backup of their contacts!

If the pool’s still trying to come up at this stage, restarting the RTCLocal SQL instance on the Front-End on which you found the 57006 event should be all you need to do now to return the pool to service.

If however you’ve already restarted the pool having used “-SkipRoutingGroup”, add it back into the mix with this:

PS C:\>Reset-CsRoutingGroup B489B9075B015883BF091FD8EA118CD4

A subsequent Get-CsPoolFabric state on the pool will hopefully now be clean, reporting no yellow.

Am I susceptible?

Have you been editing contacts/buddy lists?

This issue first surfaced in the real world, but I’ve since tested and duped it in no fewer than 4 Lab environments, of both SfB 2015 and 2019 and a mixture of Standard Edition and Enterprise Edition deployments.

First off, if your data’s NOT coming out of Lync 2010 – or not formatted by a third party tool into that format – I think you’re OK. The tell here seems to be the “HomedResource” header at the top of the CsUserData XML export file. If it looks like this, I believe you’re at risk:

Contrast this with a more recent CsUserData export:

Extra SQL bits

If you’re at the very least curious and you have some time, once you know the affected user – sourced from the 57006 message – this query into the rtcxds database of your back-end will reveal the user and the data in question:

SELECT TOP 1000 [DocId],[ItemId],[BatchId],[BatchPartialVersion],[Data],[CreationTime],[UpdateTime]
  FROM [rtcxds].[dbo].[Item]
  WHERE Data.exist('//*:HomedResource[@UserAtHost="jessica@azgislab.com"]') =1

This variation will report everyone in the shop who has the named user as a contact:

SELECT TOP 1000 [DocId],[ItemId],[BatchId],[BatchPartialVersion],[Data],[CreationTime],[UpdateTime]
  FROM [rtcxds].[dbo].[Item]
  WHERE Data.exist('//*:Contact[@Buddy="rocky@azgislab.com"]') =1

Fix it in SQL!

It look me a while, but this seems to do the trick at a SQL level.

UPDATE [rtcxds].[dbo].[Item]
SET Data.modify ('delete //*:HomedResource//*:Contacts//*:Contact[@Buddy="dupe_user@contoso.com"]')
WHERE (Data.exist ('//*:HomedResource[@UserAtHost="broken_user@contoso.com"]') =1)
AND ((Data.exist ('//*:Contact[@Buddy="dupe_user@contoso.com"]') =1))

The syntax of this query reads as
UPDATE <the rtcxds data>, deleting *any* contact where the Buddy=”dupe_user@contoso.com”
WHERE the data contains [“exist”] “UserAtHost=broken_user@contoso.com”
AND there’s a Buddy=”dupe_user@contoso.com” in this row.

The point of the seemingly redundant test of “AND Buddy=” is to make sure we ONLY make changes to the row in the database where the buddy data is stored. The rtcxds database contains multiple records (rows) for each user, but only one of those has the buddy data. It’s probably unnecessary, but given we’re doing open heart surgery on SfB, I liked the idea of being overly specific in the query, with the intent of limiting the potential for damage.

Credits

My thanks go to Dan C for his invaluable help preparing this post, and to the Kiwi connection (Dave and Paul in Auckland) for their SQL guidance.

Resources

Revision History

7th December 2019: This is the initial publication.
21st March 2020: Added the “Fix it in SQL” query to zap the offending duplicate.

– G.

2 Comments

  1. The SQL fix it awesome! I have the same dilemma, but with the value stored here:

    SELECT d.Name,i.Data FROM [rtcxds].[dbo].[Document] AS d
    INNER JOIN [rtcxds].[dbo].[Item] AS i ON i.DocId = d.DocId
    WHERE d.Name = ‘urn:hcd:LAB9/1E106@sipdomain.com’

    Somehow, back in the past, the “/” was allowed in the sip address and now, when I try to run export-csuserdata, the process fails due to that entry. How can I remove it from the rtcxds database?

    • Hey Chet,

      I’m not sure what you’re trying to achieve. Can’t you just update the SIP address in PowerShell to remove the offending slash, or has the user already been updated or deleted and this stale data is stuck?

      If it’s the latter, is the SQL query objecting to the slash in the WHERE clause? Reformulate the query to still pull an accurate record without referencing the slash character, like this:

      WHERE d.Name LIKE ‘urn:hcd:LAB9%’ AND d.Name LIKE ‘%1E106@sipdomain.com’

      – Greig.

Leave a Reply

Your email address will not be published.

... and please just confirm for me that you're not a bot first: Time limit is exhausted. Please reload the CAPTCHA.

This site uses Akismet to reduce spam. Learn how your comment data is processed.