What to Do When Your RID and PDC Emulator DC Has AD Database Corruption?

One of my engineers asked me whether demoting and re-promoting the DC would solve the problem… I almost fainted, and called him back immediately before he could try it.

Why not? And actually, even if he had tried, he would not have been able to do it… Why? 🙂

OK, let me start from the beginning.

How did we find out that the Domain Controller had AD Database Corruption?

In the Directory Service event log, you will see the following events appearing constantly…

NTDS ISAM Event ID: 467 Database Corruption Error
NTDS Replication Event ID: 1084 Replication Error
NTDS Replication Event ID: 2108 Replication Error
NTDS General Internal Event ID: 1173 Processing Warning
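
To make the check repeatable, here is a minimal Python sketch that counts these four event IDs in an exported log. The `(source, event_id)` tuple format and the function names are my own assumptions for illustration, not the output of any particular export tool.

```python
# Event IDs and descriptions are the ones listed in this post.
CORRUPTION_EVENTS = {
    467: "NTDS ISAM: database corruption error",
    1084: "NTDS Replication: replication error",
    2108: "NTDS Replication: replication error",
    1173: "NTDS General: internal processing warning",
}


def find_corruption_events(records):
    """Count the events of interest.

    records: iterable of (source, event_id) tuples. This is a hypothetical
    export format; adapt it to whatever your log export really produces.
    """
    hits = {}
    for source, event_id in records:
        if source.upper().startswith("NTDS") and event_id in CORRUPTION_EVENTS:
            hits[event_id] = hits.get(event_id, 0) + 1
    return hits


def looks_corrupted(records):
    """Flag the log only when the ISAM corruption event (467) is present.

    Assumption: the replication events alone are treated as inconclusive here.
    """
    return 467 in find_corruption_events(records)


if __name__ == "__main__":
    sample = [("NTDS ISAM", 467), ("NTDS Replication", 1084), ("DNS", 4000)]
    for event_id, count in sorted(find_corruption_events(sample).items()):
        print(event_id, count, CORRUPTION_EVENTS[event_id])
```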

From observation, a user account that had been deleted on another DC was still present on this particular DC: the deletion was not replicated to it.

If this had been just an ordinary domain controller in a child domain, I would have slept well last night~ What makes it worse is that, in my enterprise AD, the DC with the issue is the one holding the RID Master and PDC Emulator roles~!! Some of you will know these as two of the Flexible Single Master Operations (FSMO) roles.

A quick recap of the FSMO roles:

Schema Master (only 1 in the whole forest)
Domain Naming Master (only 1 in the whole forest)
Infrastructure Master (only 1 in each domain)
Relative ID (RID) Master (only 1 in each domain)
PDC Emulator (only 1 in each domain)
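
The same list as a small Python lookup, plus a helper that works out which roles are stranded when one DC fails. This is only a sketch: the DC names (`ROOT01`, `DC01`, `DC02`) are hypothetical placeholders, and the layout simply mirrors the situation described in this post.

```python
# Scope of each FSMO role, as in the recap above.
FSMO_ROLES = {
    "Schema Master": "forest",
    "Domain Naming Master": "forest",
    "Infrastructure Master": "domain",
    "RID Master": "domain",
    "PDC Emulator": "domain",
}


def roles_to_seize(role_holders, failed_dc):
    """Return the roles currently held by failed_dc, sorted by name.

    role_holders maps role name -> DC name (hypothetical names in this sketch).
    """
    unknown = set(role_holders) - set(FSMO_ROLES)
    if unknown:
        raise ValueError(f"not FSMO roles: {sorted(unknown)}")
    return sorted(role for role, dc in role_holders.items() if dc == failed_dc)


if __name__ == "__main__":
    holders = {
        "Schema Master": "ROOT01",          # forest-wide, assumed to sit in the forest root
        "Domain Naming Master": "ROOT01",   # forest-wide, assumed to sit in the forest root
        "Infrastructure Master": "DC02",
        "RID Master": "DC01",
        "PDC Emulator": "DC01",
    }
    print(roles_to_seize(holders, "DC01"))
```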

How to resolve the issue? (my preferred solution)

Summary of my environment and situation:

  • Single forest with 2 child domains running on AD 2003
  • One of the Domain Controllers in one of my child domains has AD Database Corruption
  • That Domain Controller holds the RID Master and PDC Emulator roles
  • There are 3 other Domain Controllers: one holds the Infrastructure Master role and the other 2 are the bridgehead servers

For me, I always play safe: I will not attempt to repair the AD database. Repair is a method you should never try unless the DC is the only domain controller in the forest (no child domain) and you have not been backing up your system state regularly!

For my situation, I will perform the following steps:

  1. Back up whatever data is on the DC that has the issue
  2. Record any special configuration, such as DNS forwarding or WINS
  3. Take the server offline (boot from CD to begin the OS rebuild)
  4. From one of the bridgehead servers, seize the RID Master and PDC Emulator roles using the ntdsutil.exe command. To do that, follow: Using Ntdsutil.exe to transfer or seize FSMO roles to a domain controller
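    • Why not the one with the Infrastructure Master role? It is not preferred, but if you have no choice, you can use it too. Just remember to make it a GC for the time being, and keep in mind that the usual guidance is not to leave the Infrastructure Master on a GC for long unless every DC in the domain is a GC.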
    • Why seize rather than transfer? You can try a transfer, but it should not work (for me, it did not). And think carefully: the AD database on that DC is already corrupted, and you want to transfer the roles and then demote it? Better not. Treat the server as if it had OS corruption instead. You should not risk the corrupted data spreading to the other DC(s).
    • Why not disable replication on the server and try a forced dcpromo? I had a deadline to meet, and since I was treating it as a case of OS corruption, I did not want to waste time on that. I just took the DC offline.
  5. At the same time, start rebuilding the server that had the issue.
  6. After confirming on the other DC(s) that the roles have been taken over, perform metadata cleanup. You need to manually remove all remaining entries for the corrupted DC from the AD database. To do that, follow: How to remove data in Active Directory after an unsuccessful domain controller demotion
    You should also remove any DNS entries related to that DC.
  7. Wait for replication, or force it, so that the rest of the DC(s) know about the changes (you can use Active Directory Sites and Services or the Repadmin command-line tool).
  8. Once you know the changes have replicated successfully to all existing DCs, the rebuilt server should have completed its rebuild and be ready to be promoted back to a domain controller.
  9. Before you run dcpromo, please make sure you have resolved the cause of the AD database corruption! (Hardware, or software such as anti-virus, or even a network firewall.)
  10. Once the server is promoted back to a domain controller, wait for it to stabilize before transferring the RID Master and PDC Emulator roles back to it. (The GUI method should work now.)
  11. Last of all, monitor the event logs of the other DCs to make sure replication is normal, and reapply whatever configuration the DC needs!
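
For steps 4, 6 and 7, here is a rough Python sketch that only builds the command lines, so they can be reviewed before anyone runs them on a DC. It executes nothing. The server name `BH01` is a hypothetical bridgehead server, and the exact commands are my assumption of a typical sequence; on Windows Server 2003, netdom and repadmin usually come from the Support Tools.

```python
import subprocess


def seize_roles_args(target_dc, roles=("RID master", "PDC")):
    """Build the ntdsutil arguments that seize the given roles onto target_dc.

    Each ntdsutil menu command is passed as one argument, in the same order
    you would type them interactively.
    """
    args = ["ntdsutil", "roles", "connections",
            f"connect to server {target_dc}", "quit"]
    args += [f"seize {role}" for role in roles]
    args += ["quit", "quit"]
    return args


def verification_commands():
    """Commands to confirm the role holders and push replication afterwards."""
    return [
        ["netdom", "query", "fsmo"],        # who holds each FSMO role now
        ["repadmin", "/syncall", "/AdeP"],  # force a sync across all partitions
        ["repadmin", "/showrepl"],          # check replication status
    ]


if __name__ == "__main__":
    # Print only; nothing is executed against any server.
    print(subprocess.list2cmdline(seize_roles_args("BH01")))
    for cmd in verification_commands():
        print(subprocess.list2cmdline(cmd))
```

Printing instead of executing is deliberate: seizing roles is not something to fire off from a script without a human reading the command first.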

So, for those who are still running AD on a single DC, it is time to plan your disaster recovery!! This can happen at any time!

Well, I am glad to be able to share this exercise with my new engineers, and I hope they will learn from it! 🙂

Good Day…
