One of my engineers asked me whether demoting and re-promoting the DC would solve the problem… I almost fainted, and I called him back immediately before he could try it.
Why not? And actually, even if he had tried, he would not have been able to do so… Why? 🙂
OK, let me start from the beginning.
How did we find out that the Domain Controller had AD database corruption?
In the Directory Service event log, you will see the following events appearing constantly…
NTDS ISAM Event ID: 467 Database Corruption Error
NTDS Replication Event ID: 1084 Replication Error
NTDS Replication Event ID: 2108 Replication Error
NTDS General Event ID: 1173 Internal Processing Warning
From observation, a user account deleted on another DC was not deleted on this particular DC, so the change was not replicating to it.
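If you export the Directory Service log and want a quick way to pick out the events listed above, a small Python sketch like this may help. It is only an illustration: the record layout (a dict with `source` and `event_id`) and the sample records are my own assumptions, not the format of any particular export tool.

```python
# Event IDs quoted in this post, keyed by (source, event ID).
CORRUPTION_EVENTS = {
    ("NTDS ISAM", 467): "database corruption error",
    ("NTDS Replication", 1084): "replication error",
    ("NTDS Replication", 2108): "replication error",
    ("NTDS General", 1173): "internal processing warning",
}


def flag_events(records):
    """Return only the records whose (source, event_id) is in the list above."""
    return [
        r for r in records
        if (r["source"], r["event_id"]) in CORRUPTION_EVENTS
    ]


# Hypothetical sample records, made up for illustration only.
SAMPLE = [
    {"source": "NTDS ISAM", "event_id": 467},
    {"source": "Some Other Source", "event_id": 1},
    {"source": "NTDS Replication", "event_id": 2108},
]

if __name__ == "__main__":
    for rec in flag_events(SAMPLE):
        key = (rec["source"], rec["event_id"])
        print(rec["source"], rec["event_id"], "-", CORRUPTION_EVENTS[key])
```

Seeing several of these together, and repeatedly, is what pointed us at database corruption rather than a one-off replication hiccup.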
If this were just another domain controller in a child domain, I would have slept well last night already. What makes it worse is that, in my enterprise AD, the DC with the issue is the one holding the RID Master and PDC Emulator roles!! Some of you will know these as part of the Flexible Single Master Operations (FSMO) roles.
Quick recap of the FSMO roles:
Schema Master (only 1 in the whole forest)
Domain Naming Master (only 1 in the whole forest)
Infrastructure Master (only 1 in each domain)
Relative ID (RID) Master (only 1 in each domain)
PDC Emulator (only 1 in each domain)
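To keep the list above straight, here is the same information as a tiny Python lookup. This is just a memory aid I am sketching here; the names `FSMO_ROLES` and `roles_by_scope` are made up for the example and do not come from any AD tool.

```python
# The five FSMO roles and the scope in which each one is unique.
FSMO_ROLES = {
    "Schema Master": "forest",
    "Domain Naming Master": "forest",
    "Infrastructure Master": "domain",
    "RID Master": "domain",
    "PDC Emulator": "domain",
}


def roles_by_scope(scope):
    """Return the role names that exist once per the given scope."""
    return [role for role, s in FSMO_ROLES.items() if s == scope]


if __name__ == "__main__":
    print("Per forest:", roles_by_scope("forest"))
    print("Per domain:", roles_by_scope("domain"))
```

Both roles on my corrupted DC (RID Master and PDC Emulator) are per-domain roles, so the problem is confined to that child domain's role holders.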
How to resolve the issue? (my preferred solution)
Summary of my environment and situation:
- Single Forest with 2 Child Domains running on AD 2003
- One of the Domain Controllers in one of my child domains has AD database corruption
- That Domain Controller holds the RID Master and PDC Emulator roles
- There are 3 other Domain Controllers: one holds the Infrastructure Master role and the other 2 are the bridgehead servers
For me, I always play safe: I will not attempt to repair the AD database. That is a method one should never try unless the DC is the only domain controller in the forest (no child domain) and you have never backed up your system state regularly!
For my situation, I will perform the following steps:
- Back up whatever data is on the DC that is having the issue
- Record any special configuration such as DNS forwarding or even WINS
- Take the server offline (boot from CD to begin your OS refresh)
- From one of the bridgehead servers, seize the RID Master and PDC Emulator roles using the ntdsutil.exe command. To do that, follow: "Using Ntdsutil.exe to transfer or seize FSMO roles to a domain controller"
- Why not the one with the Infrastructure Master? It is not preferred, but if you have no choice, you can use it too. Just remember to make it a GC first for the time being.
- Why seize rather than transfer? You can try a transfer, but it should not work (for me, it didn't). And think carefully: the AD database on that DC is already corrupted, and you want to transfer the roles and demote it? Better not. Treat the server as if it had OS corruption instead. One should not risk the corrupted data spreading to the other DC(s).
- Why not disable replication on the server and try to force dcpromo? I have a timeline to meet, and since I am treating this as a case of OS corruption, I will not waste time on that. I will just take the DC offline.
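For the seize step above, this is roughly what gets typed at the ntdsutil prompt on the healthy DC. I have written it as a small Python sketch that only builds the lines of the session as text; it does not run ntdsutil or touch AD. The server name `BRIDGEHEAD01` is a placeholder I made up, and the seize commands are the Windows Server 2003 forms for the two roles in my case. Please check them against the Ntdsutil article before using them.

```python
# Sketch only: produces the text of an ntdsutil role-seizure session.
# Nothing here executes ntdsutil or contacts a domain controller.

# ntdsutil "fsmo maintenance" commands for the two roles in this post.
SEIZE_COMMANDS = {
    "RID Master": "seize RID master",
    "PDC Emulator": "seize PDC",
}


def build_seize_session(target_dc, roles):
    """Return, in order, the lines to enter for seizing `roles` onto `target_dc`."""
    lines = [
        "ntdsutil",
        "roles",                             # enter fsmo maintenance
        "connections",                       # enter server connections
        f"connect to server {target_dc}",    # the DC that will receive the roles
        "quit",                              # back to fsmo maintenance
    ]
    for role in roles:
        lines.append(SEIZE_COMMANDS[role])
    lines += ["quit", "quit"]                # leave fsmo maintenance, then ntdsutil
    return lines


if __name__ == "__main__":
    # BRIDGEHEAD01 is a hypothetical name for one of the bridgehead servers.
    for line in build_seize_session("BRIDGEHEAD01", ["RID Master", "PDC Emulator"]):
        print(line)
```

Do this only after the corrupted DC is offline for good. Once its roles have been seized, that server should not come back onto the network with its old AD database.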
You should also remove any DNS entries related to that DC.
So, for those of you who are still running AD on a single DC, it is time to plan your Disaster Recovery!! This can happen anytime!
Well, I am glad to be able to share this exercise with my new engineers, and I hope they will learn from it! 🙂