OK. To start off, I’d like to point out that this is WAY less complicated than it sounds. The actual change takes about 5 minutes. It was the research and planning that took ALL of the time. Anyway, here goes!
THE PROBLEM
We recently identified a problem with our Active Directory environment through System Center Operations Manager 2007 R2. A while back we had to forcibly demote our primary domain controller, the one that handled DNS and DHCP and was the fSMORoleOwner. As a result, when the new DCs were put in, there were still traces of the old configuration.
This was the error that allowed us to identify the problem.
AD Replication Partner Op Master Consistency : The script 'AD Replication Partner Op Master Consistency' failed to execute the following LDAP query: '<LDAP://DC3.MYDOMAIN.com/CN=Configuration,DC=MYDOMAIN,DC=com>;(&(objectClass=crossRefContainer)(fSMORoleOwner=*));fSMORoleOwner;Subtree'. The error returned was 'The server is not operational.' (0x80040E37)
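If the query string in that error looks cryptic, it is written in the four-part ADO LDAP dialect: search base, filter, attributes to return, and scope, separated by semicolons. Here is a minimal Python sketch that splits it apart so each piece is easier to read. The function and class names are my own (hypothetical), and it assumes the filter itself contains no semicolons.

```python
from typing import List, NamedTuple


class AdoLdapQuery(NamedTuple):
    base: str              # ADsPath of the search base, angle brackets removed
    ldap_filter: str       # LDAP search filter
    attributes: List[str]  # attribute names the query asks for
    scope: str             # Base, OneLevel, or Subtree


def parse_ado_ldap_query(query: str) -> AdoLdapQuery:
    """Split '<base>;filter;attributes;scope' into its four parts.

    Assumption: no semicolons inside the filter, which holds for the
    query in the OpsMgr alert but not for every possible filter.
    """
    parts = [part.strip() for part in query.split(";")]
    if len(parts) != 4:
        raise ValueError("expected 4 semicolon-separated parts, got %d" % len(parts))
    base, ldap_filter, attributes, scope = parts
    return AdoLdapQuery(
        base=base.strip("<>"),
        ldap_filter=ldap_filter,
        attributes=[name.strip() for name in attributes.split(",")],
        scope=scope,
    )


if __name__ == "__main__":
    alert_query = (
        "<LDAP://DC3.MYDOMAIN.com/CN=Configuration,DC=MYDOMAIN,DC=com>;"
        "(&(objectClass=crossRefContainer)(fSMORoleOwner=*));fSMORoleOwner;Subtree"
    )
    for field, value in parse_ado_ldap_query(alert_query)._asdict().items():
        print(field, "=", value)
```

Read that way, the script was asking DC3 for the fSMORoleOwner attribute on any crossRefContainer object under the Configuration partition.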
I found that the fSMORoleOwner values in ForestDNSZones and DomainDNSZones were different from each other, and neither was correct. Both should have been set to DC1, but ForestDNSZones showed DC2 (a current DC) and DomainDNSZones showed OLD-DC1.
CHECKING YOUR SYSTEM
You’ll need the following to check this out. Add these to ADSI Edit…
1. Configuration
2. DC=DomainDNSZones,DC=MyDOMAIN,DC=COM
3. DC=ForestDNSZones,DC=MyDOMAIN,DC=COM
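If you'd rather not type those paths by hand, they can be derived from the DNS name of the domain. This is just a sketch in Python with function names I made up, and it assumes a single-domain forest like ours (in a multi-domain forest, Configuration and ForestDNSZones hang off the forest root domain instead).

```python
from typing import List


def domain_to_dn(dns_domain: str) -> str:
    """Turn 'mydomain.com' into 'DC=mydomain,DC=com'."""
    return ",".join("DC=" + label for label in dns_domain.split("."))


def adsi_edit_targets(dns_domain: str) -> List[str]:
    """List the three naming contexts to connect to in ADSI Edit.

    Assumption: single-domain forest, so all three share the same root DN.
    """
    root = domain_to_dn(dns_domain)
    return [
        "CN=Configuration," + root,
        "DC=DomainDNSZones," + root,
        "DC=ForestDNSZones," + root,
    ]


if __name__ == "__main__":
    for dn in adsi_edit_targets("MyDOMAIN.COM"):
        print(dn)
```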
After scouring the forums, I found that the fix was to copy the distinguishedName attribute of the correct fSMORoleOwner from under Configuration in ADSI Edit. I took that information and updated the fSMORoleOwner attribute under DomainDNSZones and ForestDNSZones.
THE GOTCHAS
A couple of gotchas along the way…
1. This must be done on the actual infrastructure master.
2. Formatting. This really is common sense, but it got me. I read in several places to copy the distinguished name. However, that value is in a different format than the fSMORoleOwner property. Be sure to grab a copy of the existing fSMORoleOwner value first, for two reasons:
a. To copy the proper formatting.
b. To revert if for some weird reason you need to (unlikely).
3. For some reason this kept throwing me off. CN=Infrastructure is at the root of the DomainDNSZones and ForestDNSZones partitions. If you just expanded that container, you wouldn’t see it.
distinguishedName example
CN=DC1,CN=Servers,CN=Site1,CN=Sites,CN=Configuration,DC=MyDOMAIN,DC=com
fSMORoleOwner example
CN=NTDS Settings,CN=DC1,CN=Servers,CN=Site1,CN=Sites,CN=Configuration,DC=MyDOMAIN,DC=com
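The only difference between the two formats is the leading CN=NTDS Settings component. Here is a small Python sketch of that conversion. The helper names are hypothetical, it also trims stray spaces that can sneak in when copying and pasting, and it assumes none of the name components contain escaped commas.

```python
NTDS_PREFIX = "CN=NTDS Settings,"


def normalize_dn(dn: str) -> str:
    """Trim whitespace around each 'attr=value' component of a DN.

    Assumption: no escaped commas inside the values, so a plain split works.
    """
    cleaned = []
    for rdn in dn.split(","):
        attr, _, value = rdn.partition("=")
        cleaned.append(attr.strip() + "=" + value.strip())
    return ",".join(cleaned)


def to_fsmo_role_owner(server_dn: str) -> str:
    """Build an fSMORoleOwner-style value from a server's distinguishedName."""
    dn = normalize_dn(server_dn)
    # Leave the value alone if it already points at the NTDS Settings object.
    if dn.upper().startswith(NTDS_PREFIX.upper()):
        return dn
    return NTDS_PREFIX + dn


if __name__ == "__main__":
    print(to_fsmo_role_owner(
        "CN=DC1,CN=Servers,CN=Site1,CN=Sites,CN=Configuration,DC=MyDOMAIN,DC=com"
    ))
```

Comparing the result against the value you saved from the existing fSMORoleOwner attribute is still the safest check before pasting anything into ADSI Edit.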
So back to the forums for one last double check.
1. Some people pointed to Anti-virus issues. In our environment this was NOT the issue. OpsMgr was right on the money.
2. One thing that I had questioned but wasn’t necessary for us was a metadata cleanup. I would say it is definitely something anyone should do in this situation. However, we found that the metadata was actually removed properly. I point this out so people don’t assume the cleanup will fix the issue. It’s just good advice and best practice.
3. There is a script, “fixfsmo.vbs”, that will do the same thing. I didn’t use it because it was just as easy to make the update manually. The script is pretty basic: it checks the current value, checks what it should be, and updates it if they don’t match.
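To make that check-compare-update idea concrete, here is a rough Python sketch of the logic. To be clear, this is not the fixfsmo.vbs script and it does not touch Active Directory at all; it only works out which entries would need changing from values you have already read out of ADSI Edit. The function name is made up, and the DC2 and OLD-DC1 values in the example assume those servers sat in the same site as DC1.

```python
from typing import Dict, Tuple


def plan_fsmo_fixes(current: Dict[str, str], expected_owner: str) -> Dict[str, Tuple[str, str]]:
    """Return {infrastructure_dn: (old_owner, new_owner)} for mismatched entries.

    `current` maps each CN=Infrastructure object to the fSMORoleOwner value
    read from it. DNs are compared case-insensitively.
    """
    fixes = {}
    for infrastructure_dn, owner in current.items():
        if owner.lower() != expected_owner.lower():
            fixes[infrastructure_dn] = (owner, expected_owner)
    return fixes


def owner_value(server: str) -> str:
    """Hypothetical helper: fSMORoleOwner value for a server in Site1."""
    return ("CN=NTDS Settings,CN=" + server +
            ",CN=Servers,CN=Site1,CN=Sites,CN=Configuration,DC=MyDOMAIN,DC=com")


if __name__ == "__main__":
    current = {
        "CN=Infrastructure,DC=ForestDNSZones,DC=MyDOMAIN,DC=com": owner_value("DC2"),
        "CN=Infrastructure,DC=DomainDNSZones,DC=MyDOMAIN,DC=com": owner_value("OLD-DC1"),
    }
    for dn, (old, new) in plan_fsmo_fixes(current, owner_value("DC1")).items():
        print(dn)
        print("  was:       ", old)
        print("  should be: ", new)
```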
CHECKLIST
Phase 1 - fSMO cleanup
1. Perform an AD backup (system state and system drive) with NTBACKUP.exe.
2. Turn off an extra domain controller (if you have one available).
a. This was a great idea my supervisor had. If anything went wrong, we could make it look like this DC had the latest version of the settings and replicate that back to the others.
3. Update fSMORoleOwner in ForestDNSZones and DomainDNSZones.
4. TEST!
a. Confirm that logging in works on all DCs, over VPN, and so on.
b. If there are no issues, power on the extra DC.
5. Validate replication using repadmin /replsum (I had to force replication to the DC that was powered off).
6. Confirm that the OpsMgr errors go away.
Phase 2 - DC metadata cleanup
- Repeat backup process
- Run cleanup utility
Scripted Method - pretty cool
Manual Method from command prompt
Ntdsutil.exe
Ntdsutil.exe: metadata cleanup
metadata cleanup: remove selected server <server name>
- Repeat testing
RESOURCES
Here were some other valuable resources I referenced along the way
Hopefully someone finds this helpful. I know it was painful for me to assemble all of this information and to plan and test the change. In the end there were no reboots required, no downtime, nothing like that. But there’s something to be said for the peace of mind that comes from having all of the necessary information.
-Shep