I was working with a client recently who had various Active Directory replication issues. They had a multi-domain forest at the Windows Server 2008 R2 functional level. The two fundamental errors were:
- DNS zones duplicated between the domains (including the zones used by the subdomains)
- Network firewalls between the domain controllers
One big call out here: if you’re considering placing firewalls between domain controllers with the aim of restricting traffic between them… don’t! It is not supported by Microsoft (hence there are no longer any articles covering the ports needed), and there is little sense in mistrusting Domain Controllers and the people who work on them.
By the way, you should ensure that members of Domain Admins, Schema Admins and Enterprise Admins can ONLY log on to Domain Controllers (and other highly secured servers), that no one else is allowed to log on to those servers, and that GPOs prevent those privileged users from logging on to any workstation or ordinary member server.
Anyway, I digress. After the network firewall rules were adjusted and the DCs started replicating again, there were the expected issues with lingering objects, which are relatively easy to resolve. There were also some stale Domain Controller records for servers that had been decommissioned, again easy to fix. The next and bigger issue was “It has been too long since this machine last replicated with the named source machine. The time between replications with this source has exceeded the tombstone lifetime. Replication has been stopped with this source.”
The normal response to this is to demote and re-promote the Domain Controller. The complication here was that the subdomain had three DCs: two of them had the above error, and the third had been introduced more recently and didn’t exist anywhere else in the Configuration partition. This meant:
- The subdomain was totally isolated in terms of replication
- The tombstone issue only affected the subdomain’s own domain partition (i.e. GCs in the rest of the forest could only replicate their copy of the subdomain with each other)
- Rebuilding the entire subdomain was not a great option
We decided to increase the tombstoneLifetime by going into ADSIEdit, connecting to the Configuration partition, drilling down into “Services”, then “Windows NT”, and opening the properties of “Directory Service”. We changed the tombstoneLifetime value from 180 to 720 days and then rebooted the DCs on both sides of the replication relationship. Lo and behold, the subdomain was accepted for replication and the problem resolved itself.
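For anyone who wants to find the same object from a script or an LDAP browser rather than clicking through ADSIEdit, the path above corresponds to a distinguished name under the forest root's Configuration partition. Here is a minimal Python sketch that just builds that DN as a string. The function name and the forest name `corp.example.com` are made up for illustration, and the code does not connect to anything.

```python
def directory_service_dn(forest_root_dns: str) -> str:
    """Build the DN of the object that holds the tombstoneLifetime attribute.

    forest_root_dns is the DNS name of the forest root domain
    (hypothetical example: "corp.example.com").
    """
    # The Configuration partition sits directly under the forest root domain DN,
    # so turn each DNS label into a DC= component.
    domain_components = ",".join(
        f"DC={label}" for label in forest_root_dns.split(".")
    )
    # Same path as in ADSIEdit: Configuration > Services > Windows NT > Directory Service
    return (
        "CN=Directory Service,CN=Windows NT,CN=Services,"
        f"CN=Configuration,{domain_components}"
    )


# Hypothetical forest root, for illustration only.
print(directory_service_dn("corp.example.com"))
```

The attribute to look at on that object is `tombstoneLifetime`, and the value is in days.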
We then changed the tombstoneLifetime back to 180 days, the default. So far all has been good. It was a bit of a punt, but given that we were at the stage of potentially needing to rebuild an entire live subdomain, we felt it was worth a try.
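To make the reasoning behind the temporary increase a bit more concrete, here is a deliberately simplified Python model of the check. It is not how a Domain Controller actually implements it. The function name, the dates and the 400-day gap are assumptions for illustration only; we never measured the client's exact gap.

```python
from datetime import datetime, timedelta


def replication_allowed(last_success: datetime, now: datetime,
                        tombstone_lifetime_days: int) -> bool:
    """Simplified model: accept the partner only if the time since the last
    successful replication is within the tombstone lifetime."""
    return now - last_success <= timedelta(days=tombstone_lifetime_days)


# Hypothetical timeline: the partners last replicated 400 days ago.
now = datetime(2020, 1, 1)
last_success = now - timedelta(days=400)

print(replication_allowed(last_success, now, 180))  # False: gap exceeds 180 days
print(replication_allowed(last_success, now, 720))  # True: gap is within 720 days

# Once replication has succeeded, the "last success" time is fresh,
# so dropping back to 180 days no longer trips the check.
print(replication_allowed(now, now, 180))  # True
```

The model only covers the time check. It says nothing about lingering objects, which is the reason the limit exists in the first place.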
Now, if you have the same error message, please be very careful that you understand the implications of what you’re doing. If you have any doubt, please get in touch BEFORE you take this step; we’re happy to have a quick chat to help you decide.