Hi all,
An interesting issue: we have 2 DCs, a primary DC (FSMO role holder/DNS server) and a secondary DC/DNS server, that replicate with each other. In the past few days, replication on DC2 has suddenly stopped.
Basic tests performed through AD Sites & Services:
DC1 > DC2 replication PASS
DC2 > DC1 replication FAIL
DNS has been checked and I believe it is causing the problem on DC2.
In the Event Log on DC2, the DNS Server Service is currently smashing the event log every second of the day:
EVENT 4015
The DNS server has encountered a critical error from the Active Directory. Check that the Active Directory is functioning properly. The extended error debug information (which may be empty) is "%1". The event data contains the error.
We can restart DC2, and the DNS Server service then runs fine for approximately 10 minutes. Replication DOES occur during that time (new users show up in AD and passwords match DC1), but the service eventually bombs out again, generating THOUSANDS of Event ID 4015 entries.
On DC2, repadmin /syncall returns:
CALLBACK MESSAGE: Error contacting server 9694a21f-dc86-4dde-87bc-5595c3d5967e._msdcs.DC.local (network error): -2146893022 (0x80090322):
The target principal name is incorrect.
CALLBACK MESSAGE: SyncAll Finished.
SyncAll reported the following errors:
Error contacting server 9694a21f-dc86-4dde-87bc-5595c3d5967e._msdcs.DC.local (network error): -2146893022 (0x80090322):
The target principal name is incorrect.
On DC1, repadmin /syncall runs successfully.
Currently on DC1, all DCDIAG /test:DNS tests pass.
Here is the DCDIAG /test:DNS output from DC2:
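For anyone cross-checking the error code in the DC2 repadmin output: the negative number and the hex value are the same 32-bit HRESULT, and 0x80090322 is the SSPI error SEC_E_WRONG_PRINCIPAL, which matches the "target principal name is incorrect" text. A minimal Python sketch of the conversion (the function names are only for illustration):

```python
def hresult_to_hex(code: int) -> str:
    """Render a signed 32-bit HRESULT in the usual unsigned hex form."""
    return f"0x{code & 0xFFFFFFFF:08X}"


def hresult_fields(code: int) -> dict:
    """Split an HRESULT into its severity, facility, and error-code fields."""
    value = code & 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),        # top bit set means failure
        "facility": (value >> 16) & 0x1FFF,  # facility 9 is FACILITY_SSPI
        "code": value & 0xFFFF,              # low 16 bits are the error code
    }


print(hresult_to_hex(-2146893022))
print(hresult_fields(-2146893022))
```

The facility value is what points at Kerberos/SSPI authentication between the DCs rather than at a plain network failure.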
Directory Server Diagnosis
Performing initial setup:
Trying to find home server...
Home Server = DC2
* Identified AD Forest.
[DC1] LDAP bind failed with error 8341,
A directory service error has occurred..
Got error while checking if the DC is using FRS or DFSR. Error:
A directory service error has occurred.The VerifyReferences, FrsEvent and
DfsrEvent tests might fail because of this error.
Done gathering initial info.
Doing initial required tests
Testing server: Cloud\DC2
Starting test: Connectivity
......................... DC2 passed test Connectivity
Testing server: Cloud\DC1
Starting test: Connectivity
The GUID based DNS Name resolved to several IPs
(fd4b:49f1:f07e::1, 172.16.1.1), but not all were pingable.
Replication and other operations may fail if a non-pingable IP is
chosen. The first pingable IP is 172.16.1.1.
Got error while checking LDAP and RPC connectivity. Please check your
firewall settings.
......................... DC1 failed test Connectivity
Doing primary tests
Testing server: Cloud\DC2
Testing server: Cloud\DC1
Starting test: DNS
DNS Tests are running and not hung. Please wait a few minutes...
Starting test: DNS
......................... DC1 failed test DNS
......................... DC2 passed test DNS
Running partition tests on : ForestDnsZones
Running partition tests on : DomainDnsZones
Running partition tests on : Schema
Running partition tests on : Configuration
Running partition tests on : DC
Running enterprise tests on : DC.local
Starting test: DNS
Test results for domain controllers:
DC: DC1.DC.local
Domain: DC.local
TEST: Authentication (Auth)
Error: Authentication failed with specified credentials
TEST: Basic (Basc)
Error: No LDAP connectivity
Error: No WMI connectivity
No host records (A or AAAA) were found for this DC
Summary of test results for DNS servers used by the above domain
controllers:
DNS server: fd4b:49f1:f07e::1 (DC1.DC.local.)
1 test failure on this DNS server
PTR record query for the 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa failed on the DNS server fd4b:49f1:f07e::1
Summary of DNS test results:
Auth Basc Forw Del Dyn RReg Ext
_________________________________________________________________
Domain: DC.local
DC1 FAIL FAIL n/a n/a n/a n/a n/a
......................... DC.local failed test DNS
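One detail in the output above: the PTR name that failed appears to be the reverse name of the IPv6 loopback ::1 rather than of fd4b:49f1:f07e::1. A quick way to compare, sketched with Python's standard ipaddress module (nothing here talks to DNS; it only builds the names, and QUERIED is simply pasted from the dcdiag output):

```python
import ipaddress

# PTR name copied verbatim from the dcdiag output above.
QUERIED = "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa"


def reverse_name(address: str) -> str:
    """Return the in-addr.arpa / ip6.arpa name a PTR lookup would use."""
    return ipaddress.ip_address(address).reverse_pointer


for candidate in ("::1", "fd4b:49f1:f07e::1"):
    name = reverse_name(candidate)
    print(candidate, "->", name, "| matches queried name:", name == QUERIED)
```

If the loopback name is the one that matches, that PTR failure says nothing about the records for the fd4b:49f1:f07e::1 address itself.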
Right, so I know there is an IPv6 configuration error on DC1/DC2. It's been like that for 6 months because we haven't fully set up the records, but replication had been working fine and only stopped in the past 2 days.
Everything works fine and should replicate on IPv4.
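Since dcdiag warns that the GUID-based name resolves to both an IPv6 and an IPv4 address and only one of them answers, one way to confirm which addresses actually accept LDAP connections is a small TCP check. This is only a sketch: the host name in the commented usage is taken from the repadmin output, port 389 is the standard LDAP port, and a plain TCP connect is not the same test dcdiag performs.

```python
import socket


def resolve_all(host: str, port: int) -> list:
    """Return every (family, sockaddr) pair the resolver gives for host:port."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _, _, _, sockaddr in infos]


def accepts_tcp(family: int, sockaddr: tuple, timeout: float = 3.0) -> bool:
    """Try a plain TCP connect to one resolved address."""
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return True
        except OSError:  # refused, unreachable, or timed out
            return False


def check_host(host: str, port: int = 389) -> dict:
    """Map each resolved address to whether it accepted a TCP connection."""
    return {sockaddr[0]: accepts_tcp(family, sockaddr)
            for family, sockaddr in resolve_all(host, port)}


# Example (name assumed from the repadmin output; run on DC2):
# print(check_host("9694a21f-dc86-4dde-87bc-5595c3d5967e._msdcs.DC.local"))
```

If the IPv6 address shows up as unreachable while 172.16.1.1 connects, that would at least be consistent with the dcdiag Connectivity warning.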
We ran Windows Updates around the time the FIRST Event 4015 appeared on DC2, and about 47 updates are still pending for Server 2012. Some updates failed at about the time the first 4015 was logged, but no update has failed since that day. For the past 2 days the errors have been constant and no replication has occurred. As mentioned above, we can reboot DC2 and get replication working for approx. 10 minutes, then the DNS Server service trips out according to Event Viewer and replication from DC2 > DC1 stops working.
We've run netdom resetpwd on both DC1 and DC2 and restarted both DCs. Nothing will stop Event 4015 on DC2!!
It seems like a textbook DNS-related issue on DC2, but so far we have had no luck repairing it.
Any ideas or suggestions would be appreciated.