I've gone through a whole range of scenarios/fixes for this problem, but I can't seem to fully pinpoint the cause. So I'm hoping someone here might have some insight.
Problem:
Clients (Windows 7, both physical and VDI) are able to log on, but fail to receive their Group Policy settings on 10 to 30% of logons.
DCs are Windows Server 2003 and 2008 R2 (two or three in each of three sites).
What I've done:
-Gone over the DNS setup and verified that everything is correct. No old entries, no missing entries (that I can see).
-Run dcdiag, which gives the A-OK; not a single error.
-Tried setting policies to wait for the network on startup, even the dial-up wait policy.
-Enabled Netlogon logging on both DCs and clients (VDI).
-Verified in perfmon that total queries received/sec and sent/sec are in line on the main DC (2008 R2, holding all FSMO roles).
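For anyone wanting to repeat the DNS check from a client, here is a minimal sketch that lists the standard DC locator SRV names and the matching nslookup commands. The function name and the site name "Primary-Site" are placeholders I made up for illustration; substitute your own domain and site.

```python
def dc_locator_srv_names(domain, site=None):
    """Return the usual DC locator SRV record names for an AD domain."""
    names = [
        f"_ldap._tcp.dc._msdcs.{domain}",      # any DC in the domain
        f"_ldap._tcp.pdc._msdcs.{domain}",     # the PDC emulator
        f"_kerberos._tcp.dc._msdcs.{domain}",  # any KDC in the domain
    ]
    if site:
        # Site-specific record, normally the first one a client tries.
        names.insert(0, f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    return names


if __name__ == "__main__":
    # "Primary-Site" is a hypothetical site name, not taken from my environment.
    for name in dc_locator_srv_names("domain.local", "Primary-Site"):
        print(f"nslookup -type=SRV {name}")
```

Running the printed commands from an affected client against each DNS server in turn should show whether any one server intermittently returns nothing.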
What I see:
-Netlogon on clients gives a whole bunch of these:
[CRITICAL] NetpDcGetNameIp: (Primary DC): No data returned from DnsQuery.
[CRITICAL] NetpDcGetName: (Primary DC): IP and Netbios are both done.
[MISC] DsGetDcName function returns 1355: Dom:(Primary DC).domain.local Acct:(null) Flags: LDAPONLY RET_DNS
[SITE] DsrGetSiteName: Returning site name '(Primary site)' from local cache.
[MISC] DsGetDcName function called: Dom:domain.local Acct:(null) Flags: LDAPONLY RET_DNS
[MISC] NetpDcInitializeContext: DSGETDC_VALID_FLAGS is c01ffff1
[MISC] NetpDcGetName: domain.local using cached information
[MISC] DsGetDcName function returns 0: Dom:domain.local Acct:(null) Flags: LDAPONLY RET_DNS
In particular, "using cached information" is repeated a lot. I'm interpreting this as the client falling back on cached info because it isn't getting anything back from DNS.
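To get a rough count instead of eyeballing the log, something like the sketch below could tally the DsGetDcName return codes per Dom: value and count the cache hits. It assumes the line format shown in the excerpts above (real log lines also carry a timestamp prefix, which the regex ignores); the function name is my own.

```python
import re
from collections import Counter

# Matches e.g. "DsGetDcName function returns 1355: Dom:domain.local Acct:(null)"
RETURN_RE = re.compile(r"DsGetDcName function returns (\d+): Dom:(.*?) Acct:")


def summarize_netlogon(lines):
    """Count (Dom value, return code) pairs and 'using cached information' hits."""
    results = Counter()
    cached = 0
    for line in lines:
        match = RETURN_RE.search(line)
        if match:
            results[(match.group(2), int(match.group(1)))] += 1
        elif "using cached information" in line:
            cached += 1
    return results, cached


if __name__ == "__main__":
    # Three lines copied from the client log excerpt above.
    sample = [
        "[MISC] DsGetDcName function returns 1355: Dom:(Primary DC).domain.local Acct:(null) Flags: LDAPONLY RET_DNS",
        "[MISC] NetpDcGetName: domain.local using cached information",
        "[MISC] DsGetDcName function returns 0: Dom:domain.local Acct:(null) Flags: LDAPONLY RET_DNS",
    ]
    results, cached = summarize_netlogon(sample)
    for (dom, code), count in sorted(results.items()):
        # 1355 is ERROR_NO_SUCH_DOMAIN; 0 is success.
        print(f"{count:5d}  rc={code:<5d} Dom:{dom}")
    print(f"cached lookups: {cached}")
```

Grouping by the Dom: value makes it easy to see which names the 1355 results are being returned for.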
-The Netlogon log on the DC holds a huge number of these:
[MAILSLOT] (domain): Ping response 'Sam Logon Response Ex' (null) to \\(file server) Site: (Primary site) on UDP LDAP
[MAILSLOT] Received ping from (file server).domain.local. (null) on UDP LDAP
These:
[MISC] NetpDcInitializeContext: DSGETDC_VALID_FLAGS is c01ffff1
And these:
[MAILSLOT] Received ping from (Primary DC) (secondary dc).domain.local (null) on <Local>
[CRITICAL] Ping from (Primary DC) for domain (secondary dc).domain.local (null) for (null) on <Local> is invalid since we don't host the named domain.
[CRITICAL] NetpDcGetNameIp: (secondary dc).domain.local: No data returned from DnsQuery.
[MISC] (Domain): DsGetDcName function returns 1355: Dom:(Primary DC) Acct:(null) Flags: WRITABLE LDAPONLY RET_DNS
This is very obvious in the VDI environment, as profiles aren't roaming and users aren't getting their desktops and files.
Beyond it being a possible performance problem (this is all virtual/VMware based), I'm stumped at this point.