Hi all,
I've been busy in a lab environment with an LDAP_MATCHING_RULE_IN_CHAIN query, validating its performance on a Windows 2012 R2 domain controller. The query comes from IBM's PureApps Administration console and is pretty much "hardcoded": it uses the root of the directory as base DN to check which (nested) members a specified group has.
The PureApp guys were complaining that the query took too long to finish in our production environment: anywhere from 75 seconds up to time-outs. We, in turn, were complaining that this query was consuming all CPU, slowing down the LDAP service for other clients.
So I've started digging :) - details in the detail section...
Our conclusion so far:
In our AD DS setup, the query took on average 51 seconds to complete.
In our AD LDS setup, the query took on average 4 seconds to complete! Almost 13x faster!
Same ESX, same amount of vCPUs, same amount of memory, same user, group and membership data (the AD LDS is synchronized from AD DS via FIM).
Does anyone have a clue about this? Sure, a domain controller is not an AD LDS server and has far more tasks & complexity. But that big a difference? My colleagues and I are surprised to see this! We can reproduce it outside our lab in other environments (production, acceptance, test, ...).
Kind regards,
David
Details
All about these queries:
https://msdn.microsoft.com/en-us/library/aa746475(v=vs.85).aspx
BaseDn: DC=contoso,DC=com
Scope: Subtree
Filter:
(&(objectCategory=person)(memberOf:1.2.840.113556.1.4.1941:=CN=Admins,OU=Groups,DC=contoso,DC=com))
This enumerates the users of the Admins group, including nested members. There are 3 users in that group, all via direct membership (not nested). The AD DS or AD LDS has 40,000 users.
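For anyone who wants to reproduce the filter from a script, here is a minimal sketch that only builds the filter string shown above. It uses no LDAP library; the function names `escape_filter_value` and `nested_members_filter` are made up for illustration, and the escaping follows the general LDAP filter rules (RFC 4515), not anything specific to the PureApps console.

```python
# Sketch only: builds the filter string from the post, no LDAP connection involved.
# The function names are hypothetical helpers, not part of any library.

# OID of LDAP_MATCHING_RULE_IN_CHAIN, as used in the filter above.
LDAP_MATCHING_RULE_IN_CHAIN = "1.2.840.113556.1.4.1941"


def escape_filter_value(value: str) -> str:
    """Escape the characters that are special inside an LDAP filter value."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def nested_members_filter(group_dn: str) -> str:
    """Return a filter matching persons that are direct or nested members of group_dn."""
    return "(&(objectCategory=person)(memberOf:{oid}:={dn}))".format(
        oid=LDAP_MATCHING_RULE_IN_CHAIN,
        dn=escape_filter_value(group_dn),
    )


if __name__ == "__main__":
    print(nested_members_filter("CN=Admins,OU=Groups,DC=contoso,DC=com"))
```

The resulting string can then be passed, together with the base DN and subtree scope listed above, to whatever LDAP client is used for testing.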
Chapter 1: CPU! Add more CPU!
Using perfmon on a dedicated ESX host (32 vCPUs, PE M620 E5-269V2 2.7GHZ 12C (DELL) XEON E5, SAN attached) with only my VM running on it, with 8 GB RAM and a reboot between each test:
W2K12 R2, fully patched, with x vCPUs (margin of error: 2 seconds):
1 vCPU: 62 seconds
2 vCPU: 58 seconds
3 vCPU: 57 seconds
4 vCPU: 48 seconds
6 vCPU: 51 seconds
8 vCPU: 50 seconds
16 vCPU: 59 seconds
Average: 51 sec
Conclusion: a single query is actually handled by one CPU. Adding CPUs does not change the speed on an otherwise unloaded machine. The overall impact on total CPU usage is of course lower with every extra CPU.
Chapter 2: There's caching!
In the above test, in scenario 2, we repeated the same query. When we did this within 1-2 minutes of the previous query, we could clearly see an improvement of about 35% in response time, resulting in an average of 33 seconds.
Conclusion: the caching helps, but is of little practical use.
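The post does not say exactly how the individual timings were taken, so the following is only a sketch of one way to separate the first (cold) run from the repeated (warm) runs. `run_query` is a hypothetical callable that performs the LDAP search with whatever client is available; nothing here talks to a directory by itself.

```python
import time
from statistics import mean


def time_runs(run_query, repeats=3, pause_seconds=0.0):
    """Call run_query `repeats` times and return the duration of each call in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: executes the LDAP search
        durations.append(time.perf_counter() - start)
        time.sleep(pause_seconds)  # optional gap between repeated queries
    return durations


def cold_and_warm(durations):
    """Split timings into the first (cold) run and the mean of the later (warm) runs."""
    if len(durations) < 2:
        raise ValueError("need at least one cold and one warm run")
    return durations[0], mean(durations[1:])


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a directory server.
    cold, warm = cold_and_warm(time_runs(lambda: sum(range(100000)), repeats=3))
    print(f"cold: {cold:.4f}s, warm average: {warm:.4f}s")
```

Keeping the pause between runs within the 1-2 minute window mentioned above would be needed to observe the caching effect at all.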
Chapter 3: our production is Windows 2008 R2, the lab is Windows 2012 R2!
Doing tests on the same hardware in the lab with a W2K8 R2 DC, we see these times:
2 vCPU: 75 seconds
(similar for an increasing number of vCPUs, as in chapter 1)
Let's see whether there's caching: 71 seconds. That's about a 5% improvement.
Conclusion: Windows 2012 R2 is more efficient! :D
Chapter 4: well we also have AD LDS with the same data, let's try that!
Our AD LDS is a setup where we synchronize 2 AD DS environments to AD LDS with FIM, using a userproxyfull user class to make LDAP authentication (and authorization) nicely transparent. So the AD LDS is even 10,000 "users" bigger than the AD DS we've been testing against. Many user attributes such as manager, telephone, name, location and company are included.
We see these times:
2 vCPU: 4 seconds average (fastest was 3, slowest 5)
Surprise! Where did this come from? That's almost 13 times faster!
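The ratios quoted in the chapters above can be checked with a few lines of arithmetic. This is just a sketch using the averages reported in this post (51, 33, 75, 71 and 4 seconds); the helper names are made up for illustration.

```python
def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the fast run is compared to the slow run."""
    return slow_seconds / fast_seconds


def improvement_percent(before_seconds: float, after_seconds: float) -> float:
    """Reduction in response time, as a percentage of the original time."""
    return (before_seconds - after_seconds) / before_seconds * 100.0


if __name__ == "__main__":
    # Averages taken from the post.
    print(f"AD DS vs AD LDS speedup: {speedup(51, 4):.2f}x")
    print(f"W2K12 R2 caching gain:   {improvement_percent(51, 33):.2f}%")
    print(f"W2K8 R2 caching gain:    {improvement_percent(75, 71):.2f}%")
```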
Chapter 5: let's index "memberof"!
Why didn't Microsoft do that by default in AD DS? Hmm, let's try anyway, knowing that indexing is only efficient if the data differs enough between users, and with group memberships, we know, it doesn't differ much.
Result: we did not see an improvement in any test.
Chapter 6: let's go social :)