In-Depth

Get Active Directory Replication Right!

There’s a method to the madness of Active Directory replication, but many of the concepts can be tough to decipher. The following four tales demonstrate the range of problems you can encounter with this little-understood aspect of AD.

Scene 1: Deny the Defaults

A company with a hub and spoke network topology was unable to figure out why they had so many AD replication connections across their WAN links. They were tapping into one of AD’s most important features—the ability to use Sites to control replication and authentication traffic. Their AD design called for a single domain spanning all their WAN links, so they really needed to make sure replication was as tuned as possible. I’d seen the same problem they were having on numerous occasions and was confident of the solution.

The first thing I told them was they hadn’t done anything wrong with their configuration. Instead, they were seeing the results of a particular default AD setting. In order to minimize the amount of replication latency in AD, all Site Links are bridged by default. This means a domain controller from any site will try to create replication connections to DCs in any other site that has a DC from the same domain. In addition to that connection, there will also be a replication connection between every Site in order to replicate AD Schema and Configuration information.

Figure 1 shows that there were connection objects between all of the DCs in all the Sites. This is because all four DCs are from the same domain and, by default, all the Site Links are bridged. Changing the default settings involves the following steps:

  1. Open AD Sites and Services.
  2. Expand InterSite-Transports.
  3. Right click on IP and select Properties.
  4. On the General tab uncheck the box that says “Bridge all Site Links.”

After you uncheck this box and the Knowledge Consistency Checker (KCC) has run on every DC in the topology, the number of replication connections will drop. The KCC runs every 15 minutes by default, but you can trigger it manually through AD Sites and Services by highlighting the NTDS Settings object under each DC, clicking Action and selecting “Check Replication Topology.” Figure 2 shows how the replication connections changed after turning off the “Bridge all Site Links” setting.
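To get a feel for why the default produces so many connections, here is a rough Python sketch. It is not the KCC's algorithm, only a simplified model that counts which pairs of sites are eligible for a direct connection with and without bridging. The site names and links are hypothetical, chosen to resemble a four-site hub-and-spoke layout.

```python
from itertools import combinations


def eligible_site_pairs(sites, site_links, bridge_all):
    """Return the site pairs that may be joined directly by a connection object.

    Simplified model (an assumption, not the real KCC logic): with bridging
    on, site links are treated as transitive, so any two sites reachable
    through a chain of links are eligible. With bridging off, only sites
    that share a site link are eligible.
    """
    direct = {frozenset(link) for link in site_links}
    if not bridge_all:
        return direct

    # Build an adjacency map so we can walk chains of site links.
    neighbors = {site: set() for site in sites}
    for a, b in site_links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            for nxt in neighbors[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {
        frozenset((a, b))
        for a, b in combinations(sites, 2)
        if b in reachable(a)
    }


# Hypothetical hub-and-spoke layout: every branch links only to the hub.
sites = ["Hub", "Branch1", "Branch2", "Branch3"]
links = [("Hub", "Branch1"), ("Hub", "Branch2"), ("Hub", "Branch3")]

print(len(eligible_site_pairs(sites, links, bridge_all=True)))   # 6
print(len(eligible_site_pairs(sites, links, bridge_all=False)))  # 3
```

In this model, bridging makes every branch-to-branch pair eligible, which is the full mesh seen in Figure 1. Turning it off leaves only the three hub links.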

Figure 1. Before: Leaving the default Site Link settings results in a plethora of connection objects and lots of replication traffic.

Figure 2. After: The same domain after making changes to the Site Bridging properties.

The company’s last requirement was to reduce replication latency between their two manufacturing sites. To facilitate this, we simply created a site link bridge and added the two manufacturing site links to it.

Scene 2: Satellite Slowdown
A company with several satellite WAN links was having problems getting AD replication to complete successfully. The WAN link bandwidth should have been enough to allow smooth replication, and they were confused about why it wasn’t happening. This was an issue I’d dealt with myself a few years ago, so I was familiar with the problems they were having. Figure 3 shows an example of their setup.

The root of the problem was that satellite links are notorious for having higher latency than other connections like frame relay. AD uses Remote Procedure Calls (RPC) as its default replication protocol, and RPC is extremely susceptible to network latency. My first suggestion was to upgrade their WAN links from the satellite connections they were using. They said no, since they were committed to making replication work over their current connections.

Figure 3. Before: The satellite links of this company’s WAN weren’t replicating properly.

Figure 4. After: The reworked network topology included two new domains and addition of the SMTP protocol.

Active Directory replication has just two available protocols: RPC and Simple Mail Transfer Protocol (SMTP). Since their links weren’t able to support RPC replication, their only other option was to switch to SMTP replication across the satellite connections.

First, though, we had to address some major Windows 2000-related SMTP replication restrictions. One is that SMTP replication is only available between sites, while RPC is the only protocol that you can use within a site. This makes sense, since you should have plenty of bandwidth within a site for RPC replication to work without any problems.

The most important restriction is that DCs from the same AD domain can’t use SMTP replication. So if this company wanted to use SMTP replication, they’d have to create a separate AD domain for every remote site that had a satellite connection.
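The two restrictions above can be summarized as a small decision function. This is only a sketch of the rules as described in this article. The `DomainController` type, the DC names and the domain names are hypothetical, and the labels "RPC" and "SMTP" are plain strings rather than anything AD exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainController:
    name: str
    site: str
    domain: str


def allowed_transports(a, b):
    """Return the replication transports two DCs may use with each other."""
    if a.site == b.site:
        # Within a site, RPC is the only protocol available.
        return ["RPC"]
    if a.domain == b.domain:
        # DCs from the same domain cannot replicate with each other over SMTP.
        return ["RPC"]
    # Different sites and different domains: SMTP becomes an option.
    return ["RPC", "SMTP"]


hub_dc = DomainController("HUBDC1", site="Hub", domain="corp.example")
same_domain_remote = DomainController("SATDC1", site="Satellite1", domain="corp.example")
own_domain_remote = DomainController("SATDC2", site="Satellite2", domain="sat2.corp.example")

print(allowed_transports(hub_dc, same_domain_remote))  # ['RPC']
print(allowed_transports(hub_dc, own_domain_remote))   # ['RPC', 'SMTP']
```

The second call shows why the satellite sites needed their own domains: only then does SMTP appear in the list.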

They weren’t particularly excited about doing this, but to get replication working while keeping their satellite WAN links, they decided it was the only solution that made sense. Global Catalog, Schema, and Configuration data can all be replicated over SMTP, so they were still able to provide a local Global Catalog server at these remote sites. Figure 4 shows what the SMTP replication topology looked like.

Configuring SMTP replication was a fairly straightforward process. For a step-by-step guide to setting it up, see “Additional Information.”

Scene 3: Beware Consultants Who Know Nothing
A company had been working with another consultant on its AD design but was questioning his recommendations. The consultant told them they should have DNS installed on every DC in their environment because AD replication wouldn’t work if they didn’t. Fortunately, I was able to help them go through a redesign before they implemented a solution that would have been difficult to maintain and support.

The advice they’d received was absolutely incorrect. AD was designed to use DNS to locate services running on DCs. It shouldn’t change the way you’d normally configure a DNS infrastructure; rather, it should just build off what’s already in place. Many companies choose to use BIND for DNS, which wouldn’t be running on the DCs since BIND is typically installed on either a Unix or Linux platform. Before talking too much about their DNS infrastructure, we revisited their AD domain design to ensure that they knew exactly what they wanted. This is always a good idea since every AD domain requires a DNS domain with the same name. Figure 5 shows an example of what their AD domain structure and DNS infrastructure looked like after following the advice of the consultant.
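To make "using DNS to locate services" concrete, the sketch below builds the names of some of the well-known SRV records that clients query to find DCs. The helper function and the example domain and site names are hypothetical, and the list is illustrative rather than complete. Any DNS server that supports SRV records can hold these, whether or not it runs on a DC.

```python
def dc_locator_srv_names(domain, site=None, forest_root=None):
    """Build a few of the SRV record names used to locate AD services."""
    names = [
        f"_ldap._tcp.{domain}",            # LDAP servers for the domain
        f"_kerberos._tcp.{domain}",        # Kerberos KDCs for the domain
        f"_ldap._tcp.dc._msdcs.{domain}",  # domain controllers for the domain
    ]
    if site:
        # Site-specific record, so clients can prefer a DC in their own site.
        names.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    # Global Catalog servers are registered under the forest root domain.
    names.append(f"_gc._tcp.{forest_root or domain}")
    return names


for name in dc_locator_srv_names("corp.example", site="Branch1"):
    print(name)
```

The point of the sketch is that the records only have to exist in a zone that clients and DCs can resolve. Nothing here requires the DNS service to be installed on the DC itself.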

Deciding to create multiple forests is a big decision and one I never take lightly. Talking with this company’s IT department convinced me that they had good reason to have the division within their environment. The reason for having two forests is that they had a section of the network that was not as trusted as the rest, so they wanted those minimally trusted domains to have limited access to the rest of the network resources. They also wanted to ensure that the only DNS records accessible from the external network were for resources that should be seen.

Figure 5. Before: This company’s proposed network would have had DNS installed on every domain controller—not a good idea.

 

Figure 6. After: The redesigned DNS structure, using “Shadow zones” for the external forest. S.P. represents a Standard Primary zone, S.S. a Standard Secondary zone.

They were aware that with Win2K DNS, security can be set only on AD-integrated zones, but they were still having trouble figuring out whether their proposed solution would work. The catch is that the DNS records are stored in the domain partition in AD, so only DCs in the same domain can hold an AD-integrated copy of a DNS zone. For example, if a DC from the public1.net domain hosts an AD-integrated copy of the public1.net DNS domain, only other public1.net DCs can hold AD-integrated copies of that zone.

Another interesting caveat is that a DNS server that’s also a DC can host any DNS zone as an AD-integrated zone, including a zone that will be hosting records for a separate AD domain.

The company was also curious about what a change to their proposed DNS infrastructure would do to their AD replication topology. I explained that since the external network had a separate forest, there wouldn’t be any AD replication between the external and internal networks. I also showed them another option that would satisfy all their requirements.

Since they were going to stick with Win2K DNS, there was really only one feasible option to allow them to control what records were seen by the external network: Shadow zones. When using this method, the DNS servers in the external network actually have a primary copy of zone files used in the internal network. The internal domain admins ensure that any records for machines that should be seen by the external network are manually added to the external zone file. In this situation, the number of records was small, so it didn’t add much of an administrative burden. None of the AD service location records was needed in the external zone files because there wouldn’t be any replication between the two forests. Figure 6 shows the redesigned DNS infrastructure with the public1.net and public2.net name servers hosting shadow copies of the internal zones. This allows the internal administrators to control exactly what records they want visible to the public network.
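The shadow zone idea can be pictured as a hand-maintained subset of the internal zone. The sketch below is only a model of that bookkeeping, not a DNS tool. The host names, addresses and the `build_shadow_zone` helper are all hypothetical.

```python
def build_shadow_zone(internal_records, published_hosts):
    """Return the external (shadow) copy of a zone.

    Only hosts the internal admins explicitly publish are copied. AD service
    location records (names starting with an underscore) are refused, since
    nothing replicates between the two forests.
    """
    shadow = {}
    for host in published_hosts:
        if host.startswith("_"):
            raise ValueError(f"service record {host!r} does not belong in a shadow zone")
        shadow[host] = internal_records[host]
    return shadow


# Hypothetical internal zone contents, using documentation-range addresses.
internal_zone = {
    "www": "192.0.2.10",
    "mail": "192.0.2.25",
    "dc1": "192.0.2.53",
    "_ldap._tcp": "dc1",
}

external_zone = build_shadow_zone(internal_zone, ["www", "mail"])
print(sorted(external_zone))  # ['mail', 'www']
```

Everything not on the published list, including the DC and its service records, stays invisible to the external network.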

Scene 4: Hidden Costs
A company with multiple redundant WAN links was having trouble getting their replication connections to work the way they wanted. The company had a connection between two of their branch offices for redundancy, and figured that since little traffic crossed that link, it could be used to reduce replication latency. They’d changed the costs on their AD site links but still weren’t getting the desired result. Their main problem was a misunderstanding of how site link costs work.

Although the AD connection objects showed the connections between the two branch offices, the replication traffic was still going over the two T1 links. To truly see what was going on, we diagrammed their router and site link costs in their environment (see Figure 7).

Figure 7. The excessive router cost between the two branches was forcing this company's traffic through the more saturated T1 links.

Notice that the actual network routing cost between the two branch offices is higher than the combined cost of the two hops through the corporate hub. This is because the network was designed to send traffic through the corporate site, with the 256K link serving as a backup connection. The AD site link costs show the opposite: the cost between the branch offices is less than the combined cost between the branches and the corporate office. Even so, the traffic was actually going through the corporate office.
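The mismatch is easier to see with numbers. The sketch below runs the same cheapest-path calculation over two cost tables: one for AD site links and one for router costs. The cost values are made up for illustration (the real ones are in Figure 7), and the site names are hypothetical.

```python
import heapq


def cheapest_path(costs, src, dst):
    """Return (total_cost, path) for the cheapest route between two sites."""
    graph = {}
    for (a, b), cost in costs.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, src, [src])]
    seen = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == dst:
            return total, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (total + cost, nxt, path + [nxt]))
    raise ValueError(f"no path from {src} to {dst}")


# Illustrative values only: the direct branch link is cheap in AD
# but expensive on the routers.
site_link_costs = {("BranchA", "Hub"): 100, ("BranchB", "Hub"): 100, ("BranchA", "BranchB"): 150}
router_costs = {("BranchA", "Hub"): 10, ("BranchB", "Hub"): 10, ("BranchA", "BranchB"): 50}

print(cheapest_path(site_link_costs, "BranchA", "BranchB"))  # (150, ['BranchA', 'BranchB'])
print(cheapest_path(router_costs, "BranchA", "BranchB"))     # (20, ['BranchA', 'Hub', 'BranchB'])
```

The first result is where AD builds the connection object. The second is the path the packets take. The two calculations never consult each other.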

I’ve always felt that the costs of AD site links were one of the most difficult concepts to understand. The costs placed on Site Links affect only where the connection objects will connect within the replication topology. So, for example, even though the Site Link cost ensures that the connection objects run directly between the DCs in the two branch office locations, the network costs force the actual traffic through the routers at the corporate office.

One way to get the actual traffic to go directly over the 256K link between the two branch offices would be to change the network routing costs so that the cost between the two branch offices was less than the combined cost through the corporate office. This wasn’t optimal in this scenario, however, because it would force all traffic between the branch offices to follow that same path.

The better way to get just the AD replication traffic to follow that path was to add routes directly on the DCs. This was done simply by using the command-line “route add” command on the Win2K DCs in the branch offices. Normally the DCs would reach each other through their default gateways. The command “route add <destination DC IP> mask 255.255.255.255 <remote office router IP>” caused the DCs to communicate across the 256K connection. Note: We used the destination DC’s IP address rather than its subnet because we wanted only the traffic between the DCs to go across that connection.
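If you have more than a couple of DCs to configure, it can help to generate the host-route commands rather than type them. Here is a minimal sketch that validates the addresses and formats the “route add” command line. It does not run anything, and the function name and the addresses (taken from documentation ranges) are hypothetical.

```python
import ipaddress


def host_route_command(destination_dc_ip, router_ip):
    """Format a Windows 'route add' command for a single-host route."""
    # IPv4Address raises ValueError on malformed input, which catches typos early.
    destination = ipaddress.IPv4Address(destination_dc_ip)
    gateway = ipaddress.IPv4Address(router_ip)
    # A 255.255.255.255 mask limits the route to that one DC, not its subnet.
    return f"route add {destination} mask 255.255.255.255 {gateway}"


# Hypothetical addresses: the DC in the other branch and the router
# that fronts the direct branch-to-branch link.
print(host_route_command("192.0.2.10", "198.51.100.1"))
# route add 192.0.2.10 mask 255.255.255.255 198.51.100.1
```

Keep in mind that a route added this way does not survive a reboot unless it is added as a persistent route with the -p switch.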

Additional Information

Read TechNet's "Active Directory Branch Office Planning Guide Series" to learn more about AD replication components and examples for implementing a branch office replication topology. It's available here: TechNet home | Products & Technologies | Active Directory | Windows 2000 Server | Deploy | Active Directory Branch Office Guide Series.

You'll find useful information in the Windows 2000 Resource Kit on AD architecture here: TechNet home | Products & Technologies | Windows 2000 Server | Resource Kits | Windows 2000 Server Distributed Systems Guide.

To learn more about configuring SMTP replication, visit TechNet home | Products & Technologies | Windows 2000 Server | How-To Resources | Step-by-Step Guide to Setting up ISM-SMTP Replication.

Replication Gratification
I’ve faced many challenges in the last couple of years working with AD. Every company I’ve worked with has had a unique environment, and I’m never surprised to see something I haven’t before. I hope that these tales will help you along your path to a smoothly replicating AD environment.
