If you missed the other articles in this series please read:
- Designing and Implementing Effective Disaster Recovery Strategies with Citrix Technology Part 1: The Plan
- Designing and Implementing Effective Disaster Recovery Strategies with Citrix Technology Part 3: The Data Store
In my last article, I discussed some basic steps to evaluate your environment and consider DR implications and solutions for it. Obviously this can be an incredibly difficult task, and no one article can encompass all of it. Hopefully, however, it gave you some ideas about how to tackle your environment from a high-level view. In this article, we will look at a traditional Citrix environment and how to apply DR techniques to the critical components that your Citrix environment might contain.
Let’s look at this from the point of view of a traditional Citrix environment using Presentation Server 4. We will take a farm with 5 PS4 application servers and an NFuse/Secure Gateway machine. The data store is hosted in a SQL 2000 environment along with the Resource Manager database. Installation Manager is installed and used to deploy applications when possible, but there are some manual installations you just have to do. The applications installed are SAP, Office, Email, and a homegrown database application. The applications are installed as follows:
ServerA: Office, Email
ServerB: Office, Email
ServerE: DB App
There are two datacenters available to use. Servers A, C, and E are hosted in DC1 along with the SQL server, license server, and NFuse/Secure Gateway box. Servers B and D are in DC2.
Do I want DR or Fault Tolerance?
Looking at the scenario above, the first question you have to ask is whether the solution you are looking for is Disaster Recovery or Fault Tolerance. There can be a considerable difference between the two goals. Fault Tolerance is the constant uptime of an environment for user access. In the scenario above, we are providing a Fault Tolerant solution for Office, Email, and SAP. Although there is no fail-over for user sessions, each application is load balanced and thus hosted on multiple servers. If one of the application servers goes down, the application is still available. This only takes the application servers into account, however. Fault Tolerant solutions can and should be a part of your DR strategy for key applications, but they are not a true DR solution on their own.
Suppose in the above scenario that all the servers are hosted in the same datacenter. Something catastrophic happens, and your datacenter is completely down. That fault tolerant solution isn’t doing you a whole lot of good right now, is it? Take it back to the scenario… you are a smart admin and have the advantage of multiple data centers, so you have divided your servers between the locations. You now have fault tolerance and disaster recovery capability for those application servers. What about the rest of the environment? If your users are dependent on the Secure Gateway/NFuse box for access, you are going to be faced with challenges to get them reconnected.
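The difference between the two goals can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not a Citrix tool: the function names are hypothetical, and placing SAP on ServerC and ServerD is an assumption, since the application list does not say where SAP is installed.

```python
from collections import defaultdict

# Server -> datacenter, as described in the scenario.
DATACENTER = {
    "ServerA": "DC1", "ServerB": "DC2", "ServerC": "DC1",
    "ServerD": "DC2", "ServerE": "DC1",
}

# Server -> published applications.
APPS = {
    "ServerA": {"Office", "Email"},
    "ServerB": {"Office", "Email"},
    "ServerC": {"SAP"},      # assumption: not stated in the article
    "ServerD": {"SAP"},      # assumption: not stated in the article
    "ServerE": {"DB App"},
}


def hosts_by_app(apps):
    """Invert the server -> apps mapping into app -> set of servers."""
    result = defaultdict(set)
    for server, published in apps.items():
        for app in published:
            result[app].add(server)
    return dict(result)


def survives_server_loss(app, apps=APPS):
    """Fault tolerance: the app is still published if any one server fails."""
    return len(hosts_by_app(apps)[app]) > 1


def survives_datacenter_loss(app, apps=APPS, datacenter=DATACENTER):
    """DR for the application tier: the app is hosted in more than one datacenter."""
    sites = {datacenter[server] for server in hosts_by_app(apps)[app]}
    return len(sites) > 1


if __name__ == "__main__":
    for app in sorted(hosts_by_app(APPS)):
        print(app, survives_server_loss(app), survives_datacenter_loss(app))
```

Note that the check covers the application servers only. The SQL server, license server, and NFuse/Secure Gateway box all sit in DC1 and would need their own answer.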
Breaking Down the Components
Application Servers – We have already covered this a little bit above. The application servers have been split between the DCs to provide full fault tolerance and DR. If one datacenter should go down, your application servers are still available in the other for user connections, with the caveats discussed below. If you want to go a step further in providing disaster recovery, you can also maintain a set of machines in a DR location that are ONLY utilized in the case of a true DR situation. We currently do this for our own mission critical applications. Several Citrix servers sit in a hosted DR site. A second set of applications, identical to the originals, is published from those servers in a folder labeled DR in the user’s Program Neighborhood. These boxes are only accessed if a true DR is declared, and the users have instructions about when to switch to their DR folder application set.
The servers are an insurance policy. If they are never used in their lifecycle then it is probably a good thing. The problem with this configuration is the reliance on the data store replication. See the data store section for some issues with a true split DR environment and how you can look to deal with them.
Licensing Server – MPS3 and PS4 moved licensing out of the data store and to an actual license server process. It needs to be hosted on a machine running IIS, and your application servers will periodically query the box for a license status. Every time an application server is rebooted it connects to the license server and then caches the license count locally. This allows your license server to be down up to 30 days before the application servers will no longer allow connections, which is a welcome change from the old model. License files are downloaded from MyCitrix.com and imported into the License Server console. Unfortunately, license files are generated based on host name and will not work on a different license server without reclaiming them in MyCitrix and then reallocating them to the new server.
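To make the grace period concrete, here is a minimal sketch of the arithmetic. It is not how the product tracks the grace period internally. The function names are hypothetical, and the only figure taken from the article is the 30 days.

```python
from datetime import datetime, timedelta

# Grace period described above; everything else here is illustrative.
GRACE_PERIOD = timedelta(days=30)


def grace_remaining(last_contact, now):
    """Time left before connections are refused (never negative)."""
    deadline = last_contact + GRACE_PERIOD
    return max(deadline - now, timedelta(0))


def connections_allowed(last_contact, now):
    """True while the application server is still inside the grace period."""
    return now < last_contact + GRACE_PERIOD


if __name__ == "__main__":
    last_contact = datetime(2006, 1, 1)   # hypothetical last license check
    now = datetime(2006, 1, 11)           # license server down for 10 days
    print(grace_remaining(last_contact, now).days, "days left")
```

A check like this belongs in your DR runbook more than in your monitoring: it tells you how long you can defer rebuilding the license server while you deal with more urgent components.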
Citrix does suggest some methods for creating a DR environment for your license server. The first is to use MS Clustering on your web servers to provide High Availability. You would simply tie the license host name to the cluster service name, and you now have a High Availability environment. This will only help however if you experience a total server failure, since a network failure alone is not enough to trigger the Active to Passive failover. Having a clustered environment means that you will have to have a dedicated License Server. In a non-clustered environment you could host it on one of your existing boxes like the SQL server. And finally, it doesn’t really give you DR capability since the cluster has to be located in the same datacenter.
A second solution is to create backup license servers. This can simply be a clone of the production server that is kept offline until needed. Alternatively, you can build another identical box with a different name, then rename it and adjust your DNS when it is required. Honestly, given the 30 day grace period around the license server, it is one of the least critical components of the DR. The best alternative is to document your login to MyCitrix and have build instructions for recreating the license server and changing the farm settings so that it points to the new machine. It’s cheap and fairly easy.
Data store – One of the most complex issues around creating a true DR environment is how you handle your data store. These days, most administrators are choosing to host the data store on an external machine like a SQL 2000 server. Prior to MPS3, the loss of your data store was truly a critical event. You had 96 hours to recover the data store or no connections could be made. Since licensing has been separated from the data store and given its own grace period of 30 days, the criticality of the data store has decreased significantly. If your data store is down, configuration changes cannot be made to the farm environment. Your users, however, will see no impact on their application access, and you have in essence a static environment until you can bring the data store back up.
In larger environments, however, there might be a critical need for data store accessibility. As an example, without the ability to change the configuration you can’t point your servers to a new license server. For situations like this there are several options available. If a High Availability solution is sought, then clustering the OS and SQL environments is a very valid tactic. It’s also an expensive one! For multi-site fault tolerance and DR, SQL replication can be established. This is what we chose to do with our hosted DR environment. The live SQL data store is replicated to our hosted DR environment where the DR Citrix servers sit. During normal operation those DR machines must point to the live data store (replication is a one way street, and your data would be out of sync otherwise), and they are switched to the failover SQL server in case of a DR. The replica server has to be promoted to be the live server to take it out of read-only mode.
For smaller environments that use a local data store, it is important that you regularly create backup copies of your data store using the dsmaint backup command. This will create a flat file that can be backed up and restored using whatever backup methodology you have in place.
In the final part of this article series, we will address restoring the data store files and DR strategies for the rest of our Citrix environment. We will also look at alternative strategies for providing application access during downtime periods. So stay tuned!