|
DSX Access Systems, Inc. |
Clustered SQL Servers
![]()
SQL Server 2000 Failover Clustering
Continuous uptime in a production environment is becoming a
common security requirement. This paper describes Microsoft® SQL Server™ 2000
failover clustering, one method of creating high availability. Failover
clustering is available only in SQL Server 2000 Enterprise Edition.
Failover clustering is a process in which the operating
system and SQL Server 2000 work together to provide availability in the event
of an application failure, hardware failure, or operating-system error.
Failover clustering provides hardware redundancy through a configuration in
which mission critical resources are transferred from a failing machine to an
equally configured server automatically. Failover clustering also allows system
maintenance to be performed on one computer while another computer (node) does
the work. This benefit can also ensure that system downtime due to normal
maintenance is minimized.
The goal of failover clustering is to provide high
availability for an overall solution that accommodates backups, redundancy, and
performance. If software and/or hardware problems occur, failover clustering
can enable a production environment to stay up and running.

The clustered nodes use a heartbeat (see above) to check
whether each node is alive, at both the operating system and SQL Server level.
At the operating system level, the nodes in the cluster compete for the
resources of the cluster. The primary node reserves the resource every 3 seconds,
and the competing node every 5 seconds. The process lasts for 25 seconds and
then starts over again. For example, if the node owning the instance fails due
to a problem (network, disk, and so on), at second 19, the competing node
detects it at the 20-second mark, and if it is determined that the primary node
no longer has control, the competing node takes over the resource.
From a SQL Server perspective, the node hosting the SQL
Server resource does a looks-alive check every 5 seconds. This is a lightweight
check to see whether the service is running and may succeed even if the
instance of SQL Server is not operational. The IsAlive
check is more thorough and involves running a SELECT @@SERVERNAME Transact SQL
query against the server to determine whether the server itself is available to
respond to requests; it does not guarantee that the user databases are up. If
this query fails, the IsAlive check retries five
times and then attempts to reconnect to the instance of SQL Server. If all five
retries fail, the SQL Server resource fails. Depending on the failover
threshold configuration of the SQL Server resource, Windows Clustering will
attempt to either restart the resource on the same node or fail over to another
available node. The execution of the query tolerates a few errors, such as
licensing issues or having a paused instance of SQL Server, but ultimately
fails if its threshold is exceeded.
During the fail over from one node to another, Windows
clustering starts the SQL Server service for that instance on the new node, and
goes through the recovery process to start the databases. The fail over of the
SQL Server virtual server will take a short time (probably seconds). After the
service is started and the master
database is online, the SQL Server resource is considered to be up. Now the
user databases will go through the normal recovery process, which means that
any completed transactions in the transaction log are rolled forward, and any
incomplete transactions are rolled back. The length of the recovery process
depends on how much activity must be rolled forward or rolled back upon
startup.
WinDSX SQL connects to a SQL Server 2000 Virtual Server with
the SQL Server Name or IP address of the SQL Server 2000 Virtual Server. The
Cluster name, the Cluster IP address, or even the individual node names are not
used by the connections. From a client or application perspective, it does not
need to worry about which node owns the resources, because connecting to the
SQL Server 2000 Virtual Server appears as a normal SQL Server. During the
failover process, any active connections are broken. WinDSX SQL checks to see
whether the connection exists, and if not, reconnects.
DSX Example
of Clustered SQL Servers
DSX configured and tested Clustered SQL Servers in
the following manner. Listed below is the equipment
used and the configuration.
·
PCs - 3 GHZ with
1G of
·
OS - Advanced
Server 2000 and MS SQL Server 2000
·
Dell Power Vault 220 shared
drive array. The Power Vault can handle
up to 14 drives but you lose 1 slot when you cluster the array. The Power Vault
has two drives that are mirrored for the quorum files that the Cluster
administrator uses. It also has 3 drives that are configured Raid 5 for the SQL
server data files. This raid drive also has our file share resource so that
when the SQL server fails over so does our shared folder. You could also have
another 3 drives for the SQL transaction logs if you wanted. There are several hotswap drives for failover redundancy.
·
Each PC has an
\\\ Both PC’s run the cluster administration program
and are capable of taking control of the cluster. We are using SQL server in an
Active/Passive setup. Since this is a stand alone system both PC’s are domain
controllers and run Active Directory as well as DNS servers. The Heartbeat is
facilitated by using a second Network card in both PCs connected together via a
network HUB.