Public and Private Networks
Each node of a cluster must have at least two network cards to be a fully supported installation. One network card is connected to the public network, and the other is connected to a private cluster network. (Read Part 1 of this two-part article here.)
- The public network is the network to which client applications connect. This is how they communicate with a clustered SQL Server instance, using the clustered IP address and clustered SQL Server name. It is recommended to have two teamed network cards for the public network to provide redundancy and improve availability.
- The private network is used solely for communication between the clustered nodes, mainly for the heartbeat. Two forms of health checks are performed:
- LooksAlive: Verifies that the SQL Server service is running on the online node, every 5 seconds by default
- IsAlive: Verifies that SQL Server accepts connections by executing sp_server_diagnostics
This health detection logic determines whether a node is down, in which case the passive node takes over the production workload (see the sketch below).
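You can run sp_server_diagnostics yourself to see the health data the IsAlive check consumes, and you can tune how long the cluster waits for a healthy response. The statements below are a minimal sketch; the 60000 ms HealthCheckTimeout is an illustrative value, not a recommendation:

```sql
-- Return one snapshot of the health data the IsAlive check uses
-- (five rows: system, resource, query_processing, io_subsystem, events)
EXEC sp_server_diagnostics;

-- Raise the time (in milliseconds) the cluster waits for a response
-- from sp_server_diagnostics before declaring the instance unhealthy.
-- 60000 ms is only an example value.
ALTER SERVER CONFIGURATION
SET FAILOVER CLUSTER PROPERTY HealthCheckTimeout = 60000;
```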
The SQL Server Instance
Surprisingly, SQL Server client applications don't need to know how to redirect their connections from a failed cluster node to the new active node, or anything else about specific cluster nodes (such as the NetBIOS name or IP address of individual cluster nodes). This is because each clustered SQL Server instance is assigned a network name and IP address, which client applications use to connect to the clustered SQL Server. In other words, client applications don't connect to a node's specific name or IP address but instead to the clustered SQL network name or clustered SQL IP address, which stays consistent and fails over with the instance. Each clustered SQL Server instance belongs to a Failover Cluster Resource Group that contains the following resources, which fail over together (a quick T-SQL check of the name resources is sketched after the list):
- SQL Server Network Name
- IP Address
- One or more shared disks
- SQL Server Database Engine service
- SQL Server Agent
- SQL Server Analysis Services, if installed in the same group
- One file share resource, if the FILESTREAM feature is installed
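One way to see the distinction between the clustered name and the physical node is to compare what the instance reports for each. In this small sketch, @@SERVERNAME returns the virtual network name on a failover cluster instance, while ComputerNamePhysicalNetBIOS identifies the node currently hosting it:

```sql
-- 1 if this instance is part of a failover cluster
SELECT SERVERPROPERTY('IsClustered') AS is_clustered;

-- The clustered (virtual) SQL Server name that clients connect to
SELECT @@SERVERNAME AS clustered_name;

-- The physical node currently hosting the instance; this value
-- changes after a failover, while @@SERVERNAME does not
SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS current_node;
```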
Assume that a single SQL Server 2012 instance runs on
the active node of a cluster and that a passive node is available to
take over when needed. At this time, the active node communicates with
both the database and the quorum on the shared disk array. Because only a
single node at a time can access the shared disk array, the passive
node does not access the database or the quorum. In addition, the active node sends out heartbeat signals over the private network, and the passive node monitors them so it can take over if the active node fails. Clients also interact with the active node via the clustered SQL Server name and IP address while running production workloads.
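From within SQL Server, you can list both cluster nodes and see which one currently owns the instance. A minimal sketch using the sys.dm_os_cluster_nodes DMV (the status and is_current_owner columns were added in SQL Server 2012):

```sql
-- List the nodes in the failover cluster, their state, and
-- which node currently owns (hosts) the SQL Server instance
SELECT NodeName,
       status_description,
       is_current_owner
FROM sys.dm_os_cluster_nodes;
```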
Now assume that the active node stops working because of a power failure. The passive node, which is monitoring the heartbeats from the active node, notices that the heartbeats have stopped. After a predetermined delay, the passive node assumes that the active node has failed and initiates a failover. As part of the failover process, the passive node (now the active node) takes control of the shared disk array and reads the quorum, looking for any unsynchronized configuration changes. It also takes control of the clustered SQL Server name and IP address. In addition, as the node takes over the databases, it must start the SQL Server service and run crash recovery on the databases.
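During this window the databases move through the recovery states that sys.databases exposes; a quick sketch to watch them come back online on the new active node:

```sql
-- Databases report RECOVERY_PENDING / RECOVERING while crash
-- recovery runs, and ONLINE once it completes
SELECT name, state_desc
FROM sys.databases
ORDER BY name;
```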
The time this takes depends on many factors, including
the performance of the hardware and the number of transactions that
might have to be rolled forward or back during the database recovery
process. When the recovery process is complete, the new active node
announces itself on the network with the clustered SQL Server name and
IP address, which enables the client applications to reconnect and begin
using the SQL Server 2012 instance after this minimal interruption.
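If you need to estimate how far along that recovery is, one common technique is to query sys.dm_exec_requests for the internal DB STARTUP command, which reports a percent_complete during crash recovery. A minimal sketch:

```sql
-- Monitor crash-recovery progress on the new active node;
-- percent_complete and the estimated completion time (ms)
-- are populated for the internal DB STARTUP requests
SELECT DB_NAME(database_id)      AS database_name,
       command,
       percent_complete,
       estimated_completion_time AS est_ms_remaining
FROM sys.dm_exec_requests
WHERE command = 'DB STARTUP';
```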