This tutorial explains step by step how to set up Sun Cluster 3.2 (two-node cluster) using VMware Server (free).
This environment is very useful for lab purposes and also as a tool to study for the SC 3.2 certification.
First of all, you need to download and install VMware Server (I'm currently using version 1.0.8). After installing VMware, we'll need to set up two Solaris 10 VMs (use the custom option in the virtual machine configuration).
Now, this is a very important part to pay attention to (#1):
2 - If your CPU doesn't have support for 64-bit virtual machines (in my case), VMware will show a message stating that 64-bit OSs won't be supported:
3 - If this message doesn't appear, then you probably have a CPU that supports 64-bit virtualization (so you can proceed with Solaris 10 64-bit).
4 - If you don't have support for 64-bit VMs, you will have to use the "Solaris 10" version (I will give further instructions on how to download/install the 32-bit module for Sun Cluster; without this module you won't be able to use SC 3.2 under 32-bit VMs).
5 - Memory for VM - I strongly recommend setting at least 1GB for each VM, if you plan to use Oracle DBs under the cluster management (otherwise 512-768 MB should suffice).
6 - Network connection - Leave the default option (Use bridged networking). Later on we'll set up two additional network adapters that will be used as the transports for the cluster.
7 - SCSI Adapter - Choose "LSI Logic"
8 - Disk - Choose "Create a new virtual disk", SCSI as virtual disk type and 15 GB of disk space
9 - Now we need to set up two additional network adapters for the cluster transport (for that, click on Edit virtual machine settings and then click on the Add button).
10 - Choose Ethernet Adapter as the hardware type.
11 - Choose Custom and VMnet1 (repeat steps 9-10-11 for the second adapter, this time choosing VMnet2)
12 - Go to Host >> Virtual network settings in the menu.
14 - Now we need to add another disk that will be the quorum disk (click on Edit virtual machine settings and then click on the Add button)
15 - Add the disk as a SCSI disk, name it quorum.vmdk and use around 100 MB of disk space.
16 - Create a similar VM (with a different hostname) with the same hardware settings. After that, add quorum.vmdk to it on the same SCSI ID using the option "Use an existing virtual disk".
17 - Now, another important part - you will need to edit your VM config file to allow SCSI reservations (for that you need to disable disk locking and add the following lines into your VM config file - Hint: VM config file ends with a .VMX extension):
disk.locking = "false"
scsi0.sharedBus = "virtual"
-- These lines must be added to the config files of both VMs --
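As a rough sketch of the step above, the two settings can be applied from the host with a small shell function instead of editing the file by hand. The function name `set_vmx_option` and the example .vmx paths are hypothetical (they are not part of VMware), and the VM should be powered off while its config file is changed.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): make sure a .vmx file contains
# exactly one line of the form  key = "value"  for the given key.
set_vmx_option() {
    vmx_file=$1
    key=$2
    value=$3
    tmp_file="${vmx_file}.tmp.$$"
    # Copy every line except an existing setting for this key
    # (the text before the first "=" is compared literally).
    awk -v k="$key" '{
        split($0, parts, "=")
        name = parts[1]
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        if (name != k) print
    }' "$vmx_file" > "$tmp_file"
    # Append the wanted setting and replace the original file.
    printf '%s = "%s"\n' "$key" "$value" >> "$tmp_file"
    mv "$tmp_file" "$vmx_file"
}

# Example usage (paths are assumptions, adjust to your own VMs):
#   set_vmx_option /vm/earth/earth.vmx disk.locking false
#   set_vmx_option /vm/earth/earth.vmx scsi0.sharedBus virtual
```

Running it twice for the same key leaves a single line, so it is safe to repeat on both VMs.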
Now we are all set to start installing Solaris 10 and configure Sun Cluster. Boot your VM from the CD-ROM using a recent Solaris 10 ISO image (I'm assuming you have enough knowledge to make a fresh Solaris 10 install, so I will only specify the filesystem sizes to be set during the install):
Wait for the installation/reboot to finish and we should have 2 VMs running Solaris 10 (32 or 64 bits). Now we need to install VMware Tools to convert the PCN devices to VMXNET devices (I had some problems while trying to install SC 3.2 using PCN, so I strongly recommend installing it).
To install VMware Tools, right-click on your VM name and choose "Install VMware tools" (keep in mind that you need to unmount any CD-ROM currently mounted; after you click it, VMware will mount a virtual CD-ROM on the server(s) containing the drivers you will have to install):
/vol/dev/dsk/c0t0d0/vmwaretools 1568 1568 0 100% /cdrom/vmwaretools
Go to /cdrom/vmwaretools (or whatever the mount point is), copy the file to /var/tmp/vmtools, uncompress it and run the installation perl script:
# cd /cdrom/vmwaretools
total 2973
dr-xr-xr-x 2 root sys 2048 out 30 22:39 .
# mkdir /var/tmp/vmtools
# cp vmware-solaris-tools.tar.gz /var/tmp/vmtools
# cd /var/tmp/vmtools
# gzcat vmware-solaris-tools.tar.gz | tar xvf -
# cd vmware-tools-distrib
# ./vmware-install.pl
Hit ENTER for all options (default) and when you finish don't forget to do the following:
* Rename your hostname.* files on /etc:
# ls -l /etc/hostname*
-rw-r--r-- 1 root root 6 mar 12 16:48 /etc/hostname.pcn0
# mv /etc/hostname.pcn0 /etc/hostname.vmxnet0
* Replace entries in /etc/path_to_inst (from PCN to VMXNET):
# cp /etc/path_to_inst /etc/path_to_inst.backup ; sed -e 's/pcn/vmxnet/g' /etc/path_to_inst > /etc/path_to_inst.2 ; mv /etc/path_to_inst.2 /etc/path_to_inst
* Reboot the server (init 6)
After the reboot you should have the vmxnet module properly loaded and your vmxnet0 interface up (check with ifconfig -a).
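If you would rather review the /etc/path_to_inst change before putting it in place, a slightly more cautious variant of the rewrite step is sketched below. It is a variation on the sed command used in this tutorial, not a copy of it: it only touches the quoted driver name at the end of each line and writes the result to a separate file. The function name and the sample entry in the comment are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: write a pcn -> vmxnet rewrite of a path_to_inst-style file to a
# new file, leaving the original untouched so the result can be diffed first.
# Assumed line format:  "<physical path>" <instance> "<driver>"
# e.g.                  "/pci@0,0/pci1022,2000@11" 0 "pcn"
rewrite_driver_names() {
    src=$1
    dst=$2
    # Only the quoted driver name at the end of the line is replaced.
    sed -e 's/"pcn"$/"vmxnet"/' "$src" > "$dst"
}

# Example usage (as root, on each node):
#   rewrite_driver_names /etc/path_to_inst /var/tmp/path_to_inst.new
#   diff /etc/path_to_inst /var/tmp/path_to_inst.new
```

Once the diff looks right, back up the original and move the new file into place as in the step above.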
To make things easier during the install, create a new SSH key as root (hit ENTER for everything) and copy the public key file to the secondary node (into /.ssh).
# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (//.ssh/id_dsa):
Created directory '//.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in //.ssh/id_dsa.
Your public key has been saved in //.ssh/id_dsa.pub.
The key fingerprint is:
34:8e:32:54:b4:bb:d2:5e:71:ec:55:01:09:cd:34:a3 root@earth
Connect to the secondary node and rename the public key file to authorized_keys.
Try to connect by SSH/SFTP and see if you are asked for a password (if you did everything right, you won't be).
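Renaming the public key file works on a fresh node, but it overwrites any authorized_keys file that already exists. A minimal sketch of an alternative is shown below: it appends the key only when it is missing and sets the permissions sshd normally expects. The function name is hypothetical, and it assumes a POSIX grep that supports -F, -x, -q and -f (on Solaris 10 that would be /usr/xpg4/bin/grep rather than /usr/bin/grep).

```shell
#!/bin/sh
# Hypothetical helper for the secondary node: add a public key to
# <ssh_dir>/authorized_keys unless the same line is already present.
install_public_key() {
    pub_file=$1
    ssh_dir=$2
    auth_file="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$auth_file" && chmod 600 "$auth_file"
    # -F fixed string, -x whole-line match, -q quiet, -f read pattern from file
    if ! grep -Fxq -f "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
}

# Example usage on the secondary node, after copying id_dsa.pub to /var/tmp:
#   install_public_key /var/tmp/id_dsa.pub /.ssh
```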
We are ready to start installing/configuring Sun Cluster. Download Sun Cluster 3.2 x86, upload it to both VMs, unzip it and run the installer (in the Solaris_x86 subdirectory).
* Choose options 4, 5 and 6 to install (everything else is default - hit ENTER)
Enter a comma separated list of products to install, or press R to refresh the list [] {"<" goes back, "!" exits}: 4,5,6
Choose Software Components - Confirm Choices
--------------------------------------------
Based on product dependencies for your selections, the installer will install:
[X] 4. Sun Cluster 3.2
* State in the email that you want to use it only for *LAB* purposes (he told me that Sun doesn't give support for 32-bit clusters, thus you will only be able to obtain this module by sending him an email)
When you download the 32-bit module, install it after unpacking by running pkgadd -d SUNWscka (if you do not install it and your VMs are 32-bit, the cluster WILL NOT work and the modules WILL NOT be loaded during the boot - so, remember to install it if your VMs are 32-bit).
# scinstall
*** Main Menu ***
Please select from one of the following (*) options:
* 1) Create a new cluster or add a cluster node
2) Configure a cluster to be JumpStarted from this install server
3) Manage a dual-partition upgrade
4) Upgrade this cluster node
5) Print release information for this cluster node
* ?) Help with menu options
* q) Quit
Option: 1
*** New Cluster and Cluster Node Menu ***
Please select from any one of the following options:
1) Create a new cluster
2) Create just the first node of a new cluster on this machine
3) Add this machine as a node in an existing cluster
?) Help with menu options
q) Return to the Main Menu
Option: 1
*** Create a New Cluster ***
This option creates and configures a new cluster.
You must use the Java Enterprise System (JES) installer to install
the Sun Cluster framework software on each machine in the new cluster
before you select this option.
If the "remote configuration" option is unselected from the JES
installer when you install the Sun Cluster framework on any of the
new nodes, then you must configure either the remote shell (see
rsh(1)) or the secure shell (see ssh(1)) before you select this
option. If rsh or ssh is used, you must enable root access to all of
the new member nodes from this node.
Press Control-d at any time to return to the Main Menu.
Do you want to continue (yes/no) [yes]?
>>> Typical or Custom Mode <<<
This tool supports two modes of operation, Typical mode and Custom.
For most clusters, you can use Typical mode. However, you might need
to select the Custom mode option if not all of the Typical defaults
can be applied to your cluster.
For more information about the differences between Typical and Custom
modes, select the Help option from the menu.
Please select from one of the following options:
1) Typical
2) Custom
?) Help
q) Return to the Main Menu
Option [1]:
>>> Cluster Name <<<
Each cluster has a name assigned to it. The name can be made up of
any characters other than whitespace. Each cluster name should be
unique within the namespace of your enterprise.
What is the name of the cluster you want to establish? cluster-lab
>>> Cluster Nodes <<<
This Sun Cluster release supports a total of up to 16 nodes.
Please list the names of the other nodes planned for the initial
cluster configuration. List one node name per line. When finished,
type Control-D:
Node name (Control-D to finish): earth
Node name (Control-D to finish): mars
Node name (Control-D to finish): ^D
This is the complete list of nodes:
earth
mars
Is it correct (yes/no) [yes]?
Attempting to contact "mars" ... done
Searching for a remote configuration method ... done
The Sun Cluster framework is able to complete the configuration
process with secure shell access (sshd).
Press Enter to continue:
>>> Cluster Transport Adapters and Cables <<<
You must identify the two cluster transport adapters which attach
this node to the private cluster interconnect.
For node "earth",
What is the name of the first cluster transport adapter? vmxnet1
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
All transport adapters support the "dlpi" transport type. Ethernet
and Infiniband adapters are supported only with the "dlpi" transport;
however, other adapter types may support other types of transport.
For node "earth",
Is "vmxnet1" an Ethernet adapter (yes/no) [no]? yes
Is "vmxnet1" an Infiniband adapter (yes/no) [no]?
For node "earth",
What is the name of the second cluster transport adapter? vmxnet2
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
For node "earth",
Name of the switch to which "vmxnet2" is connected [switch2]?
For node "earth",
Use the default port name for the "vmxnet2" connection (yes/no) [yes]?
For node "mars",
What is the name of the first cluster transport adapter? vmxnet1
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
For node "mars",
Name of the switch to which "vmxnet1" is connected [switch1]?
For node "mars",
Use the default port name for the "vmxnet1" connection (yes/no) [yes]?
For node "mars",
What is the name of the second cluster transport adapter? vmxnet2
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
For node "mars",
Name of the switch to which "vmxnet2" is connected [switch2]?
For node "mars",
Use the default port name for the "vmxnet2" connection (yes/no) [yes]?
>>> Quorum Configuration <<<
Every two-node cluster requires at least one quorum device. By
default, scinstall will select and configure a shared SCSI quorum
disk device for you.
This screen allows you to disable the automatic selection and
configuration of a quorum device.
The only time that you must disable this feature is when ANY of the
shared storage in your cluster is not qualified for use as a Sun
Cluster quorum device. If your storage was purchased with your
cluster, it is qualified. Otherwise, check with your storage vendor
to determine whether your storage device is supported as Sun Cluster
quorum device.
If you disable automatic quorum device selection now, or if you
intend to use a quorum device that is not a shared SCSI disk, you
must instead use scsetup(1M) to manually configure quorum once both
nodes have joined the cluster for the first time.
Do you want to disable automatic quorum device selection (yes/no) [no]?
Is it okay to create the new cluster (yes/no) [yes]?
During the cluster creation process, sccheck is run on each of the
new cluster nodes. If sccheck detects problems, you can either
interrupt the process or check the log files after the cluster has
been established.
Interrupt cluster creation for sccheck errors (yes/no) [no]?
Cluster Creation
Log file - /var/cluster/logs/install/scinstall.log.1312
Testing for "/globaldevices" on "earth" ... done
Testing for "/globaldevices" on "mars" ... done
Started sccheck on "earth".
Started sccheck on "mars".
sccheck completed with no errors or warnings for "earth".
sccheck completed with no errors or warnings for "mars".
Configuring "mars" ... done
Rebooting "mars" ... done
Configuring "earth" ... done
Rebooting "earth" ... done
------------------------------------------------------------------
--------- ------
Cluster node: mars Online
Cluster node: earth Online
-------- -------- ------
Transport path: mars:vmxnet2 earth:vmxnet2 Path online
Transport path: mars:vmxnet1 earth:vmxnet1 Path online
Quorum votes needed: 2
Quorum votes present: 3
-- Quorum Votes by Node --
--------- ------- -------- ------
Node votes: mars 1 1 Online
Node votes: earth 1 1 Online
-- Quorum Votes by Device --
----------- ------- -------- ------
Device votes: /dev/did/rdsk/d3s2 1 1 Online
------------ ------- ---------
-- Device Group Status --
------------ ------
-- Multi-owner Device Groups --
------------ -------------
------------------------------------------------------------------
--------- ----- ------ ------- ------
IPMP Group: mars sc_ipmp0 Online vmxnet0 Online
For example, if you do an scstat, or an scstat -W you see:
Transport path: mail-store1:e1000g2 mail-store0:e1000g2 faulted
Transport path: mail-store1:e1000g1 mail-store0:e1000g1 Path online
(at boot it might be “waiting” for quite some time)
In some cases you can disconnect and reconnect the adapter in VMware. However, in others you may have to be more drastic.
Check you can ping the other node via this path - if you can, then you should be all good to run the following commands:
scconf -c -m endpoint=mail-store0:e1000g2,state=disabled
where mail-store0 is your current node, and e1000g2 is the failed adapter. After you’ve done this, you can re-enable it:
scconf -c -m endpoint=mail-store0:e1000g2,state=enabled
And you should now have an online path shortly afterwards:
bash-3.00# scstat -W
Endpoint Endpoint Status
-------- -------- ------
Transport path: mail-store1:e1000g2 mail-store0:e1000g2 Path online
Transport path: mail-store1:e1000g1 mail-store0:e1000g1 Path online
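To spot a faulted or waiting path quickly, the transport path lines can be filtered with awk. The sketch below assumes the output layout shown in this post (the line starts with "Transport path:", followed by the two endpoints and the status), and the function name is made up for illustration. On Solaris 10, nawk or /usr/xpg4/bin/awk is a safer choice than the old /usr/bin/awk.

```shell
#!/bin/sh
# Sketch: read `scstat -W` style output on stdin and print the endpoints and
# status of every transport path whose status is not "Path online".
list_unhealthy_paths() {
    awk '/^[ \t]*Transport path:/ {
        # Fields: "Transport", "path:", endpoint1, endpoint2, status words...
        status = $5
        for (i = 6; i <= NF; i++) status = status " " $i
        if (status != "Path online") print $3, $4, status
    }'
}

# Example usage on a cluster node:
#   scstat -W | list_unhealthy_paths
```

An empty result means every transport path reported "Path online".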
Cluster Panics with pm_tick delay [number] exceeds [another number]
Try the following:
- Stop VMs from being paged to disk in VMware (use only physical memory for your VMs). From memory, this is a host setting in VMware Server.
- Ensure memory trimming is disabled for your VMware Server Sun Cluster guests
- On each cluster node, in order, configure the heartbeats to be farther apart and to have a longer timeout:
scconf -w heartbeat_timeout=60000
scconf -w heartbeat_quantum=10000
Hopefully this will leave you with a much more stable cluster on VMware.
Making a Customized Application Fail Over With a Generic Data Service Resource
In this task, you can see how easy it is to get any daemon to fail over in the cluster, by using the Generic Data Service (so that you do not have to invent your own resource type).
Perform the following steps on the nodes indicated:
1. On all nodes (or do it on one node and copy the file to other nodes in the same location), create a daemon that represents your customized application:
# vi /var/tmp/myappdaemon
#!/bin/ksh
while :
do
sleep 10
done
2. Make sure the file is executable on all nodes.
3. From any one node, create a new failover resource group for your application:
# clrg create -n node1,node2,[node3] myapp-rg
4. From one node, register the Generic Data Service resource type:
# clrt register SUNW.gds
5. From one node, create the new resource and enable the group:
# clrs create -g myapp-rg -t SUNW.gds \
-p Start_Command=/var/tmp/myappdaemon \
-p Probe_Command=/bin/true -p Network_aware=false \
myapp-res
# clrg online -M myapp-rg
6. Verify the behavior of your customized application.
a. Verify that you can manually switch the group from node to node.
b. Kill the daemon. Wait a little while and note that it restarts on the same node. Wait until clrs status shows that the resource is fully online again.
c. Repeat step b a few times. Eventually, the group switches over to the other node.
Thanks for the great post! Very clear and helpful.
-Kapil
Good write up. Thanks.
Tim Read
Hi,
Can you advise what update of Solaris 10 and Sun Cluster 3.2 you used?
I'm using 3.2u2 sol10 10/08 u6 x86 and
I'm running into all sort of difficulties when installing the 32bit package (kindly sent by Matthias Pfuetzner).
When installing the package (after in sc installer and before a scinstall/reboot), it comes up with during the reboot:
...
Could not load DID instance list
...
WARNING: CCR: cant read CC metadata
...
...
I then cannot boot the node - only in failsafe mode.
If I do not install the 32bit package, I can run scinstall and reboot the node, but obviously the cluster isnt active.
Any advice would be helpful
Regards,
Jag
Hi Jag,
Thanks for visiting my blog.
The solaris version that I’m using is: Solaris 10 5/08 x86
SC version:
# clnode show-rev -v
Sun Cluster 3.2 for Solaris 10 i386
SUNWscu: 3.2.0,REV=2006.12.05.21.06
SUNWsccomu: 3.2.0,REV=2006.12.05.21.06
SUNWsczr: 3.2.0,REV=2006.12.05.21.06
Try to check if you have /globaldevices mounted on both nodes before installing/configuring Sun Cluster and also if you have a SCSI quorum disk shared for both nodes (disk locking must be disabled, as I have explained in the blog).
This issue might be related with /globaldevices (Sun Cluster can’t reach the list of device IDs).
Try to follow my steps carefully and hopefully you will make it… Feel free to keep in touch if you have any question.
Hi Rod
Thank u for the wonderful post. I finally installed cluster 3.2u2 on Solaris 10 u5. This going to help me for my cluster certification. Thanks once again.
Jibby
Thanks, Rod!! I needed a step-by-step guide to start studying Sun Cluster!!
Cheers!
HI Rod,
I was unable to configure the sun cluster.
My virtual machine are not coming up after booting of the servers.
I have few doubts
1)U have renamed the pcno interface to vmxnet0 how abt the private interface ? do we need to change the name even for it to vmxnet1.
2)The name given to tranport adapter is given as vmxnet1 and vmxnet2 . Can this be any name or we nedd to give only vmxnet1 and vmxnet2.
My interface names are e1000g0 and e1000g1 and i have not renamed e1000g0 to vmxnet0 is this the reason why my cluster configuration failed.
Please do provide with ur valuable inputs.
Hi Rod,
I have some doubts regarding the transport adaptor
1) Is it necessary to rename the interface name to vmxnet0.
2) Do we need to change the name for private interface also.
3)Can the names given to trasnport adaptor be any name or it should be vmxnet1 and vmxnet2 only.
Please do provide me with ur valuble inputs as i ma encountering with some issues while installing.
Cluster Creation
Log file - /var/cluster/logs/install/scinstall.log.1312
Testing for "/globaldevices" on "mac1" ... done
Testing for "/globaldevices" on "mac2" ... done
Started sccheck on "mac1".
Started sccheck on "mac2".
sccheck completed with no errors or warnings for "mac1".
sccheck completed with no errors or warnings for "mac2".
Configuring "mac2" ... done
Rebooting "mac2" ... done
(mac2 gets rebooted) but displays error message on the primary m/c unable to my own domain name (mac2) -- using shart name.......
Can anyone please help me in resolving this error...
@phani:
1) You only need to change the names of the transport interfaces if you install VMware Tools (that is an optional step; the rename is not needed if you won't install it).
2) Same as above. After you install VMware Tools, the installation process creates the files that start your network interfaces, named /etc/hostname.vmxnet[X]. By looking at these files, you will know the network names/instances.
To be honest, I'm not sure if this step is mandatory, but I thought it useful to add it to the tutorial, since VMware Tools provides good hardware/software improvements.
@Anonymous:
Please read the answers above (the VMwareTools step is optional - so you won't need to rename/change your interfaces)
You only need to do that if you are going to install VMwareTools
@Yogish:
Have you added the hostnames/IP addresses for both servers on /etc/hosts? Please post your error messages.
Sorry all for the huge delay in replying to messages (been travelling and working a lot).
Cheers :)
i have installed sun solaries 10 u9 (32 bit).when try to install sun cluster 3.2u2 using installer ,its throughing error "you can invoke to install on X86-64 platform".how to install sun cluster software? please help me.
I've sent a request to Matthias Pfuetzner and he said that "since acquisition by Oracle, he can't give out SUNWscka" and suggested OpenSolaris HA or running 64-bit Solaris.
Running 64-bit over 32-bit OS isn't directly available over my working laptop.. :(
Am Anand
I have a sent a request to Mr.Matthias. he replied me (ie below)
Anand Raj,
after the acquistion of Sun by Oracle I'm no longer capable of handing out
these modules.
Sorry!
Matthias
So i kindly req u send the SUNWscka Package & lets i will start SC 3.2
Thanks & Regards
Anandraj
anandraj.msc@gmail.com
Am anand
I've sent a request to Matthias Pfuetzner and he said that "since acquisition by Oracle, he can't give out SUNWscka"
I need the SUNWscka Pkg to proceed further
Thanks & regards
Anadraj
anandraj.msc@gmail.com
Hi
It has been a very difficult process to setting up solaris cluster on 2 ESX boxes with a shared storage.
I had lots of failures at the end I thought that pain finished but...
cluster software installed successfully and rebooted the 2nd node ok, and rebooted itself then everytime solaris login screen comes 5 seconds responds to my ping then it goes down. If you try to type anything on the screen the node reboots itself.
there is no error log or anything.
If i boot them in non cluster mode they work just fine!
I have setup a shared disk between them with a different scsi adapter and set them to PHYSICAL to make it shared accross boxes. see 1.png
network communications seems to be ok. i couldnt make it work with 3 ethernet adapter so i use 2 instead 3.
I use custom cluster installation instead typical.
I wonder what am i missing??
is there anyone out there setted up Solaris Cluster on Vmware ESX 4.1 accross boxes using SAN ?
I do aware that this config is not supported but some guys out there make it happen!
Please let me know if you know anything about reboot issue on cluster mode.
I have installed everything and installation rebooted 2 nodes one by one. After that solaris logon screen came up and when i try to type anything on logon screen, solaris reboots. it is same for both nodes.
Then i tried to restart solaris in non cluster mode and it works just fine.
I have disabled the memory trim rate with adding comment on vmx file
MemTrimRate = "0"
I wonder what i have done wrong that system keeps rebooting all the time without any error message.
Please help thanks!
did u get past this? i have same problem
I am not able to register disk group please suggest
root@sun001 #cldevicegroup create -t vxvm -n sun001,sun002 -p failback=true nfsdg
cldevicegroup: Disk "c2t000iqn.1992-08.Com.nEtApp%3Asn.99932695FFFFd0" in VxVM disk group "nfsdg" is not found in the Sun Cluster global-devices namespace.
cldevicegroup: (C579008) Inconsistencies are detected in the device group configuration.
cldevicegroup: (C527552) Creation of device group "nfsdg" failed.
root@sun001 #
Hi Rodrigo,
Thanks for your perfect blog! I have problem on 3 steps and solve, maybe helps visitors;
1-Make sure you install Solaris 64 bit othervise it doesnt work :) ( Look for your bios settings and set VT enable ,than poweroff your laptop. Boot it and continue )
2-Analysis: It seems that /globaldevices file system was not created with 'newfs -i 512' because the 'total blocks' of /globaldevices are higher than the 'free
files'. Ignore this warning if /globaldevices is not planned to be used as global file system.
I got this error message and enter command newfs -i 512 /dev/dsk/c0d0s7 but still gave me same error. I use lofi instead of globaldevices but i dont know how to solve this problem still. ( it always higher than free files)
3-***Use Custome than select adapter one by one , if you use typical, your cluster wont work :)
---------------
Question;vmxnet0 is in IPMP group but its only a single interface. What is the reason of single interface beeing in IPMP group ? Must we add 4 interface before being installation? Hearbeat interfaces not in ipmp group? ( vmxnet1-2)
Is this normal?
Thanks for your opinions.
Fuat