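A few key points first. First, Oracle's official installation documents (docs 2 and 3 below, which this guide cites repeatedly) contain quite a few errors; the latest Release Notes (as of August 2006) correct some of them, but many remain. Second, read the three scripts provided here carefully before running them; I accept no responsibility if misuse damages your system. Finally, the installation steps follow the official documents as closely as possible, with one exception: the method given in the "Configuring UDP parameters" section does not work at all, so a different solution is used here.
This is a complete guide to installing Oracle 10g RAC (Enterprise Edition) with Oracle Clusterware on Solaris 9 (SPARC, 64-bit). The installation draws mainly on the following three documents: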
1. Oracle Database Release Notes 10g Release 2 (10.2) for Solaris Operating System (SPARC 64-Bit) Part Number B15689-03
2. Oracle Clusterware and Oracle Real Application Clusters Installation Guide 10g Release 2 (10.2) for Solaris Operating System Part Number B14205-06
3. Oracle Database Installation Guide 10g Release 2 (10.2) for Solaris Operating System (SPARC 64-Bit) Part Number B15690-02
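This setup is better suited to a test environment, because the hardware is modest: two Ultra-2 Enterprise servers with dual 300 MHz CPUs and 1 GB of memory, and a Sun A1010 disk array for storage. Each node has a Sun Fibre Channel host adapter (X1057A) connected to the A1010, and a Sun Multi-Pack is attached to one of the nodes. The heartbeat network uses 10BASE-T NICs connected directly with a crossover cable; doc 2 says Oracle 10g does not support crossover cables, but this environment has no alternative. Both nodes are fitted with an SBus frame buffer (X359A).
For the operating system, both nodes run Solaris 9 4/04 (64-bit) with the latest patch cluster. TCP Wrappers and NTP, both bundled with Solaris 9, are configured. In addition, openssh-4.3p2-sol9-sparc-local is installed, and the systems were hardened following the SANS Solaris security hardening guide.
The installation media is Oracle Database 10g Release 2 (10.2.0.1.0) Enterprise Edition for Solaris SPARC 64-bit. If you can, download 10.2.0.2 or later directly, which saves installing a large number of patches. Because the Ultra-2 has no DVD drive, the two DVDs were copied to the local hard disk. So that node 2 can also run the Cluster Verification Utility (CVU), you can set up an NFS server on node 1 and configure node 2 as an automount client.
The overall installation and configuration flow is:
1. Perform the pre-installation tasks
2. Install Oracle Clusterware
3. Test/verify the Clusterware installation
4. Install the Oracle Database software only (with the RAC option)
5. Configure ASM and make sure it runs on both nodes
6. Create the RAC database with DBCA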
------------------------------------------------------------------------------------------
Component     Required  Actual  Type  Mount point       Partition(s)
------------------------------------------------------------------------------------------
DB software   4GB       4GB     UFS   /u01/app/oracle   c0t1d0s0 (not shared)
DB datafiles  1.2GB     12GB    ASM   -                 c3t0d2s7,c3t1d2s7,c3t3d2s7
OCR           100MB     128MB   raw   -                 c3t0d4s6,c3t2d4s6,c3t4d4s6
Voting disk   20MB      32MB    raw   -                 c3t0d4s7,c3t2d4s7,c3t4d4s7
swap          400MB     1GB     -     -                 c0t0d0s1
------------------------------------------------------------------------------------------
Clusterware will be installed in its own dedicated home directory on node 1. The OCR and voting disks use "normal" redundancy. Because this is only a test system, no space is allocated for the flash recovery area or for archived logs.
Pre-Installation Tasks
A. Network configuration
This part has to be done by hand. Sample configuration files for node 1 follow:
/etc/hosts
127.0.0.1     localhost
# node1
192.168.1.64  rac1       rac1.abc.com       loghost
10.10.10.1    rac1-priv  rac1-priv.abc.com
192.168.1.69  rac1-vip   rac1-vip.abc.com
# node2
192.168.1.77  rac2       rac2.abc.com
10.10.10.2    rac2-priv  rac2-priv.abc.com
192.168.1.76  rac2-vip   rac2-vip.abc.com
/etc/inet/netmasks
192.168.0.0   255.255.255.0
10.10.10.0    255.255.255.0
/etc/hostname.hme0 contains: rac1
/etc/hostname.le0 contains: 10.10.10.1
/etc/defaultrouter contains: 192.168.1.1
/etc/hosts.allow contains: ALL: 192.168.1. 127.0.0.1 10.
Then run the following commands to bring up the le0 network interface and test it:
# chown root:root /etc/hostname.le0
# ifconfig le0 plumb
# ifconfig le0 10.10.10.1 netmask 255.255.255.0 up
# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.1.64 netmask ffffff00 broadcast 192.168.1.255
        ether 8:0:20:82:47:d4
le0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 10.10.10.1 netmask ffffff00 broadcast 10.10.10.255
        ether 8:0:20:82:47:d4
#
To check the network settings, you can run CVU now or wait until all the tasks are finished. The command is:
/ora10.dvd2/clusterware/cluvfy/runcluvfy.sh comp nodecon -n rac1,rac2 -verbose
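Before running CVU, it can help to confirm that /etc/hosts on each node really lists the public, private, and VIP names shown in the sample file. The following is a minimal sketch, not part of Oracle's tooling: `check_hosts_entries` is a hypothetical helper, it assumes a POSIX shell and a POSIX awk (on Solaris 9 that probably means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk), and it assumes the `-priv` and `-vip` naming used in this guide.

```shell
#!/bin/sh
# Sketch: confirm that a hosts file names every address the cluster needs.
# check_hosts_entries is a hypothetical helper, not an Oracle or Solaris tool.
# Usage: check_hosts_entries HOSTS_FILE NODE [NODE...]
check_hosts_entries() {
    hosts_file=$1
    shift
    missing=0
    for node in "$@"; do
        # Each node needs a public name, a private name, and a VIP name.
        for name in "$node" "$node-priv" "$node-vip"; do
            # Strip comments, then look for the name as a whole field
            # following the address in column 1.
            if sed 's/#.*//' "$hosts_file" |
                awk -v n="$name" '
                    { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                    END { exit !found }'
            then
                :
            else
                echo "missing: $name"
                missing=1
            fi
        done
    done
    return $missing
}
```

A call such as `check_hosts_entries /etc/hosts rac1 rac2` prints one "missing:" line per absent name and returns non-zero if any are missing.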
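B. Other pre-installation tasks (except SSH configuration)
I wrote a shell script, `pre.install.ora10.conf.sh`, to automate most of this work. It performs most of the pre-installation tasks for both Clusterware and the Oracle database. Note that it is intended only for a freshly installed Solaris 9 system and must be run as root on each of the two nodes.
The script is as follows: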
#!/bin/sh
# Pre-installation conf. on Solaris9 for installing Oracle 10g R2 with RAC.
# Written by susbin@chinaunix.net 072406
sc_name=pre.install.ora10.conf.sh
ORACLE_BASE=/u01/app/oracle; export ORACLE_BASE
CRS_BASE=/u01/crs/oracle/product/10; export CRS_BASE
ORACLE_HOME=${CRS_BASE}/crs; export ORACLE_HOME
PATH=$PATH:/usr/ccs/bin:/usr/local/bin; export PATH
echo ===============================================================
echo $sc_name started at `date`.
echo " "
echo " "
echo "=============================================="
echo "Creating Required Operating System Groups and Users"
echo " "
echo "Creating groups: dba, osdba, and oinstall."
groupadd -g 201 dba
groupadd -g 202 oinstall
groupadd -g 203 osdba
echo "Check them with the command: grep 20 /etc/group"
grep 20 /etc/group
echo " "
echo "Check if \"nobody\" exists on the system with: id nobody"
echo ""
id -a nobody
echo " "
echo "Creating the directory \"ORACLE_BASE\", which is set to $ORACLE_BASE"
mkdir -p $ORACLE_BASE
echo "Check it with the command: ls -l /u01/app "
echo ""
ls -l /u01/app
echo " "
echo "Creating a user account \"oracle\" and set the password of it:"
useradd -u 1005 -g 202 -G 201,203 -d $ORACLE_BASE -m -s /bin/ksh oracle
echo "Check the line in /etc/passwd with: grep oracle /etc/passwd"
grep oracle /etc/passwd
echo "Set the password of account oracle:"
passwd -r files oracle
chown -R oracle:oinstall ${ORACLE_BASE}
chmod -R 775 $ORACLE_BASE
echo " "
echo "Check if the oracle account has required groups with: id -a oracle "
echo " "
id -a oracle
echo " "
echo " "
echo "=============================================="
echo "Configuring Kernel Parameters"
echo " "
echo "Save a copy of /etc/system and append nine lines to it."
echo "Need to reboot the system so the new parameters can take effect."
cp -p /etc/system /etc/system.orig
chmod 644 /etc/system
/bin/cat << EOF >> /etc/system
set noexec_user_stack=1
set semsys:seminfo_semmni=100
set semsys:seminfo_semmns=1024
set semsys:seminfo_semmsl=256
set semsys:seminfo_semvmx=32767
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
EOF
echo " "
echo "Check /etc/system with the command: tail -9 /etc/system"
tail -9 /etc/system
echo " "
echo " "
echo "=============================================="
echo "Identifying Required Software Directories"
echo "ORACLE_BASE is set to $ORACLE_BASE, and the size of it should be 3GB or bigger."
echo "Check it with the command: $ df -h $ORACLE_BASE"
echo " "
mount /dev/dsk/c0t1d0s0 $ORACLE_BASE
df -h $ORACLE_BASE
echo " "
echo " "
echo "=============================================="
echo "Configuring the Oracle User's Environment"
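# Quote the delimiter so $PATH and ${CRS_BASE} are written literally and
# expanded at login time, not when this script runs as root.
/bin/cat << 'EOF' > ${ORACLE_BASE}/.profile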
if [ -t 0 ]; then
stty intr ^C
fi
umask 022
ORACLE_BASE=/u01/app/oracle; export ORACLE_BASE
# for crs
CRS_BASE=/u01/crs/oracle/product/10; export CRS_BASE
ORACLE_HOME=${CRS_BASE}/crs; export ORACLE_HOME
PATH=$PATH:/usr/local/bin:.:/bin:/usr/sbin:/usr/ucb; export PATH
# end of crs
# for oraDB
#ORACLE_SID=rac1; export ORACLE_SID
# end of oraDB
EDITOR=vi; export EDITOR
EXINIT='set nu showmode'; export EXINIT
EOF
chown -R oracle:oinstall ${ORACLE_BASE}
echo " "
echo " "
echo "=============================================="
echo "Configuring oracle clusterware home directory, which is set to"
echo " ${CRS_BASE}/crs "
mkdir -p ${CRS_BASE}/crs
chown -R root:oinstall /u01/crs
chmod -R 775 /u01/crs/oracle
echo " "
ls -l $CRS_BASE
echo ""
echo "=============================================="
echo "Configuring UDP parameters by creating S70ndd under /etc/rc2.d"
echo "to set the two ndd values to 65536."
echo " "
/bin/cat << EOF > /etc/rc2.d/S70ndd
#!/sbin/sh
PATH=/usr/sbin;export PATH
ndd -set /dev/udp udp_recv_hiwat 65536
ndd -set /dev/udp udp_xmit_hiwat 65536
exit 0
EOF
chown root:sys /etc/rc2.d/S70ndd
chmod 755 /etc/rc2.d/S70ndd
echo "Check the S70ndd with \"ls -l S70ndd\" and \"cat S70ndd\" "
echo " "
ls -l /etc/rc2.d/S70ndd
echo " "
cat /etc/rc2.d/S70ndd
echo " "
echo "=============================================="
echo "Verify that the /etc/hosts file is used for name resolution"
echo "with the command: grep hosts: /etc/nsswitch.conf | grep files "
echo " "
grep hosts: /etc/nsswitch.conf | grep files
echo ""
echo "=============================================="
echo "Verify that the host name has been set with: hostname"
echo " "
hostname
echo ""
echo "=============================================="
echo "Verify that the domain name has NOT been set with: domainname"
echo " "
domainname
echo ""
echo "=============================================="
echo "Verify that the hosts file contains the fully qualified host name"
echo "with the command: grep `eval hostname` /etc/hosts "
echo " "
grep `eval hostname` /etc/hosts
echo " "
echo "The pre-installation configuration tasks are done on this node."
echo "Reboot the system so the new parameters can take effect."
echo " "
echo $sc_name ended at `date`.
echo ===============================================================
After the system reboots, log in as the oracle user and run the following commands on all nodes to check the kernel parameters:
/usr/sbin/sysdef | grep SEM
/usr/sbin/sysdef | grep SHM
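As a complement to reading the sysdef output by eye, the settings file itself can be checked for the lines the script appends. This is only a sketch: `check_system_params` is a hypothetical helper, the parameter list is copied from the script above, and it assumes a grep that supports `-F` and `-x` (on Solaris 9 probably /usr/xpg4/bin/grep). It requires each line to match exactly, so extra whitespace would be reported as missing.

```shell
#!/bin/sh
# Sketch: verify that /etc/system (or a copy) contains every kernel
# setting appended by pre.install.ora10.conf.sh.
# check_system_params is a hypothetical helper name.
# Usage: check_system_params FILE
check_system_params() {
    system_file=$1
    rc=0
    for param in \
        noexec_user_stack=1 \
        semsys:seminfo_semmni=100 \
        semsys:seminfo_semmns=1024 \
        semsys:seminfo_semmsl=256 \
        semsys:seminfo_semvmx=32767 \
        shmsys:shminfo_shmmax=4294967295 \
        shmsys:shminfo_shmmin=1 \
        shmsys:shminfo_shmmni=100 \
        shmsys:shminfo_shmseg=10
    do
        # -F: fixed string, -x: the whole line must match.
        if grep -Fx "set $param" "$system_file" >/dev/null; then
            :
        else
            echo "not set: $param"
            rc=1
        fi
    done
    return $rc
}
```

Running `check_system_params /etc/system` on each node returns zero only when all nine lines are present.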
Then check the oracle account's environment variables to make sure they are correct for installing Clusterware:
$ env | grep ORACLE
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/crs/oracle/product/10/crs
$
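C. Configure SSH on all cluster nodes
The Sun_SSH_1.1 bundled with Solaris 9 has a bug, which has been discussed on the Sun forums. In addition, the SSH configuration instructions in doc 2 are based on OpenSSH 3.x, and doc 2 states explicitly that Oracle NetCA and DBCA require scp and ssh to be located in /usr/local/bin. For these two reasons I installed openssh-4.3p2-sol9-sparc-local on both nodes. You also need to set "StrictModes" to "no" in /usr/local/etc/sshd_config; otherwise ssh will still prompt for a password even after everything else is configured.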
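To see which StrictModes value sshd will actually use, the configuration file can be scanned for the first uncommented occurrence of the keyword. The sketch below rests on two properties of OpenSSH as I understand them: the first value found for a keyword wins, and StrictModes defaults to "yes" when absent. `strict_modes_setting` is a hypothetical helper, and it assumes an awk that provides `tolower` (nawk or /usr/xpg4/bin/awk on Solaris 9).

```shell
#!/bin/sh
# Sketch: print the effective StrictModes value ("yes" or "no") from an
# sshd_config file. strict_modes_setting is a hypothetical helper name.
# Usage: strict_modes_setting /usr/local/etc/sshd_config
strict_modes_setting() {
    # Keywords are case-insensitive; commented lines have "#" in field 1
    # and therefore never match.
    awk 'tolower($1) == "strictmodes" { print tolower($2); found = 1; exit }
         END { if (!found) print "yes" }' "$1"
}
```

If this prints "yes" on either node, edit the file and restart sshd before testing key-based logins.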
I wrote two scripts to configure SSH. Use them as follows:
1. Copy ssh.conf1.ksh to the oracle user's home directory on all nodes.
2. Run ssh.conf1.ksh on node 1.
3. Edit ssh.conf1.ksh on node 2, then run it.
4. Run ssh.conf2.ksh on node 1.
5. On all nodes, run: chmod 600 .ssh/authorized_keys
6. Test the SSH configuration with: ssh node1 [node2] date
The script `ssh.conf1.ksh` is as follows:
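The final test in the steps above can be scripted so that a remaining password prompt shows up as a failure instead of hanging. This is a sketch under stated assumptions: `check_ssh_equivalence` and the `SSH_CMD` variable are hypothetical names, and the default path assumes the /usr/local/bin OpenSSH install described earlier. The `BatchMode=yes` option is a standard OpenSSH client option that disables password prompting.

```shell
#!/bin/sh
# Sketch: check that passwordless ssh works to every listed host.
# SSH_CMD and check_ssh_equivalence are hypothetical names.
SSH_CMD=${SSH_CMD:-/usr/local/bin/ssh}

# Usage: check_ssh_equivalence HOST [HOST...]
check_ssh_equivalence() {
    rc=0
    for host in "$@"; do
        # With BatchMode=yes, ssh fails rather than asking for a password.
        if "$SSH_CMD" -o BatchMode=yes "$host" date >/dev/null 2>&1; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            rc=1
        fi
    done
    return $rc
}
```

Run `check_ssh_equivalence rac1 rac2` as the oracle user on each node. The first connection to a new host may still fail until its host key has been accepted once interactively.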
#!/bin/ksh
# Run this script as user oracle on node1, and then on node2.
# Make sure the package ssh is installed under /usr/local.
# Written by susbin@chinaunix.net 071906
# Put the hostname of the two nodes below
node1=rac1
node2=rac2
sc_name=ssh.conf1.ksh
home_dir=/u01/app/oracle
key_dir=${home_dir}/.ssh
ssh_base=/usr/local/bin
echo ================================================================
echo $sc_name started at `date`.
echo " "
echo "You need to run this script on $node1 and $node2."
echo "Make changes on this script before you run it on $node2."
echo " "
/bin/rm -rf $key_dir
/bin/mkdir $key_dir
/bin/chmod 700 $key_dir
${ssh_base}/ssh-keygen -t rsa
echo " "
${ssh_base}/ssh-keygen -t dsa
/bin/touch ${key_dir}/authorized_keys
echo " "
echo "Now save the keys into the file authorized_keys."
echo " "
## comment out the lines when you run it on node2.
${ssh_base}/ssh $node1 cat ${key_dir}/id_rsa.pub >> ${key_dir}/authorized_keys
${ssh_base}/ssh $node1 cat ${key_dir}/id_dsa.pub >> ${key_dir}/authorized_keys
## end of the lines
## uncomment the lines below when you run it on node2.
#${ssh_base}/ssh $node2 cat ${key_dir}/id_rsa.pub >> ${key_dir}/authorized_keys
#${ssh_base}/ssh $node2 cat ${key_dir}/id_dsa.pub >> ${key_dir}/authorized_keys
#${ssh_base}/ssh $node1 cat ${key_dir}/id_rsa.pub >> ${key_dir}/authorized_keys
#${ssh_base}/ssh $node1 cat ${key_dir}/id_dsa.pub >> ${key_dir}/authorized_keys
#${ssh_base}/scp ${key_dir}/authorized_keys ${node1}:${key_dir}/
## end of the lines
echo " "
echo "It is done."
echo " "
echo $sc_name ended at `date`.
echo ==============================================================
The script `ssh.conf2.ksh` is as follows:
#!/bin/ksh
# Run this script after you have run ssh.conf1.ksh on both nodes.
# Run this script as user oracle on node1 only.
# Written by susbin@chinaunix.net 071906
# Put the hostname of the two nodes below
node1=rac1
node2=rac2
sc_name=ssh.conf2.ksh
home_dir=/u01/app/oracle
key_dir=${home_dir}/.ssh
ssh_base=/usr/local/bin
echo ===========================================================
echo $sc_name started at `date`.
echo " "
echo "You only need to run this script on $node1."
echo " "
${ssh_base}/ssh $node2 cat ${key_dir}/id_rsa.pub >> ${key_dir}/authorized_keys
${ssh_base}/ssh $node2 cat ${key_dir}/id_dsa.pub >> ${key_dir}/authorized_keys
${ssh_base}/scp ${key_dir}/authorized_keys ${node2}:${key_dir}/
echo " "
echo "You need to run command \"/bin/chmod 600 ${key_dir}/authorized_keys\" "
echo "on all nodes and then test the ssh configuration with command "
echo " \"ssh node1 [node2] date \" "
echo " "
echo $sc_name ended at `date`.
echo ============================================================
echo " "
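## "exec ${ssh_base}/ssh-agent $SHELL" replaces this script with a new shell
## started under ssh-agent; the rest of your login session runs in that shell.
## Nothing placed after the exec line would ever be executed, so run
## "${ssh_base}/ssh-add" by hand once the new shell has started.
exec ${ssh_base}/ssh-agent $SHELL
## end of ssh.conf2.ksh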
D. Configure storage for Clusterware and the database (ASM installation)
After installing the host adapter (X1057A) on both nodes, run the `format` command and make sure the shared disks have the same controller number on both nodes.
Then partition the disks on node 1. Each disk that ASM will use needs a single slice spanning the whole disk, and that slice must start at cylinder 1; otherwise ASM will not recognize the disk as a candidate disk.
# format
...
selecting c3t0d2
[disk formatted]
format> ...
Free Hog partition[6]? 7
Enter size of partition '0' [0b, 0c, 0.00mb, 0.00gb]: 1c
Enter size of partition '1' [0b, 0c, 0.00mb, 0.00gb]: 0
Enter size of partition '3' [0b, 0c, 0.00mb, 0.00gb]: 0
Enter size of partition '4' [0b, 0c, 0.00mb, 0.00gb]: 0
Enter size of partition '5' [0b, 0c, 0.00mb, 0.00gb]: 0
Enter size of partition '6' [0b, 0c, 0.00mb, 0.00gb]: 0
partition> p
Current partition table (sun4g):
Total disk cylinders available: 3880 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0 unassigned    wm       0 -    0        1.05MB    (1/0/0)       2160
  1 unassigned    wu       0               0         (0/0/0)          0
  2     backup    wu       0 - 3879        4.00GB    (3880/0/0) 8380800
  3 unassigned    wu       0               0         (0/0/0)          0
  4 unassigned    wu       0               0         (0/0/0)          0
  5 unassigned    wu       0               0         (0/0/0)          0
  6 unassigned    wm       0               0         (0/0/0)          0
  7 unassigned    wu       1 - 3879        4.00GB    (3879/0/0) 8378640

partition>
Okay to make this the current partition table[yes]? yes
...
#
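The rule that the ASM slice must not begin at cylinder 0 can be checked from `prtvtoc` output instead of re-entering `format`. The sketch below is an illustration only: both function names are hypothetical, and it assumes the usual prtvtoc layout in which header lines start with "*" and data lines carry the slice number in column 1 and the first sector in column 4. It also assumes a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris 9).

```shell
#!/bin/sh
# Sketch: inspect prtvtoc output (read from stdin) for one slice.
# Both helper names are hypothetical.

# Print the first sector of slice $1; fail if the slice is not listed.
slice_first_sector() {
    awk -v s="$1" '$1 == s { print $4; found = 1 }
                   END { exit !found }'
}

# Succeed if slice $1 starts past sector 0, i.e. not on cylinder 0.
# Returns 2 when the slice does not appear in the table at all.
slice_starts_past_cylinder0() {
    first=`slice_first_sector "$1"` || return 2
    [ "$first" -gt 0 ]
}
```

For the layout shown above, `prtvtoc /dev/rdsk/c3t0d2s2 | slice_starts_past_cylinder0 7` should succeed, since slice 7 begins one cylinder into the disk.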
Next, copy the partition table from c3t0d2 to the other disks:
# for disks in c3t1d2s0 c3t3d2s0
> do
> prtvtoc /dev/rdsk/c3t0d2s0 | fmthard -s - /dev/rdsk/$disks
> done
fmthard: New volume table of contents now in place.
fmthard: New volume table of contents now in place.
#
Then partition the disks that will hold the OCR and the voting disk. It is good practice to put them on slices 3-7; slice 0 is not a good choice.
# format
...
selecting c3t0d4
[disk formatted]
...
partition> p
Current partition table (sun2g):
Total disk cylinders available: 2733 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0 unassigned    wm       0 - 2515        1.82GB    (2516/0/0) 3824320
  1 unassigned    wu       0               0         (0/0/0)          0
  2     backup    wu       0 - 2732        1.98GB    (2733/0/0) 4154160
  3 unassigned    wu       0               0         (0/0/0)          0
  4 unassigned    wu       0               0         (0/0/0)          0
  5 unassigned    wu       0               0         (0/0/0)          0
  6 unassigned    wm    2516 - 2688      128.40MB    (173/0/0)   262960
  7 unassigned    wu    2689 - 2732       32.66MB    (44/0/0)     66880

partition> q
...
#
Likewise, copy the partition table to the other OCR/voting disks:
# prtvtoc /dev/rdsk/c3t0d4s0 | fmthard -s - /dev/rdsk/c3t2d4s2
fmthard: New volume table of contents now in place.
# prtvtoc /dev/rdsk/c3t0d4s0 | fmthard -s - /dev/rdsk/c3t4d4s2
fmthard: New volume table of contents now in place.
#
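After copying a partition table, one way to confirm that the copy took is to save the `prtvtoc` output of each disk to a file and compare the files with the comment header removed, since the header names the device and so always differs. The following is a rough sketch with hypothetical helper names; it treats any difference in the data lines, including spacing, as a mismatch.

```shell
#!/bin/sh
# Sketch: compare two saved prtvtoc listings, ignoring "*" header lines.
# vtoc_body and same_vtoc are hypothetical helper names.

# Print a saved listing without its "*" comment lines.
vtoc_body() {
    sed '/^\*/d' "$1"
}

# Succeed only if both listings have identical, non-empty slice tables.
same_vtoc() {
    a=`vtoc_body "$1"`
    b=`vtoc_body "$2"`
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, save `prtvtoc /dev/rdsk/c3t0d4s2` and `prtvtoc /dev/rdsk/c3t2d4s2` to two files (the file names are up to you) and pass both to `same_vtoc`.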
Finally, on all nodes, set the owner, group, and permissions of these raw devices (the ASM, OCR, and voting disk slices):
# cd /
# for rawdevs in c3t0d2s7 c3t1d2s7 c3t3d2s7 \
> c3t0d4s6 c3t2d4s6 c3t4d4s6 c3t0d4s7 c3t2d4s7 c3t4d4s7
> do
> echo $rawdevs; chown oracle:dba /dev/rdsk/$rawdevs; chmod 660 /dev/rdsk/$rawdevs
> ls -l `ls -l /dev/rdsk/$rawdevs | awk -F" " '{ print $11 }'`
> done
c3t0d2s7
crw-rw---- 1 oracle dba 118, 63 Jul 25 10:54 ../../devices/sbus@1f,0/SUNW,soc@0,0/SUNW,pln@a0000000,752ee9/ssd@1,2:h,raw
c3t1d2s7
crw-rw---- 1 oracle dba 118,127 Jul 21 12:55 ../../devices/sbus@1f,0/SUNW,soc@0,0/SUNW,pln@a0000000,752ee9/ssd@3,0:h,raw
c3t3d2s7
crw-rw---- 1 oracle dba 118,143 Jul 25 10:55 ../../devices/sbus@1f,0/SUNW,soc@0,0/SUNW,pln@a0000000,752ee9/ssd@3,2:h,raw
c3t0d4s6
crw-rw---- 1 oracle dba 118, 38 Aug 9 12:07 ../../devices/sbus@1f,0/SUNW,soc@0,0/SUNW,pln@a0000000,752ee9/ssd@0,4:g,raw
c3t2d4s6
crw-rw---- 1 oracle dba 118,118 Aug 16 10:21 ../../devices/sbus@1f,0/SUNW,soc@0,0/SUNW,pln@a0000000,752ee9/ssd@2,4:g,raw
c3t4d4s6
crw-rw---- 1 oracle dba 118,198 Jul 25 10:55 ../../devices/sbus@1f,0/SUNW,soc@0,0/SUNW,pln@a0000000,752ee9/ssd@4,4:g,raw
c3t0d4s7
crw-rw---- 1 oracle dba 118, 39 Aug 16 10:19 ../../devices/sbus@1f,0/SUNW,soc@0,0/SUNW,pln@a0000000,752ee9/ssd@0,4:h,raw
c3t2d4s7
crw-rw---- 1 oracle dba 118,119 Aug 16 10:19 ../../devices/sbus@1f,0/SUNW,soc@0,0/SUNW,pln@a0000000,752ee9/ssd@2,4:h,raw
c3t4d4s7
crw-rw---- 1 oracle dba 118,199 Aug 16 10:19 ../../devices/sbus@1f,0/SUNW,soc@0,0/SUNW,pln@a0000000,752ee9/ssd@4,4:h,raw
#
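Because the ownership and mode must be right on every node, the check can be repeated mechanically rather than by reading the listing. This is a minimal sketch: `check_dev_perms` is a hypothetical helper, it parses `ls -ldL` output (the `-L` follows the /dev/rdsk symlink to the device node), and it expects exactly `rw-rw----` for the permission bits.

```shell
#!/bin/sh
# Sketch: verify owner, group, and rw-rw---- mode on a list of paths.
# check_dev_perms is a hypothetical helper name.
# Usage: check_dev_perms OWNER GROUP PATH [PATH...]
check_dev_perms() {
    want_owner=$1
    want_group=$2
    shift 2
    rc=0
    for path in "$@"; do
        # Field 1 is the mode string; skip its first character (file type)
        # and keep the nine permission characters. Fields 3 and 4 are the
        # owner and group.
        info=`ls -ldL "$path" 2>/dev/null |
              awk '{ print substr($1, 2, 9), $3, $4 }'`
        if [ "$info" = "rw-rw---- $want_owner $want_group" ]; then
            echo "ok: $path"
        else
            echo "BAD: $path ($info)"
            rc=1
        fi
    done
    return $rc
}
```

A call such as `check_dev_perms oracle dba /dev/rdsk/c3t0d2s7 /dev/rdsk/c3t0d4s6` returns non-zero if any device has a different owner, group, or mode.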
On node 1, run CVU to check that all the shared disks are accessible from every node:
$ cd /ora10.dvd2/clusterware/cluvfy
$ ./runcluvfy.sh comp ssa -n rac1,rac2 -s \
> /dev/rdsk/c3t0d2s7,/dev/rdsk/c3t1d2s7,/dev/rdsk/c3t3d2s7,\
> /dev/rdsk/c3t0d4s6,/dev/rdsk/c3t2d4s6,/dev/rdsk/c3t4d4s6,\
> /dev/rdsk/c3t0d4s7,/dev/rdsk/c3t2d4s7,/dev/rdsk/c3t4d4s7

Verifying shared storage accessibility

Checking shared storage accessibility...

"/dev/rdsk/c3t0d2s7" is shared.
"/dev/rdsk/c3t1d2s7" is shared.
"/dev/rdsk/c3t3d2s7" is shared.
"/dev/rdsk/c3t0d4s6" is shared.
"/dev/rdsk/c3t2d4s6" is shared.
"/dev/rdsk/c3t4d4s6" is shared.
"/dev/rdsk/c3t0d4s7" is shared.
"/dev/rdsk/c3t2d4s7" is shared.
"/dev/rdsk/c3t4d4s7" is shared.

Shared storage check was successful on nodes "rac2,rac1".

Verification of shared storage accessibility was successful.
$
This completes the pre-installation tasks. The next step is to install Oracle Clusterware.
