DNS Configuration For SCAN
Oracle RAC and MTU
In a RAC environment, the UDP protocol is used for Cache Fusion. (BTW, UDP and RDS protocols are supported on UNIX platforms -- RDS on Exadata -- whereas Windows uses the TCP protocol.)
UDP is a higher-level protocol implemented over IP (UDP/IP). Cache Fusion uses the UDP protocol to send packets over the wire (Exadata uses the RDS protocol instead).
UDP protocol, fragmentation, and assembly
The Maximum Transmission Unit (MTU) size is the maximum packet size in bytes that a network interface can send over a network device. This size includes protocol information such as the TCP and IP headers, but it does not include the Ethernet header. This discussion is concerned with the MTU settings that can be set in the configuration files in SLES, for example in /etc/sysconfig/network/ifcfg-qeth-bus-ccw-0.0.0xxx for each device. The effective MTU size is also negotiated during connection creation with the other endpoint, and the MTU can be reduced to the capacity of that device.
When the MTU is smaller than the packet size, the data must be divided into MTU-sized packets. On the receiving side the packets must be reassembled, creating overhead that scales with the number of packets required. When the MTU is only slightly too small, bandwidth can also be wasted: if the MTU is slightly smaller than the average packet size, the second packet is almost empty.
An MTU size bigger than the packet size should have no impact on network bandwidth, provided all intermediate nodes on the path to the final destination support that MTU size, because only the real data are sent. However, the number of packets processed in parallel is calculated from the TCP/IP window size and the MTU size. When the window size stays constant and the MTU size is increased, this number decreases, which might lead to performance degradation in cases where the smaller MTU size would have been adequate.
Let us consider an example with MTU set to 3000 on a network interface. One 8K block transfer cannot be performed with just one IP packet, as the IP packet size (3000 bytes) is less than 8K. So one UDP transfer of 8K is fragmented into 3 IP packets and sent over the wire. On the receiving side, those 3 packets are reassembled to create one UDP buffer of size 8K. After the assembly, that UDP buffer is delivered to a UDP port of a UNIX process. Usually, a foreground process will be listening on that port to receive the UDP buffer.
Consider what happens if MTU is set to 9000 on the network interface: an 8K buffer can then be transmitted over the wire in just one IP packet. There is no need for fragmentation or reassembly with MTU=9000 as long as the block size is 8K or less. MTU=9000 is also known as a jumbo frame configuration. (But if the database block size is greater than the jumbo frame size, fragmentation and reassembly are still required. For example, a 16KB block with MTU=9000 is sent as one ~9K IP packet plus one ~7K IP packet.)
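A quick sanity check of the effective path MTU is a do-not-fragment ping between the private interfaces; a minimal sketch (assuming Linux iputils ping and the woravr02-priv name from the hosts file shown later in this post):
# 9000 MTU - 20 bytes IP header - 8 bytes ICMP header = 8972 bytes of payload
ping -M do -s 8972 -c 3 woravr02-priv
# with the default MTU of 1500, the equivalent test is -s 1472
If the interface or a switch in the path does not support jumbo frames, the ping fails with a "message too long" error instead of silently fragmenting.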
Fragmentation and reassembly are performed at the OS kernel layer, and hence it is the responsibility of the kernel and the stack below it to complete the fragmentation and assembly. Oracle code simply issues the send and receive system calls and passes in the buffers to populate.
UDP vs TCP
From a RAC point of view, UDP is the right choice over TCP for cache fusion traffic. With TCP/IP, every packet transfer has overhead: a connection must be set up, the packet sent, and the process must wait for a TCP acknowledgement before the send can be considered complete. In a busy RAC system we are talking about 2-3 milliseconds per packet transfer, and with TCP/IP we probably could not achieve that level of performance. With UDP, a packet transfer is considered complete as soon as the packet is sent, and error handling is done by Oracle code itself. As you know, a reliable network is a key to RAC stability: if nearly all packets (close to 100%) are delivered without drops, UDP is a good choice over TCP/IP for performance reasons.
If there are reassembly failures, that is a function of an unreliable network, the kernel, or something else, but it has nothing to do with the choice of the UDP protocol itself. Of course, RDS is better than UDP since the error handling is offloaded to the fabric, but it usually requires an InfiniBand fabric for a proper RDS setup. For that matter, VPN connections use the UDP protocol too.
What happens if an IP packet is lost, assuming MTU=3000 bytes?
In the wireshark captures taken with MTU=3000, you will see that each of the packets has a fragment offset. That fragment offset and the IP identification are used to reassemble the IP packets into one 8K UDP buffer. Think of 3 puzzle pieces, each piece with markings; the kernel uses those markings (offset and IP ID) to reassemble the packets. Now consider the case where one of the 3 packets never arrives: the kernel threads will keep the other 2 IP packets in memory for ipfrag_time seconds (30 by default, as the sysctl output below shows; the Linux kernel parameter net.ipv4.ipfrag_time controls this) before declaring a reassembly failure. Without the missing IP packet, the kernel cannot reassemble the UDP buffer, and so a reassembly failure is declared.
The Oracle foreground process will wait for 30 seconds (it used to be 300 seconds or so in older versions of RAC), and if the packet has not arrived within that timeout period, the FG process will declare a "gc cr block lost" and/or "gc current block lost" wait and re-request the block. Of course, the kernel memory allocated for IP fragmentation and assembly is constrained by the kernel parameters ipfrag_high_thresh and ipfrag_low_thresh, and low values for these parameters can lead to reassembly failures too (which is why it is important to follow all best practices from the RAC installation guides).
[root@west02 ~]# netstat -s |grep assem
453433 reassemblies required
43380 packets reassembled ok
3180797 packet reassembles failed -- increasing failures here typically coincide with 'gc block lost' waits and rising cluster waits
Oracle Clusterware & Oracle RAC instances rely on heartbeats for node membership. If network heartbeats are consistently dropped, instance or node eviction may occur. The symptoms below are therefore also relevant for node/instance evictions.
1. "gc cr block lost" / "gc current block lost" in top 5 or significant wait event
2. SQL traces report multiple gc cr requests / gc current request /
3. gc cr multiblock requests with long and uniform elapsed times
4. Poor application performance / throughput
5. Packet send/receive errors as displayed in ifconfig or vendor supplied utility
6. Netstat reports errors/retransmits/reassembly failures
SQL SCRIPT :
SELECT A.VALUE "GC BLOCKS LOST 1",
       B.VALUE "GC BLOCKS CORRUPT 1",
       C.VALUE "GC BLOCKS LOST 2",
       D.VALUE "GC BLOCKS CORRUPT 2"
FROM GV$SYSSTAT A,
GV$SYSSTAT B,
GV$SYSSTAT C,
GV$SYSSTAT D
WHERE A.INST_ID = 1
AND A.NAME = 'gc blocks lost'
AND B.INST_ID = 1
AND B.NAME = 'gc blocks corrupt'
AND C.INST_ID = 2
AND C.NAME = 'gc blocks lost'
AND D.INST_ID = 2
AND D.NAME = 'gc blocks corrupt';
The parameters in /proc/sys/net/ipv4 control various aspects of the network, including a parameter that controls the reassembly buffer size.
ipfrag_high_thresh specifies the maximum amount of memory used to reassemble IP fragments. When the memory used by fragments reaches ipfrag_high_thresh, old entries are removed until the memory used declines to ipfrag_low_thresh.
If the output of netstat shows an increasing number of IP fragment reassembly failures, we recommend increasing ipfrag_high_thresh. The impact can be significant: in some use cases, increasing this buffer space improved throughput from 32MB/sec to 80MB/sec.
To temporarily change the value of ipfrag_high_thresh, check the current values and write the new one as root:
[root@west01 ~]# cat /proc/sys/net/ipv4/ipfrag_high_thresh
4194304
[root@west01 ~]# cat /proc/sys/net/ipv4/ipfrag_low_thresh
3145728
Change Parameter :
[root@west01 ~]# echo "Value" > /proc/sys/net/ipv4/ipfrag_high_thresh
OR
[root@west02 ~]# sysctl -a|grep ipfrag
net.ipv4.ipfrag_high_thresh = 4194304
net.ipv4.ipfrag_low_thresh = 3145728
net.ipv4.ipfrag_max_dist = 64
net.ipv4.ipfrag_secret_interval = 0
net.ipv4.ipfrag_time = 30
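To make these settings survive a reboot, add them to /etc/sysctl.conf; a minimal sketch (the values below are illustrative only -- size them per the RAC best-practice notes for your platform):
# /etc/sysctl.conf -- illustrative values, not a tuning recommendation
net.ipv4.ipfrag_high_thresh = 16777216
net.ipv4.ipfrag_low_thresh = 15728640
Then apply without a reboot:
sysctl -p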
BTW, there are a few other reasons for 'gc lost packets' too. High CPU usage can also lead to 'gc lost packet' failures: the receiving process may not get enough CPU time to drain its buffers, the network buffers allocated for that process become full, and so the kernel drops incoming packets.
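A quick way to spot this condition is the UDP error counters from netstat; non-zero receive-buffer errors suggest the per-socket buffers are overflowing (counter names vary slightly by kernel version):
netstat -su | grep -i error
# look for "packet receive errors" / "receive buffer errors" in the Udp section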
It is probably better to explain these concepts visually, so I created a video. When you watch it, note that there is an HD button at the top of the video; play it in HD mode for a better learning experience.
Oracle Enterprise Session Border Controller (E-SBC) : connects disparate Internet Protocol (IP) communications networks while mitigating security threats, curing interoperability problems, and ensuring reliable communications. It protects and controls real-time voice, video, and Unified Communications (UC) as they traverse IP network borders.
Errors seen when buffer space is exhausted:
Process termination requested for pid 6440 [source = rdbms], [info = 2] [request issued by pid: 52630, uid: 211819489]
Errors in file /opt/oracle/app/orcl/diag/rdbms/regtxndb/regtxndb1/trace/regtxndb1_m000_52909.trc (incident=121694) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
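ORA-27301 "No buffer space available" generally points at exhausted socket buffer space; the kernel limits can be inspected as below (the minimum required values come from the Oracle installation guide, not from this sketch):
sysctl net.core.rmem_max net.core.wmem_max net.core.rmem_default net.core.wmem_default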
RAC Installation on 19c
1. Install ESXi host
2. Configure the DNS server
3. Install Openfiler storage
4. Install and configure RHEL 7.8
5. Configure the nodes
6. Grid setup
7. RDBMS setup
8. DBCA
9. Patch the database
10. Validations
1. Network Interfaces :
. Private network
. Public network
. Optional interface for internet access (depends on requirement)
-- /etc/hosts : the SCAN IPs must resolve in DNS
#----------------------------------woravr-----------------------------#
#Public
192.168.45.101 woravr01.vishalbh.com woravr01
192.168.45.102 woravr02.vishalbh.com woravr02
192.168.45.103 woravr03.vishalbh.com woravr03
#Private
192.168.46.101 woravr01-priv.vishalbh.com woravr01-priv
192.168.46.102 woravr02-priv.vishalbh.com woravr02-priv
192.168.46.103 woravr03-priv.vishalbh.com woravr03-priv
#vip
192.168.45.104 woravr01-vip.vishalbh.com woravr01-vip
192.168.45.105 woravr02-vip.vishalbh.com woravr02-vip
192.168.45.106 woravr03-vip.vishalbh.com woravr03-vip
#scan
192.168.45.107 woravrscan.vishalbh.com woravrscan
192.168.45.108 woravrscan.vishalbh.com woravrscan
192.168.45.109 woravrscan.vishalbh.com woravrscan
#----------------------------------woravr-----------------------------#
--/etc/resolv.conf
# Generated by NetworkManager
search vishalbh.com #--- search domain
nameserver 192.168.45.2 #--- your domain name server (DNS)
More DNS configuration details :
[root@woravr02 ~]# nslookup woravrscan.vishalbh.com
Server: 192.168.45.2
Address: 192.168.45.2#53
Name: woravrscan.vishalbh.com
Address: 192.168.45.107
Name: woravrscan.vishalbh.com
Address: 192.168.45.108
Name: woravrscan.vishalbh.com
Address: 192.168.45.109
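The round-robin resolution can also be checked with dig; repeated queries should rotate the order of the three SCAN addresses:
dig +short woravrscan.vishalbh.com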
2. Install required RPMs
yum update -y
yum install xorg-x11-server-Xorg xorg-x11-xauth xorg-x11-apps -y
yum install -y oracle-database-preinstall-19c.x86_64
yum install oracleasm-support
yum install bind* -y
sysctl -p
3. Disable SELinux and stop the firewall
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld
4. Start the NTP server or time service (chrony)
systemctl enable chronyd.service
systemctl restart chronyd.service
chronyc -a 'burst 4/4'
chronyc -a makestep
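To verify that time is actually synchronizing, a quick check:
chronyc tracking
chronyc sources -v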
5. User management
groupadd -g 54341 oinstall
groupadd -g 54342 dba
groupadd -g 54343 oper
groupadd -g 54344 backupdba
groupadd -g 54345 dgdba
groupadd -g 54346 kmdba
groupadd -g 54347 asmdba
groupadd -g 54348 asmoper
groupadd -g 54349 asmadmin
groupadd -g 54350 racdba
useradd -m -u 54341 -g oinstall -G dba,oper,backupdba,dgdba,kmdba,asmdba,asmadmin,racdba -d /home/oracle -s /bin/bash oracle
echo "oracle" | passwd --stdin oracle
useradd -m -u 54342 -g oinstall -G dba,asmadmin,asmdba,asmoper -d /home/grid -s /bin/bash grid
echo "grid" | passwd --stdin grid
# if the oracle user already exists, adjust its groups instead of useradd:
# usermod -g oinstall -G dba,oper,asmdba,asmadmin,asmoper,backupdba,dgdba,kmdba,racdba oracle
6. Directory Structure
mkdir -p /u01/app/oracle/19c/grid
mkdir -p /u01/app/oracle/oracle
mkdir -p /u01/app/oracle/19c/db
chmod -R 775 /u01
chown -R oracle:oinstall /u01/app/oracle
chown -R grid:oinstall /u01/app/oracle/19c/grid
7. Shared disks for Oracle ASM
1. udev method (usually owned by the system team)
2. oracleasm (with multipath)
# rpm -qa | grep oracleasm
oracleasm-support-2.1.11-2.el7.x86_64
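Before creating disks, the oracleasm driver has to be configured and loaded on every node; a minimal sketch (assuming the grid user and oinstall group created in step 5):
oracleasm configure -i   # prompts: default user=grid, default group=oinstall, start on boot=y
oracleasm init           # loads the kernel module and mounts /dev/oracleasm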
Make disk partitions using fdisk commands (output like below):
[root@woravr02 ~]# fdisk -l | egrep 'Disk /dev/sd|dev'
Disk /dev/sdb: 209.7 GB, 209715200000 bytes, 409600000 sectors
/dev/sdb1 2048 409599999 204798976 83 Linux
Disk /dev/sdc: 157.3 GB, 157303177216 bytes, 307232768 sectors
/dev/sdc1 2048 307232767 153615360 83 Linux
Disk /dev/sdd: 104.9 GB, 104857600000 bytes, 204800000 sectors
/dev/sdd1 2048 204799999 102398976 83 Linux
Disk /dev/sde: 419.4 GB, 419430400000 bytes, 819200000 sectors
/dev/sde1 2048 819199999 409598976 83 Linux
[root@woravr02 ~]#
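For reference, a single full-disk partition can also be created non-interactively; a rough sketch (assuming /dev/sdb is the intended target -- double-check the device name before running):
echo -e "n\np\n1\n\n\nw" | fdisk /dev/sdb   # new primary partition 1, default start/end, then write
partprobe /dev/sdb   # re-read the partition table without a reboot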
#oracleasm createdisk FRA /dev/sdb1
Writing disk header: done
Instantiating disk: done
#oracleasm createdisk ACFS /dev/sdc1
Writing disk header: done
Instantiating disk: done
#oracleasm createdisk OCR /dev/sdd1
Writing disk header: done
Instantiating disk: done
#oracleasm createdisk DATA /dev/sde1
Writing disk header: done
Instantiating disk: done
#/usr/sbin/oracleasm scandisks #--FROM other nodes
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "OCR"
Instantiating disk "DATA"
Instantiating disk "ACFS"
Instantiating disk "FRA"
#oracleasm listdisks
ACFS
DATA
FRA
OCR
8. Unzip the software
unzip LINUX.X64_193000_grid_home_2.zip -d /u01/app/oracle/19c/grid
unzip LINUX.X64_193000_db_home.zip -d /u01/app/oracle/19c/db
9. User Equivalence
./sshUserSetup.sh -user grid -hosts "woravr01 woravr02 woravr03" -noPromptPassphrase -confirm -advanced
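The same passwordless SSH setup is typically needed for the oracle user as well:
./sshUserSetup.sh -user oracle -hosts "woravr01 woravr02 woravr03" -noPromptPassphrase -confirm -advanced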
10. runcluvfy
./runcluvfy.sh stage -pre crsinst -n woravr01,woravr02,woravr03 -orainv oinstall -osdba asmdba -verbose
11. ./gridSetup.sh
12. Run the fixup script if required.
13. orainstRoot
/u01/app/oraInventory/orainstRoot.sh (on each node)
14. root.sh
/u01/app/oracle/19c/grid/root.sh (on each node; take care with the execution sequence -- let it complete on one node before starting the next)
15. Check the cluster status after each root.sh script
./crsctl check cluster -all
**************************************************************
woravr01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
woravr02:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
woravr03:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[root@woravr02 grid_1]# crsstat
Cluster woravrdc
Type | Name | woravr01 | woravr02 | woravr03 |
-------------------------------------------------------------------------------------------
asm | asm | Online | Online | Online |
asmnetwork | asmnet1 | Online | Online | Online |
chad | chad | Online | Online | Online |
cvu | cvu | Online | - | - |
dg | DATA | Online | Online | Online |
dg | FRA | Online | Online | Online |
dg | OCR | Online | Online | Online |
network | net1 | Online | Online | Online |
ons | ons | Online | Online | Online |
qosmserver | qosmserver | Online | - | - |
vip | scan1 | - | - | Online |
vip | scan2 | Online | - | - |
vip | scan3 | - | Online | - |
vip | woravr01 | Online | - | - |
vip | woravr02 | - | Online | - |
vip | woravr03 | - | - | Online |
-------------------------------------------------------------------------------------------
x : Resource is disabled
: Has been restarted less than 24 hours ago
Listener | Port | woravr01 | woravr02 | woravr03 | Type |
----------------------------------------------------------------------------------------------------------
ASMNET1LSNR_ASM| TCP:1525 | Online | Online | Online | Listener |
LISTENER | TCP:1521 | Online | Online | Online | Listener |
LISTENER_SCAN1 | TCP:1521 | - | - | Online | SCAN |
LISTENER_SCAN2 | TCP:1521 | Online | - | - | SCAN |
LISTENER_SCAN3 | TCP:1521 | - | Online | - | SCAN |
-----------------------------------------------------------------------------------------------------------
: Has been restarted less than 24 hours ago
16. Invoke ASMCA to create diskgroups
/u01/app/oracle/19c/grid/bin/asmca
17. Install RDBMS software
./runInstaller
18. root.sh
/u01/app/oracle/19c/db/root.sh
19. Run DBCA
Create the database
20. Create a service for the database : optional
srvctl add service -db woravr -service woravrsvc -preferred woravr01,woravr02 -available woravr03 -tafpolicy BASIC -failback YES
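Afterwards, confirm the service placement:
srvctl status service -db woravr -service woravrsvc
srvctl config service -db woravr -service woravrsvc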
21. crsstat
Cluster woravrdc
Type | Name | woravr01 | woravr02 | woravr03 |
-------------------------------------------------------------------------------------------
asm | asm | Online | Online | Online |
asmnetwork | asmnet1 | Online | Online | Online |
chad | chad | Online | Online | Online |
cvu | cvu | Online | - | - |
dg | DATA | Online | Online | Online |
dg | FRA | Online | Online | Online |
dg | OCR | Online | Online | Online |
network | net1 | Online | Online | Online |
ons | ons | Online | Online | Online |
qosmserver | qosmserver | Online | - | - |
vip | scan1 | - | - | Online |
vip | scan2 | Online | - | - |
vip | scan3 | - | Online | - |
vip | woravr01 | Online | - | - |
vip | woravr02 | - | Online | - |
vip | woravr03 | - | - | Online |
-------------------------------------------------------------------------------------------
x : Resource is disabled
: Has been restarted less than 24 hours ago
Listener | Port | woravr01 | woravr02 | woravr03 | Type |
----------------------------------------------------------------------------------------------------------
ASMNET1LSNR_ASM| TCP:1525 | Online | Online | Online | Listener |
LISTENER | TCP:1521 | Online | Online | Online | Listener |
LISTENER_SCAN1 | TCP:1521 | - | - | Online | SCAN |
LISTENER_SCAN2 | TCP:1521 | Online | - | - | SCAN |
LISTENER_SCAN3 | TCP:1521 | - | Online | - | SCAN |
-----------------------------------------------------------------------------------------------------------
: Has been restarted less than 24 hours ago
DB | Service | woravr01 | woravr02 | woravr03 |
-----------------------------------------------------------------------------------------
woravr | woravrsvc | Online | Online | Online |
-----------------------------------------------------------------------------------------
: Has been restarted less than 24 hours ago
DB | Version | woravr01 | woravr02 | woravr03 | DB Type |
--------------------------------------------------------------------------------------------------------
woravr | (1) | Open | Open | Open | RAC (P) |
---------------------------------------------------------------------------------------------------------
ORACLE_HOME references listed in the Version column
1 : /u01/app/oracle/19c/db_1 oracle oinstall
: Has been restarted less than 24 hours ago
22. Add entries to /etc/oratab
vi /etc/oratab
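For example (assuming the db_1 home shown in the profiles below; the instance name differs per node):
woravr2:/u01/app/oracle/19c/db_1:N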
23. Create environment profiles.
[oracle@woravr02 ~]$ cat .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH
export TMP=/tmp
export TMPDIR=$TMP
export ORACLE_BASE=/u01/app/oracle_base
export GRID_HOME=/u01/app/oracle/19c/grid_1
export DB_HOME=/u01/app/oracle/19c/db_1
export ORACLE_HOME=$DB_HOME
export ORACLE_SID=woravr2
export ORACLE_TERM=xterm
export BASE_PATH=/usr/sbin:$PATH
export PATH=$ORACLE_HOME/bin:$GRID_HOME/bin:$BASE_PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib
export CLASSPATH=$ORACLE_HOME/JRE:$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib
alias grid='. /home/oracle/grid.env'
alias db='. /home/oracle/db.env'
alias l='ls -altrsh'
#-------------------------------------------------------
[oracle@woravr02 ~]$ cat db.env
export ORACLE_SID=woravr2
export ORACLE_HOME=$DB_HOME
export PATH=$ORACLE_HOME/bin:$BASE_PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib
export CLASSPATH=$ORACLE_HOME/JRE:$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib
[oracle@woravr02 ~]$ cat grid.env
export ORACLE_SID=+ASM2
export ORACLE_HOME=$GRID_HOME
export PATH=$ORACLE_HOME/bin:$BASE_PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib
export CLASSPATH=$ORACLE_HOME/JRE:$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib
[oracle@woravr02 ~]$
24. Patch the database
Run the opatchauto commands below as the root user:
1. Export the Grid env
$ORACLE_HOME/OPatch/opatchauto apply /tmp/31326369/31305339 -oh $ORACLE_HOME
2. Export the RDBMS env
$ORACLE_HOME/OPatch/opatchauto apply /tmp/31326369/31305339 -oh $ORACLE_HOME
Repeat the same steps on all nodes.
Open all pluggable databases and execute the command below after patching:
$ORACLE_HOME/OPatch/datapatch -verbose
[oracle@woravr02 ~]$ db
[oracle@woravr02 ~]$ $ORACLE_HOME/OPatch/./opatch lsinventory | grep applied
Patch 31305087 : applied on Sun Aug 09 20:07:12 IST 2020
Patch 31281355 : applied on Sun Aug 09 20:02:59 IST 2020
[oracle@woravr02 ~]$ grid
[oracle@woravr02 ~]$ $ORACLE_HOME/OPatch/./opatch lsinventory | grep applied
Patch 31335188 : applied on Sun Aug 09 19:47:39 IST 2020
Patch 31305087 : applied on Sun Aug 09 19:46:48 IST 2020
Patch 31304218 : applied on Sun Aug 09 19:45:41 IST 2020
Patch 31281355 : applied on Sun Aug 09 19:42:14 IST 2020
[oracle@woravr02 ~]$