本帖最后由 kevin.zhang 于 2012-5-1 00:55 编辑

来自:
http://dbakevin.blogspot.com/2012/04/trouble-shooting-crs-4534-cannot.html
转载请注明。

Here is a three nodes Rac cluster.
All DBs are up and running fine on all the nodes.
oracle $ ps -ef|grep smon|wc -l
43
oracle $ ps -ef|grep tns|wc -l
50
oracle $ ps -ef|grep d.bin
root      8177     1  0 Apr28 ?        00:18:29 /stage/grid/11.2.0/grid/bin/orarootagent.bin
oracle    8550     1 22 Apr28 ?        08:52:59 /stage/grid/11.2.0/grid/bin/oraagent.bin
root     26480     1  0 Apr28 ?        00:03:26 /stage/grid/11.2.0/grid/bin/ohasd.bin reboot
oracle   29210     1  0 Apr28 ?        00:00:00 /stage/grid/11.2.0/grid/bin/mdnsd.bin
oracle   29232     1  0 Apr28 ?        00:01:09 /stage/grid/11.2.0/grid/bin/gpnpd.bin
oracle   29257     1  0 Apr28 ?        00:05:32 /stage/grid/11.2.0/grid/bin/gipcd.bin
root     29307     1  0 Apr28 ?        00:00:13 /stage/grid/11.2.0/grid/bin/cssdmonitor
root     29334     1  0 Apr28 ?        00:00:10 /stage/grid/11.2.0/grid/bin/cssdagent
oracle   29351     1  8 Apr28 ?        03:25:51 /stage/grid/11.2.0/grid/bin/ocssd.bin
root     29540     1  0 Apr28 ?        00:20:57 /stage/grid/11.2.0/grid/bin/orarootagent.bin
oracle   29556     1  0 Apr28 ?        00:00:18 /stage/grid/11.2.0/grid/bin/diskmon.bin -d -f
root     30160     1  0 Apr28 ?        00:00:33 /stage/grid/11.2.0/grid/bin/octssd.bin reboot
oracle   30350     1  0 Apr28 ?        00:00:04 /stage/grid/11.2.0/grid/bin/evmd.bin
root     31947     1  0 Apr28 ?        00:03:50 /stage/grid/11.2.0/grid/bin/crsd.bin reboot

From above output red part we aslo can see evmd.bin process is up.
But when use crsctl command, it shows that Event Manger not work on node 2 and node 3.
oracle $ crsctl check cluster -all
**************************************************************
cihcissdb758:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
cihcissdb759:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
**************************************************************
cihcissdb760:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
**************************************************************

Checked cluster alert.log, it also indicate EVMD started normal:
2012-04-28 18:29:48.154
[evmd(30579)]CRS-1401:EVMD started on node cihcissdb759.

Below is from evmd.log:
2012-04-28 18:29:17.518: [ EVMD][1928129552] EVMD Starting
2012-04-28 18:29:17.518: [ EVMD][1928129552] Initializing OCR
2012-04-28 18:29:17.518: [ EVMD][1928129552] EVMD running in cluster mode
2012-04-28 18:29:18.524: [ EVMD][1928129552] Waiting for OCR. Seconds elapsed: 0
2012-04-28 18:29:48.136: [ EVMD][1928129552] Get OCR context succeeded
2012-04-28 18:29:48.153: [ EVMD][1928129552] Initializing Diagnostics Settings
2012-04-28 18:29:48.224: [ EVMD][1928129552] Authorization database built successfully.
2012-04-28 18:29:49.105: [ EVMAPP][1928129552] EVMD Started
2012-04-28 18:29:49.123: [ EVMD][1928129552] Authorization database built successfully.

All seems fine. Then why "CRS-4534: Cannot communicate with Event Manager"?

Let's check the current status for evmd:
oracle $ crsctl stat res -t -init
-------------------------------------------------------
NAME             TARGET          STATE       SERVER   
-------------------------------------------------------
ora.evmd      1  ONLINE   INTERMEDIATE cihcissdb760

The current status INTERMEDIATE means the evmd did starting but not totally completed.

Below recorded in oracle_agent.logfile:
2012-04-28 18:45:50.936: [    AGFW][1113385280] {0:0:2} ora.evmd 1 1 state changed from: UNKNOWN to: OFFLINE
.................
2012-04-28 18:46:23.441: [    AGFW][1114609984] {0:0:2} ora.evmd 1 1 state changed from: UNKNOWN to: STARTING
.................
2012-04-28 20:41:27.768: [    AGFW][1116698944] {0:0:2} ora.evmd 1 1 state changed from: STARTING to: PARTIAL
.................

We  can see the evmd was starting in progress, status changed from UNKNOWN --> OFFLINE --> STARTING --> PARTIAL.
And next the status should change from PARTIAL to ONLINE.
But from the logfile, we don't see the final step, it never happen.

It failed at the final step due to below error message:
2012-04-28 20:43:27.787: [ora.evmd][1112496448] {0:0:2} [check] clsn_agent::abort {
2012-04-28 20:43:27.787: [ora.evmd][1112496448] {0:0:2} [check] abort {
2012-04-28 20:43:27.787: [ora.evmd][1112496448] {0:0:2} [check] abort command: check
2012-04-28 20:43:27.787: [ora.evmd][1112496448] {0:0:2} [check] tryActionLock {
2012-04-28 20:44:27.816: [ora.evmd][1112496448] {0:0:2} [check] did not get lock
2012-04-28 20:44:27.816: [ora.evmd][1112496448] {0:0:2} [check] tryActionLock }
2012-04-28 20:44:27.816: [ora.evmd][1112496448] {0:0:2} [check] clsn_agent::abort: Exception LockAcquireTimeoutException
2012-04-28 20:44:27.816: [ora.evmd][1112496448] {0:0:2} [check] clsn_agent::abort, agent exiting }

This is very similar to bug 11807012: oraagent.bin Exits After Check Timed Out

But this bug should have been fixed in 11.2.0.2.3,  and our server already patched 11gR2 Patch Set Update 4.

It looks like that patch set update don't real fully solved that bug.


Raised SR and will check with Oracle Support further.

标签: 暂无标签
kevin.zhang

写了 32 篇文章,拥有财富 308,被 10 人关注

转播转播 分享分享 分享淘帖
回复

使用道具

成为第一个吐槽的人

您需要登录后才可以回帖 登录 | 加入社区

本版积分规则

意见
反馈