Thursday, August 19, 2010

NPV internal FLOGI rejected by the upstream switch?

This error had baffled me over the last few days. What seemed like a half-hour job took 4 weeks of painstaking effort to resolve.
A frantic search of the net, forums and documentation led nowhere. HP support, as usual, was useless.

What was I trying to do?

I was trying to connect 9124e blade switches to a 9513 core switch.

cfs distribution enabled
vsans matched

on 9124e
- npv enabled
- host ports set to F
- external uplinks set to NP
- flex-attach enabled on host ports

on 9513
- npiv enabled
- ports set to F

Now the uplink ports and the ports on the core switch were stuck in the "init" state, and the NPV status was as follows:

#show npv status

npiv is disabled
disruptive load balancing is disabled

External Interfaces:
====================
Interface: ext1, State: Failed(NPV internal FLOGI rejected by the upstream switch, Reason: 0x5, Reason Expl: 0x0)
Interface: ext2, State: Failed(NPV internal FLOGI rejected by the upstream switch, Reason: 0x5, Reason Expl: 0x0)
Interface: ext3, State: Pre-Initialized
Interface: ext4, State: Pre-Initialized
Number of External Interfaces: 4

Server Interfaces:
==================
Interface: bay6, VSAN: 100, State: Waiting for External Interface
Interface: bay5, VSAN: 100, State: Waiting for External Interface
Interface: bay7, VSAN: 100, State: Waiting for External Interface
Interface: bay4, VSAN: 100, State: Waiting for External Interface
Interface: bay3, VSAN: 100, State: Waiting for External Interface
Interface: bay2, VSAN: 100, State: Waiting for External Interface
Interface: bay1, VSAN: 100, State: Waiting for External Interface
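
If you have many blade switches to check, it can help to pull the failed uplinks out of this output with a script rather than reading it by eye. The following is only a rough sketch: it assumes the text of "show npv status" has already been captured into a string and that the lines look like the ones above. The function names parse_npv_status and failed_interfaces are my own, not part of any Cisco tool.

```python
import re

# Matches lines such as:
#   Interface: ext1, State: Failed(NPV internal FLOGI rejected ...)
#   Interface: bay6, VSAN: 100, State: Waiting for External Interface
# The "VSAN: n," part is optional because external interfaces do not show it.
LINE_RE = re.compile(
    r"^Interface:\s*(?P<name>\S+?),\s*"
    r"(?:VSAN:\s*(?P<vsan>\d+),\s*)?"
    r"State:\s*(?P<state>.+?)\s*$"
)


def parse_npv_status(text):
    """Return one dict per 'Interface:' line found in the captured output."""
    interfaces = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            vsan = match.group("vsan")
            interfaces.append({
                "name": match.group("name"),
                "vsan": int(vsan) if vsan else None,
                "state": match.group("state"),
            })
    return interfaces


def failed_interfaces(interfaces):
    """Keep only the interfaces whose state begins with 'Failed'."""
    return [i for i in interfaces if i["state"].startswith("Failed")]
```

Feeding it the output above would flag ext1 and ext2, which is where the troubleshooting has to start, since the bay ports are only waiting for an external interface to come up.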


How was it solved?

At first I thought it was VMware related, as the hosts were not logging into the 9124e either. But that was not correct.

The error hinted that the switches were communicating, so I tried to debug the FLOGI on the 9513. But the output was too overwhelming to decipher :)

While checking the FCID database, I could see duplicate FCIDs in the allocation list. It didn't let me delete them, with an error telling me that the FCID was in use.
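
Spotting duplicates by eye in a long allocation list is error-prone. Here is a minimal sketch of the check, assuming the entries have already been collected as (vsan, wwn, fcid) tuples and that "duplicate" means the same FCID assigned to more than one WWN within one VSAN. The tuple layout and the function name find_duplicate_fcids are my own assumptions; this does not parse any switch output.

```python
from collections import defaultdict


def find_duplicate_fcids(entries):
    """Map (vsan, fcid) to the WWNs sharing it, for FCIDs held by 2+ WWNs.

    entries: iterable of (vsan, wwn, fcid) tuples (hypothetical layout).
    """
    owners = defaultdict(set)
    for vsan, wwn, fcid in entries:
        # Compare case-insensitively, since hex values may be typed either way.
        owners[(vsan, fcid.lower())].add(wwn.lower())
    return {key: sorted(wwns) for key, wwns in owners.items() if len(wwns) > 1}
```

An empty result means no FCID is shared inside any single VSAN. The same FCID appearing in two different VSANs is not reported, since each VSAN has its own address space.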

I decided to move the ports to a new temporary VSAN. And voilà! The links came up and everything proceeded.

HP had initially set up these switches for us, and they had changed the VSAN interoperability parameter to non-default values. This caused the 9124e's login to be rejected. Changing the VSAN parameters on the 9124e succeeded, but that did not fix the problem. Seems like a bug.

The Cisco documentation says that the interface is smart enough to detect a mismatched VSAN. But in reality it does not report all the errors you need :)

After 4 days, HP was still stuck figuring out where each port was connected :)