vSphere 5.5 Update 1: NFS datastores randomly disconnect from ESXi


After upgrading vSphere 5.5 to Update 1, NFS datastores may randomly disconnect from ESXi. VMware has documented the problem in public KB 2076392.

The problem causes intermittent All Paths Down (APD) conditions on NFS datastores. While a datastore is disconnected, the VMs running on it cannot perform any I/O to it.

The main symptoms of the problem are the following:

  • VMs appear frozen.
  • NFS datastores are grayed out.

The vobd log reports entries similar to:

2014-04-01T14:35:08.074Z: [APDCorrelator] 9413898746us: [vob.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
2014-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: [esx.problem.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
2014-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect
2014-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.1.1/NFS-DS1 12345678-abcdefg0-0000-000000000000 NFS-DS1
2014-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: [vob.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2014-04-01T14:37:28.081Z: [APDCorrelator] 9554275221us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
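To check a host for these entries without reading the log by hand, the lines above can be filtered with a short script. This is only a minimal sketch: the regular expression is derived from the sample entries shown here, not from any documented log format, and the function name `find_apd_events` and the set of watched event names are illustrative choices.

```python
import re

# Pattern inferred from the sample vobd entries above (an assumption, not a
# documented format): "<timestamp>: [<correlator>] <uptime>us: [<event>] <message>"
EVENT_RE = re.compile(
    r"^(?P<timestamp>\S+): \[(?P<correlator>\w+)\] \d+us: "
    r"\[(?P<event>[\w.]+)\] (?P<message>.*)$"
)

# Event names taken from the log excerpt above.
WATCHED_EVENTS = {
    "esx.problem.storage.apd.start",
    "esx.problem.storage.apd.timeout",
    "esx.problem.vmfs.nfs.server.disconnect",
}


def find_apd_events(lines):
    """Return (timestamp, event, message) tuples for the watched events."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        # Lines that do not follow the pattern (for example the
        # "No correlator for ..." entry) are skipped.
        if match and match.group("event") in WATCHED_EVENTS:
            events.append(
                (match.group("timestamp"), match.group("event"), match.group("message"))
            )
    return events
```

The function accepts any iterable of lines, so it can be fed an open file handle for a copy of the host's vobd log (the path depends on where you saved it) or a list of strings pasted from the log.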

At the moment, the only solution is either to avoid upgrading to vSphere 5.5 Update 1 or to roll back to vSphere 5.5, because no fix is available yet from VMware.
