SRM And SnapMirror
During the testing phase of our implementation I got a lot of errors during the “Synchronize Storage” and “Create Writable Storage Snapshots” steps. These are the errors we got:
Error - Failed to sync data on replica device '/vol/<volumename>/<lunname>'. Device synchronization did not complete properly Device synchronization might have been disrupted because of network failure Ensure that the storage array hosting the device is connected to the network and accessible to its peer storage array.Also check storage array for Replication errors in the snapmirror log file.
and:
Error - Failed to create snapshots of replica devices. Failed to create snapshot of replica device /vol/<volumename>/<lunname>. SRA command 'testFailoverStart' failed for device
Unfortunately, I never found out why these errors were generated, although 99% of the time they occurred on only one of the filers. That specific filer also had SATA disks, while the other only had FC disks. I could not reproduce the errors reliably either: sometimes it just worked, and sometimes it did not. After some testing, however, I found that the error only occurred when SnapMirror was busy. To make sure the testing went smoothly, I created some scripts and added a prompt to the recovery plans.
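Since the errors only showed up while SnapMirror was busy, it can help to check for running transfers before starting a test. The sketch below is not part of our setup. It assumes you have captured the output of the filer's `snapmirror status` command as text, and that the output is a table whose last column is the transfer status ("Idle", "Transferring", and so on). The function name and the sample volume names are made up for illustration.

```python
def busy_relationships(status_text):
    """Return (source, destination, status) for every relationship that is not idle.

    Assumes one relationship per line, in the column order
    Source, Destination, State, Lag, Status.
    """
    busy = []
    for line in status_text.splitlines():
        # Split into at most five fields so a status such as
        # "Transferring (120 MB done)" stays in one piece.
        fields = line.split(None, 4)
        if len(fields) < 5 or fields[0] == "Source":
            continue  # banner line, header line, or blank line
        source, destination, _state, _lag, status = fields
        if not status.strip().startswith("Idle"):
            busy.append((source, destination, status.strip()))
    return busy


# Illustrative sample, not captured from a real filer.
SAMPLE = """Snapmirror is on.
Source          Destination            State         Lag       Status
filerA:vol1     filerB:vol1_mirror     Snapmirrored  00:05:12  Idle
filerA:vol2     filerB:vol2_mirror     Snapmirrored  00:45:10  Transferring (120 MB done)
"""

for source, destination, status in busy_relationships(SAMPLE):
    print(f"{source} -> {destination}: {status}")
```

If the list comes back empty, no transfer was running at the moment the status was captured.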
The Scripts
This is the script that switches SnapMirror off on both filers:
G:\scripts\storagescripts\plink filer01a -telnet < "G:\scripts\storagescripts\CommandFile - snapmirror off.txt"
G:\scripts\storagescripts\plink filer01b -telnet < "G:\scripts\storagescripts\CommandFile - snapmirror off.txt"
exit
And this is the commandfile:
root
XXXXXX
snapmirror off
logout telnet

Note that there is an empty line at the end to dismiss the telnet connection. You will also need plink, the command-line connection tool that comes with PuTTY.
Of course, the script that enables SnapMirror again is identical, except that its command file contains “snapmirror on” instead of “snapmirror off”.
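Because the two command files differ in only one word, they could also be generated from a single template. This is a minimal sketch, not what we actually used. The function name `build_command_file` is hypothetical, and the password is the same placeholder as in the command file.

```python
def build_command_file(username, password, state):
    """Return the text that is fed to plink on standard input.

    state must be "on" or "off". The text ends with an empty line,
    which dismisses the telnet connection.
    """
    if state not in ("on", "off"):
        raise ValueError(f"state must be 'on' or 'off', got {state!r}")
    lines = [username, password, f"snapmirror {state}", "logout telnet", ""]
    return "\n".join(lines) + "\n"


# Placeholder credentials, as in the command file shown earlier.
print(build_command_file("root", "XXXXXX", "on"))
```

Writing the returned text to a file gives the same content as the hand-made command files. Keep in mind that the file holds the root password in plain text, so it should only be readable by the account that runs the SRM service.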
Recovery Plan Change
Add Commands
Now the recovery plan needs to be changed so that the scripts are executed. You need two command steps, one before “Create Writable Storage Snapshot” and one after. These commands are configured like this:
Snapmirror Off
- Type: Command on SRM Server
- Name: Snapmirror Off
- Content: c:\windows\system32\cmd.exe /c g:\scripts\storagescripts\snapmirroroff.bat > g:\scripts\storagescripts\snapoff.log
- Timeout: 5 minutes
Snapmirror On
- Type: Command on SRM Server
- Name: Snapmirror On
- Content: c:\windows\system32\cmd.exe /c g:\scripts\storagescripts\snapmirroron.bat > g:\scripts\storagescripts\snapon.log
- Timeout: 5 minutes
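The two batch files behind these steps could also be replaced by one script that takes the desired state and reports which filer failed. The sketch below is an alternative I did not run in production. The plink path, the filer names and the "off" command file come from the batch file shown earlier, while the name of the "on" command file and the function names are assumptions. It also assumes that SRM treats a non-zero exit code from a command step as a failure, which you should verify for your SRM version.

```python
import subprocess

# Paths follow the batch file; the "on" file name is an assumption.
PLINK = r"G:\scripts\storagescripts\plink"
COMMAND_FILES = {
    "off": r"G:\scripts\storagescripts\CommandFile - snapmirror off.txt",
    "on": r"G:\scripts\storagescripts\CommandFile - snapmirror on.txt",
}
FILERS = ["filer01a", "filer01b"]


def run_on_filers(filers, command_file, runner=subprocess.run):
    """Feed command_file to plink for each filer and return the filers that failed."""
    failed = []
    for filer in filers:
        # Same as: plink <filer> -telnet < "<command_file>"
        with open(command_file, "rb") as stdin:
            result = runner([PLINK, filer, "-telnet"], stdin=stdin, timeout=120)
        if result.returncode != 0:
            failed.append(filer)
    return failed


def main(state, runner=subprocess.run):
    """Switch SnapMirror on or off on all filers; return a process exit code."""
    failed = run_on_filers(FILERS, COMMAND_FILES[state], runner)
    for filer in failed:
        print(f"snapmirror {state} failed on {filer}")
    return 1 if failed else 0
```

Calling `sys.exit(main("off"))` from a small wrapper would hand the result back to SRM. The exit code only reflects whether plink itself failed, for example because the filer was unreachable. It does not tell you whether the filer accepted the snapmirror command, so the log file is still worth checking.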
Extra Prompt
While testing the extra commands we found that the “Create Writable Storage Snapshot” step and the command step directly after it were executed at the same time, even though the “Create Writable Storage Snapshot” step had not finished yet. This is obviously a bug, but we needed a workaround. To prevent it from happening we created a prompt between the “Create Writable Storage Snapshot” step and the command step. We just wait for the “Create Writable Storage Snapshot” step to finish and then dismiss the prompt. It's a bit of a hassle, but since then everything has worked fine.
The prompt configuration is like this:
- Name: Wait for the Create Writable Storage Snapshots to complete.
- Content: Due to a bug, you'll have to wait before dismissing this prompt. Make sure the step “Create Writable Storage Snapshots” is completely finished and then press dismiss.