NetApp SnapMirror: Using Data
In this article I'll describe a couple of scenarios for actually using snapmirrored data.
First Scenario: ESX ISO Store
In this scenario we have an ISO store at our easily accessible acceptance site. This ISO store is used by ESX to mount ISOs when installing VMs. However, we need these ISOs in our production environment as well. Copying the ISOs manually is quite a hassle, so we want this to happen almost automatically.
Second Scenario: Disaster Recovery
In this scenario we'll pretend that our production site is unavailable and do a failover using the snapmirrored data. When the disaster is resolved we'll also switch back.
In these scenarios the SnapMirror relationship has already been set up as described here.
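For reference, a scheduled relationship like the ones used below normally lives in /etc/snapmirror.conf on the destination filer. A minimal sketch for the ISO store relationship, reusing the every-six-minutes schedule that shows up later in this article (the schedule here is an assumption, not taken from this setup):

# /etc/snapmirror.conf on dst-prd-filer1 (example schedule, adjust to taste)
src-acc-filer1:SRC_ACC_ESX_ISO dst-prd-filer1:ESX_ISOSTORE - 0-59/6 * * *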
ESX ISO Store
Setting up the ESX ISO store scenario basically comes down to these steps:
- Break the mirror
- Mount the datastore
- Perform a resync if new ISOs are added
Note: Mounting the datastores without breaking the mirror is unfortunately not possible. ESX requires the LUNs to be writable, which they are not while the mirror is still operational.
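Put together, the filer side of this cycle is just two commands on the destination filer. A minimal sketch using the names from the examples below:

dst-prd-filer1> snapmirror break ESX_ISOSTORE
(mount the datastore, use the ISOs, add new ISOs on the source)
dst-prd-filer1> snapmirror resync ESX_ISOSTORE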
Break the Mirror
Breaking the mirror automatically makes the target writable. This is done from the destination filer.
dst-prd-filer1> snapmirror break ESX_ISOSTORE
snapmirror break: Destination ESX_ISOSTORE is now writable.
Volume size is being retained for potential snapmirror resync. If you would like to grow the volume and do not expect to resync, set vol option fs_size_fixed to off.
dst-prd-filer1> snapmirror status ESX_ISOSTORE
Snapmirror is on.
Source                          Destination                  State       Lag       Status
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE  Broken-off  00:50:12  Idle
Mount the Datastore
First Host
The first time you do this, mounting the datastore works differently on the first host than on the remaining hosts: the first host can be handled through vCenter, additional hosts cannot.
Log on to vCenter, select the first host and go to Configuration → Storage. Click "Rescan All" and select both options to scan all HBAs. After the rescan the device is present but the datastore is not. To make the datastore visible, perform these steps:
- Add Storage
- Select "Disk/LUN"
- The device holding the VMFS datastore should be visible; select it
- Select "Keep the existing signature"
- The VMFS partition is shown; click Next and Finish
The datastore should now be accessible by the host.
Additional Hosts
If you perform the same steps as above on the additional hosts, you'll get this error:
Call "HostStorageSystem.ResolveMultipleUnresolvedVmfsVolumes" for object "storageSystem-446" on vCenter Server "vCenter.company.local" failed.
The solution, per the VMware KB, is to point the vSphere Client directly at the host instead of at vCenter and then perform the same steps as above.
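If you'd rather not click through the vSphere Client on every additional host, the same should be possible from the service console of each host. A sketch, assuming classic ESX 4.x with the esxcfg tools; the datastore label is hypothetical:

# list VMFS volumes that were detected as snapshots/replicas
esxcfg-volume -l
# persistently mount one of them while keeping the existing signature ("ISOSTORE" is an example label)
esxcfg-volume -M ISOSTORE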
Resync From Original Source
dst-prd-filer1> snapmirror status ESX_ISOSTORE
Snapmirror is on.
Source                          Destination                  State       Lag       Status
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE  Broken-off  00:50:12  Idle
dst-prd-filer1> snapmirror resync ESX_ISOSTORE
The resync base snapshot will be: dst-prd-filer1(0151762648)_ESX_ISOSTORE.2
These older snapshots have already been deleted from the source and will be deleted from the destination:
dst-prd-filer1(0151762648)_ESX_ISOSTORE.1
Are you sure you want to resync the volume? y
Mon May 23 16:45:39 CEST [dst-prd-filer1: snapmirror.dst.resync.info:notice]: SnapMirror resync of ESX_ISOSTORE to src-acc-filer1:SRC_ACC_ESX_ISO is using dst-prd-filer1(0151762648)_ESX_ISOSTORE.2 as the base snapshot.
Volume ESX_ISOSTORE will be briefly unavailable before coming back online.
Mon May 23 16:45:40 CEST [dst-prd-filer1: wafl.snaprestore.revert:notice]: Reverting volume ESX_ISOSTORE to a previous snapshot.
exportfs [Line 2]: NFS not licensed; local volume /vol/ESX_ISOSTORE not exported
Revert to resync base snapshot was successful.
Mon May 23 16:45:40 CEST [dst-prd-filer1: replication.dst.resync.success:notice]: SnapMirror resync of ESX_ISOSTORE to src-acc-filer1:SRC_ACC_ESX_ISO was successful.
Transfer started.
Monitor progress with 'snapmirror status' or the snapmirror log.
dst-prd-filer1> snapmirror status ESX_ISOSTORE
Snapmirror is on.
Source                          Destination                  State         Lag       Status
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE  Snapmirrored  00:50:47  Transferring (7592 KB done)
dst-prd-filer1>
Note: The volume is automatically set to "snapmirrored,read-only".
Note 2: The datastore disappears from the VMware hosts.
Break Again
Because the datastore is already known to the hosts, all you need to do to regain access to it is a "Rescan All" on every host. The datastore will then become available automatically.
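On classic ESX the rescan can also be done from the service console. A sketch, again assuming the esxcfg tools; the adapter name is an example and varies per host:

# rescan one HBA for new devices and VMFS volumes (vmhba1 is an example name)
esxcfg-rescan vmhba1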
Disaster Recovery
Using the snapmirrored data for disaster recovery is probably the most basic reason you bought SnapMirror in the first place. And because real disasters rarely happen, we'll do a disaster test instead. The test is done in two different ways. The first assumes you have FlexClone licensed. This is the preferred scenario, since it doesn't break your mirror, which means that if a real disaster strikes during testing you still have the latest version of your data untouched and available. The second way is without FlexClone. That means breaking the mirror, snapshotting the volume and cloning the LUNs. Afterwards we'll have to restore the SnapMirror relationship.
Disaster Recovery Test Using FlexClone
Create Snapshot On Source
Because the target volume is read-only, we'll have to create the snapshot we need for cloning on the source filer. Once that's done, we replicate the snapshot to the target filer.
Creating Snapshot
On the source filer:
storage01> snap list snapmirrorsource
Volume snapmirrorsource
working....

  %/used       %/total  date          name
----------  ----------  ------------  --------
  2% ( 2%)    0% ( 0%)  May 26 10:15  storage02(0099904947)_snapmirrortarget.2 (snapmirror)
storage01> snap create snapmirrorsource volclonetest
storage01> snap list snapmirrorsource
Volume snapmirrorsource
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
  2% ( 2%)    0% ( 0%)  May 26 10:16  volclonetest
  3% ( 2%)    0% ( 0%)  May 26 10:15  storage02(0099904947)_snapmirrortarget.2 (snapmirror)
Note that it's (probably) possible to use the SnapMirror snapshot itself. I just prefer simply named snapshots.
Update Snapmirror
On the target filer:
storage02> snapmirror update snapmirrortarget
Transfer started.
Monitor progress with 'snapmirror status' or the snapmirror log.
storage02> snapmirror status
Snapmirror is on.
Source                      Destination                 State         Lag       Status
storage01:snapmirrorsource  storage02:snapmirrortarget  Snapmirrored  00:00:10  Idle
storage02> snap list snapmirrortarget
Volume snapmirrortarget
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
  0% ( 0%)    0% ( 0%)  May 26 10:19  storage02(0099904947)_snapmirrortarget.3
  2% ( 2%)    0% ( 0%)  May 26 10:16  volclonetest
  4% ( 2%)    0% ( 0%)  May 26 10:15  storage02(0099904947)_snapmirrortarget.2
The created snapshot volclonetest is now available on the target filer as well.
Clone Volume
On the target filer we'll have to add the flex_clone license before we can clone the volume:
storage02> license add XXXXXXX
A flex_clone site license has been installed.
FlexClone enabled.
storage02> vol clone create snapmirrortargetclone -s volume -b snapmirrortarget volclonetest
Creation of clone volume 'snapmirrortargetclone' has completed.
- snapmirrortargetclone: the name of the new volume
- -s volume: the space reservation
- -b snapmirrortarget: the parent volume
- volclonetest: the snapshot the clone is based on
Check the Volume
You can see the available volumes on the filer CLI; note that you can't tell it's a FlexClone:
storage02> vol status
         Volume State           Status            Options
snapmirrortarget online         raid_dp, flex     nosnap=on, snapmirrored=on,
                                snapmirrored      create_ucode=on,
                                read-only         convert_ucode=on,
                                                  fs_size_fixed=on
snapmirrortargetclone online    raid_dp, flex     nosnap=on, create_ucode=on,
                                                  convert_ucode=on
Check the Snapshot
storage02> snap list
Volume snapmirrortarget
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
  0% ( 0%)    0% ( 0%)  May 26 10:19  storage02(0099904947)_snapmirrortarget.3
  2% ( 2%)    0% ( 0%)  May 26 10:16  volclonetest (busy,snapmirror,vclone)
  4% ( 2%)    0% ( 0%)  May 26 10:15  storage02(0099904947)_snapmirrortarget.2

Volume snapmirrortargetclone
working....

  %/used       %/total  date          name
----------  ----------  ------------  --------
  4% ( 4%)    0% ( 0%)  May 26 10:16  volclonetest
Using the Data
Now you can use the new volume any way you like. Writes go to the new volume; reads are served from the new volume first and, if necessary, forwarded to the parent volume. Since the new volume is based on a snapshot, the original source and target are not altered in any way.
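One caveat if the parent volume contains LUNs (not the case in this small example): the LUNs in the clone are not mapped to any initiator group, and may need to be brought online before a host can use them. A hypothetical sketch, with lun0 and the igroup testhosts as made-up names:

storage02> lun online /vol/snapmirrortargetclone/lun0
storage02> lun map /vol/snapmirrortargetclone/lun0 testhosts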
What Else is Possible
In this part we'll step outside the original case of just using the data and run a what-if scenario: we'll try to remove the source snapshot, see what happens and… how to fix it.
Remove the Snapshot
On the source filer:
storage01> snap list snapmirrorsource
Volume snapmirrorsource
working....

  %/used       %/total  date          name
----------  ----------  ------------  --------
  4% ( 4%)    0% ( 0%)  May 26 10:19  storage02(0099904947)_snapmirrortarget.3 (snapmirror)
  6% ( 2%)    0% ( 0%)  May 26 10:16  volclonetest
storage01> snap delete snapmirrorsource volclonetest
storage01> snap list snapmirrorsource
working....

  %/used       %/total  date          name
----------  ----------  ------------  --------
  4% ( 4%)    0% ( 0%)  May 26 10:19  storage02(0099904947)_snapmirrortarget.3 (snapmirror)
So it's possible to remove the snapshot that the cloned volume on the target filer is based on… What will happen when we try to update the SnapMirror?
Update SnapMirror
storage02> snapmirror update snapmirrortarget
Transfer started.
Monitor progress with 'snapmirror status' or the snapmirror log.
storage02> snapmirror status
Snapmirror is on.
Source                      Destination                 State         Lag       Status
storage01:snapmirrorsource  storage02:snapmirrortarget  Snapmirrored  01:14:25  Pending
As you can see, the status is Pending.
FilerView reports: replication transfer failed to complete.
Syslog in /etc/messages shows:
Thu May 26 11:35:05 CEST [snapmirror.dst.snapDelErr:error]: Snapshot volclonetest in destination volume snapmirrortarget is in use, cannot delete.
Thu May 26 11:35:05 CEST [replication.dst.err:error]: SnapMirror: destination transfer from storage01:snapmirrorsource to snapmirrortarget : replication transfer failed to complete.
So the system is trying to remove the snapshot but can't, which means the mirror is no longer working.
Split Cloned Volume
Assuming we want to keep the cloned volume, the solution is to split it off from its parent, which also releases the snapshot on the parent. This effectively means that all the data currently in the original target volume is copied to the cloned volume.
Note that changes you made to the data in the clone persist: if you deleted files from the clone, they stay deleted.
storage02> vol clone split start snapmirrortargetclone
Clone volume 'snapmirrortargetclone' will be split from its parent.
Monitor system log or use 'vol clone split status' for progress.
storage02> snapmirror status
Snapmirror is on.
Source                      Destination                 State         Lag       Status
storage01:snapmirrorsource  storage02:snapmirrortarget  Snapmirrored  00:00:10  Idle
As you can see, after splitting the volume the snapmirror is OK again. Also, the volume snapmirrortargetclone is now a volume of its own and no longer shows as a FlexClone in FilerView.
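As the filer message already suggested, you can follow the progress of the split while it is running:

storage02> vol clone split status snapmirrortargetclone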
Disaster Recovery Test Without FlexClone
Testing the data without FlexClone means breaking the mirror and just using the data. But since we want to be able to restore the mirror really fast, we'll use LUN cloning to make sure the mirror can be restored afterwards. We'll do so with these steps:
- Break the mirror
- Create a Snapshot
- Clone the LUN with the created snapshot
- Resync the data and restore the SnapMirror relationship
Break the SnapMirror Relationship
src-acc-filer1> snapmirror status
Snapmirror is on.
Source                          Destination                  State         Lag       Status
dst-prd-filer1:AIX_01           src-acc-filer1:AIX_01        Snapmirrored  00:00:53  Idle
dst-prd-filer1:AIX_02           src-acc-filer1:AIX_02        Snapmirrored  00:00:53  Idle
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE  Source        88:27:13  Idle
src-acc-filer1> snapmirror break src-acc-filer1:AIX_01
snapmirror break: Destination AIX_01 is now writable.
Volume size is being retained for potential snapmirror resync. If you would like to grow the volume and do not expect to resync, set vol option fs_size_fixed to off.
src-acc-filer1> snapmirror status
Snapmirror is on.
Source                          Destination                  State         Lag       Status
dst-prd-filer1:AIX_01           src-acc-filer1:AIX_01        Broken-off    00:01:04  Idle
dst-prd-filer1:AIX_02           src-acc-filer1:AIX_02        Snapmirrored  00:01:04  Idle
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE  Source        88:27:24  Idle
Create a Snapshot
src-acc-filer1> snap create AIX_01 clonesnapshot
src-acc-filer1> snap list AIX_01
Volume AIX_01
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
  0% ( 0%)    0% ( 0%)  May 27 09:13  clonesnapshot
  0% ( 0%)    0% ( 0%)  May 27 09:12  src-acc-filer1(0151762815)_AIX_01.9687
  0% ( 0%)    0% ( 0%)  May 27 09:06  src-acc-filer1(0151762815)_AIX_01.9686
  0% ( 0%)    0% ( 0%)  May 27 06:00  hourly.0
 10% (10%)    3% ( 2%)  May 27 00:00  nightly.0
 10% ( 1%)    3% ( 0%)  May 26 18:00  hourly.1
 19% (10%)    5% ( 3%)  May 26 00:00  nightly.1
 26% (11%)    8% ( 3%)  May 24 14:08  sjoerd_snapshot
 34% (15%)   12% ( 4%)  May 23 00:00  weekly.0
Check Free Space
The first time I tried this I made a mistake: I didn't check my free-space requirements and I created the LUN clones with a space reservation. That resulted in losing all snapshots, because the volume still had a fixed size (so it couldn't grow) and the only way left to free up space was deleting snapshots…
So after breaking the mirror you can turn the fixed volume size off like this:
vol options AIX_01 fs_size_fixed off
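A quick way to check the space situation before cloning would be something like this (a sketch; df on the filer takes the volume path, and -g reports in gigabytes):

src-acc-filer1> df -g /vol/AIX_01
src-acc-filer1> snap list AIX_01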
Create LUN Clones
Before cloning, check the pathnames of the current LUNs:
/vol/AIX_01/boot        30g (32212254720)      (r/w, online)
/vol/AIX_01/optamb      10g (10737418240)      (r/w, online)
/vol/AIX_01/optoracle   10g (10737418240)      (r/w, online)
/vol/AIX_01/varbackup   120.0g (128861601792)  (r/w, online)
/vol/AIX_01/vardata     100g (107374182400)    (r/w, online)
/vol/AIX_01/vardump     10g (10737418240)      (r/w, online)
/vol/AIX_01/varlog      40.0g (42953867264)    (r/w, online)
Then create the LUN clones with a space reservation:
src-acc-filer1> lun clone create /vol/AIX_01/vardataclone -b /vol/AIX_01/vardata clonesnapshot
src-acc-filer1> lun clone create /vol/AIX_01/varbackupclone -b /vol/AIX_01/varbackup clonesnapshot
Or without a space reservation:
src-acc-filer1> lun clone create /vol/AIX_01/vardataclone -o noreserve -b /vol/AIX_01/vardata clonesnapshot
src-acc-filer1> lun clone create /vol/AIX_01/varbackupclone -o noreserve -b /vol/AIX_01/varbackup clonesnapshot
You now have two extra LUNs in the volume:
/vol/AIX_01/boot             30g (32212254720)      (r/w, online)
/vol/AIX_01/optamb           10g (10737418240)      (r/w, online)
/vol/AIX_01/optoracle        10g (10737418240)      (r/w, online)
/vol/AIX_01/varbackup        120.0g (128861601792)  (r/w, online)
/vol/AIX_01/varbackupclone   120.0g (128861601792)  (r/w, online)
/vol/AIX_01/vardata          100g (107374182400)    (r/w, online)
/vol/AIX_01/vardataclone     100g (107374182400)    (r/w, online)
/vol/AIX_01/vardump          10g (10737418240)      (r/w, online)
/vol/AIX_01/varlog           40.0g (42953867264)    (r/w, online)
And you can check the parent snapshot information by requesting verbose information about the LUNs:
src-acc-filer1> lun show -v /vol/AIX_01/vardataclone
/vol/AIX_01/vardataclone  100g (107374182400)  (r/w, online)
        Serial#: W-/QGJd3msKE
        Backed by: /vol/AIX_01/.snapshot/clonesnapshot/vardata
        Share: none
        Space Reservation: enabled
        Multiprotocol Type: aix
src-acc-filer1> lun show -v /vol/AIX_01/varbackupclone
/vol/AIX_01/varbackupclone  120.0g (128861601792)  (r/w, online)
        Serial#: W-/QGJd3nOO5
        Backed by: /vol/AIX_01/.snapshot/clonesnapshot/varbackup
        Share: none
        Space Reservation: enabled
        Multiprotocol Type: aix
Use the LUNs
Use them as you always would: map them to the correct initiator group on a free LUN ID (if no LUN ID is given, the lowest available will be assigned):
lun map /vol/AIX_01/varbackupclone SRC-AIX-01 20
lun map /vol/AIX_01/vardataclone SRC-AIX-01 21
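You can verify the result with lun show -m, which lists the LUN-to-igroup mappings including the LUN IDs:

src-acc-filer1> lun show -m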
After mapping you can discover them from the command line as root (this is on AIX with the Host Utilities installed):
root@src-aix-01:/home/root>sanlun lun show
controller:      lun-pathname                device filename  adapter  protocol  lun size               lun state
src-acc-filer1:  /vol/SRC_AIX_01/boot        hdisk1           fcs0     FCP       30g (32212254720)      GOOD
src-acc-filer1:  /vol/SRC_AIX_01/optamb      hdisk2           fcs0     FCP       10g (10737418240)      GOOD
src-acc-filer1:  /vol/SRC_AIX_01/optoracle   hdisk3           fcs0     FCP       10g (10737418240)      GOOD
src-acc-filer1:  /vol/SRC_AIX_01/varbackup   hdisk4           fcs0     FCP       120.0g (128861601792)  GOOD
src-acc-filer1:  /vol/SRC_AIX_01/vardata     hdisk5           fcs0     FCP       100g (107374182400)    GOOD
src-acc-filer1:  /vol/SRC_AIX_01/vardump     hdisk6           fcs0     FCP       10g (10737418240)      GOOD
src-acc-filer1:  /vol/SRC_AIX_01/varlog      hdisk7           fcs0     FCP       40.0g (42953867264)    GOOD
root@src-aix-01:/home/root>cfgmgr
root@src-aix-01:/home/root>sanlun lun show
controller:      lun-pathname                  device filename  adapter  protocol  lun size               lun state
src-acc-filer1:  /vol/SRC_AIX_01/boot          hdisk1           fcs0     FCP       30g (32212254720)      GOOD
src-acc-filer1:  /vol/SRC_AIX_01/optamb        hdisk2           fcs0     FCP       10g (10737418240)      GOOD
src-acc-filer1:  /vol/SRC_AIX_01/optoracle     hdisk3           fcs0     FCP       10g (10737418240)      GOOD
src-acc-filer1:  /vol/SRC_AIX_01/varbackup     hdisk4           fcs0     FCP       120.0g (128861601792)  GOOD
src-acc-filer1:  /vol/SRC_AIX_01/vardata       hdisk5           fcs0     FCP       100g (107374182400)    GOOD
src-acc-filer1:  /vol/SRC_AIX_01/vardump       hdisk6           fcs0     FCP       10g (10737418240)      GOOD
src-acc-filer1:  /vol/SRC_AIX_01/varlog        hdisk7           fcs0     FCP       40.0g (42953867264)    GOOD
src-acc-filer1:  /vol/AIX_01/varbackupclone    hdisk8           fcs0     FCP       120.0g (128861601792)  GOOD
src-acc-filer1:  /vol/AIX_01/vardataclone      hdisk9           fcs0     FCP       100g (107374182400)    GOOD
Then import the volume groups on AIX:
root@src-aix-01:/home/root>importvg -y backupclone hdisk8
0516-530 synclvodm: Logical volume name loglv02 changed to loglv06.
0516-712 synclvodm: The chlv succeeded, however chfs must now be run on every filesystem which references the old log name loglv02.
0516-530 synclvodm: Logical volume name fslv02 changed to fslv06.
imfs: mount point "/var/backup" already exists in /etc/filesystems
backupclone
root@src-aix-01:/home/root>importvg -y dataclone hdisk9
0516-530 synclvodm: Logical volume name loglv03 changed to loglv07.
0516-712 synclvodm: The chlv succeeded, however chfs must now be run on every filesystem which references the old log name loglv03.
0516-530 synclvodm: Logical volume name fslv03 changed to fslv07.
imfs: mount point "/var/data" already exists in /etc/filesystems
dataclone
root@src-aix-01:/home/root>lsvg
rootvg
optambvg
optoraclevg
varbackupvg
vardatavg
vardumpvg
varlogvg
backupclone
dataclone
And add this to /etc/filesystems:
/var/backupclone:
        dev             = /dev/fslv06
        vfs             = jfs2
        log             = /dev/loglv06
        mount           = true
        options         = rw
        account         = false

/var/dataclone:
        dev             = /dev/fslv07
        vfs             = jfs2
        log             = /dev/loglv07
        mount           = true
        options         = rw
        account         = false
Then activate the volume groups and mount the filesystems:
root@src-aix-01:/home/root>varyonvg backupclone
root@src-aix-01:/home/root>varyonvg dataclone
root@src-aix-01:/home/root>mkdir /var/backupclone
root@src-aix-01:/home/root>mkdir /var/dataclone
root@src-aix-01:/home/root>mount /var/dataclone
Replaying log for /dev/fslv07.
root@src-aix-01:/home/root>mount /var/backupclone
Replaying log for /dev/fslv06.
And the data is mounted:
root@src-aix-01:/home/root>df -Pm
Filesystem    MB blocks      Used  Available  Capacity  Mounted on
/dev/hd4        2048.00     61.21    1986.79        3%  /
/dev/hd2        4096.00   1365.46    2730.54       34%  /usr
/dev/hd9var     1024.00     25.74     998.26        3%  /var
/dev/hd3        8192.00    898.02    7293.98       11%  /tmp
/dev/fwdump      832.00      0.45     831.55        1%  /var/adm/ras/platform
/dev/hd1         512.00     67.16     444.84       14%  /home
/proc                 -         -          -         -  /proc
/dev/hd10opt    4096.00    159.90    3936.10        4%  /opt
/dev/fslv00    10208.00   1671.47    8536.53       17%  /opt/amb
/dev/fslv01    10208.00   4333.34    5874.66       43%  /opt/oracle
/dev/fslv02   122752.00  24100.24   98651.76       20%  /var/backup
/dev/fslv03   102144.00  53500.98   48643.02       53%  /var/data
/dev/fslv04    10208.00      1.89   10206.11        1%  /var/dump
/dev/fslv05    40896.00    444.71   40451.29        2%  /var/log
/dev/fslv07   102144.00  42812.18   59331.82       42%  /var/dataclone
/dev/fslv06   122752.00  28333.43   94418.57       24%  /var/backupclone
Restore the SnapMirror Relationship
First remove the LUNs from AIX:
root@src-aix-01:/home/root>umount /var/dataclone/
root@src-aix-01:/home/root>umount /var/backupclone/
root@src-aix-01:/home/root>rmdir /var/dataclone/
root@src-aix-01:/home/root>rmdir /var/backupclone/
root@src-aix-01:/home/root>varyoffvg backupclone
root@src-aix-01:/home/root>varyoffvg dataclone
root@src-aix-01:/home/root>exportvg backupclone
root@src-aix-01:/home/root>exportvg dataclone
root@src-aix-01:/home/root>rmdev -dl hdisk8
root@src-aix-01:/home/root>rmdev -dl hdisk9
And don't forget to clean up /etc/filesystems.
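Since /etc/filesystems uses blank-line-separated stanzas, AIX's paragraph grep is a quick way to check whether the clone entries are really gone (this should return nothing afterwards):

# print any remaining stanza mentioning "clone"
root@src-aix-01:/home/root>grep -p clone /etc/filesystems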
Then unmap the LUNs on the filer and simply perform a resync:
src-acc-filer1> lun unmap /vol/AIX_01/varbackupclone SRC-AIX-01
src-acc-filer1> lun unmap /vol/AIX_01/vardataclone SRC-AIX-01
src-acc-filer1> snapmirror resync AIX_01
The resync base snapshot will be: src-acc-filer1(0151762815)_AIX_01.308
These newer snapshots will be deleted from the destination:
hourly.0
clonesnapshot
These older snapshots have already been deleted from the source and will be deleted from the destination:
src-acc-filer1(0151762815)_AIX_01.307
Are you sure you want to resync the volume? yes
Mon May 30 14:55:27 CEST [src-acc-filer1: snapmirror.dst.resync.info:notice]: SnapMirror resync of AIX_01 to dst-prd-filer1:AIX_01 is using src-acc-filer1(0151762815)_AIX_01.308 as the base snapshot.
Volume AIX_01 will be briefly unavailable before coming back online.
Mon May 30 14:55:28 CEST [src-acc-filer1: wafl.snaprestore.revert:notice]: Reverting volume AIX_01 to a previous snapshot.
Revert to resync base snapshot was successful.
Mon May 30 14:55:29 CEST [src-acc-filer1: replication.dst.resync.success:notice]: SnapMirror resync of AIX_01 to dst-prd-filer1:AIX_01 was successful.
Transfer started.
Monitor progress with 'snapmirror status' or the snapmirror log.
src-acc-filer1> snapmirror status
Snapmirror is on.
Source                          Destination                  State         Lag        Status
dst-prd-filer1:AIX_01           src-acc-filer1:AIX_01        Snapmirrored  04:37:36   Transferring (9024 KB done)
dst-prd-filer1:AIX_02           src-acc-filer1:AIX_02        Snapmirrored  00:01:35   Idle
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE  Source        166:09:56  Idle
Troubleshooting
If you run into the following problem, a resync is not possible and you'll have to do an initialize:
Fri May 27 13:57:31 CEST [src-acc-filer1: replication.dst.resync.failed:error]: SnapMirror resync of AIX_01 to dst-prd-filer1:AIX_01 : no common snapshot to use as the base for resynchronization.
Snapmirror resynchronization of AIX_01 to dst-prd-filer1:AIX_01 : no common snapshot to use as the base for resynchronization
Aborting resync.
Set the target volume to restricted mode and initialize the SnapMirror relationship.
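A sketch of that recovery using the names from this example; note that an initialize is a full baseline transfer, so all data is sent over again:

src-acc-filer1> vol restrict AIX_01
src-acc-filer1> snapmirror initialize -S dst-prd-filer1:AIX_01 src-acc-filer1:AIX_01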
Disaster Recovery
In this test we'll actually break the mirror and use the original data. When the disaster is resolved, we'll use SnapMirror to replicate the changed data back to the original source, and then restore the original mirror. That takes these steps:
- Break the mirror
- Use the data
- Sync the data back from the destination to the source
- Restore the original SnapMirror relationship, making the source the source again.
As this scenario starts out the same as the ESX ISO store scenario, we'll focus on syncing the data back to the source and then restoring the original SnapMirror relationship. The ISO store is actually a good example, since we also had some ISOs at the production site that we didn't have at the acceptance site. So I'm adding those ISOs to the store, and then I'll sync the data back.
Sync Data Back
If the original source is available and the data, including the snapshots, is still present, run this on the original source:
src-acc-filer1> snapmirror status
Snapmirror is on.
Source                          Destination                  State         Lag        Status
dst-prd-filer1:AIX_01           src-acc-filer1:AIX_01        Snapmirrored  00:06:33   Transferring (26 MB done)
dst-prd-filer1:AIX_02           src-acc-filer1:AIX_02        Snapmirrored  00:00:33   Idle
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE  Source        166:38:53  Idle
src-acc-filer1> snapmirror resync -S dst-prd-filer1:ESX_ISOSTORE src-acc-filer1:SRC_ACC_ESX_ISO
The resync base snapshot will be: dst-prd-filer1(0151762648)_ESX_ISOSTORE.3
These newer snapshots will be deleted from the destination:
hourly.0
hourly.1
weekly.0
nightly.0
nightly.1
Are you sure you want to resync the volume? yes
Mon May 30 15:27:09 CEST [src-acc-filer1: snapmirror.dst.resync.info:notice]: SnapMirror resync of SRC_ACC_ESX_ISO to dst-prd-filer1:ESX_ISOSTORE is using dst-prd-filer1(0151762648)_ESX_ISOSTORE.3 as the base snapshot.
Volume SRC_ACC_ESX_ISO will be briefly unavailable before coming back online.
Mon May 30 15:27:10 CEST [src-acc-filer1: wafl.snaprestore.revert:notice]: Reverting volume SRC_ACC_ESX_ISO to a previous snapshot.
Revert to resync base snapshot was successful.
Mon May 30 15:27:11 CEST [src-acc-filer1: replication.dst.resync.success:notice]: SnapMirror resync of SRC_ACC_ESX_ISO to dst-prd-filer1:ESX_ISOSTORE was successful.
Transfer started.
Monitor progress with 'snapmirror status' or the snapmirror log.
src-acc-filer1> snapmirror status
Snapmirror is on.
Source                          Destination                     State         Lag        Status
dst-prd-filer1:ESX_ISOSTORE     src-acc-filer1:SRC_ACC_ESX_ISO  Snapmirrored  166:41:45  Transferring (28 MB done)
dst-prd-filer1:AIX_01           src-acc-filer1:AIX_01           Snapmirrored  00:03:26   Idle
dst-prd-filer1:AIX_02           src-acc-filer1:AIX_02           Snapmirrored  00:03:25   Idle
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE     Source        166:41:45  Idle
If the original data is not available, you can recreate the volume and start a completely new initialization. On the original source:
snapmirror initialize -S dst-prd-filer1:ESX_ISOSTORE src-acc-filer1:SRC_ACC_ESX_ISO
Last Data Sync
Then stop all systems, make sure no data is being written anymore, and run this on the original source:
src-acc-filer1> snapmirror update -S dst-prd-filer1:ESX_ISOSTORE src-acc-filer1:SRC_ACC_ESX_ISO
Transfer started.
Monitor progress with 'snapmirror status' or the snapmirror log.
Restore Original SnapMirror Relationship
On the original source:
src-acc-filer1> snapmirror break SRC_ACC_ESX_ISO
snapmirror break: Destination SRC_ACC_ESX_ISO is now writable.
Volume size is being retained for potential snapmirror resync. If you would like to grow the volume and do not expect to resync, set vol option fs_size_fixed to off.
On the original destination:
dst-prd-filer1> snapmirror resync ESX_ISOSTORE
The resync base snapshot will be: src-acc-filer1(0151762815)_SRC_ACC_ESX_ISO.2
Are you sure you want to resync the volume? yes
Tue May 31 08:33:30 CEST [dst-prd-filer1: snapmirror.dst.resync.info:notice]: SnapMirror resync of ESX_ISOSTORE to src-acc-filer1:SRC_ACC_ESX_ISO is using src-acc-filer1(0151762815)_SRC_ACC_ESX_ISO.2 as the base snapshot.
Volume ESX_ISOSTORE will be briefly unavailable before coming back online.
Tue May 31 08:33:32 CEST [dst-prd-filer1: wafl.snaprestore.revert:notice]: Reverting volume ESX_ISOSTORE to a previous snapshot.
exportfs [Line 2]: NFS not licensed; local volume /vol/ESX_ISOSTORE not exported
Revert to resync base snapshot was successful.
Tue May 31 08:33:32 CEST [dst-prd-filer1: replication.dst.resync.success:notice]: SnapMirror resync of ESX_ISOSTORE to src-acc-filer1:SRC_ACC_ESX_ISO was successful.
Transfer started.
Monitor progress with 'snapmirror status' or the snapmirror log.
Clean Up the Temporary SnapMirror Relationship
Status on the original source:
src-acc-filer1> snapmirror status
Snapmirror is on.
Source                          Destination                     State         Lag       Status
dst-prd-filer1:ESX_ISOSTORE     src-acc-filer1:SRC_ACC_ESX_ISO  Broken-off    00:25:08  Idle
dst-prd-filer1:AIX_01           src-acc-filer1:AIX_01           Snapmirrored  00:01:16  Idle
dst-prd-filer1:AIX_02           src-acc-filer1:AIX_02           Snapmirrored  00:01:16  Idle
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE     Source        00:00:18  Idle
Status on the original destination:
dst-prd-filer1> snapmirror status
Snapmirror is on.
Source                          Destination                     State         Lag       Status
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE     Snapmirrored  00:01:00  Idle
dst-prd-filer1:ESX_ISOSTORE     src-acc-filer1:SRC_ACC_ESX_ISO  Source        00:25:50  Idle
dst-prd-filer1:AIX_01           src-acc-filer1:AIX_01           Source        00:01:58  Idle
dst-prd-filer1:AIX_02           src-acc-filer1:AIX_02           Source        00:01:58  Idle
So we have a broken-off SnapMirror relationship that we used for syncing the data from the destination back to the source. Usually a release would do the trick, but not now:
src-acc-filer1> snapmirror release ESX_ISOSTORE src-acc-filer1:SRC_ACC_ESX_ISO
snapmirror release: ESX_ISOSTORE, src-acc-filer1:SRC_ACC_ESX_ISO: source is offline, is restricted, or does not exist
And when trying to delete it in FilerView, I got this error:
Invalid Delete Operation: Schedule for 'src-acc-filer1:SRC_ACC_ESX_ISO' already deleted.
Looking into /etc/snapmirror.conf, I see that the relationship is not listed:
src-acc-filer1> rdfile /etc/snapmirror.conf
#Regenerated by registry Mon Apr 11 13:28:39 GMT 2011
dst-prd-filer1:AIX_01 src-acc-filer1:AIX_01 - 0-59/6 * * *
dst-prd-filer1:AIX_02 src-acc-filer1:AIX_02 - 0-59/6 * * *
But listing the snapshots shows that there are still two snapshots left from the temporary reverse relationship:
src-acc-filer1> snap list SRC_ACC_ESX_ISO
Volume SRC_ACC_ESX_ISO
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
  0% ( 0%)    0% ( 0%)  May 31 08:55  dst-prd-filer1(0151762648)_ESX_ISOSTORE.2 (snapmirror)
  0% ( 0%)    0% ( 0%)  May 31 08:30  src-acc-filer1(0151762815)_SRC_ACC_ESX_ISO.2 (snapmirror)
  0% ( 0%)    0% ( 0%)  May 31 06:00  hourly.0
  0% ( 0%)    0% ( 0%)  May 31 00:00  nightly.0
  0% ( 0%)    0% ( 0%)  May 30 18:00  hourly.1
  0% ( 0%)    0% ( 0%)  May 30 15:27  src-acc-filer1(0151762815)_SRC_ACC_ESX_ISO.1
  0% ( 0%)    0% ( 0%)  May 30 00:00  weekly.0
  0% ( 0%)    0% ( 0%)  May 29 00:00  nightly.1
  0% ( 0%)    0% ( 0%)  May 23 16:45  dst-prd-filer1(0151762648)_ESX_ISOSTORE.3 (snapmirror)
As SnapMirror relies on the snapshots belonging to the destination, we have to keep the snapshots starting with dst-prd-filer1 and remove the src-acc-filer1 snapshots, since those were used for the temporary relationship:
src-acc-filer1> snap delete SRC_ACC_ESX_ISO src-acc-filer1(0151762815)_SRC_ACC_ESX_ISO.2
Tue May 31 09:34:12 CEST [src-acc-filer1: wafl.snap.delete:info]: Snapshot copy src-acc-filer1(0151762815)_SRC_ACC_ESX_ISO.2 on volume SRC_ACC_ESX_ISO IBM was deleted by the Data ONTAP function snapcmd_delete. The unique ID for this Snapshot copy is (37, 1967).
src-acc-filer1> snap delete SRC_ACC_ESX_ISO src-acc-filer1(0151762815)_SRC_ACC_ESX_ISO.1
Tue May 31 09:34:26 CEST [src-acc-filer1: wafl.snap.delete:info]: Snapshot copy src-acc-filer1(0151762815)_SRC_ACC_ESX_ISO.1 on volume SRC_ACC_ESX_ISO IBM was deleted by the Data ONTAP function snapcmd_delete. The unique ID for this Snapshot copy is (33, 1852).
src-acc-filer1> snap list SRC_ACC_ESX_ISO
Volume SRC_ACC_ESX_ISO
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
  0% ( 0%)    0% ( 0%)  May 31 09:24  dst-prd-filer1(0151762648)_ESX_ISOSTORE.4 (snapmirror)
  0% ( 0%)    0% ( 0%)  May 31 06:00  hourly.0
  0% ( 0%)    0% ( 0%)  May 31 00:00  nightly.0
  0% ( 0%)    0% ( 0%)  May 30 18:00  hourly.1
  0% ( 0%)    0% ( 0%)  May 30 00:00  weekly.0
  0% ( 0%)    0% ( 0%)  May 29 00:00  nightly.1
  0% ( 0%)    0% ( 0%)  May 23 16:45  dst-prd-filer1(0151762648)_ESX_ISOSTORE.3 (snapmirror)
As you can see, the broken-off snapmirror is now gone. On the original source:
src-acc-filer1> snapmirror status
Snapmirror is on.
Source                          Destination                  State         Lag       Status
dst-prd-filer1:AIX_01           src-acc-filer1:AIX_01        Snapmirrored  00:05:27  Idle
dst-prd-filer1:AIX_02           src-acc-filer1:AIX_02        Snapmirrored  00:05:19  Idle
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE  Source        00:10:22  Idle
But on the original destination the relationship is still there. Note that this side is the source of the relationship we're trying to remove:
dst-prd-filer1> snapmirror status
Snapmirror is on.
Source                          Destination                     State         Lag       Status
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE     Snapmirrored  00:11:41  Idle
dst-prd-filer1:ESX_ISOSTORE     src-acc-filer1:SRC_ACC_ESX_ISO  Source        01:06:30  Idle
dst-prd-filer1:AIX_01           src-acc-filer1:AIX_01           Source        00:06:46  Transferring (32 MB done)
dst-prd-filer1:AIX_02           src-acc-filer1:AIX_02           Source        00:00:38  Idle
In this case a release works:
dst-prd-filer1> snapmirror release ESX_ISOSTORE src-acc-filer1:SRC_ACC_ESX_ISO
dst-prd-filer1> snapmirror status
Snapmirror is on.
Source                          Destination                  State         Lag       Status
src-acc-filer1:SRC_ACC_ESX_ISO  dst-prd-filer1:ESX_ISOSTORE  Snapmirrored  00:13:15  Idle
dst-prd-filer1:AIX_01           src-acc-filer1:AIX_01        Source        00:02:12  Idle
dst-prd-filer1:AIX_02           src-acc-filer1:AIX_02        Source        00:02:12  Idle