
Move data from AWS S3 to Azure Storage Container

Consider the following use case: you have data in an AWS S3 bucket that needs to be moved automatically to another cloud provider, in this case Azure, because of continuity requirements. I decided to use an Azure DevOps pipeline and azcopy to transfer the files, so follow along on how to set this up.

Get Access to the S3 Bucket

First step is to configure an IAM user with access to the S3 bucket:

  • In the AWS portal, go to IAM, and create a group 'S3ReadOnly'
    • Do not attach any user yet
    • Attach the permission policy 'AmazonS3ReadOnlyAccess'
  • In the AWS portal, go to IAM, and create a user 's3.backup'
    • Do NOT provide user access to the AWS Management Console
    • Add the user group 'S3ReadOnly'
  • Once the user is created, select it and go to its 'Security credentials' tab
    • Go to 'Access keys' and create a new one
      • Use case: Application running outside AWS
      • Description: Backup S3 data to Azure
      • After creation you can copy the access key and the secret. This is the only time the secret is shown, so copy it now and store it in your password manager
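
If you prefer scripting over the portal, the same IAM setup can be done with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured with an administrative profile; the group, user and policy names match the portal steps above:

# Create the read-only group and attach the AWS managed S3 read-only policy
aws iam create-group --group-name S3ReadOnly
aws iam attach-group-policy --group-name S3ReadOnly --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# Create the backup user (no console access) and add it to the group
aws iam create-user --user-name s3.backup
aws iam add-user-to-group --user-name s3.backup --group-name S3ReadOnly

# Create the access key; the secret is only shown once, so store the JSON output in your password manager
aws iam create-access-key --user-name s3.backup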

Get Access to the Azure Storage Container

To get access to the storage container we'll use a Shared Access Signature:

  • In the Azure portal, go to the storage account and select 'Shared access signature'. Configure the SAS like this:
    • Allowed services: Blob
    • Allowed resource types: Container, Object
    • Set End Date: 31-dec-2024
    • Keep everything else default
    • Click 'Generate SAS and connection string'
    • Copy the 'Blob service SAS URL' and keep it available for now
Note: Keep track of which signing key you use. If the SAS ever gets compromised, you need to rotate the key that was used to create it.
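
As an alternative to the portal, an equivalent account SAS can be generated with the Azure CLI. This is a sketch, assuming a storage account named 's3backup' and that you pass one of its account keys; the permission letters shown here are an assumption and should be trimmed to what you actually need:

# Generate an account SAS for the Blob service (--services b), scoped to containers and objects (--resource-types co)
az storage account generate-sas `
  --account-name s3backup `
  --account-key "<storage account key>" `
  --services b `
  --resource-types co `
  --permissions rwdlac `
  --expiry 2024-12-31T00:00Z `
  --https-only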

Configure the Storage Account and SAS

If you look at the SAS URL you copied, you will notice it consists of three parts:

  1. The base url, consisting of the storage account name and the azure blob storage base url: 'https://s3backup.blob.core.windows.net/'
  2. The SAS which holds the configuration: '?sv=2022-11-02&ss=b&srt=c&sp=rwdlaciytfx&se=2023-07-07T15:21:13Z&st=2023-07-07T07:21:13Z&spr=https&sig='
  3. The signature, which is everything that comes after 'sig=' in the SAS. This is the sensitive part, so again, add it to your password manager

Before we can copy data to the storage account we need to add a container to it. In the Azure portal, go to the storage account, then Data storage, and select Containers. Click Add to create a container and keep the public access level set to private. Now append the name of the container (s3backup) to the base URL you got from the SAS, so it becomes something like this: 'https://s3backup.blob.core.windows.net/s3backup'
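
Creating the container can also be scripted. A minimal sketch, assuming the Azure CLI and an account that has the 'Storage Blob Data Contributor' role on the storage account (otherwise pass the account key instead of --auth-mode login):

# Create a private container named 's3backup' in the storage account (public access is off by default)
az storage container create --name s3backup --account-name s3backup --auth-mode login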

Setup the pipeline

Now switch to Azure DevOps. Before we create the actual pipeline, we first add the AWS secret and the SAS signature as secrets to a variable group.

  • Go to Azure DevOps, to pipelines and click library
  • Add a variable group and name it 'Backups'
  • Add two keys named 'AWS_KEY' and 'SAS_SIG' with the correct values and make sure you configure them as secrets
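
If you want to script this step as well, the Azure DevOps CLI extension can create the variable group and its secrets. A sketch, assuming the azure-devops extension is installed and you are logged in; organization, project and group id are placeholders:

# Point the CLI at your organization and project
az devops configure --defaults organization=https://dev.azure.com/<your-org> project=<your-project>

# Create the variable group (a non-secret placeholder variable is required at creation time)
az pipelines variable-group create --name Backups --variables placeholder=temp

# Add the secrets to the group, using the group id returned by the previous command
az pipelines variable-group variable create --group-id <group-id> --name AWS_KEY --value "<aws secret access key>" --secret true
az pipelines variable-group variable create --group-id <group-id> --name SAS_SIG --value "<sas signature>" --secret true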

Now that we have all the requirements in place, we can create the pipeline:

  • Go to Azure DevOps, to pipelines and click New Pipeline
  • Use Azure Repos Git, select your repo and configure your pipeline as a starter pipeline. This will show you a starter pipeline which you can replace with the YAML below
  • Click save and run to start the pipeline

Note that the YAML below has the following configurations (from top to bottom):

  • Trigger is set to none as we'll use a schedule
  • Pool is set to windows-latest, which works best with the PowerShell task we'll use later on
  • Variables references the variable group 'Backups' in which we've put the required secrets
  • We'll use stage variables:
    • SourceBucket: must be set to the S3 bucket URL
    • TargetContainer: must be set to the base URL you've gathered from the SAS URL combined with the container name
    • SasToken: the SAS token (without the signature) from the SAS URL
    • AWSAccessKey: the AWS access key
  • The task that does the actual backup is set to always run and has the two secrets from the variable group mapped as environment variables
  • In the inline script the following things are done:
    • For logging purposes we log the task name as well as the azcopy version
    • We get yesterday's date in the correct format
    • We set the AWS key and secret as environment variables
    • We log the files that are already in the container
    • We copy the required files and log the results
name: $(Build.DefinitionName)-$(Build.BuildId)
appendCommitMessageToRunName: false

trigger: none

pool:
  vmImage: 'windows-latest'

schedules:
- cron: "0 2 * * 0,2"
  displayName: Weekly backup on Sunday and Wednesday 2h
  branches:
    include:
    - main
  always: true

variables:
- group: Backups

stages:
- stage: S3Backup
  displayName: "Backups"
  condition: always()
  variables: 
    SourceBucket          : 'https://s3.eu-central-1.amazonaws.com/backup'
    TargetContainer       : 'https://s3backup.blob.core.windows.net/s3backup'
    SasToken              : '?sv=2022-11-02&ss=b&srt=c&sp=rwdlaciytfx&se=2023-07-07T15:21:13Z&st=2023-07-07T07:21:13Z&spr=https&sig='
    AWSAccessKey          : 'AKIAXXXXXXXXXXXXXXXXXX'

  jobs: 
  - job: RisktoolData
    displayName: "Data backup to Azure"
    steps:
    - checkout: self

    - task: PowerShell@2
      displayName: "Backup risktool data"
      condition: always()
      env:
        AWSKEY: $(AWS_KEY)
        SASSIG: $(SAS_SIG)
      inputs:
        pwsh: true
        targetType: 'inline'
        script: |
          Write-Host "`n##[section]Task: $env:TASK_DISPLAYNAME `n"
          azcopy --version
          $yesterday = Get-Date $((Get-Date).AddDays(-1)) -Format yyyyMMdd
          $env:AWS_ACCESS_KEY_ID=$env:AWSAccessKey
          $env:AWS_SECRET_ACCESS_KEY=$env:AWSKEY
          $targetUrl = "$($env:TargetContainer)$($env:SasToken)$($env:SASSIG)"
          Write-Host "`n##[section]Show target container objects: $targetUrl"
          azcopy list $targetUrl
          Write-Host "`n##[section]Start FS Backup from $yesterday"
          azcopy copy "$($env:SourceBucket)/PRD/FS_db_$($yesterday).all.out.gz.enc" $targetUrl --recursive=true
          Write-Host "`nLog:`n"
          $logfile = Get-ChildItem "C:\Users\VssAdministrator\.azcopy\" | sort LastWriteTime | select -last 1
          Get-Content -Path "C:\Users\VssAdministrator\.azcopy\$($logfile.Name)"
          Write-Host "`n##[section]Start DB Backup from $yesterday"
          azcopy copy "$($env:SourceBucket)/PRD/DB_db_$($yesterday).all.out.gz.enc" $targetUrl --recursive=true
          Write-Host "`nLog:`n"
          $logfile = Get-ChildItem "C:\Users\VssAdministrator\.azcopy\" | sort LastWriteTime | select -last 1
          Get-Content -Path "C:\Users\VssAdministrator\.azcopy\$($logfile.Name)"
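
Before wiring everything into the pipeline, it can help to test the azcopy S3-to-Blob copy locally. A minimal sketch in PowerShell with placeholder values; azcopy picks up the AWS credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables:

# Provide the AWS credentials to azcopy via environment variables
$env:AWS_ACCESS_KEY_ID     = '<access key id>'
$env:AWS_SECRET_ACCESS_KEY = '<secret access key>'

# Copy a single object from the S3 bucket to the Azure container (destination is the full SAS URL)
azcopy copy "https://s3.eu-central-1.amazonaws.com/backup/PRD/FS_db_20230706.all.out.gz.enc" `
  "https://s3backup.blob.core.windows.net/s3backup?<full SAS including signature>"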

Cleanup Copied Files in Azure Storage Container

Use this lifecycle management policy to clean up the copied files automatically after 30 days:

  • Go to the Azure portal and select the storage account
  • Under Data management, click on Lifecycle management → Add a rule
    • Rule name: Delete after 30 days
    • Rule scope: Apply rule to all blobs in your storage account
    • Blob Type: Block blobs
    • Blob subtype: Base blobs
  • Click next to go to the base blobs conditions. Create a condition rule with these settings:
    • If Base blobs were Created More than 30 Days ago Then Delete the blob
  • Click Add to add the rule

See lifecycle management overview and configure a lifecycle management policy for more information.
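
The same rule can also be deployed with the Azure CLI. A sketch, assuming a resource group placeholder and the 'daysAfterCreationGreaterThan' condition to match the 'Created more than 30 days ago' rule above; no prefix filter is set, so the rule applies to all block blobs in the account:

# Define the lifecycle policy as JSON and write it to a file
$policy = @'
{
  "rules": [
    {
      "enabled": true,
      "name": "Delete after 30 days",
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": [ "blockBlob" ] },
        "actions": {
          "baseBlob": { "delete": { "daysAfterCreationGreaterThan": 30 } }
        }
      }
    }
  ]
}
'@
Set-Content -Path policy.json -Value $policy

# Apply the policy to the storage account
az storage account management-policy create --account-name s3backup --resource-group <resource-group> --policy '@policy.json'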

Troubleshooting

Common errors:

failed to perform copy command due to error: failed to initialize enumerator: cannot transfer individual files/folders to the root of a service. Add a container or directory to the destination URL

Solution: You forgot to create a container in the storage account and to append its name to the TargetContainer URL in the stage variables

This request is not authorized to perform this operation using this resource type.. When Put Blob from URL.

Solution: Configure the allowed resource types for the SAS as 'Container' and 'Object'
