
How to Synchronize Scripting Directories to External Storage

Written by Rheinwerk Computing | Nov 1, 2024 1:00:00 PM

You probably use the cloud as backup storage in some way or other, right?

 

As a developer, you probably use Git to synchronize your code to a remote repository, combining the benefits of version control with those of an external backup. As an “ordinary” user, you simply save important files to a OneDrive, Nextcloud, Dropbox, or iCloud Drive directory. All of these procedures are both useful and secure when applied correctly.

 

The starting point for this example is the desire to additionally synchronize important directories of your notebook to an external storage medium. The advantage of this old-fashioned backup method is that the stored files remain available even if cloud access is temporarily impossible for whatever reason. Local backups also come in handy when data volumes are too large for cloud storage, for example, when virtual machines, video projects, and so on are involved.

 

We would also like to point out the most important disadvantage right away: synchronizing means that files deleted on your notebook will also be deleted on the external disk. Thus, this backup procedure does not provide the option to restore deleted or overwritten files. In this respect, the Bash and PowerShell scripts presented here should not be your sole means of backup but should instead complement other methods.

 

Be Careful with Backups of Files Currently in Use

If you use the script to synchronize files that are open and still changing while they are copied, the resulting backup is usually worthless. This limitation applies, for example, to the image files of virtual machines or to databases.

 

Unfortunately, no universal solution exists for this problem. Ideally, you should run the script at a time when as few files as possible are actively in use. Most database servers provide features to perform consistent backups even while the database is running, but usually not at the file level. You could shut down virtual machines in your script for the duration of the backup and restart them afterwards. However, such approaches depend quite specifically on the software running on your computer.
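
For instance, if your virtual machines are managed by libvirt, the backup script could pause them for the copy. The following lines are only a hypothetical sketch: the VM name winvm and the image path are assumptions, and the sleep is a crude way of waiting for the guest to shut down.

virsh shutdown winvm && sleep 60   # ask the guest to stop, then wait
rsync -a /var/lib/libvirt/images/winvm.qcow2 /mnt/backup/
virsh start winvm                  # restart the VM after the backup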

 

Depending on the operating system, file system snapshots represent another solution: you temporarily freeze a point-in-time copy of the file system and use this static copy as the basis for the backup. On Linux, the Logical Volume Manager (LVM) or the Btrfs file system provides such options.
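
On a Btrfs system, for example, a snapshot-based backup might look like the following sketch (assuming that /home is a Btrfs subvolume, that /mnt/backup is the backup target, and that the commands run with root privileges):

# create a read-only snapshot, back it up, discard it again
btrfs subvolume snapshot -r /home /home/.backup-snapshot
rsync -a --delete /home/.backup-snapshot/ /mnt/backup/home/
btrfs subvolume delete /home/.backup-snapshot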

 

PowerShell Script with robocopy

robocopy, which stands for robust file copy, is a Windows command that has shipped with every version of Windows since Windows Vista and Windows Server 2008. Even though the command is not a cmdlet but a classic executable, it integrates well into PowerShell scripts. The command’s myriad options are documented in Microsoft’s online command-line reference.

The following script starts by initializing a few variables that specify which directories should be synchronized to which destination.

 

Get-Volume retrieves a list of all file systems. If the target disk is not among them, the script ends. For Where-Object, note that although Get-Volume displays the file system name in a column headed FriendlyName, the property itself is actually called FileSystemLabel. It is hard to fathom what the developers of Get-Volume had in mind here!
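
If in doubt, you can list the property names that the objects returned by Get-Volume actually expose. This quick check is not part of the backup script; it merely shows which name Where-Object needs:

Get-Volume | Get-Member -MemberType Property |
   Select-Object -ExpandProperty Name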

 

Once the target file system has been detected, the script determines the corresponding drive letter. This letter is not necessarily always the same; it depends on which storage devices are currently in use.

 

The script logs all robocopy outputs to a file whose name has the following format: robocopy-2023-12-31--17-30.log. This file and a logging directory are set up automatically.

 

Finally, the script loops through all the directories listed in $syncdirs and runs robocopy. The options used in this process have the following meaning:

  • /e: Traverse directories recursively
  • /purge: Delete files and directories in the backup that have been deleted locally (caution!)
  • /xo: Exclude older files, that is, copy only files that have changed since the last run; this option speeds up the synchronization process enormously from the second pass onwards
  • /log+:filename: Append logging output to the specified file

#!/usr/bin/env pwsh
# Sample file sync-folders.ps1
# $destvolume: name of the backup data medium (such as a USB flash drive)
# $destdir:    name of the target directory on this data medium
# $logdir:     name of the logging directory on the data medium
# $syncdirs:   list of directories to be synchronized
#              (relative to the personal files)
$destvolume = "mybackupdisk"
$destdir    = "backup-kofler"
$logdir     = "$destdir\sync-logging"
$syncdirs   = "Documents", "Pictures", "myscripts"

# determine drive letter of the target file system
$disk = Get-Volume |
   Where-Object { $_.FileSystemLabel -eq $destvolume }
if (! $disk) {
   Write-Output "Backup disk $destvolume not found. Exit."
   Exit 1
}
$drive = ($disk | Select-Object -First 1).DriveLetter
Write-Output "Syncing with disk ${drive}:"

# create target directory if it does not exist yet;
# | Out-Null prevents the directory name from being displayed
New-Item -ItemType Directory -Force "${drive}:\${destdir}" |
   Out-Null

# create logging directory and logging file
New-Item -ItemType Directory -Force "${drive}:\${logdir}" |
   Out-Null
$logfile = `
   "${drive}:\${logdir}\robocopy-{0:yyyy-MM-dd--HH-mm}.log" `
   -f (Get-Date)
New-Item $logfile | Out-Null

# loop through the sync directories
foreach ($dir in $syncdirs) {
   $from = "${HOME}\$dir"
   $to = "${drive}:\${destdir}\$dir"
   Write-Output "sync from $from to $to"
   robocopy /e /purge /xo /log+:$logfile "$from" "$to"
}

 

In our tests, the script kept throwing an access denied error (error code 5) when writing files to the USB drive. Countless reports about this error exist on the internet, along with almost as many suggested solutions. Apparently, only running the script with admin privileges works reliably. For initial tests, simply use a terminal window with administrator rights. After that, set up the script in the Microsoft Windows Task Scheduler to run once a day with admin rights (the Run with highest privileges option on the General tab).
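
Alternatively, you can register the scheduled task directly from an admin PowerShell session. The following sketch assumes a hypothetical script location, C:\Users\kofler\myscripts\sync-folders.ps1, and that PowerShell 7 (pwsh.exe) is installed; adjust both to your setup:

# register a daily task that runs the script with admin rights
$action  = New-ScheduledTaskAction -Execute "pwsh.exe" `
   -Argument "-File C:\Users\kofler\myscripts\sync-folders.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At "12:30"
Register-ScheduledTask -TaskName "sync-folders" `
   -Action $action -Trigger $trigger -RunLevel Highest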

 

Ideas for Improvement

Before you entrust your data to the script, we would like to point out a few restrictions and describe how you can optimize the script:

  • Warning in case of error or non-execution: The script simply aborts if the external drive is not available at the moment. A better approach would be to display or send a warning in some form after several unsuccessful attempts.
  • Exclusion rules: The script can synchronize directories either completely or not at all. In practice, exclusion criteria would be helpful for files or directories that should not be included in the backup (see the sketch after this list).
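
robocopy itself provides the /xf (exclude files) and /xd (exclude directories) options for this purpose. The following sketch shows how the robocopy call in the loop could be extended; the patterns are merely examples:

# skip temporary files and cache directories during the sync
robocopy /e /purge /xo /xf *.tmp *.bak /xd .cache node_modules `
   /log+:$logfile "$from" "$to"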

Bash Script with rsync

If you work on Linux or macOS, the best way to implement comparable synchronization tasks is a Bash script with rsync. The following script has the same structure as the PowerShell script described earlier: it tests whether the backup volume is available at a particular mount point and then runs the rsync command for a list of local directories.

 

rsync also works in conjunction with SSH. Thus, you can adapt the script relatively easily so that your directories are synchronized not with an external data medium but with another computer (see the sketch after the option list below).

 

#!/bin/bash
# Sample file sync-folders.sh
# what is to be synchronized where
DESTVOLUME="/run/media/kofler/mybackupdisk"
DESTDIR="backup-kofler"
LOGDIR="$DESTDIR/sync-logging"
SYNCDIRS=("Documents" "Pictures" "myscripts")

# is the backup file system available?
if ! mount | grep --quiet "$DESTVOLUME"; then
   echo "Backup disk $DESTVOLUME not found. Exit."
   exit 1
fi

# create destination and logging directories
mkdir -p "$DESTVOLUME/$DESTDIR"
mkdir -p "$DESTVOLUME/$LOGDIR"

# compose file name for logging
logfname=$(date "+rsync-%Y-%m-%d--%H-%M.log")
log="$DESTVOLUME/$LOGDIR/$logfname"

# loop through all directories
for dir in "${SYNCDIRS[@]}"; do
   # trailing slash on $from: sync the directory's contents
   # instead of nesting another $dir level inside $to
   from="$HOME/$dir/"
   to="$DESTVOLUME/$DESTDIR/$dir"
   echo "sync from $from to $to"
   rsync -a --delete -v "$from" "$to" >> "$log"
done

 

The rsync options have the following effect:

  • -a (archive): Process directories recursively and preserve file metadata (owner, access rights, timestamps)
  • --delete: Delete files and directories in the backup that have been deleted locally (caution!)
  • -v (verbose): Report in detail what is currently going on
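
As mentioned earlier, rsync can also transfer the backup to another computer via SSH. The following sketch of an adapted loop assumes key-based SSH access to a hypothetical host backuphost with an existing /backup directory; the local log file path is an arbitrary choice:

# sync each directory to a remote host instead of a local disk
log="$HOME/rsync-backup.log"
for dir in "${SYNCDIRS[@]}"; do
   rsync -a --delete -v "$HOME/$dir/" \
      "kofler@backuphost:/backup/$dir" >> "$log"
done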

To have the script called automatically once a day at 12:30 pm, we added the following entry to /etc/crontab. You can simply adjust the desired time, account name, and path to the backup script to suit your particular requirements:

 

# in file /etc/crontab
30 12 * * * kofler /home/kofler/myscripts/sync-folders.sh

 

macOS: This script has been tested only on Linux. For macOS, small adaptations are required, especially regarding the test that checks whether the backup disk is currently connected to the computer.
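
One possible adaptation, again only a sketch: on macOS, external volumes are mounted below /Volumes, so a simple directory test can replace the mount/grep check.

DESTVOLUME="/Volumes/mybackupdisk"
# is the backup volume mounted?
if ! [ -d "$DESTVOLUME" ]; then
   echo "Backup disk $DESTVOLUME not found. Exit."
   exit 1
fi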

 

Editor’s note: This post has been adapted from a section of the book Scripting: Automation with Bash, PowerShell, and Python by Michael Kofler.