You probably use the cloud as backup storage in some way or other, right?
As a developer, you probably use Git to synchronize your code to a remote repository, combining the benefits of version control with those of an external backup. As an “ordinary” user, you simply save important files to a OneDrive, NextCloud, Dropbox, or iCloud Drive directory. All of these procedures are both useful and secure when applied correctly.
The starting point for this example is the desire to additionally synchronize important directories of your notebook to an external data medium. The advantage of this old-fashioned backup method is that the stored files remain available even if cloud access is impossible at a given moment for whatever reason. Local backups also come in handy when data volumes are too large for cloud storage, for example, when virtual machines, video projects, and so on are involved.
We would also like to point out the most important disadvantage right away: synchronizing means that files deleted on your notebook will also be deleted on the external disk. Thus, this backup procedure does not provide the option to restore deleted or overwritten files. In this respect, the Bash or PowerShell scripts presented in this chapter should not represent your sole means of backup but should instead complement other methods.
If you use the script to synchronize files that are currently open and change during copying, the backup is worthless most of the time. This limitation applies, for example, to image files of virtual machines or to databases.
Unfortunately, no universal solution exists for this problem. Ideally, you should run the script at a time when as few files as possible are actively in use. Most database servers provide features to perform consistent backups even while the database is running, though usually not at the file level. You could shut down virtual machines in your script for the duration of the backup and restart them afterward. However, such approaches depend quite specifically on the software running on your computer.
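To illustrate the idea for virtual machines, here is a hypothetical fragment for a Linux host whose machines are managed by libvirt; the domain name mydomain and all paths are placeholders:

# hypothetical sketch: stop a libvirt VM, back up its image, restart it
virsh shutdown mydomain
# wait until the guest has actually shut down before copying
while virsh list --name | grep --quiet '^mydomain$'; do sleep 5; done
rsync -a /var/lib/libvirt/images/ /backup/images/
virsh start mydomain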
Depending on the operating system, file system snapshots represent another solution: you temporarily freeze the current state of the file system and use this static copy as the basis for the backup. On Linux, the Logical Volume Manager (LVM) or the Btrfs file system provides such options.
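On Btrfs, for example, such a backup could look like the following sketch. It assumes that /home is a Btrfs subvolume and must be run as root; the snapshot name and the target path are placeholders:

# create a read-only snapshot, use it as a static backup source,
# then discard it again (run as root)
btrfs subvolume snapshot -r /home /home/.backup-snapshot
rsync -a --delete /home/.backup-snapshot/ /backup/home/
btrfs subvolume delete /home/.backup-snapshot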
robocopy, which stands for robust file copy, is a Windows command that has shipped with all versions of Windows since Windows Vista. Even though the command is not a cmdlet, it integrates well into PowerShell scripts. The command's myriad options are covered in Microsoft's online documentation.
The following script starts with the initialization of four variables that specify which directories should be synchronized to which destination.
Get-Volume retrieves a list of all file systems. If the target disk is not among them, the script ends. For Where-Object, note that although Get-Volume displays the file system name in the FriendlyName column, the underlying property is actually called FileSystemLabel. What the developers of Get-Volume had in mind here is hard to comprehend!
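You can check this yourself: the following command lists the actual property names of the objects that Get-Volume returns. The column headers you see on screen come from PowerShell's formatting data, not from the property names.

# list the real property names behind Get-Volume's output
Get-Volume | Get-Member -MemberType Property | Select-Object Name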
When the target file system is detected, the corresponding drive letter is determined. This letter is not necessarily always the same; it depends on which data carriers are currently attached.
The script logs all robocopy outputs to a file whose name has the following format: robocopy-2023-12-31--17-30.log. This file and a logging directory are set up automatically.
Finally, the script loops through all the directories listed in $syncdirs and runs robocopy. The options used in this process have the following meaning:

- /e copies all subdirectories, including empty ones.
- /purge deletes files and directories in the destination that no longer exist in the source.
- /xo ("exclude older") skips files whose copy in the destination is already up to date; a file is only copied if the source version is newer.
- /log+:<file> appends all output to the specified log file.
#!/usr/bin/env pwsh
# Sample file sync-folders.ps1
# $destvolume: name of the backup data medium (such as a USB flash drive)
# $destdir:    name of the target directory on this data medium
# $logdir:     name of the logging directory on the data medium
# $syncdirs:   list of directories to be synchronized
#              (relative to the personal files)
$destvolume = "mybackupdisk"
$destdir = "backup-kofler"
$logdir = "$destdir\sync-logging"
$syncdirs = "Documents", "Pictures", "myscripts"

# determine drive letter of the target file system
$disk = Get-Volume |
    Where-Object { $_.FileSystemLabel -eq $destvolume }
if (! $disk) {
    Write-Output "Backup disk $destvolume not found. Exit."
    Exit 1
}
$drive = ($disk | Select-Object -First 1).DriveLetter
Write-Output "Syncing with disk ${drive}:"

# create target directory if it does not exist yet;
# | Out-Null prevents the directory name from being displayed
New-Item -ItemType Directory -Force "${drive}:\${destdir}" |
    Out-Null

# create logging directory and logging file
New-Item -ItemType Directory -Force "${drive}:\${logdir}" |
    Out-Null
$logfile = `
    "${drive}:\${logdir}\robocopy-{0:yyyy-MM-dd--HH-mm}.log" `
    -f (Get-Date)
New-Item $logfile | Out-Null

# loop through the sync directories
foreach ($dir in $syncdirs) {
    $from = "${HOME}\$dir"
    $to = "${drive}:\${destdir}\$dir"
    Write-Output "sync from $from to $to"
    robocopy /e /purge /xo /log+:$logfile "$from" "$to"
}
In our tests, the script kept throwing an access denied error (error code 5) when writing files to the USB drive. Countless reports about this error exist on the internet, along with almost as many suggested solutions. Apparently, only running the script with admin privileges works reliably. For initial tests, simply use a terminal window with administrator rights. After that, you can set up the script in the Microsoft Windows Task Scheduler to run once a day with admin rights (the Run with highest privileges option on the General tab).
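If you prefer to set up the scheduled task from the command line instead of the GUI, the cmdlets of the ScheduledTasks module provide an alternative. The following is only a sketch and must itself run in an administrative PowerShell; the task name, the script path, and the time are assumptions you need to adapt:

# sketch: register the backup script as a daily task that runs
# with highest privileges (name, path, and time are placeholders)
$action  = New-ScheduledTaskAction -Execute "pwsh.exe" `
    -Argument "-File C:\Users\kofler\myscripts\sync-folders.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At 12:30
Register-ScheduledTask -TaskName "sync-folders" `
    -Action $action -Trigger $trigger -RunLevel Highest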
Before you entrust your data to the script, remember the restrictions we pointed out at the beginning of this chapter, and consider how you might optimize the script for your own requirements.
If you work on Linux or macOS, the best way to perform comparable synchronization tasks is a Bash script built around rsync. The following script has the same structure as the PowerShell script described earlier: it tests whether the backup volume is available at a particular mount point and then runs the rsync command for a list of local directories.
rsync also works in conjunction with SSH. Thus, you can adapt the script relatively easily so that your directories are synchronized not with an external data medium but with another computer (see the sketch after the option list below).
#!/bin/bash
# Sample file sync-folders.sh
# what is to be synchronized where
DESTVOLUME="/run/media/kofler/mybackupdisk"
DESTDIR="backup-kofler"
LOGDIR="$DESTDIR/sync-logging"
SYNCDIRS=("Documents" "Pictures" "myscripts")

# is the backup file system available?
if ! mount | grep --quiet "$DESTVOLUME"; then
    echo "Backup disk $DESTVOLUME not found. Exit."
    exit 1
fi

# create destination and logging directories
mkdir -p "$DESTVOLUME/$DESTDIR"
mkdir -p "$DESTVOLUME/$LOGDIR"

# compose file name for logging
logfname=$(date "+rsync-%Y-%m-%d--%H-%M.log")
log="$DESTVOLUME/$LOGDIR/$logfname"

# loop through all directories; the trailing slash in "$from/"
# makes rsync copy the directory's contents into $to instead of
# nesting another $dir level inside it
for dir in "${SYNCDIRS[@]}"; do
    from=$HOME/$dir
    to=$DESTVOLUME/$DESTDIR/$dir
    echo "sync from $from to $to"
    rsync -a --delete -v "$from/" "$to" >> "$log"
done
The rsync options have the following effect:

- -a (archive) processes directories recursively and preserves timestamps, access permissions, owners, and other file attributes.
- --delete removes files from the destination that no longer exist in the source.
- -v (verbose) logs each transferred file, which makes the log file more informative.
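As mentioned earlier, rsync can also transfer the data to another computer via SSH. A minimal sketch, assuming an SSH-reachable host, an account on it, and an existing backup-kofler directory on the remote side (the user name, host name, and paths are placeholders):

# sync a local directory to a remote machine over SSH
rsync -a --delete -v "$HOME/Documents/" kofler@backuphost:backup-kofler/Documents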
To have the script called automatically once a day at 12:30 pm, we added the following entry to /etc/crontab. You can simply adjust the desired time, account name, and path to the backup script to suit your particular requirements:
# in file /etc/crontab
30 12 * * * kofler /home/kofler/myscripts/sync-folders.sh
macOS: This script has been tested only on Linux. For macOS, small adaptations are required, especially regarding the test that checks whether the backup disk is currently connected to the computer.
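On macOS, external volumes are typically mounted below /Volumes, so the availability test could be replaced by a simple directory check (the volume name is a placeholder):

# macOS variant of the availability test
DESTVOLUME="/Volumes/mybackupdisk"
if [ ! -d "$DESTVOLUME" ]; then
    echo "Backup disk $DESTVOLUME not found. Exit."
    exit 1
fi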
Editor’s note: This post has been adapted from a section of the book Scripting: Automation with Bash, PowerShell, and Python by Michael Kofler.