Folder Structure Standardization and Unison for File Synchronization

We’re smart, electronics- and computer-savvy folks, right? So why is it that when I’m trying to figure out which of my computers a particular EAGLE project is on, I have to envision where I was sitting:

  • couch or kitchen == iBook or MacBook
  • home office == workstation
  • work is unlikely but == work computer

and about how long ago it was:

  • more than a year == iBook
  • less == MacBook

? Shouldn’t all of my files be available to me wherever I am? Why should I have to guess and look around and always have them in the wrong place?

Oh, sure, when I upgraded from the iBook to the MacBook, I could have used Migration Assistant to copy everything over; but it seemed like a great time to declutter, organize, and start fresh. And it was, until I didn’t get around to the organizing part and needed EAGLE files I hadn’t brought over yet. Like, now.

I’ve been home sick from work yesterday and today, and during the parts that I was awake I got files synced across my different platforms. I haven’t been playing sick to get a chance to sync up my computers; whatever I have is making me sniffle, speak about an octave lower than normal, drink gallons of orange juice, and listen to Madonna CDs. You don’t want what I have, and neither do the people I work with.

Unison Background

For a long time, I’ve been intending to install Unison for syncing my electronics project files (entire hierarchies, actually) across the different computers I use. Now I’ve actually done it.

Unison is open-source software that’s no longer under development; its creators have moved on to a new project called Harmony that’s a programming language for bidirectional actions and looks like somewhat of an abstraction of the same principles used in Unison. I think Harmony is quite a bit more than I need, and Unison is fully functional and still supported by a network of dedicated users, so Unison is my pick.

Unison’s core model is familiar to users of version-control systems: it looks for files that have changed between a pair of computers and offers to reconcile them. Files changed (including creation and deletion) on only one computer or the other are easy to synchronize — just copy them across (or delete them). Files changed on both computers since the last synchronization are flagged for operator intervention, to determine how they should be handled, and Unison can even merge files with non-overlapping changes (as do modern version-control systems).
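
The decision rule in that paragraph can be sketched in a few lines of shell. This is a simplification for illustration, not Unison's actual implementation: the function name reconcile_action and the fingerprint strings are made up, and "-" stands for a file that does not exist.

```shell
#!/bin/sh
# Hypothetical sketch of the reconciliation rule, not Unison's real code.
# Arguments are content fingerprints: as of the last sync, on replica A now,
# and on replica B now. Use "-" for "file does not exist".
reconcile_action() {
    last=$1 a=$2 b=$3
    if [ "$a" = "$b" ]; then
        echo "nothing to do"            # replicas already agree
    elif [ "$a" = "$last" ]; then
        echo "propagate B to A"         # only B changed since the last sync
    elif [ "$b" = "$last" ]; then
        echo "propagate A to B"         # only A changed since the last sync
    else
        echo "conflict: ask the user"   # both changed, in different ways
    fi
}

reconcile_action abc abc def    # edited on B only
reconcile_action abc -   abc    # deleted on A only
reconcile_action -   xyz -      # created on A only
reconcile_action abc 111 222    # edited on both
```

Note that a deletion is treated like any other change: if only one side deleted the file, the deletion propagates.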

Because Unison works across pairs of computers, if you want to sync more than two computers, you need to plan your own strategy for synchronizing the computers a pair at a time. The recommended strategy, and the one that makes the most sense for me, is to adopt a hub-and-spoke model where one computer acts as the central repository and the other computers all sync to it (and by syncing with it stay in sync with each other as well). A logical choice for my hub would be my home server; but I actually don’t do electronics work on it, so I’m starting out with my home workstation acting as the repository.

I looked at a number of other options, but none met all my needs. rsync was close and I already use it to copy all my blog pics up to my image hosting server, but it doesn’t detect the most recently updated endpoint and pick which direction to synchronize — it simply makes the destination look like the source, whichever way you’re aiming it. Unison knows which way to copy files if I’ve worked on both project A on the MacBook and project B on the desktop since the last time I’ve synced.

Pre-Sync Standardization

Aside: Because I’m an old Unix geek, I refer to hierarchical organizations of files as directories, not folders. If you’re a folder person, just substitute “folder” for “directory” throughout and you’ll be fine.

Over the years, I’ve evolved several different directory structures for my electronics projects. I always have an electronics/ directory; but sometimes I’ve put EAGLE and Arduino project directories outside of electronics/, sometimes in electronics/eagle/ and electronics/arduino/ subdirectories, and sometimes scattered loose in electronics/.

Server (where apparently I occasionally backed things up):

  • electronics/
    • [EAGLE projects]/
    • [other directories]/

The workstation’s current home directory had nothing because I recently upgraded the hard drives and OS, but the old hard drive on /mnt/home/neufeld/ had:

  • eagle/
    • [EAGLE projects]/
  • electronics/
    • datasheets/
    • [non-EAGLE projects]/

iBook:

  • electronics/
    • arduino-[####]/
    • datasheets/
    • [EAGLE projects]/

MacBook:

  • electronics/
    • Arduino/
    • datasheets/
    • [EAGLE projects]/

If I’m going to synchronize all those files across all those platforms, I need to standardize on a single directory structure. The MacBook’s is the most recent and pretty much what I want: Everything goes under electronics/, with all the files for a given project bundled together in a subdirectory regardless of whether they’re EAGLE, GNUPlot, OODraw, etc. The one exception is that Arduino sketches are easier to find when they’re in a common subdirectory, so I’ll leave an electronics/Arduino/ directory for that.

What about capitalization and spaces in file and directory names? When I’m slinging files around on my Linux workstation and server, it’s easier not to have spaces in filenames; and working in that environment brings back old habits to use all lowercase filenames. But I do most of my work now on the MacBook and (when using the Linux workstation) in GUI applications that comfortably support caps and spaces, so I have a lot of things named with caps and spaces. I haven’t managed to make up my mind yet.

My basic process for diving into file synchronization, then, looks like this:

  1. Install Unison on the (mostly empty) workstation.
  2. Clean up the directory structure on a “client” machine to fit the new standard.
  3. Install Unison on the client and sync to the workstation to begin populating the repository.
  4. Iterate on other clients until all clients have copied to the workstation once. Also sanitize the structure on the workstation’s old hard drive and sync it to the repository on the current drive.
  5. Retire any clients I’m not going to use any more (iBook).
  6. Resync all clients I’m going to continue using, to copy from the workstation everything that the other clients have just dumped there.

Then the ongoing process should be:

  1. Sync a client before using it to work on a project, to pick up any changes in the repository.
  2. Do some work.
  3. Sync the client after working, to push changes back to the repository.

Thus the repository (the workstation) will always be up to date, and clients will get updates from it and write updates to it on demand.
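
The sync, work, sync routine could be wrapped in a small shell function. This is only a sketch: with_sync and SYNC_CMD are hypothetical names, and it assumes a command-line unison on the PATH and a profile named workstation, neither of which is guaranteed on every client.

```shell
#!/bin/sh
# Hypothetical wrapper: sync, run a command, sync again.
# SYNC_CMD is an assumption; "unison workstation" would run a profile
# called workstation if one exists on this client.
SYNC_CMD=${SYNC_CMD:-"unison workstation"}

with_sync() {
    $SYNC_CMD || return 1    # pull changes from the repository first
    status=0
    "$@" || status=$?        # do the actual work, remembering its exit status
    $SYNC_CMD || return 1    # push changes back, even if the work failed
    return $status
}

# Usage (hypothetical file name):
#   with_sync vi electronics/notes.txt
```

SYNC_CMD is deliberately left unquoted inside the function so that it splits into a command and its arguments.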

Unison Installation and First Sync (MacBook to Workstation)

Installing Unison on the workstation was as simple as

yum install unison

For the first client, I picked the MacBook, as it already had the directory structure I wanted and didn’t require any reorganization. I downloaded a Mac DMG installer from Alan Schmitt’s site and got to work.

First I followed the tutorial section of the Unison documentation to create and synchronize some scratch files and directories and verify that Unison really worked as promised, which it did.

Unison profile

Next I set up a profile on the MacBook indicating that files would be synchronized between my MacBook’s electronics directory (/Users/neufeld/electronics) and the workstation’s electronics directory via an SSH host entry already in my .ssh/config. I don’t know why the Mac client requires the full path to my home directory rather than accepting a relative path — I suspect the Unison application’s current directory is where I installed it in /Applications/Utilities/Unison.app/.

Unison file synchronization

After I saved and selected the new profile, Unison scanned the electronics/ directory structure on each system and indicated several things that needed to be pushed from the MacBook to the workstation.

I pressed “Go” and Unison quickly copied the files across to the workstation.

Rescanning showed that everything was now in sync, as it should be.

I had some files on my MacBook’s desktop that I hadn’t stashed away in electronics yet, so I dragged them into their proper folders and then had Unison rescan. It found the files and directories and indicated that they needed to be synced to the workstation.

I pressed “Go” again and it copied the files across.

Another rescan showed everything synced again, exactly as expected.

iBook Synchronization

The iBook was the real motivation for syncing, as it has the “fob” LED driver EAGLE files I wanted to work on, so it was the next client to do.

I dragged folders around the iBook (I know I said I’d call them “directories,” but I’m in the habit of using the jargon of the platform I’m working on, so deal, ‘kay?) to fit my new structure. I also noted with dismay the massive clutter inside the electronics/ folder and did a little cleaning — but decided to wait with the main cleaning until I had it synced to my MacBook with the larger screen and snappier response. Remember, tidy things up on one computer and the cleanliness will propagate to all of them.

Installing Unison and configuring the profile were exactly the same as on the MacBook; but because I had done real work on the iBook for several years, it had considerably more data to copy to the workstation. And of course it needed to copy to the iBook the few files in the workstation repository that had originated on the MacBook.

I didn’t expect to find any conflicts, but scrolled through the entire list to see whether I might have manually copied across an EAGLE project or two. I found the .DS_Store files at the bottom of the list, which hold OS X desktop settings like the window size, icon position, etc. I didn’t need to sync those files across platforms, so I marked them both to skip and ran the sync.

The larger collection of files didn’t take as long to copy across my 802.11b (yes, b, shut up) wireless connection as you’d expect 864MB to take. I read in Unison’s documentation that in order to conserve bandwidth, it can find multiple copies of identical files (not links but actual copies) and only send them across the wire once, so I suspect it may have saved dramatically on the multiple Arduino directories.

The sync finished and rescanned as successfully synced. Then I did the same on the iBook as I had done on the MacBook — cleaned up some desktop clutter that I hadn’t filed into electronics/ yet and ran another sync.

Rather than manually skip the .DS_Store files each time, I selected one of them and then went up to the Ignore menu and chose Ignore Name.

Both of the .DS_Store files disappeared from the sync list, so the ignore feature appears to have worked. The synchronization of the last few files from the iBook completed as expected.

iBook Pre-Retirement Cleanup

I don’t plan on using the iBook again, other than to retrieve the last few (non-electronics) files I have on it. Whenever I think I’m done using a computer, I make sure that I’ve copied everything I want off of it and delete files and applications as I go so it’s easy to see that I’ve covered everything I need.

So … it’d be bad if I cleaned off the iBook and then (for whatever reason) ran Unison again, as it would detect the change and blithely offer to delete all the files from my workstation as well.

It’s not enough to uninstall Unison from the iBook — what if I want to use Unison again for syncing a fileset other than my electronics projects? At a minimum, I needed to remove the profile that syncs the electronics/ directories. Even having done that, if I stupidly recreated the profile, Unison’s internal archive tables would still remember about the electronics/ synchronization that had existed and realize that everything on the workstation “should” be deleted. Ideally I needed to remove that archive information as well.

On the Mac, profiles and synchronization archives aren’t in a .unison directory as documented, but rather in Library/Application Support/Unison. In that folder on the iBook, I see two files:

ard05982ded52c2ec7eb00q4d00ee7cb58

Unison archive format 22
Archive for root //wireless-37.neufeld.newton.ks.us//Users/neufeld/electronics synchronizing roots //dell2600.neufeld.newton.ks.us//home/neufeld/electronics, //wireless-37.neufeld.newton.ks.us//Users/neufeld/electronics
Written at 2010-02-15 at 12:15:58
[binary data]

workstation.prf

# Unison preferences file
root = /Users/neufeld/electronics
root = ssh://neufeld@workstation/electronics
ignore = Name {.DS_Store}

Since I had already done some exploring with the MacBook’s profiles and archives of the tutorial examples, I was confident that I simply needed to remove these and dragged them to the trash.

On the Linux workstation, the archive file was in .unison as promised. There was no profile because I hadn’t yet copied from the workstation to anywhere and profiles get created where syncs are initiated; but there were three archive files because I had synced a separate directory from the MacBook while testing, and then electronics/ from the MacBook and from the iBook.

None of the three archive filenames matched the archive filename on the iBook (which would have been handy), and the hostnames in the archive files were wireless-33 and wireless-37 — not the most helpful in the world. (OS X dutifully takes on the hostname that my DHCP server assigns it, rather than keep the name set in System Preferences / Sharing. Whatever.) But wireless-33 appeared in two of the archive files and was what the MacBook was currently calling itself and wireless-37 appeared in only one file and was what the iBook was calling itself. Also the wireless-37 file was massive compared to the other two, which made sense as I had synced much more data across the iBook – workstation pair. I removed the workstation archive file for the iBook.
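
Matching archives to machines by eye works for three files; for more, a small helper could print each archive's header line. This sketch assumes what the iBook's archive showed above: archive filenames start with "ar", and the second line of the file is plain text beginning "Archive for root". The function name list_archive_roots is made up.

```shell
#!/bin/sh
# Print each Unison archive file with the text header line naming its roots.
# Assumes the header layout shown above: line 2 starts "Archive for root".
# The directory defaults to ~/.unison, where the Linux workstation kept
# its archives; pass another directory as the first argument.
list_archive_roots() {
    dir=${1:-"$HOME/.unison"}
    for f in "$dir"/ar*; do
        [ -f "$f" ] || continue          # no archives: the glob stays literal
        printf '%s\n  ' "$f"
        # The rest of the file is binary, so read only the header line.
        sed -n '2{p;q;}' "$f"
    done
}
```

Piping the output through grep for a hostname such as wireless-37 would then narrow the list to one client's archives.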

And having protected myself as much as possible against unintended future Unison file deletion, I dragged the iBook’s entire electronics folder into the trash and emptied the trash. Had I not overlooked one small detail, that would have been exactly the right thing to do. (Oh, don’t get too worked up. I didn’t lose any data, just metadata. I’ll get back to that.)

Old Workstation HD to Current Workstation Synchronization

Next I needed to copy files from my workstation’s old hard drive into the new repository. After changing to my old home directory, I moved the separate eagle/ directory under electronics/. That’s not the way I intend to leave things — remember that EAGLE project directories are now going to live amongst the other electronics projects — but there was so much clutter in electronics/ that I wanted to do the same as on the iBook, merging now and cleaning up later.

Then from the shell I ran

unison electronics ~/electronics

It popped up a GUI showing me lots of files to sync — bidirectionally, of course.

The list of files coming from the repository to the old hard drive initially contained not only .DS_Store files from the Mac synchronization, but also ._foo shadows of every file that came from the Mac — the way Macs store resource forks on non-native filesystems. After some experimentation, I found that these directives in the Unison profile suppressed all the resource fork files from my display:

ignore = Path {.DS_Store}
ignore = Path {*/.DS_Store}
ignore = Name {._*}

Of course, since all of those files were already in the common repository, it wasn’t like I was keeping clutter out of the repository; but having the resource fork files removed from the display made it easier to scan through the list for conflicts before syncing.
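
As a rough illustration of what a Name pattern does, the sketch below tests only the last component of each path. It uses shell globbing as a stand-in for Unison's pattern syntax, which is an assumption that should hold for simple patterns like these; would_ignore is a made-up name and the paths are examples.

```shell
#!/bin/sh
# Approximation of "ignore = Name {...}": test only the last path component.
# Shell globbing stands in for Unison's pattern syntax here, which is an
# assumption that holds for simple patterns like these two.
would_ignore() {
    case ${1##*/} in
        .DS_Store|._*) return 0 ;;   # Finder settings and resource-fork shadows
        *)             return 1 ;;
    esac
}

for p in .DS_Store CNC/.DS_Store CNC/._drill-platform.png CNC/drill-platform.png; do
    if would_ignore "$p"; then echo "ignore  $p"; else echo "sync    $p"; fi
done
```

If Name patterns behave this way, the single ignore = Name {.DS_Store} line from the iBook's profile would cover .DS_Store at any depth, whereas the two Path lines would presumably cover only the top level and one directory down.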

With the resource fork files ignored, the list was all cleaned up and ready to go, so I ran the sync.

Hm, looks like everything copied to the common repository just fine, but most things didn’t copy from it.

Ah yes, the original motivation for replacing my 80G workstation hard drive with a mirrored pair of 250G drives — the /home filesystem was full. Unison can’t add more files to a full partition, so it can’t sync everything from the repository to the old hard drive.

Okay, so this is not a big deal. I’m trying to empty the old hard drive anyway, so why would I want a fresh copy of the full repository — all I really needed was to get everything from the old HD to the repository.
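
A quick pre-flight check would have predicted the failure. The sketch below compares the size of a source tree with the free space on the destination filesystem; the function names are hypothetical, and it overestimates the need because a sync rarely copies the whole tree.

```shell
#!/bin/sh
# Hypothetical pre-flight check: is there room on the destination filesystem
# for a copy of the source tree? Sizes are in 1K blocks.
tree_kb()  { du -sk "$1" | awk '{print $1}'; }
avail_kb() { df -Pk "$1" | awk 'NR==2 {print $4}'; }

fits() {   # fits NEEDED_KB AVAILABLE_KB
    [ "$1" -lt "$2" ]
}

check_room() {   # check_room SOURCE_DIR DEST_DIR
    need=$(tree_kb "$1")
    have=$(avail_kb "$2")
    if fits "$need" "$have"; then
        echo "ok: need ${need}K, have ${have}K"
    else
        echo "too full: need ${need}K, have ${have}K"
        return 1
    fi
}

# Usage, with the paths from this post:
#   check_room ~/electronics /mnt/home/neufeld
```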

I scrolled through the display to confirm that all of the “forward” copies succeeded, then called it good and removed the profile and archive files from my .unison directory as before to protect myself from future deletion problems.

I also cleaned out the few files that had synced from the repository to the old drive, but I didn’t yet remove the entire electronics/ directory. I’m not sure why, but it was a fortuitous choice. (Metadata. Keep reading.)

Workstation Repository to MacBook Synchronization

Finally, having synchronized all the different sources of electronics files into the repository, I needed to sync all the collected files from the repository back to the MacBook so I could do the EAGLE design work that started this whole thing.

I started up Unison on the MacBook, selected the profile for the main workstation repository, and let Unison find all the repository additions.

I scrolled through the list and found one problem — two directories on the workstation whose names differed only in case. Because I have the default case-insensitive-but-preserving filesystem running on the MacBook, the two directories named helsing and Helsing on the Linux workstation wouldn’t be differentiable on the MacBook. I hit “Go” to sync everything else and went to investigate why I had two different same-named directories.
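
Unison flagged this clash on its own, but the same check can be run ahead of time on the case-sensitive side. A minimal sketch, with a made-up function name, that reads paths on standard input and reports any that differ only in letter case:

```shell
#!/bin/sh
# List paths that differ only in letter case, one (lowercased) line per clash.
# Reads paths on stdin, so it can be fed from find on the Linux side.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example with the two names from this post plus one harmless path:
printf '%s\n' electronics/helsing electronics/Helsing electronics/CNC |
    case_collisions
```

On the workstation, find electronics | case_collisions would list every such pair before anything is synced to a case-insensitive Mac filesystem.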

As I had guessed, one of the directories was originally from the iBook and the other from the workstation’s old hard drive, when I’d been working on a project on both of them and manually copied a few files back and forth. No problem, just reconcile the two by hand this once and I’ll be back in business.

CRAP. I discovered that Unison doesn’t preserve file modification times (metadata, like I promised) by default. All of the files in the repository had a mod time of when they were copied, not when they were originally created, making hand-reconciliation more difficult and completely eliminating my record of when I worked on different projects. Looks like I need to add the -times flag to the command line or the times flag to the profile.

MAN, I wish I’d caught that before deleting all the files from the iBook. I still had original mod times from the MacBook and from the workstation’s old hard drive, and I knew I could use my mad Unix skilz to set all the mod times right in the repository and on the MacBook, but it was obviously going to be a huge pain. Since my understanding was that Unison uses mod times to determine freshness, I couldn’t just fix the mod times in the repository and sync again — I needed to write a script to update mod times on both the MacBook and the workstation before the next sync.

That must be why I hadn’t deleted electronics/ from the old HD yet.

In order to keep things progressing, I renamed the two directories as linux and ibook subdirectories below a common helsing/ directory.

Meanwhile, the synchronization of the okay files between the workstation and the MacBook finished.

After rescanning, the remaining directories now synced fine.

Everything I intended to sync was now synced, albeit with incorrect mod times on all the files originating from the iBook and the workstation’s old hard drive.

EAGLE Settings

I still needed to update my application settings to use the new directory structure. To make EAGLE create and find project files in the right place, I went into Options / Directories... and changed the Projects entry to $HOME/electronics:$EAGLEDIR/projects/examples.

All done.


L33T Perl Skilz

Oh, who am I kidding. I couldn’t let those messed up mod times sit there; it itched at my brain.

To begin creating the solution, I ran

find2perl electronics -depth > gen-fix-elec-mtimes

This generates a Perl script that does the same thing as the find command with the same flags and arguments. Note that I ran it to search a relative path (electronics) rather than an absolute path (/mnt/home/neufeld/electronics) because I needed the ultimate product to be a script that changed mod times in the electronics/ directory under the current directory (the home directory on the workstation and on the MacBook) when I ran it.

I edited the generated script to change directory to /mnt/home/neufeld before doing the search and then fleshed out the stub subroutine to grab the mtime of the old file, translate it into the syntax of the touch command, and print it out.

#! /usr/bin/perl -w
    eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
        if 0; #$running_under_some_shell

use strict;
use File::Find ();
use POSIX qw(strftime);

# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.

# for the convenience of &wanted calls, including -eval statements:
use vars qw/*name *dir *prune/;
*name   = *File::Find::name;
*dir    = *File::Find::dir;
*prune  = *File::Find::prune;

sub wanted;

# Traverse desired filesystems
chdir("/mnt/home/neufeld") or die "can't chdir: $!\n";

File::Find::finddepth({wanted => \&wanted}, 'electronics');
exit;

sub wanted {
    my ($mtime) = (lstat($_))[9];

    print("touch -t ", &touchtime($mtime), " \"$name\"\n");
}

sub touchtime {
    return strftime("%Y%m%d%H%M.%S", localtime($_[0]));
}

I now had a script that would traverse a directory depth-first and generate a series of touch commands to synchronize the modtimes on a copy of this directory. The depth-first search is an old habit from the days of find foo -depth | cpio ... to ensure that directory times were set back to the correct timestamp after filling them with newly-copied files, and shouldn’t be relevant here as changing the mtime of a file doesn’t impact the mtime of the directory it’s in.

Note the quotes printed around the filenames, to protect spaces in filenames from the Unix shell parsing them into separate arguments to touch.
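
Double quotes protect spaces, but a filename containing a double quote, dollar sign, backtick, or backslash would still be mangled by the shell. A more defensive approach is to single-quote each name and escape any embedded single quotes. The sketch below shows that idea in shell rather than in the Perl generator; shquote is a hypothetical helper and the filename is invented.

```shell
#!/bin/sh
# Quote one argument so that sh reads it back as exactly the same string.
# Wraps it in single quotes and rewrites each embedded ' as '\''.
# Caveat: command substitution drops trailing newlines in the name.
shquote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Hypothetical awkward filename, with a timestamp in the touch -t format.
name="electronics/Bob's \"LED\" \$5 board.brd"
printf 'touch -t 200811271510.34 %s\n' "$(shquote "$name")"
```

The same substitution could be done inside the wanted subroutine before printing each touch command.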

Running this generator script

./gen-fix-elec-mtimes > fix-elec-mtimes

gave me a fix-it script that looks like

touch -t 200706012025.53 "electronics/CNC/drill-platform.odg"
touch -t 200704300628.23 "electronics/CNC/drill-platform.png"
touch -t 200705012105.28 "electronics/CNC/drill-platform.png.odg"
touch -t 200704300626.54 "electronics/CNC/drill-platform.pdf"
touch -t 200705061209.19 "electronics/CNC/drill-platform-front.png"
touch -t 200705061209.18 "electronics/CNC"
touch -t 200703301908.55 "electronics/thermostat/thermostat_schem.png"
touch -t 200705201708.38 "electronics/thermostat"
...
touch -t 200811271510.34 "electronics/eagle/LED cloud/distribution v1.s#1"
touch -t 200811271540.35 "electronics/eagle/LED cloud/distribution v1.b#2"
touch -t 200811271529.14 "electronics/eagle/LED cloud/distribution v1.b#3"
touch -t 200811271458.38 "electronics/eagle/LED cloud/distribution v1.s#3"
touch -t 200811271510.34 "electronics/eagle/LED cloud/distribution v1.b#4"
touch -t 200811271509.45 "electronics/eagle/LED cloud/distribution v1.b#5"
touch -t 200811271458.38 "electronics/eagle/LED cloud/distribution v1.b#6"
touch -t 200811281053.19 "electronics/eagle/LED cloud"
touch -t 201002151808.28 "electronics/eagle"
touch -t 201002151814.36 "electronics"

Out of a combination of paranoia, self-preservation, and reality check, I archived a copy of electronics/ first:

tar czf electronics.tgz electronics

and then

sh fix-elec-mtimes

and got missing file errors about the helsing directory I’d renamed earlier to fix the case conflict. I put that directory name back to the way it originally was on the old HD, reran the script, checked that all the mtimes were right, and rechanged the directory name to helsing/linux.

Perfecto.

I then repeated the paranoid archiving, directory rename, run script, and directory rerename on the MacBook. That fixed its mtimes on files originally from the old workstation hard drive also.

Starting Unison on the MacBook showed everything up to date — a good sign that I hadn’t broken anything.

I now had all the mtimes from the workstation’s old HD fixed everywhere. Good start.

I no longer had access to iBook file mod times; but I still had access to mod times of files originally created on the MacBook, which were fine on the MacBook but out of date in the workstation’s repository.

On the MacBook, the files originally from it were mixed in with all the files copied over from the iBook and from the workstation, so I wanted to separate the wheat from the chaff. (I could have copied all mod times from the MacBook to the repository and simply overwritten with identical timestamps the ones that had already been updated, but that felt sloppy and I’d had enough sloppy for one day.)

Since everything on the MacBook was older than two days and everything copied was newer than that, I could use the ctime (inode change time, not file creation time as occasionally believed) to identify which files came whence.

find electronics -ctime +2

gave me exactly the list of files that existed on the MacBook previously. I used find2perl to translate the -ctime +2 parameters into Perl code (because I’m lazy) and integrated the result into gen-fix-elec-mtimes. I then ran

./gen-fix-elec-mtimes > fix-elec-mtimes-macbook

which generated a short script of touch commands to make the workstation repository match the times of MacBook files. I copied the new script to the workstation, ran it, and now had all mod times synced that were in my power to sync.

Unison Setting for Mod Time Preservation

I added times = true to the MacBook Unison profile, started it up, and found gazoodles of files to be synced.
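
For anyone who would rather script that change than edit the profile by hand, here is a minimal sketch. The function name enable_times is made up, the profile path is passed in by the caller, and the file is assumed to end with a newline.

```shell
#!/bin/sh
# Add "times = true" to a Unison profile unless a times line already exists.
# The profile path is passed in: ~/.unison/NAME.prf on the Linux workstation,
# or under Library/Application Support/Unison on the Macs in this post.
enable_times() {
    prf=$1
    if grep -q '^[[:space:]]*times[[:space:]]*=' "$prf"; then
        echo "$prf already has a times line"
    else
        echo 'times = true' >> "$prf"
        echo "added times = true to $prf"
    fi
}
```

Running it twice on the same profile leaves a single times line, so it is safe to include in a setup script for each new client.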

Of course — the mod times on all the iBook files were a couple of hours apart between the workstation and the MacBook, representing the times they were copied from the iBook to the workstation and the times they were copied from the workstation to the MacBook.

And Unison … was asking me which time it should use. Which means I could have used Unison to update all the times instead of writing my silly scripts. Post-traumatic justification: Well, Unison wouldn’t have updated directory times (because it doesn’t), and my scripts did. So there.

I went through the files, double-checking that nothing (more) unexpected was going on and picking directories and specifying right-to-left (workstation-to-MacBook) synchronization. It didn’t matter a whole lot which mod times I used, but right-to-left updated all the mod times on the MacBook to when the files were copied from the iBook to the workstation rather than the slightly later times when the same files were copied from the workstation to the MacBook.

And now Unison shows everything up to date and I think I’m done.

And metadata mistakes aside, Unison is really fabulously easy to use and I’m very comfortable that I can go through and clean up my old project files on either the MacBook or the workstation and sync it to the other, and work on projects wherever I want and keep them in sync across both platforms.

3 Responses to “Folder Structure Standardization and Unison for File Synchronization”

  1. Jon says:

    I have recently discovered Dropbox and have come to love it. Depending on how paranoid you are about having your data living somewhere else, it works great.

    Dropbox is a commercial service that gives you 2 gigs of data storage on their server for free (more at a monthly charge). On each computer you use, you create a Dropbox folder. Anything that is saved into this folder gets replicated to your other computers, and it’s also available by logging into their website so you can access it from any computer you might use. Syncing is automatic and changes from multiple locations are handled gracefully.

    A nice feature is that you can set up sub-directories in the Dropbox that can be shared with other people, either by web access or by setting up Dropbox on their computer. They get no access outside the shared directory and access can be stopped at any time.

    It’s also providing automatic off-site backup of these files and changes are recoverable for 30 days with the free service.

    It’s nice to be able to work on something on the desktop and go downstairs and have it available on the laptop. No need to worry about where it was saved and the files are available on each machine should internet go down.

    It’s available for Windows, Mac, Linux and iPhone. https://www.dropbox.com/

  2. Jin says:

    That’s an incredible amount of work. I didn’t realize Unison was still a going concern. I tried it once but became concerned when it had some problem with some mac filesystem issue or other.

    Did you consider using something like Dropbox? Syncs a local directory constantly with their server. Like the hub and spoke model, with their server as the hub, except you can do it from anywhere you have a network connection. 2GB free, has clients for Windows, Mac, and Linux. You can just create symlinks into your ~/Dropbox folder if you want.

  3. Keith Neufeld says:

    Jin, most of the extra work with Unison was because (A) I’m paranoid about data sources I don’t intend to use any more and (B) I messed up my mod times and took a roundabout way of fixing them. The basic Unison installation and synchronization was easy as pie — I’m just not one to sugarcoat the process I went through to get the exact results I wanted. And hopefully I shorten someone else’s path by sharing.

    Regarding Dropbox, I’m aware of a variety of third-party storage solutions. But Jon, you hit the nail on the head with paranoia about having my data living somewhere else — I’m pretty high up on the paranoia ladder and I prefer to retain complete control of my data. I recognize that not everyone would have the same criteria.
