As I mentioned in my article on filing with a virtual assistant, I need to preserve the timestamps from the original PDF so I can quickly locate the original documents if necessary.
You’ll recall the originals are sitting in a manila folder, sorted in the order they were scanned. If I preserve the timestamp from the original batch, I can quickly locate the original (“hmm, this document is from a batch about a third of the way in…”). If not, I’d have to wade through the entire folder.
In theory, I could physically file the originals in some sensible semantic order, e.g. by date and sender. In practice, I seem to lack some necessary gene. Hence: virtual filing.
So, how do we preserve the timestamps?
On a UNIX-like system, including Mac OS X, you can use the ‘touch’ command to copy the timestamps from the original big PDF to each little PDF. On Windows, you could install Cygwin and use the same command.
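With touch itself, the -r option does the copying: touch -r BIG.pdf piece.pdf gives piece.pdf the access and modification times of BIG.pdf. Since I’m leaning towards Python anyway, here is a minimal sketch of the same operation using os.stat and os.utime. The function name clone_timestamps and the file names are placeholders of my own, not part of any existing tool.

```python
import os
import sys


def clone_timestamps(reference, targets):
    """Copy the access and modification times of `reference` onto each
    target, much as `touch -r reference target` does."""
    st = os.stat(reference)
    for target in targets:
        # The nanosecond form avoids rounding the times through a float.
        os.utime(target, ns=(st.st_atime_ns, st.st_mtime_ns))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python clone_timestamps.py BIG.pdf piece1.pdf piece2.pdf ...
    clone_timestamps(sys.argv[1], sys.argv[2:])
```

Like touch, this sets only the access and modification times; it doesn’t attempt to change a file’s creation date.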
Instead of Cygwin’s touch, I’ll probably write a Python script to manage the timestamps and do a little more of the heavy lifting, e.g.:
- Create a subdirectory named after the original PDF
- Move the original PDF into it, renaming it ORIGINAL
- Mark the original read-only
- Wait for the user to declare s/he’s finished
- Clone the original’s timestamp to all the files
- Zip the files up
- Delete the subdirectory
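Here is a rough sketch of how those steps might look in Python. It makes a few assumptions of my own: ORIGINAL keeps its .pdf extension, the subdirectory sits beside the big PDF, and “finished” means pressing Enter. The function names prepare and finish are hypothetical.

```python
import os
import shutil
import stat
import sys
from pathlib import Path


def prepare(big_pdf):
    """Steps 1-3: make a subdirectory named after the PDF, move the PDF
    into it as ORIGINAL.pdf, and mark that file read-only."""
    big_pdf = Path(big_pdf)
    workdir = big_pdf.with_suffix("")        # "scan.pdf" -> "scan"
    workdir.mkdir()
    original = workdir / "ORIGINAL.pdf"
    shutil.move(str(big_pdf), str(original))  # a move keeps the mtime
    os.chmod(original, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return workdir, original


def finish(workdir, original):
    """Steps 5-7: clone ORIGINAL.pdf's timestamps onto every other file,
    zip the subdirectory, then delete it. Returns the ZIP's path."""
    st = os.stat(original)
    for path in workdir.iterdir():
        if path.is_file() and path != original:
            os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
    # The archive is written beside the subdirectory, not inside it.
    archive = shutil.make_archive(str(workdir), "zip", root_dir=str(workdir))
    # Restore write permission so the delete also works on Windows.
    os.chmod(original, stat.S_IRUSR | stat.S_IWUSR)
    shutil.rmtree(workdir)
    return Path(archive)


if __name__ == "__main__" and len(sys.argv) == 2:
    workdir, original = prepare(sys.argv[1])
    input(f"Split {original} into {workdir}, then press Enter... ")  # step 4
    print("Wrote", finish(workdir, original))
```

Splitting the work into two functions keeps the interactive wait in one place. Bear in mind that a ZIP file stores each entry’s time in local time with two-second resolution, so the cloned timestamps survive zipping only to that precision.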
If I do write such a script, I’ll publish it on Bitbucket as I write it.
One of the many tasks before me is to sort through a stack of paper taller than a child. I need it all filed and searchable so I can finish various important tasks. I’ve been neglecting these tasks out of fear the paper would fall on me and crush me to death.
As I mentioned before, I’m working with a virtual PA to deal with this backlog of open loops. They need the files to do the work. So, how do I get the files to them?
I could ship the lot to Pakistan — I bet it’s been done — but I’m not going to. Instead I’m using my Fujitsu ScanSnap S300M, Dropbox, and some PDF splitting software (more on that in a future article, perhaps).
Buying the kit isn’t the whole story, however. You have to figure out what professional photographers call “the workflow”. Who does what, and when? Where does the process back up? Where can it fail?
The good news is, I’m figuring a lot of this out.
For the first two batches, I individually scanned my documents and asked my PA to fill in a Google Docs spreadsheet with the sender, date, and recipient of each document. I then used the spreadsheet to construct a script to rename each file, and ran the script.
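Rather than generating a one-off rename script from the spreadsheet, you could read a CSV export of it directly. A minimal sketch, assuming hypothetical column headings file, sender, date and recipient and a naming pattern of “date sender to recipient.pdf”. Neither is necessarily what my PA and I used.

```python
import csv
import os
import re
import sys


def safe(text):
    """Replace characters that are awkward or illegal in file names."""
    return re.sub(r'[\\/:*?"<>|]+', "-", text).strip()


def rename_from_csv(csv_path, folder="."):
    """Rename each scanned PDF listed in the spreadsheet export.

    Assumed (hypothetical) columns: file, sender, date, recipient.
    """
    renamed = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            new_name = "{} {} to {}.pdf".format(
                safe(row["date"]), safe(row["sender"]), safe(row["recipient"]))
            src = os.path.join(folder, row["file"])
            dst = os.path.join(folder, new_name)
            if os.path.exists(dst):
                raise FileExistsError(dst)  # never clobber an earlier rename
            os.rename(src, dst)  # renaming leaves the timestamps alone
            renamed.append((row["file"], new_name))
    return renamed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rename_from_csv.py sheet.csv [folder]
    for old, new in rename_from_csv(*sys.argv[1:3]):
        print(f"{old} -> {new}")
```

Refusing to overwrite an existing file matters here: two letters from the same sender on the same day would otherwise silently replace one another.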
For the third batch, I asked my PA to rename each PDF in a ZIP file to reflect its sender, date, and recipient as above.
Since then, I’ve been able to:
- Scan a wad of documents
- Save the PDF to Dropbox
- Put the originals in a manila folder labelled “Scanned yyyy-mm”
- Have my PA split the big PDF into individual documents and name each piece appropriately
I’m glad I tried the gentle approach of bringing my PA up to speed. We had a few hiccups in the first three batches. The results would have been awful if I’d dumped him in the deep end with a scan of dozens of documents.
There are two post-processing tasks I still have to do, both of which I’ll be able to hand over to my PA:
- Set the timestamps on each PDF to match the original batch
- File the results by copying the PDF into my Big Bucket o’ Filing
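One trap when filing: not every copy preserves timestamps. In Python, shutil.copy gives the new file a fresh modification time, while shutil.copy2 carries the original’s over. A minimal sketch, assuming the Big Bucket o’ Filing is just a directory; its path and the function name file_pdfs are placeholders.

```python
import shutil
from pathlib import Path


def file_pdfs(source_dir, bucket_dir):
    """Copy every PDF in source_dir into bucket_dir, keeping timestamps.

    shutil.copy2 copies the modification time along with the contents;
    plain shutil.copy would stamp each copy with the current time and
    undo the timestamp-matching step.
    """
    bucket = Path(bucket_dir)
    bucket.mkdir(parents=True, exist_ok=True)
    filed = []
    for pdf in sorted(Path(source_dir).glob("*.pdf")):
        target = bucket / pdf.name
        if target.exists():
            raise FileExistsError(target)  # don't overwrite earlier filing
        shutil.copy2(pdf, target)
        filed.append(target)
    return filed
```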
I need to preserve the timestamps so I can locate the original documents if necessary. You’ll recall they’re all sitting in a manila folder, sorted in the order they were scanned. See my article on timestamp preservation for… well, more detail.
All told, I’m sure my virtual PA and I will soon get the knack of virtual filing. All I have to do now is figure out what to do about my privacy concerns…
