Marko Apfel - Afghanistan/Belgium/Germany

Management, Architecture, Programming, QA, Coach, GIS, EAI


In too many projects I have seen chaos in how different versions of documents are stored (where documents are edited over time and older versions are kept for later analysis).

Each colleague has their own idea of how to name documents and how to record the status or version of a document. So it is not unusual to see timestamps as prefixes or suffixes, user names and user acronyms as suffixes, and version numbers as suffixes. It gets really interesting when multiple users work on the same document and each introduces their own naming guideline. After a few versions you have completely lost track of which document is the most recent one. Sometimes you can hope that sorting by the OS timestamp (Date modified) in File Explorer leads you to the right one. But too often somebody opens a document, accidentally changes something inside (e.g. auto fields, like dates) and answers the save prompt on closing with Yes. Then you have an older version with a newer timestamp.

Normally we use the compact ISO timestamp format (without the dashes) as a prefix, but too often that rule was broken at first and only applied later. This leads to folder contents like this one:


The first two files don’t follow the rule.

Now imagine that a lot of other, unrelated files are in the same folder. It is almost impossible to recognize what belongs together. It gets even more difficult if the file name changed in the meantime.


I’m not happy with all the different versions inside the folder, and I’m also not happy that we stuff version information into the file name. But it has one benefit: at a glance you immediately know when the document is from, notably after the file has been sent between different parties.

Some years ago I tried to address the problem with Git: keep only one file per topic in the folders and fetch older versions from the repository when needed. But at that time (2011??) it was not practical. I cannot remember the exact issues, but it was not worth introducing.

But the situation has changed. Git can recognize the renaming of these doc files (e.g. when a new timestamp is prefixed or a version is suffixed) and at the same time track the changes inside the Word documents.

So I gave it a try and added the different versions one by one to a repository to reconstruct the past evolution. To be precise: what would normally happen over time while working on the document in an ideal world (edit, rename, commit changes), I did in a few minutes. The important point was to keep the history of one document together, rather than ending up with individual commits for independent files.


To bring all the existing versions into a repository in a meaningfully versioned way, I did the following.

  1. Move all the files to a temporary folder
  2. Run git init in the now empty folder to create a local repository there
  3. Move the first version of the file (Folgeprojekt_Leistungsbeschreibung V1.doc) into that folder
  4. Add it to the index (git add Folgeprojekt_Leistungsbeschreibung\ V1.doc)
  5. Commit that changeset (git commit -m "initial commit with statement of work for the follow-up project, version 1")
  6. Delete this file
  7. Move the second version into that folder
  8. Run git status to see the new (for the moment untracked) file and the deleted (already tracked) previous one
  9. Add the new one to the index (git add Folgeprojekt_Leistungsbeschreibung\ V2.doc)
  10. Add the deletion of the previous version to the index (git add Folgeprojekt_Leistungsbeschreibung\ V1.doc)
  11. Run git status to see that Git is able to recognize the renaming
  12. Commit that changeset (git commit -m "add statement of work for the follow-up project, version 2")
  13. Repeat steps 6 to 12 for all the other versions
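The steps above can be sketched as a small script. This is only a minimal sketch under assumptions: the function names replay_versions and make_lines, the commit identity and the commit messages are made up for illustration, the demo uses placeholder text files instead of real Word documents, and git add -A is used to stage the new file and the deletion in one go instead of the two separate git add calls from the list.

```shell
#!/bin/sh
set -eu

# replay_versions REPO FILE...
# Commits each FILE, in the given order, as the successor of the previous one.
replay_versions() {
    repo=$1
    shift
    mkdir -p "$repo"
    git -C "$repo" init -q
    prev=""
    for src in "$@"; do
        name=$(basename "$src")
        if [ -n "$prev" ]; then
            rm -- "$repo/$prev"       # step 6: delete the previous version
        fi
        cp -- "$src" "$repo/$name"    # steps 3 and 7: bring in the next version
        git -C "$repo" add -A         # steps 4, 9, 10: stage new file and deletion
        git -C "$repo" -c user.name="Demo" -c user.email="demo@example.invalid" \
            commit -q -m "add $name"  # steps 5 and 12
        prev=$name
    done
}

# Helper that writes N numbered placeholder paragraphs to stdout.
make_lines() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "paragraph $i"
        i=$((i + 1))
    done
}

# Demo: two placeholder files stand in for the real .doc versions.
work=$(mktemp -d)
make_lines 200 > "$work/Folgeprojekt_Leistungsbeschreibung V1.doc"
{ make_lines 200; echo "added in version 2"; } \
    > "$work/Folgeprojekt_Leistungsbeschreibung V2.doc"

replay_versions "$work/repo" \
    "$work/Folgeprojekt_Leistungsbeschreibung V1.doc" \
    "$work/Folgeprojekt_Leistungsbeschreibung V2.doc"

# --follow keeps walking the history across the rename from V1 to V2.
git -C "$work/repo" log --follow --oneline -- "Folgeprojekt_Leistungsbeschreibung V2.doc"
```

The order of the arguments defines the order of the history, so the files have to be passed oldest first. Unlike the manual steps, the script copies the files rather than moving them, so the originals stay in the temporary folder.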

The log now shows all the individual changes as one line of history for that file.

For example, the diff between versions 1 and 2 shows this:


I guess that Git used the similarity index to understand that a deleted file plus a new file is just a new version of the same document, i.e. a rename.
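To look at the rename detection in isolation, something like the following should work. It is a sketch, not the exact setup from this post: the file names, the commit identity and the helper function commit are invented, and the files are plain text placeholders rather than Word documents. The -M option of git diff enables rename detection, and 50% is Git's default similarity threshold.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
git -C "$repo" init -q

# Hypothetical helper: commit with a throwaway identity.
commit() {
    git -C "$repo" -c user.name="Demo" -c user.email="demo@example.invalid" \
        commit -q -m "$1"
}

# Version 1: placeholder content (a real .doc would be compared as binary data).
i=1
while [ "$i" -le 100 ]; do
    echo "paragraph $i"
    i=$((i + 1))
done > "$repo/report V1.doc"
git -C "$repo" add -A
commit "version 1"

# Version 2: same content plus one line, stored under a new name.
{ cat "$repo/report V1.doc"; echo "new paragraph"; } > "$repo/report V2.doc"
rm "$repo/report V1.doc"
git -C "$repo" add -A
commit "version 2"

# With rename detection on, the pair is reported as one R (rename) entry
# followed by the similarity score, instead of a D (delete) and an A (add).
status=$(git -C "$repo" diff -M50% --name-status HEAD~1 HEAD)
echo "$status"
```

If the two versions differ too much, the similarity falls below the threshold and Git reports a deletion and an addition again, so large rewrites between two commits can break the history line.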


So Git can build the history of a given file. I was really surprised to see that, because the doc format is binary and Git needs some additional steps in the background to arrive at that understanding.

Of course you also see the individual changes to the file.



I also tried this with the docx Word format. It works in the same way as described above.
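One possible way to get readable text diffs for docx files on the command line is Git's textconv mechanism. This is not what was used for this post. It is a sketch that assumes pandoc is installed, and the driver name "pandoc" is an arbitrary label. The older doc format would need a different converter.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
git -C "$repo" init -q

# Route every .docx file through a diff driver; the driver name is arbitrary.
echo '*.docx diff=pandoc' > "$repo/.gitattributes"

# Tell Git how to turn a .docx into plain text before diffing.
# This only stores the command; pandoc is first needed when a diff is shown.
git -C "$repo" config diff.pandoc.textconv "pandoc --from=docx --to=plain"

# Print the stored setting as a quick check.
git -C "$repo" config diff.pandoc.textconv
```

With this in place, git diff and git log -p show the differences in the extracted text, while the repository still stores the original binary files unchanged.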



Given these capabilities, in future projects I will require the team to keep only the most recent version of a document in a folder. All versioning is to be done with the repository. This gives clean folders where you immediately get an overview of the distinct documents, and it avoids having to open multiple documents to understand how a document evolved over time.

In addition, you can follow the evolution by diffing the individual versions. With hopefully good commit messages you get further information about what happened from one version to the next.

posted on Wednesday, June 24, 2015 4:06 PM