Unless you've been writing code for decades, you've likely never used a version control system (VCS)[1] other than Git. However, Git was far from the first important VCS.
In this post, we trace the history of VCS, leading up to the creation of Git.
The Earliest Forms of VCS: Manual
In the early 1970s, developers managed versions by manually storing copies of their files. If they thought an iteration was significant, they’d save a copy. There was no standard practice, and every developer had a personal technique.
As computers got more capable, software got more complex, and so did the process of creating it. Developers iterated more, and they needed dedicated tools to store and retrieve past versions of their code.
The First “Proper” VCS: SCCS and RCS
The first formal VCS was SCCS (Source Code Control System)[2], launched in 1972 by Marc J. Rochkind at Bell Labs. It pioneered many concepts that we take for granted today.
For example, SCCS was the first to introduce automatic revision tracking. Developers could run one command to record the current version of a file, and another to retrieve any past version. Later versions of SCCS supported branching and merging as well.
SCCS was a giant leap in the developer workflow.
Its most distinctive feature was its storage mechanism. Instead of saving a full duplicate of the file for every version, SCCS stored just the changes. It treated the first version of a file as the base and stored it in full. For each subsequent update, it stored only the changes, as “deltas”, within a file internal to SCCS.
This technique was great for optimising file storage.
As the tool saw more use, it became evident that this delta storage technique wasn’t ideal for performance. To reconstruct a file at any given version, SCCS had to start with the base version and then apply the deltas one after another. This is why the scheme is called “forward delta”.
Forward delta was like assembling a jigsaw puzzle from scratch every time you wanted to see the complete picture. It was especially inefficient for the most common task, retrieving the latest version, because that version sat at the end of the longest chain of deltas.
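To make that cost concrete, here is a minimal Python sketch of the forward-delta idea as described above. It is a simplified model, not SCCS's actual file format. The delta representation (lists of line edits built with the standard library's `difflib.SequenceMatcher`) and the class name `ForwardDeltaHistory` are assumptions made for illustration. Rebuilding any version means starting from the base and replaying every delta up to it.

```python
import difflib
from typing import List, Tuple

# Assumed delta format for illustration (not SCCS's on-disk format):
# a list of (start, end, new_lines) edits meaning "replace old[start:end] with new_lines".
Delta = List[Tuple[int, int, List[str]]]


def make_delta(old: List[str], new: List[str]) -> Delta:
    """Record only the regions of `old` that must change to produce `new`."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old: List[str], delta: Delta) -> List[str]:
    """Apply edits from the end of the file backwards so earlier indices stay valid."""
    result = list(old)
    for start, end, new_lines in reversed(delta):
        result[start:end] = new_lines
    return result


class ForwardDeltaHistory:
    """Hypothetical model: version 0 is stored in full, later versions only as forward deltas."""

    def __init__(self, first_version: List[str]) -> None:
        self.base = list(first_version)
        self.deltas: List[Delta] = []  # deltas[k] turns version k into version k + 1

    def checkout(self, version: int) -> Tuple[List[str], int]:
        """Rebuild `version` from the base; also report how many deltas were replayed."""
        if not 0 <= version <= len(self.deltas):
            raise ValueError(f"no such version: {version}")
        content = list(self.base)
        replayed = 0
        for delta in self.deltas[:version]:
            content = apply_delta(content, delta)
            replayed += 1
        return content, replayed

    def commit(self, new_version: List[str]) -> int:
        """Store `new_version` as a delta against the current latest version."""
        latest, _ = self.checkout(len(self.deltas))
        self.deltas.append(make_delta(latest, list(new_version)))
        return len(self.deltas)


if __name__ == "__main__":
    history = ForwardDeltaHistory(["int main() {", "}"])
    history.commit(["int main() {", "  return 0;", "}"])
    history.commit(["int main(void) {", "  return 0;", "}"])
    content, replayed = history.checkout(2)  # the latest version
    print(content, replayed)  # replayed == 2: every delta had to be applied
```

Checking out version n replays n deltas, so in this model the latest version, which developers want most often, is always the most expensive one to rebuild.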
Ten years later, in 1982, Walter F. Tichy of Purdue University launched RCS (Revision Control System)[3], a more performant alternative to SCCS.
Walter retained the idea of delta storage, but flipped the mechanism with his “reverse delta” method. Instead of using the first version as the base, he chose the latest version. The latest version was stored in full, and each RCS delta recorded how to get from a version back to the one before it.
This was a lot more efficient for common tasks, since developers rarely needed to go far back in a file's history. Beyond performance, RCS also improved the user experience with simpler commands, and introduced a branching and merging system that developers of that era found more intuitive.
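The same kind of toy model can be flipped to sketch the reverse-delta idea. This is an illustrative assumption rather than RCS's real `,v` file format: the newest version is kept in full, and each commit stores a delta that turns the new version back into the one before it. The delta helpers and the class name `ReverseDeltaHistory` are hypothetical, and the block is self-contained so it runs on its own.

```python
import difflib
from typing import List, Tuple

# Assumed delta format for illustration (not RCS's on-disk format):
# a list of (start, end, new_lines) edits meaning "replace old[start:end] with new_lines".
Delta = List[Tuple[int, int, List[str]]]


def make_delta(old: List[str], new: List[str]) -> Delta:
    """Record only the regions of `old` that must change to produce `new`."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old: List[str], delta: Delta) -> List[str]:
    """Apply edits from the end of the file backwards so earlier indices stay valid."""
    result = list(old)
    for start, end, new_lines in reversed(delta):
        result[start:end] = new_lines
    return result


class ReverseDeltaHistory:
    """Hypothetical model: the latest version is stored in full, older ones only as reverse deltas."""

    def __init__(self, first_version: List[str]) -> None:
        self.head = list(first_version)
        self.reverse_deltas: List[Delta] = []  # reverse_deltas[k] turns version k + 1 back into version k

    def commit(self, new_version: List[str]) -> int:
        """Make `new_version` the full copy and keep the old head only as a delta."""
        new_version = list(new_version)
        self.reverse_deltas.append(make_delta(new_version, self.head))
        self.head = new_version
        return len(self.reverse_deltas)

    def checkout(self, version: int) -> Tuple[List[str], int]:
        """Walk back from the latest version; also report how many deltas were applied."""
        latest = len(self.reverse_deltas)
        if not 0 <= version <= latest:
            raise ValueError(f"no such version: {version}")
        content = list(self.head)
        applied = 0
        for k in range(latest - 1, version - 1, -1):
            content = apply_delta(content, self.reverse_deltas[k])
            applied += 1
        return content, applied


if __name__ == "__main__":
    history = ReverseDeltaHistory(["int main() {", "}"])
    history.commit(["int main() {", "  return 0;", "}"])
    history.commit(["int main(void) {", "  return 0;", "}"])
    print(history.checkout(2))  # latest version: no deltas applied
    print(history.checkout(0))  # oldest version: both deltas applied
```

In this model the latest version costs nothing to retrieve, while older versions get more expensive the further back you go, which suits the way developers typically used their history.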
CVS: RCS, but better
As programs continued to grow more complex, SCCS and RCS could no longer meet developers' needs. Both worked best for projects on a single machine. Both could version only individual files, not entire projects. And when one developer was editing a file, it was locked so that others on the same network couldn't edit it at the same time.
To overcome these shortcomings, Dick Grune launched CVS (Concurrent Versions System)[4] in 1986, while working at Vrije Universiteit Amsterdam.
CVS still used RCS under the hood, but it could version many files together as a project, and it later gained a client-server collaboration model. This meant the codebase could be stored on a central server and accessed by developers in different locations, making collaborative work much easier. In its heyday, CVS was a significant leap forward.
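The per-file locking that SCCS and RCS relied on is often called lock-modify-unlock. In RCS, for example, `co -l` checks a file out with a lock and `ci` checks it back in. The sketch below is a hypothetical in-memory model of that workflow (the class and method names are made up for illustration). It shows why a second developer was blocked until the first one checked the file back in.

```python
from typing import Dict


class LockingRepository:
    """Hypothetical in-memory model of lock-modify-unlock, with one lock per file."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.locks: Dict[str, str] = {}  # file path -> developer currently holding its lock

    def checkout_for_edit(self, path: str, developer: str) -> str:
        """Lock `path` for `developer` (roughly what `co -l` does) and return its contents."""
        holder = self.locks.get(path)
        if holder is not None and holder != developer:
            raise PermissionError(f"{path} is locked by {holder}")
        self.locks[path] = developer
        return self.files.get(path, "")

    def checkin(self, path: str, developer: str, content: str) -> None:
        """Save new contents and release the lock (roughly what `ci` does)."""
        if self.locks.get(path) != developer:
            raise PermissionError(f"{developer} does not hold the lock on {path}")
        self.files[path] = content
        del self.locks[path]


if __name__ == "__main__":
    repo = LockingRepository()
    repo.files["parser.c"] = "v1"
    repo.checkout_for_edit("parser.c", "alice")
    try:
        repo.checkout_for_edit("parser.c", "bob")  # bob is blocked while alice holds the lock
    except PermissionError as err:
        print(err)
    repo.checkin("parser.c", "alice", "v2")
    print(repo.checkout_for_edit("parser.c", "bob"))  # bob can now edit alice's version
```

Locking prevents conflicting edits, but it serialises work on each file, which is one reason the model scaled poorly as teams grew.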
The 1990s: The Era of Proprietary VCS
In the mid-1990s, many new proprietary VCS entered the scene, each innovating in different areas.
ClearCase by Atria Software introduced a robust branching model, and let enterprises configure workflows to suit the preferences of individual teams. SourceSafe by One Tree Software was known for its user-friendly interface, and was acquired by Microsoft. Perforce was known for its exceptional performance.
As for the open source VCS ecosystem, it was mostly quiet.
SVN and BitKeeper: Centralised vs Distributed
In 2000, BitKeeper[5] arrived on the scene. Although proprietary, it would go on to play a critical role in the Git story. Also in 2000, work started on SVN (Subversion)[6], which would become the next major open source VCS after CVS.
SVN was created by CollabNet Inc. with key contributions from Karl Fogel and Jim Blandy. It was designed to overcome the shortcomings of CVS, and became very popular in the open source community. Some developers still use it today.
A key idea SVN introduced was atomic commits. In CVS, files were committed one by one, so if a commit failed halfway through, the repository was left in an inconsistent state: some files had the new changes and others did not. SVN’s commits took an all-or-nothing approach: either every change in a commit was recorded, or none was.
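A toy Python model can contrast the two behaviours. Nothing here reflects how CVS or SVN actually store data. The class `ToyRepository` is hypothetical, and its `fail_on` parameter is a made-up hook that simulates a failure partway through a commit.

```python
from typing import Dict, Optional


class ToyRepository:
    """Hypothetical in-memory repository used to contrast per-file and atomic commits."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = dict(files)
        self.revision = 0  # one repository-wide number, bumped only by atomic commits

    def commit_per_file(self, changes: Dict[str, str], fail_on: Optional[str] = None) -> None:
        """CVS-like: files are written one by one, so a failure leaves a partial commit."""
        for path, content in changes.items():
            if path == fail_on:
                raise OSError(f"simulated failure while committing {path}")
            self.files[path] = content

    def commit_atomic(self, changes: Dict[str, str], fail_on: Optional[str] = None) -> int:
        """SVN-like: stage every change first, then publish them all in one step."""
        staged = dict(self.files)  # private copy; the visible state is untouched until the end
        for path, content in changes.items():
            if path == fail_on:
                raise OSError(f"simulated failure while committing {path}")
            staged[path] = content
        self.files = staged  # reached only if every change was staged successfully
        self.revision += 1
        return self.revision


if __name__ == "__main__":
    changes = {"lexer.c": "v2", "parser.c": "v2"}

    cvs_like = ToyRepository({"lexer.c": "v1", "parser.c": "v1"})
    try:
        cvs_like.commit_per_file(changes, fail_on="parser.c")
    except OSError:
        pass
    print(cvs_like.files)  # lexer.c was updated, parser.c was not

    svn_like = ToyRepository({"lexer.c": "v1", "parser.c": "v1"})
    try:
        svn_like.commit_atomic(changes, fail_on="parser.c")
    except OSError:
        pass
    print(svn_like.files)  # both files are still at v1
```

The key design choice is to build the new state off to the side and make it visible in a single step, so a failure before that step leaves nothing half-done.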
SVN also changed how branches were created. In CVS, creating a branch meant recording the branch tag in the history file of every file in the project, an operation that grew slower as projects grew larger. When you created a branch in SVN, it didn't touch or duplicate all the files. Instead, it created a new directory for the branch that initially held only references to the original files. As changes were made in the new branch, SVN stored only those changes.
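The reference-based branching idea can be sketched with a store of file contents and directories that point into it. This is a hypothetical model: in SVN itself, branching is done with `svn copy`. The class name is made up, and the `trunk` and `branches/feature` paths are illustrative, following the conventional SVN repository layout.

```python
from typing import Dict, List


class CheapCopyRepository:
    """Hypothetical model: contents are stored once; directories hold references to them."""

    def __init__(self) -> None:
        self.blobs: List[str] = []                 # every stored file content, append-only
        self.dirs: Dict[str, Dict[str, int]] = {}  # directory -> {file name: index into blobs}

    def write(self, directory: str, name: str, content: str) -> None:
        """Store new content for one file; other entries keep pointing at existing blobs."""
        self.blobs.append(content)
        self.dirs.setdefault(directory, {})[name] = len(self.blobs) - 1

    def copy(self, source: str, target: str) -> None:
        """Branch by copying references (blob indices), never the file contents themselves."""
        self.dirs[target] = dict(self.dirs[source])

    def read(self, directory: str, name: str) -> str:
        return self.blobs[self.dirs[directory][name]]


if __name__ == "__main__":
    repo = CheapCopyRepository()
    repo.write("trunk", "main.c", "main v1")
    repo.write("trunk", "util.c", "util v1")
    repo.copy("trunk", "branches/feature")  # no new contents stored
    print(len(repo.blobs))  # 2
    repo.write("branches/feature", "main.c", "main v2")
    print(len(repo.blobs))  # 3: only the changed file added new content
    print(repo.read("trunk", "main.c"), repo.read("branches/feature", "util.c"))
```

SVN's real copies are cheaper still: a copied directory refers to the existing directory as a whole, so branching takes a small, constant amount of time and space regardless of project size. This sketch keeps one reference per file to stay close to the description above.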
SVN was centralised, like CVS and most VCS before it. Developers wrote their code locally, but depended on a central repository for all versioning information. Operations such as viewing history, committing changes, or creating branches required a connection to the central server, and weren’t possible offline.
BitKeeper is significant for being the first popular distributed VCS. Unlike SVN, it gave every developer a full local copy of the project, including its history, which was a novel concept at the time. While SVN’s centralised control was seen as an advantage in some corporate settings, BitKeeper’s distributed model was a more natural fit for open source collaboration.
For these reasons, it was adopted by what would become its most notable user: the Linux kernel project.
The Linux Kernel Controversy
Technology-wise, BitKeeper was perfect for the kernel project. However, it wasn’t open source software. While Linus Torvalds liked BitKeeper and adopted it, he couldn’t keep using it: its proprietary license drew persistent backlash from kernel developers, and in 2005 its vendor, BitMover, withdrew the free version the project depended on.
Using a proprietary system was seen as misaligned with the principles of a flagship free software project. Other VCS were available at the time, but none met the performance requirements of a project as large and complex as the kernel.
Despite not actually wanting to[7], Linus decided to write his own tool, and ended up launching Git within two weeks. He described Git as "the stupid content tracker".[8]
The rest, as we know, is history.
Special thanks to Tyler Von Harz for his contributions to this article.
[3] RCS (Revision Control System) was a performance improvement over SCCS.
Also see Purdue RCS Homepage.
[7] In this interview, Linus Torvalds explains that he had no real interest in writing a new version control system, but was forced to, because nothing else met the needs of the Linux kernel project at the time.
[8] Quoting Linus: "I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'Git'". "git" can mean anything, depending on your mood:
Random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
Stupid. Contemptible and despicable. Simple. Take your pick from the dictionary of slang.
"Global information tracker": you're in a good mood, and it actually works for you. Angels sing and light suddenly fills the room.
"Goddamn idiotic truckload of sh*t": when it breaks