Understand Version Control (1) — Repository, Commit, Branch

Domi Yan
Aug 31, 2020

Version control systems (VCS), also known as SCM (Source Code Management), are tools that help software development teams manage changes to source code over time. Nearly every software developer works with one every day.

This series is written to help beginners get started with VCS (which can be hard!). I am not going to focus on the usage of any specific tool, but whenever I need terminology, I will borrow it from git. If your company uses a different tool, don't worry: the concepts are similar. If you understand the concepts in git, you will find other version control tools much easier to pick up.

In this first article, we start with three topics: (1) how code changes are stored (the repository), (2) how each code change is recorded (the commit), and (3) how to create separate development workspaces (the branch). Hopefully, this gives you a simple and clear mental model of VCS.

Repository

When working with version control, we interact with a repository: a data structure specifically designed for storing and tracking source code changes. It is created at the start of a project. Here I will describe it clearly enough for most developers to get their everyday work done.

A repository has a tree-like structure that grows as developers make changes to the source code. When people discuss VCS-related topics, they often draw a graph like the following:

Visualization of a Repository

In this graph, each node represents a recorded code change. In git, such a node is called a commit.
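To make the picture concrete, here is a minimal Python sketch of a repository as a collection of commits, each pointing at its parent. The `Commit` and `Repository` classes and the commit ids are hypothetical and chosen only for illustration; real tools such as git record much more (author, timestamp, file contents) and use content hashes as commit ids.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Commit:
    # One node in the repository graph (toy model, not git's storage format).
    id: str
    message: str
    parent: Optional[str] = None  # id of the parent commit; None for the root


@dataclass
class Repository:
    # The repository is every commit recorded so far, keyed by id.
    commits: Dict[str, Commit] = field(default_factory=dict)

    def commit(self, commit_id: str, message: str, parent: Optional[str] = None) -> Commit:
        # A new commit always attaches to an existing node, so the graph only grows.
        if parent is not None and parent not in self.commits:
            raise KeyError(f"unknown parent commit {parent!r}")
        node = Commit(commit_id, message, parent)
        self.commits[commit_id] = node
        return node

    def history(self, commit_id: str) -> List[str]:
        # Walk parent links back to the root: the path that led to this commit.
        path: List[str] = []
        current: Optional[str] = commit_id
        while current is not None:
            path.append(current)
            current = self.commits[current].parent
        return path


repo = Repository()
repo.commit("c0", "initial commit")
repo.commit("c1", "add bar", parent="c0")
repo.commit("c2", "add baz", parent="c1")
print(repo.history("c2"))  # ['c2', 'c1', 'c0']
```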

Commit

A commit records a newly added change to the source code. We can view it from two perspectives:

  1. An incremental change: a commit represents modifications relative to its parent node. By looking at the content of a commit (in diff format), we can tell exactly what changed.
  2. A snapshot of the source code: if you trace the tree from the root node to a given commit and add up all the incremental changes along the way, you get a snapshot of the source code at that moment.
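The two perspectives can be sketched in a few lines of Python. This assumes a deliberately simplified model in which a change works at file granularity: it maps a file name to the file's new content, or to `None` if the file is deleted. The names `apply_change` and `snapshot_at` are hypothetical. Real tools track changes line by line, and git's object model actually records each commit as a snapshot of the whole tree and computes diffs on demand.

```python
from typing import Dict, List, Optional

# Toy model (an assumption for illustration): a change maps a file name to its
# new content, or to None when the file is deleted.
Change = Dict[str, Optional[str]]


def apply_change(snapshot: Dict[str, str], change: Change) -> Dict[str, str]:
    # Perspective 1: a commit is an incremental change on top of its parent.
    result = dict(snapshot)  # copy, so the parent's snapshot is left untouched
    for path, content in change.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def snapshot_at(changes_from_root: List[Change]) -> Dict[str, str]:
    # Perspective 2: replaying every change from the root up to a commit
    # yields the full snapshot of the source code at that commit.
    snapshot: Dict[str, str] = {}
    for change in changes_from_root:
        snapshot = apply_change(snapshot, change)
    return snapshot


root: Change = {"main.txt": "foo\n"}
commit_1: Change = {"main.txt": "foo\nbar\n"}
commit_2: Change = {"notes.txt": "todo\n", "main.txt": None}

print(snapshot_at([root, commit_1]))            # {'main.txt': 'foo\nbar\n'}
print(snapshot_at([root, commit_1, commit_2]))  # {'notes.txt': 'todo\n'}
```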

Let’s look at an example:

Commits contain incremental code change information

Each commit in the graph represents an incremental change, which helps developers identify a specific change. For example, “bar” is added in “commit 1”. The snapshot of the codebase after “commit 1” is also available: developers sometimes want to “go back” to the point before or after a specific commit for investigation or debugging.
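The diff view of a single commit can be illustrated with Python's standard `difflib` module. The file name `main.txt` and the line `foo` are made up; only the fact that “bar” is added in “commit 1” comes from the example above. The two lists hold the file's lines as they would appear in the snapshots before and after that commit.

```python
import difflib

# Hypothetical contents of one file before and after "commit 1".
before_commit_1 = ["foo\n"]
after_commit_1 = ["foo\n", "bar\n"]

# The incremental view: a unified diff shows exactly what commit 1 changed.
diff = list(difflib.unified_diff(before_commit_1, after_commit_1,
                                 fromfile="a/main.txt", tofile="b/main.txt"))
print("".join(diff))
# --- a/main.txt
# +++ b/main.txt
# @@ -1 +1,2 @@
#  foo
# +bar
```

Keeping both snapshots is also what makes “going back” possible: restoring the file to its state before “commit 1” simply means using `before_commit_1` again.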

A commit node can have more than one child node (as the graph in the “Repository” section shows). In that case, different incremental changes are applied to the source code at the divergence point, producing different versions of the source files. In git, we describe such divergence as creating a new branch, or branching.
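Divergence can be detected mechanically: invert the parent links and look for commits with more than one child. The history below (commit ids `c0` to `c3`) is hypothetical and is stored as a plain dictionary from each commit to its parent; `children_of` and `divergence_points` are illustrative helper names.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical history: each commit id maps to its parent's id (None for the root).
parents: Dict[str, Optional[str]] = {
    "c0": None,
    "c1": "c0",
    "c2": "c1",  # one line of development continues from c1
    "c3": "c1",  # a different change is also applied on top of c1
}


def children_of(parents: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    # Invert the parent links to see which commits grew out of each node.
    children: Dict[str, List[str]] = defaultdict(list)
    for commit, parent in parents.items():
        if parent is not None:
            children[parent].append(commit)
    return children


def divergence_points(parents: Dict[str, Optional[str]]) -> List[str]:
    # A commit with more than one child is where the history diverges.
    return [c for c, kids in children_of(parents).items() if len(kids) > 1]


print(divergence_points(parents))  # ['c1']
```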

Branch

A new branch is created when the repository tree diverges. From that point on, modifications can happen in parallel along multiple branches. Usually there is a default branch, called “master” in git. You can create a new branch off any existing node in the tree. The following graph shows a repository with three branches: branch “Dev” branches off “master”, and branch “Test” branches off “Dev”.

Branching
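In git, a branch is essentially a name that points at a commit and moves forward each time you commit on it. The `ToyRepo` class below is a hypothetical sketch of that idea that reproduces the “master”, “Dev”, and “Test” branches from the graph; the commit ids are made up.

```python
from typing import Dict, Optional


class ToyRepo:
    # Hypothetical model: each commit stores its parent id, and a branch is just
    # a name pointing at its latest commit, which advances on every commit.
    def __init__(self) -> None:
        self.parents: Dict[str, Optional[str]] = {"c0": None}
        self.branches: Dict[str, str] = {"master": "c0"}
        self._counter = 0

    def create_branch(self, name: str, from_branch: str) -> None:
        # Branching off: the new name starts at the same commit as the old one.
        self.branches[name] = self.branches[from_branch]

    def commit(self, branch: str) -> str:
        # The new commit's parent is the branch's current tip; then the tip moves.
        self._counter += 1
        commit_id = f"c{self._counter}"
        self.parents[commit_id] = self.branches[branch]
        self.branches[branch] = commit_id
        return commit_id


repo = ToyRepo()
repo.commit("master")                            # c1 on master
repo.create_branch("Dev", from_branch="master")
repo.commit("Dev")                               # c2 on Dev
repo.create_branch("Test", from_branch="Dev")
repo.commit("Test")                              # c3 on Test
repo.commit("master")                            # c4 on master, diverging from Dev at c1

print(repo.branches)  # {'master': 'c4', 'Dev': 'c2', 'Test': 'c3'}
```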

What is the motivation for branching? Fundamentally, you are duplicating the source code and developing it in a “new place”, where new commits that others make to the original branch won't show up. There are two common scenarios where this is needed:

1. “Isolate” your development. When you start modifying source code, you want to build on something stable, at least for a short period of time, and keep your commits isolated from others'. When you finish, you can merge your changes back into the original branch. It looks like this:

Branching off and merging back

The merge process typically creates a new commit that “brings” your changes back into the original branch. This is also where you make sure your changes work well together with the other changes made there in the meantime.
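The core of a merge can be sketched as a three-way comparison: each side is compared against the common ancestor (the point where the branches diverged), so a change made on only one side can be taken automatically, while a file changed differently on both sides is a conflict. This Python sketch works on whole files, and `three_way_merge` and the branch-tip ids are hypothetical names. Git compares files line by line, and when the original branch has no new commits it can fast-forward instead of creating a merge commit.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # file name -> file content (toy model)


def three_way_merge(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Tuple[Snapshot, List[str]]:
    merged: Snapshot = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)  # None = file absent
        if o == t:        # both sides agree (including "neither side touched it")
            result = o
        elif o == b:      # only the other branch changed this file: take its version
            result = t
        elif t == b:      # only our branch changed this file: keep our version
            result = o
        else:             # both changed it differently: a conflict for a human to resolve
            conflicts.append(path)
            result = o    # keep our version for now, but report the conflict
        if result is not None:  # None here means the file was deleted
            merged[path] = result
    return merged, conflicts


# Hypothetical example: master added a README while a feature branch edited main.txt.
base = {"main.txt": "foo\n"}  # snapshot where the branches diverged
master = {"main.txt": "foo\n", "readme.txt": "hello\n"}
feature = {"main.txt": "foo\nbar\n"}

merged, conflicts = three_way_merge(base, master, feature)
print(merged)     # {'main.txt': 'foo\nbar\n', 'readme.txt': 'hello\n'}
print(conflicts)  # []

# The merge commit records the result and points back at BOTH branch tips.
merge_commit = {"parents": ("master-tip", "feature-tip"), "snapshot": merged}
```

Because the merge commit has two parents, git's history is, strictly speaking, a directed acyclic graph rather than a pure tree.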

2. Create a “release build”. You have probably heard version names such as alpha, beta, release v1.0, or release v2.1. At some point in the development and release cycle (for example, one month before a release), a branch is created to produce a more stable build:

Branch off to create a release build

After branching, only release-specific changes are committed to the new branch, and no merge operations are performed on it.
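A release branch can be modeled the same way: after branching off, new work on “master” and the release-specific fixes grow apart, and each branch's history only shows its own commits. The branch name `release-v1.0`, the commit ids, and the `commit` and `log` helpers are hypothetical; `log` only loosely mirrors what `git log` shows for a branch.

```python
from typing import Dict, List, Optional

# Hypothetical history: c0..c2 on master, then a release branch is cut at c2.
parents: Dict[str, Optional[str]] = {"c0": None, "c1": "c0", "c2": "c1"}
branches: Dict[str, str] = {"master": "c2"}


def commit(branch: str, commit_id: str) -> None:
    # Record a commit on top of the branch tip and advance the tip.
    parents[commit_id] = branches[branch]
    branches[branch] = commit_id


def log(branch: str) -> List[str]:
    # Follow parent links from the tip: the history visible on this branch.
    history: List[str] = []
    current: Optional[str] = branches[branch]
    while current is not None:
        history.append(current)
        current = parents[current]
    return history


branches["release-v1.0"] = branches["master"]  # branch off for the release
commit("master", "c3")            # new feature work continues on master
commit("release-v1.0", "fix1")    # only a release-specific fix goes here

print(log("release-v1.0"))  # ['fix1', 'c2', 'c1', 'c0']
print(log("master"))        # ['c3', 'c2', 'c1', 'c0']
```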

Summary

In this article, we learned the basics of version control systems:

1. All code changes are stored in a tree-like data structure called a repository.

2. Each node in the “repository tree” is a commit that records a code change. Historical commits help developers identify a change and restore the codebase to a specific point in time.

3. Branches are created when commits diverge. They allow software to be developed in parallel by different people, and they are also created when a release build is needed.
