Anirban

Northern Arizona University

Basic Git concepts explained

This post contains my explanations of some of the basic concepts around Git.

Git

Git is a piece of software, or more generally a ‘system’, that records the changes we make to our files over time. Its main purpose is to keep track of the changes made to a chosen set of files and to help in the scenarios where that history is useful. For instance, if newly updated files introduce undesired changes, one can revert to an earlier point in time when everything worked fine.

The second purpose of git is to make collaboration easy: it establishes a common ground where all of its users (software developers being the most prominent example) can refer to a codebase and its changes at different points in time, and make their own changes to it in a systematic and convenient way. With git, many people can work on a project together by sharing and updating their different versions of it.

Repository

A ‘repository’ in general means a place to store stuff (which can be anything - resources, objects, etc.). As a term in Computer Science, it generally means a folder/directory in which files are kept (i.e. a file storage location). A ‘git repository’ is likewise a regular folder that stores files (think of it as a container), except that git has been initialized in it. Git keeps its data in a separate sub-directory, .git/, which is what lets it track the changes made to those files over time.
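To make this concrete, here is a minimal sketch of turning a regular folder into a git repository. The temporary folder created with mktemp is only a placeholder for wherever your project lives.

```shell
# Create an empty scratch folder (placeholder location) and move into it.
repo="$(mktemp -d)"
cd "$repo"

# At this point it is just a regular folder with no git data in it.
ls -a

# Initialize git here; this creates the hidden .git/ sub-directory.
git init

# The folder is now a git repository: .git/ shows up in the listing.
ls -a
```

Deleting the .git/ sub-directory would turn the folder back into a regular one and discard the recorded history, so it is best left alone.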

Remote

Remote, in general, refers to something distant or far away, and it carries the same sense in the context of git repositories: a remote is a copy of the repository hosted somewhere else, typically on a server such as GitHub’s, separate from the local repository that resides on our computer. It is an online version of the repository and can contain changes that differ from the version on one’s local machine. Those changes can be pulled locally with the git pull command, and conversely, local changes can be pushed to the remote with git push to keep the online version updated.
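The sketch below shows the push and pull round trip. To keep it runnable without a network or an account, it assumes a bare repository in a temporary folder as a stand-in for the hosted remote. With a real host, the path given to git remote add would be the repository URL instead. The file name, identity and commit message are placeholders.

```shell
# Stand-in for a hosted remote: a bare repository in a temporary folder.
work="$(mktemp -d)"
git init --bare "$work/remote.git"

# A local repository with one commit on a branch named main.
git init "$work/local"
cd "$work/local"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello" > notes.txt
git add notes.txt
git commit -m "Add notes"
git branch -M main

# Register the remote under the conventional name "origin" and list it.
git remote add origin "$work/remote.git"
git remote -v

# Upload the local commits to the remote ...
git push -u origin main

# ... and fetch and integrate anything new from it.
git pull origin main
```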

Branching

‘Branching’ in general refers to the act of separating into different pathways from a single one. The same concept applies in git: one starts with a single stream of development, or ‘branch’ (for the very first branching this is the default branch, usually named main and previously known as master), and when the need arises, a new branch is split off from it at a specified point in its history. The new branch can in turn be split into further branches, each with its own parallel workflow. This is crucial if one would like to keep a clean, working copy of the main branch (bug-free, or at least free of bugs introduced by in-progress updates), with major changes such as features isolated in different branches.
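As a rough illustration, the following sketch splits a branch off main and commits to it. The branch name feature, the file app.txt and the identity are made up for the example.

```shell
# Set up a scratch repository with one commit on main.
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "stable code" > app.txt
git add app.txt
git commit -m "Initial commit"
git branch -M main

# Split off a new line of development from the current commit and switch to it.
git checkout -b feature

# Commits made now land on 'feature'; 'main' stays as it was.
echo "experimental change" >> app.txt
git commit -am "Try out a new feature"

# List the branches (the current one is marked with *) and compare histories.
git branch
git log --oneline main
git log --oneline feature
```

The last two commands should show one commit on main and two on feature, which is the isolation described above.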

I find branching to be an interesting phenomenon that can be applied everywhere, right from decision-making to fictional time travel theories (here’s something that I wrote along the lines of it!), with ample examples floating around that everyone can understand and relate to within their own domain.

Merging

‘Merging’ in general refers to the act of combining two different pathways into one. In git, this can be thought of as taking the independent lines of development created by branching (starting from the point where their histories diverged) and integrating them into a single branch: the current branch receives the changes, while the other branch is left unaffected.
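Here is a small sketch of a merge between two branches that have diverged without touching the same files. The branch and file names are placeholders chosen for the example.

```shell
# Set up a scratch repository with one commit on main.
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "base" > app.txt
git add app.txt
git commit -m "Initial commit"
git branch -M main

# Diverge: one commit on 'feature', one on 'main', in different files.
git checkout -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -m "Add feature"
git checkout main
echo "docs" > README.txt
git add README.txt
git commit -m "Add README"

# Integrate 'feature' into the current branch ('main').
git merge --no-edit feature

# 'main' now contains both files; 'feature' itself is unchanged.
ls
git log --oneline --graph --all
```

Because both branches gained a commit after diverging, git records the integration as a merge commit with two parents. Had main not moved in the meantime, git would by default simply move main forward to the tip of feature (a fast-forward).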

Merge conflicts

An integration process is not always expected to go smoothly - for example, opinions may clash when people combine their ideas. Likewise in git, when two separate branches change the same lines of a file in different ways (or one branch edits a file that the other deletes), git cannot merge them without your help, since it has no way of knowing which version is correct. It therefore reports a merge conflict, which one needs to resolve by choosing one of the versions or by combining them by hand. To reduce the chance of this, contributors should keep track of overlapping updates, push and pull often, or work separately, with different code segments or files assigned to each collaborator.
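The following sketch provokes a conflict on purpose and then resolves it. The file config.txt, its contents and the branch names are invented for the illustration.

```shell
# Set up a scratch repository with one commit on main.
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "color = red" > config.txt
git add config.txt
git commit -m "Initial commit"
git branch -M main

# Both branches change the same line, each in a different way.
git checkout -b feature
echo "color = blue" > config.txt
git commit -am "Use blue"
git checkout main
echo "color = green" > config.txt
git commit -am "Use green"

# Git cannot pick a winner here, so the merge stops and reports a conflict.
git merge feature || echo "merge stopped: conflict in config.txt"

# The file now holds both versions between conflict markers
# (<<<<<<<, =======, >>>>>>>).
cat config.txt

# Resolve by writing the content you actually want, then stage and commit.
echo "color = blue" > config.txt
git add config.txt
git commit -m "Merge feature: settle on blue"
```

In practice one would edit the file in an editor rather than overwrite it with echo; the important part is that staging the file with git add is how git is told the conflict has been resolved.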

Getting commits into a project through a pull request works in one direction, but there are two scenarios depending on your access: working on a forked repository, or having direct access to the main repository (as a collaborator or owner), in which case you can push without a maintainer’s approval. Either way the result is the same, but in most cases it is best to work on a different branch and then send a pull request with working changes from there to the main branch. This avoids breaking deployments from the core development branch while you are still experimenting and have yet to make your changes work.
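The git side of that workflow might look like the sketch below. It assumes a bare repository in a temporary folder as a stand-in for the hosted repository (or your fork of it), and fix-typo is a made-up branch name. The pull request itself is not a git command: it is opened on the hosting site once the branch has been pushed.

```shell
# Stand-in for the hosted repository, plus a local clone-like setup.
work="$(mktemp -d)"
git init --bare "$work/remote.git"
git init "$work/local"
cd "$work/local"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "version 1" > app.txt
git add app.txt
git commit -m "Initial commit"
git branch -M main
git remote add origin "$work/remote.git"
git push -u origin main

# Do the work on a separate branch instead of main.
git checkout -b fix-typo
echo "version 2 with the fix" > app.txt
git commit -am "Fix typo"

# Publish the branch; main on the remote is left untouched.
# The pull request from 'fix-typo' into 'main' is then opened on the host.
git push -u origin fix-typo
```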

The three git states

A file or a set of files can be in one of three states in the world of git:

  • Modified:
    As the name suggests, files in this state are recognized by git as modified, i.e. they have changed with respect to the previous snapshot (a commit that git saves locally, capturing all the added changes at one point in time - think of it like a save point in a game!). These changes have not been staged yet, so they are not ready to be committed (staging and committing are the next two states, in order).
  • Staged:
    Being ‘staged’ means that git has not only noticed the changes but also recorded them in the staging area, alternatively referred to as the ‘index’ (the terms are interchangeable). One needs to explicitly mark modified files as ready, usually in anticipation of including those changes in the next commit, by using the git add command, which adds the current content of the files to the index. From there, the changes can either be unstaged (leading back to the modified state) or committed (leading to the next state, in which the changes are saved).
  • Committed:
    This is the final state, right after staging (which in most cases serves as a checkpoint before saving the changes locally). All the staged changes are saved and safely stored in a local database maintained by git, from where they can be pushed to a remote such as GitHub. Each commit gets a distinct commit ID, a SHA-1 hash by default, which labels the commit in the commit history (which holds the snapshots from all previous commits).
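The three states can be watched with git status, as in this sketch. The file notes.txt and its contents are placeholders, and the --short flag is used only to keep the output compact.

```shell
# Set up a scratch repository with one committed file.
repo="$(mktemp -d)"
cd "$repo"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "first draft" > notes.txt
git add notes.txt
git commit -m "Add notes"

# Modified: the working copy differs from the last commit, nothing staged yet.
echo "second draft" > notes.txt
git status --short          # prints " M notes.txt" (M in the second column)

# Staged: the change is recorded in the index, ready for the next commit.
git add notes.txt
git status --short          # prints "M  notes.txt" (M in the first column)

# Unstaging leads back to the modified state without losing the edit.
git reset HEAD notes.txt
git add notes.txt           # stage it again

# Committed: the staged snapshot is saved in the local database.
git commit -m "Revise notes"
git status --short          # prints nothing: the working tree is clean

# The commit ID is a hash (40 hex characters with the default SHA-1 format).
commit_id="$(git rev-parse HEAD)"
echo "$commit_id"
```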