If you are a developer, you are probably familiar with Git and may already use it to some extent in your daily work. If you are an IT professional who spends most of your time working on infrastructure issues, you may not have had a chance to use it yet.
Cloud and DevOps are among the hottest topics in our industry today, and understanding how Git works will help us see in more detail how this technology is used in those areas.
More and more IT pros are heading into uncharted waters such as infrastructure as code (IaC), scripting, automation, and so forth. It is common to see an IT pro, like the humble author of this article, using Visual Studio to deploy Azure infrastructure, keeping scripts under version control, and working in many other such scenarios.
To top it all off, Microsoft recently acquired GitHub, a software-as-a-service (SaaS) platform built around Git that keeps repositories synchronized and consistent and helps teams collaborate and develop code faster than ever before.
What is Git, by the way? Git is a lightweight version-control tool that was created to support Linux kernel development. It was created in 2005 by Linus Torvalds (yep, the guy who created Linux), has since received contributions from a vast number of people, and has been used in development projects of all shapes and sizes.
You may be saying, "I'm an IT pro! There is nothing for me here, right?" Well, Git helps virtually any professional who needs version control and wants to stay organized. It is not just for developers; you may even want to use it for your documentation files. So the short answer is that there is a lot you can gain from Git, and it does not hurt to understand a pretty cool technology, does it?
In this article, we will focus on the basic concepts needed to understand Git. In our next article, we will use Git with some PowerShell scripts that we are developing, to demonstrate scenarios where Git is useful even for local version control. Later on, we will finish up with an article here at TechGenix covering the integration between Git and GitHub.
First and most important, Git keeps a local repository (a database) that tracks changes, and it does so per directory. Every time we initialize Git in a folder, a hidden .git folder structure is created to support versioning of that folder and its subfolders (we can see the structure in the image below). Git identifies everything it stores by a SHA-1 hash, written as 40 hexadecimal characters. The hash is computed from the content of the file, and we will see these hashes all over the place.
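To make the idea of content-based hashes concrete, here is a minimal Python sketch of how Git computes the ID of a file's content (Git calls this stored object a blob). It should match what `git hash-object <file>` prints in a repository that uses the default SHA-1 format; the function name `git_blob_hash` is our own, not part of Git.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 ID Git assigns to a file's content.

    Git hashes a short header ("blob <size>" followed by a NUL byte) plus the
    raw bytes, so the ID depends only on the content, never on the file name.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_hash(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(len(git_blob_hash(b"")))           # 40
```

Because the header carries only the object type and size, two files with identical content always get the same ID, whatever their names.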
The second important point is the state of a given file. A file can be in one of three states:
- Committed: The data is safely stored in the local database. Git has it covered, so don't worry about this one.
- Modified: The file has changed, but the change is neither staged nor saved in the local database. We have to take action if we want to keep that change under version control.
- Staged: You have marked a modified file to be part of your next commit. Its snapshot is already stored, just waiting for the commit.
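The three states can be modeled by comparing a file's checksum in three places: the working folder, the staging area, and the last commit. The sketch below is a simplification under that assumption (the real `git status` also caches file timestamps to avoid rehashing, and it reports brand-new files as "untracked"); `blob_id` and `file_state` are hypothetical helpers written for this article.

```python
import hashlib
from typing import Optional

def blob_id(content: bytes) -> str:
    """Git-style content ID: SHA-1 over a 'blob <size>' header, a NUL byte, and the bytes."""
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()

def file_state(working: bytes, staged: Optional[bytes], committed: Optional[bytes]) -> str:
    """Classify one file from its bytes in the working folder, the staging
    area, and the last commit (None means it is absent there)."""
    w = blob_id(working)
    s = blob_id(staged) if staged is not None else None
    c = blob_id(committed) if committed is not None else None
    if w != s:
        # The working copy differs from the staged snapshot (or was never staged).
        return "modified"
    if s != c:
        # The snapshot is staged and waits for the next commit.
        return "staged"
    # Working folder, staging area, and last commit all agree.
    return "committed"

v1 = b"Get-Service\n"
v2 = b"Get-Service | Sort-Object Status\n"
print(file_state(v2, v1, v1))  # modified
print(file_state(v2, v2, v1))  # staged
print(file_state(v2, v2, v2))  # committed
```

A file that is staged and then edited again is really in two states at once; this simplified classifier reports it as modified, while `git status` lists it under both "Changes to be committed" and "Changes not staged for commit".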
It is important to understand that the local database stores snapshots of files. If a file did not change between commits, Git does not copy it again; the new snapshot simply references the unchanged file. Git decides whether a file has changed by comparing checksums.
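A toy content-addressed store shows why an unchanged file costs nothing extra in a new snapshot. This is an illustrative sketch, not Git's on-disk format (real Git compresses each object and keeps it under `.git/objects`); the `ObjectStore` class and the `snapshot` function are hypothetical.

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store: each object is saved under its SHA-1 ID."""

    def __init__(self):
        self.objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        oid = hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()
        # Identical content yields the identical ID, so it is stored only once.
        self.objects.setdefault(oid, content)
        return oid

def snapshot(store: ObjectStore, files: dict[str, bytes]) -> dict[str, str]:
    """Record every file; a snapshot maps file names to object IDs."""
    return {name: store.put(data) for name, data in files.items()}

store = ObjectStore()
first = snapshot(store, {"deploy.ps1": b"v1", "readme.txt": b"notes"})
second = snapshot(store, {"deploy.ps1": b"v2", "readme.txt": b"notes"})
print(len(store.objects))                           # 3
print(first["readme.txt"] == second["readme.txt"])  # True
```

The second snapshot adds only one object, for the new version of `deploy.ps1`; both snapshots point at the same stored copy of `readme.txt`.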
When a commit is executed, Git creates a commit object for it. The commit object references a tree object and, except for the very first commit, the previous commit or commits, which are referred to as parents. The tree object contains links to the blobs of every single file that is part of this commit, and there is one blob for each file. Every one of these objects is identified by its checksum.
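The sketch below serializes a tree and a commit the way Git does, so you can see how a commit points at a tree, the tree points at blobs, and a later commit points back at its parent. In Git's object model the commit and the tree are their own object types ("commit" and "tree"), separate from the "blob" objects that hold file content. The sketch is kept deliberately small under a few assumptions: it handles only regular files in a single folder (no subfolders), the author line (`Jane Doe`, a fixed timestamp) is made up, and the helper names are our own. For reference, an empty tree always hashes to `4b825dc642cb6eb9a060e54bf8d69288fbee4904`.

```python
import hashlib

def hash_object(kind: str, body: bytes) -> tuple[str, bytes]:
    """Hash an object as Git does ('<kind> <size>', a NUL byte, then the body).
    Returns the 40-character hex ID and the raw 20-byte digest."""
    digest = hashlib.sha1(f"{kind} {len(body)}\0".encode() + body)
    return digest.hexdigest(), digest.digest()

def tree_object(files: dict[str, bytes]) -> str:
    """Build a flat tree (regular files only, no subfolders) and return its ID."""
    body = b""
    for name in sorted(files):  # entries are ordered by name (plain ASCII names assumed)
        _, raw_blob_id = hash_object("blob", files[name])
        # Each entry: mode, space, file name, NUL byte, then the blob's raw ID.
        body += b"100644 " + name.encode() + b"\0" + raw_blob_id
    return hash_object("tree", body)[0]

def commit_object(tree_id: str, parents: list[str], message: str,
                  who: str = "Jane Doe <jane@example.com> 1700000000 +0000") -> str:
    """Serialize a commit that points at one tree and zero or more parents."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return hash_object("commit", ("\n".join(lines) + "\n").encode())[0]

print(tree_object({}))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)

t1 = tree_object({"deploy.ps1": b"v1\n", "readme.txt": b"notes\n"})
c1 = commit_object(t1, [], "First version")     # no parent: the first commit
t2 = tree_object({"deploy.ps1": b"v2\n", "readme.txt": b"notes\n"})
c2 = commit_object(t2, [c1], "Second version")  # c1 is the parent of c2
```

Changing the parent, the message, or even the timestamp produces a different commit ID, so each commit's checksum also vouches for the entire history behind it.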
A third important item in the Git universe is the branch. Branches let us keep a main line of development while several other lines diverge from it to test and validate new features, fix issues, and so forth.
A branch is just a pointer to a commit, and we can switch between branches as we wish. Switching changes the content and files of the folder that Git is controlling. By default, a new Git repository has a master branch, where the mainstream changes happen (newer Git versions let you configure a different default name, such as main), but we can create other branches to tackle different areas of development and merge them back into master later on.
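Because a branch is only a name pointing at a commit, creating one is cheap. The in-memory model below is a hypothetical sketch (commit IDs are shortened placeholders such as `c1`, and `fix-dns` is an invented branch name); real Git typically stores each branch as a small file under `.git/refs/heads/` (or in `.git/packed-refs`) that contains a commit ID.

```python
class Branches:
    """Toy model: a branch is only a name that points at a commit ID."""

    def __init__(self, first_commit: str):
        self.tips = {"master": first_commit}
        self.current = "master"  # the branch that is checked out

    def create(self, name: str) -> None:
        # A new branch starts at the commit the current branch points to.
        self.tips[name] = self.tips[self.current]

    def switch(self, name: str) -> None:
        # Real Git would also rewrite the folder's files to match that branch.
        self.current = name

    def commit(self, new_commit: str) -> None:
        # A commit moves only the pointer of the current branch.
        self.tips[self.current] = new_commit

repo = Branches("c1")
repo.create("fix-dns")
repo.switch("fix-dns")
repo.commit("c2")
print(repo.tips)  # {'master': 'c1', 'fix-dns': 'c2'}
```

Work committed on `fix-dns` leaves `master` untouched until the two lines are merged.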
Git has a special pointer called HEAD, which tells us which branch we are currently on.
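In a repository, HEAD is a small text file, `.git/HEAD`. Normally it names the current branch (for example `ref: refs/heads/master`); if you check out a specific commit instead, it holds that commit's ID, a state Git calls "detached HEAD". The helper `describe_head` below is a hypothetical sketch that interprets that text, and the 40-character ID in the usage is a placeholder.

```python
def describe_head(head_text: str) -> str:
    """Interpret the text stored in .git/HEAD."""
    head_text = head_text.strip()
    if head_text.startswith("ref: "):
        # Normal case: HEAD names a branch, e.g. "ref: refs/heads/master".
        ref = head_text[len("ref: "):]
        return "on branch " + ref.removeprefix("refs/heads/")
    # Detached HEAD: the file holds a commit ID directly.
    return "detached HEAD at " + head_text[:7]

print(describe_head("ref: refs/heads/master\n"))                  # on branch master
print(describe_head("0123456789abcdef0123456789abcdef01234567"))  # detached HEAD at 0123456
```

Inside a real repository you could pass it `pathlib.Path(".git/HEAD").read_text()`.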
How do we work with Git?
These are the key principles of Git, and I understand that you have not yet touched the command prompt to try it out. Before we go there, we need to understand the process that a typical user follows locally:
- You work on files and make changes.
- You stage the changed files manually using the Git command line.
- You commit the staged files into the local database.
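Here are the same three steps driven from Python with the real Git command line, so you can watch a file move through the states. This sketch assumes `git` is installed and on your PATH; it works in a throwaway temporary folder, uses a made-up identity (`Demo`, `demo@example.com`) passed with `-c` so your global settings are untouched, and records what `git status --porcelain` reports after each step.

```python
import pathlib
import shutil
import subprocess
import tempfile

def git(repo: pathlib.Path, *args: str) -> str:
    """Run one git command inside `repo` and return its standard output."""
    # Throwaway identity (and no signing) passed with -c, so global config is untouched.
    cmd = ["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
           "-c", "commit.gpgsign=false", *args]
    result = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return result.stdout

def run_workflow() -> list[str]:
    """Walk one file through the basic workflow and return what
    `git status --porcelain` reports after each step."""
    seen = []
    with tempfile.TemporaryDirectory() as tmp:
        repo = pathlib.Path(tmp)
        git(repo, "init", "-q")
        # 1. Work on files: Git sees a new file it is not tracking yet.
        (repo / "deploy.ps1").write_text("Get-Service\n")
        seen.append(git(repo, "status", "--porcelain").strip())
        # 2. Stage the change.
        git(repo, "add", "deploy.ps1")
        seen.append(git(repo, "status", "--porcelain").strip())
        # 3. Commit the staged snapshot into the local database.
        git(repo, "commit", "-q", "-m", "First version")
        seen.append(git(repo, "status", "--porcelain").strip())
    return seen

if shutil.which("git"):
    print(run_workflow())  # ['?? deploy.ps1', 'A  deploy.ps1', '']
else:
    print("git was not found on PATH")
```

In the porcelain output, `??` marks a file Git is not tracking yet, `A` in the first column means the file is staged as a new addition, and empty output means everything is committed.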
These are the basic steps for working with Git. Once you understand these concepts, it is easier to move on to more advanced uses: creating different branches, merging them afterward, integrating with GitHub, and so forth.
I know that we haven’t had a chance to have a lot of action in this article but stay tuned. Our next article will involve several scenarios using Git to help our productivity and organization of scripts. But keep in mind that it can be used for a variety of things, not just scripts.
Featured image: Freerange Stock