As a data scientist, I regularly use Git for code reviews and collaborative projects. Basic commands such as git add ., git commit -m, git push, and git pull are foundational, but the situations that call for them can get intricate. In this post, I walk through some of the more advanced Git commands using real-world examples. Let’s dive in.

TL;DR

  • git commit --amend [--no-edit]: Fold staged changes into your last commit instead of creating a new one. Add --no-edit to keep the original message. Because the amended commit gets a new hash, this works best on commits that have not been pushed yet.

  • git reset --soft HEAD~N: Move the branch back N commits. With --soft, the changes from those commits stay staged and remain in your working directory; with --hard, they are discarded from both the staging area and the working directory.

  • git restore --staged <file>: Unstage a file while leaving its edits in your working directory, perfect for undoing a premature git add.

  • git rebase: Streamline your commit history by editing and consolidating past commits into a clearer narrative.

  • git cherry-pick: Extract a specific commit from one branch and integrate it directly into your current working branch.

  • git stash: Temporarily set aside uncommitted changes to get a clean working directory for something else.

Jessy and Victor: Collaborative Code Development

Now, let’s bring these commands to life through the experiences of Jessy and Victor, two characters who will show us the ropes of advanced Git usage in their day-to-day coding adventures.

Jessy and Victor are team members on a data science project that develops a forecasting model. Jessy manages the codebase and branches, while Victor assists with code reviews and implementing best practices. They currently work in a repository with three branches, each dedicated to one aspect of the project:

  • dataprocessing: This branch is dedicated to developing libraries and scripts for data collection and compilation. It serves as the backbone for handling data, ensuring that everything from raw input to processed information is managed smoothly.
  • modeltraining: Focused solely on crafting the model training scripts, this branch is where the magic of machine learning happens. Here, Jessy and Victor transform theoretical data models into practical, trainable algorithms.
  • mainline: The hub of their collaborative efforts, the mainline branch is where all completed work converges. After rigorous code reviews to ensure quality and functionality, changes from dataprocessing and modeltraining are meticulously merged here, forming the final version of their project.

    % git branch
    * dataprocessing
      mainline
      modeltraining

Now, let’s explore some real-life scenarios where Jessy and Victor use Git commands to manage their workflow effectively.

Scenario 1: Refining Commit History

Context: Jessy was working on the dataprocessing branch to refine the data cleaning steps. After making a commit, she realized she had forgotten to include a new data filter that belongs in that commit.

Command: git commit --amend --no-edit

  • Jessy adds the new filter to the script and stages the changes with git add . (note the single dot).
  • Instead of creating a new commit, she uses git commit --amend --no-edit to add these changes to the previous commit without altering the commit message.
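
To try the amend step safely, here is a minimal sketch that runs in a throwaway repository. The file name cleaning.py, the commit message, and the identity settings are assumptions made up for this illustration, not part of Jessy’s actual project.

```shell
# Throwaway sandbox: nothing here touches a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jessy"
git config user.email "jessy@example.com"

# The original commit with the data cleaning steps (hypothetical file).
echo "# drop rows with nulls" > cleaning.py
git add cleaning.py
git commit -q -m "Refine data cleaning steps"
before="$(git rev-parse HEAD)"

# The forgotten filter: stage it, then fold it into the previous commit.
echo "# filter outliers" >> cleaning.py
git add cleaning.py
git commit -q --amend --no-edit
after="$(git rev-parse HEAD)"

git log --oneline    # still one commit, with the original message
echo "hash before: $before"
echo "hash after:  $after"
```

The two hashes differ because amending replaces the old commit with a new one. That is harmless for local work, but if the original commit was already pushed, the branch would need a force push afterwards.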

Benefits:

  • Keeps the commit history clean and meaningful.

  • Avoids cluttering the history with minor fixes or missed changes.

Scenario 2: Reverting Multiple Commits

Context: Victor notices that the last three commits on the modeltraining branch introduced errors that hurt the model’s performance. Jessy decides to roll the branch back to the point where the model was stable.

Command: git reset --soft HEAD~3

  • Jessy uses git reset --soft HEAD~3 to undo the last three commits. The branch moves back three commits, while the changes from those commits stay staged and remain in her working directory for review and possible corrections.
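
The effect of the soft reset can be checked in a scratch repository. This is a sketch under assumed names (train.py and the commit messages are invented for the demo); it shows where the undone changes end up.

```shell
# Throwaway sandbox with one stable commit and three later ones.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jessy"
git config user.email "jessy@example.com"

echo "# stable baseline" > train.py
git add train.py
git commit -q -m "Stable training script"

for i in 1 2 3; do
  echo "# experiment $i" >> train.py
  git commit -q -am "Experiment $i"
done

# Undo the last three commits but keep their changes.
git reset --soft HEAD~3

git log --oneline     # only "Stable training script" remains
git status --short    # "M  train.py": the undone changes are staged
```

From here the staged changes can be inspected with git diff --cached and committed again in corrected form. If the three commits had already been pushed to a shared branch, git revert, which adds new commits that undo the old ones, is usually the gentler choice because it does not rewrite history.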

Benefits:

  • Allows Jessy to rapidly backtrack while retaining the ability to analyze and integrate parts of the undone changes selectively.
  • Facilitates quick correction without losing all recent work.

Scenario 3: Cleaning Up Staging Area

Context: Jessy wants to submit a code review for a new feature engineering script, but she accidentally stages several other files that were not meant to be part of it.

Command: git restore --staged <file>

  • Jessy uses git restore --staged <file> to unstage those specific files. This resets their entries in the staging area to the last committed state and leaves the edits in her working directory untouched.
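
A small sketch of the unstaging step, again in a throwaway repository. The file names features.py and notes.txt are hypothetical, and git restore assumes Git 2.23 or newer.

```shell
# Throwaway sandbox with two tracked files.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jessy"
git config user.email "jessy@example.com"

echo "# v1" > features.py
echo "v1" > notes.txt
git add .
git commit -q -m "Initial commit"

# Edit both files, then stage everything by accident.
echo "# new feature" >> features.py
echo "scratch notes" >> notes.txt
git add .

# Take notes.txt back out of the staging area; the edit stays on disk.
git restore --staged notes.txt

git status --short
# M  features.py   <- staged, will go into the commit
#  M notes.txt     <- modified but no longer staged
```

On older Git versions, git reset HEAD <file> does the same job. Running git restore <file> without --staged is a different operation: it discards the working directory edits.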

Benefits:

  • Provides a safe way to undo mistakes in staging without losing ongoing work.
  • Enhances control over what gets committed, maintaining a clean and relevant commit history.

Scenario 4: Streamlining Commit History

Context: After several iterative improvements in the dataprocessing branch, Jessy’s commit log becomes cluttered with minor updates and fixes.

Command: git rebase -i HEAD~5

  • Jessy selects the last five commits for interactive rebase and uses options like squash and reword to streamline and clarify the commit messages.

    • She keeps the first commit as pick (p) and changes the rest to squash (s):

      p bcy0262 update configuration settings
      s ase1356 update features generation libs
      s 743c258 update labels creation libs
      s b1d1079 update features engineering stages
      s 403c753 update random seeds

    • Git then opens her editor with the five commit messages combined, and she edits them into a single message for the resulting commit.
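
An interactive rebase normally happens in an editor, so it is hard to show in a copy-paste snippet. The sketch below is a scripted stand-in, not how Jessy would work day to day: it sets GIT_SEQUENCE_EDITOR to a sed command that rewrites every pick after the first to squash, and GIT_EDITOR=true to accept the combined message unchanged. The file names and commit messages are invented for the demo.

```shell
# Throwaway sandbox: one base commit plus five small follow-ups.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jessy"
git config user.email "jessy@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial pipeline"

for topic in config features labels stages seeds; do
  echo "$topic" > "$topic.txt"
  git add "$topic.txt"
  git commit -q -m "update $topic"
done

# Stand-in for the editor session: keep the first "pick",
# turn the remaining four into "squash", accept the merged message.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/squash/'" \
GIT_EDITOR=true \
git rebase -i HEAD~5

git log --oneline    # two commits: the base and one squashed commit
```

All five files are still present after the rebase; only the history is condensed. As with amending, the squashed commit is a new commit, so this is best done before the branch is shared.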

Benefits:

  • Helps maintain a logical and concise commit history, making it easier for anyone reviewing the history to understand the development process.
  • Reduces complexity in tracking changes and finding specific updates.

Scenario 5: Incorporating Specific Changes

Context: Victor has developed a new feature on his dataprocessing branch. After a code review with Jessy, they decide to bring this feature into mainline first, because other feature work would benefit from it as well.

Command: git cherry-pick <commit-hash>

  • Victor uses git cherry-pick to apply this specific commit directly to the mainline branch without merging all changes from his branch.

    git checkout mainline
    git pull
    git cherry-pick f17daf74
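
To see what lands on mainline and what stays behind, here is a self-contained sketch. It uses a local sandbox with no remote, so the git pull step is left out; the file names and messages are hypothetical.

```shell
# Throwaway sandbox with a mainline and a dataprocessing branch.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Victor"
git config user.email "victor@example.com"

echo "base" > README.txt
git add README.txt
git commit -q -m "Initial commit"
git branch -M mainline

# Two commits on dataprocessing: the reviewed feature and unfinished work.
git checkout -q -b dataprocessing
echo "# new feature" > feature.py
git add feature.py
git commit -q -m "Add new feature"
picked="$(git rev-parse HEAD)"

echo "# work in progress" > wip.py
git add wip.py
git commit -q -m "Unfinished work"

# Meanwhile, mainline moves on with its own commit.
git checkout -q mainline
echo "more docs" >> README.txt
git commit -q -am "Update README"

# Copy only the reviewed commit onto mainline.
git cherry-pick "$picked"

git log --oneline    # the feature commit now sits on top of mainline
ls                   # feature.py is present, wip.py is not
```

The commit on mainline is a copy with its own hash, so the same change exists twice in the repository. Adding -x to git cherry-pick records the original hash in the new commit message, which can make this easier to trace later.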
    

Benefits:

  • Allows selective integration of changes, providing flexibility in managing features across branches.

  • Useful for pulling in hotfixes and individual features without the need for a full branch merge.

Scenario 6: Managing Uncommitted Changes

Context: Jessy is currently working on optimizing a script in the dataprocessing branch when she receives an urgent request to fix an issue in the mainline branch. However, she’s not ready to commit her current changes.

Command: git stash

  • Jessy uses git stash to temporarily store her uncommitted changes, allowing her to switch branches without losing her work.

    git stash push -m "Optimize data processing script"
    git checkout mainline
    
  • After resolving the issue in mainline and merging the fixes, Jessy returns to the dataprocessing branch and applies her stashed changes with git stash pop to continue her work on optimizing the script.

    git checkout dataprocessing
    git stash pop
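
The full round trip can be rehearsed in a scratch repository. This is a sketch with made-up file names (pipeline.py, fix.py); it follows the same stash, switch, fix, switch back, pop sequence described above.

```shell
# Throwaway sandbox with a mainline and a dataprocessing branch.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jessy"
git config user.email "jessy@example.com"

echo "# pipeline" > pipeline.py
git add pipeline.py
git commit -q -m "Initial commit"
git branch -M mainline
git checkout -q -b dataprocessing

# Half-done work that is not ready to commit.
echo "# half-done optimization" >> pipeline.py
git stash push -q -m "Optimize data processing script"
git status --short    # prints nothing: the working directory is clean

# Handle the urgent fix on mainline.
git checkout -q mainline
echo "# urgent fix" > fix.py
git add fix.py
git commit -q -m "Fix urgent issue"

# Return and pick up where she left off.
git checkout -q dataprocessing
git stash list        # stash@{0}: On dataprocessing: Optimize data processing script
git stash pop -q
git status --short    # " M pipeline.py": the edit is back, unstaged
```

By default git stash only saves changes to tracked files. New files that have never been added need git stash push -u to be included.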
    

Benefits:

  • Allows Jessy to quickly switch between tasks without committing half-done work, which keeps her commit history clean.

  • Helps in managing context switches more efficiently, reducing the risk of losing progress on current tasks.

I hope this post has provided you with some compelling examples of Git in action. Through the lens of Jessy and Victor, I tried to illustrate how Git can significantly enhance work efficiency in our day-to-day activities. Whether you’re managing complex projects or simple tasks, the right Git strategies can streamline your workflow and boost productivity.
