Enhancing Workflow Efficiency: Practical Git Strategies in Action
As a data scientist, I regularly use Git for code reviews and collaborative projects. While basic commands like git add ., git commit -m, git push, and git pull are foundational, the scenarios that call for them can become intricate. In this post, I will walk through some of the more advanced Git commands using real-world examples. Let’s dive in.
TL;DR
- git commit --amend [--no-edit]: Fold new changes into your last commit instead of creating a new one. Add --no-edit to keep the original message.
- git reset --soft HEAD~N: Undo the last N commits. With --soft, their changes stay staged and in your working directory; with --hard, they are discarded.
- git restore --staged <file>: Unstage files while leaving the edits in your working directory untouched, perfect for undoing a premature git add.
- git rebase -i: Streamline your commit history by editing and consolidating past commits into a clearer narrative.
- git cherry-pick: Take a specific commit from one branch and apply it directly to your current working branch.
- git stash: Temporarily set aside uncommitted changes so you have a clean working directory for something else.
Jessy and Victor: Collaborative Code Development
Now, let’s bring these commands to life through the experiences of Jessy and Victor, two characters who will show us the ropes of advanced Git usage in their day-to-day coding adventures.
Jessy and Victor are team members in a data science project focusing on developing a forecasting model. Jessy is managing the codebase and branches while Victor assists with code reviews and implementing best practices. Currently Jessy and Victor work within an organized repository that consists of three distinct branches, each tailored for specific aspects of their project:
- dataprocessing: This branch is dedicated to developing libraries and scripts for data collection and compilation. It serves as the backbone for handling data, ensuring that everything from raw input to processed information is managed smoothly.
- modeltraining: Focused solely on crafting the model training scripts, this branch is where the magic of machine learning happens. Here, Jessy and Victor transform theoretical data models into practical, trainable algorithms.
- mainline: The hub of their collaborative efforts, the mainline branch is where all completed work converges. After rigorous code reviews to ensure quality and functionality, changes from dataprocessing and modeltraining are merged here, forming the final version of their project.
% git branch
* dataprocessing
mainline
modeltraining
Now, let’s explore some real-life scenarios where Jessy and Victor use Git commands to manage their workflow effectively.
Scenario 1: Refining Commit History
Context: Jessy was working on the dataprocessing branch to refine the data cleaning steps. After making a commit, she realized she had left out a new data filter that should be part of that commit.
Command: git commit --amend --no-edit
- Jessy adds the new filter to the script and stages the changes with git add .
- Instead of creating a new commit, she uses git commit --amend --no-edit to add these changes to the previous commit without altering the commit message.
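The two steps above can be tried safely in a throwaway repository. This is a minimal sketch, not Jessy's actual project: the file name clean_data.py and its contents are hypothetical stand-ins for her cleaning script.

```shell
set -eu

# Throwaway repository so the demo cannot touch real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jessy"
git config user.email "jessy@example.com"

# First attempt: the cleaning script without the new filter.
# (clean_data.py is a hypothetical file name.)
printf 'drop_nulls()\n' > clean_data.py
git add clean_data.py
git commit -q -m "Refine data cleaning steps"
before="$(git rev-parse HEAD)"

# The forgotten filter is added and folded into that same commit.
printf 'filter_outliers()\n' >> clean_data.py
git add .
git commit -q --amend --no-edit
after="$(git rev-parse HEAD)"

git log --oneline                        # still one commit, original message
echo "hash changed: $before -> $after"   # amend writes a new commit object
```

Because the amended commit gets a new hash, this works best on commits that have not been pushed yet; amending a commit that teammates have already pulled would require a force push.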
Benefits:
- Keeps the commit history clean and meaningful.
- Avoids cluttering the history with minor fixes or missed changes.
Scenario 2: Reverting Multiple Commits
Context: Victor notices that the last three commits to the modeltraining branch introduced errors that affect the model’s performance. Jessy decides to roll the branch back to the point where the model was stable.
Command: git reset --soft HEAD~3
- Jessy uses git reset --soft HEAD~3 to undo the last three commits. The branch pointer moves back three commits, but the changes from those commits stay staged and in her working directory for review and possible corrections.
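To see what --soft leaves behind, here is a small sketch in a scratch repository. The file train.py and the commit messages are made up for illustration; only the git reset --soft HEAD~3 call mirrors the scenario.

```shell
set -eu

repo="$(mktemp -d)"   # scratch repository, deleted whenever you like
cd "$repo"
git init -q
git config user.name "Jessy"
git config user.email "jessy@example.com"

# One stable commit followed by three questionable ones.
# (train.py and its contents are hypothetical.)
printf 'learning_rate = 0.01\n' > train.py
git add train.py
git commit -q -m "Stable training script"
for i in 1 2 3; do
  printf 'experiment %s\n' "$i" >> train.py
  git commit -q -am "Experiment $i"
done

# Move the branch back three commits, keeping their changes staged.
git reset --soft HEAD~3

git log --oneline     # only the stable commit remains in history
git status --short    # train.py is listed as a staged modification
```

If the three commits had already been pushed, rewriting history this way would need a force push; git revert is the usual alternative in that case because it adds new commits instead of removing old ones.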
Benefits:
- Allows Jessy to rapidly backtrack while retaining the ability to analyze and integrate parts of the undone changes selectively.
- Facilitates quick correction without losing all recent work.
Scenario 3: Cleaning Up Staging Area
Context: Jessy wants to submit a code review for a new feature engineering script, but she accidentally stages several other files that were not meant to be part of this review.
Command: git restore --staged <file>
- Jessy uses git restore --staged <file> to unstage the specific files. This resets only the staging area for those files to the last committed state; the edits in her working directory are left untouched.
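The sketch below reproduces the mistake and the fix in a scratch repository. The file names features.py and notes.txt are hypothetical, and git restore assumes Git 2.23 or newer.

```shell
set -eu

repo="$(mktemp -d)"   # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Jessy"
git config user.email "jessy@example.com"

# Baseline commit. (notes.txt is a hypothetical file.)
printf 'todo\n' > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# New script for the review, plus an unrelated edit that should stay out of it.
printf 'def build_features(): pass\n' > features.py
printf 'scratch idea\n' >> notes.txt
git add .                        # oops: stages both files

# Take notes.txt back out of the staging area; the edit itself is kept.
git restore --staged notes.txt

git status --short               # features.py staged, notes.txt modified but unstaged
```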
Benefits:
- Provides a safe way to undo mistakes in staging without losing ongoing work.
- Enhances control over what gets committed, maintaining a clean and relevant commit history.
Scenario 4: Streamlining Commit History
Context: After several iterative improvements in the dataprocessing branch, Jessy’s commit log becomes cluttered with minor updates and fixes.
Command: git rebase -i HEAD~5
- Jessy selects the last five commits for interactive rebase and uses options like squash and reword to streamline and clarify the commit messages.
- In the todo list that opens, she keeps the first commit as pick (p) but changes the rest of them to squash (s):

p bcy0262 update configuration settings
s ase1356 update features generation libs;
s 743c258 update labels creation libs
s b1d1079 update features engineering stages;
s 403c753 update random seeds

- Git then opens an editor where she edits the commit messages and combines them into a single commit.
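Interactive rebase normally opens an editor, which makes it awkward to demonstrate in a script. The sketch below works around that by pointing GIT_SEQUENCE_EDITOR at a sed command that rewrites every pick after the first into squash, and GIT_EDITOR at true so the combined message is accepted as-is. Both choices are assumptions made to keep the demo non-interactive (and sed -i as written assumes GNU sed); Jessy would simply edit the list by hand. The file pipeline.txt is hypothetical.

```shell
set -eu

repo="$(mktemp -d)"   # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Jessy"
git config user.email "jessy@example.com"

# A base commit plus five small follow-ups, as in the scenario.
printf 'base\n' > pipeline.txt    # hypothetical file
git add pipeline.txt
git commit -q -m "Initial commit"
for msg in "update configuration settings" \
           "update features generation libs" \
           "update labels creation libs" \
           "update features engineering stages" \
           "update random seeds"; do
  printf '%s\n' "$msg" >> pipeline.txt
  git commit -q -am "$msg"
done

# Keep the first "pick", turn lines 2..end of the todo list into "squash",
# and accept the default combined commit message without editing it.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" \
GIT_EDITOR=true \
git rebase -i HEAD~5

git log --oneline     # the five follow-ups are now a single commit
```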
Benefits:
- Helps maintain a logical and concise commit history, making it easier for anyone reviewing the history to understand the development process.
- Reduces complexity in tracking changes and finding specific updates.
Scenario 5: Incorporating Specific Changes
Context: Victor has developed a new feature in his dataprocessing branch and, after a code review with Jessy, they decide to bring this feature into mainline first because other feature work would benefit from it as well.
Command: git cherry-pick <commit-hash>
- Victor uses git cherry-pick to apply this specific commit directly to the mainline branch without merging all the changes from his branch.

git checkout mainline
git pull
git cherry-pick f17daf74
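Here is a self-contained sketch of the same flow. It assumes a local-only scratch repository, so the git pull step is left out (there is no remote to pull from), and the file names and commit messages are hypothetical. The commit hash is captured with git rev-parse rather than typed by hand.

```shell
set -eu

repo="$(mktemp -d)"   # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Victor"
git config user.email "victor@example.com"

# mainline with one commit.
printf 'project\n' > README.txt
git add README.txt
git commit -q -m "Initial commit"
git branch -M mainline

# dataprocessing gets the reviewed feature plus unrelated work in progress.
git checkout -q -b dataprocessing
printf 'def holiday_flag(): pass\n' > feature.py   # hypothetical feature
git add feature.py
git commit -q -m "Add holiday feature"
feature_commit="$(git rev-parse HEAD)"             # the commit to pick

printf 'unfinished\n' > wip.py                     # should NOT reach mainline
git add wip.py
git commit -q -m "Work in progress"

# Apply only the feature commit to mainline.
git checkout -q mainline
git cherry-pick "$feature_commit"

git log --oneline     # Initial commit + Add holiday feature
ls                    # feature.py is present, wip.py is not
```

The cherry-picked commit on mainline is a copy with its own hash, so the same change now exists as two separate commits on the two branches.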
Benefits:
- Allows selective integration of changes, providing flexibility in managing features across branches.
- Useful for pulling in hotfixes and individual features without the need for a full branch merge.
Scenario 6: Managing Uncommitted Changes
Context: Jessy is optimizing a script in the dataprocessing branch when she receives an urgent request to fix an issue in the mainline branch. However, she’s not ready to commit her current changes.
Command: git stash
- Jessy uses git stash to temporarily store her uncommitted changes, allowing her to switch branches without losing her work.

git stash push -m "Optimize data processing script"
git checkout mainline

- After resolving the issue in mainline and merging the fixes, Jessy returns to the dataprocessing branch and applies her stashed changes with git stash pop to continue her work on optimizing the script.

git checkout dataprocessing
git stash pop
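The full round trip can be sketched in a scratch repository as follows. The files process.py and config.txt, and the content of the urgent fix, are hypothetical; the stash message is the one from the scenario.

```shell
set -eu

repo="$(mktemp -d)"   # scratch repository for the demo
cd "$repo"
git init -q
git config user.name "Jessy"
git config user.email "jessy@example.com"

# mainline with two tracked files (both hypothetical).
printf 'slow_loop()\n' > process.py
printf 'timeout = 10\n' > config.txt
git add .
git commit -q -m "Initial commit"
git branch -M mainline

# Half-finished optimization on dataprocessing, not ready to commit.
git checkout -q -b dataprocessing
printf 'faster_loop()\n' >> process.py

# Park the work and switch to the urgent fix.
git stash push -m "Optimize data processing script"
git checkout -q mainline
printf 'timeout = 30\n' > config.txt
git commit -q -am "Fix timeout"

# Return and pick up where she left off.
git checkout -q dataprocessing
git stash pop

git status --short    # process.py shows up as modified again
```

By default git stash only sets aside changes to tracked files; brand-new files need git stash push -u to be included.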
Benefits:
- Allows Jessy to quickly switch between tasks without committing half-done work, which keeps her commit history clean.
- Helps in managing context switches more efficiently, reducing the risk of losing progress on current tasks.
I hope this post has provided you with some compelling examples of Git in action. Through the lens of Jessy and Victor, I tried to illustrate how Git can significantly enhance work efficiency in our day-to-day activities. Whether you’re managing complex projects or simple tasks, the right Git strategies can streamline your workflow and boost productivity.