This is a retelling of a presentation I gave at work. In it, I describe a mechanism I've started using to raise the quality of artifacts I check into version control.

👍

Something about me: I expect a lot from myself. Some who know me would say I'm a perfectionist... but what do they know? 😜 I put a lot of effort into the artifacts I produce and so I expect them to hold up when I release them. That being the case, there is a process I frequently interact with that makes me anxious.

🤢

git commit

This little command takes what I've created and commits it to a permanent record. Now, I know that Git allows modification of the commit history, but I haven't yet become proficient enough to do much more than rebase or git commit --amend. So in my mind, with few exceptions, git commit is permanent.

But wait, there's more.

🤮

git push

This is where the rubber really meets the road. This takes what I've created and shares it with others. With my customer. With my colleagues. With the public. It's at this point that I really want my artifacts to be high quality.

⏳

Now even if you don't share my character trait of *ahem*, so-called perfectionism, there is something we do share: our time is valuable. I can burn a lot of time waiting for the deployment pipeline to cycle before I find out whether my code or infrastructure-as-code (IaC) template has errors in it. And while I wait, I likely move on to the next task and am interrupted when the deploy process reaches its conclusion, which takes me out of my flow state on the new task. So it's slow and interrupts me. Double. Fail.

🌟

So the goal is clear: I want a mechanism to ensure the artifacts I'm stuffing into the repo are high quality and I want the mechanism to be fast and low friction. I want to raise the bar in how I uphold the quality of my artifacts without adding a ton of stress or cognitive load.

⚙️

First, a short description of a feature in Git that will help: hooks. Quoting githooks.com:

Git hooks are scripts that Git executes before or after events such as: commit, push, and receive. Git hooks are a built-in feature - no need to download anything. Git hooks are run locally. These hook scripts are only limited by a developer's imagination.

In other words, hooks enable running custom logic at various points in the life cycle of a Git repository and the artifacts therein.

🪝

The list of hooks:

applypatch-msg, pre-applypatch, post-applypatch, pre-commit, prepare-commit-msg, commit-msg, post-commit, pre-rebase, post-checkout, post-merge, pre-receive, update, post-receive, post-update, pre-auto-gc, post-rewrite, pre-push

Two pop out as interesting given the goals above: pre-commit and pre-push. Pre-push would only help ensure that what I'm pushing to the remote is high quality; it wouldn't do anything for what I'm committing to my local copy of the repo. Therefore, the pre-commit hook is what I'll focus on. This hook runs before a commit is written to the repository.

😵

Now that I know the hook to use, here's the workflow I expect to follow:

  1. Create a script in .git/hooks.
  2. Forget the name of the hook to use, hit Git docs.
  3. Start writing the shell script to implement the hook.
  4. Forget the Git commands to get the list of files which are part of the commit.
  5. Hit Git docs again.
  6. Write shell logic to skip over committed files which are a file type I don't want to check.
  7. Hit StackOverflow for help writing shell code.
  8. ...

And right about here is where I would give up. There's no way I can do this for every repo I work in. Too much. No, thank you.
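To give a sense of what steps 3 through 6 involve, here is a minimal sketch of such a hand-written hook. It's in Python rather than shell (Git runs any executable saved as .git/hooks/pre-commit), and both the check it performs (blocking leftover merge-conflict markers) and the list of skipped file types are illustrative assumptions, not part of the project above.

```python
#!/usr/bin/env python3
"""Hand-rolled .git/hooks/pre-commit sketch (illustrative only).

Blocks a commit when a staged file still contains merge-conflict markers.
Git aborts the commit whenever the hook exits with a nonzero status.
"""
import subprocess
import sys
from pathlib import Path

# Hypothetical file types this hook should skip (step 6 above).
SKIP_SUFFIXES = (".png", ".jpg", ".gif", ".zip")
# Prefixes of the lines Git writes at the start and end of a conflicting hunk.
MARKER_PREFIXES = ("<<<<<<< ", ">>>>>>> ")


def staged_files() -> list[str]:
    """Ask Git for paths added, copied, or modified in the index (step 4)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        capture_output=True,
        check=True,
    ).stdout
    # -z separates paths with NUL bytes so unusual file names survive intact.
    return [p for p in out.decode().split("\0") if p]


def wanted(paths: list[str]) -> list[str]:
    """Drop paths whose file type we don't want to check."""
    return [p for p in paths if not p.lower().endswith(SKIP_SUFFIXES)]


def has_conflict_markers(text: str) -> bool:
    """True if any line looks like a leftover merge-conflict marker."""
    return any(
        line.startswith(MARKER_PREFIXES) or line == "======="
        for line in text.splitlines()
    )


def main() -> int:
    bad = [
        path
        for path in wanted(staged_files())
        if has_conflict_markers(Path(path).read_text(encoding="utf-8", errors="ignore"))
    ]
    for path in bad:
        print(f"{path}: merge conflict markers found", file=sys.stderr)
    return 1 if bad else 0
```

To try it, save it as .git/hooks/pre-commit, make it executable (chmod +x), and end the file with raise SystemExit(main()) so Git sees the exit status. Git runs the hook from the top of the working tree, so the relative paths from git diff resolve correctly. One simplification: it reads the working-tree copy of each file, which can differ from what is staged; pre-commit handles that case by temporarily stashing unstaged changes while hooks run.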

🪤

Luckily, the Git ecosystem has come to the same conclusion and has come up with solutions: githooks.com lists a number of hook managers. The one I've adopted is pre-commit:

A framework for managing and maintaining multi-language pre-commit hooks.

There are two important parts to this description:

  1. It's a framework. Not only is pre-commit a tool for managing the Git hook of the same name, but it also gives tool authors a way to make their tools easy for others to consume. For example, the CloudFormation Lint tool (a linter for AWS CloudFormation IaC templates) has joined the pre-commit ecosystem by publishing a simple YAML file (.pre-commit-hooks.yaml) in its GitHub repo which tells pre-commit how to consume the linter. This allows pre-commit users to easily add CloudFormation Lint as a hook.
  2. It's multi-language. Even though the pre-commit tool is written in Python, hooks can be written in Python, shell, Ruby, and more, and all of them run when the hook is called. This allows the ecosystem to extend beyond the boundaries of a single language.

🚨

A caveat on Git hooks: they are local to each copy of the repository. If you and I are working in our own clones of some repo, and you set up a hook and later push to the central repo, I do not get a copy of that hook the next time I pull. I would have to install the hooks myself in my clone. Similarly, if you made another clone of the repo, your hooks would not follow you there; you'd have to install them again.

There are methods for bootstrapping a fresh clone with hooks, but they require some up-front work to set up.

🧐

Here's an example where I used pre-commit: github.com/knightjoel/mediawiki-to-notion

This is a personal project which implements a data pipeline for converting MediaWiki content to Notion pages.

✅

There were a number of checks I wanted to implement in this project:

  1. Format the Python code for the AWS Lambda functions (adherence to PEP 8 and other common conventions).
  2. Scan the Python code for the AWS Lambda functions for security issues.
  3. Validate that the definitions of the AWS Step Functions state machines are proper, parsable JSON.
  4. Scan all commits for AWS credentials.
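The fourth check is handled by the detect-aws-credentials hook from pre-commit-hooks, which, per its documentation, checks for the existence of AWS secrets you have set up with the AWS CLI; in other words, it looks for your own configured secrets rather than every possible key format. The sketch below illustrates that idea under simplifying assumptions: it only reads secrets from environment variables (the real hook also reads the AWS CLI's credential files), and the function names are hypothetical.

```python
import os
from collections.abc import Mapping
from pathlib import Path

# AWS environment variables that hold secret material. The access key ID on
# its own is not secret, so it is deliberately left out.
SECRET_ENV_VARS = ("AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN", "AWS_SECURITY_TOKEN")


def secrets_from_env(environ: Mapping[str, str] = os.environ) -> set[str]:
    """Collect the non-empty secret values configured in the environment."""
    return {environ[name] for name in SECRET_ENV_VARS if environ.get(name)}


def files_with_secrets(paths: list[str], secrets: set[str]) -> list[str]:
    """Return the paths whose contents include any known secret verbatim."""
    leaking = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        if any(secret in text for secret in secrets):
            leaking.append(path)
    return leaking
```

A hook built on this would print the offending paths and exit nonzero whenever files_with_secrets returns anything. Because it can only recognize secrets that are configured where the hook runs, it catches the most likely leak (my own keys) without trying to be a general-purpose scanner.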

🔌

Let's say I'm about to commit some new changes to this project's repo and I've already configured pre-commit with this .pre-commit-config.yaml:

repos:
    - repo: https://github.com/pre-commit/pre-commit-hooks
      rev: v4.4.0
      hooks:
      - id: check-merge-conflict
      - id: check-json
      - id: detect-aws-credentials
    - repo: https://github.com/psf/black
      rev: 22.12.0
      hooks:
      - id: black
        language_version: python3.9
    - repo: https://github.com/pycqa/flake8
      rev: 6.0.0
      hooks:
      - id: flake8
    - repo: https://github.com/pycqa/bandit
      rev: 1.7.4
      hooks:
      - id: bandit

When I run git commit, pre-commit kicks in and runs the hooks:

~/git/mw-to-notion% git commit -a
check for merge conflicts................................................Passed
check json...............................................................Failed
- hook id: check-json
- exit code: 1

state_machines/upload-state-machine.asl.json: Failed to json decode (Expecting ',' delimiter: line 18 column 11 (char 2096))

detect aws credentials...................................................Passed
black....................................................................Failed
- hook id: black
- files were modified by this hook

reformatted process-mw-dump.py

All done! ✨ 🍰 ✨
1 file reformatted.

flake8...................................................................Passed
bandit...................................................................Passed
~/git/mw-to-notion%

What I see is that before I even have a chance to enter the commit message, pre-commit alerts me to some failed checks and aborts the commit.

  • The JSON for one of the state machines isn't parsing. It nicely gives me the line and column where the parse error is so I can fix the error.
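  • The Python in process-mw-dump.py was reformatted. The 'black' hook rewrote the file in place, so there's no further work needed on my part: because I ran git commit -a, the reformatted file is staged automatically on the next commit (with a plain git commit, I'd git add it first).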

Once all the issues are fixed, a second git commit will proceed (and I'll be assured what I've just committed is in good shape).
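Both failures follow the same contract. pre-commit runs each hook with the matching staged file names as arguments, and reports the hook as Failed if it exits nonzero or if it modifies any files; black, for instance, exits successfully after reformatting, so it's the modified file that trips the failure. The line and column in the check-json message come from Python's json module, whose JSONDecodeError includes the position in its message (and in its lineno and colno attributes). Here is a minimal sketch of a JSON-checking hook written to that contract; it approximates check-json rather than reproducing its source.

```python
import json
import sys
from typing import Optional, Sequence


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Try to parse each file named on the command line as JSON.

    Returns 1 if any file fails to parse, which pre-commit reports as Failed.
    """
    filenames = sys.argv[1:] if argv is None else argv
    status = 0
    for filename in filenames:
        with open(filename, encoding="utf-8") as f:
            try:
                json.load(f)
            except json.JSONDecodeError as exc:
                # str(exc) reads like "Expecting ',' delimiter: line 18 column 11
                # (char 2096)"; exc.lineno and exc.colno hold the same position.
                print(f"{filename}: Failed to json decode ({exc})")
                status = 1
    return status
```

Because pre-commit supplies the file names, the hook itself never has to ask Git which files are staged or filter them by type; that bookkeeping lives in the framework.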

🎎

How you can get started with pre-commit:

  1. Install it: pip install pre-commit.
  2. Find hooks and configure them. The pre-commit site is a good place to start.
  3. When you've written your .pre-commit-config.yaml file, activate pre-commit on the repo: pre-commit install.

😂

Some fun hooks to whet your appetite:

  • cfn-lint, cfn-nag (hooks for working with CloudFormation templates).
  • check-yaml (check YAML files for parsability).
  • go-fmt, go-lint, more (formatters, linters, and more for Go code).
  • markdownlint-cli2, markdown-link-check (Markdown linting, and checking that hyperlinks in the Markdown return successful responses; I use this hook when publishing every article on this site).
  • terraform_fmt, terraform_validate, more (formatting, validation, and more for Terraform IaC).
  • trailing-whitespace, end-of-file-fixer, mixed-line-ending (checks for various whitespace issues).

This is just the tip of the iceberg. Go browse the list of hooks on pre-commit.com and get inspired by the many different ways you can check the quality of your artifacts.
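Fixers such as trailing-whitespace behave the way black did in the run above: they rewrite files in place, and pre-commit marks the hook Failed because files changed, which gives me a chance to review the fix before committing again. A simplified sketch of such a fixer follows. It takes file names as arguments, the way pre-commit passes them, and returns 1 when it changed anything; it is a stand-in, not the pre-commit-hooks implementation, and it only handles spaces and tabs before line endings.

```python
import sys
from typing import Optional, Sequence


def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs before each line ending, keeping the ending itself."""
    fixed = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        fixed.append(body.rstrip(" \t") + line[len(body):])
    return "".join(fixed)


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Fix each file named on the command line in place.

    Returns 1 if any file was rewritten; pre-commit would flag the hook
    as Failed in that case anyway, because a file was modified.
    """
    changed = 0
    for name in (sys.argv[1:] if argv is None else argv):
        # newline="" keeps CRLF/LF endings exactly as they are on disk.
        with open(name, encoding="utf-8", newline="") as f:
            original = f.read()
        fixed = strip_trailing_whitespace(original)
        if fixed != original:
            with open(name, "w", encoding="utf-8", newline="") as f:
                f.write(fixed)
            print(f"Fixing {name}")
            changed = 1
    return changed
```

Running it a second time on the same files changes nothing and returns 0, which is exactly the "second git commit will proceed" behavior described earlier.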

Happy commit(ing)!


Disclaimer: The opinions and information expressed in this blog article are my own and not necessarily those of Amazon Web Services or Amazon, Inc.