Wednesday 27 December 2017

Mercurial hooks to enforce commit rules

If you've been in the software industry for a while, there's something that's very clear, no software can be done on your own for too long. Working as a team requires some form organisation, some rules, some standards so everyone is on the same page. The more standardised, the easier to automate processes. Let's start on simple day to day tasks such as committing code to source control.

In the team where I'm working, we set a couple of rules for all to follow.

Commit messages normalized

Since we are supposed to associate every commit with a corresponding ticket on our trello board, that is a fundamental part of the commit message. The format we set is as follows:

[Committer/Committer] - ticket ### - commit message

Where:

Committer is the initials of the dev(s) involved in that particular commit

Ticket ### is the trello card number associated with the task

Commit message is a meaningful message that summarises what has changed

spaces and dashes are mandatory

Limiting the maximum number of committed files

A common rule of thumb is that we should limit the number of modified files to a maximum so we can trace and understand what has changed in order to review when we need to find some specific change. However, it's not set in stone what that maximum is, so we opted for a manageable amount of files, initially 10 but it turns out that it works better with 20 and it's still a manageable amount.

How do we enforce this?

After some research, I put my bet on mercurial hooks, since we use mercurial as source control and bitbucket as code repository. Mercurial hooks allows the possibility of executing custom code when some event happens, for example before the commit transaction is finished. At this point, we can know all about the changes that are about to be committed. The main point of interest is the number of files in the current changeset and the description aka commit message.

Before considering a commit as valid, the previously mentioned rules are verified and if one of them is invalid, the commit is aborted.

What if we are merging ?

There is a scenario that needs to be treated differently, it's when a commit is done after the result of a merge. Even if every single commit is limited to a maximum number of modified files, when merging with a different branch such as stable cut, inevitably there will be a lot more files than the maximum and that's acceptable, so in this case, we ignore the rule of the number of files and the commit message should be different i.e Merge with stable.

Putting it all together

Installing the hook.

Inside .hg folder, there is a file named hgrc, it defines mercurial configuration for the current repository. Add the following lines.

[hooks]
pretxncommit = python:hooks/commit-rules.py:checkCommit

Here we assign the python function checkCommit to pretxtcommit event hook, and such function is located inside hooks/commit-rules.py file relative to the repository root. I was tempted to keep this file inside .hg but this is not possible.

Let's inspect what's inside this commit-rules.py

import re

maximum_files = 20

def checkCommit(ui, repo, **kwargs):
 
 # Check the commit message for compliance with our team rules, if not compliant
 # Then abort the commit transaction displaying useful information to the user
 hg_commit_message = repo['tip'].description()
 is_incorrect_message = checkMessage(ui, hg_commit_message)
 if is_incorrect_message:
  return True

 # Check if the current revision has more than one parent
 # if more than one, the we are on a merge => do not check the number of files
 revision_parents = len(repo['tip'].parents())
 if revision_parents > 1:
  return False
 
 # Check the number of files on current revision, if they're more than the maximum 
 # Then abort the commit transaction displaying useful information to the user
 repo_files_count = len(repo['tip'].files()) 
 is_incorrect_commit_files = checkFiles(ui, repo_files_count)
 if is_incorrect_commit_files:
  return True
  
 return False

def checkMessage(ui, message): 
 
 # Regular expression to check the commit message structure - for a single commit
 # Valid examples:
 #
 # [DG/APM] - ticket 123 - Two committers text for a valid commit
 # [APM] - ticket 123 - Single committer text for a valid commit 
 #
 commit_re = re.compile(r'(\[[A-Za-z\/]+\])\s\-\sticket\s(\d+)\s\-\s(.*)')
 commit_check = commit_re.match(message)
 
 # regular expression to check the commit message structure - for a merge commit
 # Valid examples:
 #
 # Merge with stable
 # Merge with default
 #
 merge_re = re.compile(r'Merge with [A-Za-z0-9- ]+')
 merge_check = merge_re.match(message)
 
 if not commit_check and not merge_check:
  ui.warn('Commit message does not comply with our team rules\n')
  ui.warn('\n')
  ui.warn('Use format "[Committer/Commiter] - ticket ### - commit message"\n')
  ui.warn('\n')
  ui.warn('* Committer initials should be separated by /\n')
  ui.warn('* Ticket ### should match Trello ticket number\n')
  ui.warn('\n')
  ui.warn('For merge, use format "Merge with branch-name"\n')
  ui.warn('\n')
  
  # return true if is invalid 
  return True
 
 return False

def checkFiles(ui, files):
 if (files > maximum_files):
  ui.warn("Commit with too many files (" + str(files) + ") - Maximum files (" + str(maximum_files) +")\n")
  # return true if is invalid 
  return True
 
 return False

Since this code has to run on each dev computer and does not propagate for security reasons according to mercurial documentation, we made a way to "install" it when the build operation is run in Visual Studio by creating a pre build event that runs a powershell script.

Here is the powershell script that installs the hook when it's not already there.

Param(
    [Parameter(Mandatory=$True)]
    [string]$hgrcFile
)

if (-not (Test-Path $hgrcFile) ) {
    exit
}

#$hgrcFile = ".\.hg\hgrc"
$hgrc = Get-Content $hgrcFile -ErrorAction Stop

$lineHook = "[hooks]"
$linePreTxnCommit = "pretxncommit = python:hooks/commit-rules.py:checkCommit"
$containsHooks = $hgrc.Contains($lineHook)
$containsPreTxtCommit = $hgrc.Contains($linePreTxnCommit)

if (-not ($containsHooks -and $containsPreTxtCommit)) 
{
    Write-Host "hook not installed. Installing ..."

    $hgrc += ""
    $hgrc += $lineHook
    $hgrc += $linePreTxnCommit
    
    $hgrc | Out-File $hgrcFile -Encoding utf8

    Write-Host "Done"
}

And here is the pre build event fragment from the main .csproj file in our solution

<PropertyGroup>
    <PreBuildEvent>
        powershell -ExecutionPolicy ByPass -File "$(SolutionDir)install-hook.ps1" -hgrcFile "$(SolutionDir).hg\hgrc"
    </PreBuildEvent>
</PropertyGroup>

For more reference about mercurial hooks, check Mercurial hooks examples

Monday 27 November 2017

A bug hidden for 4 years !

I have to admit I've abandoned my file uploader for some time, and I've been revisiting the code recently as part of some experiments with .NET Core 2.

After some work done to make it work with the new framework, to my surprise, it just looked to work perfectly until I tried to validate one of the basic implicit requirements: the uploaded copy must be exactly the same as the original file. It turned out that even if the files had the same exact size, to the byte, they weren't identical, how was that possible ?

Symptom:

After uploading a file larger than the packet size (default to 4 MB), the resulting copy was not identical to the original.

Simple test:

Upload a zip file larger than 4 MB, try to open it. It was corrupt.

The most uprising fact was that if I ran the .NET 4 version it still worked as expected ! Next, I migrated the solution to .NET 4.7 and still worked as expected, apparently it was only affecting to .NET Core version. So, under the circumstance, I tried to compare request / response side by side working version vs bugged version.

A few things looked a bit suspicious:

  • some extra headers sent on .NET working version
  • Transfer-Encoding: chunked from .NET Core bugged version

None of those were related. Back to square one.

I tried reducing the packet size to 1024 bytes and upload a text file so I could see how the content differed but no luck, it was indeed the same (verified comparing documents with Diff tool) Well now deep debugging showed that the buffer wasn't being written completely to output file, that would explain the corrupted resulting file but not the size difference. Let's see the code.

Original version

var buffer = new byte[contentLength];
var bytesread = inputStream.Read(buffer, 0, buffer.Length);
if (bytesread > 0)
{
    using (var fs = File.Open(actualfile, FileMode.Append, FileAccess.Write))
    {
        fs.Write(buffer, 0, buffer.Length);
    }
    //json.Data = new {guid, packet};
    return new UploadPacketResult(guid, packet);
}

Spot something weird ?

It turns out that Stream.Read can perform a blocking operation that reads up to the length specified but not necessarily the whole buffer, now it's obvious! but how it didn't fail before !! I don't have an answer for that :)

Well, here is the fixed version

var buffer = new byte[contentLength];
using (var fs = File.Open(actualfile, FileMode.Append, FileAccess.Write))
{
    int read = 0;
    while ((read = inputStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        fs.Write(buffer, 0, read);
    }
}
//json.Data = new {guid, packet};
return new UploadPacketResult(guid, packet);

And that's the way it was supposed to be done in the first place, but there was no test case that proved otherwise (until now, 4 years later)