Musings of a PC

Thoughts about Windows, TV and technology in general

Restoring Edge from WSL 2 after installing Chrome

As described in https://pcmusings.wordpress.com/2022/05/13/google-chrome-under-wsl-2/, running Chrome under WSL 2 within Windows 11 is a fairly straightforward process … but it does cause me an issue when authenticating for AWS Single Sign-on.

So, I needed to (a) figure out how AWS SSO was triggering the browser and (b) how to restore the previous behaviour.

According to https://github.com/aws/aws-cli/issues/4715, AWS SSO is using the Python webbrowser code to trigger the default browser. Looking at https://github.com/python/cpython/blob/db388df1d9aff02f791fe01c7c2b28d73982dce6/Lib/webbrowser.py#L556, I could see that it was calling xdg-settings get default-web-browser to determine how to fire up the web browser.

Thanks to https://github.com/giggio/dotfiles, I was able to convince the system to trigger the Windows browser instead of Google Chrome by following these steps:

  1. Create ~/.local/share/applications/wslview.desktop and put the contents of https://github.com/giggio/dotfiles/blob/main/applications/wslview.desktop in there.
  2. Run these commands:
xdg-mime default wslview.desktop x-scheme-handler/http
xdg-mime default wslview.desktop x-scheme-handler/https
xdg-mime default wslview.desktop x-scheme-handler/about
xdg-mime default wslview.desktop x-scheme-handler/unknown
xdg-mime default wslview.desktop text/html
xdg-settings set default-web-browser wslview.desktop
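
To confirm that the change has taken effect, run:

xdg-settings get default-web-browser

and it should now report wslview.desktop.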

After that, signing in with AWS SSO triggers Edge under Windows 11 (so I can continue to use the USB security key) and I can still run google-chrome under Ubuntu and have the Chrome window appear in Windows.

Google Chrome under WSL 2

Running graphical Linux applications on Windows under WSL 2 is a new(ish) feature that can now be tried under Windows 11 if you meet the system requirements detailed in https://docs.microsoft.com/en-us/windows/wsl/tutorials/gui-apps, which also includes the steps to install Google Chrome.

The installation steps are a bit more involved than just “install this package” and I’m guessing that is because, by default, Ubuntu on WSL 2 is just the command-line version and does not include the full desktop, so various packages are missing.

Once installed, though, Google Chrome works really well from WSL 2, appearing in Windows 11. There are some error messages displayed in the WSL 2 console but they don’t seem to prevent the browser from working.

After installing Chrome, though, it becomes the default browser for anything running in Ubuntu that wants one. Normally, that isn’t a problem but I use AWS Single Sign-on, which triggers a browser session to authenticate. Again, not normally a problem, but I use a USB security key as part of my multi-factor authentication and that doesn’t seem to work under WSL 2 … so I needed to find a way to restore the ability to trigger Windows’ Edge browser from WSL 2. The solution has been documented in https://pcmusings.wordpress.com/2022/05/13/restoring-edge-from-wsl-2-after-installing-chrome/

Combining MP3 files losslessly [updated]

I have a SanDisk Sport Plus which I use to play audio books. It is a great little device but it has a couple of flaws:

  1. When a book finishes, the player starts playing the same book all over again, rather than moving on to the next book.
  2. There is a hard limit of 1,000 files of each filetype. If you exceed that limit, the player is unable to display all of the books, but you don’t get any errors.

So I’ve taken to combining the MP3 files for a single book into a single file, and then tagging each book for a series, e.g. James Bond, so that the device interprets each file as a separate “chapter” in the same book. This then results in the player playing one book (a single file) then moving on to the next book, and so on.
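
As an example of the sort of tagging involved (id3v2 is just one of several tools that can do this; the album/track choice and the filename are purely illustrative):

# Tag the combined file as book number 3 in the "James Bond" series
id3v2 -A "James Bond" -T 3 goldfinger.mp3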

I found a really good primer for combining MP3 files losslessly but, when dealing with a lot of MP3 files, a simple cat command becomes unwieldy. Using a wildcard, e.g. *.mp3, may work except when the alphabetical order of the filenames doesn’t match the numerical order. For example, if files are numbered “1”, “2”, …, “10”, “11”, the wildcard expands them as “1”, “10”, “11”, “2”, etc.

So I’ve now devised a longer “recipe” which I’m documenting so that I don’t have to devise it all again in the future.

  1. Create a list of all of the MP3 files
    ls *.mp3 > list.txt
  2. Use an editor to reorder the lines so that they are in numerical order
  3. Combine the files
    xargs -d "\n" -a list.txt cat | mp3cat - - > ~/tmp.mp3
  4. Copy the metadata over from the first file
    id3cp <first file> ~/tmp.mp3
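
A possible shortcut for step 2: if the filenames only differ by their numbers, GNU sort’s “version sort” will usually put them into the correct order (still worth eyeballing list.txt before combining):

    ls *.mp3 | sort -V > list.txt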

Update (2021-09-05):

I’ve discovered that my recipe wasn’t always creating a valid output file. Sometimes, the output file created looked to be the correct size but had an incorrect audio length, namely the length of the first file. Re-reading the originally linked article suggests that it is because I’m missing the last step from that article: “Finally, VBRFix will re-sync the VBR header to the actual size of the resulting file.”

The reason I had been missing out that step was because I couldn’t get VBRFix to work on my Ubuntu 20.04 system. In my early testing, I’d clearly not had VBR source files.

So I’ve found an alternative mp3cat tool: http://www.dmulholl.com/dev/mp3cat.html. This tool explicitly allows you to specify the input files to read so step #3 above now needs to change to:

FILELIST=$(xargs -d "\n" -a list.txt ls); mp3cat $FILELIST -o ~/tmp.mp3

The advantage of this tool is that it has built-in handling for VBR files and therefore adds the correct VBR header, fixing the audio length issue.

Calling AWS APIs from xMatters

In Automated AWS EBS expansion with xMatters and Automated AWS EBS expansion with xMatters – part 2, I discussed a complete workflow in xMatters that reacts to a disc low free space alarm sent from AWS CloudWatch via SNS and then takes the appropriate steps to resize the affected volume, both in EBS and the (Linux) operating system.

xMatters is a really powerful and flexible platform, but it does have some limitations and restrictions. One of the big challenges to solve when creating the workflow was that the AWS SDK could not be used. It isn’t provided as part of the xMatters platform and it cannot be installed as a library for use by the Javascript step code.

The job of the SDK, ultimately, is to make it easy to use the underlying AWS APIs which, themselves, are accessed via REST calls. Amazon document the APIs, their endpoints and their payloads, so it should be quite straightforward to write some Javascript to call the APIs directly.

Right?

Well, not quite.

The biggest challenge is correctly signing the request so that it can then be processed by AWS. Again, Amazon has documented this, including a step-by-step guide. A few Internet searches later, I found that someone has already written the steps in Javascript. Unfortunately, there is another gotcha.

The Crypto library used by that implementation relies on setTimeout and that isn’t available on the xMatters platform. So … back to Internet searches … and ultimately I find this: jsSHA – SHA Hashes and HMAC in JavaScript (coursesweb.net). An implementation that has no external requirements. Yay!

There are still some gotchas around calling the APIs, both in the Javascript and when building a workflow in xMatters so let’s dig into the code a bit deeper to better understand what is required.

const jsSHA = require('jsSHA');

const hmacSha256 = (signingKey, stringToSign, type="HEX") => {
    var sha_ob = new jsSHA("SHA-256", "TEXT");
    sha_ob.setHMACKey(signingKey, type);
    sha_ob.update(stringToSign);
    return sha_ob.getHMAC("HEX");
};

function getSignatureKey(key, dateStamp, regionName, serviceName) {
    var kDate = hmacSha256(`AWS4${key}`, dateStamp, "TEXT");
    var kRegion = hmacSha256(kDate, regionName);
    var kService = hmacSha256(kRegion, serviceName);
    var kSigning = hmacSha256(kService, "aws4_request");
    return kSigning;
}

By adding jsSHA as a library to a xMatters workflow, the above code implements the steps required to create a signature for a given AWS secret key, date stamp, region and service name.

function prependLeadingZeroes(n) {
     if (n <= 9) {
         return "0" + n;
     }
     return n.toString();
}

function hashSha256(stringToHash) {
     var sha_ob = new jsSHA('SHA-256', "TEXT");
     sha_ob.update(stringToHash);
     return sha_ob.getHash("HEX");
}

function buildHeader(access_key, secret_key, region, request_parameters) {
     const method = "GET";
     const service = "ec2";
     const host = service+"."+region+".amazonaws.com";
     const t = new Date();
     const datestamp = `${t.getUTCFullYear()}${prependLeadingZeroes(t.getUTCMonth()+1)}${prependLeadingZeroes(t.getUTCDate())}`;
     // AWS expects the timestamp in UTC: 4-digit year, 2-digit month, 2-digit date, T, 2-digit hour, 2-digit minutes, 2-digit seconds, Z
     const amzdate = datestamp+"T"+prependLeadingZeroes(t.getUTCHours())+prependLeadingZeroes(t.getUTCMinutes())+prependLeadingZeroes(t.getUTCSeconds())+"Z";
     const canonical_uri = "/";
     const canonical_querystring = request_parameters;
     const canonical_headers = `host:${host}\nx-amz-date:${amzdate}\n`;
     const signed_headers = 'host;x-amz-date';
     // Calculate the hash of the payload which, for GET, is empty
     const payload_hash = hashSha256("");
     const canonical_request = method + '\n' + canonical_uri + '\n' + canonical_querystring + '\n' + canonical_headers + '\n' + signed_headers + '\n' + payload_hash;
     const algorithm = 'AWS4-HMAC-SHA256';
     const credential_scope = datestamp + '/' + region + '/' + service + '/' + 'aws4_request';
     const string_to_sign = algorithm + '\n' +  amzdate + '\n' +  credential_scope + '\n' +  hashSha256(canonical_request);
     const signing_key = getSignatureKey(secret_key, datestamp, region, service);
     const signature = hmacSha256(signing_key, string_to_sign).toString('hex');
     const authorization_header = algorithm + ' ' + 'Credential=' + access_key + '/' + credential_scope + ', ' +  'SignedHeaders=' + signed_headers + ', ' + 'Signature=' + signature;
     var requestHeaders = {
         "host": host,
         "x-amz-date": amzdate,
         "Content-type": "application/json",
         "Authorization": authorization_header };
     return [
         requestHeaders,
         host
     ];
 }

This code should be fairly self-explanatory but, in summary, it takes the access key & secret key, region and request parameters and returns the appropriate request headers and the required host (endpoint) for the request.

AWS has different endpoints not just for each service but also for each region. So, for example, making an EC2 call for us-east-1 requires using ec2.us-east-1.amazonaws.com, while making an SSM call for eu-west-2 requires using ssm.eu-west-2.amazonaws.com. xMatters doesn’t allow scripts to dynamically reference arbitrary endpoints. Instead, the endpoints must be separately configured as part of the workflow and the script then dynamically changes which xMatters endpoint is referenced:

function executeEc2Action(access_key, secret_key, region, request_parameters) {
     const blob = buildHeader(access_key, secret_key, region, request_parameters);
     var requestHeaders = blob[0];
     const host = blob[1];
     if (input.AWSSessionToken) {
         requestHeaders["X-Amz-Security-Token"] = input.AWSSessionToken;
     }
     var ec2Request = http.request({
         endpoint: host,
         path: "/?"+request_parameters,
         method: 'GET',
         headers: requestHeaders
     });
     return ec2Request.write();
 }

So, we’re almost there. We can now execute an EC2 call like this:

var ec2Response = executeEc2Action(access_key, secret_key, region, "Action=ModifyVolume&Size="+volume_size+"&Version=2016-11-15&VolumeId="+volume_id);

A really important point to make here: the keys passed in the request_parameters argument must, repeat MUST, be in alphabetical order. In other words: Action, Size, Version, VolumeId. If they are not in alphabetical order, the call will fail with “AuthFailure – AWS was not able to validate the provided access credentials”.

In trying to troubleshoot that particular problem, I came across AWS Signature v4 Calculator (com.s3-website-us-west-2.amazonaws.com) which shows the result of the signature calculations at each step, thus making it easier to pinpoint where the code might be wrong. If you find yourself debugging/troubleshooting in this area, just remember to keep the date & time the same in both the website and the code: the signature calculations rely on them, so even a slight variance will give different results.

So, we now have the ability to call any AWS API so long as we present the parameters correctly. The final piece of the puzzle is decoding what comes back from AWS. If you’ve ever used boto3, you’ll know that it returns JSON. Curiously, the AWS APIs do not … they return XML! I’m not strong at parsing XML paths but, thankfully, xMatters includes a number of libraries for XML manipulation, including JXON, a library to convert XML to JSON.

var json_response = JXON.parse(ec2Response.body);

xMatters doesn’t allow one library to reference another library, unfortunately, which means that all of the AWS code needs to be duplicated in each script. Apart from that, though, it should now be quite straightforward to call any AWS API from within a xMatters workflow.

All of the scripts written for the resize workflow can be found at https://github.com/linaro-its/xmatters-ebs-automation

Automated AWS EBS expansion with xMatters – part 2

In part 1, I wrote about a workflow created for xMatters that reacted to CloudWatch alarms delivered via SNS when the free space on a server was running low.

Since writing that, a bug was discovered (now fixed) that prevented the filing system associated with the volume from being resized if the volume was not stored on a NVMe device. That bug resulted in the realisation that there was a “gap” in the workflow:

  • CloudWatch alarm goes off, triggering the workflow
  • Workflow expands the volume but doesn’t resize the filing system
  • CloudWatch alarm clears due to free space increasing
  • Time passes …
  • CloudWatch alarm goes off, triggering the workflow
  • … and around we go again

In other words, there was the risk that the workflow would continue to grow the volume without a corresponding resizing of the filing system, thereby never stopping the alarm loop.

To correct that behaviour, a new step has been added to the workflow:

CheckFS step added between CheckVolume and ModifyVolume

The new step – SSM-CheckFS – takes the following actions:

  • Runs some commands (see below) on the host to determine the size of the filing system.
  • Compares that size with the size of the volume.
  • Sets an output to indicate whether or not to proceed with the workflow.

The commands run on the host are as follows:

BLOCKCOUNT=$(sudo dumpe2fs -h /dev/<device> | grep 'Block count' | awk '{print $3}')
BLOCKSIZE=$(sudo dumpe2fs -h /dev/<device> | grep 'Block size' | awk '{print $3}')
echo "$(($BLOCKCOUNT * $BLOCKSIZE))"

By using dumpe2fs, we can determine the block count and size for the underlying filing system. That is then returned to the calling script.
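
As a rough illustration of the comparison (the real check happens in the step’s Javascript; the volume size shown is just an example):

# The filesystem size in bytes (echoed by the commands above) is compared
# with the volume size reported by AWS, converted from GiB to bytes.
VOLUME_GIB=100
FS_BYTES=$(($BLOCKCOUNT * $BLOCKSIZE))
if [ "$FS_BYTES" -lt "$((VOLUME_GIB * 1024 * 1024 * 1024))" ]; then
    echo "Filesystem is smaller than the volume"
else
    echo "Filesystem already spans the volume"
fi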

The sources for each of the steps in the workflow can be obtained from https://github.com/linaro-its/xmatters-ebs-automation.

The entire workflow can be obtained from https://github.com/linaro-its/xm-labs-aws-ebs-resize.

Automated AWS EBS expansion with xMatters

The complete workflow in xMatters

Introduction

xMatters is a powerful incident management tool, used by Linaro to take various sources of alarms and coordinate them into alerts for the appropriate on-call staff. There are many such tools available on the market to choose from; the main reason why xMatters was picked was the flexibility provided by being able to write custom steps in Javascript. As a result, workflows can be very flexible.

Linaro uses Amazon Web Services (AWS) for most of its infrastructure, predominantly EC2 instances. One of the tasks I often find myself dealing with is responding to a low free disc space alarm. The alarm is generated by CloudWatch Alarms as a result of metrics submitted by the CloudWatch Agent running on each instance. The alarm is fed via an SNS topic directly to a xMatters webhook and I then get notified on my phone and by email.

Increasing the size of an EBS volume isn’t hard or onerous – expand the EBS volume then get the OS to grow the corresponding partition (if required) and resize the filing system. Ideally, though, I wouldn’t have to do it manually – particularly if the alarm goes off at 3am!

This article looks at the challenges around automating the process and how it has been solved with a xMatters workflow. Whilst there are many ways that the process could be automated, e.g. writing a script in Lambda, I wanted to try and solve it entirely within xMatters so that, if the process completes successfully, nothing more happens but, if it fails at any step, a xMatters event is still created.

Assumptions and prerequisites

  • This workflow has only been tested on Ubuntu instances. It should work on other variants of Linux but not with Windows due to the hard-wired commands being run at certain points.
  • For any non-NVMe volumes, there is an assumption that AWS sees the device as sdf but the operating system sees the device as xvdf.
  • For NVMe volumes, there is an assumption that only the root device will have a partition on it.
  • Instances will need to have the SSM agent running on them in order to be able to execute commands on the operating system.

Not quite a blank canvas

The workflow starts from the xMatters CloudWatch Integration. This provides the framework to receive the alarm via SNS and then process it.

Authentication

The first thing to deal with is credentials for any of the interactions with AWS. For a simple environment, an AWS IAM user could be created with appropriate permissions and the static access key and secret key then used. For Linaro’s environment, that isn’t going to work. We have multiple accounts so we’d need a user per account and that then becomes more unwieldy when it comes to using those credentials within xMatters.

To solve the multiple account issue, roles are used instead, with an account being able to assume the role. That still requires a user with static credentials, which is not ideal, particularly when the recommendation is to rotate those credentials on a regular basis. A suitable IAM policy for the role is:

{
     "Version": "2012-10-17",
     "Statement": [
         {
             "Sid": "VisualEditor0",
             "Effect": "Allow",
             "Action": [
                 "ec2:DescribeInstances",
                 "ec2:DescribeVolumes",
                 "ec2:DescribeVolumesModifications",
                 "ec2:ModifyVolume",
                 "ssm:GetCommandInvocation",
                 "ssm:SendCommand"
             ],
             "Resource": "*"
         }
     ]
}

To avoid the need to have an AWS IAM user for the workflow, Linaro uses Hashicorp Vault instead. There is a single Vault AWS IAM user, with the Vault software rotating the access key regularly so that it is kept safe. To use this approach, a step was written in the workflow that has inputs for the Vault authentication values plus the desired Vault role to assume. The step outputs the STS-provided access key, secret key and session token.

Vault-AssumeAWSRole

By using xMatters’ ability to merge free text with values from other steps, a Vault role value can be provided that is a combination of the AWS account ID for the affected volume plus the fixed string -EBSResizeAutomationRole.
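
For context, a roughly equivalent request from the Vault CLI would look something like this, assuming the AWS secrets engine is mounted at the default aws/ path (the account ID is just an example; the xMatters step does the same thing via Vault’s HTTP API):

# Returns short-lived credentials for the assumed role:
# access_key, secret_key and security_token (the STS session token)
vault read aws/creds/123456789012-EBSResizeAutomationRole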

Getting details from the SNS message

In order to grow the affected EBS volume, the following information is needed:

  • The AWS account ID
  • The AWS region
  • The EC2 instance ID
  • The device name on the instance for the volume

The account ID is provided as an output value (AWSAccountId) from the SNS step in the workflow. The other values need a further custom step script in order to extract the information from the Trigger Dimensions (Trigger.Dimensions from the SNS step) and the SNS Topic ARN (TopicArn).

Inputs on SNS-ExtractValues step

There is one further piece of information required – the volume ID – but that isn’t provided by CloudWatch as part of the alarm information. There are two potential options to get it – store the volume ID in the alarm description as an additional field or script a solution. The former is easier to use but has the drawback that, if the underlying volume’s ID ever changes, someone needs to remember to update the alarm description. The latter, as will be explained, is rather tricky to solve …

Getting the volume ID

On the face of it, getting the volume IDs for attached volumes on an EC2 instance looks like a straightforward task. The describe-instances API call returns the details of the attached block devices, like this:

<blockDeviceMapping>
    <item>
        <deviceName>/dev/sda1</deviceName>
        <ebs>
            <volumeId>vol-0f5bc3e51714c5d5f</volumeId>
            <status>attached</status>
            <attachTime>2020-11-18T11:07:51.000Z</attachTime>
            <deleteOnTermination>true</deleteOnTermination>
        </ebs>
    </item>
</blockDeviceMapping>

So, for a given device name from the SNS topic, it should be simple enough to find the volume ID … except for the fact that the device name given in the SNS topic never matches the device name in the block device mapping information. Sometimes, it is quite straightforward to resolve – the block device mapping uses a name like “/dev/sda1” and the SNS topic’s device name is “xvda1”. Consistent and easy to code around, so long as that mapping is the correct assumption to make.

The introduction of NVMe block devices on Nitro systems is a completely different kettle of fish, though. For example, the block device mapping example above clearly states that the device name is “/dev/sda1”. What is provided in the SNS topic? “nvme0n1p1”

The implemented solution is to use Systems Manager (SSM) to run a command on the affected instance so that the NVMe device can be translated to the associated volume ID. This does require that the SSM agent is installed and running on the instance. If anyone knows a better solution that works on Ubuntu, do let me know!
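
For reference, one way to do that mapping on the instance itself (not necessarily the exact command the workflow sends via SSM) relies on Nitro exposing the EBS volume ID as the NVMe device’s serial number:

# Device name taken from the alarm, with the partition suffix removed
DEVICE=nvme0n1
SERIAL=$(lsblk -ndo SERIAL /dev/$DEVICE)    # e.g. vol0f5bc3e51714c5d5f
echo "vol-${SERIAL#vol}"                    # re-insert the dash: vol-0f5bc3e51714c5d5f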

So, the workflow needs to look at the device name from the SNS alarm information and branch depending on that device name. Either route will then give us the corresponding volume ID. To make that branch operation simpler, the SNS-ExtractValues step also provides an output called NVMEdevice which is set to true if the device name starts “nvme”, otherwise it is set to false.

Use the appropriate method to get the volume ID
Get the volume ID for a NVMe device
Get the volume ID for a non-NVMe device

Expanding the volume

Once the volume ID is known, the serious stuff can start.

AWS has the ability to allow an EBS volume to be expanded without any downtime … but it comes with the penalty that, after expansion has been requested, you have to wait for the background optimisation process to complete before you can request another expansion. Even then, there is a maximum modification rate per volume limit (which appears to be one), after which you have to wait at least 6 hours before trying to modify the volume again.

So, the first thing to be done is check the status of the volume and skip to raising a xMatters alert if the volume is still being optimised. If an already-expanded volume has run out of space that quickly, there may be a bigger problem for someone to investigate.
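
For anyone checking by hand, the AWS CLI exposes the same information; something like this (volume ID taken from the earlier example) reports modifying, optimizing, completed or failed:

aws ec2 describe-volumes-modifications \
    --volume-ids vol-0f5bc3e51714c5d5f \
    --query 'VolumesModifications[0].ModificationState' \
    --output text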

Check that the volume isn’t being optimised

If the volume is not being optimised, the workflow moves on to modify the volume. The approach taken is to multiply the current size by a factor, e.g. 2 to double the volume’s size. Rather than hard-code that into the script, it can be configured as an input value.

EC2-ModifyVolume

If the request to modify the size fails because the modification rate per volume limit has been reached, a xMatters alert is raised.

Growing & resizing the filing system

To grow and resize the filing system, commands need to be run on the operating system of the affected instance. The “grow” part only needs to happen if the filing system is on a partition of a larger volume. This only seems to happen on the root device and, as such, there is already a tool installed on the server that can be used to grow the partition – cloud-init:

cloud-init single -n growpart

That command will grow the root partition if it needs to be grown. Otherwise, it will just exit without error.

Once that command finishes, the filing system itself is resized with:

resize2fs /dev/<device name>

Both of these commands are run by using the SSM Run Command functionality that was referenced earlier in regards to getting the volume ID for a NVMe device.
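
For reference, the same thing can be reproduced by hand with the AWS CLI (the workflow calls the SendCommand and GetCommandInvocation APIs directly; the instance ID and device name here are illustrative):

aws ssm send-command \
    --instance-ids i-0123456789abcdef0 \
    --document-name "AWS-RunShellScript" \
    --parameters 'commands=["cloud-init single -n growpart","resize2fs /dev/nvme0n1p1"]'
# ... then poll for the result using the command ID returned above
aws ssm get-command-invocation --command-id <command id> --instance-id i-0123456789abcdef0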

SSM-GrowAndResize

If the commands do not succeed, a xMatters alert is created, otherwise the workflow ends and, eventually, CloudWatch will realise that the volume has been resized and clear the alarm.

Improving web site quality through tighter GitHub/Bamboo integration

Before I get into the nitty-gritty, a brief recap of how things are working before any of the changes described in this article …

A Linaro static website consists of one or more git repositories, with potentially one being hosted as a private repository on Linaro’s BitBucket server and the others being hosted on GitHub as public repositories. Bamboo, the CI/CD tool chosen by Linaro’s IT Services to build the sites, monitors these repositories for changes and, when a change is identified, it runs the build plan for the web site associated with the changed repositories. If the build plan is successful, the staging or production web site gets updated, depending on which branch of the repository has been updated (develop or master, respectively).

All well and good, but it does mean that if someone commits a breaking change (e.g. a broken link or some malformed YAML) to a repository then no other updates can be made to that website until that specific problem has been resolved.

Solving this required several changes that, together, helped to ensure that breaking changes couldn’t end up in the develop or master branches unless someone broke the rules by bypassing the protection. The changes we made were:

  • Using pull requests to carry out peer reviews of changes before they got committed into the develop or master branch.
  • Getting GitHub to trigger a custom build in Bamboo so that the proposed changes were used to drive a “test” build in Bamboo, thereby assisting the peer review by showing whether or not the test build would actually be successful.
  • Using branch protection rules in GitHub to enforce requirements such as needing the tests to succeed and needing code reviews.

Pull requests are not a native part of the git toolset but they have been implemented by a number of the git hosting platforms like GitHub, GitLab, BitBucket and others. They may vary in the approach taken but, essentially, one or more people are asked to look at the differences between the incoming changes and the existing files to see if anything wrong can be identified.

That, in itself, can be laborious and isn’t always successful at spotting problems, which is why automation is increasingly used to assist. GitHub’s approach is to have webhooks or apps trigger an external activity that might perform some testing and then report back on the results.

We opted to use webhooks to get GitHub to trigger the custom builds in Bamboo. They are called custom builds because one or more Bamboo variables are explicitly defined in order to change the behaviour of the build plan. I’ll talk more about them in a subsequent article.

The final piece of the puzzle was implementing branch protection rules. I’ve linked to the GitHub documentation above but I’ll pick out the key rules we’ve used:

  • Require pull request reviews before merging.
    When enabled, all commits must be made to a non-protected branch and submitted via a pull request with the required number of approving reviews.
  • Require status checks to pass before merging.
    Choose which status checks must pass before branches can be merged into a branch that matches this rule.

There is a further option that has been tried in the past which is “Include administrators”. This enforces all configured restrictions for administrators. Unfortunately, too many of the administrators have pushed back against this (normally because of the pull request review requirement) so we tend to leave it turned off now. That isn’t to say, though, that administrators get a “free ride”. If a pull request requires a review, an administrator can merge the pull request but GitHub doesn’t make it too easy:

Clicking on Merge pull request, highlighted in “warning red”, results in the expected merge dialog but with extra red bits:

So an administrator does have to tick the box to say they are aware they are using their admin privilege, after which step they can then complete the merge:

If an administrator pushes through a pull request that doesn’t build then they are in what I describe as the “you broke it, you fix it” scenario. After all, the protections are there for a good reason 😊.

Index page: Tips, tricks and notes on building Jekyll-based websites

Link-checking static websites

In migrating the first Linaro site from WordPress to Jekyll, it quickly became apparent that part of the process of building the site needed to be a “check for broken links” phase. The intention was that the build plan would stop if any broken links were detected so that a “faulty” website would not be published.

Link-checking a website that is still being built brings a potential problem: if you reference a new page, it won’t yet have been published, so if you rely on checking http(s) URLs alone, you won’t find the new page and an erroneous broken link is reported.

You want to be able to scan the pages that have been built by Jekyll, on the understanding that a relative link (e.g. /welcome/index.html instead of https://mysite.com/welcome/index.html) can be checked by looking for a file called index.html within a directory called welcome, and that anything that is an absolute link (i.e. it starts with http or https) is checked against the external site.
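
As a trivial sketch of the internal half of that check (this is not the Linaro tool, just an illustration against Jekyll’s default _site output directory):

LINK=/welcome/index.html    # a relative link found in a built page
if [ -f "_site${LINK}" ] || [ -f "_site${LINK%/}/index.html" ]; then
    echo "OK: $LINK"
else
    echo "Broken internal link: $LINK"
fi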

I cannot remember which tool we started using to try to solve this. I do remember that it had command-line flags for “internal” and “external” link checking but testing showed that it didn’t do what we wanted it to do.

So an in-house solution was created. It was probably (at the time) the most complex bit of Python code I’d written and involved learning about things like how to run multiple threads in parallel so that the external link checking doesn’t take too long. Some of our websites have a lot of external links!

Over time, the tool has gained various additional options to control the checking behaviour, like producing warnings instead of errors for broken external links, which allows the 96Boards team to submit changes/new pages to their website without having to spend time fixing broken external links first.

The tool is run as part of the Bamboo plan for all of the sites we build and it ensures that the link quality is as high as possible.

Triggering a test build on Bamboo now ensures that a GitHub Pull Request is checked for broken links before the changes are merged into the branch. We’ve also published the script as a standalone Docker container to make it easier for site contributors to run the same tool on their computer without needing to worry about which Python libraries are needed.

The script itself can be found in the git repo for the Docker container, so you can see for yourself how it works and contribute to its development if you want to.

Index page: Tips, tricks and notes on building Jekyll-based websites

Automating Site Building

As I mentioned in Building a Website That Costs Pennies to Operate, the initial technical design of the infrastructure had the website layout defined in a private git repository and the content in a public git repository.

The private git server used was Atlassian BitBucket – the self-hosted version, not the cloud version. Although Linaro’s IT Services department is very much an AWS customer, we had already deployed BitBucket as an in-house private git service so it seemed to make more sense to use that rather than pay an additional fee for an alternative means of hosting private repositories such as CodeCommit or GitHub.

So what to do about the build automation? An option would have been to look at CodeBuild but, as Linaro manages a number of Open Source projects, we benefit from Atlassian’s very kind support of the Open Source community, which meant we could use Atlassian Bamboo on the same server hosting BitBucket and it wouldn’t cost us any more money.

For each of the websites we build, there is a build plan. The plans are largely identical to each other and go through the following steps, essentially emulating what a human would do:

  • Check out the source code repositories
  • Merge the content into a single directory
  • Ensure that Jekyll and any required gems are installed
  • Build the site
  • Upload the site to the appropriate S3 bucket
  • Invalidate the CloudFront cache

Each of these is a separate task within the build plan and Bamboo halts the build process whenever a task fails.

There isn’t anything particularly magical about any of the above – it is what CI/CD systems are all about. I’m just sharing the basic details of the approach that was taken.

Most of the tasks in the build plan are what Bamboo calls a script task, where it executes a script. The script can either be written inline within the task or you can point Bamboo at a file on the server and it runs that. In order to keep the build plans as identical as possible to each other, most of the script tasks run files rather than using inline scripting. This minimises the duplication of scripting across the plans and greatly reduces the administrative overhead of changing the scripts when new functionality is needed or a bug is encountered.

To help those scripts work across different build plans, we rely on Bamboo’s plan variables, where you define a variable name and an associated value. Those are then accessible by the scripts as environment variables.

We then extended the build plans to work on both the develop and master branches. Here, Bamboo allows you to override the value of specified variables. For example, the build plan might default to specifying that jekyll_conf_file has a value of “_config.yml,_config-staging.yml”. The master branch variant would then override that value to be “_config.yml,_config-production.yml”.
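
As an example of how that plays out in a script task (the exact Jekyll invocation is illustrative), Bamboo exposes a plan variable such as jekyll_conf_file to scripts as an environment variable with a bamboo_ prefix:

# Builds the site with whichever config list the plan or branch provides,
# e.g. "_config.yml,_config-staging.yml" or "_config.yml,_config-production.yml"
bundle exec jekyll build --config "${bamboo_jekyll_conf_file}"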

The method used to trigger the builds automatically has changed over time because we’ve changed the repository structure, GitHub has changed its service offerings and we’ve started doing more to tightly integrate Bamboo with GitHub, so I’m not going to go into the details on that just yet.

Index page: Tips, tricks and notes on building Jekyll-based websites

Linaro sites and repositories