Terraform Service Principal Rotation

· 6 min read
Tim Alexander

The big brain box that is Jamie McCrindle wrote an excellent blog post on how to handle secret rotation within Azure using Terraform. It recently served as the cornerstone of our service principal rotation mechanism, and everything was fine and dandy right up until it wasn't.

The Problem

The basis for the rotation is to pass an input variable to your Terraform code like this:

terraform apply -auto-approve -var="date=`date +%Y%m`"

The variable would then be converted to a number and some clever maths used to result in a rolling mechanism that allows a password to be rotated based on the value of the month.

variable "date" {
  type = string
}

locals {
  date        = tonumber(var.date)
  odd_keeper  = floor((local.date + 1) / 2)
  even_keeper = floor(local.date / 2)
  use_even    = local.date % 2 == 0
}

resource "random_password" "odd" {
  keepers = {
    "date" = local.odd_keeper
  }
  length  = 64
  special = true
}

resource "random_password" "even" {
  keepers = {
    "date" = local.even_keeper
  }
  length  = 64
  special = true
}

output "odd_keep" {
  value = local.odd_keeper
}

output "even_keep" {
  value = local.even_keeper
}
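
The apply output further down also lists a current_secret output that the snippet above does not define. A plausible definition, assuming it simply hands out whichever password was rotated most recently, might look like the sketch below. Treat it as a guess at the missing piece rather than the original code.

```hcl
# Assumption: current_secret is not shown in the original snippet. This sketch
# selects the even password in even months and the odd password in odd months.
output "current_secret" {
  value     = local.use_even ? random_password.even.result : random_password.odd.result
  sensitive = true
}
```

In an even month the even password has just been recreated and will live for two months, so it is the safer one to hand out; the same reasoning applies to the odd password in odd months.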

The major issue with this approach is that when December rolls over into January and a new year is thrown into the mix, you end up with a flow like this:

First Month

terraform apply -auto-approve -var="date=202303"
.....
**Plan: 2 to add, 0 to change, 0 to destroy.**

Changes to Outputs:
+ current_secret = (sensitive value)
+ even_keep = 101151
+ odd_keep = 101152
random_password.even: Creating...
random_password.odd: Creating...
random_password.even: Creation complete after 0s [id=none]
random_password.odd: Creation complete after 0s [id=none]

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

current_secret = <sensitive>
even_keep = 101151
odd_keep = 101152

Looks good.

Second Month

And the next month:

terraform apply -auto-approve -var="date=202304"
......
**Plan: 1 to add, 0 to change, 1 to destroy.**

Changes to Outputs:
# Warning: this attribute value will be marked as sensitive and will not
# display in UI output after applying this change.
~ current_secret = (sensitive value)
~ even_keep = 101151 -> 101152
random_password.even: Destroying... [id=none]
random_password.even: Destruction complete after 0s
random_password.even: Creating...
random_password.even: Creation complete after 0s [id=none]

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.

Outputs:

current_secret = <sensitive>
even_keep = 101152
odd_keep = 101152

And so on until...

Change of Year

terraform apply -auto-approve -var="date=202401"
......
**Plan: 2 to add, 0 to change, 2 to destroy.**

Changes to Outputs:
# Warning: this attribute value will be marked as sensitive and will not
# display in UI output after applying this change.
~ current_secret = (sensitive value)
~ even_keep = 101152 -> 101200
~ odd_keep = 101152 -> 101201
random_password.odd: Destroying... [id=none]
random_password.even: Destroying... [id=none]
random_password.odd: Destruction complete after 0s
random_password.even: Destruction complete after 0s
random_password.odd: Creating...
random_password.even: Creating...
random_password.even: Creation complete after 0s [id=none]
random_password.odd: Creation complete after 0s [id=none]

Apply complete! Resources: 2 added, 0 changed, 2 destroyed.

Outputs:

current_secret = <sensitive>
even_keep = 101200
odd_keep = 101201

Ruh Oh. Two to add and two to destroy in the plan: both passwords are being cycled at the same time. If your application can handle this, no worries. However, it goes against the original aim of having a rolling 60-day secret that lets app teams move seamlessly between credentials and stay in control of their own release cycle. And from experience, not all of the apps we were managing with this method could cope with both changing at once. Cue some rather confused support tickets about 401 entries in logs and generally sketchy authentication behaviour.

After some digging through the logs and pipeline output we could see the pattern: both passwords were changing. Throwing the data into Excel for simplicity showed that the maths works fine for months 1-12 of a given year, but a change of year will always yield this behaviour, because the input jumps by 89 (from 202312 to 202401) rather than by 1, which moves both keeper values at once.
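
To see the same pattern without a spreadsheet, here is a minimal sketch that evaluates the two original formulas for the months either side of a year change. The sample_dates local and the keepers output are names invented for this illustration only.

```hcl
# Illustrative only: sample_dates and keepers are hypothetical names,
# not part of the original rotation code.
locals {
  sample_dates = [202311, 202312, 202401]

  keepers = {
    for d in local.sample_dates : tostring(d) => {
      odd  = floor((d + 1) / 2)
      even = floor(d / 2)
    }
  }
}

output "keepers" {
  value = local.keepers
}

# 202311: odd = 101156, even = 101155
# 202312: odd = 101156, even = 101156  (only even changed)
# 202401: odd = 101201, even = 101200  (both changed)
```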

The Fix

Alas, my maths brain has long since been used up and, try as I might, I could not fathom a suitable equation to handle this, so I went for a more hardcoded approach, assuming that the Gregorian calendar is unlikely to change. The core concept stays the same though:

  • have two credentials
  • automate the cycling of them every 60 days
  • automate allocation of them based on a known pattern
  • not have to bodge anything bleary-eyed on New Year's Day with a hangover.

To achieve this I made use of the lookup function and dropped the year from the input, passing in just the two-digit numeric value of the month:

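variable "date" {
  type = string
}

locals {
  # Odd password keeper: the value changes in every odd month (Jan, Mar, ...).
  odd_keeper = lookup({
    "01" = "01", "02" = "01", "03" = "02", "04" = "02", "05" = "03", "06" = "03",
    "07" = "04", "08" = "04", "09" = "05", "10" = "05", "11" = "06", "12" = "06"
  }, var.date, "odd_default")

  # Even password keeper: the value changes in every even month. Dec and Jan
  # share "00", so the year change leaves this password alone.
  even_keeper = lookup({
    "01" = "00", "02" = "01", "03" = "01", "04" = "02", "05" = "02", "06" = "03",
    "07" = "03", "08" = "04", "09" = "04", "10" = "05", "11" = "05", "12" = "00"
  }, var.date, "even_default")

  use_even = tonumber(var.date) % 2 == 0
}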
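
One thing to watch with lookup is that an unexpected input such as "1" or "202401" quietly falls through to the default values instead of failing. A possible guard, sketched here as an assumption rather than something from the original code, is a validation block on the variable so a bad value stops the run before anything is rotated.

```hcl
# Sketch only: this validation block is an addition, not part of the original.
# It expects a zero-padded month, e.g. passed in with (assumed invocation):
#   terraform apply -auto-approve -var="date=`date +%m`"
variable "date" {
  type = string

  validation {
    condition     = can(regex("^(0[1-9]|1[0-2])$", var.date))
    error_message = "The date variable must be a two-digit month between 01 and 12."
  }
}
```

This would replace the plain variable block rather than sit alongside it.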

Or in a table format for easier reading it looks like this:

Month   Input  Even  Odd
Jan-22  01     00    01
Feb-22  02     01    01
Mar-22  03     01    02
Apr-22  04     02    02
May-22  05     02    03
Jun-22  06     03    03
Jul-22  07     03    04
Aug-22  08     04    04
Sep-22  09     04    05
Oct-22  10     05    05
Nov-22  11     05    06
Dec-22  12     00    06
Jan-23  01     00    01
Feb-23  02     01    01
Mar-23  03     01    02
Apr-23  04     02    02

The key takeaway is that Dec and Jan share the same value for the even password, so it survives the year change and both passwords do not get blatted on New Year's Day. The rest of the code remains pretty much the same; the rotation mechanism just needs to reference the new locals values instead of the previous ones.
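
For anyone who would rather keep the year in the input, one arithmetic alternative is to convert the YYYYMM value into a continuous month count before halving it. This is not the approach used above, only a sketch under the assumption that var.date is still passed as YYYYMM; the year, month and month_index locals are hypothetical names.

```hcl
# Alternative sketch, not the fix described in this post.
# Assumes var.date is passed as YYYYMM, e.g. "202401".
locals {
  year  = floor(tonumber(var.date) / 100)
  month = tonumber(var.date) % 100

  # Continuous month counter: increases by exactly 1 every month,
  # including from December to January.
  month_index = local.year * 12 + local.month

  odd_keeper  = floor((local.month_index + 1) / 2)
  even_keeper = floor(local.month_index / 2)
  use_even    = local.month_index % 2 == 0
}

# 202312: month_index = 24288, odd_keeper = 12144, even_keeper = 12144
# 202401: month_index = 24289, odd_keeper = 12145, even_keeper = 12144
```

Because the counter never skips, only one keeper changes per month. Switching an existing deployment over would still change both keeper values once, on the first apply.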