The JSON Canonicalisation Scheme (RFC 8785) in action and how to secure JSON objects with HMAC

We recently had the task of adding HMAC protection to DynamoDB items stored by the Connect2id server in the AWS cloud. DynamoDB is a key-value database that works well with JSON objects. A Hash-based Message Authentication Code (HMAC) is a common cryptographic method for ensuring the integrity and authenticity of data. It employs a secret key and a hash function, such as one from the SHA-2 family.

The principle of HMAC

The HMAC algorithm computes a code over the binary representation of the data. The code is typically stored alongside the data, for easy inspection. To check that the data has not changed, the computation must be repeated with the same key and exactly the same bytes of data. The stored and the recomputed codes are then compared. If the two codes don't match, this is a sign that the data integrity was affected, i.e. the data was modified in some way. Because the key is secret, only its holder can compute and verify the code. The code thus also serves to authenticate the data.

Example HMAC with SHA-256 of the "Hello, world!" string, which is converted to its UTF-8 bytes (identical to ASCII for this text) before being input into the computation:

Input text (UTF-8): Hello, world!
Secret key (BASE64 encoded): lHG3evumVZLuM2nluoxI8hVxnSL3V8r7U+8mqUp2NR4=
Code (BASE64 encoded): UtTLGRell8x8up6b8sF44o18telWQITfOHQSxUGSNPA=

Let's see the resulting code if the comma is removed from the "Hello, world!" string:

Input text (UTF-8): Hello world!
Secret key (BASE64 encoded): lHG3evumVZLuM2nluoxI8hVxnSL3V8r7U+8mqUp2NR4=
Code (BASE64 encoded): wATR9kyMBXiog8ENqMqsP68ZDQSuvUtxh4ArvL+sgx4=

Notice how the computed code changed entirely!
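The computation above can be reproduced with the standard Java crypto API. The following is a minimal sketch, not the Connect2id server code. The class name HmacExample and the method name hmacSha256 are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper class, for illustration only
class HmacExample {

    // Computes HMAC SHA-256 over the UTF-8 bytes of the text,
    // returns the code as a BASE64 string
    static String hmacSha256(String base64Key, String text) {
        try {
            byte[] keyBytes = Base64.getDecoder().decode(base64Key);
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(keyBytes, "HmacSHA256"));
            byte[] code = mac.doFinal(text.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(code);
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is a standard algorithm that every Java platform
            // must support, so this is not expected to happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String key = "lHG3evumVZLuM2nluoxI8hVxnSL3V8r7U+8mqUp2NR4=";
        // Removing a single character produces an unrelated code
        System.out.println(hmacSha256(key, "Hello, world!"));
        System.out.println(hmacSha256(key, "Hello world!"));
    }
}
```

The checked security exceptions are wrapped in an unchecked one only to keep the example short.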

The issue with JSON serialisation

JSON can represent identical data in different ways: whitespace between tokens is insignificant and the members of an object are unordered. This flexibility is part of what makes JSON so useful.

Consider the following identical JSON data, formatted in three different ways:

JSON formatting for compactness:

{"key":"9cea8d2d","name":"Alice Adams","age":21}

As above, but with a different member ordering:

{"key":"9cea8d2d","age":21,"name":"Alice Adams"}

"Pretty" JSON formatting for better human readability:

{
  "key" : "9cea8d2d",
  "name" : "Alice Adams",
  "age" : 21
}

If we take the above JSON strings as ASCII or UTF-8 bytes, we get three different binary representations. If we input them into an HMAC SHA-256 computation, we will get three different codes. This means that for HMAC to work reliably and not produce false errors (code mismatches), the JSON data must always be presented in some deterministic form, also called normalised or canonical.
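To see the effect in code, the sketch below feeds the three serialisations into HMAC SHA-256 and counts the distinct codes. The class name SerialisationSensitivity and the method distinctCodes are assumptions made for this illustration. The key is the sample key used earlier.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper class, for illustration only
class SerialisationSensitivity {

    // Returns the distinct BASE64-encoded HMAC SHA-256 codes of the given texts
    static Set<String> distinctCodes(byte[] key, String... texts) {
        Set<String> codes = new LinkedHashSet<>();
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            for (String text : texts) {
                // doFinal resets the Mac, so it can be reused for the next text
                byte[] code = mac.doFinal(text.getBytes(StandardCharsets.UTF_8));
                codes.add(Base64.getEncoder().encodeToString(code));
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
        return codes;
    }

    public static void main(String[] args) {
        byte[] key = Base64.getDecoder().decode("lHG3evumVZLuM2nluoxI8hVxnSL3V8r7U+8mqUp2NR4=");
        String compact = "{\"key\":\"9cea8d2d\",\"name\":\"Alice Adams\",\"age\":21}";
        String reordered = "{\"key\":\"9cea8d2d\",\"age\":21,\"name\":\"Alice Adams\"}";
        String pretty = "{\n  \"key\" : \"9cea8d2d\",\n  \"name\" : \"Alice Adams\",\n  \"age\" : 21\n}";
        // Same data, three serialisations, three different codes
        System.out.println(distinctCodes(key, compact, reordered, pretty).size()); // prints 3
    }
}
```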

The JSON Canonicalisation Scheme (JCS) comes to the rescue

The JSON Canonicalization Scheme (JCS) is a standard algorithm for putting arbitrary JSON data in a deterministic format. It was originally motivated by the need to perform reliable cryptographic operations on JSON data, such as applying digital signatures and HMACs.

JCS strips insignificant whitespace, sorts the members of JSON objects by name, and serialises strings and numbers in one fixed way.

When JCS is applied to the sample JSON object we get:

{"age":21,"key":"9cea8d2d","name":"Alice Adams"}
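The member sorting and whitespace stripping can be sketched in a few lines of plain Java. This is a deliberately limited illustration and not a JCS implementation: it handles only flat objects with string, integer and boolean values, its string escaping covers only quotes and backslashes, and it leaves out the number serialisation rules of RFC 8785. The class name FlatJsonCanonicalizer is made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical, simplified sketch; use a real JCS library in production
class FlatJsonCanonicalizer {

    // Serialises a flat JSON object with sorted members and no whitespace
    static String canonicalize(Map<String, Object> members) {
        // TreeMap orders String keys by their UTF-16 code units,
        // which is the member ordering that RFC 8785 specifies
        StringJoiner out = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> e : new TreeMap<>(members).entrySet()) {
            out.add(quote(e.getKey()) + ":" + value(e.getValue()));
        }
        return out.toString();
    }

    private static String value(Object v) {
        if (v instanceof String) return quote((String) v);
        if (v instanceof Integer || v instanceof Boolean) return v.toString();
        throw new IllegalArgumentException("Value type not supported by this sketch: " + v);
    }

    // Simplification: only quotes and backslashes are escaped
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> json = new LinkedHashMap<>();
        json.put("key", "9cea8d2d");
        json.put("name", "Alice Adams");
        json.put("age", 21);
        // prints {"age":21,"key":"9cea8d2d","name":"Alice Adams"}
        System.out.println(canonicalize(json));
    }
}
```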

Now that we have our JSON data in canonical form, we can safely input the bytes of the string into the HMAC computation:

Input text (UTF-8): {"age":21,"key":"9cea8d2d","name":"Alice Adams"}
Secret key (BASE64 encoded): lHG3evumVZLuM2nluoxI8hVxnSL3V8r7U+8mqUp2NR4=
Code (BASE64 encoded): MLE+O33O8Rv2fdSajlaK6h3wT8yKQkbNuPCoWDRWcz4=

Storing the HMAC as part of the JSON data

The computed message authentication code can be naturally included in the JSON object that it is meant to secure:

{
  "key" : "9cea8d2d",
  "name" : "Alice Adams",
  "age" : 21,
  "_hmac#s256" : "MLE+O33O8Rv2fdSajlaK6h3wT8yKQkbNuPCoWDRWcz4="
}

To recompute the code, remove the _hmac#s256 member and input the bytes of the remaining JSON string (in JCS form!) to the HMAC SHA-256 algorithm.
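The recompute-and-compare step might look as follows. The sketch assumes that the _hmac#s256 member has already been removed and the remaining JSON put into JCS form by other code. The class name JsonHmacVerifier and its methods are made up for this illustration. The codes are compared with MessageDigest.isEqual rather than with a plain string comparison.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper class, for illustration only
class JsonHmacVerifier {

    // Computes HMAC SHA-256 over the UTF-8 bytes of the canonical JSON
    static byte[] compute(byte[] key, String canonicalJson) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(canonicalJson.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recomputes the code and compares it with the stored BASE64 value
    static boolean verify(byte[] key, String canonicalJson, String storedBase64Code) {
        byte[] stored;
        try {
            stored = Base64.getDecoder().decode(storedBase64Code);
        } catch (IllegalArgumentException e) {
            return false; // the stored value is not valid BASE64
        }
        return MessageDigest.isEqual(stored, compute(key, canonicalJson));
    }

    public static void main(String[] args) {
        byte[] key = Base64.getDecoder().decode("lHG3evumVZLuM2nluoxI8hVxnSL3V8r7U+8mqUp2NR4=");
        String json = "{\"age\":21,\"key\":\"9cea8d2d\",\"name\":\"Alice Adams\"}";
        String code = Base64.getEncoder().encodeToString(compute(key, json));

        System.out.println(verify(key, json, code)); // prints true

        // Changing the age invalidates the stored code
        String tampered = "{\"age\":22,\"key\":\"9cea8d2d\",\"name\":\"Alice Adams\"}";
        System.out.println(verify(key, tampered, code)); // prints false
    }
}
```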

JCS in Java

Samuel Erdtman, co-author of the JCS standard, has provided a Java open source implementation. It is only 28 KBytes of Java bytecode!

https://github.com/erdtman/java-json-canonicalization

Example Java code to turn a JSON string into canonical form and get its bytes for HMAC or digital signature computation:

import java.nio.charset.StandardCharsets;
import org.erdtman.jcs.JsonCanonicalizer;

String json = "{ /* some JSON */ }";
JsonCanonicalizer jc = new JsonCanonicalizer(json);
byte[] hmacInput = jc.getEncodedString().getBytes(StandardCharsets.UTF_8);

How to secure DynamoDB items with HMAC

Let's now apply our JSON canonicalisation strategy to secure DynamoDB items with HMAC SHA-256:

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import com.amazonaws.services.dynamodbv2.document.Item;
import org.erdtman.jcs.JsonCanonicalizer;

// Generate the secret key, it should be at least as long as the
// hash output (256 bits for SHA-256)
// The key must be stored securely!
byte[] secretKeyBytes = new byte[256/8];
new SecureRandom().nextBytes(secretKeyBytes);
SecretKey secretKey = new SecretKeySpec(secretKeyBytes, "HmacSHA256");

// Some DynamoDB item to secure with HMAC
Item item;

// Get the item JSON in canonical format
JsonCanonicalizer jc = new JsonCanonicalizer(item.toJSON());
byte[] hmacInput = jc.getEncodedString().getBytes(StandardCharsets.UTF_8);

// Compute the HMAC
Mac hmacSHA256 = Mac.getInstance("HmacSHA256");
hmacSHA256.init(secretKey);
byte[] hmac = hmacSHA256.doFinal(hmacInput);

// Include the HMAC in the DynamoDB item, the item can be saved now
item = item.withBinary("_hmac#s256", hmac);

To verify the HMAC of a DynamoDB item:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import com.amazonaws.services.dynamodbv2.document.Item;
import org.erdtman.jcs.JsonCanonicalizer;

// Make sure the HMAC attribute isn't missing
if (! item.hasAttribute("_hmac#s256")) {
    throw new Exception("Missing item HMAC attribute");
}

// Extract the HMAC
byte[] storedHMAC = item.getBinary("_hmac#s256");

// Remove the HMAC attribute, it is not part of the HMAC input
Item baseItem = item.removeAttribute("_hmac#s256");

// Get the item JSON in canonical format
JsonCanonicalizer jc = new JsonCanonicalizer(baseItem.toJSON());
byte[] hmacInput = jc.getEncodedString().getBytes(StandardCharsets.UTF_8);

// Recompute the HMAC, secretKey is the same key that was used to compute the stored HMAC
Mac hmacSHA256 = Mac.getInstance("HmacSHA256");
hmacSHA256.init(secretKey);
byte[] recomputedHMAC = hmacSHA256.doFinal(hmacInput);

// Compare the two HMACs
if (! MessageDigest.isEqual(storedHMAC, recomputedHMAC)) {
    throw new Exception("Invalid item HMAC");
}