Connecting offline sales to online activity is a challenge, but this week we did it. We’ve measured our first offline sales in Google Analytics, and we can attribute them directly to online campaign sources! Awesome, right? In this two-part series, we share the theory behind the system that gives us these insights, so you can set up a similar system yourself. This post describes the general system. The second post will discuss the actual code used in the system.

A story of two data sets

At its core, the system consists of two data sets: one that we capture online, and one that we get from our client’s offline sales team. Both data sets contain a customer identifier.

The customer identifier allows our system to connect the Google Analytics client ID to the offline revenue. We collect the data in two ways:

  1. Online data collection that stores the Google Analytics client ID together with the customer identifier;
  2. A secure server that receives the offline sales data.

These two systems work like this:

1: Online data collection

The online data is collected every time a user fills in a customer identifier on the site:

Figure: Cross-channel online data collection. A simplified view of the online data collection for cross-channel analysis.

As soon as we have a customer identifier on site, we get the Google Analytics client ID and send the two values to a database.
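To make this step concrete, here is a minimal sketch of storing the pair of values. It is not the code from our system: it assumes the client ID is read from the `_ga` cookie (where it is the last two dot-separated fields), and the SQLite table `online_ids`, its columns, and both function names are made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def client_id_from_ga_cookie(cookie_value):
    """Return the client ID from a `_ga` cookie value such as 'GA1.2.123.456'."""
    parts = cookie_value.split(".")
    if len(parts) < 4:
        raise ValueError(f"unexpected _ga cookie format: {cookie_value!r}")
    # The client ID is the last two dot-separated fields of the cookie.
    return ".".join(parts[-2:])


def store_identifier(conn, customer_id, client_id, seen_at=None):
    """Insert one (customer identifier, client ID) pair with a timestamp.

    The table and column names are hypothetical.
    """
    seen_at = seen_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS online_ids ("
        "customer_id TEXT NOT NULL, client_id TEXT NOT NULL, seen_at TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO online_ids (customer_id, client_id, seen_at) VALUES (?, ?, ?)",
        (customer_id, client_id, seen_at),
    )
    conn.commit()
```

Storing a timestamp next to each pair is a design choice: it makes it possible to reason later about how long ago the identifier was seen online.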

2: Offline sales data

The offline sales data is made available through a secure FTP server:

Figure: Cross-channel offline data collection. A simplified view of the offline sales data server.

This data set is supplied to us on a daily basis.
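As an illustration of what reading such a daily file could look like, the sketch below parses a CSV export. The file format and the column names `customer_id`, `sale_date`, and `revenue` are assumptions; the real export from a sales team will likely look different.

```python
import csv
import io
from decimal import Decimal


def read_offline_sales(csv_text):
    """Parse offline sales rows into dicts with a typed revenue field.

    Column names are hypothetical; adjust them to the actual export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    sales = []
    for row in reader:
        sales.append({
            "customer_id": row["customer_id"].strip(),
            "sale_date": row["sale_date"].strip(),
            # Decimal avoids floating-point rounding on money amounts.
            "revenue": Decimal(row["revenue"]),
        })
    return sales


# Made-up sample data in the assumed format.
sample = (
    "customer_id,sale_date,revenue\n"
    "C-1001,2020-01-15,249.00\n"
    "C-2002,2020-01-15,80.50\n"
)
sales = read_offline_sales(sample)
```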

The magic

The magic is in connecting these two data sets, and the system is simple. Every day, once the new offline sales data set is available, we run a Python script. The script uses the online data as a lookup table to find the Google Analytics client ID for each offline sale. For every match, it fires a Google Analytics event through the Measurement Protocol:

Figure: Cross-channel data connection. A simplified view of the script that connects online data to offline sales.
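The lookup itself can be sketched in a few lines. This is an illustration under assumptions, not our production script: the function names are hypothetical, the online data is assumed to be available as (customer identifier, client ID) pairs, and duplicate identifiers simply overwrite each other here.

```python
def build_lookup(online_pairs):
    """Map each customer identifier to a client ID.

    online_pairs: iterable of (customer_id, client_id) tuples.
    If an identifier appears more than once, the last pair wins.
    """
    return {customer_id: client_id for customer_id, client_id in online_pairs}


def match_sales(sales, lookup):
    """Yield (client_id, sale) for every sale whose customer was seen online."""
    for sale in sales:
        client_id = lookup.get(sale["customer_id"])
        if client_id is not None:
            yield client_id, sale
```

Sales without a match are skipped silently in this sketch; in practice it can be useful to count them to see what share of offline revenue can be connected at all.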

We format the event as follows:

  • Event category: Offline.
  • Event action: Purchase.
  • Event value: revenue.
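A hit in this format could be built as shown below. The sketch assumes the Universal Analytics Measurement Protocol (version 1) and its `/collect` endpoint; the function names are hypothetical and the tracking ID is a placeholder. Because the event value has to be an integer, the revenue is rounded to a whole number here, which is our assumption about how to handle decimals.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

GA_ENDPOINT = "https://www.google-analytics.com/collect"


def build_event_payload(tracking_id, client_id, revenue):
    """Return the URL-encoded body for an Offline / Purchase event hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. UA-XXXXX-Y (placeholder)
        "cid": client_id,    # client ID captured online
        "t": "event",        # hit type
        "ec": "Offline",     # event category
        "ea": "Purchase",    # event action
        "ev": str(int(round(revenue))),  # event value, rounded to an integer
    }
    return urlencode(params)


def send_event(payload):
    """POST the hit; kept separate so the payload can be inspected offline."""
    request = Request(GA_ENDPOINT, data=payload.encode("utf-8"), method="POST")
    with urlopen(request, timeout=10) as response:
        return response.status
```

Keeping payload construction apart from sending makes it easy to log or test the hits before anything reaches Google Analytics.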

Now we can use the event value to measure offline revenue in Google Analytics. We also set up a goal based on the event, with the event value as the goal value.

How the attribution works

The script sends the event to Google Analytics without campaign tagging, so on its own the hit counts as direct traffic. Google Analytics, however, applies a last non-direct click model. Because the event carries the client ID, Google Analytics attributes the direct visit to the last known non-direct source of that user within the last six months: the source of the user when the customer identifier was filled in online. Boom! Offline sales with online source attribution.
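The idea behind last non-direct click can be illustrated with a toy function. This is a simplified model for intuition only, not how Google Analytics is implemented: the function name, the session format, and the 183-day approximation of six months are all assumptions.

```python
from datetime import date, timedelta


def last_non_direct_source(sessions, lookback=timedelta(days=183)):
    """Return the source credited to the most recent session.

    sessions: chronological list of (session_date, source) tuples,
    where the last entry is the session being attributed.
    """
    current_date, current_source = sessions[-1]
    if current_source != "direct":
        return current_source
    # Walk back through earlier sessions, newest first.
    for session_date, source in reversed(sessions[:-1]):
        if current_date - session_date > lookback:
            break  # older sessions fall outside the lookback window
        if source != "direct":
            return source
    return "direct"


# Made-up example: a paid click, then the offline purchase event as a direct hit.
history = [
    (date(2020, 1, 1), "google / cpc"),
    (date(2020, 1, 10), "direct"),
    (date(2020, 2, 1), "direct"),
]
credited = last_non_direct_source(history)
```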

Things to keep in mind

The system is simple at its core, but there are a few things to keep in mind while building it:

  • Depending on your type of customer identifier (broad or narrow), it may appear multiple times for different users. Determine how to handle these duplicate entries.
  • Find out what the maximum time is for a customer journey, from first interaction to purchase. If the time gap between the online data point and the offline sale exceeds this time window, ignore the event.
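Both points can be handled in one small selection step. The sketch below shows one possible policy, not the one we necessarily use: when an identifier was seen with several client IDs, it takes the most recent sighting on or before the sale date, and it skips the sale when that sighting is older than the maximum journey time. The 90-day window and the function name are assumptions.

```python
from datetime import date, timedelta

MAX_JOURNEY = timedelta(days=90)  # assumed window; use the real journey length


def pick_client_id(sightings, sale_date, max_journey=MAX_JOURNEY):
    """Choose a client ID for one sale, or None if the sale should be skipped.

    sightings: list of (client_id, seen_on) pairs for one customer identifier.
    """
    eligible = [
        (seen_on, client_id)
        for client_id, seen_on in sightings
        # Keep sightings that happened before the sale and inside the window.
        if timedelta(0) <= sale_date - seen_on <= max_journey
    ]
    if not eligible:
        return None
    # The most recent eligible sighting wins.
    return max(eligible)[1]
```

Other policies, such as crediting the first sighting or dropping ambiguous identifiers entirely, fit the same function shape.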

It’s a team effort

Creating an automated system that connects offline sales to online sources is a team effort. It requires commitment from the client, our data department, our developers, and the account team to align the client’s view with our project purpose. In the end, the result is awesome.

A follow-up post will cover the actual Python scripts used in this system.
