Connecting offline to online is a challenge, but this week we did it. We’ve measured our first offline sales in Google Analytics, and we can directly attribute them to online campaign sources! This post describes the general system. The second post will discuss the actual code used in the system.

The text above is a recap of the first part in this series. If you somehow missed part 1, catch up by reading the full post.

In this post we’ll dive into the technical details of our solution, describe the components you’ll need, and cover the challenges you’ll encounter. We’ll throw in some code snippets as well!

What’s happening

  • It all starts with data collection. Let’s say your client owns a car dealership with a website where customers can request a test drive. You’ll want to grab the email address from the form and store it, together with the GA Client ID.

  • Then, every night, the car dealership shares the day’s sales data. You’ll match the email addresses from the orders with your online data (the GA Client ID) and update your records (revenue). TADA!! You’ve combined online and offline data.

  • The last step is sending this combined data to your GA account for reporting, so you’ll have all the valuable information GA has to offer, for your offline sales!

What you’ll get

You’ll be able to see the offline revenue generated by your online channels like organic / paid / social. Did this Facebook campaign, aimed at a particular audience, do better than another? You might be selling 10 cars a month through an AdWords campaign without even knowing it. Or not? You’ll know now! And that’s what we do at the Greenhouse Group: we don’t guess, we know.


Technical Details

Components

  • JavaScript for data collection
  • Web application for storing the data
  • Sales parser (CSV, XML)
  • GA connector
  • Database
  • SFTP server

1.1 JavaScript

Code a script that grabs the form data and sends it to your web application right before the form is submitted. To keep things clean, use a tag manager to implement the script on the website.
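As a minimal sketch of such a script (the endpoint URL `sub.example.com`, the form selector, and the field name are assumptions for illustration), the collection URL can be built from the GA Client ID stored in the `_ga` cookie and the form’s email field:

```javascript
// Build the collection URL from the GA Client ID and the email address.
// Kept as a pure function so it can be tested outside the browser.
function buildCollectUrl(clientId, email) {
  return 'https://sub.example.com?client_id=' + encodeURIComponent(clientId) +
         '&custom_identifier=' + encodeURIComponent(email);
}

// Extract the Client ID from a _ga cookie value, e.g. "GA1.2.1234567.1234567"
// (the Client ID is the last two dot-separated segments).
function clientIdFromCookie(gaCookie) {
  var parts = (gaCookie || '').split('.');
  return parts.length >= 4 ? parts[2] + '.' + parts[3] : null;
}

// Browser wiring (sketch, assuming a form with id "testdrive-form"):
// document.querySelector('#testdrive-form').addEventListener('submit', function () {
//   var match = document.cookie.match(/_ga=([^;]+)/);
//   var cid = match ? clientIdFromCookie(match[1]) : null;
//   var email = document.querySelector('input[name="email"]').value;
//   if (cid && email) navigator.sendBeacon(buildCollectUrl(cid, email));
// });
```

Using `navigator.sendBeacon` (or a synchronous-enough request) matters here: the hit must go out before the page navigates away on submit.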

1.2 Web Application
  • NodeJS / Express
  • MongoDB + Mongoose

Your application needs one endpoint that the JavaScript code can post the data to:
https://sub.example.com?client_id=1234567.1234567&custom_identifier=elon@musk.com

Because of the same-origin policy, it’s not possible by default to send data from domain A to domain B using JavaScript. That’s a good thing, but is it a problem for us? No: we serve our application from a subdomain (sub.example.com) and set the header Access-Control-Allow-Origin: http://www.example.com. Now we are able to post data from www.example.com to sub.example.com. Fixed! For ExpressJS:

app.use(function(req, res, next) {
    res.header('Access-Control-Allow-Origin', 'http://www.example.com');
    res.header('Access-Control-Allow-Headers', 'Origin, X-Requested-With, Content-Type, Accept');
    next();
});
  

Example Mongoose schema definition:

var schema = {
    email: {
        type: String,
        required: true,
    },
    cid: {
        type: String,
        required: true
    },
    domain: {
        type: String,
        required: true
    },
    timestamp: {
        type: Date,
        default: Date.now,
        required: true
    },
    saleDate: {
        type: Date,
        default: undefined
    },
    saleRevenue: {
        type: Number,
        default: 0
    },
    sales: {
        type: Number,
        default: 0
    }
};

module.exports = schema;
  

Recommendations:

  • You need one running application per implementation because of the Access-Control-Allow-Origin header.
  • Validate requests using the packages express-validation and Joi.
  • Use the https protocol, for obvious reasons.
  • Log all requests so you always have the raw data.
  • I like to remove the powered-by header: app.disable('x-powered-by');
  • Using nginx as a proxy, you’ll be able to run multiple ExpressJS servers behind port 80 on one webserver. Give a shout if you need more info!

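The nginx proxy setup mentioned above can be sketched as one server block per implementation; the hostname and the local port (3000) are assumptions, and each Express app would listen on its own local port:

```nginx
server {
    listen 80;
    server_name sub.example.com;

    location / {
        # forward to the Express app for this implementation
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```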
1.3 Sales Parser

For parsing the daily order data, a simple Python script can be used. It:

  • grabs the CSV / XML files from an SFTP server;
  • updates the records in our database;
  • uses the GA Measurement Protocol to send the combined data to GA.

Use pysftp to connect to an SFTP server. You can, of course, use other data-sharing solutions; Dropbox / Drive come to mind, but we use SFTP.

import os
import pysftp

# download all files from the remote upload folder, then delete them remotely
cnopts = pysftp.CnOpts()
with pysftp.Connection(host, username=username, password=password, cnopts=cnopts) as sftp:
    sftp.get_d('upload/', FILE_DIR)
    for file in sftp.listdir('upload/'):
        sftp.remove(os.path.join('upload', file))
  

Our clients supply us with CSV files. As for the horrors of Python encoding issues, I won’t get into that now. In the end we used a custom CSV DictReader:

import csv

# Python 2: csv yields byte strings, so decode every key and value from
# Latin-1 (on Python 3, open the file with encoding='latin-1' instead)
def Latin1DictReader(latin1_data, **kwargs):
    csv_reader = csv.DictReader(latin1_data, **kwargs)
    for row in csv_reader:
        try:
            yield {key.decode('latin-1'): value.decode('latin-1')
                   for key, value in row.iteritems()}
        except UnicodeDecodeError:
            yield False
  

The following snippet is executed for every record in the CSV file. It checks if there’s a match in our database, updates the record, and posts the data to GA. We set a minimum timestamp to exclude submitted forms older than X days.

import time
import logging
import requests

# get the most recent record from the db by email
sale = Sale.objects(email=email, timestamp__gte=min_timestamp).order_by('-timestamp').first()

if sale:

    # check if this is a new(er) sale
    if sale.saleDate is None or sale.saleDate < sale_date:

        # update the sale in the database
        sale.update(**{
            "set__saleDate":    sale_date,
            "set__saleRevenue": sale_revenue,
            "inc__sales":       1,
        })

        # reload the document
        sale.reload()

        # days between the form submission and the sale
        sale_timestamp = sale.timestamp.replace(hour=0, minute=0, second=0, microsecond=0, tzinfo=None)
        delta_days = (sale_date - sale_timestamp).days

        # call GA using the Measurement Protocol
        url = 'https://ssl.google-analytics.com/collect?v=1&tid=%s&z=%d&cid=%s&t=event&ec=Offline&ea=Purchase&cd5=%d&el=%d&ev=%d&dh=%s' % (
                config['ga']['prop'],
                int(time.time()),  # cache buster
                sale.cid,
                delta_days,
                int(sale.saleRevenue),
                int(sale.saleRevenue),
                'www.example.com'
            )

        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'}

        r = requests.get(url, headers=headers)

        logging.info("%d\t%s" % (r.status_code, url))
  

Run your Python script every night / morning, for example via a cron job:
0 5 * * * /usr/bin/python sync.py


That’s it!

Thanks for sticking with me till the end. The above instructions should be sufficient to set up your own online / offline measurement solution! If you need more details or help, or have some feedback, please drop us a note below!
