This post originally appeared on Heartbeat.

Motivation

The handoff of labeled data from domain experts to machine learning teams is a critical step in machine learning model development. This interface oftentimes sits between two different business units and may be a source of friction because of divergent formats, undocumented conventions, or unshared context.

Fortunately, Labelbox can be used to eliminate this friction point and enable seamless end-to-end machine learning workflows that span all the way from label collection to model development and evaluation.

In this article, we demonstrate how machine learning teams can use Labelbox’s TFRecord export feature to seamlessly and automatically transfer data from a labeling interface directly into a TensorFlow model. Unlike many other solutions, users of Labelbox:

  • Can expect all labels to be automatically converted into a standardized, documented data format, with no extra effort from those providing the labels
  • Are able to train TensorFlow models directly on data hosted by Labelbox without requiring the download of potentially large datasets

Using TFRecord exports for frictionless end-to-end workflows

This example uses Labelbox to train an image segmentation model that separates Tesla Model 3 cars from the rest of an image. The code for this example is available on GitHub.

Exporting TFRecords from Labelbox

Using Labelbox, we collect bounding box annotations which look like this:

[Figure: Creating a bounding box around a Tesla Model 3 in Labelbox]
[Figure: An image from our dataset]
[Figure: The corresponding bounding box annotation collected using Labelbox]

After performing a TFRecord export, Labelbox provides a link to an export.json file, which in turn links to the exported TFRecords. Examining this file reveals:

{
  "tfrecord_paths": [
    "gs://tfrecord-exports/cjjytosyoavre0734sdeshlpv-2018-07-24T17:46:34-33ff1e37ad3378fdea51a90e/cjjzzig4965ok07222hltiu3f.tfrecord",
    "gs://tfrecord-exports/cjjytosyoavre0734sdeshlpv-2018-07-24T17:46:34-33ff1e37ad3378fdea51a90e/cjjzzn34ybgco0734jbbmrhce.tfrecord",
    ...
  ],
  "legend": {"Model 3": 1}
}

This file has two keys: tfrecord_paths contains Google Cloud Storage URIs for the exported TFRecords (the list is truncated above for brevity), and legend maps each class label to the pixel value that represents that class in the exported segmentation maps.
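Before wiring up a pipeline, the export file can be sanity-checked with nothing but the standard library. This is a minimal sketch: `load_export` is a hypothetical helper, and the inline sample reuses the two URIs shown above in place of a real downloaded file. Inverting the legend makes it easy to look up which class a given pixel value in a segmentation map stands for.

```python
import json
from typing import Dict, List, Tuple

# Sample shaped like the export.json shown above (list truncated to two URIs)
SAMPLE_EXPORT = """
{"tfrecord_paths": [
  "gs://tfrecord-exports/cjjytosyoavre0734sdeshlpv-2018-07-24T17:46:34-33ff1e37ad3378fdea51a90e/cjjzzig4965ok07222hltiu3f.tfrecord",
  "gs://tfrecord-exports/cjjytosyoavre0734sdeshlpv-2018-07-24T17:46:34-33ff1e37ad3378fdea51a90e/cjjzzn34ybgco0734jbbmrhce.tfrecord"
 ],
 "legend": {"Model 3": 1}}
"""


def load_export(text: str) -> Tuple[List[str], Dict[int, str]]:
    """Hypothetical helper: return TFRecord URIs and a pixel-value -> class-name map."""
    export = json.loads(text)
    paths = export["tfrecord_paths"]
    # Every exported path is expected to be a Google Cloud Storage URI
    bad = [p for p in paths if not p.startswith("gs://")]
    if bad:
        raise ValueError(f"non-GCS paths in export: {bad}")
    # Invert the legend so segmentation-map pixel values can be looked up directly
    pixel_to_class = {value: name for name, value in export["legend"].items()}
    return paths, pixel_to_class


paths, pixel_to_class = load_export(SAMPLE_EXPORT)
print(len(paths), pixel_to_class)  # 2 {1: 'Model 3'}
```

For a real export, read the file first, e.g. `with open("export.json") as f: paths, pixel_to_class = load_export(f.read())`.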

Using the exported TFRecords in a dataset input pipeline

Leveraging the tf.data.TFRecordDataset API and TensorFlow's Google Cloud Storage (GCS) filesystem support, we can concisely specify a dataset input pipeline that reads the TFRecord exports Labelbox has stored on GCS:

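import json
import math
import tensorflow as tf  # TensorFlow 1.x (tf.contrib was removed in 2.x)

with open('./export.json') as f:  # json.load expects a file object, not a path
    tfrecord_paths = json.load(f)['tfrecord_paths']
# Hold out the first 20% for testing; skip() counts records, so this
# assumes each exported TFRecord file holds a single example
test_set_size = math.ceil(0.2 * len(tfrecord_paths))
training_dataset = (tf.data.TFRecordDataset(tfrecord_paths)
  .skip(test_set_size)
  .map(_parse_tfrecord)
  .apply(tf.contrib.data.shuffle_and_repeat(50))
  .apply(tf.contrib.data.map_and_batch(_resize(512), 8)))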

In this code snippet, we have defined a pipeline which:

  • Reads the export.json file (assumed to be in the current working directory), extracts the tfrecord_paths list, and initializes a TFRecordDataset with them
  • Skips the first 20% (test_set_size) of the data, which we’ll save for evaluating generalization performance
  • Maps the _parse_tfrecord function (described later) over each record to deserialize its tf.train.Example into an (image, label) pair of tensors
  • Uses shuffle_and_repeat to shuffle the data through a 50-element shuffle buffer and repeat it indefinitely
  • Uses map_and_batch to resize all images and labels to 512x512 (using _resize, described later) and group them into mini-batches of size 8
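To make the skip, shuffle, and batch steps concrete, here is a pure-Python emulation of one pass (epoch) over a list of record IDs. It is a sketch of the documented tf.data behaviour (a fixed-size buffer from which the next element is drawn uniformly at random), not TensorFlow's implementation; `shuffle_buffer` and `batches` are hypothetical helpers, and the IDs 0 to 49 stand in for 50 exported records under the assumption of one example per TFRecord file.

```python
import math
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def shuffle_buffer(items: Iterable[T], buffer_size: int, rng: random.Random) -> Iterator[T]:
    """Emit elements in random order using a bounded buffer."""
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Buffer full: emit a uniformly random element to make room
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    # Input exhausted: drain whatever remains in random order
    rng.shuffle(buffer)
    yield from buffer


def batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group elements into lists of batch_size; the last batch may be smaller."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


record_ids = list(range(50))                       # stand-ins for exported records
test_set_size = math.ceil(0.2 * len(record_ids))   # first 20% held out
train_ids = record_ids[test_set_size:]             # equivalent of .skip(test_set_size)
epoch = list(batches(shuffle_buffer(train_ids, 50, random.Random(0)), 8))
print(test_set_size, len(epoch), [len(b) for b in epoch])  # 10 5 [8, 8, 8, 8, 8]
```

With only 40 training records, a 50-element buffer holds the entire training split, so each pass is a full uniform shuffle; on larger exports the buffer size trades shuffle quality against memory.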

The _parse_tfrecord function uses the TFRecord schema documented by Labelbox to deserialize each tf.train.Example, then decodes the image into a [height, width, 3] tensor (three channels because the images use the RGB colorspace) and the segmentation label into a [height, width, 1] tensor, casting both to float32.

import tensorflow as tf  # TensorFlow 1.x API (tf.parse_single_example, tf.to_float)

def _parse_tfrecord(serialized_example):
    # Deserialize one tf.train.Example using the Labelbox TFRecord schema
    example = tf.parse_single_example(
        serialized_example,
        features={
            'image/encoded': tf.FixedLenFeature([], tf.string),
            'image/filename': tf.FixedLenFeature([], tf.string),
            'image/ID': tf.FixedLenFeature([], tf.string),
            'image/format': tf.FixedLenFeature([], tf.string),
            'image/height': tf.FixedLenFeature([], tf.int64),
            'image/width': tf.FixedLenFeature([], tf.int64),
            'image/channels': tf.FixedLenFeature([], tf.int64),
            'image/colorspace': tf.FixedLenFeature([], tf.string),
            'image/segmentation/class/encoded': tf.FixedLenFeature([], tf.string),
            'image/segmentation/class/format': tf.FixedLenFeature([], tf.string),
            })
    # decode_image returns a tensor of unknown rank, so request a fixed channel
    # count (3 for RGB images, 1 for segmentation maps) and pin the static shape
    image = tf.image.decode_image(example['image/encoded'], channels=3)
    image.set_shape([None, None, 3])
    label = tf.image.decode_image(
        example['image/segmentation/class/encoded'], channels=1)
    label.set_shape([None, None, 1])
    # Cast uint8 pixel values to float32 for training
    image_float = tf.to_float(image)
    label_float = tf.to_float(label)
    return (image_float, label_float)
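Because each label pixel stores the class value listed in the export's legend, a per-class binary mask can be derived by comparing every pixel against that value (in TensorFlow this is a single element-wise comparison on the label tensor). The sketch below uses nested Python lists laid out as [height][width][1] in place of tensors, so the outer index is the row (height), matching the decoded label's shape; `class_mask` is a hypothetical helper.

```python
from typing import Dict, List

# A label map laid out as [height][width][1], like the decoded label tensor
Label = List[List[List[float]]]


def class_mask(label: Label, legend: Dict[str, int], class_name: str) -> List[List[int]]:
    """Hypothetical helper: 1 where a pixel holds class_name's legend value, else 0."""
    value = legend[class_name]
    return [[1 if pixel[0] == value else 0 for pixel in row] for row in label]


legend = {"Model 3": 1}  # as in export.json
label = [
    [[0.0], [1.0], [1.0]],
    [[0.0], [0.0], [1.0]],
]  # height 2, width 3, one channel
mask = class_mask(label, legend, "Model 3")
print(mask)  # [[0, 1, 1], [0, 0, 1]]
```

With a single class, as in this example, the mask is identical to the label map itself; the lookup only matters once the legend contains several classes.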

The _resize function returns a function that uses bilinear interpolation to resize images and labels to image_dim x image_dim (512 in our pipeline).

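def _resize(image_dim):
    # Returns a function mapping (image, label) to a resized (image, label)
    def _inner(images_orig, labels_orig):
        images = tf.image.resize_images(
                images=images_orig,
                size=[image_dim, image_dim],
                method=tf.image.ResizeMethod.BILINEAR)
        # Labels use the same bilinear method, so pixels on class boundaries
        # get fractional values; NEAREST_NEIGHBOR would keep them discrete
        labels = tf.image.resize_images(
                images=labels_orig,
                size=[image_dim, image_dim],
                method=tf.image.ResizeMethod.BILINEAR)
        return (images, labels)
    return _inner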
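Because the labels are resized with the same bilinear method as the images, pixels along the car's outline end up with values between background (0) and Model 3 (1), effectively soft labels. The pure-Python sketch below contrasts this with nearest-neighbor resizing on a tiny label map. It is a generic corner-aligned implementation for illustration only; TensorFlow's resize_images uses its own coordinate convention (controlled by align_corners in TF 1.x), so exact values can differ, although the qualitative effect is the same. All function names here are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def _source_coord(dst: int, in_size: int, out_size: int) -> float:
    """Corner-aligned mapping: first/last output pixels hit first/last input pixels."""
    if out_size == 1:
        return 0.0
    return dst * (in_size - 1) / (out_size - 1)


def resize_bilinear(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Blend the four surrounding input pixels, weighted by distance."""
    in_h, in_w = len(grid), len(grid[0])
    out: Grid = []
    for y in range(out_h):
        sy = _source_coord(y, in_h, out_h)
        y0 = math.floor(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = _source_coord(x, in_w, out_w)
            x0 = math.floor(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def resize_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Copy the closest input pixel, so only existing label values can appear."""
    in_h, in_w = len(grid), len(grid[0])
    ys = [math.floor(_source_coord(y, in_h, out_h) + 0.5) for y in range(out_h)]
    xs = [math.floor(_source_coord(x, in_w, out_w) + 0.5) for x in range(out_w)]
    return [[grid[y][x] for x in xs] for y in ys]


# A 2x2 label map: background (0) on the left, Model 3 (1) on the right
label = [[0.0, 1.0],
         [0.0, 1.0]]
print(resize_bilinear(label, 3, 3)[1])  # [0.0, 0.5, 1.0]
print(resize_nearest(label, 3, 3)[1])   # [0.0, 1.0, 1.0]
```

Soft boundary values can be acceptable with a per-pixel regression or cross-entropy loss on float targets; when downstream code expects integer class IDs, nearest-neighbor resizing is the safer choice for labels.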

An iterator over training_dataset can then serve as the input node of a TensorFlow graph, so the model trains directly on the data collected in Labelbox.

Training a 3-layer convolutional neural network on the input pipeline

We then train a 3-layer convolutional neural network where the final fully-connected layer is replaced with a convolution layer (a transformation sometimes called “convolutionalization” — see Figure 2 of Long 2014) to create a fully convolutional network capable of dense pixel-wise predictions. For simplicity, we avoid the use of pooling layers so that there’s no need for upsampling of predictions (e.g. “deconvolution” in Section 3.3 of Long 2014).
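Skipping pooling is what keeps the predictions at the input resolution. The standard output-size formula for a convolution or pooling layer, out = floor((in + 2 * padding - kernel) / stride) + 1, makes this concrete. The layer settings below are hypothetical (the post does not state kernel sizes): two 3x3 convolutions with stride 1 and padding 1, then a 1x1 convolution standing in for the former fully-connected layer, compared against a variant with 2x2 pooling.

```python
from typing import List, Tuple

# (name, kernel, stride, padding): hypothetical settings for illustration
Layer = Tuple[str, int, int, int]


def output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size: int, layers: List[Layer]) -> List[int]:
    """Spatial size before the first layer and after each layer."""
    sizes = [size]
    for _, kernel, stride, padding in layers:
        size = output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


no_pooling = [
    ("conv1 3x3", 3, 1, 1),
    ("conv2 3x3", 3, 1, 1),
    ("conv3 1x1, replaces the fully-connected layer", 1, 1, 0),
]
with_pooling = [
    ("conv1 3x3", 3, 1, 1), ("pool1 2x2", 2, 2, 0),
    ("conv2 3x3", 3, 1, 1), ("pool2 2x2", 2, 2, 0),
    ("conv3 1x1", 1, 1, 0),
]
print(trace(512, no_pooling))    # [512, 512, 512, 512]
print(trace(512, with_pooling))  # [512, 512, 256, 256, 128, 128]
```

Without pooling, every layer preserves the 512x512 grid, so predictions line up pixel for pixel with the label map. With two pooling layers the output shrinks to 128x128 and would need 4x upsampling, which is the role of the "deconvolution" layers in Long 2014.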

The model trains for ~20 minutes (~120 iterations) on an Nvidia Quadro P6000, after which further decreases in both training and test loss are minimal.
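For a sense of scale, the figures quoted in this post imply that the model sees each training image many times. This back-of-the-envelope calculation assumes that the post's 50 bounding box annotations correspond to 50 records, 10 of which are held out by the 20% split, and uses the batch size of 8 from the input pipeline; the iteration count and training time are the approximate values given above.

```python
import math

num_annotations = 50   # from the post
batch_size = 8         # from map_and_batch(_resize(512), 8)
iterations = 120       # approximate, from the post
minutes = 20           # approximate, from the post

test_set_size = math.ceil(0.2 * num_annotations)  # records held out for testing
train_size = num_annotations - test_set_size      # records used for training
examples_seen = iterations * batch_size           # images processed in total
epochs = examples_seen / train_size               # passes over the training split
seconds_per_iteration = minutes * 60 / iterations
print(test_set_size, train_size, examples_seen, epochs, seconds_per_iteration)
# 10 40 960 24.0 10.0
```

By the same arithmetic, the ~30 iterations after which predictions look reasonable correspond to about 6 passes over the 40 training images.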

Looking at the predictions over time, we see the initial prediction is not great and includes significant background. However, after just a few (~30) iterations, the predictions become fairly refined and separate the car from the remainder of the image. Not bad for a toy 3-layer CNN trained on just 50 bounding box annotations!

Conclusion

This wraps up our demonstration of how to build an end-to-end image segmentation model using TensorFlow and Labelbox. Using the newly introduced TFRecord export feature, we developed a car segmentation model that trained directly on labels stored in Labelbox. We hope you’ve found this demonstration helpful. Let us know how you’re using Labelbox to train your models!