Building an Image Classifier with Google Cloud AutoML Vision: Part 1 | Lucidchart Blog

Have you been thinking about checking out Google’s new Cloud AutoML Vision service? Are you interested in what AutoML can do for you and where you should use it? Then read on!

This article is the first in a series about the Google Cloud AutoML Vision service. This series will explore AutoML Vision features and capabilities and how they can be used to build a working image classification system.

Before we jump into the process of building an image classifier, I want to provide an introduction to the Google Cloud artificial intelligence (AI) family of services and an overview of the AutoML services, with special focus on AutoML Vision, the topic of this series.

Let’s get started!

An introduction to Google Cloud AutoML

At the Google Cloud Next conference this year (2018), Dr. Fei-Fei Li, Google’s chief AI/ML scientist, announced the public beta release of Google Cloud AutoML Vision, previously available only in limited alpha. In addition, Dr. Li announced that AutoML Vision had two new siblings, AutoML Natural Language and AutoML Translate. Both of these services were included in the beta as well.

These three new services, the AutoML family, join an already impressive suite of AI services provided by Google and help to further Google’s goal of democratizing AI. Democratizing AI, according to Google, will make AI available to everyone, from developer to data scientist, regardless of skill or experience.

[Figure: Google Cloud AI family]

The AutoML family helps Google further this strategy by filling the gap between Cloud ML Engine and its close cousins, Cloud Vision, Cloud Natural Language, and Cloud Translate. AutoML will provide more options for developers and scientists to choose from, options that provide additional flexibility without requiring significant skill or experience to use.

Choosing when to use Cloud AutoML services

As an example of AutoML’s flexibility, consider the Cloud Vision service. Cloud Vision can detect objects in an image using a machine learning model trained by Google. It can accurately detect a wide variety of objects, scenes, and activities, but it does not allow other objects to be added to the model. Cloud Vision can also detect faces, text, and other image attributes.

Alternatively, Cloud AutoML Vision can detect objects in an image using a machine learning model that you train. Given a set of labeled images (labels that indicate the type of objects in the images), AutoML Vision will automatically build and host a custom model for you to use in your application. AutoML Vision provides you with the flexibility of defining your own object categories, without requiring any knowledge of the underlying machine learning implementation.

AutoML Natural Language and AutoML Translate are similar to AutoML Vision in this respect. They each provide flexibility in modeling without an accompanying skill or experience requirement.
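
To make the idea of "a set of labeled images" more concrete, here is a minimal sketch that pairs each image with a label and writes the result as CSV text, one `uri,label` line per image, which is the general shape AutoML Vision accepts when importing images from Cloud Storage. The folder-per-label layout, the bucket name `my-aircraft-images`, and the function names `build_label_rows` and `rows_to_csv` are assumptions made for this example and not part of the service.

```python
import csv
import io
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_label_rows(relative_paths, bucket):
    """Turn paths like 'a380/img001.jpg' into (gs:// URI, label) rows.

    Assumes the first folder in each path is the label (hypothetical layout).
    """
    rows = []
    for rel in sorted(relative_paths):
        path = PurePosixPath(rel)
        if path.suffix.lower() not in IMAGE_SUFFIXES or len(path.parts) < 2:
            continue  # skip non-images and files that are not inside a label folder
        label = path.parts[0]
        rows.append((f"gs://{bucket}/{path.as_posix()}", label))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text, one 'uri,label' line per image."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical file listing; in practice this would come from your image folders.
    example_paths = ["a380/img001.jpg", "b737/img002.png", "notes.txt"]
    print(rows_to_csv(build_label_rows(example_paths, "my-aircraft-images")), end="")
```

Deriving the label from the folder name keeps labeling a matter of sorting files into folders, which is convenient while you are still collecting images. Any other mapping from image to label would work just as well.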

Which member of the Google Cloud AI family you choose to use in your application is largely determined by the requirements of your particular use case and by the experience of your team:

[Figure: When to use Cloud AutoML services]

You can also use a combination of these services. For example, if you require custom object detection and text recognition, you can build a custom model using Cloud AutoML Vision for object detection and then use Cloud Vision for text detection.

Putting AutoML Vision through its paces

Google claims that Cloud AutoML “enables developers with limited machine learning expertise to train high-quality models.” It’s time to put that claim to the test.

With AutoML Vision, you can build an object classification system by simply providing a set of categorized training images. AutoML takes care of the rest, offloading all of the grunt work associated with building an accurate, scalable machine learning system. This includes preparing the training images; provisioning the training systems; managing, monitoring, and tuning the training process; and deploying and scaling the resulting machine learning model for you to use.

[Figure: Aircraft classification system]

This week, I decided to build a simple aircraft classification system using AutoML to see how easy it actually was. The application I built detects and classifies aircraft found in images and successfully identifies most types of modern commercial airliners, such as Airbus A380s and Boeing 737s.

You can see the end results here and judge them for yourself: http://acml.jerryhargrove.com

More to come

Over the next couple of weeks, I’ll be sharing, in more detail, how I used Cloud AutoML to build my classification app, from image selection for training to using Cloud Storage and Cloud Functions to validate the accuracy of the model.

The second part in this series will include a tutorial on training an aircraft classification system with AutoML Vision. In particular, I’ll focus on collecting and curating training images in order to train an accurate classifier, and I’ll demonstrate how the quality and quantity of training images impact the accuracy of the resulting AutoML model.

The third part in this series will detail how to build a complete web application to validate and share the trained AutoML model with others. I’ll include instructions on how to use Google Cloud Storage to build a static website to host the web application and how to use Google Cloud Functions to act as a gateway to the AutoML model.

If you want to follow along and build your own image classifier with me over the next couple of weeks, here’s some homework to get you started: Start collecting and curating images now. Training material turns out to be the crux of building accurate classification systems using AutoML Vision, both in terms of quantity and quality.
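
If you'd like a head start on the curating part, here is a small sketch of the kind of check you might run over your collection: it counts images per label (quantity) and flags byte-identical duplicates (one simple aspect of quality). The function name `summarize_training_set` and the `min_per_label` threshold are hypothetical choices for illustration, not limits taken from AutoML Vision.

```python
import hashlib
from collections import Counter, defaultdict


def summarize_training_set(images, min_per_label=100):
    """Summarize a training set given (label, image_bytes) pairs.

    min_per_label is an illustrative threshold, not an AutoML Vision limit.
    """
    counts = Counter()
    labels_by_digest = defaultdict(list)
    for label, data in images:
        counts[label] += 1
        labels_by_digest[hashlib.sha256(data).hexdigest()].append(label)

    # Labels with fewer images than the chosen threshold.
    sparse = sorted(label for label, n in counts.items() if n < min_per_label)
    # A digest seen more than once means byte-identical images were collected.
    duplicates = {d: ls for d, ls in labels_by_digest.items() if len(ls) > 1}
    return {"counts": dict(counts), "sparse_labels": sparse, "duplicates": duplicates}


if __name__ == "__main__":
    # Stand-in bytes; real use would read each image file's contents.
    sample = [("a380", b"img-1"), ("a380", b"img-1"), ("b737", b"img-2")]
    report = summarize_training_set(sample, min_per_label=2)
    print(report["counts"], report["sparse_labels"], len(report["duplicates"]))
```

Hashing only catches exact copies. Resized or re-encoded versions of the same photo would need a perceptual comparison, which is beyond this sketch.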