
Google’s Cloud Vision API brings deeper understanding to apps, robots and drones


Google is at the forefront of machine learning and has already brought some of its AI-powered technology to apps like Gmail and Search. It’s also keen to get its tools into the hands of developers, and recently open-sourced its TensorFlow machine learning library. As part of that focus on giving developers resources, it has also launched the Cloud Vision API, which lets devs build apps (and robots) that recognize objects and facial expressions, then respond to them.

There are several specific tools developers can use to make their apps smarter, as detailed in the official Google blog post:

  • Label/Entity Detection picks out the dominant entity (e.g., a car, a cat) within an image, from a broad set of object categories. You can use the API to easily build metadata on your image catalog, enabling new scenarios like image based searches or recommendations.
  • Optical Character Recognition to retrieve text from an image. Cloud Vision API provides automatic language identification, and supports a wide variety of languages.
  • Safe Search Detection to detect inappropriate content within your image. Powered by Google SafeSearch, the feature enables you to easily moderate crowd-sourced content.
  • Facial Detection can detect when a face appears in photos, along with associated facial features such as eye, nose and mouth placement, and likelihood of over 8 attributes like joy and sorrow. We don’t support facial recognition and we don’t store facial detection information on any Google server.
  • Landmark Detection to identify popular natural and manmade structures, along with the associated latitude and longitude of the landmark.
  • Logo Detection to identify product logos within an image. Cloud Vision API returns the identified product brand logo, with the associated bounding polybox.
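To make that list a little more concrete, here is a minimal Python sketch of how a developer might assemble a request that asks for several of these features at once. It only builds the JSON body and sends nothing. The feature type names and the request layout follow the public v1 REST format (`images:annotate`), which this article doesn’t describe and which may differ from the Limited Preview, so treat them as assumptions. The helper `build_annotate_request` and the short keys such as `"labels"` are made up for illustration.

```python
import base64
import json

# Short, hypothetical keys mapped to feature type names from the public
# v1 REST API (an assumption; the article itself does not list them).
FEATURE_TYPES = {
    "labels": "LABEL_DETECTION",
    "text": "TEXT_DETECTION",
    "safe_search": "SAFE_SEARCH_DETECTION",
    "faces": "FACE_DETECTION",
    "landmarks": "LANDMARK_DETECTION",
    "logos": "LOGO_DETECTION",
}


def build_annotate_request(image_bytes, features, max_results=5):
    """Return a JSON-serializable body asking for the given features."""
    unknown = [name for name in features if name not in FEATURE_TYPES]
    if unknown:
        raise ValueError(f"unknown feature(s): {unknown}")
    return {
        "requests": [
            {
                # The image travels inline as base64-encoded text.
                "image": {
                    "content": base64.b64encode(image_bytes).decode("ascii")
                },
                "features": [
                    {"type": FEATURE_TYPES[name], "maxResults": max_results}
                    for name in features
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real photo.
    body = build_annotate_request(b"fake image bytes", ["labels", "faces"])
    print(json.dumps(body, indent=2))
```

Bundling several features into one request means an app can get labels, faces and logos for a photo in a single round trip, rather than calling the service once per feature.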

Of course, detection and recognition are just one stage of developing tools. Using the APIs, developers can then teach various pieces of software to respond to these indicators with pre-programmed actions. And it’s not just onscreen software: it can also be software that controls a home-made robot, like the Raspberry Pi demo robot.
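As a rough illustration of what “respond with pre-programmed actions” could look like, the Python sketch below turns one annotation result into a simple command. The field names (`faceAnnotations`, `joyLikelihood`, `labelAnnotations`, `description`, `score`) and the likelihood values follow the public v1 REST response format, which is an assumption here since the article doesn’t show a response. The action strings (`"approach"`, `"say:..."`, `"idle"`) are hypothetical, and this is not the code behind Google’s demo.

```python
# Likelihood values in ascending order, as used by the public v1 API
# (an assumption; not taken from the article).
LIKELIHOOD_ORDER = [
    "UNKNOWN",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]


def is_at_least(likelihood, threshold="LIKELY"):
    """True if `likelihood` ranks at or above `threshold`."""
    return LIKELIHOOD_ORDER.index(likelihood) >= LIKELIHOOD_ORDER.index(threshold)


def choose_action(response):
    """Map one annotation result (a dict) to a hypothetical action string."""
    faces = response.get("faceAnnotations", [])
    # A probable smile takes priority over anything else in the frame.
    if any(is_at_least(face.get("joyLikelihood", "UNKNOWN")) for face in faces):
        return "approach"
    labels = response.get("labelAnnotations", [])
    if labels:
        # Otherwise name the most confident label.
        top = max(labels, key=lambda label: label.get("score", 0.0))
        return f"say:{top['description']}"
    return "idle"


if __name__ == "__main__":
    sample = {
        "labelAnnotations": [
            {"description": "banana", "score": 0.93},
            {"description": "fruit", "score": 0.88},
        ]
    }
    print(choose_action(sample))
```

Keeping the decision in one small function separates “what the API saw” from “what the device does”, so the same logic could drive an app notification or a motor controller.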

In the video below, Google shows off how the robot can recognize objects and say what they are, or move towards someone when they smile, because it recognizes expressions. This is just a basic demo, but the imagination runs wild with what could be achieved in the future. Personally, I’m thinking of a Baymax-style inflatable robot that helps take care of people, but that’s just because I watch too many kids’ movies.


Big companies are already using it. One is AeroSense, a Sony Mobile subsidiary, which uses the API to organize photos taken by its drones. If you want to sign up to join the Limited Preview, you can do so here.
