“My hypothesis is that we can solve [the software crisis in parallel computing], but only if we work from the algorithm down to the hardware – not the traditional hardware first mentality.” – Tim Mattson

I wanted to start with this eloquent quote from Tim Mattson to highlight today’s most important need generated by Big Data and IoT: an efficient large data processing model. Parallel computing technologies are providing tangible answers to such problems, but are still quite complex and difficult to use.

On June 25, 2014, Google announced Cloud Dataflow, a large-scale data processing cloud solution. The technology is based on a highly efficient and popular model used internally at Google, which evolved from MapReduce and successor technologies such as Flume and MillWheel. You can use Dataflow to solve “embarrassingly parallel” data processing problems: those that can be easily decomposed into “bundles” and processed simultaneously.
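To make the idea of “bundles” concrete, here is a minimal sketch in plain Python. It is not the Dataflow SDK: the function names (`split_into_bundles`, `summarize_bundle`, `summarize`) are hypothetical, and a local thread pool stands in for the fleet of workers a managed service would provide. The point is only that each bundle is processed without looking at any other bundle, so the work can run simultaneously and the partial results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_bundles(records, bundle_size):
    """Cut a list of records into consecutive, independent bundles."""
    return [records[i:i + bundle_size] for i in range(0, len(records), bundle_size)]


def summarize_bundle(bundle):
    """Per-bundle work: it needs no data from any other bundle."""
    return (len(bundle), sum(bundle))


def summarize(records, bundle_size=1000, workers=4):
    """Process every bundle concurrently, then merge the partial results."""
    bundles = split_into_bundles(records, bundle_size)
    # Assumption: a thread pool is used here only to keep the sketch simple.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_bundle, bundles))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, total
```

Calling `summarize(list(range(10)), bundle_size=3)` splits ten records into four bundles and returns the combined count and sum. Because the bundles share nothing, the same structure carries over when the workers are separate machines rather than threads.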

Fortunately, I was whitelisted for the Dataflow private alpha release and was therefore able to get my hands on this very new Google Cloud service. I will not keep you in suspense any longer: I have been really impressed by the solution, and I am pleased to share five reasons why every business concerned with large-scale data processing should try Google Cloud Dataflow at least once:

1 – Simple: Dataflow has a unified programming model, relying on three simple components:


A – Pipelines – Independent entities that read, transform and produce data
B – Pipeline data – Input data that will be read by the pipeline
C – Pipeline transformations – Operations to perform on any given data

Building a powerful pipeline that processes 100 million records can be done in six lines of code.
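The three components listed above can be sketched in a few lines of plain Python. This is a toy model, not the Dataflow SDK: the `Pipeline` class, its `apply` and `run` methods, and the three transformation functions are all hypothetical names chosen for illustration. It only shows how a pipeline ties input data to an ordered chain of transformations.

```python
class Pipeline:
    """Toy stand-in for a pipeline: input data plus ordered transformations."""

    def __init__(self, data):
        self.data = data        # pipeline data: the input the pipeline reads
        self.transforms = []    # pipeline transformations, applied in order

    def apply(self, transform):
        """Register a transformation and return the pipeline for chaining."""
        self.transforms.append(transform)
        return self

    def run(self):
        """Feed the data through every transformation and return the output."""
        result = self.data
        for transform in self.transforms:
            result = transform(result)
        return list(result)


def parse(lines):
    """Turn raw text records into integers."""
    return (int(line) for line in lines)


def keep_even(numbers):
    """Filter step: drop odd values."""
    return (n for n in numbers if n % 2 == 0)


def square(numbers):
    """Map step: square each remaining value."""
    return (n * n for n in numbers)


# Build and run the pipeline in one chained expression.
output = Pipeline(["1", "2", "3", "4"]).apply(parse).apply(keep_even).apply(square).run()
print(output)  # [4, 16]
```

Each transformation takes the previous step's output and yields new data, so steps can be added, removed or reordered without touching the rest of the pipeline.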

2 – Efficient: I regularly run computations on large data sets. One such program, which generates statistics on more than 56 million records, took 7 to 8 hours on a 12-core machine, whereas it took about 30 minutes with Dataflow: roughly 14 to 16 times faster. It is also important to note that Dataflow provides a real-time streaming feature, allowing developers to process data on the fly.

3 – Open Source: Dataflow consists of two major components: an SDK and a fully managed cloud platform that manages resources and jobs for you. The Dataflow SDK is open source and available on GitHub, which means a community can quickly grow around it. It also means that you can easily extend the solution to meet your requirements.

4 – Integrated: Dataflow can natively read input from and write output to different locations: Cloud Storage, BigQuery and Pub/Sub. This integration can really speed up and ease adoption if you already rely on these technologies.

5 – Documented: The Dataflow documentation website is well stocked with theory and examples. If you are familiar with the pipeline concept, you should be able to run your own customized pipeline in less than an hour. Many valuable examples are also provided on the SDK GitHub page.

Dataflow is in alpha, but it already promises to be a strong player in the world of cloud parallel computing.

Stay tuned for more posts on this topic.

To read the original post and add comments, please visit the SogetiLabs blog: Google DataFlow: Get rid of your embarrassingly parallel problems


Jean-Baptiste Clion AUTHOR:
Jean-Baptiste Clion has been Google Practice Technical Lead for Sogeti Switzerland in Basel since 2013. In this position, he is in charge of all technical aspects regarding Google activities. This role demands strong research and innovation skills in order to design and develop cloud solutions matching customers’ requirements.

Posted in: Big data, Cloud, integration tests, Internet of Things, Transformation      


Wednesday 25th June 2014 – 10:55 AM PDT, Level 3 – Room 8 – Moscone Center, San Francisco.

The 2014 Google I/O keynote is about to end: world-famous investors, innovators, and technical journalists, joined by millions of YouTube viewers, are carefully listening to Sundar Pichai (Google Senior Vice President of Android, Chrome, and Apps) introduce a mysterious thing called Cardboard. If there was ever a time when virtual reality was reserved for the wealthy and privileged elite, then at this moment it just came to an end.

On Friday 25th July 2014, exactly one month later, I was finally able to get my hands on this enigmatic item. One burning question immediately came to mind: how could this simple piece of cardboard change the very closed world of VR (Virtual Reality)? Let’s explore Cardboard and figure out how this small item could have an impact on business life.

For those of you who haven’t heard of Google Cardboard, here’s a quick overview of what it is and what you can physically do with it.

Your first reaction might be: “Wahoo! I want one!”, followed quickly by: “Hey, wait a minute, is that a smartphone there?” The answer is: “Kind of”. Google Cardboard is not self-contained – the smartphone is not even the most important piece of it. If you really want to play with it, you had better have one of these devices: Google Nexus 4 and 5, Motorola Moto X, Samsung Galaxy S4 and S5, or Samsung Galaxy Nexus. The HTC One, Motorola Moto G, and Samsung Galaxy S3 are partially compatible. It will not work with any other device. So if you do not have such a device but are still keen to play around with Cardboard, now is the time to acquire one.

Once you have overcome the device issue, you are almost ready for your first Cardboard experience. Almost? Yes. To make the most of it, Google has developed the #cardboard app, available on Google Play. You might then be interested in knowing how to use your new gadget – Google has made this step really simple for users, with two actions:

-> Validate: Pull down the small ring on the left side of the box

-> Go back to home screen: Simply rotate your device 90°

That’s all you need to know. App installed and started? Then place your device in the Cardboard, and your first Google virtual reality encounter begins.


Trying Cardboard for the first time pretty much always has the same effect: the “Wahoo” effect. Your first contact with this new world will then be a menu of 7 apps:

– Tour Guide: Visit Versailles with your personal tour guide.

– Exhibit: Discover & inspect cultural items.

– Windy Day: Funny interactive animation, best for kids.

– Earth: Become Superman and fly all over the world!

– YouTube: Welcome to the next generation of the famous video sharing platform.

– Photo Sphere: Jump inside your own pictures (Photosphere).

– Street View: Want to visit Paris on a sunny day?


Okay, we get it. Google Cardboard is cheap, fun, and ingenious, but what about next steps? What about business applications? Well, based on private and business feedback, here are the potential applications:

Education: Visiting historical places, visualising molecular structures, exploring architectural blueprints, etc. Education is really the most promising sector for such technology. The low price is also a strong asset.

Property business: Want to rent out your house or flat? Then just build a virtual tour of your place and publish it on the web. Your prospects will then be able to visit their next living room without moving from their current sofa.

Corporate Presentation: Want to show that your company is on the leading edge of innovation? Then build a presentation of your business activities on Google Cardboard. This is sure to impress customers and investors.

Video Games: This industry has always tried to improve immersion in gaming. Cardboard is able to offer one of the most exciting ways to play – you are simply in the game.


Posted in: 4G, Innovation, mobile applications, mobile testing, Mobility, User Experience, Virtualisation      