Exact Median calculation in Impala

If you have found this post, you have probably discovered that Cloudera’s Impala, Hive and Apache Spark offer little out-of-the-box support for calculating the exact Median of a column.

Unfortunately, Impala only offers a function that calculates an approximation of the Median: the APPX_MEDIAN function. In Hive, you can calculate the exact
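The following minimal Python sketch shows what an exact Median computes, as a contrast with APPX_MEDIAN. It is not Impala code and not necessarily the approach the full post takes. The function name `exact_median` is hypothetical, and the sketch assumes the common convention of averaging the two middle values when the number of rows is even.

```python
from typing import Sequence


def exact_median(values: Sequence[float]) -> float:
    """Return the exact median of `values` (hypothetical helper).

    Sort the data and take the middle value. For an even count,
    average the two middle values (an assumed convention).
    """
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    ordered = sorted(values)      # a full sort is what makes the result exact
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])                  # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2    # even count: mean of the two middles


print(exact_median([3, 1, 2]))     # 2.0
print(exact_median([7, 1, 3, 5]))  # 4.0
```

The full sort is the expensive part on a large column, which is why engines often expose an approximate function instead. A SQL workaround for the exact value has to reproduce the same ordering and middle-row logic.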

My experience from the Google Advanced Solution Lab (ASL) for Machine Learning

What is the ASL for AI/Machine Learning?

At the end of last year, I had the fortunate opportunity to join the people of Google at their EU HQ in Dublin. I spent four weeks at their Advanced Solution Lab (ASL), completely immersed in the field of Artificial Intelligence and Machine Learning. This was one of

Why there’s no time to wait to start protecting the AI’s mind

This is a personal copy of my Atos Ascent Blog post from:
The importance and popularity of artificial intelligence (AI) have grown considerably in recent years. However, successfully creating and applying AI models requires significant investment to get them right and to harvest the

Analysis design: A BIG question for Big Data

Nowadays more companies are testing out new Big Data concepts. During these experiments, questions may arise about how to approach the various types of analyses. In my work as a data scientist, I noticed one important question that influences how the analysis process is set up. This question also determines what the Data Analytics environment would look like.

Going beyond