Recently, a few of my clients have shown keen interest in Lambda Architecture for big data applications. What surprised me was how much of the feedback described it as a new paradigm that would change how big data applications are designed.
Well, Spark's evolution is making analytic applications that combine real-time and batch processing more popular, but the architecture itself has been practiced for years, at least at IBM. I personally feel that as Spark evolves and becomes successful, it will bring real simplicity to deploying Lambda Architecture and make it more popular still.
I want to share my experience using the IBM Big Data Reference Architecture for applications involving both batch and real-time processing. For a moment, replace Hadoop with a data warehouse database, and Lambda Architecture looks very familiar. One common use case we have been deploying for several years is in telco, where InfoSphere Streams picks up the incoming CDRs (call detail records), performs on-the-fly data conversion such as de-duplication, and writes the data into a warehouse, while in parallel processing the data for aggregation based on time, event, etc. to generate real-time views. Some of these real-time analyses (views) were synced to the data warehouse, which gives you the single serving layer.
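To make the telco flow concrete, here is a minimal sketch in plain Python. It is not InfoSphere Streams code; the record fields (`id`, `caller`, `timestamp`, `duration`), the function name `process_cdrs`, and the 60-second window are all assumptions chosen for illustration. It shows the two parallel outputs: the de-duplicated records bound for the warehouse, and a time-windowed aggregate that serves as the real-time view.

```python
from collections import defaultdict

def process_cdrs(cdrs, window_seconds=60):
    """Deduplicate CDRs, keep the cleaned records, and build per-window aggregates."""
    seen_ids = set()
    warehouse_rows = []  # stand-in for the warehouse table
    realtime_view = defaultdict(lambda: {"calls": 0, "total_duration": 0})

    for cdr in cdrs:
        # On-the-fly de-duplication: skip any record id already seen.
        if cdr["id"] in seen_ids:
            continue
        seen_ids.add(cdr["id"])
        warehouse_rows.append(cdr)

        # Time-based aggregation: bucket each call by caller and window start.
        window_start = cdr["timestamp"] - cdr["timestamp"] % window_seconds
        bucket = realtime_view[(cdr["caller"], window_start)]
        bucket["calls"] += 1
        bucket["total_duration"] += cdr["duration"]

    return warehouse_rows, dict(realtime_view)

# Hypothetical sample data; the second record is a duplicate of the first.
sample = [
    {"id": "a1", "caller": "555-0100", "timestamp": 0, "duration": 30},
    {"id": "a1", "caller": "555-0100", "timestamp": 0, "duration": 30},
    {"id": "a2", "caller": "555-0100", "timestamp": 45, "duration": 20},
    {"id": "a3", "caller": "555-0100", "timestamp": 70, "duration": 10},
]
rows, view = process_cdrs(sample)
```

In a real deployment the set of seen ids would need to be bounded (for example, by a time window), since an unbounded set grows with the stream.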
Later, as Hadoop became popular, we evolved the architecture to dump the original data to Hadoop while Streams processed it in parallel, writing some of the aggregated views to the warehouse or to Hadoop itself. Because InfoSphere Streams can read and write directly to HDFS as well as to any database, the deployment is simpler. It also keeps the code simple, because the batch re-computation can be done at the stream layer itself without writing separate code at the Hadoop layer.
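The point about not writing separate batch code can be sketched as follows. This is plain Python under stated assumptions, not Streams or Hadoop code: `aggregate`, `serving_view`, and the sample records are hypothetical names and data. The same aggregation function is applied both to the raw archive (the batch re-computation) and to recent records (the real-time path), and the two results are merged into one serving view.

```python
from collections import Counter

def aggregate(records):
    """Count calls per caller; shared by the batch and the streaming path."""
    return Counter(r["caller"] for r in records)

def serving_view(batch_view, realtime_view):
    """Merge the batch and real-time views into the single view that queries hit."""
    return batch_view + realtime_view

# Stand-in for the original data dumped to HDFS.
raw_archive = [{"caller": "A"}, {"caller": "B"}, {"caller": "A"}]
# Stand-in for records that arrived after the last batch re-computation.
recent = [{"caller": "A"}, {"caller": "C"}]

batch_view = aggregate(raw_archive)    # batch re-computation over the archive
realtime_view = aggregate(recent)      # same logic applied to fresh records
merged = serving_view(batch_view, realtime_view)
```

Because both paths call the same function, a change to the aggregation logic only has to be made once.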
I hope this gives existing Streams developers some perspective: they embraced the right technology and tools well ahead of time.
References:
Lambda Architecture Overview: http://lambda-architecture.net/
IBM Big Data Reference Architecture: http://channelbigdata.com/video/reference-architecture-for-big-data-and-the-data-warehouse-part-2