As you might've noticed, I'm keeping an eye on Continuous Integration for Business Intelligence. Currently, I'm looking for a grad student to help me implement CI inside BI during their internship. As we're located in Holland, speaking Dutch is a prerequisite - but as a bonus, you'll be able to share your findings in English on msbiblog.com. The rest of this post will be in Dutch.
Case: we've integrated two customer sources. We want to add a third source.
Q: How do we know that our current integration and solutions will keep working while we integrate the new source?
A: Test it.
Q: How do we get faster deployments and more stability?
A: Automate the tests, so they can run continuously.
When integrating data, especially in agile environments, data we've already integrated is very likely to be integrated with yet more sources later. So WHY does automated testing happen so rarely in Data Warehouse projects?
Recently I saw this post from Davide Mauri (basically announcing his session "Reference Big Data Lambda Architecture in Azure" at SQL Nexus). Although I would've loved to attend SQL Nexus, I'm not able to do so this year 🙁 . The next best thing I could do was order and read Marz & Warren's "Big Data" (Manning) - and I did so immediately. Boy, what a read! Especially the first few chapters: well-written, concise, more data- than business-oriented (which is fine - after all, we're talking about architectures, not BI methodologies), and they explain the Lambda Architecture really well.
All this raised a question for me: is the - primarily Big Data-oriented - Lambda Architecture suitable for Data Warehouses too? A quest with a rather surprising outcome, IMHO - read on for the details!