Choosing the right tool

I once heard someone say:

if you use SSIS to do ETL, you are just not a good enough programmer in SQL.

Needless to say I had some thoughts about that! We had a discussion about ETL and what tool is the most suited for that task. In the end it came down to the fact that he was better at coding SQL.

There are a lot of reasons why you may prefer one tool over the other. But if your main reason is "I am better experienced with tool A than B", then I wonder if that is a good reason enough on it's own.

I prefer to first look at the right tool for the job, regardless of who is going to do it eventually.
Sure, if tool A is better for the job than tool B - and no one can learn tool A in time, and tool B does the work just fine - then sure, go for it. But the point is: be aware why  you chose the tool to do the job. You shouldn't be blinded by your own preference. When I ask why a tool was chosen, I often get this reason.

So the original question was: which is better? SSIS or SQL for ETL? Here's my two(3) cents:

  1. regardless of the decision, use SSIS to fire off SQL if you choose SQL over dataflows. You keep the visual overview and the ability to run stored procedures in parallel.
  2. loading dimensions parallel is a sure time saver. Not so easy to do in a CSP.
  3. It is much easier to change a source or a destination connection string in a SSIS project.

 

 

 

Principal BI consultant at Rubicon

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

3 Comments

  1. My usual reaction:
    picture the following use case. You need to fetch daily weather forecast data from a web service. On this data, you need to do some regex expressions to clean the data up.

    How are you going to do this in SQL?

  2. Jesse Gorter

    Jesse Gorter

    nice reaction, going to use that 🙂

    Not quite sure if you can set up a linked server between on premise SQL and Azure SQL db? If that's not possible then you might have to alter some stored procedures too if you move one layer to the cloud.

  3. First off all, nice to see you guys started a blog!

    Of course the answer to this question is "it depends". In my opinion you should use them both for the tasks they're good at. Another one to add to your 3 cents:
    - Join data sets in SQL Server rather than SSIS. SQL Server is extremely efficient (compared to SSIS) when it comes to data set joining and data filtering. Especially when you're working with large data sets, the difference in performance is significant.

Next ArticleA drive with Cortana