The right data model for you

When you start a new Power BI report, you may run into some options that might confuse you. Maybe you already know what they are, but you don't really know when to use what? This is the blog post for you 🙂

Import

Import: The source database sends the data to Power BI

This option loads all the data into the Power BI report. This means that every calculation is performed by Power BI. If you save your report, you save all the data with it. This is something to keep in mind. All data is compressed to a factor around 10, but still, files might get big.

When publishing to www.powerbi.com, your data gets transferred to the cloud. Your reports / data models are limited to 1 gigabyte, and 10 gigabyte if you have premium. You will also need to refresh your data through Power BI. With a pro account you can refresh your data 8 times a day, 48 times if you have a premium license.

The ideal scenario for import is when you are a business user, creating self service BI reports. Typically you do not have a datawarehouse or your own Analysis Services cube, so Power BI is the work horse that has to do the heavy lifting, integrating multiple sources.

If you are a BI professional, import may be a good choice if you have a small data set, and setting up a datamart - analysis services chain just takes up more time than needed. You do get the fast in memory power Power BI (and Analysis Services) uses, so you normally can expect good performance.

Access to the data can be secured in www.powerbi.com. Be aware that if someone has access to the report file, they also have your data. And if user "A" creates the report based on a database, where he can see all data, and shares the report with user "B" who on that same database can only see a portion of that data, he will see all data because Power BI does not replicate that database security. You can set up row level security in the report if needed.

Direct query

Direct Query: data stays in source, only results get sent back

If you happen to have access to a database, it may be a good idea to connect using a direct query connection. Direct query means that once a report is shown, the data is queried directly at the source, giving you real time results. The data is not loaded into the report itself, meaning that the performance is determined by the ability at the source to run the queries. Power BI and Analysis Services have a fast in memory engine to make analysis fast. Be aware that not all source databases are that fast.

Since data is not loaded into the report, sharing the report file does not automatically mean sharing your data. The user opening the PBIX file will have to have access to the underlying database, or they will see nothing. When publishing the report to www.powerbi.com however it works a bit different. For the cloud to access your database, it will need a gateway if your database is not a Azure SQL database. The gateway uses a service account to query the database. That user will not impersonate the account viewing the report online, meaning that you may have to set up row level security in Power BI. The database security is not applied in this scenario.
So the user opening the pbix does not have access, but in www.powerbi.com, he can see the data no problem.

Microsoft says the following about this (https://docs.microsoft.com/en-us/power-bi/desktop-directquery-about):

When the data is imported, Power BI connects to the data source using the current user's credentials from Power BI Desktop, or the credentials defined as part of configuring scheduled refresh from the Power BI service. In publishing and sharing such a report, be careful to only share with users allowed to see the same data, or to define row-level security as part of the dataset. Ideally, because DirectQuery always queries the underlying source, this configuration would allow any security in that underlying source to be applied. However, currently Power BI always connects to the underlying source using the same credentials as would be used for import.

Until Power BI allows for the identity of the report consumer to pass through to the underlying source, DirectQuery offers no advantages for data source security.

Direct query may be a good choice if you need real time access to the source and your source database is optimized to allow for fast queries. Do note that direct query does not allow all DAX formulas.

Live connection

Live connection to Analysis Services


A third option is to connect to a Analysis Services model. We used to call this a cube, when the models were using the multidimensional model. This version of Analysis Services is not being used as much anymore. But the word "cube" is still recognized and used, even though the current models might not be modeled as star schema or snowflake.

In this example Power BI does not contain your data. It is kept in Analysis Services. The technology of Power BI import and Analysis Services is exactly the same. They both keep the data in memory for fast results. The difference here is that the data model is separated from the report. This is an ideal solution for IT departments delivering reports to the business, as they can create one model for a lot of reports. Analysis Services in Azure also gives the option to scale out, allowing much larger models than Power BI can handle natively.

You will need a gateway if Analysis Services is on premise. The gateway user will need full access to Analysis Services and it will impersonate the user visiting the report. This means that the report visitor needs to have read rights to the data model. This is a big difference with direct query.

Live connection is a great idea if you have a need for large models. Or if you need to create models with advanced tooling like Visual Studio. Integration with source control is also much more easy, as the source file of a model is xml, whereas a Power BI report is a zip file.

At last, there is the matter of the Analysis Services model itself. The Power BI data model may be in import mode or direct query. When you realize that Analysis Services is basically the data model of Power BI removed from the report, you may also realize that Analysis Services itself can be in direct query or import mode! Most models tend to be in import mode, so one does not often come across the last scenario.

Live connection to a data model in direct query mode

In this case, both query and results are passed through Analysis Services

Like I said in the previous section, the data model can also be in direct query mode. Data is only stored in the source database. So what is the difference between this scenario and Power BI in direct query mode? Well the biggest issue with direct query in Power BI is that the gateway does not impersonate the user opening the report. With Analysis Services this is not the case; the gateway user does impersonate the user, making it more easy to create a single place to define your security.

Analysis Services acts like a virtualization layer. Otherwise there are not much differences. Dax is also limited in Analysis Services direct query mode.

Have anything to add? Let me know!

Principal BI consultant at Rubicon

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Next ArticlePower BI: total value above stacked column chart