As discussed recently on the blog, I have been on a journey to try and attain the Microsoft Certified Solutions Associate Certification in BI Reporting. I was very fortunate to overcome the final hurdle of this task by passing Exam 70-778: Analyzing and Visualizing Data with Microsoft Power BI the other day. I enjoyed the opportunity to dive deeper into the world of Business Intelligence, particularly given the enhanced role Power BI has within the Business Applications space today. With this in mind, and in the hopes of encouraging others, today’s post is the first in a new series of revision notes for Exam 70-778. I hope that you find this, and all future posts, useful as either a revision tool or as an introduction into the world of Power BI.
The first skill area of the exam concerns how to import from data sources, as described in the exam specification:
Connect to and import from databases, files, and folders; connect to Microsoft SQL Azure, Big Data, SQL Server Analysis Services (SSAS), and Power Query; import supported file types; import from other Excel workbooks; link to data from other sources
To begin with, I will provide a detailed overview of the topic areas covered above, before jumping into an example of how to import data into Power BI.
Supported Data Sources
The great benefit of Power BI is its huge list of supported connectors, which are integrated neatly within the application itself. The list of all possible data sources changes on a monthly basis, and it is impossible to go into detail on each one. Suffice it to say, you should at least be familiar with the following data sources:
- SQL Server (on-premises & Azure)
- SQL Server Analysis Services
- A wide range of vendor-specific Relational Database Management Systems (RDBMSs), such as Oracle, MySQL, PostgreSQL, SAP HANA
- Any data source that supports Open Database Connectivity (ODBC) or Object Linking and Embedding, Database (OLE DB)
- The following flat file types:
- Excel (.xlsx)
- Text (.txt)
- Comma Separated Value documents (.csv)
- Extensible Markup Language (.xml)
- Web sources, such as Web pages or OData Feeds
Some RDBMS vendor solutions have a requirement to install additional software, which will enable you to interact with that particular data source. You should check the relevant documentation for each vendor to verify any specific requirements.
Power BI also supports a wide range of Microsoft and third-party applications, such as Dynamics 365 Customer Engagement, SharePoint, Google Analytics & Salesforce. If you are feeling particularly technical, then you can also use the Blank Query option to, in theory, connect to any data source of your choosing or even go as far as building custom connectors yourself to interact with a specific application.
Bulk File Loading
As well as supporting connections to single flat files, it is also possible to interact with multiple files existing in the same location. This feature can be useful if, for example, there is a requirement to process hundreds of .csv files with different data, but the same overall structure. The supported bulk file locations are:
- Windows file system folder
- SharePoint document folder
- Azure Blob Storage
- Azure Data Lake Storage
When loading multiple files into Power BI, you can not only read the contents of each file but also access file-level metadata, such as the file name, extension and created/modified dates.
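The idea of combining many same-shaped files while keeping hold of their file-level metadata can be sketched outside of Power BI. The Python below is only an illustration of the concept, not how Power BI implements it; the function name load_folder and the Source.* column names are made up for this example.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def load_folder(folder: Path, pattern: str = "*.csv") -> List[Dict[str, str]]:
    """Combine the rows of every matching file, tagging each row with file metadata."""
    combined: List[Dict[str, str]] = []
    for path in sorted(folder.glob(pattern)):
        stat = path.stat()
        # File-level metadata, gathered once per file (column names are hypothetical).
        metadata = {
            "Source.Name": path.name,
            "Source.Extension": path.suffix,
            "Source.DateModified": datetime.fromtimestamp(
                stat.st_mtime, tz=timezone.utc
            ).isoformat(),
        }
        with path.open(newline="", encoding="utf-8") as handle:
            # Every file is assumed to share the same header row.
            for row in csv.DictReader(handle):
                combined.append({**metadata, **row})
    return combined
```

Calling load_folder(Path("some/folder")) returns one dictionary per data row, each carrying the name, extension and modified date of the file it came from alongside the file's own columns.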
Import vs DirectQuery
An important design decision when working with data sources concerns the data connectivity mode to be used. Your final choice will generally fall into one of two options:
- Import: When connecting to your data source, Power BI takes a copy of all data and stores it within your report. By implication, this places additional pressure on your local machine's disk space and memory consumption. Import is the default option for most data sources and, to ensure that your data remains consistently up to date when deployed to the Power BI service, you can define your data refresh frequency – up to 8 times a day for Power BI Pro and 48 times a day for Power BI Premium subscriptions. Import is the most sensible option to choose when there is no requirement for regular refreshing of your data sources or if performance concerns arise when using…
- DirectQuery: Instead of taking a snapshot of the data, Power BI will read the data at source and store only the schema of the data within the model. At the time of writing this post, only a select number of mostly SQL-based data sources are compatible with this feature. DirectQuery is your best choice when there is a need to keep reports continually up to date, and when your target data source is sufficiently beefed up to handle frequent requests. It’s also worth bearing in mind the following points when evaluating DirectQuery:
- Traditionally, DirectQuery only supported a single data source connection for the entire model, with no option of defining additional sources. The release of composite models for DirectQuery removes this much-loathed limitation.
- There are limitations when it comes to data transformation options, especially for non-Microsoft data sources.
- Some query types will be unsupported.
- For data modelling using DAX, there are some crucial limitations. For example, Measures that use the SUMX & PATH functions (or their related counterparts) are not allowed.
You should also be aware of a third option – Live Connection – which behaves similarly to DirectQuery but is for SQL Server Analysis Services only. This option has the following limitations:
- Not possible to define relationships.
- No possibility to transform data from within Power BI.
- Data modelling options, except for Measure creation, are almost non-existent.
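To make the difference between Import and DirectQuery more concrete, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for the data source. It is an analogy only and says nothing about Power BI's internals; the class names ImportModel and DirectQueryModel and the sales table are invented for the example.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, float]


class ImportModel:
    """Import mode: copy every row at refresh time and answer from the copy."""

    def __init__(self, source: sqlite3.Connection) -> None:
        self._source = source
        self._snapshot: List[Row] = []
        self.refresh()

    def refresh(self) -> None:
        # Take a fresh snapshot of the whole table.
        self._snapshot = self._source.execute(
            "SELECT id, amount FROM sales"
        ).fetchall()

    def total(self) -> float:
        return sum(amount for _, amount in self._snapshot)


class DirectQueryModel:
    """DirectQuery mode: hold no data and query the source on every request."""

    def __init__(self, source: sqlite3.Connection) -> None:
        self._source = source

    def total(self) -> float:
        (value,) = self._source.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales"
        ).fetchone()
        return value


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
source.executemany("INSERT INTO sales (amount) VALUES (?)", [(10.0,), (20.0,)])

imported = ImportModel(source)
direct = DirectQueryModel(source)

# The source changes after the import has taken place.
source.execute("INSERT INTO sales (amount) VALUES (?)", (5.0,))

print(imported.total())  # 30.0: the snapshot is stale until refreshed
print(direct.total())    # 35.0: read at source, so the new row is visible
imported.refresh()
print(imported.total())  # 35.0: the refresh brings the copy up to date
```

The trade-off mirrors the one described above: the imported copy answers without touching the source but goes stale between refreshes, whereas the direct model is always current at the cost of a query against the source for every request.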
Importing Excel Workbooks
There are some aspects of working with Excel documents in Power BI that are worth further consideration. You mostly have two options at your disposal to consume Excel workbooks:
- Import Data: Similar to working with any other flat file source, data within each of the following Excel objects is importable into Power BI:
- You can see below how this looks for a file containing four worksheets:
- Import workbook contents: If you have built out a complex spreadsheet that utilises the full range of features available in the Excel Data Model, then it is also possible to import these into Power BI “as-is”. The following Excel Data Model features are exportable in this manner:
- Power Query queries
- Power Pivot Data Models
- Power View Worksheets
- (Most) Power View visuals; where a visual is unsupported in Power BI, an error appears on the appropriate visual.
Example: Importing SQL Server Database Data
What follows now is a typical data connection exercise in Power BI Desktop, which involves connecting to a SQL Server database. The experience described here is mostly similar for other data sources and, therefore, represents an optimal example to familiarise yourself with connecting to data sources in the application:
- Launch Power BI Desktop and, on the splash screen, select the Get data link on the left-hand pane:
- On the Get Data window, choose Database on the left-hand list, select SQL Server database and then press the Connect button:
- You will now be prompted to provide the following details:
- Server: This will be either the Fully Qualified Domain Name (FQDN) of the computer with a default SQL Server instance or the computer name and named instance name, separated by a backslash (e.g. MYSQLSERVER\MYSQLSERVERINSTANCE). In the example below, we are connecting to a default SQL Server instance on the computer JG-LAB-SQL.
- Database: If you already know the name of the database you want to access, you can type this here; otherwise, leave blank. In this example, we are connecting to the WideWorldImporters sample database.
- Data Connectivity mode: See the section Import vs DirectQuery above for further details. For this example, select the Import setting:
- There are also several additional options that are definable in the Advanced options area:
- Command timeout in minutes: Tells Power BI how long to wait for a query to finish running before throwing a timeout error.
- SQL statement: Specify here a native SQL statement that will return the objects/datasets that you require. This option can be useful if you wish to reduce the complexity of your model within Power BI or if there is a requirement to return data from a stored procedure.
- Include relationship columns: Enabling this setting will return a single column for each defined relationship which, when expanded, gives you the ability to add related column fields onto your table object.
- Navigate using full hierarchy: Enabling this will allow you to navigate through the database hierarchy using schema object names. In most cases, this should remain disabled, unless there are specific schema names in your dataset (like Application, Sales, Purchasing etc. in the WideWorldImporters database).
- Enable SQL Server Failover support: If enabled, then Power BI will take advantage of any failover capability set up on your SQL Server instance, re-routing requests to the appropriate location where necessary.
- Illustrated below are some example settings for all of the above. For this walkthrough, leave all of these fields blank and then press OK to continue.
- The Navigator window will appear, which will enable you to select the Tables or Views that you wish to work with in the model. Selecting any of the objects listed will load a preview in the right-hand window, allowing you to see a “sneak peek” of the schema and the underlying data. Tick the object Sales.CustomerTransactions and then press the Select Related Tables button; all other Tables that have a relationship with Sales.CustomerTransactions are then automatically included. Press Load when you are ready to import all selected table objects into Power BI.
- After a few moments, the Load window will appear and update accordingly as each table object gets processed by Power BI (exact times may vary, depending on the capabilities of the remote server and local machine). Eventually, when the window closes, you will see on the right-hand pane that all table objects have been loaded into Power BI and are ready to use for building out visualizations:
- At this stage, you would then look at loading up your imported objects into Power Query for fine-tuning. But that’s a topic for the next post 🙂
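The connection details gathered during the walkthrough can also be written down as data. The Python sketch below is purely illustrative and is not part of Power BI: SqlSourceSettings and split_server are hypothetical names. It splits a server name into host and instance, and renders the settings as a Power Query M Sql.Database expression, on the assumption that the Advanced options correspond to that function's CommandTimeout, Query, CreateNavigationProperties, HierarchicalNavigation and MultiSubnetFailover option fields.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def split_server(server: str) -> Tuple[str, Optional[str]]:
    # "HOST\INSTANCE" -> (host, instance); a default instance has no instance part.
    host, separator, instance = server.partition("\\")
    return host, (instance if separator else None)


@dataclass
class SqlSourceSettings:
    server: str
    database: str
    command_timeout_minutes: Optional[int] = None
    sql_statement: Optional[str] = None
    include_relationship_columns: bool = True   # assumed to be on by default
    navigate_full_hierarchy: bool = False
    enable_failover: bool = False

    def to_m(self) -> str:
        """Render the settings as a Power Query M expression (illustrative only)."""
        options = []
        if self.command_timeout_minutes is not None:
            # #duration(days, hours, minutes, seconds)
            options.append(
                f"CommandTimeout=#duration(0, 0, {self.command_timeout_minutes}, 0)"
            )
        if self.sql_statement:
            # M escapes a double quote inside a text literal by doubling it.
            escaped = self.sql_statement.replace('"', '""')
            options.append(f'Query="{escaped}"')
        if not self.include_relationship_columns:
            options.append("CreateNavigationProperties=false")
        if self.navigate_full_hierarchy:
            options.append("HierarchicalNavigation=true")
        if self.enable_failover:
            options.append("MultiSubnetFailover=true")

        arguments = [f'"{self.server}"', f'"{self.database}"']
        if options:
            arguments.append("[" + ", ".join(options) + "]")
        return "Sql.Database(" + ", ".join(arguments) + ")"


walkthrough = SqlSourceSettings(server="JG-LAB-SQL", database="WideWorldImporters")
print(split_server(walkthrough.server))
print(walkthrough.to_m())
```

With every Advanced option left blank, as in the walkthrough, the rendered expression carries only the server and database names; each option that is set adds one field to the trailing options record.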
Key Takeaways
- Power BI supports a broad range of database systems, flat file, folder, application and custom data sources. While it is impossible to memorise each data source, you should at least broadly familiarise yourself with the different types at your disposal.
- A crucial decision for many data sources relates to the choice of either importing a data source in its entirety or taking advantage of DirectQuery functionality instead (if available). Both routes have their own defined set of benefits and disadvantages. DirectQuery is worth consideration if there is a need to keep data regularly refreshed and you have no requirement to work with multiple data sources as part of your solution.
- Live Connection is a specific data connectivity option available for SQL Server Analysis Services. It behaves similarly to DirectQuery.
- It is possible to import an existing Excel BI solution into Power BI with minimal effort, alongside the ability to import standard worksheet data in the same manner as other flat file types.