A common requirement as part of any cloud IT project may involve surfacing data from on-premise systems into, for example, a cloud Customer Relationship Management (CRM) application such as Dynamics 365. Meeting this requirement is often easier said than done. Often - and quite rightly so - on-premise database systems will be segregated away from any internet-facing endpoints, protected from access via complex firewall rules. While this can serve the organisation well in security audits or as part of a penetration test, it can often cripple any endeavour to start embracing cloud solutions. As such, efforts to adopt exciting new tools like the Power Platform, become virtually impossible, with on-premise data often remaining in their silos and not actively utilised by the business.
Recognising this need early on, Microsoft has made available several tools that can significantly simplify the process involved in putting on-premise data to work more effectively, in the new cloud-first world. At the time of writing this post, there are three ways in which you can achieve this requirement, all of which involve the installation of an application of some kind within your on-premise network and minimal configuration on your Office 365 / Azure Active Directory (AAD) tenant. This state of affairs is all well and good, but how do you go about figuring out the correct tool to use? What are their benefits and disadvantages? And what are some the things to watch out for when working with them for the first time? Today, I wanted to address some of these questions, as we take a more in-depth look at the features and usage cases behind the following on-premise integration applications - the On-Premise Gateway, the Self-Hosted Integration Runtime and, finally, Hybrid Connections.
On-Premise Gateway
Overview
Perhaps best referred to as the Power Platform gateway, given that it almost exclusively targets the services within this “family”, this will be the gateway that Power BI developers will have the most familiarity with. Available as an online download, once installed, you can use it to target your on-premise data sources as if they were cloud-based, by merely specifying the connection properties you would use when working with them on-premise. Data is then traversed through the gateway as part of a secure tunnel, often requiring minimal changes to network or firewalls (you can run a network test using the tool at any time to determine whether a change is needed). You can deploy the on-premise gateway can be in one of two ways:
- The recommended mode, which is selected by default during installation. This deployment mode makes the gateway instantly available to everyone on your Office 365 / AAD tenant and gives you access to the full range of features available.
- In personal mode, the gateway is scoped and usable only by the account you authenticate with during setup. No other user can interact with the data sources you create, and you are responsible for managing it from within the Power BI portal. This gateway mode is useful for when you need to set up a personal development environment, targeting Power BI development only. I would not recommend to use it for any other purpose, as it can become challenging to maintain and lead to performance issues when deployed out to the user’s machine. This mode also does not support Direct Query mode in Power BI.
Other than that, there’s very little else to say about it, which is perhaps a testament to how easy it is to put in place and manage on an ongoing basis.
Advantages
- Compatible with multiple cloud applications, namely, Power BI, Power Apps, Power Automate, Logic Apps and Azure Analysis Services.
- Supports multiple data sources and, with some additional configuration, custom connectors as well.
- Lightweight installation client.
- Can be managed from Office 365.
Disadvantages
- To use the gateway alongside Logic Apps or Azure Analysis Services, some additional setup is needed, requiring an Azure subscription.
- Does not support R scripts within Power BI, if deploying the gateway in personal mode.
- Some data sources are not explicitly supported - you can refer to the following article for further details.
- Cannot be used by any other Azure service, other than Logic Apps and Azure Analysis Services.
Conclusions
The on-premise gateway is the natural solution to turn to if your needs are focused solely within the Power Platform range of apps. It provides a fast, scalable and - ultimately - secure mechanism to expose out your on-premise data for multiple purposes. This includes for analysis within Power BI/Azure Analysis Services, utilisation as part of a Power Apps or to help automate a complex business process via Power Automate flows or a Logic Apps. Where the solution begins to lose some of its potential usefulness is if you are wanting to implement a genuine Extract, Transform & Load (ETL) process involving your on-premise data sources. In this situation, the next solution will almost certainly tick your boxes…
Self-Hosted Integration Runtime
Overview
The self-hosted Integration Runtime (IUR) is a specific application bound tightly to Azure Data Factory (ADF), a solution that you can best think of as the natural, cloud-based evolution of SQL Server Integration Services (SSIS). As well as providing a mechanism for organisations to execute their ADF pipelines within their environment, they also allow you to make connections to any supported on-premise resources that the IUR machine can access. Installed via a similar client to the on-premise gateway, administrators register their self-hosted IUR to a specific ADF resource on Azure; this can then be shared to others on the same subscription if required. Managed in a similar way to the on-premise gateway, from within the ADF interface, they provide the quickest and securest means of processing on-premise data stores as part of your ETL processes.
Advantages
- Lets you manage and scale performance of your ADF pipeline runs.
- Supports a far greater list of data sources compared to the on-premise gateway - full details can be viewed on this Microsoft Docs article.
- Easy to install and fully manageable from within Azure Data Factory. For example, you can automatically push updates to your integration runtime without even needing to logon to the machine in question.
- Multiple IUR’s can be installed on a single machine, targeting different resources, and you can also share an IUR across one or several ADF resources.
Disadvantages
- Although possible in practice, Microsoft recommends not to install the IUR on the same machine as an on-premise gateway; in this scenario, it would be necessary to have two separate Windows machines available, which can lead to additional complexity/cost.
- Does not currently support the ability to execute a data flow on-premise.
- The IUR is limited to use within a Copy Data Activity only.
- Cannot be used with other Microsoft Azure / Power Platform services.
Conclusions
Targeted towards a specific usage case, the self-hosted IUR achieves its purpose remarkably well. It significantly reduces the barrier of adoption for ADF, following the same principals as the on-premise gateway when it comes to its deployment mechanism. As such, it really can be remarkably effective when building out a genuinely cloud-based ETL solution, that can leverage the additional benefits ADF can deliver - a subject which I have hammered on about on the blog plenty of times previously. However, it’s greatest strength is also its greatest weakness - as a solution tied so closely together with ADF, it is impossible to use with other Azure or Power Platform resources, such as Logic Apps or Azure Functions. This circumstance limits the runtime in several ways, and also seems somewhat baffling, when I understand that both the on-premise gateway and IUR share almost the same code base. Merging both of these tools in the future would, in my view, help organisations to more readily consider ADF as a possible solution and reduce the complexity of any solution involving elements of the Power Platform alongside ADF.
Hybrid Connections
Overview
The final type of gateway is technically not a gateway but termed as a connection. Despite this, it still involves the installation of an on-premise application, that you then register to Microsoft Azure in a similar way to the other tools discussed so far. Compatible with Azure App Service and Function apps, they provide a mechanism to access local network resources, targeting any potential environment. For example, a developer can connect to an on-premise SQL Server database using Hybrid Connections. Once configured and exposed, via the appropriate network address and port, the web apps connection string can then reference the local server name and port number as if it were being connected to locally, from the Hybrid Connection Manager machine. Due to its configuration options, developers have a high degree of control over which network resources they can interact with; provided that the appropriate destination and port number is contactable through Hybrid Connection manager, it is a valid endpoint for connections. As such, it represents the most powerful of all the gateway applications discussed so far but designed for meeting very bespoke requirements.
Advantages
- Lightweight and straightforward client installation.
- Provides highly granular control over the on-premise network resources to expose out.
- As a network-level solution, it is agnostic towards your chosen language or app technology, thereby increasing its potential usability.
- As a metered service, you only pay for what you use…
Disadvantages
- …but if you are transferring terabytes of data per month, could cost you anywhere in the region of £700+/month to maintain.
- It is limited in scope to only Azure App Services and Azure Functions.
- Requires specific configuration to expose the ports/local addresses for access online.
- Hybrid Connections do not support Windows authentication for data sources, such as SQL Server.
Conclusions
Hybrid Connections is the gateway of choice for bespoke application developers. Unlike all previous connectors, which work within the confines of existing solutions firmly targeting core Microsoft products, Hybrid Connections don’t care what your app is doing or even which language/framework it is written in. Instead, it provides a highly configurable, simplistic means of accessing any on-premise resource, to achieve almost any conceivable task - whether it’s updating a database table, obtaining the contents of a local file or firing an HTTP request to an internal web application. However, since it doesn’t support Windows authentication, it could prove challenging to implement if you are targeting services such as SQL Server. Also, it’s granular approach to configuration could make it challenging to implement, without some trial and error involved.
Final Thoughts
Hopefully, this post has clarified and dispelled any confusion you may have regarding options for getting your on-premise data working within the Microsoft cloud. As always, you should evaluate and determine the precise nature of your business requirement, and align yourself towards the tool that is going to be the easiest to deploy and maintain. In most cases, the answer to this will be to use the on-premise gateway, given that it is the tool with the highest “spread” of potential compatible applications. The other connectors are more evidently tailored to situations where a degree of bespoke development is required for your solution and, although equally as straightforward to configure, comes with the additional baggage and technical complexity involved whenever you consider building a bespoke solution.