Azure Logic Apps and Copying Files to Azure Blob Storage

I've been reviewing different approaches to managing a file transfer task for my enterprise. A hardware device will automatically produce PDF business documents (let's say they are invoices for the purposes of the proof of concept), and these are saved into a network share. The business requires these to be available to our Azure-hosted application. The application knows what to do with them once they arrive in a staging area on Azure Blob Storage, so the solution simply has to identify when new invoices have been received, move them to blob storage, and then delete them from the network share.

On first investigation there are quite a few avenues that I can go down.
  1. Stand up an Azure BizTalk server. This can easily handle file transports, and if it can't write to blob storage directly, I'm pretty sure it can trigger a logic app that can.
  2. Use an Azure Data Factory copy activity.  This looks like it will do it, but I may have issues with removing the files from the source after successfully writing them to blob storage.
  3. Use a Logic App in conjunction with an on-premises data gateway to process newly created files, save them to blob storage and clean up the source directory afterwards.
  4. Use a Logic App in conjunction with an FTP server as long as I can get the hardware to output to a directory on the FTP server that will be accessed by the logic app.
  5. Just write a service to listen to the directory and use the Azure Storage SDK to write the file, then have it tidy up afterwards (sketched just below).
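For reference, a minimal sketch of what option 5 might look like, assuming the azure-storage-blob Python SDK, a connection string in an environment variable, and made-up values for the share path and container name:

```python
# Minimal sketch of option 5: poll the share, push new PDFs to blob storage, then delete them.
# Assumes the azure-storage-blob package, a connection string in AZURE_STORAGE_CONNECTION_STRING,
# and placeholder values for the share path and container name.
import os
import time
from azure.storage.blob import BlobServiceClient

SHARE_PATH = r"\\fileserver\invoices"   # hypothetical network share
CONTAINER = "staging"                   # hypothetical staging container

def run():
    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"])
    container = service.get_container_client(CONTAINER)

    while True:
        for name in os.listdir(SHARE_PATH):
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(SHARE_PATH, name)
            with open(path, "rb") as data:
                # overwrite=True keeps a re-run idempotent if a previous delete failed part way
                container.upload_blob(name=name, data=data, overwrite=True)
            os.remove(path)             # only tidy up the source once the upload has succeeded
        time.sleep(180)                 # poll every few minutes

if __name__ == "__main__":
    run()
```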
Not surprisingly, since we are talking about a hybrid networking scenario, the first four approaches involve providing tin or IaaS VMs in some shape or form.
  • Option 1 would require a full BizTalk license to run on physical tin, or an IaaS BizTalk license on a VM.
  • Options 2 and 3 require a data gateway.
  • With option 4 I can avoid the data gateway, but I am now standing up an FTP server.
  • Option 5 is one of those things where, as a developer, you know you could just do it (in about two hours), but with such a focus on PaaS services these days I'm not sure a happy-path quick win is robust enough, plus I still have to find somewhere resilient to host anything we write.
I'd love to use BizTalk, but I can't spend that much on a license, and I am working under the constraint of preferring PaaS solutions, so expanding my IaaS VMs with a complex platform like BizTalk is probably not something I will pursue. ADF is something I like: it's configuration based, and if it had an out-of-the-box ability to delete the files I would likely look at it further, but for now I'm discounting it.

That leaves me with the Logic Apps solutions, and a decision to either open up an FTP server or stand up a small VM as a data gateway. Since I don't have an FTP server handy, I'll go with option 3 - the network topology includes ExpressRoute, so my IaaS vNet is well connected to on-premises subnets, at least in terms of speed, and I won't have to manage FTP accounts or FTP infrastructure with this option. So enough talking already, let's get to work.

Implementing the Logic App

What I have to do is provision a logic app, a storage account, two API connections and one on-premises data gateway.

So, working backwards, the gateway is not dissimilar to the type of gateway I configured in this post - but as with many things in Azure it's not quite that simple. That gateway is Azure Data Factory specific, so it cannot be used here. A more generic on-premises data gateway exists, and this is the one that needs to be installed. I won't walk through the entire installation - you can read about that here. A successful installation will look like this:

Now I can go back to the portal and start standing up Azure services. Let's start by adding the storage account first; I configured this to be a general purpose account with locally redundant storage (LRS).
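The portal does this in a couple of clicks; purely for illustration, a rough equivalent using the azure-mgmt-storage Python SDK might look like the sketch below (the resource group, account name and region are placeholders I've made up):

```python
# Rough equivalent of the portal steps: a general purpose, locally redundant (LRS) storage account.
# Assumes the azure-identity and azure-mgmt-storage packages; the names and region are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

subscription_id = "<subscription-id>"
client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

poller = client.storage_accounts.begin_create(
    resource_group_name="rg-invoice-poc",    # hypothetical resource group
    account_name="invoicestagingpoc",        # hypothetical account name
    parameters={
        "location": "westeurope",            # placeholder region
        "kind": "Storage",                   # general purpose
        "sku": {"name": "Standard_LRS"},     # locally redundant
    },
)
print(poller.result().primary_endpoints.blob)
```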



Next I want to add my data gateway, so I search as below to find the correct resource.



I will point out that the installation will, by default, have added the gateway to your subscription's region unless you told it otherwise at the appropriate step. This tripped me up: when I went to add it on the Azure side, I was looking in the wrong region. I'll name it the same as the server.


As long as you track down the installation location, your on-premises data gateway will be added to the resource group and be ready for use in the logic app.


The two API connections handle authentication between the logic app connectors and the file system, and between the logic app connectors and blob storage. You can think of these like the bindings you would find in BizTalk, where the logic app connector is akin to the adapter. You can add them manually before configuring the logic app, or you can do it when prompted by the logic app designer. I left it to the logic app designer.





So let's look at the logic app. It took about 20 minutes to muddle through this. It uses a combination of the File System and Azure Blob Storage connectors.


This is fairly easy to understand:

  1. In the first step, the trigger looks in the root of the directory I configured the File System API connection with - if I had subdirectories they would be available here, but I don't in this scenario. It does this every 3 minutes and picks up any new files that have been added.
  2. In the second step, the path to the new file has been made available by the logic app as dynamic data, which means I can use it here to specify which file to grab. This is significant, as there is a difference between integrating an event relating to a file and integrating the file's content. Since we want to copy the file, I need to specifically tell the File System connector to go and get its content.
  3. In step 3 I write the content out to blob storage. I had to select a container in this action, so make sure you have created the container first. Next I was able to dynamically name the file and prefix it with a virtual directory of Accounts/Invoices/. If you are familiar with blob containers, you will know that there is no real directory structure, but you can create the effect of one by prefixing a path to the blob name (see the sketch after this list).
  4. Lastly, in step 4 I go back to the File System connector and select the delete file action, supplying the path.
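The virtual directory point in step 3 is worth a tiny illustration: there is no folder to create, the prefix is simply part of the blob name, and listing by prefix gives the folder-like effect. A sketch, assuming the azure-storage-blob Python SDK and made-up names:

```python
# Step 3's virtual directory: the "path" is just part of the blob name.
# Assumes the azure-storage-blob package; the connection string, container and file are made up.
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    "<connection-string>", container_name="staging")

with open("invoice-0001.pdf", "rb") as data:
    # No directory is created anywhere; the prefix simply becomes part of the blob's name.
    container.upload_blob(name="Accounts/Invoices/invoice-0001.pdf", data=data)

# The portal and tools render the prefix as a folder; listing by prefix gives the same effect.
for blob in container.list_blobs(name_starts_with="Accounts/Invoices/"):
    print(blob.name)
```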

If it has all worked well, drop a few files into the source directory to test, and within a few minutes the files will appear in Azure storage, as here:


There were no issues with processing multiple files, and no need to handle debatching in the logic app itself. The log shows that each file addition was treated as an individual trigger. I dropped the last two in at the same time, and two triggers were recorded.


All in, from building the VM for the data gateway, it took 90 minutes to have the solution ready, and that included learning on the fly and making some mistakes along the way. I think this was clearly quicker and cheaper than a BizTalk solution, simpler than an ADF solution, and as an example of serverless computing it shows just how much you can achieve in a relatively short timeframe.


Comments

  1. Wow! This is great to see some nice solutions around it. Thank you!

  2. What would be the estimated cost of setting this up on a per-month basis if we had, say, 100 users to service per day, each copying 10,000 files per day, and we were trying to store them in a blob storage account?

    Replies
    1. That's an interesting addition to the problem. If we reframe my scenario to start with the headline requirement to transfer 100 users * 10K docs = 1 million documents per day, every day for a month, that's a whole lot of (a) documents, (b) capacity, and potentially (c) data ingress if this is on-premise to cloud every month. The solution options may look different at those volumes. I wouldn't want to put a specific cost on it; it would take a bit of time to work it out accurately and there are a lot of unknowns. Mainly: how the source is connected to Azure at a network level (VPN, public web, ExpressRoute), which will influence data ingress costs; the size of the files to be transferred (which will influence storage costs), plus the need to come within Logic App message limits for a file transfer; how redundant this has to be (affects storage costs again); and how quickly the files need to be moved from the source (it's not a big cost, but if you have to check every 10 seconds versus every 5 minutes, the number of invocations per day will affect the cost). Logic app invocation, even at 31 million invocations a month, is not going to be unreasonable. I'm pretty confident the high-level solution would scale to the capacity you have outlined if the file sizes were OK - but for this you want to look at the yearly cost and then compare with other options. The savings here may not be on the nuts and bolts of the copy operations, but on the management and support side of the process. Hope this helps.
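    Purely to make the volumes in that reply concrete, a quick back-of-the-envelope calculation (assuming a 31-day month and the four steps this proof-of-concept logic app uses per run):

```python
# Back-of-the-envelope volumes from the reply above; the 31-day month and
# the four steps per run (the trigger plus three actions) are assumptions.
users = 100
files_per_user_per_day = 10_000
days = 31
steps_per_run = 4

files_per_day = users * files_per_user_per_day       # 1,000,000 files/day
runs_per_month = files_per_day * days                 # 31,000,000 trigger firings/month
step_executions = runs_per_month * steps_per_run      # 124,000,000 step executions/month

print(files_per_day, runs_per_month, step_executions)
```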

