Introduction
Azure Form Recognizer is an amazing Azure AI service for extracting and analyzing form fields from documents. One benefit of using Form Recognizer is the ability to create your own custom models based on documents specific to your business needs.
To create custom models, Azure provides Form Recognizer Studio, a web application that makes creating and training a custom model simple, without the need for an AI expert.
One common challenge customers face once they have more than a handful of models is applying to their AI models the same DevOps processes they already use to promote code changes from one environment to another.
This article demonstrates the use of Form Recognizer’s REST APIs to implement a CI/CD pipeline for model management.
The complete implementation is available on GitHub.
Why DevOps matters
When creating a custom model based on documents that match your business needs, your programmers and data scientists will go through multiple iterations of training, resulting in several different models in the development environment. Once they have a model that works well for the specific scenario, that model needs to be promoted to other environments.
While it is possible to copy the training dataset and retrain the model in each of the other environments, that process is cumbersome and can lead to missed labels and, in turn, less accurate models. A more effective approach is to treat the model like a source code artifact and use a DevOps pipeline to orchestrate its movement across the different environments, ensuring traceability and compliance. The diagram below explains the proposed implementation flow to achieve this result.
Our implementation
For our use case, let’s assume we have three environments (development, QA, and production), each in its own resource group. This approach would work even if the resource groups were in three different subscriptions.
The model is trained in the development environment, where data scientists and/or developers use the Form Recognizer Studio to label the documents in a storage account. The following steps describe that process in greater detail.
Start by moving the training dataset to a specific container in Azure Blob Storage.
In the Form Recognizer Studio, create a custom project and connect the Studio to the dataset you just uploaded.
When you train your model, be sure to save the model ID you provided or find the model from the list of models within the project. This model ID will be needed when it’s time to migrate the model to other environments.
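If the model ID was not written down, it can also be recovered from the resource itself. The sketch below is a minimal Python illustration, assuming the v3.0 REST API (`api-version=2022-08-31`) and its list-models response shape (a `value` array of objects with a `modelId`). The helper names `list_models_url` and `custom_model_ids` are hypothetical and not part of any SDK.

```python
from typing import Any, Dict, List

# Assumption: the v3.0 GA REST API version; adjust to the version your resource uses.
API_VERSION = "2022-08-31"


def list_models_url(endpoint: str) -> str:
    """Build the URL that lists the models available on a Form Recognizer resource."""
    return f"{endpoint.rstrip('/')}/formrecognizer/documentModels?api-version={API_VERSION}"


def custom_model_ids(list_response: Dict[str, Any]) -> List[str]:
    """Return the IDs of custom models from a parsed list-models response.

    Prebuilt models are skipped; their IDs start with "prebuilt-".
    """
    return [
        model["modelId"]
        for model in list_response.get("value", [])
        if not model["modelId"].startswith("prebuilt-")
    ]
```

A GET request to the URL returned by `list_models_url`, sent with the resource key in the `Ocp-Apim-Subscription-Key` header, yields the JSON document that `custom_model_ids` expects. Long lists are paged, so a complete implementation would also follow the `nextLink` field.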
Form Recognizer provides a model copy operation that starts by generating a copy authorization on the target resource, in this case the QA environment. This copy authorization is then passed to the source resource, here the Form Recognizer development resource, which executes the copy.
To orchestrate this set of actions, we created a simple GitHub Action. The API can be integrated into your CI/CD pipeline using REST or any of the language-specific SDKs. In this case, we created an Azure Function that uses the .NET SDK for Form Recognizer. This Azure Function provides multiple endpoints that are called from the GitHub Action.
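The two-step copy can be sketched directly against the REST API. This is a minimal Python illustration rather than the Azure Function used in this article. It assumes the v3.0 routes (`documentModels:authorizeCopy` on the target and `documentModels/{modelId}:copyTo` on the source, `api-version=2022-08-31`) and key-based authentication. `copy_model`, `urllib_send`, and the `send` callable shape are hypothetical names introduced for the example.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import quote

API_VERSION = "2022-08-31"  # assumption: v3.0 GA REST API

# Hypothetical transport shape: (method, url, headers, json_body)
# -> (status, lower-cased response headers, parsed JSON body).
Send = Callable[[str, str, Dict[str, str], Optional[dict]], Tuple[int, Dict[str, str], dict]]


def urllib_send(method: str, url: str, headers: Dict[str, str], body: Optional[dict]):
    """Standard-library transport; assumes responses are JSON or empty."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={**headers, "Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request) as resp:
            status, raw, resp_headers = resp.status, resp.read(), resp.headers
    except urllib.error.HTTPError as err:  # non-2xx responses land here
        status, raw, resp_headers = err.code, err.read(), err.headers
    parsed = json.loads(raw) if raw else {}
    return status, {k.lower(): v for k, v in resp_headers.items()}, parsed


def copy_model(send: Send, source_endpoint: str, source_key: str,
               target_endpoint: str, target_key: str,
               model_id: str, target_model_id: Optional[str] = None) -> str:
    """Copy a model between resources; returns the URL of the copy operation."""
    # Step 1: ask the TARGET resource (e.g. QA) for a copy authorization.
    auth_url = (f"{target_endpoint.rstrip('/')}/formrecognizer/"
                f"documentModels:authorizeCopy?api-version={API_VERSION}")
    status, _, authorization = send(
        "POST", auth_url, {"Ocp-Apim-Subscription-Key": target_key},
        {"modelId": target_model_id or model_id},
    )
    if status != 200:
        raise RuntimeError(f"authorizeCopy failed with HTTP {status}")

    # Step 2: hand that authorization to the SOURCE resource (e.g. DEV),
    # which performs the copy as a long-running operation.
    copy_url = (f"{source_endpoint.rstrip('/')}/formrecognizer/documentModels/"
                f"{quote(model_id, safe='')}:copyTo?api-version={API_VERSION}")
    status, resp_headers, _ = send(
        "POST", copy_url, {"Ocp-Apim-Subscription-Key": source_key}, authorization,
    )
    if status != 202:
        raise RuntimeError(f"copyTo failed with HTTP {status}")
    return resp_headers["operation-location"]
```

Calling `copy_model(urllib_send, dev_endpoint, dev_key, qa_endpoint, qa_key, "my-model-id")` would start the copy. The returned operation URL can then be polled with a GET until its `status` reaches a terminal value. Injecting `send` keeps the two-step logic testable without network access.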
The following diagram describes the GitHub Actions Orchestration.
- The developer moves all the documents needed to train the custom model into an Azure Storage account.
- The developer uses the Form Recognizer Studio to train the custom model in the development environment. Once the model is trained and the developer is satisfied with the model quality, the model ID is saved for use with the GitHub action.
- The developer initiates the GitHub action by providing the model ID from the previous step as an input parameter.
- The first step is to validate that the model ID exists in the DEV environment.
- If the model exists in the DEV environment, it will be copied to the QA environment.
- Now, a QA engineer can validate that the model produces the expected results.
- Once the QA tests are successful, an approver needs to approve the next job to promote the model to the production environment.
- The model is now copied to the production environment and is available for use.
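The control flow of these steps can be summarized in a few lines of Python. This is only a sketch of the ordering and the gates, not the GitHub Action itself. The three callables (`model_exists`, `copy_model`, `production_approved`) are hypothetical stand-ins for the validation call, the copy operation, and the manual approval. Copying to production from QA rather than from DEV is an assumption made here so that the promoted model is the one QA validated.

```python
from typing import Callable, List


def promote_model(
    model_id: str,
    model_exists: Callable[[str, str], bool],      # (environment, model_id) -> found?
    copy_model: Callable[[str, str, str], None],   # (source_env, target_env, model_id)
    production_approved: Callable[[str], bool],    # manual approval gate
) -> List[str]:
    """Run the promotion flow and return a log of the steps that completed."""
    steps: List[str] = []

    # Validate that the model ID exists in DEV before doing anything else.
    if not model_exists("dev", model_id):
        raise LookupError(f"model {model_id!r} not found in dev")
    steps.append("validated in dev")

    # Copy DEV -> QA so a QA engineer can test the model.
    copy_model("dev", "qa", model_id)
    steps.append("copied to qa")

    # Stop here unless an approver has signed off on production.
    if not production_approved(model_id):
        steps.append("awaiting production approval")
        return steps

    # Assumption: promote the QA copy, not the DEV original.
    copy_model("qa", "prod", model_id)
    steps.append("copied to prod")
    return steps
```

In a pipeline, each block would typically be its own job, with the approval gate enforced by the pipeline tool rather than by application code.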
Once the model is in production, you can use it within your applications. The following example demonstrates how the model is being used to analyze documents.
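A minimal Python sketch of such a call, assuming the v3.0 REST API (`documentModels/{modelId}:analyze`, `api-version=2022-08-31`) and a document reachable by URL. The names `analyze_document` and `field_contents` are hypothetical. `send` is a hypothetical callable wrapping whatever HTTP client you use, with the shape `(method, url, headers, json_body) -> (status, lower-cased headers, parsed JSON body)`.

```python
import time
from typing import Any, Callable, Dict, List
from urllib.parse import quote

API_VERSION = "2022-08-31"  # assumption: v3.0 GA REST API


def analyze_document(send: Callable, endpoint: str, key: str, model_id: str,
                     document_url: str, max_polls: int = 60,
                     poll_seconds: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Submit a document to a custom model and wait for the analysis result."""
    headers = {"Ocp-Apim-Subscription-Key": key}
    url = (f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
           f"{quote(model_id, safe='')}:analyze?api-version={API_VERSION}")

    # Analysis is asynchronous: the POST is accepted with 202 and the
    # Operation-Location header points at the result to poll.
    status, resp_headers, _ = send("POST", url, headers, {"urlSource": document_url})
    if status != 202:
        raise RuntimeError(f"analyze request failed with HTTP {status}")
    result_url = resp_headers["operation-location"]

    for _attempt in range(max_polls):
        status, _, body = send("GET", result_url, headers, None)
        if status != 200:
            raise RuntimeError(f"polling failed with HTTP {status}")
        if body.get("status") == "succeeded":
            return body["analyzeResult"]
        if body.get("status") == "failed":
            raise RuntimeError(f"analysis failed: {body.get('error')}")
        sleep(poll_seconds)  # still notStarted or running
    raise TimeoutError("analysis did not finish within the polling budget")


def field_contents(analyze_result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map each extracted document to {field name: raw text content}."""
    return [
        {name: field.get("content") for name, field in doc.get("fields", {}).items()}
        for doc in analyze_result.get("documents", [])
    ]
```

`field_contents` keeps only the raw text of each field. The full result also carries typed values and a confidence score per field, which an application would normally check before trusting an extraction.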
Here is the GitHub Action used in this sample. It consists of three jobs that invoke the Azure Function using a PowerShell script. As stated earlier, the Azure Function is optional; the same result could be achieved with HTTP requests sent directly to the Form Recognizer resources from your pipeline.
Here is the simple PowerShell script.
Conclusion
In this blog post, we explain the importance of implementing a DevOps practice around your custom models in Azure Form Recognizer. We provide an implementation and illustrate how easy it is to implement with the REST API.