
Overview


Businesses today are applying Optical Character Recognition (OCR) and document AI technologies to rapidly convert their large troves of documents and images into actionable insights. These insights power robotic process automation (RPA), knowledge mining, and industry-specific solutions. However, there are several challenges to successfully implementing these scenarios at scale.


 


The challenge


Your customers are global, and so is their content, so your systems should also speak and read international languages. Nothing is more frustrating than failing to reach your global customers because their native languages are not supported.


 


Secondly, your documents are large, with potentially hundreds or even thousands of pages. To complicate things, they mix print and handwritten text in the same document. To make matters worse, they contain multiple languages in the same document, possibly even in the same line.


 


Thirdly, you are a business that’s trusted by your customers to protect their data and information. If your customers are in industries such as healthcare, insurance, banking, and finance, you have stringent data privacy and security needs. You need the flexibility to deploy your solutions on the world’s most trusted cloud or on-premise within your environment.


 


Finally, you should not have to choose between world-class AI quality, world languages support, and deployment on cloud or on-premise.


 


Computer Vision OCR (Read API)


Microsoft’s Computer Vision OCR (Read) technology is available as a Cognitive Services Cloud API and as Docker containers. Customers use it in diverse scenarios on the cloud and within their networks to help automate image and document processing.


 


 


What’s New


We are announcing Computer Vision’s Read API v3.2 public preview as a cloud service and Docker container. It includes the following updates:



  • OCR for 73 languages, including Simplified and Traditional Chinese, Japanese, Korean, and several Latin languages.

  • Natural reading order for the text line output.

  • Handwriting style classification for text lines.

  • Text extraction from selected pages of a multi-page document.

  • Available as a Distroless container for on-premise deployment.


 


First wave of language expansion


With the latest Read preview version, we are announcing OCR support for 73 languages, including Chinese Simplified, Chinese Traditional, Japanese, Korean, and several Latin languages, a 10x increase from the Read 3.1 GA version. 


 


Thanks to Read’s universal model, you can extract text in these languages by calling the Read API without the optional language parameter. We recommend omitting the language parameter if you are unsure of the language of the input document or image at run time.
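Conversely, if you already know the input language, you can pass it explicitly. The following is a minimal sketch using the optional language query parameter (the region, key, and image URL are placeholders, and "fr" is just an example language code):

curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze?language=fr" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii '{"url":"<your image URL>"}'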


 


The latest Read preview supports the following languages:


English, French, Italian, German, Spanish, Portuguese, Dutch, Chinese Simplified, Chinese Traditional, Japanese, Korean, Czech, Hungarian, Polish, Swedish, Turkish, Danish, Norwegian, Cebuano, Fijian, Swahili (Latin), Uzbek (Latin), Zulu, Afrikaans, Albanian, Indonesian, Malay (Latin script), Filipino, Catalan, Galician, Basque, Haitian Creole, Irish, Javanese, Scottish Gaelic, Scots, Romansh, Luxembourgish, Occitan, Breton, Asturian, Neapolitan, Western Frisian, Corsican, Friulian, Manx, Kara-Kalpak, Gilbertese, Bislama, Kachin (Latin script), Khasi, Hani, Greenlandic, Tetum, Zhuang, Volapük, Interlingua, Kabuverdianu, Cornish, Hmong Daw (Latin), Inuktitut (Latin), K’iche’, Yucatec Maya, Estonian, Finnish, Slovenian, Kashubian, Kurdish (Latin), Tatar (Latin), Crimean Tatar (Latin), Chamorro, Upper Sorbian, and Walser.


For example, once you have created a Computer Vision resource, the following curl command calls the Read 3.2 preview with a sample image.


 


Make the following changes in the command where needed:



  1. Replace the value of <subscription key> with your subscription key.

  2. Replace the first part of the request URL (westcentralus) with the region in your own endpoint URL.


 

curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii '{"url":"https://upload.wikimedia.org/wikipedia/commons/thumb/a/af/Atomist_quote_from_Democritus.png/338px-Atomist_quote_from_Democritus.png"}'

 


The response will include an Operation-Location header, whose value is a unique URL. You use this URL to query the results of the Read operation. The URL expires in 48 hours.


 

curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: <subscription key>"
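Because Read is an asynchronous operation, you typically poll that URL until the status field reports succeeded (or failed). Here is a minimal shell sketch, assuming the jq tool is installed; the operation ID and key are placeholders as in the examples above:

# Poll the results URL until the Read operation completes
OPERATION_URL="https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}"
while :; do
  STATUS=$(curl -s "$OPERATION_URL" -H "Ocp-Apim-Subscription-Key: <subscription key>" | jq -r '.status')
  # status is notStarted, running, failed, or succeeded
  if [ "$STATUS" = "succeeded" ] || [ "$STATUS" = "failed" ]; then break; fi
  sleep 1
done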

 


 


Natural reading order output


OCR services typically return text lines in a fixed order. With the new Read preview, you can choose to get the text lines in the natural reading order instead of the default left to right and top to bottom ordering. Set the new readingOrder query parameter to “natural” for a more human-friendly reading order output, as shown in the following example.


 


The following visualization of the JSON-formatted service response shows the text line order for the same document. Note that the first column’s text lines are output in order before the second column’s, followed by the third column’s.


 


OCR Read order example
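Abridged, and with illustrative text values, the resulting lines array is ordered column by column rather than strictly top to bottom:

"lines": [
  { "text": "Column 1, line 1" },
  { "text": "Column 1, line 2" },
  { "text": "Column 2, line 1" },
  { "text": "Column 2, line 2" },
  { "text": "Column 3, line 1" }
]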


 


For example, the following curl code sample calls the Read 3.2 preview to analyze the sample newsletter image and return the extracted text lines in natural reading order.


 

curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze?readingOrder=natural" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii '{"url":"https://docs.microsoft.com/en-us/microsoft-365-app-certification/media/dec01.png"}'

 


The response will include an Operation-Location header, whose value is a unique URL. You use this URL to query the results of the Read operation.


 

curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: <subscription key>"

 


 


Handwriting style classification


When you apply OCR to business forms and applications, it’s useful to know which parts of the form contain handwritten text so that they can be handled differently. For example, the comments and signature areas of agreements typically contain handwritten text. With the latest Read preview, the service classifies text lines as handwritten style or not, along with a confidence score, for English and Latin languages only.


 


For example, in the following image, you see the appearance object in the JSON response with the style classified as handwriting along with a confidence score.


 


OCR handwriting style classification for text lines
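In JSON form, each classified line carries an appearance object along the lines of the following abridged sketch (the text and confidence values are illustrative, and exact field names may differ slightly across preview versions):

{
  "text": "Sample handwritten note",
  "appearance": {
    "style": {
      "name": "handwriting",
      "confidence": 0.96
    }
  }
}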


 


The following code analyzes the sample handwritten image with the Read 3.2 preview.


 

curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii '{"url":"https://intelligentkioskstore.blob.core.windows.net/visionapi/suggestedphotos/2.png"}'

 


The response will include an Operation-Location header, whose value is a unique URL. You use this URL to query the results of the Read operation.


 

curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: <subscription key>"

 


 


Extract text from select pages of a document


Many standard business forms have fillable sections followed by long informational sections that are identical across documents and document versions. At other times, you may want to apply OCR only to specific pages of interest for business reasons.


 


The following curl code sample calls the Read 3.2 preview to analyze a financial report PDF document with the pages input parameter set to the page range “3-5”.


 

curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze?pages=3-5" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii '{"url":"https://www.annualreports.com/HostedData/AnnualReports/PDF/NASDAQ_MSFT_2019.pdf"}'

 


The response will include an Operation-Location header, whose value is a unique URL. You use this URL to query the results of the Read operation.


 

curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: <subscription key>"

 


The following JSON extract shows the resulting OCR output for pages 3, 4, and 5 (the lines arrays are abridged here for brevity). You should see similar output for your own documents.


 

"readResults": [
      {
        "page": 3,
        "angle": 0,
        "width": 8.5,
        "height": 11,
        "unit": "inch",
        "lines": []
      },
      {
        "page": 4,
        "angle": 0,
        "width": 8.5,
        "height": 11,
        "unit": "inch",
        "lines": []
      },
      {
        "page": 5,
        "angle": 0,
        "width": 8.5,
        "height": 11,
        "unit": "inch",
        "lines": []
      }
]

 


 


On-premise option with Distroless container


 




 


The Read 3.2 preview OCR container provides:



  • All features from the Read cloud API preview

  • Distroless container release

  • Performance and memory enhancements


See Install and run the Read containers to get started and to find the recommended configuration settings.
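As a rough sketch, running the container locally looks like the following docker run command; treat the image tag and resource values as placeholders and take the exact values from the container documentation (Billing is your Computer Vision endpoint URI, and ApiKey is your key):

docker run --rm -it -p 5000:5000 --memory 16g --cpus 8 \
  mcr.microsoft.com/azure-cognitive-services/vision/read:3.2-preview.1 \
  Eula=accept \
  Billing=<your Computer Vision endpoint URI> \
  ApiKey=<subscription key>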


 


 


Get Started


Create a Computer Vision resource and try the Read 3.2 preview with your own documents and images using the examples above.