
Overview


Businesses today are applying Optical Character Recognition (OCR) and document AI technologies to rapidly convert their large troves of documents and images into actionable insights. These insights power robotic process automation (RPA), knowledge mining, and industry-specific solutions. However, there are several challenges to successfully implementing these scenarios at scale.


 


The challenge


Your customers are global, and so is their content, so your systems should also speak and read international languages. Nothing is more frustrating than failing to reach your global customers because their native languages are not supported.


 


Secondly, your documents are large, with potentially hundreds or even thousands of pages. To complicate things, they mix print and handwritten text in the same document. To make matters worse, they contain multiple languages in the same document, possibly even in the same line.


 


Thirdly, you are a business that’s trusted by your customers to protect their data and information. If your customers are in industries such as healthcare, insurance, banking, and finance, you have stringent data privacy and security needs. You need the flexibility to deploy your solutions on the world’s most trusted cloud or on-premise within your environment.


 


Finally, you should not have to choose between world-class AI quality, world languages support, and deployment on cloud or on-premise.


 


Computer Vision OCR (Read API)


Microsoft’s Computer Vision OCR (Read) technology is available as a Cognitive Services Cloud API and as Docker containers. Customers use it in diverse scenarios on the cloud and within their networks to help automate image and document processing.


 


 


What’s New


We are announcing Computer Vision’s Read API v3.2 public preview as a cloud service and Docker container. It includes the following updates:



  • OCR for 73 languages, including Simplified and Traditional Chinese, Japanese, Korean, and several Latin languages.

  • Natural reading order for the text line output.

  • Handwriting style classification for text lines.

  • Text extraction from selected pages of a multi-page document.

  • Available as a Distroless container for on-premise deployment.


 


First wave of language expansion


With the latest Read preview version, we are announcing OCR support for 73 languages, including Chinese Simplified, Chinese Traditional, Japanese, Korean, and several Latin languages, a 10x increase from the Read 3.1 GA version. 


 


Thanks to Read’s universal model, you can extract text in these languages by calling the Read API without the optional language parameter. We recommend omitting the language parameter if you are unsure of the language of the input document or image at run time.
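Conversely, if you already know the input language, you can pass it explicitly. The following is a minimal sketch using the optional language query parameter (the region, key, and image URL are placeholders, and "fr" is just an example language code):

curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze?language=fr" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii '{"url":"<your image URL>"}'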


 


The latest Read preview supports the following languages:


English, French, Italian, German, Spanish, Portuguese, Dutch, Chinese Simplified, Chinese Traditional, Japanese, Korean, Czech, Hungarian, Polish, Swedish, Turkish, Danish, Norwegian, Cebuano, Fijian, Swahili (Latin), Uzbek (Latin), Zulu, Afrikaans, Albanian, Indonesian, Malay (Latin script), Filipino, Catalan, Galician, Basque, Haitian Creole, Irish, Javanese, Scottish Gaelic, Scots, Romansh, Luxembourgish, Occitan, Breton, Asturian, Neapolitan, Western Frisian, Corsican, Friulian, Manx, Kara-Kalpak, Gilbertese, Bislama, Kachin (Latin script), Khasi, Hani, Greenlandic, Tetum, Zhuang, Volapük, Interlingua, Kabuverdianu, Cornish, Hmong Daw (Latin), Inuktitut (Latin), K’iche’, Yucatec Maya, Estonian, Finnish, Slovenian, Kashubian, Kurdish (Latin), Tatar (Latin), Crimean Tatar (Latin), Chamorro, Upper Sorbian, and Walser.


For example, once you have created a Computer Vision resource, the following curl command calls the Read 3.2 preview with a sample image.


 


Make the following changes in the command where needed:



  1. Replace the value of <subscription key> with your subscription key.

  2. Replace the first part of the request URL (westcentralus) with the region in your own endpoint URL.


 

curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii '{"url":"https://upload.wikimedia.org/wikipedia/commons/thumb/a/af/Atomist_quote_from_Democritus.png/338px-Atomist_quote_from_Democritus.png"}'

 


The response will include an Operation-Location header, whose value is a unique URL. You use this URL to query the results of the Read operation. The URL expires in 48 hours.


 

curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: <subscription key>"
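Because Read is an asynchronous operation, you typically poll that URL until the status field reports succeeded (or failed). Here is a minimal shell sketch, assuming the jq tool is installed; the operation ID and key are placeholders as in the examples above:

# Poll the results URL until the Read operation completes
OPERATION_URL="https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}"
while :; do
  STATUS=$(curl -s "$OPERATION_URL" -H "Ocp-Apim-Subscription-Key: <subscription key>" | jq -r '.status')
  # status is notStarted, running, failed, or succeeded
  if [ "$STATUS" = "succeeded" ] || [ "$STATUS" = "failed" ]; then break; fi
  sleep 1
done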

 


 


Natural reading order output


OCR services typically return text lines in a fixed order. With the new Read preview, you can choose to get the text lines in the natural reading order instead of the default left to right and top to bottom ordering. Set the new readingOrder query parameter to “natural” for a more human-friendly reading order output, as shown in the following example.


 


The following visualization of the JSON-formatted service response shows the text line order for the same document. Note that the first column’s text lines are output in order before the second column’s, followed by the third column’s.


 


OCR Read order example
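Abridged, and with illustrative text values, the resulting lines array is ordered column by column rather than strictly top to bottom:

"lines": [
  { "text": "Column 1, line 1" },
  { "text": "Column 1, line 2" },
  { "text": "Column 2, line 1" },
  { "text": "Column 2, line 2" },
  { "text": "Column 3, line 1" }
]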


 


For example, the following curl code sample calls the Read 3.2 preview to analyze the sample newsletter image and return the extracted text lines in natural reading order.


 

curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze?readingOrder=natural" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii '{"url":"https://docs.microsoft.com/en-us/microsoft-365-app-certification/media/dec01.png"}'

 


The response will include an Operation-Location header, whose value is a unique URL. You use this URL to query the results of the Read operation.


 

curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: <subscription key>"

 


 


Handwriting style classification


When you apply OCR to business forms and applications, it’s useful to know which parts of the form contain handwritten text so that they can be handled differently. For example, the comments and signature areas of agreements typically contain handwritten text. With the latest Read preview, the service classifies text lines as handwritten style or not, along with a confidence score, for English and Latin languages only.


 


For example, in the following image, you see the appearance object in the JSON response with the style classified as handwriting along with a confidence score.


 


OCR handwriting style classification for text lines
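In JSON form, each classified line carries an appearance object along the lines of the following abridged sketch (the text and confidence values are illustrative, and exact field names may differ slightly across preview versions):

{
  "text": "Sample handwritten note",
  "appearance": {
    "style": {
      "name": "handwriting",
      "confidence": 0.96
    }
  }
}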


 


The following code analyzes the sample handwritten image with the Read 3.2 preview.


 

curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii '{"url":"https://intelligentkioskstore.blob.core.windows.net/visionapi/suggestedphotos/2.png"}'

 


The response will include an Operation-Location header, whose value is a unique URL. You use this URL to query the results of the Read operation.


 

curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: <subscription key>"

 


 


Extract text from select pages of a document


Many standard business forms have fillable sections followed by long informational sections that are identical across documents and document versions. At other times, you may want to apply OCR only to specific pages of interest for business reasons.


 


The following curl code sample calls the Read 3.2 preview to analyze a financial report PDF document with the pages input parameter set to the page range “3-5”.


 

curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze?pages=3-5" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii '{"url":"https://www.annualreports.com/HostedData/AnnualReports/PDF/NASDAQ_MSFT_2019.pdf"}'

 


The response will include an Operation-Location header, whose value is a unique URL. You use this URL to query the results of the Read operation.


 

curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: <subscription key>"

 


The following JSON extract shows the resulting OCR output for pages 3, 4, and 5 (the lines arrays are abridged here for brevity). You should see similar output for your own documents.


 

"readResults": [
      {
        "page": 3,
        "angle": 0,
        "width": 8.5,
        "height": 11,
        "unit": "inch",
        "lines": []
      },
      {
        "page": 4,
        "angle": 0,
        "width": 8.5,
        "height": 11,
        "unit": "inch",
        "lines": []
      },
      {
        "page": 5,
        "angle": 0,
        "width": 8.5,
        "height": 11,
        "unit": "inch",
        "lines": []
      }
]

 


 


On-premise option with Distroless container


 




 


The Read 3.2 preview OCR container provides:



  • All features from the Read cloud API preview

  • Distroless container release

  • Performance and memory enhancements


See Install and run the Read containers to get started and to find the recommended configuration settings.
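As a rough sketch, running the container locally looks like the following docker run command; treat the image tag and resource values as placeholders and take the exact values from the container documentation (Billing is your Computer Vision endpoint URI, and ApiKey is your key):

docker run --rm -it -p 5000:5000 --memory 16g --cpus 8 \
  mcr.microsoft.com/azure-cognitive-services/vision/read:3.2-preview.1 \
  Eula=accept \
  Billing=<your Computer Vision endpoint URI> \
  ApiKey=<subscription key>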


 


 


Get Started


Create a Computer Vision resource and try the Read 3.2 preview with your own documents and images using the examples above.