The ALE Learnset Manager (ALM) is an application that uses a trainable engine, the Automated Learning Engine (ALE), to bring automation to document processing. ALM is a web-based administration client for capturing, preparing, and managing training documents to be learned through ALE.
You can also edit classes by adding and removing training documents to improve the performance of each learnset.
ALM Server provides REST endpoints for the ALE field extraction engine so that applications written in standard development languages, such as Java, C++, or C#, and compatible with HTTP web services, can send data to and receive data from ALM Server.
The ALM Server's approach to working with client applications is based on widely accepted standards. ALM Server uses a RESTful approach for Web services and HTTP/HTTPS transport for structured data exchange.
REST is an architectural style for message exchange that addresses the web as remote resources. In a RESTful application such as ALM Server, each URL points to a resource. This differs from SOAP, which exposes functionality as URL endpoints containing callable functions. Unlike SOAP applications, which are restricted to GET or POST operations, REST-based applications use a greater range of operations, including GET, POST, PUT, and DELETE.
RESTful applications are stateless, meaning no session state is stored on the server. The information required for a request is included in the request message itself. The client application can cache a resource representation, potentially improving application performance.
Note that the documentation does not always link to the return types that are used for requests. The types are mentioned in the description; check the Data Model section for details on the referenced type.
To connect to ALM Server, append ALM/Service to the server base URL, followed by /session and the user name used to create a session. The complete URL looks like this:

http://{serverName}:{portNumber}/ALM/Service/session/{userName}
To connect to the ALM server, send the following request:

POST /ALM/Service/session/{userName}

On a successful login, a session is created and a session ID is returned in the response. This session ID is used for subsequent calls to ALM Server. If the supplied user credentials are incorrect, authentication fails and the string not_authenticated is returned in the response.
To check whether the session has expired, the following request is sent:

GET /session/current

The X-CPTMS-Session header must be included in all subsequent calls: while the session is active, each client request header contains the key-value pair "X-CPTMS-Session": "<session ID>".
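As a minimal sketch of this connection flow, the following Java example (using the JDK's built-in HTTP client) logs in and then checks the current session. The host, port, and user name are placeholders, and it is assumed here that the response body carries the session ID as plain text; how the password is supplied is not shown in this flow.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AlmSession {
    public static void main(String[] args) throws Exception {
        // Placeholder base URL: http://{serverName}:{portNumber}/ALM/Service
        String base = "http://localhost:8080/ALM/Service";
        HttpClient client = HttpClient.newHttpClient();

        // POST /ALM/Service/session/{userName} creates a session.
        HttpRequest login = HttpRequest.newBuilder()
                .uri(URI.create(base + "/session/almUser"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        String sessionId = client.send(login, HttpResponse.BodyHandlers.ofString()).body();
        if ("not_authenticated".equals(sessionId)) {
            throw new IllegalStateException("Login failed");
        }

        // GET /session/current checks whether the session is still active;
        // the session id travels in the X-CPTMS-Session header.
        HttpRequest check = HttpRequest.newBuilder()
                .uri(URI.create(base + "/session/current"))
                .header("X-CPTMS-Session", sessionId)
                .GET()
                .build();
        System.out.println(client.send(check, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```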
A project in ALM is created using PUT /learnset/project. The request object has six parameters; for more details, refer to the Project type.
Once the project is created, a FieldDeclaration object with the field name "document_class_id" needs to be set for the created project ID.
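A hedged sketch of project creation follows. The JSON body mirrors the Project fields shown in the list-projects response below; the exact set of six accepted parameters is defined by the Project type, and the base URL and session ID are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AlmCreateProject {
    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8080/ALM/Service"; // placeholder
        String sessionId = "<session id from login>";      // placeholder

        // Body mirrors the Project fields from the list-projects response;
        // id and timestamps are server-generated and therefore omitted.
        String body = "{ \"name\": \"Project1\","
                + " \"usePositionalInformationForClassification\": false,"
                + " \"useUTF8\": true,"
                + " \"useClassificationParameter\": \"Y\","
                + " \"threshold\": 75,"
                + " \"distance\": 0 }";

        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(base + "/learnset/project"))
                .header("Content-Type", "application/json")
                .header("X-CPTMS-Session", sessionId)
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println("Created project: " + resp.body());
    }
}
```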
This action fetches a list of all projects in ALM using

GET /learnset/project/

The request object contains "application/json" as the content type in the header. An array of projects in JSON format is returned as a successful response from the ALM server.
You can append a random element to the request URL in order to prevent caching. Example: /learnset/project?_156578654332
The response JSON array containing the list of Project types has the following structure:

{
  "id": "485db5ca-80ce-4f3f-8d3d-d29dbb0becad",
  "name": "Project1",
  "lastModifiedAt": 1733359562000,
  "lastLearnedAt": 1733359562002,
  "usePositionalInformationForClassification": false,
  "useUTF8": true,
  "useClassificationParameter": "Y",
  "threshold": 75,
  "distance": 0
}
This action creates a new document class within a project using

PUT /learnset/project/{projectId}/class

The request object contains the following parameters and "application/json" as the content type in the header. The numeric ID of the new class is returned as the response.
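A hedged sketch of class creation; the body shown here is an assumption carrying only the DocumentClass name (per the Data Model, numTrainingDocuments is not supposed to be set when creating a class), and the connection values are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AlmCreateClass {
    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8080/ALM/Service";         // placeholder
        String sessionId = "<session id from login>";              // placeholder
        String projectId = "485db5ca-80ce-4f3f-8d3d-d29dbb0becad"; // placeholder

        // Assumed body: only the class label is supplied on creation.
        String body = "{ \"name\": \"Invoices\" }";

        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(base + "/learnset/project/" + projectId + "/class"))
                .header("Content-Type", "application/json")
                .header("X-CPTMS-Session", sessionId)
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println("New class id: " + resp.body());
    }
}
```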
This action fetches the field declarations for a project. The project-level fields are fetched from the ALM server using

GET /learnset/project/{projectId}/fields

The request object contains the following parameters and "application/json" as the content type in the header. It returns an array of FieldDeclaration objects as the response from the server. The response JSON array containing the list of FieldDeclaration types has the following structure:

{
  "fieldId" : 12345,
  "name" : "...",
  "type" : "...",
  "required" : true,
  "format" : "...",
  "constant" : "..."
}
This action explicitly sets the field declarations for a project. All the fields must be submitted: if the end user wants to add one new field, all previous fields plus the new one must be re-submitted. The array of fields always includes the following default entry:

[
  {
    "fieldId": 1,
    "name": "document_class_id",
    "type": "int",
    "format": null,
    "constant": "COMPANY",
    "required": true
  },
  {
    "fieldId": 2,
    "name": "Invoice_Date",
    "type": "date",
    "format": null,
    "constant": null,
    "required": false
  }
]

Depending on the number of fields, the fieldId is incremented for each subsequent field.
The request body contains an array of FieldDeclaration objects. The project-level fields are set using

POST /learnset/project/{projectId}/fields

The request body containing a collection of FieldDeclaration objects has the following structure:

{
  "fieldId" : 12345,
  "name" : "...",
  "type" : "...",
  "required" : true,
  "format" : "...",
  "constant" : "..."
}
This action fetches all classes for a specific project. The request object contains the following parameters and "application/json" as the content type in the header. It returns an array of DocumentClass objects in the response. The list of classes is retrieved using

GET /learnset/project/{projectId}/class

On a successful request, the response containing the JSON array of DocumentClass objects has the following structure:

{
  "id" : 12345,                   // The id of the document class; this id is unique within a project
  "name" : "...",                 // Name of the document class. This is just a label that is supposed to help identify classes in a UI. It is not used by ALE itself.
  "numTrainingDocuments" : 12345  // The number of training documents currently available for this class. Provided when retrieving a class or a list of classes from the server; not supposed to be set when creating a new class.
}
This action fetches all class field declarations of the project. The request object contains the following parameters and "application/json" as the content type in the header. It returns an array of ClassFieldDeclaration objects as the response. The list of class field declarations is retrieved using

GET /learnset/project/{projectId}/classfields

On a successful request, the response containing the JSON array of ClassFieldDeclaration objects has the following structure:

{
  "docClassId" : 12345,  // Id of the document class
  "fieldId" : 12345,     // Id of the field that is assigned to the document class
  "name" : "...",        // Name used for this class; if none is assigned, the name from the field declaration is used
  "projectId" : "..."    // The id of the project
}
This action determines whether the ALM project is learnable using

GET /learnset/project/{projectId}/isLearnable

The request object contains the following parameters and "application/json" as the content type in the header. It returns a Boolean value indicating whether the project can be learned without error.
This action adds training documents from a zip file. The zip file must contain an image or PDF, a .pos file, and an .ival file for each document. If one of those files is missing, the document is skipped. A field declaration is generated based on the values available in the .ival files. Missing classes are also created.
The request body contains a multi-part form data object. The request object contains the following parameters and "multipart/form-data" as the content type in the header. The response contains the status information for each document, as a collection of DocumentImportStatus objects. The request call to the server happens using

POST /learnset/project/{projectId}/docs

On a successful request, the response containing the JSON array of DocumentImportStatus objects has the following structure:

{
  "name" : "...",      // The base name of the file
  "errorCode" : 12345, // The error code for the file: 0 if the import succeeded, or a bitwise combination of the available error code values
  "docId" : "..."      // The id under which the document was stored
}
In ALM, document-related data, such as images and OCR information, can consume substantial space in the database. To optimize storage, it is essential to periodically remove the oldest documents (that is, the earliest added training documents) from the learnset. APIs are provided for this purpose. To delete the oldest documents from a project:

DELETE /learnset/project/{projectId}/doc/oldest/{noOfDocToBeDeleted}

To delete the oldest documents from a specific class within a project:

DELETE /learnset/project/{projectId}/class/{classId}/doc/oldest/{noOfDocToBeDeleted}

Both request objects contain "application/json" as the content type in the header, and both APIs return a response upon successful execution.
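A sketch of the project-level cleanup call, deleting the 100 oldest training documents (all connection values are placeholders):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AlmDeleteOldest {
    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8080/ALM/Service";         // placeholder
        String sessionId = "<session id from login>";              // placeholder
        String projectId = "485db5ca-80ce-4f3f-8d3d-d29dbb0becad"; // placeholder

        // DELETE /learnset/project/{projectId}/doc/oldest/{noOfDocToBeDeleted}
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(base + "/learnset/project/" + projectId + "/doc/oldest/100"))
                .header("Content-Type", "application/json")
                .header("X-CPTMS-Session", sessionId)
                .DELETE()
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + ": " + resp.body());
    }
}
```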
This action fetches the fields of a training document using

GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/fields

The request object contains the following parameters and "application/json" as the content type in the header. It returns a collection of FieldInfo objects in the response. On a successful request, the response containing the JSON array of FieldInfo objects has the following structure:

{
  "fieldId": 12345,     // Id of the field as declared in the FieldDeclaration object
  "location" : {        // Location of the field value on the document; only needs to be set for limited learning
    "left" : 12345,
    "top" : 12345,
    "right" : 12345,
    "bottom" : 12345
  },
  "pageNumber" : 12345, // The field's page number
  "value" : "..."       // Value of the field; either set a value or set word indexes
}
This action creates a set of test documents based on the uploaded content. This can either be a single document (an image or PDF plus .pos file) or a zip file with multiple documents. The request object contains the following parameters and "application/json" as the content type in the header. The request body has the following details. The test set is created using

POST /learnset/project/{projectId}/testset

This returns the test document set ID in the response, with media type application/json.
This action configures the test sets based on the documents located in a given path or file share. The request object contains the following parameters and "application/json" as the content type in the header. The test sets are uploaded using

PUT /learnset/project/{projectId}/testset

The path parameter value is appended as a query string to the request URL:

path="<encoded_file_path>"

It returns the ID of the test set for the documents in the response.
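A sketch of configuring a test set from a file share; the share path is a placeholder, and the quoted-and-encoded query value follows the path="<encoded_file_path>" pattern above:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class AlmTestSetFromPath {
    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8080/ALM/Service";         // placeholder
        String sessionId = "<session id from login>";              // placeholder
        String projectId = "485db5ca-80ce-4f3f-8d3d-d29dbb0becad"; // placeholder

        // Encode the file-share path and wrap it in (encoded) quotes,
        // matching path="<encoded_file_path>".
        String path = URLEncoder.encode("\\\\fileserver\\testdocs", StandardCharsets.UTF_8);
        URI uri = URI.create(base + "/learnset/project/" + projectId
                + "/testset?path=%22" + path + "%22");

        HttpRequest req = HttpRequest.newBuilder()
                .uri(uri)
                .header("Content-Type", "application/json")
                .header("X-CPTMS-Session", sessionId)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        System.out.println("Test set id: " + HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```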
This action creates or updates a batch stream set that contains the training documents. The stream set has the same ID as the project and can be referenced in the stream set service or the field extractor service. The request body contains the documents to learn (a collection of DocumentAdapter objects). The request object contains the following parameters. The stream set is updated using

HEAD /learnset/project/{projectId}/updateStreamSet
This action involves the following chain of events:
POST /learnset/project/{projectId}/class/{classId}/doc
A field extractor can be trained either by passing a list of documents to learn or by learning all documents from a batch stream set.
To train an extractor by passing a list of documents:
POST /extractor/{id}/fields
POST /extractor/{id}/learn
GET /extractor/{id}/file/extractor
To train an extractor using a batch stream set (see the sketch below):
PUT /streamset
POST /streamset/{id}/document
POST /extractor/{id}/fields
GET /extractor/{id}/streamset/{streamSetId}/learn
GET /extractor/{id}/file/extractor
You may also want to download the PTB and CBM files after adding documents to the stream set and upload them at a later time, rather than creating the stream set from scratch.
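A sketch of the stream-set training sequence above. The extractor/stream set ID, the request body placeholders, and the idea that the ID is passed as a query parameter on PUT /streamset are assumptions here:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class AlmTrainViaStreamSet {
    static final String BASE = "http://localhost:8080/ALM/Service"; // placeholder
    static final String SESSION = "<session id from login>";        // placeholder
    static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Small helper that sends one request and returns the response body.
    static String call(String method, String path, String body) throws Exception {
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(BASE + path))
                .header("Content-Type", "application/json")
                .header("X-CPTMS-Session", SESSION)
                .method(method, body == null
                        ? HttpRequest.BodyPublishers.noBody()
                        : HttpRequest.BodyPublishers.ofString(body))
                .build();
        return CLIENT.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String id = "myExtractor";                           // placeholder id
        String documentJson = "<DocumentAdapter JSON>";      // placeholder body
        String fieldsJson = "<FieldDeclaration array JSON>"; // placeholder body

        call("PUT", "/streamset?id=" + id, null);                     // create the stream set
        call("POST", "/streamset/" + id + "/document", documentJson); // add training documents
        call("POST", "/extractor/" + id + "/fields", fieldsJson);     // declare the fields
        call("GET", "/extractor/" + id + "/streamset/" + id + "/learn", null); // learn

        // Download the trained extractor file for later use.
        byte[] file = CLIENT.send(HttpRequest.newBuilder()
                        .uri(URI.create(BASE + "/extractor/" + id + "/file/extractor"))
                        .header("X-CPTMS-Session", SESSION).GET().build(),
                HttpResponse.BodyHandlers.ofByteArray()).body();
        Files.write(Path.of("extractor.bin"), file);
    }
}
```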
Field extraction can be done for a given document or for documents that are stored in a batch stream set.
To extract fields from a given single document:
POST /extractor/{id}/file/extractor
POST /extractor/{id}/extract
To extract fields from documents that are stored in a batch stream set (sketched below):
POST /extractor/{id}/file/extractor
GET /extractor/{id}/streamset/{streamSetId}/extract/{docNum}
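A sketch of stream-set extraction: upload a previously trained extractor file, then pull the extraction result for one document. The file name and IDs are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class AlmExtractFromStreamSet {
    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8080/ALM/Service"; // placeholder
        String sessionId = "<session id from login>";      // placeholder
        String id = "myExtractor";                         // placeholder
        HttpClient client = HttpClient.newHttpClient();

        // POST /extractor/{id}/file/extractor - upload the trained extractor file.
        HttpRequest upload = HttpRequest.newBuilder()
                .uri(URI.create(base + "/extractor/" + id + "/file/extractor"))
                .header("X-CPTMS-Session", sessionId)
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("extractor.bin")))
                .build();
        client.send(upload, HttpResponse.BodyHandlers.ofString());

        // GET /extractor/{id}/streamset/{streamSetId}/extract/{docNum}
        HttpRequest extract = HttpRequest.newBuilder()
                .uri(URI.create(base + "/extractor/" + id + "/streamset/" + id + "/extract/0"))
                .header("X-CPTMS-Session", sessionId)
                .GET()
                .build();
        System.out.println(client.send(extract, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```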
This action involves the following chain of events:

GET /learnset/project/{projectId}/isLearnable
HEAD /extractor/{id}
PUT /extractor?id={projectId}&persistent=true
HEAD /learnset/project/{projectId}/updateStreamSet

This creates or updates the batch stream set that contains the training documents. The stream set has the same ID as the project and can be referenced in the stream set service or the field extractor service.

POST /extractor/{id}/fields

where {id} denotes the project/extractor ID.

GET /extractor/{id}/streamset/{streamSetId}/learn

This trains a field extractor based on the documents that are stored in a stream set. Here {id} represents the project ID. The engine learns which fields to extract for each class, as defined by the passed documents. The state of the extractor is written to an extractor file that can be downloaded for future use. With the relearnable flag, the user indicates that the extractor enables the relearning of classes, which stores additional information in the connected extractor stream.
The request object contains the following parameters and "application/json" as the content type in the header. Projects can be learned using

POST /extractor/{id}/relearn

The request body contains the following structure:

{
  "id" : 12345,       // Document id
  "fileName" : "...", // File name where the document is located, if any
  "words" : [         // List of words with positioning information; collection of WordInfo objects
    {
      "pageNumber" : 12345,
      "word" : "...",
      "boundingBox" : { "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345 }
    },
    {
      "pageNumber" : 12345,
      "word" : "...",
      "boundingBox" : { "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345 }
    }
  ],
  "pages" : [         // List of pages; array of PageInfo objects
    {
      "rotationAngle" : 12345.0,
      "rotationOrigin" : 12345,
      "boundingBox" : { "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345 }
    },
    {
      "rotationAngle" : 12345.0,
      "rotationOrigin" : 12345,
      "boundingBox" : { "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345 }
    }
  ],
  "fields" : [        // List of fields (only required for learning, not for extraction); array of FieldInfo objects
    {
      "fieldId" : 12345,
      "location" : { "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345 },
      "pageNumber" : 12345,
      "value" : "..."
    },
    {
      "fieldId" : 12345,
      "location" : { "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345 },
      "pageNumber" : 12345,
      "value" : "..."
    }
  ],
  "companyFieldValue" : "..."
}
ALM is now able to aid document field extraction in no-touch mode by integrating with BFI. The invoice data is written to a set of staging tables in the ALM database. The data resides in these staging tables until a feedback loop is initiated. This feedback loop checks for any corrected fields and updates the learnset accordingly, using the configured settings and fields. Provisions also exist to remove the configured records from the staging tables by activating flags in the API.
The ALM database contains the tables TMPALMDOCUMENT and TMPALMFIELDS, which are referred to here as staging tables. The data held in the staging tables is only required temporarily and has no use once document processing is completed. Because the data includes image and OCR information, which can take a significant amount of space in the database, it is important that this data is cleaned up regularly.
This action adds a single training document to the staging tables by invoking

POST /no-touch-mode/learnset/add-staging-data
The request body contains the following fields:

- a TmpALMDocument object containing the document data (N.B.: if both of the image parameters are passed, only the value in documentImage is considered)
- a collection of TmpALMField objects; each object must contain the fields of the TmpALMField object in string format, and its Candidate objects must contain the candidate properties
Here is a sample request body for the action:

{
  "clientId": 12,
  "documentNumber": "SAMPLE_DOCUMENT",
  "tmpALMDocument": {
    "documentImage": [0, 0, 1, 0, 1, ...],
    "fileExtension": "TIF",
    "ocrText": "0,0,107,65,18,25,4\r\n1,0,125,65,9,25,/\r\n2,0,135,65,18,25,4\r\n...",
    "documentImageAsBase64String": "SUkqAAgAAAAUAP4ABAABAAAAAgAAAAABBAABAAAA..."
  },
  "tmpALMFields": [
    {
      "fieldName": "DUEDATE",
      "originalFieldName": "DueDate",
      "fieldType": "DATE",
      "exportValue": "2024-4-14",
      "candidates": [
        {
          "left": 602,
          "top": 224,
          "height": 36,
          "width": 200,
          "page": 0,
          "text": "4/14/2024",
          "confidence": 0.7
        },
        ...
      ]
    }
  ]
}
This action adds a single existing training document from the staging tables to the learnset using

POST /no-touch-mode/learnset/project/doc
The request body contains the following fields.
(N.B.: The class name is created from sourceName and sourceId in the format sourceName_sourceId. If that class already exists under the project, it is used; otherwise, a new class is created with that name.)
(N.B.: If retainALMTempRecord is true but the records are already older than the number of days given in the deleteAllIfDaysOlderThan parameter, the records are deleted; that is, deleteAllIfDaysOlderThan takes precedence over retainALMTempRecord.)
The DocumentHeaderField objects must contain the corrected header field name and field value.
Here is a sample request body for the action:

{
  "projectName": "ALM_PROJECT",
  "documentNumber": "SAMPLE_DOCUMENT",
  "sourceId": "ALM",
  "sourceName": "CLASS",
  "retainALMTempRecord": false,
  "deleteAllIfDaysOlderThan": 0,
  "maxClassDocumentCount": 0,
  "maxALMClasses": 0,
  "clientId": 2,
  "documentHeaderFields": [
    {
      "fieldName": "INVOICEDATE",
      "fieldValue": "2024-04-04"
    },
    ...
  ]
}
ALM Server does not accept all image file formats. When you upload files to ALM Server, ensure that only the supported types (.tiff, .jpg, and .png) are used. Apart from the aforementioned image file formats, the .pdf format is also supported.
Any instances of batch stream sets or field extractors that are created on the server are destroyed automatically after they have not been accessed for a default duration of 30 minutes. This is a server configuration and is denoted in alm.config.xml under the settings bean through the following entry:

<entry key="expirationTime" value="30"/>
Documents that are supposed to be used for training or extraction can always be passed in JSON format using the DocumentAdapter type with words, pages, and, for training, fields. As an alternative, .pos files and .ival files can be uploaded. A .pos file contains one line per word, using the following format:

wordIdx,page,left,top,width,height,word
.ival files are only required when creating a batch stream set for learning. They are used to provide field values for a given document. An .ival file contains one entry per line, using one of the following formats:

fieldname,type,value
fieldname<TAB>type<TAB>value

The user can also send fieldname, type, value, and boundingBox, using the following format:

fieldname<TAB>type<TAB>value<TAB>pageId,left,top,right,bottom

For example:

f1 int 4500022612 0,751,1246,980,1275
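To make the two file formats concrete, this small sketch writes a three-word .pos file and a matching one-field .ival file; the word data is borrowed from the OCR sample earlier in this document, and the file names are arbitrary:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class WriteLearnsetFiles {
    public static void main(String[] args) throws Exception {
        // .pos: one line per word - wordIdx,page,left,top,width,height,word
        Files.write(Path.of("sample.pos"), List.of(
                "0,0,107,65,18,25,4",
                "1,0,125,65,9,25,/",
                "2,0,135,65,18,25,4"));

        // .ival: one field per line - fieldname<TAB>type<TAB>value<TAB>pageId,left,top,right,bottom
        Files.write(Path.of("sample.ival"), List.of(
                "f1\tint\t4500022612\t0,751,1246,980,1275"));
    }
}
```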
The current released version is ALM 24.1.1. For more details on the version history, refer to the product Release Notes: https://support.hyland.com/search/all?filters=component~%2522ALE+Learnset+Manager%2522*prodname~%2522Brainware%2522&content-lang=en-US
The resources use a data model that is supported by a set of client-side libraries that are made available on the files and libraries page.
There is a WADL document available that describes the resources API.
name | path | methods | description
---|---|---|---
BatchStreamSetService | | | Creation and management of batch stream sets.
FieldExtractorService | | | Training of field extractors and extraction of fields.
LearnSetManagerService | | |
LearnsetNoTouchModeService | | |
LearnsetSchedulerService | | |
type | description
---|---
BoundingBox | Container to carry the positional information of a word.
Candidate | Container to carry the candidate-related properties for inserting into the staging table.
ClassFieldDeclaration | Container to carry the information of fields of the document class.
DataCell | Simple container to carry an extracted data string for a single cell. Used by ExtractedData.
DocumentAdapter | Carrier for document information. When training an extractor, make sure to fill the word list, the page list, and the field list. When extracting fields from a document, you only need to fill the word list and page list.
DocumentClass | Information about a document class.
DocumentHeaderField | Container to carry the corrected header fieldName and header fieldValue as part of a request.
DocumentImportStatus | Container to carry the document import status information.
DocumentUploadStatus | Container to carry the uploading status information of the documents. It takes account of the number of total and imported documents and their status.
ExtendedClassFieldStatistics | Field statistics for a single class, covering how many values have been found at all for a field within that class and for how many of those values a target can be located.
ExtractedData | Contains the extraction result for a single field. There are usually multiple candidates, which are provided as a list of DataCell.
FieldDeclaration | Declaration of a field that can be extracted.
FieldInfo | Container to carry field information.
FieldLocations | Container to carry the information of word locations with respect to the field.
FieldStatistics | Basic field statistics, covering how many values exist for a given field in a project or class.
LearnsetDocumentAddRequest | Container to carry all the properties required to facilitate the request payload for adding documents from the staging tables to the learnset tables.
LearnsetSchedulerProperties | Container to carry the properties of a global scheduler.
PageInfo | Container to carry the page orientation and positional information.
Project | Container to carry the information of a project created in the ALM application.
StagingDataAddRequest | Container to carry all the properties required to facilitate the request payload for adding document data to the staging tables.
TmpALMDocument | Container to carry all the required properties for populating the TMPALMDOCUMENT table's data.
TmpALMField | Container to carry all the required properties for populating the TMPALMFIELDS table's data.
TrainingDocumentIncident | Description of a failed plausibility check on a training document.
TrainingDocumentMetaData | Metadata about a stored training document.
TrainingSetCheckResult | Result of a training set plausibility check, including found incidents and field statistics.
WordInfo | Container to carry word information.