The ALE Learnset Manager (ALM) is an application that uses the trainable Automated Learning Engine (ALE) to bring automation to document processing. ALM is a web-based administration client for capturing, preparing, and managing the training documents that ALE learns from.
You can also edit classes by adding and removing training documents to improve the performance of each learnset.
ALM Server provides REST endpoints for the ALE field extraction engine so that applications written in standard development languages, such as Java, C++, or C#, can use any HTTP-capable Web services stack to send data to and receive data from ALM Server.
The ALM Server's approach to working with client applications is based on widely accepted standards. ALM Server uses a RESTful approach for Web services and HTTP/HTTPS transport for structured data exchange.
REST is an architectural style for message exchange that addresses the web as remote resources. In a RESTful application such as ALM Server, each URL points to a resource. This differs from SOAP, which exposes functionality as URL endpoints containing callable functions. Unlike SOAP applications, which are restricted to GET and POST operations, REST-based applications use a greater range of operations, including GET, POST, PUT, and DELETE.
RESTful applications are stateless, meaning no session state is stored on the server. The information required for a request is included in the request message itself. The client application can cache a resource representation, potentially improving application performance.
However, the documentation does not always link to the return types that are used for requests. Where a type is mentioned only in a description, check the Data Model section for details on the referenced type.
To connect to ALM Server, append ALM/Service to the server base URL. The URL must also contain /session and the user name in order to connect to ALM Server and create a session. The complete URL therefore looks similar to http://{serverName}:{PortNumber}/ALM/Service/session/{UserName}
To connect to the ALM server, complete the following steps.
POST /ALM/Service/session/{User ID}
On a successful login a session is created and the session ID is returned in the response. This session ID is used for all subsequent calls to ALM Server.
If the supplied user credentials are incorrect, authentication fails and the string not_authenticated is returned in the response.
To check whether the session has expired, send the following request:
GET /session/current
The X-CPTMS-Session header is included in all subsequent calls: while the session is active, the client request header contains the key-value pair "X-CPTMS-Session": "{Session ID}".
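As an illustration, here is a minimal Python sketch of building the session URL and the header carried on subsequent calls. The server name, port, and session ID are placeholder values, and the helper names are not part of any ALM client library:

```python
from urllib.parse import quote

def session_url(server, port, user):
    # POST to this URL creates a session; the response body carries the session ID.
    return f"http://{server}:{port}/ALM/Service/session/{quote(user)}"

def session_headers(session_id):
    # Every subsequent request must carry the session ID in the X-CPTMS-Session header.
    return {"X-CPTMS-Session": session_id, "Accept": "application/json"}

url = session_url("almhost", 8080, "admin")
headers = session_headers("abc123")
```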
A project in ALM is created using PUT /learnset/project.
The request object has the following three parameters:
Once the project is created, a FieldDeclaration object with the field name "document_class_id" must be set for the created project ID.
This action fetches a list of all projects in ALM using GET /learnset/project/.
The request contains "application/json" as the Accept header. On success, ALM Server returns an array of projects in JSON format.
You can append a random element to the request URL to prevent caching, for example: /learnset/project?_156578654332
The response JSON array containing the list of Project types has the following structure:
{ "id" : "...", "lastLearnedAt" : 12345, "lastModifiedAt" : 12345, "name" : "...", "usePositionalInformationForClassification" : true, "useUTF8" : true }
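A short Python sketch of handling such a response body. The sample values below are illustrative, not real server output:

```python
import json

# Sample response body shaped like the Project structure above.
body = '''[
  {"id": "inv", "lastLearnedAt": 12345, "lastModifiedAt": 12345,
   "name": "Invoices", "usePositionalInformationForClassification": true, "useUTF8": true}
]'''

projects = json.loads(body)          # array of Project objects
names = [p["name"] for p in projects]
```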
This action creates a new document class within a project using PUT /learnset/project/{projectId}/class.
Request object contains the following parameters.
This returns the numeric ID of the new class as the response.
This action fetches the field declarations for a project.
The project level fields are fetched from the ALM server using GET /learnset/project/{projectId}/fields
Request object contains the following parameters.
It returns an array of FieldDeclaration objects as the response from the server.
The response JSON array containing the list of FieldDeclaration types has the following structure:
{ "fieldId" : 12345, "name" : "...", "type" : "...", "required" : true, "format" : "...", "constant" : "..." }
This action explicitly sets the field declarations for a project.
All fields must be submitted: to add a new field, the end user must re-submit all previous fields plus the new one. The array of fields always includes the following default entry:

{ "fieldId": 1, "name": "document_class_id", "type": "Integer", "required": true, "format": "...", "constant": "COMPANY" }

The fieldId is incremented sequentially with the number of fields.
The request body contains an array of FieldDeclaration objects. The project-level fields are set using POST /learnset/project/{projectId}/fields
The request body containing a collection of FieldDeclaration objects has the following structure:
{ "fieldId" : 12345, "name" : "...", "type" : "...", "required" : true, "format" : "...", "constant" : "..." }
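Because the full field list must be re-submitted on every change, a client typically copies the existing declarations and appends the new one. A minimal sketch, assuming sequential fieldId values as described above (the helper and field names are hypothetical):

```python
def with_new_field(existing, name, type_, required=False):
    # The server expects the FULL list on every POST, so copy the existing
    # declarations and append the new one with the next sequential fieldId.
    fields = list(existing)
    next_id = max(f["fieldId"] for f in fields) + 1
    fields.append({"fieldId": next_id, "name": name, "type": type_,
                   "required": required, "format": "", "constant": ""})
    return fields

# The default entry that is always present (see above).
default = [{"fieldId": 1, "name": "document_class_id", "type": "Integer",
            "required": True, "format": "", "constant": "COMPANY"}]
fields = with_new_field(default, "invoice_number", "String")
```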
This action fetches all classes for a specific project.
Request object contains the following parameters.
It returns an array of DocumentClass objects in the response. The list of classes is retrieved using GET /learnset/project/{projectId}/class
On a successful request, the response contains a JSON array of DocumentClass objects with the following structure:

{
  "id" : 12345,                  // The id of the document class; unique within a project
  "name" : "...",                // Name of the document class. A label to help identify classes in a UI; not used by ALE itself
  "numTrainingDocuments" : 12345 // Number of training documents currently available for this class. Provided when retrieving a class or a list of classes; not supposed to be set when creating a new class
}
This action fetches all class field declarations of the project.
Request object contains the following parameters.
It returns an array of ClassFieldDeclaration objects as the response. The list of class field declarations is retrieved using GET /learnset/project/{projectId}/classfields
On a successful request, the response contains a JSON array of ClassFieldDeclaration objects with the following structure:

{
  "docClassId" : 12345, // Id of the document class
  "fieldId" : 12345,    // Id of the field that is assigned to the document class
  "name" : "...",       // Name used for this class; if none is assigned, the name from the field declaration is used
  "projectId" : "..."   // The id of the project
}
This action determines whether the ALM project is learnable, using GET /learnset/project/{projectId}/isLearnable
The request object contains the following parameters.
This returns a Boolean value indicating whether the project can be learned without error.
This action adds training documents from a zip file. The zip file must contain an image, a .pos file, and an .ival file for each document. If one of those files is missing, the document is skipped. A field declaration is generated based on the values available in the .ival files. Missing classes are also created.
Request body contains a multi-part form data object. Request object contains the following parameters.
This returns status information for each document in the response, as a collection of DocumentImportStatus objects. The request is sent to the server using POST /learnset/project/{projectId}/docs
On a successful request, the response contains a JSON array of DocumentImportStatus objects with the following structure:

{
  "name" : "...",       // Base name of the file
  "errorCode" : 12345,  // Error code for the file: 0 if the import succeeded, otherwise a bitwise combination of the available error code values
  "docId" : "..."       // Id under which the document was stored
}
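A client can use the errorCode field to separate imported documents from skipped ones. A small Python sketch with illustrative sample data (the individual error flag values are server-defined and not listed here):

```python
import json

# Sample DocumentImportStatus array; values are illustrative.
body = '''[
  {"name": "doc1.tiff", "errorCode": 0, "docId": "17"},
  {"name": "doc2.tiff", "errorCode": 3, "docId": ""}
]'''
statuses = json.loads(body)

# errorCode 0 means the import succeeded; any other value is a bitwise
# combination of error flags.
failed = [s["name"] for s in statuses if s["errorCode"] != 0]
```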
This action fetches the fields of a training document using GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/fields
The request object contains the following parameters.
This returns a collection of FieldInfo objects in the response.
On a successful request, the response contains a JSON array of FieldInfo objects with the following structure:

{
  "fieldId" : 12345,    // Id of the field as declared in the FieldDeclaration object
  "location" : {        // Location of the field value on the document; only needs to be set for limited learning
    "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345
  },
  "pageNumber" : 12345, // The field's page number
  "value" : "..."       // Value of the field. Either set a value or set word indexes
}
This action creates a set of test documents based on the uploaded content. This can either be a single document (image and .pos files) or a zip file with multiple documents.
The request object contains the following parameters.
The request body has the following details:
POST /learnset/project/{projectId}/testset
This returns the test document set id in the response. The media type is application/json.
This action configures the test sets based on the documents located in a given path/file share.
Request object contains the following parameters.
The test sets are uploaded using PUT /learnset/project/{projectId}/testset
The path parameter's value is appended to the request URL as a query string (path=...). It returns the test document set ID in the response.
This action creates or updates a batch stream set that contains the training documents. The stream set has the same id as the project and can be referenced in the stream set service or the field extractor service.
The request body contains the documents to learn (a collection of DocumentAdapter objects).
Request object contains the following parameters.
The stream set is updated using HEAD /learnset/project/{projectId}/updateStreamSet
This action involves the following chain of events:
POST /learnset/project/{ProjectID}/class/{ClassID}/doc
A field extractor can be trained either by passing a list of documents to learn or by learning all documents from a batch stream set.
To train an extractor by passing a list of documents
POST /extractor/{id}/fields
POST /extractor/{id}/learn
GET /extractor/{id}/file/extractor
To train an extractor using a batch stream set
PUT /streamset
POST /streamset/{id}/document
POST /extractor/{id}/fields
GET /extractor/{id}/streamset/{streamSetId}/learn
GET /extractor/{id}/file/extractor
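The batch stream set sequence above can be sketched as an ordered list of REST calls; a client would issue them in this order, repeating the document upload per document. The extractor and stream set ids are placeholders:

```python
def learn_call_sequence(extractor_id, streamset_id):
    # Order of REST calls for training an extractor from a batch stream set,
    # as listed above (one document-upload call shown; repeat per document).
    return [
        ("PUT",  "/streamset"),
        ("POST", f"/streamset/{streamset_id}/document"),
        ("POST", f"/extractor/{extractor_id}/fields"),
        ("GET",  f"/extractor/{extractor_id}/streamset/{streamset_id}/learn"),
        ("GET",  f"/extractor/{extractor_id}/file/extractor"),
    ]

calls = learn_call_sequence("ex1", "ss1")
```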
You may also want to download the PTB and CBM files after adding documents to the stream set and upload them at a later time, rather than creating the stream set from scratch.
Field extraction can be done for a given document or for documents that are stored in a batch stream set.
To extract fields from a given single document
POST /extractor/{id}/file/extractor
POST /extractor/{id}/extract
To extract fields from documents that are stored in a batch stream set
POST /extractor/{id}/file/extractor
GET /extractor/{id}/streamset/{streamSetId}/extract/{docNum}
This action involves the following chain of events:
GET /learnset/project/{ProjectId}/isLearnable
HEAD /extractor/{Projectid}
PUT /extractor?id={ProjectID}&persistent=true
HEAD /learnset/project/{ProjectId}/updateStreamset
This creates or updates the batch stream set that contains the training documents. The stream set has the same ID as the project and can be referenced in the stream set service or the field extractor service.
POST /extractor/{ID}/fields
where {ID} denotes Project/Extractor ID.
GET /extractor/{ProjectID}/streamset/{StreamsetID}/learn
It trains a field extractor based on the documents that are stored in a stream set.
The engine learns which fields to extract for each class as defined by the passed documents. The state of the extractor is written to an extractor file that can be downloaded for future use. With the relearnable flag, the user indicates that the extractor enables relearning of classes, which stores additional information in the connected extractor stream.
The request object contains the following parameters:
Projects can be learned using POST /extractor/{id}/relearn
The request body contains the following structure:
{
  "id" : 12345,       // Document id
  "fileName" : "...", // File name where the document is located, if any
  "words" : [         // List of words with positioning information (collection of WordInfo objects)
    { "pageNumber" : 12345, "word" : "...",
      "boundingBox" : { "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345 } },
    { "pageNumber" : 12345, "word" : "...",
      "boundingBox" : { "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345 } }
  ],
  "pages" : [         // List of pages (array of PageInfo objects)
    { "rotationAngle" : 12345.0, "rotationOrigin" : 12345,
      "boundingBox" : { "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345 } },
    { "rotationAngle" : 12345.0, "rotationOrigin" : 12345,
      "boundingBox" : { "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345 } }
  ],
  "fields" : [        // List of fields (only required for learning, not for extraction; array of FieldInfo objects)
    { "fieldId" : 12345,
      "location" : { "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345 },
      "pageNumber" : 12345, "value" : "..." },
    { "fieldId" : 12345,
      "location" : { "left" : 12345, "top" : 12345, "right" : 12345, "bottom" : 12345 },
      "pageNumber" : 12345, "value" : "..." }
  ],
  "companyFieldValue" : "..."
}
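A minimal sketch of assembling such a DocumentAdapter payload in Python. The word positions and page bounding box below are illustrative placeholders, and the helper name is hypothetical; fields are included only when the payload is used for learning:

```python
import json

def document_adapter(doc_id, words, fields=None):
    # Minimal DocumentAdapter-shaped payload; fields are only required for
    # learning, not for extraction.
    adapter = {
        "id": doc_id,
        "words": [
            {"pageNumber": 0, "word": w,
             "boundingBox": {"left": i * 100, "top": 0,
                             "right": i * 100 + 90, "bottom": 20}}
            for i, w in enumerate(words)
        ],
        # Single illustrative page covering all the words above.
        "pages": [{"rotationAngle": 0.0, "rotationOrigin": 0,
                   "boundingBox": {"left": 0, "top": 0, "right": 1000, "bottom": 1000}}],
    }
    if fields is not None:
        adapter["fields"] = fields
    return adapter

payload = json.dumps(document_adapter(1, ["Invoice", "4711"]))
```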
ALM Server does not accept all image file formats. When you upload files to ALM Server, ensure that only the supported types (.tiff, .jpg, and .png) are used.
Any instances of batch stream sets or field extractors that are created on the server are destroyed automatically after they have not been accessed for a default duration of 30 minutes.
This is a server configuration and is set in alm.config.xml under the settings bean through the following entry:

<entry key="expirationTime" value="30"/>
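A small Python sketch of reading that value, assuming the settings bean contains plain <entry> elements as shown above (the surrounding XML structure is a simplified assumption, not the full alm.config.xml layout):

```python
import xml.etree.ElementTree as ET

# Simplified fragment shaped like the settings bean in alm.config.xml.
fragment = '<settings><entry key="expirationTime" value="30"/></settings>'

root = ET.fromstring(fragment)
# Select the entry element by its key attribute and read the value in minutes.
minutes = int(root.find("entry[@key='expirationTime']").get("value"))
```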
Unauthorized requests, without an authentication header or with an incorrect password, get 403 as the response code.
Attempts to access non-existing stream sets or extractors lead to a 404 response code.
All other errors usually produce a 5xx response code.
Documents that are supposed to be used for training or extraction can always be passed in JSON format using the DocumentAdapter type with words, pages, and, for training, fields.
As an alternative, .pos files and .ival files can be uploaded. A .pos file contains one line per word, using the following format:
wordIdx,page,left,top,width,height,word
.ival files are only required when creating a batch stream set for learning. They are used to provide field values for a given document. An .ival file contains one entry per line, using one of the following formats:
fieldname,type,value
fieldname<TAB>type<TAB>value
The user can also send fieldname, type, value, and boundingBox, in the following format:

fieldname<TAB>type<TAB>value<TAB>pageId,left,top,right,bottom

For example: f1 int 4500022612 0,751,1246,980,1275
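A sketch of parsing both line formats in Python, under the assumptions stated above (one word per .pos line, tab-separated .ival entries with an optional comma-separated bounding box):

```python
def parse_pos_line(line):
    # wordIdx,page,left,top,width,height,word -- the word itself may contain
    # commas, so split only the first six fields.
    idx, page, left, top, width, height, word = line.split(",", 6)
    return {"wordIdx": int(idx), "page": int(page),
            "left": int(left), "top": int(top),
            "width": int(width), "height": int(height), "word": word}

def parse_ival_line(line):
    # fieldname<TAB>type<TAB>value, optionally followed by a bounding box
    # "pageId,left,top,right,bottom"; plain comma-separated entries have
    # exactly three fields.
    parts = line.split("\t") if "\t" in line else line.split(",")
    entry = {"fieldname": parts[0], "type": parts[1], "value": parts[2]}
    if len(parts) == 4:
        page, left, top, right, bottom = (int(v) for v in parts[3].split(","))
        entry["boundingBox"] = {"pageId": page, "left": left, "top": top,
                                "right": right, "bottom": bottom}
    return entry
```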
The current released version is ALM 2.1.
For more details on the version history, please refer to the product Release Notes - https://docs.hyland.com/Portal_Public/Products/en/ALE_Learnset_Manager.htm
The resources use a data model that is supported by a set of client-side libraries that are made available on the files and libraries page.
There is a WADL document available that describes the resources API.
name | path | methods | description |
---|---|---|---|
BatchStreamSetService | | | Creation and management of batch stream sets |
FieldExtractorService | | | Training of field extractors and extraction of fields. |
LearnSetManagerService | | | |
LearnsetSchedulerService | | | |
type | description |
---|---|
BoundingBox | Container to carry the positional information of word. |
ClassFieldDeclaration | Container to carry the information of fields of the Document Class. |
DataCell | Simple container to carry extracted data string for a single cell. Used by ExtractedData. |
DocumentAdapter | Carrier for document information. When training an extractor, make sure to fill the word list, the page list, and the field list. When extracting fields from a document, you only need to fill the word list and page list. |
DocumentClass | Information about a document class |
DocumentImportStatus | Container to carry the document import status information. |
DocumentUploadStatus | Container to carry the uploading status information of the documents. It accounts for the number of total and imported documents and their status. |
ExtendedClassFieldStatistics | Field statistics for a single class, covering how many values have been found at all for a field within that class and for how many of those values a target can be located. |
ExtractedData | Contains the extraction result for a single field. There are usually multiple candidates which are provided as a list of DataCell. |
FieldDeclaration | Declaration of a field that can be extracted. |
FieldInfo | Container to carry field information. |
FieldLocations | Container to carry the information of word locations respective to the field. |
FieldStatistics | Basic field statistics, covering how many values exist for a given field in a project or class |
LearnsetSchedulerProperties | |
PageInfo | Container to carry the page orientation and positional information. |
Project | Container to carry the information of Projects created in the ALM Server. |
TrainingDocumentIncident | Description of a failed plausibility check on a training document |
TrainingDocumentMetaData | Metadata about a stored training document |
TrainingSetCheckResult | Result of a training set plausibility check, including found incidents and field statistics. |
WordInfo | Container to carry word information |