Defining the Workflow Plugin
Purpose
This document explains how to define the Workflow Plugin, also referred to as AutomationScripts.dll.
Each customer has unique requirements and desired outcomes for document identification and processing. Treat this document as general guidance for developers defining the Workflow Plugin for the specific use cases of their project.
Overview
To identify custom document types, the Workflow service uses a plugin structure. The Shinydocs.AutomatorScripting package facilitates this. The Workflow Library contains custom scripts for the Workflow service.
Getting Started
Third-Party Software Requirements
Name | Version | Download Link |
---|---|---|
.NET SDK | 7.0 (for Workflow 2.5) | |
.NET SDK | 6.0 (for Workflow 2.4) | |
Script Creation
We highly recommend stringent and thorough testing of the IdentifyDocumentType method to ensure the most accurate results from the Workflow service.
Create a class derived from Shinydocs.AutomatorScripting.DocumentTypeScript (see the property table below).
Implement the following methods:
IdentifyDocumentType()
This method identifies the document based on the JObject passed into it, which is a copy of the index object. It returns true if the document has been identified. It is called during the IdentifyDocument task in the workflow. This is a critical part of the process: if the document cannot be identified, the other methods will not run for that document.
ProcessDocumentType()
This method updates a document with data. It returns a JObject that contains all the fields to be added to the index. It is called during the ProcessDocument task in the workflow.
PostProcessDocumentType()
This method is almost identical to ProcessDocumentType(). It returns an update object that contains all the fields to be updated in the index. It is called during the PostProcessDocument task in the workflow. This method updates a document with data after it has been reviewed in Review.
GenerateDocumentTypeDefinitions()
This method returns a definition for the current document that the Workflow service writes to the index. The definition is used by Review to render a display for the given document type so each document can have a customized view, showing only the fields appropriate for the business case.
GenerateContentManagementSystemMapping()
This method returns a mapping of the index fields to Content Server Categories and Attributes.
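The way identification gates the later steps can be sketched as follows. This is a simplified, self-contained illustration and not the Workflow service's actual code: IDocumentScript, PdfScript, WorkflowSketch, and RunOne are hypothetical names, and a plain dictionary stands in for the JObject index entry. It shows only the rule stated for IdentifyDocumentType(): a document that is not identified is never passed to the processing step.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a document type script; not the Shinydocs API.
public interface IDocumentScript
{
    bool IdentifyDocumentType(IReadOnlyDictionary<string, object> indexObject);
    Dictionary<string, object> ProcessDocumentType(IReadOnlyDictionary<string, object> indexObject);
}

// Example script: identifies documents whose "extension" field is "pdf".
public sealed class PdfScript : IDocumentScript
{
    public bool IdentifyDocumentType(IReadOnlyDictionary<string, object> indexObject)
    {
        return indexObject.TryGetValue("extension", out var ext)
            && string.Equals(ext as string, "pdf", StringComparison.OrdinalIgnoreCase);
    }

    public Dictionary<string, object> ProcessDocumentType(IReadOnlyDictionary<string, object> indexObject)
    {
        // Return only the fields that should be added to the index.
        return new Dictionary<string, object> { ["type"] = "Pdf" };
    }
}

public static class WorkflowSketch
{
    // Returns the update fields, or null when the document was not identified
    // (in which case the processing step is never called).
    public static Dictionary<string, object> RunOne(
        IDocumentScript script, IReadOnlyDictionary<string, object> indexObject)
    {
        if (!script.IdentifyDocumentType(indexObject))
        {
            return null;
        }
        return script.ProcessDocumentType(indexObject);
    }
}
```

Calling WorkflowSketch.RunOne with an index entry whose extension is "txt" yields null, while an entry with extension "pdf" yields an update containing the type field. This is why thorough testing of the identification method matters: a false negative silently skips all later processing for that document.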
Shinydocs.AutomatorScripting.DocumentTypeScript class
Property | Type | Default | Description |
---|---|---|---|
| string | | Indicates the document type. |
| string | UGS | Indicates the group this document is a part of. Not currently used. |
| integer | 10000 | Signifies the order in which the documents should be processed. |
| boolean | | If the “identify document” step needs a copy of the physical file, set this property to true. |
| boolean | false | If the “process document” step needs a copy of the physical file for further processing, set this property to true. This tells the Workflow service to load the file from the source and to provide a path to the file. For a file system document, this is the path to the document. For a Content Server object, it is a temporary path to a copy that has been downloaded from Content Server for you. |
Script Example
In this example, the Capybara document type is not identified from the document's full text. Instead, the class derives from DocumentTypeScript and overrides the IdentifyDocumentType method. Suppose Phylum Corp has implemented its own class for identifying South American rodents from documents. A partial implementation of the Capybara document could be the following:
using System.Collections.Generic;
using Newtonsoft.Json.Linq;
using Phylum.Util; // provides RodentMatcher
using Shinydocs.AutomatorScripting;

class CapybaraDocument : DocumentTypeScript
{
    // Identify the document from its image rather than from its full text.
    public override bool IdentifyDocumentType(JObject indexObject)
    {
        var image = RetrieveProperty(indexObject, "image");
        var result = RodentMatcher.IdentifyRodent(image);
        return result == RodentMatcher.SouthAmerican;
    }

    // Build the set of fields to add to the index for this document.
    public override JObject ProcessDocumentType(JObject indexObject, ProcessData processData)
    {
        var name = GetNameFromFile(processData.FilePath);
        JObject updateObject = new JObject();
        // Set the filename metadata
        updateObject.Add(new JProperty("filename", name));
        // Set the type metadata
        updateObject.Add(new JProperty("type", "Capybara"));
        return updateObject;
    }

    // Describe the fields that Review should display for this document type.
    public override List<DocumentTypeDefinition> GenerateDocumentTypeDefinitions()
    {
        return new List<DocumentTypeDefinition>
        {
            new DocumentTypeDefinition(DocumentType, DocumentTypeGrouping, 1, new List<DocumentTypeField>
            {
                new DocumentTypeField(_scriptSettings.NameNewField,
                    "Suggested File Name",
                    SourceType.Index,
                    FieldType.Text,
                    "The name the document should have in content server."
                ),
                new DocumentTypeField(RetrieveDescriptionFieldName(),
                    "Description",
                    SourceType.Index,
                    FieldType.Text,
                    "The description you wish the document to have in content server.",
                    false
                ),
                new DocumentTypeField(RetrieveCsFolderIdFieldName(),
                    "Content Server Location",
                    SourceType.ContentServer,
                    FieldType.Location,
                    "The folder in content server where the document should be added."
                )
            })
        };
    }

    // Map index fields to Content Server Categories and Attributes.
    public override JArray GenerateContentManagementSystemMapping(ContentManagementSystem contentManagementSystem)
    {
        var mapping = new JArray();
        mapping.Add(Rodent.Capybara.GenerateMapping());
        return mapping;
    }
}
Helper Classes
RegexHelper
This is a class for evaluating regular expressions, used to search for patterns that identify a particular document type.
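The members of RegexHelper are not listed in this document, so the sketch below uses the standard .NET System.Text.RegularExpressions.Regex class directly to illustrate the kind of pattern search involved. The InvoicePattern class, its pattern, and the sample text are invented for illustration and are not part of the Workflow Plugin.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical example: decides whether extracted text looks like an invoice.
public static class InvoicePattern
{
    // Matches "Invoice No: 12345", "invoice number 987", "Invoice #42", and similar.
    private static readonly Regex InvoiceNumber = new Regex(
        @"\binvoice\s*(?:no\.?|number|#)\s*:?\s*(\d+)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns true and the captured number when the pattern is found.
    public static bool TryMatch(string text, out string number)
    {
        number = null;
        if (string.IsNullOrEmpty(text))
        {
            return false;
        }
        Match match = InvoiceNumber.Match(text);
        if (!match.Success)
        {
            return false;
        }
        number = match.Groups[1].Value;
        return true;
    }
}
```

A check like this would typically sit inside an IdentifyDocumentType override, applied to whichever text field of the index object is relevant. Keeping the pattern in a compiled static field avoids rebuilding it for every document.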
ScriptSettings
The ScriptSettings class is a singleton class that contains the various settings required for the scripts. The values of the settings are stored in the Workflow Plugin configuration file. The settings contain information such as the path to required data files, logging configuration, and Regular Expression command configurations (used for pattern matching).
The ScriptSettings class also defines data for any Content Server Categories and Attributes, to be used by the automation scripts.
Each Category has a name and a nodeId, and contains several attributes. Each attribute has a name and an index field name. The values for these are mapped to values in the configuration file. This mapping works as follows:
Configuration file | Code |
---|---|
CategoryName.AttributeName.FieldName | |
Example
To set the name and index field name of the Author attribute in the Document Management category in the configuration file, use:
<add key="DocMgmt.Author.AttributeName" value="Author" />
<add key="DocMgmt.Author.IndexFieldName" value="cat_DocMgmt_Author" />
To access them from the code:
var name = _scriptSettings._documentManagement.Author.Name;
var fieldName = _scriptSettings._documentManagement.Author.IndexFieldName;
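How keys of this form might be turned into objects can be sketched as below. This is an assumption about the mechanism, not the plugin's actual loader: CategoryAttribute and AttributeLoader are hypothetical names, and a plain dictionary stands in for the key/value pairs read from the configuration file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical holder for one attribute's settings.
public sealed class CategoryAttribute
{
    public string Name { get; set; }
    public string IndexFieldName { get; set; }
}

public static class AttributeLoader
{
    // Builds an attribute from keys of the form Category.Attribute.Field,
    // e.g. "DocMgmt.Author.AttributeName" and "DocMgmt.Author.IndexFieldName".
    public static CategoryAttribute Load(
        IReadOnlyDictionary<string, string> settings, string category, string attribute)
    {
        string prefix = category + "." + attribute + ".";
        return new CategoryAttribute
        {
            Name = Get(settings, prefix + "AttributeName"),
            IndexFieldName = Get(settings, prefix + "IndexFieldName"),
        };
    }

    // Fails loudly on a missing key so that configuration typos surface early.
    private static string Get(IReadOnlyDictionary<string, string> settings, string key)
    {
        if (!settings.TryGetValue(key, out var value))
        {
            throw new KeyNotFoundException("Missing configuration key: " + key);
        }
        return value;
    }
}
```

With the two example keys above loaded into the dictionary, AttributeLoader.Load(settings, "DocMgmt", "Author") would produce an object whose Name is "Author" and whose IndexFieldName is "cat_DocMgmt_Author".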