Index Rendition Documentum

How to add an index rendition to your documents in Documentum

When speaking of document transformation, people mostly think about converting a document from one type to another, while preserving all the data from the source document. However, this is not always required. DocShifter thinks one step beyond basic transformation and answers specific needs you might encounter. In this post I will explain how DocShifter can be used to generate an index rendition, which contains all the textual content from conventional and less conventional documents stored in Documentum.

Index rendition

Let’s start with what an index rendition is. Indexing servers such as xPlore do not always provide all the functionality you need. Some file formats are not supported, and other documents need OCR first or need to be bundled before indexing. For example, when archiving an email, you store the body and the attachment in Documentum and create relations between the email and the attachment. When you later search for emails, the system needs to search the content of both the email and the attachment. One solution is to merge the text of the email and the attachment into one text file and add it to the email object as a rendition. This is what is called an index rendition.

DocShifter

DocShifter offers an Index Rendition module that allows you to automatically create these kinds of renditions. The module extracts the text of one or more documents by leveraging the functionality of other modules, and by merging the results into one file.

The following image displays a DocShifter workflow that creates such a rendition. The workflow consists of 3 steps: first the Documentum input module reads documents from Documentum, then the index rendition gets created and finally, the document is added to the Documentum object.

Input

The input module queries the queue of the Documentum user for requests of a specific type. These requests can be created by TBOs, SBOs, xCP workflows, … The input module polls the Documentum queue at regular intervals and exports the files that have an associated queue item. The module can also be configured to export related documents.
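To make the queue lookup more concrete, here is a minimal sketch of how such a request could be expressed as a DQL query. It is not DocShifter's actual query. It assumes the requests live in the standard dmi_queue_item type, that the queue owner is stored in name, the request type in message, and that dequeued items are flagged with delete_flag. The function name build_queue_query and the example values are hypothetical.

```python
def build_queue_query(queue_user: str, rendition_type: str) -> str:
    """Build a DQL query for pending queue items of one request type.

    Assumption: requests are dmi_queue_item objects whose `message`
    attribute carries the request type (the renditionType parameter).
    """
    def quote(value: str) -> str:
        # DQL string literals escape a single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"

    return (
        "SELECT r_object_id, item_id, message "
        "FROM dmi_queue_item "
        f"WHERE name = {quote(queue_user)} "
        f"AND message = {quote(rendition_type)} "
        "AND delete_flag = FALSE "  # assumption: skip items already dequeued
        "ORDER BY date_sent"
    )


if __name__ == "__main__":
    # Hypothetical technical user and request type.
    print(build_queue_query("docshifter_svc", "index_rendition"))
```

The item_id column identifies the object the request refers to, which is the document the input module would then export.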

The Documentum sender module has the following parameters:

| Name | Type | Mandatory | Description |
| --- | --- | --- | --- |
| dctm_repository | STRING | YES | The name of the Documentum repository to poll. |
| dctm_user | STRING | YES | Name of the user used to connect to Documentum. This is a technical user and requires at least READ access to the objects. |
| dctm_pass | PASSWORD | YES | Password of the user in the dctm_user parameter. |
| renditionType | STRING | YES | The Documentum queue_item message that identifies the queue requests. |
| dataFields | STRING | NO | Some DocShifter renditions require Documentum metadata for processing. The fields that need to be exported from Documentum can be listed here. |
| relationName | STRING | NO | The name of the relation whose documents also need to be read from Documentum. In the example, this is the relation linking the email to its attachments. |
| frequency | INTEGER | YES | The polling interval at which the poller queries Documentum. |
| start_date | DATETIME | NO | Start date of the polling period. If not provided, the poller starts right away. |
| end_date | DATETIME | NO | End date of the polling period. If not provided, the poller does not stop. |
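The optional start_date and end_date parameters define a polling window. The sketch below shows one way to read that rule. It is an illustration only: the function name poller_is_active is hypothetical, and treating both boundaries as inclusive is an assumption the parameter descriptions do not confirm.

```python
from datetime import datetime
from typing import Optional


def poller_is_active(
    now: datetime,
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
) -> bool:
    """Return True when `now` falls inside the configured polling period."""
    # No start_date: the poller may start right away.
    if start_date is not None and now < start_date:
        return False
    # No end_date: the poller never stops on its own.
    if end_date is not None and now > end_date:
        return False
    return True


if __name__ == "__main__":
    window_start = datetime(2024, 1, 1)
    window_end = datetime(2024, 12, 31)
    print(poller_is_active(datetime(2024, 6, 1), window_start, window_end))
    print(poller_is_active(datetime(2025, 1, 1), window_start, window_end))
```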

Transform

The transformation is handled by the Index Rendition module. This module processes the input and saves the content to one text file. The input of this module can be a file or a folder. When the input is a folder, all content is merged into one text document, with the document names as separators. To extract as much content as possible, the Index Rendition module uses the other modules available in the DocShifter instance. For example, when an image-only PDF or a TIFF file enters the module, you might think that these documents do not contain any textual data. However, when the OCR module is installed, the Index Rendition module uses it to extract text from the image.
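The merge step for a folder can be pictured with a small sketch. It assumes the text of each document has already been extracted, and it uses a made-up separator layout ("===== name ====="). The actual separator format written by the Index Rendition module is not described here, and the function name merge_to_index_text is hypothetical.

```python
from typing import Iterable, Tuple


def merge_to_index_text(documents: Iterable[Tuple[str, str]]) -> str:
    """Merge (document name, extracted text) pairs into one text body.

    Each document's text is preceded by a line holding its name, so the
    merged file still shows where each part came from.
    """
    parts = []
    for name, text in documents:
        parts.append(f"===== {name} =====")  # hypothetical separator layout
        parts.append(text.strip())
    return "\n".join(parts) + "\n" if parts else ""


if __name__ == "__main__":
    merged = merge_to_index_text([
        ("mail.msg", "Please find the offer attached.\n"),
        ("offer.pdf", "Price list for 2024"),
    ])
    print(merged)
```

For the email example, the email body and each attachment would each contribute one named part to the merged text.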

The Index Rendition module does not have any parameters.

Output

The Documentum release module saves the result of the workflow to Documentum. This module supports multiple types of releases: as a new rendition, as an update of the content, or as a new version of the object. For the index rendition, a new rendition is the correct choice.

The Documentum release module has 4 parameters:

| Name | Type | Mandatory | Description |
| --- | --- | --- | --- |
| dctm_repository | STRING | YES | The name of the Documentum repository to release to. |
| dctm_user | STRING | YES | Name of the user used to connect to Documentum. This is a technical user and requires at least WRITE access to the objects. |
| dctm_pass | PASSWORD | YES | Password of the user in the dctm_user parameter. |
| dctm_update_type | STRING | YES | The type of release. This can be rendition, update, minor or major. |
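As a quick sanity check on these settings, the sketch below validates a set of release parameters before use. The dictionary is only a hypothetical stand-in for however the parameters are actually entered in DocShifter, and validate_release_config is an illustrative name, not part of the product.

```python
from typing import Dict, List

# Values taken from the parameter table above.
ALLOWED_UPDATE_TYPES = ("rendition", "update", "minor", "major")
MANDATORY = ("dctm_repository", "dctm_user", "dctm_pass", "dctm_update_type")


def validate_release_config(config: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the release parameters."""
    errors = []
    for key in MANDATORY:
        if not config.get(key):
            errors.append(f"missing mandatory parameter: {key}")
    update_type = config.get("dctm_update_type")
    if update_type and update_type not in ALLOWED_UPDATE_TYPES:
        errors.append(f"unsupported dctm_update_type: {update_type}")
    return errors


if __name__ == "__main__":
    # Hypothetical values; an index rendition uses the "rendition" type.
    config = {
        "dctm_repository": "docbase1",
        "dctm_user": "docshifter_svc",
        "dctm_pass": "secret",
        "dctm_update_type": "rendition",
    }
    print(validate_release_config(config))
```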

Configuring Documentum

Documentum must be configured to request renditions. This can be done with a workflow, a TBO, an SBO, or any other mechanism that creates queue_items.

Configure indexing of (only) the index renditions

For the index rendition, a custom format called index is created. This format is then enabled for indexing by configuring two attributes on the format.

The first attribute is can_index (“Full-Text Indexing” in DA). This Boolean enables a format for indexing, but it only means that the format can be indexed. The second attribute is format_class: to require that a format is indexed even when it is a rendition, add “ft_always” to the format’s format_class attribute. The ft_always class implies that a rendition is always indexed. The class can also be set to ft_preferred, which gives the rendition preference for indexing over other renditions whose format_class is not set. If multiple renditions have a format class of ft_preferred, only the first will be indexed.
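The rules above can be summarized in a small model. This is a simplified reading of the paragraph, not the index server's actual selection algorithm: it ignores the primary content and renditions without a format class, because their handling is not described here. The Rendition class and the function renditions_to_index are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rendition:
    format_name: str
    can_index: bool = False
    format_class: List[str] = field(default_factory=list)


def renditions_to_index(renditions: List[Rendition]) -> List[str]:
    """Return the format names selected for indexing under the stated rules."""
    # Every indexable ft_always rendition is indexed.
    always = [r for r in renditions
              if r.can_index and "ft_always" in r.format_class]
    # Among indexable ft_preferred renditions, only the first is indexed.
    preferred = [r for r in renditions
                 if r.can_index and "ft_preferred" in r.format_class
                 and "ft_always" not in r.format_class]
    selected = always + preferred[:1]
    return [r.format_name for r in selected]


if __name__ == "__main__":
    renditions = [
        Rendition("pdf", True, ["ft_preferred"]),
        Rendition("index", True, ["ft_always"]),
        Rendition("crtext", True, ["ft_preferred"]),
        Rendition("tiff", False, ["ft_always"]),  # can_index is off
    ]
    print(renditions_to_index(renditions))
```

In this model the custom index format is picked up regardless of what other renditions exist, which is the behavior the ft_always setting is meant to provide.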

Conclusion

The combination of Documentum and DocShifter allows users to find documents quickly and conveniently. DocShifter fills the gaps in your index server by transforming the documents to a simple text format. If you have any questions, you can contact us through our contact page or find more information on the website.
