Processing Environment Overview
Depending on an organization's overall goals, RP may be set up differently. To understand how to set up RP, it helps to first understand the function and purpose of each application.
RP SQL Server (Mandatory)
All components connect to SQL Server. The database schema (the InstanceName_Manager, InstanceName_NIST, and InstanceName_AuditLog databases, each taking the name assigned to its SQL Server instance) is created the first time the Discovery Manager connects to the SQL Server within an environment. Initially, the InstanceName_NIST database, which allows for DeNISTing, will be empty. If DeNISTing is required, please see the NIST Database Installation section below.
Every project created will have its own SQL database, named using the convention InstanceName_100, InstanceName_101, etc. The numeric suffix may jump by an increment of 1000 each time the SQL Server is rebooted, so it is possible to see databases InstanceName_100, InstanceName_101, InstanceName_1002, etc.
More than one instance of SQL Server may be used by Reveal Processing. Each instance will require the InstanceName_NIST schema to be created, and each will require NIST Database Installation in order to enable DeNISTing across the platform.
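As an illustration of the naming convention above, the following minimal Python sketch picks the project databases out of a list of database names and returns their numeric suffixes. The function name project_ids and the instance name "RP" are hypothetical and used only for this example; the sketch does not query SQL Server and is not part of Reveal Processing.

```python
import re

def project_ids(instance_name, database_names):
    """Return the sorted numeric suffixes of databases named <instance>_<number>.

    Hypothetical helper for illustration only; it works on a plain list of
    names and does not connect to SQL Server.
    """
    pattern = re.compile(re.escape(instance_name) + r"_(\d+)")
    ids = []
    for name in database_names:
        match = pattern.fullmatch(name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)

# "RP" is an assumed instance name. The Manager, NIST and AuditLog databases
# are skipped because their suffixes are not numeric.
names = ["RP_Manager", "RP_NIST", "RP_AuditLog", "RP_100", "RP_101", "RP_1002"]
print(project_ids("RP", names))  # [100, 101, 1002]
```

Because the suffix can jump after a reboot, a gap in the returned numbers does not by itself indicate a missing project database.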
RP Storage (Mandatory)
Storage is critical to the success of RP. Depending on the setup, there can be five storage locations. The faster the connection between the storage locations, the better the performance. Antivirus configurations must be considered when administering these storage locations (see the Antivirus & Microsoft Security Essentials Configurations section below for additional information). If any of these storage locations runs out of space, the system will hang on the operation it is currently running, or it may crash. Distributed Processing Jobs and Imports have built-in resume functionality to recover from these events. However, other operations do not have this functionality and may require additional rework.
SQL Storage
This is the storage location for SQL Server. A typical project can see a replication rate of 8% to 15% relative to the size of the preprocessed source data.
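The sizing guidance above can be turned into a quick estimate. The sketch below assumes that the replication rate means the SQL data occupies roughly 8% to 15% of the preprocessed source data size; the function name and the 500 GB figure are hypothetical and chosen only for illustration.

```python
def estimate_sql_storage_gb(source_gb, low_rate=0.08, high_rate=0.15):
    """Return a (low, high) estimate of SQL storage in GB.

    The default rates come from the 8% to 15% guidance; treating them as a
    simple fraction of the source size is an assumption of this sketch.
    """
    return source_gb * low_rate, source_gb * high_rate

# Example: 500 GB of preprocessed source data (an assumed figure).
low_gb, high_gb = estimate_sql_storage_gb(500)
print(f"Estimated SQL storage: {low_gb:.0f} GB to {high_gb:.0f} GB")
```

For 500 GB of source data this gives a range of about 40 GB to 75 GB, which should be treated as a planning figure rather than a guarantee.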
Source Folder
Use of a Source Folder is client and workflow dependent. When used, a Source Folder is a network share where all source data is staged for import. With this workflow, the data can be removed from the Source Folder after it has been imported and all exceptions have been remediated. Data can also be imported directly from attached or shared external hard drives instead of being staged to a Source Folder, but this is typically avoided because of performance concerns and the potential for failure of the external drive.
Processing Folder
When processing data, all original non-filtered native files are copied from the Source Folder to the project’s Processing Folder. Metadata and text files are extracted from these files. Technicians set this folder at project creation time, and it is typically a network share. This will typically be the largest folder in RP, and the most critical.
Note
It is important to monitor the free space within this folder, as free space must be available for the project to function properly.
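One way to monitor free space is a small script run on a schedule. The sketch below uses Python's standard shutil.disk_usage call; the 10% threshold and the function name free_space_report are assumptions made for this example and are not RP settings. Replace the sample path with the actual storage locations (on Windows, UNC share paths can be passed as well).

```python
import shutil

def free_space_report(paths, min_free_fraction=0.10):
    """Map each path to (free_fraction, is_low).

    min_free_fraction is an assumed alert threshold, not an RP requirement.
    """
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)
        # Guard against file systems that report a total size of zero.
        free_fraction = usage.free / usage.total if usage.total else 0.0
        report[path] = (free_fraction, free_fraction < min_free_fraction)
    return report

# "." stands in for the Processing Folder path in this example.
for path, (fraction, is_low) in free_space_report(["."]).items():
    status = "LOW" if is_low else "ok"
    print(f"{path}: {fraction:.1%} free ({status})")
```

The same check can be pointed at the SQL storage and Source Folder locations, since running out of space in any of them can halt the current operation.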
RP Desktop - Discovery Manager (Mandatory)
Discovery Manager serves as the primary point of interaction between end users and the processing platform. Within Discovery Manager, application technicians create processing projects, initiate the import and export of data, and perform exception remediation. When setting up RP for the first time, the Discovery Manager must be connected to the SQL Server before any other component, as it creates the RP database schema within SQL Server. For best performance, please use the Recommended Specifications listed above in the System Overview Diagram.
RP Desktop - Discovery Agent
The Discovery Agent hosts the Windows Service ‘Reveal Processing Service’. When configured and running, this service performs all distributed processing tasks for all processing jobs initiated through the Discovery Manager, including data import, export, OCR of documents, indexing, email threading, etc. Because the Discovery Agents perform the actual processing, the overall throughput of the system is directly proportional to the number of configured agents. The recommended machine specification will ensure that ideal processing throughput is achieved while balancing the cost of resources.
RP Network Communication
When deploying the RP Platform, the following ports and network communication should be taken into consideration. As described below, all components will need access to the RP Storage and the RP SQL Server.
RP Storage
All components of Reveal Processing require access to the file system. The specific ports required depend on the protocol in use. For example, SMB hosted directly over TCP uses TCP port 445, while SMB over NetBIOS uses UDP ports 137 and 138 and TCP ports 137 and 139.
RP SQL Server
By default, SQL Server uses TCP port 1433. Communication over this port is required by all components of Reveal Processing.
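To confirm that a machine can reach the storage and SQL Server ports described above, a basic TCP connection test can be run from each component. The following is a minimal sketch using Python's standard socket module; the function names and the host names in the commented usage are hypothetical. It only tests TCP reachability, so it does not cover the NetBIOS UDP ports and does not verify credentials or share permissions.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to the result of a TCP connection attempt."""
    return {(host, port): tcp_port_open(host, port, timeout)
            for host, port in endpoints}

# Example usage with hypothetical host names; substitute the real servers:
# results = check_endpoints([
#     ("rp-sql.example.local", 1433),     # SQL Server default TCP port
#     ("rp-storage.example.local", 445),  # SMB directly over TCP
# ])
# for (host, port), reachable in results.items():
#     print(f"{host}:{port} -> {'reachable' if reachable else 'unreachable'}")
```

A failed check usually points to a firewall rule, a stopped service, or a SQL Server instance listening on a non-default port.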