Reveal Processing Archiving
Processing Archival Process – Leave In Place
The most practical option for archiving is to simply leave your processing project in place. Add the prefix “Archive-” to the processing project name; this alerts Reveal not to count the project toward your inactive totals, since the project currently does not count as peak-active after the first month of processing.
Processing Archival Process – SaaS Full Export
An alternative archiving method is to export a complete dataset of the content processed in the tool. A Parent/Child export that follows the guidelines below can be stored in a state ready to load into a review platform on demand. This is not fundamentally different from a normal export, but the guidelines may help you get everything out of the case prior to deletion.
Important
Disclaimer – Restoration
This process is meant to export all necessary content from a case, but the export cannot be loaded directly back into the Reveal Processing application. The output will, however, be a Concordance-delimited DAT with Text and Natives that can be easily loaded into Reveal Review via the Review Manager application. The export should also work with any review platform that supports third-party deliverables with Concordance delimiters.
Important
Disclaimer – QC and Unprocessed Source Archives
This process is useful if the dataset does not contain a significant number of errored or corrupt archives. If several corrupt, incomplete or password-protected archives are present, keep in mind that the Parent/Child export will not include content from them. In that case, it may be worthwhile to also preserve the source material if this has not yet been done.
High Level Process
Decide whether to scope the export or export all data from the case, and whether to exclude previously exported documents.
Choose settings after reviewing the recommendations at the bottom.
Upload the archive with S3 Browser on the Load Machine to preserve a copy.
Optional Scoping
While the entire database can be exported, you can also scope the export to a population that excludes previously exported content. Scoping can also be used to split the archive export into multiple smaller exports, perhaps along custodian lines. To achieve these delineations, you can use either Selective Sets or the scoping mechanisms in the Export Module:
Exclude Previously Exported Content
On the Export module page, about halfway down there is a modal to Exclude Previously Exported Documents. This will exclude the exact FileIDs previously exported.
Check the box underneath the heading to enable the functionality, then exclude all relevant prior exports. As exports could have been used for testing, overlays or third-party deliverables, it is recommended that these be evaluated individually rather than checking every previous export.
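As a conceptual illustration only, the exclusion can be thought of as a set difference on FileIDs. The sketch below is not Reveal's implementation; the function name, the integer FileIDs and the sample exports are all hypothetical. It shows why each prior export is worth evaluating individually: selecting a test export would drop documents that were never actually delivered.

```python
from typing import Iterable, Set


def remaining_file_ids(
    project_ids: Iterable[int],
    selected_prior_exports: Iterable[Iterable[int]],
) -> Set[int]:
    """Return the FileIDs left after excluding the selected prior exports.

    Hypothetical model: every FileID that appears in any selected prior
    export is removed from the archive population.
    """
    excluded: Set[int] = set()
    for export_ids in selected_prior_exports:
        excluded.update(export_ids)
    return set(project_ids) - excluded


# Hypothetical project with FileIDs 1..10.
project = range(1, 11)
production_export = [1, 2, 3, 4]   # a real deliverable, selected for exclusion
test_export = [3, 4, 5]            # a test run, deliberately left unselected

print(sorted(remaining_file_ids(project, [production_export])))
# [5, 6, 7, 8, 9, 10]
```

Had the test export also been selected, FileID 5 would have been excluded from the archive even though it was never delivered.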
Export Criteria / Scope Selection
If no scope criteria are selected in the Export module, the export will contain all documents in the project. At the top of the Export page, there is a scoping mechanism where a subset of documents can be exported along multiple dimensions. If the case is extremely large, it may be worthwhile to divide the export among custodians or imports. A Selective Set can be used if multiple dimensions need to be used when scoping the archive export into smaller chunks.
Recommended Parent Child Settings
Once the scope is decided, use the recommended settings below.
Export Name – ARCHIVE_YYYYMMDD_X (using X to specify sub-scopes if exporting archive in more than one piece).
Type – Parent/Child
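If several archive exports are created, a small helper can keep the names consistent with the ARCHIVE_YYYYMMDD_X pattern. This is a minimal sketch: the function name is hypothetical, and omitting the _X suffix for a single-piece export is an assumption rather than a documented rule.

```python
from datetime import date
from typing import Optional


def archive_export_name(export_date: date, sub_scope: Optional[str] = None) -> str:
    """Build an export name following the ARCHIVE_YYYYMMDD_X pattern.

    sub_scope is the optional X suffix used when the archive is exported
    in more than one piece (assumption: it is omitted otherwise).
    """
    name = f"ARCHIVE_{export_date:%Y%m%d}"
    if sub_scope:
        name += f"_{sub_scope}"
    return name


print(archive_export_name(date(2024, 3, 14), "1"))  # ARCHIVE_20240314_1
print(archive_export_name(date(2024, 3, 14)))       # ARCHIVE_20240314
```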
General Settings
Deduplication – Project Level
While deduplicating will significantly cut down the number of documents to export, it may be worthwhile to export duplicate records if the custodian/data priority is not clear. In most other cases, Project Level Deduplication is recommended.
Export Natives – For All Documents
Export Images – Do Not Export Images
Images are not recommended as they will significantly increase archive size.
Fulltext Priority – OCR Text
OCR Text is recommended: this will use OCR text where present, and Extracted text everywhere else. Only one text file will be exported per record, but since Natives are included in the export, additional text can be generated as necessary in the future.
Add Export to Review Population – No
Document Level – No
Enable Volume and Folder Options – No
Export Text Files Separately – Yes
Generate Placeholders for No Extracted Text – No
File Numbering Settings
File Numbering – Custom Incremental or Incremental
Prefix – ARCHIVE_ or existing preproduction prefix
While Incremental can be used with the same prefix as previous exports from the case, a Custom Incremental range paired with a prefix may be helpful if you are exporting all documents, not just unexported ones.
Padding Length – 8
Start Number – (1 if using Custom Incremental / ARCHIVE_ prefix)
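To see what these numbering settings would produce, the sketch below assumes the export number is simply the prefix followed by the number zero-padded to the Padding Length. That concatenation rule and the function names are assumptions for illustration, not confirmed behavior of the Export module.

```python
from typing import List


def control_number(prefix: str, number: int, padding: int = 8) -> str:
    """Format one export number as prefix + zero-padded counter (assumed rule)."""
    return f"{prefix}{number:0{padding}d}"


def control_numbers(prefix: str, start: int, count: int, padding: int = 8) -> List[str]:
    """Format a run of consecutive export numbers beginning at start."""
    return [control_number(prefix, n, padding) for n in range(start, start + count)]


print(control_numbers("ARCHIVE_", 1, 3))
# ['ARCHIVE_00000001', 'ARCHIVE_00000002', 'ARCHIVE_00000003']
```

With a padding length of 8, a single prefix leaves room for up to 99,999,999 records before the number grows beyond the padded width.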
Native Options
PST/OST Handling – MSG
MHT can also be used if space is a concern, but MSG is truer to the original format of the record for Exchange email.
Apply to Loose/Attached Email – No
This ensures that attachments and loose email are exported in their original format rather than converted.
NSF Handling – MHT
Sort Options
Leave as default (pictured below)
Load File Options
Load File Type – Flat File
Export Field Names as First Line – Yes
Text Encoding – UTF-8 (Unicode)
Date Format – MM/dd/YYYY for US (This can be adjusted as desired for region)
Time Format – h:mm tt (This can be adjusted as desired for region)
Bool Format – TRUE/FALSE (This can be adjusted as desired for platform)
Delimiter Format – Concordance Default: Comma (020) / Quote (254) / Newline (174)
Fields – All fields except FULLTEXT and EMPTYFIELD (customize as you see fit)
Note
The FULLTEXT field is unnecessary because the text files are exported separately, with their paths specified in other fields.
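For a quick QC of the archive before the case is deleted, a load file written with the settings above can be read with a few lines of Python. This is a minimal sketch that assumes the usual Concordance conventions: fields separated by character 020, each value wrapped in character 254, and line breaks inside a value replaced by character 174. The field names in the sample (DOCID, DATESENT, HASATTACH, NOTE) are hypothetical; the real names are listed in the field descriptions appendix.

```python
import io
from datetime import datetime
from typing import Dict, Iterator, List, TextIO

FIELD_SEP = chr(20)    # Concordance "Comma" delimiter (020)
QUOTE = chr(254)       # Concordance "Quote" delimiter (254)
NEWLINE = chr(174)     # Concordance "Newline" stand-in inside values (174)


def parse_dat_line(line: str) -> List[str]:
    """Split one DAT line into values, removing quotes and restoring newlines."""
    values = []
    for field in line.rstrip("\r\n").split(FIELD_SEP):
        if len(field) >= 2 and field.startswith(QUOTE) and field.endswith(QUOTE):
            field = field[1:-1]
        values.append(field.replace(NEWLINE, "\n"))
    return values


def read_dat(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, using the first line as field names."""
    header = None
    for raw in handle:
        if not raw.strip("\r\n"):
            continue  # skip blank lines
        values = parse_dat_line(raw)
        if header is None:
            header = values
            continue
        yield dict(zip(header, values))


def parse_export_date(value: str) -> datetime:
    """Parse a date written with the MM/dd/YYYY setting."""
    return datetime.strptime(value, "%m/%d/%Y")


def parse_export_bool(value: str) -> bool:
    """Interpret the TRUE/FALSE Bool Format."""
    return value.strip().upper() == "TRUE"


def _row(values: List[str]) -> str:
    """Build a sample DAT line (used only for the demonstration below)."""
    return FIELD_SEP.join(f"{QUOTE}{v}{QUOTE}" for v in values) + "\r\n"


# Hypothetical two-line load file: header plus one record.
sample = _row(["DOCID", "DATESENT", "HASATTACH", "NOTE"]) + _row(
    ["ARCHIVE_00000001", "03/14/2024", "TRUE", f"line one{NEWLINE}line two"]
)
records = list(read_dat(io.StringIO(sample)))
print(records[0]["DOCID"], parse_export_date(records[0]["DATESENT"]).date())
```

For a file on disk, the same reader can be given a handle opened with open(path, encoding="utf-8-sig", newline=""), which matches the UTF-8 Text Encoding setting and tolerates a byte order mark if one is present.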
Additional Resources
Export settings are also described in more detail in the user guide:
https://processing-help.revealdata.com/en/Export-Module.html
Export Field descriptions can be found in the user guide:
https://processing-help.revealdata.com/en/APPENDIX-D---Export-Load-File-Field-Descriptions.html