Reveal Processing

APPENDIX C - MD5 Hash Generation

During import, every original file is assigned an MD5 Hash value, which Discovery Manager uses to identify duplicates. The table below lists the data used to generate the MD5 Hash for each Document Type. In addition to the email metadata properties listed in the table, the following normalization steps are applied when creating an email MD5 Hash:

  • Milliseconds are removed from all time values.

  • Recipients are sorted by email address alphanumerically.

  • Display Names are not used.

  • Attachments are sorted by filename alphanumerically.

  • All whitespace, hard line returns, and non-alphanumeric characters are removed from the email body, leaving only letters and numbers.

  • Whitespace, hard line returns, and non-alphanumeric characters are not removed from the email subject.
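
The normalization steps above can be sketched in Python as follows. This is only an illustration of the listed rules, not the product's actual implementation: the function names, the field separator, the UTF-8 encoding, the timestamp format, and the use of `str.isalnum` to decide what counts as a letter or number are all assumptions, so hashes produced here will not match those generated by the application.

```python
import hashlib
from datetime import datetime
from email.utils import parseaddr


def address_only(recipient: str) -> str:
    """Drop any display name: 'Ann Lee <ann@example.com>' -> 'ann@example.com'."""
    return parseaddr(recipient)[1]


def normalize_body(body: str) -> str:
    """Keep only letters and digits (str.isalnum is an assumed definition)."""
    return "".join(ch for ch in body if ch.isalnum())


def email_md5(date_sent: datetime, sender: str, recipients: list,
              subject: str, body: str, attachments: list) -> str:
    """Hash the normalized email fields.

    attachments is a list of (filename, size_in_bytes) tuples.
    The separators and field order are hypothetical.
    """
    fields = [
        date_sent.replace(microsecond=0).isoformat(),   # milliseconds removed
        address_only(sender),                           # display name not used
        ";".join(sorted(address_only(r) for r in recipients)),  # sorted by address
        subject,                                        # subject is left untouched
        normalize_body(body),                           # letters and numbers only
        ";".join(f"{name}:{size}"                       # sorted by filename
                 for name, size in sorted(attachments, key=lambda a: a[0])),
    ]
    return hashlib.md5("\x1f".join(fields).encode("utf-8")).hexdigest()
```

Under these assumptions, two copies of a message that differ only in milliseconds, recipient order, display names, attachment order, or body punctuation and spacing hash to the same value, while a change to spacing in the subject produces a different hash.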

Document Type: Values Used To Generate MD5 Hash

  • Efiles (Including Efile Attachments): Generated on the bit stream of the file

  • Outlook Items¹: Date Sent, Sender Email Address, Recipient Email Addresses, Subject, Body, Attachment Names, Attachment Size

  • Lotus Notes Items:

      • Memo, Reply, Notice: From, DateSent, SendTo, CopyTo, BlindCopyTo, Attachment Name ($FILE), Subject, Body

      • Appointment: Subject, Chair, STARTDATETIME, Location, EndDateTime, RequiredAttendees, RepeatDates, OptionalAttendees, FYIAttendees, Attachment Name ($FILE), Body

      • Task: Subject, DateSent, STARTDATETIME, DueDateTime, Principal, AssignedTo, OptionalAssignedTo, FYIAssignedTo, Body, AttachmentName ($FILE)

      • Non Delivery Report: Subject, IntendedRecipient, FailureReason, From, DateSent, SendTo, CopyTo, BlindCopyTo, Attachment Name ($FILE), OriginalSubject, Body

      • Delivery Report, Return Receipt: DateSent, Subject, IntendedRecipient, From, AttachmentName ($FILE), OriginalSubject, Body, SendTo, CopyTo, BlindCopyTo

      • Unrecognized Forms: All properties except UNID
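
For efiles, the hash is generated on the bit stream of the file, so only the file's bytes matter and not its name, path, or timestamps. A minimal Python sketch of that idea is shown below; the function name `file_md5` and the chunk size are illustrative choices, not part of the product.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the MD5 hex digest of a file's raw bytes.

    The file is read in chunks so large files do not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical content but different filenames produce the same digest under this approach, which is why a renamed copy is still detected as a duplicate.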

Note

¹ The fields above can be adjusted in Project Settings, shown below. Removing fields identifies more duplicates but also produces more false positives.

Picture1.png
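
The trade-off described in the note can be illustrated with a small Python sketch. The item dictionaries, field names, and separator here are hypothetical and do not reflect how the application stores or hashes its data; the point is only that hashing fewer fields lets more items collapse into the same duplicate group.

```python
import hashlib
from collections import defaultdict


def hash_fields(item: dict, fields: list) -> str:
    """Hash only the selected fields of an item (separator is an assumption)."""
    joined = "\x1f".join(str(item[name]) for name in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def duplicate_groups(items: list, fields: list) -> list:
    """Group item ids that share a hash; return only groups with 2+ members."""
    groups = defaultdict(list)
    for item in items:
        groups[hash_fields(item, fields)].append(item["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


items = [
    {"id": 1, "date_sent": "2024-01-05T09:30:15", "subject": "Reminder", "body": "Timesheetsdue"},
    {"id": 2, "date_sent": "2024-01-05T09:30:15", "subject": "Reminder", "body": "Timesheetsdue"},
    # Same text, but sent a week later: a distinct message.
    {"id": 3, "date_sent": "2024-01-12T09:30:15", "subject": "Reminder", "body": "Timesheetsdue"},
]

with_date = duplicate_groups(items, ["date_sent", "subject", "body"])
without_date = duplicate_groups(items, ["subject", "body"])
```

With the date field included, only items 1 and 2 are grouped as duplicates. With it removed, item 3 joins the group even though it is a separate message sent a week later, which is the kind of false positive the note warns about.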