MD5 Sums: What they are
and How to use them
(with Links to Free MD5 Programs)

Copyright©2003,2004,2006 by Daniel B. Sedory
[Do not reproduce in any form without permission from the author.]


Introduction
Who uses MD5 Sums?
How to Create and Use MD5 Sums
The Commonly Accepted Format for an *.MD5 File
Details about some Free Programs:
   • How to use hkSFV (a Windows GUI )
   • md5sum (by bruce@gridpoint.com) (a 32-bit Command-Line)
   • md5deep (32-bit Command-Line, Recursive sums; new version 1.0)
   • md5 (with RFC1321 parameter usage)
References
MD5 Tools
Other MD5 Resources

Introduction

An MD5 sum (or Message Digest 5 checksum) is a 16-byte (128-bit) number, normally written as 32 hexadecimal characters (using the digits 0-9 and A-F or a-f), that results from performing a series of calculations on digital data according to a mathematical algorithm devised by Ronald Rivest of MIT and documented in detail in RFC 1321. The amount of data that you can make an MD5 sum from has no limit; this means it can be used to make an MD5 sum from every byte on a hard drive, from individual files or from nothing at all. That's right, you can calculate the MD5 sum of a file which has no data. These are often called "zero-length" files, and they will all produce exactly the same MD5 sum on any computer; that sum being:

d41d8cd98f00b204e9800998ecf8427e
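The zero-length sum is easy to reproduce with any MD5 implementation. Here is a minimal sketch, assuming Python 3 and its standard hashlib module (neither is one of the programs covered on this page):

```python
import hashlib

# MD5 of zero bytes of input: the sum that every empty file produces.
empty_sum = hashlib.md5(b"").hexdigest()
print(empty_sum)  # d41d8cd98f00b204e9800998ecf8427e
```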

The reason MD5 sums are so useful is that they are practically unique* and can be used as a digital fingerprint (or signature) for whatever file or data they were created from. Therefore, you can use MD5 sums to verify that the files you just downloaded from some web site far away are exactly the same as the ones the author made. The only other way to know if two files are exactly the same (or to find out why their MD5 sums don't agree) is to compare every byte of the files using the DOS FC (File Compare) command or some other byte-by-byte comparison program. It's also important to know that the appearance of an MD5 sum is no indicator of the nature of the data it was created from (such as a file's contents). For example, two MD5 sums with completely different hex bytes in every one of their 16 byte positions could easily have come from files that differ by only a single binary bit! So, an MD5 sum that's completely different from the one given for, say, some huge .ZIP file you just downloaded does not necessarily mean that the whole file is a total waste; it might simply mean that you had a problem with only a few bytes near the very end (and could easily recover most of the files inside it by using a .ZIP-repair program).
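To see the single-bit effect for yourself, here is a small sketch (again assuming Python 3 and hashlib; the helper name md5_hex is made up for this illustration). It hashes the one-byte input "a", flips one bit to get "c", and counts how many of the 32 hex digits differ between the two sums:

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 sum of data as 32 lowercase hex characters."""
    return hashlib.md5(data).hexdigest()

original = b"a"                               # 0x61 = 0110 0001
flipped = bytes([original[0] ^ 0b00000010])   # flip one bit: 0x63, the letter "c"

sum_a = md5_hex(original)
sum_b = md5_hex(flipped)

# Count the hex-digit positions at which the two sums disagree.
differing = sum(1 for x, y in zip(sum_a, sum_b) if x != y)

print(sum_a)
print(sum_b)
print(differing, "of 32 hex digits differ")
```

The two inputs differ by exactly one bit, yet nothing in the two sums hints at how close the inputs are.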

Who Uses MD5 Checksums?
DOS Collectors, MP3 Music Fans, Security Experts; Anyone Who wants to Know for sure that they have Exactly the Same File others Do, or that a File has not changed!

Once DOS collectors have created and verified the MD5 "fingerprints" for all the files on an old DOS diskette, those MD5 Sums can also be used to test for any degradation in both their digital and physical copies. As a matter of fact, it's a good idea to make an "image file" of all your original diskettes and create MD5 sums for these image files too; this is what Forensics experts do to make sure that nothing has changed on the disks they examine. Music collectors use MD5 sums all the time to make sure that their MP3 files are exactly the same as an original after downloading it. The Operating System files of an important server for a company or military organization are often checked against the known good MD5 checksums (stored elsewhere) from the time the OS was first installed or last upgraded. And if you burn your own CDs, it's an easy way to check for any physical changes in the disc media (remember, only factory pressed CDs will last for many decades, but CD-R discs rely on dyes and are bound to fail much sooner!).

Creating and Using MD5 Checksums

If you have a *nix OS (such as Linux), there should already be an md5sum program on your system. For example, if you enter md5sum at a command prompt, you should see a blank line appear on your screen. If you then type in the character for that system's "end-of-file" marker (on a Linux machine, you'd normally press the CTRL + D keys to create the "end-of-file" character), the MD5 sum for a zero-length file should appear on your screen, followed by two spaces and a "-". To find out more about how to use the program, enter md5sum --help at your Linux or UNIX prompt, or read its man page.

If you have a Windows OS on your computer, there are a variety of programs available for both the GUI and Command-Line user, but many of them either have no means of checking a file against stored MD5 sums, or fail to produce an output in the accepted format; without at least some editing on your part. Some programs which do produce *.md5 files in the accepted format will be discussed shortly (or see links under MD5 Tools below).

The Commonly Accepted Format for an *.md5 File

The format for .md5 files (the commonly accepted extension for files containing MD5 sums) has been more or less agreed upon by the majority of those who use MD5 sums for verifying file downloads on the Internet. This format should also be quite acceptable to DOS Collectors (especially Linux users), since it's basically the same output you get from the UNIX md5sum command: one line per file, holding the 32-character sum, a space, a type character (an asterisk or a second space) and the file name.

[ NOTE: Although technically all DOS or Windows filenames output by the md5sum command are supposed to have an asterisk in front of them (in column 34), none of the Windows MD5 programs I tested seem to care if it's a blank space instead; as a matter of fact, I've recently been informed that this is an archaic carryover that possibly never should have been placed into the Linux documentation! Here are the relevant lines from my RedHat Linux md5sum --help output screen (emphasis shows the parts that should be reviewed by those responsible for such matters):

    "-b, --binary     read files in binary mode (default on DOS/Windows)"

and

    "When checking, the input should be a former output of this program. The default mode is to print a line with checksum, a character indicating type ('*' for binary, ' ' for text) and name for each FILE."

Furthermore, I tested the checking function of the Linux md5sum program with *.md5 files having many comment lines in them (lines beginning with a semicolon), and it ran just the same with or without them; so comment lines are quite acceptable under Linux as well as some newer Windows MD5 programs. ]

The order of the lines inside an *.md5 file shouldn't matter for any program which compares the sums to the files themselves; the order in these files is often the same as that output by a simple directory listing. Sorting them alphabetically by filenames or numerically by md5sums might be helpful if a person needs to look at the data for some reason. Comment lines can be used to identify, for example, which version of a DOS OS the collection of MD5 sums corresponds to (since the file names are often exactly the same for many different versions)! Here's an example of an *.md5 format with a number of comment lines at the beginning of the file: Windows 98 SE Startup Disk MD5 Text File.
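The format described in this section is simple enough to read with a few lines of code. The following is only a sketch, assuming Python 3; the names LINE_RE and parse_md5_lines are invented for this example, and the sample lines are taken from sums that appear elsewhere on this page. It accepts either an asterisk or a space in column 34 and skips comment lines that begin with a semicolon:

```python
import re

# 32 hex digits, one space, a type character ('*' binary or ' ' text), then the name.
LINE_RE = re.compile(r"^([0-9A-Fa-f]{32}) ([* ])(.+)$")

def parse_md5_lines(lines):
    """Yield (md5sum, filename) pairs, skipping blank and ';' comment lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.lstrip().startswith(";"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"not a valid .md5 line: {line!r}")
        md5sum, _type_char, name = match.groups()
        yield md5sum.lower(), name

sample = [
    "; sample comment line\n",
    "6924fd81513d827a6ca91472f7e9eeeb *BACKUP.COM\n",
    "b067cc477e113932e0a997e9d5e4319d  A:\\COMMAND.COM\n",
]
entries = list(parse_md5_lines(sample))
for md5sum, name in entries:
    print(name, md5sum)
```

Because each entry is matched on its own, the order of the lines in the file makes no difference to the result.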



How to use hkSFV with .md5 Files

( NOTE: Do not use this program without first reading this page:
  How I set up my own hkSFV install . )

Since most of you probably use a Windows OS, let's take a look at a fairly new GUI program that can be used to both create and check .md5 files: It's called hkSFV (reflecting the fact that it was first made for .sfv files which use only CRC checksums). Upon installation, it will associate any file having an *.MD5 extension with itself, so clicking on an *.md5 filename not only opens the program, but causes it to immediately begin checking the MD5 sums of anything that's listed in the file. Clicking on one of the .md5 files (see link to .ZIP file below) with a Windows 98 SE Startup Disk in your A: drive gives the following output:

hkSFV's help file is fairly comprehensive, but for any unanswered questions there are forums at the web site for discussing them! ( NOTE: At the time of this writing, hkSFV was unable to create an .md5 file from a CD or any write-protected disks. Yet, it's quite easy for hkSFV to check an .md5 file against such media using a Pathway! The authors said they will fix this problem in the next release, but I don't know if it will work yet. I made the file you see above (W98SEDSK.md5) by using my own perl script (with MD5 support) and editing its output to include the "A:\" pathway for each file on the diskette.)

Here are my MD5 files for the Windows 98 and Windows 98 SE Startup Disks in a .ZIP file (don't forget to insert the Startup Disk you want to check before clicking on the .md5 file): MD5 Files of Win98 Boot Disks, or for the Windows ME and Windows XP Startup Disks: MD5 Files of WinME/XP Startup Disks.

 


How to use the md5sum.exe Program

At this time, I was able to find only one Command-Line tool that outputs sums in the 'required format' without adding any extra comments such as copyright notices, ads, etc. Simply download md5sum.exe into your C:\WINDOWS, C:\WINNT or other directory that's in your PATH. Unfortunately, this program must be run with the DOS prompt at the directory you want to obtain the MD5 sums for, or you'll get some really frustrating error messages which state: "No such file or directory" right after the filename it says doesn't exist! I'm still looking for a better program, or may even write one of my own. But for now, you should note the following steps (and example) for using this program:

Step 1: Change the DOS prompt to the directory you wish to create the .md5 file from. (This is much easier to do if you install Microsoft's "Command Prompt Here" Powertoy for Win2000/XP! Similar Registry/OS functionality is available for Win9x as well if you look for it.)
For example, let's say your DOS box prompt is at C:\WINDOWS when you open it. If you want to create an '.md5' file from all the files in the temp\dos directory of your D: drive, you'd first have to switch to that drive: D: (ENTER) and then 'cd' to its temp\dos directory. Using the "Command Prompt Here" addition, you just right-click on the directory and select the "Command (Prompt) Here" item!

Step 2: At the prompt, enter: md5sum followed by a filespec such as *.* to create sums for all the files in that directory, and finish the entry with a re-direct symbol (">") pointing to the full path and filename of your new .md5 file. For example, here's the command you'd enter to create the MD5 sums of all the .COM files in the root directory of a diskette in your B: drive, redirecting the output to a new file (called pdos330s.md5) in your C:\TEMP directory:

B:\>md5sum *.com > c:\temp\pdos330s.md5

which created the following pdos330s.md5 file from my own Phoenix Computer MS-DOS 3.30 (OEM) Supplemental Programs (backup) diskette:

6924fd81513d827a6ca91472f7e9eeeb *BACKUP.COM
5b12878faa52117af2ef16668b62cd7c *DEBUG.COM
aa2bb7fa6539119c3805d4b73230dd64 *RESTORE.COM
67b1da798e9e6d7c6dbcd322bed44233 *TREE.COM
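If md5sum.exe isn't at hand, the same kind of listing can be produced with a short script. This is only a sketch, assuming Python 3; the function names md5_of_file and write_md5_file are hypothetical and are not part of any program discussed here. It reads each file in binary mode, in chunks, and writes one "sum *NAME" line per file:

```python
import glob
import hashlib
import os

def md5_of_file(path, chunk_size=65536):
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:   # binary mode, like md5sum's -b
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_md5_file(pattern, out_path):
    """Write one 'sum *NAME' line per file matching pattern; return the count."""
    names = sorted(n for n in glob.glob(pattern) if os.path.isfile(n))
    with open(out_path, "w") as out:
        for name in names:
            out.write(f"{md5_of_file(name)} *{name}\n")
    return len(names)
```

Run from the directory you want to list, a call such as write_md5_file("*.com", r"c:\temp\pdos330s.md5") would play the role of the md5sum command shown above. Names are written exactly as the pattern matched them, so a relative pattern gives relative names.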

In order to check the MD5 sums in the .md5 file against those on the present diskette, you once again need to be in the directory for the files to be checked, then enter md5sum followed by the "-c" switch and the full-path to the .md5 file like this:

B:\>md5sum -c c:\temp\pdos330s.md5
BACKUP.COM: OK
DEBUG.COM: OK
RESTORE.COM: OK
TREE.COM: OK
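The checking step can be sketched the same way. The code below assumes Python 3 and the line layout described earlier (sum in columns 1-32, type character in column 34, name from column 35 on); check_md5_file and its status strings are invented for this illustration and do not reproduce the exact messages of any md5sum program. Each file is read whole, which is fine for diskette-sized files:

```python
import hashlib
import os

def check_md5_file(md5_path, base_dir="."):
    """Compare each listed sum with the file on disk; return {name: status}."""
    results = {}
    with open(md5_path, "r") as listing:
        for line in listing:
            line = line.rstrip("\r\n")
            if not line.strip() or line.lstrip().startswith(";"):
                continue                      # blank or comment line
            expected, name = line[:32].lower(), line[34:]
            try:
                with open(os.path.join(base_dir, name), "rb") as handle:
                    actual = hashlib.md5(handle.read()).hexdigest()
            except OSError:
                results[name] = "FAILED open or read"
                continue
            results[name] = "OK" if actual == expected else "FAILED"
    return results
```

Unlike the md5sum.exe described here, this sketch takes the directory of the files as a parameter (base_dir), so the prompt does not have to be at that directory first.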

 


The md5deep.exe Program (v. 1.0)

MD5deep is a cross-platform program to compute MD5 sums on an arbitrary number of files. The program is known to run on Windows, Linux, FreeBSD, OS X and Solaris, and should run on most other platforms. md5deep can now use *.md5 files created by such programs as hkSFV (with comment lines). md5deep is similar to the md5sum program found in Linux, but has additional features, such as the recursive, matching and time-estimate modes listed in its usage display below.

For version 1.0, the author (Jesse Kornblum) has created many new examples to explain the use of various switches under md5deep! (See the link to his site below.)

Here is md5deep's usage information display (note the -h):

C:\TEMP>md5deep -h
md5deep version 1.0 by Jesse Kornblum.

Usage:
$ md5deep [-v|-V|-h] [-m|-M|-x|-X <file>] [-resbt] [-o fbcplsd] FILES

-v - display version number and exit
-V - display copyright information and exit
-h - display this help message and exit
-m - enables matching mode. See README/man page
-x - enables negative matching mode. See README/man page
-M and -X are the same as -m and -x but also print hashes of each file
-r - enables recursive mode. All subdirectories are traversed
-e - compute estimated time remaining for each file
-s - enables silent mode. Suppress all error messages
-o - Only process certain types of files:
f - Regular File
b - Block Device
c - Character Device
p - Named Pipe (FIFO)
l - Symbolic Link
s - Socket
d - Solaris Door

md5deep will also accept the switches: -t and -b just to remain compatible with the older *nix/Win32 md5sum program, but they are ignored; and do not affect its output.

md5deep is a very useful program for creating *.md5 files. The output of md5deep does not use an '*' (asterisk) in front of the file names, but as we stated above, many Windows programs that check the output of *.md5 files don't seem to care one way or the other about that. The following is a copy of the output md5deep gave with a Windows 98 Startup Disk in the A: drive using these parameters:

C:\TEMP>md5deep A:\*.COM A:\IO.SYS A:\MSDOS.SYS
b067cc477e113932e0a997e9d5e4319d  A:\COMMAND.COM
4823258556ae481a19015c22e33e8a9e  A:\IO.SYS
659d373aefa0966f804ca7d0304c3118  A:\MSDOS.SYS

Here's how you would create an *.md5 file for your Windows XP Startup Disk:

C:\TEMP>md5deep A:\*.* > WinXPSD.md5

And to its credit, if you then clicked on the file "WinXPSD.md5" with hkSFV installed on your Windows system, it would be able to check on the MD5 sums for all the files of any Windows XP Startup Disk in your A: drive! [ Here's a copy of that WinXPSD.md5 file; created by md5deep using the DOS redirect symbol (>). You should also note that the MD5 sums for both its AUTOEXEC.BAT and CONFIG.SYS files are: d41d8cd98f00b204e9800998ecf8427e. Do you recall what that means? Look briefly at the top of this page again... Do you know now? I like to think of this as the "foo bar" sum; some of you might know what that means, because the hex digits "f00b" can be thought of as the beginning of that phrase. The repetition of "98" three times: in front of the "f00b" and around "009" and/or the digit "d" at the beginning and "e" at the end might help you remember it. The point for the Win XP Startup Disk is that both of these files are zero-length; or empty! So booting up your system with a Windows XP Startup Disk will only give you an A:\> prompt on the screen! And if you type in the command ver, you'll find out that you've got the same COMMAND.COM file as a Windows ME user! In short, you should either add more files to this disk or go find yourself some other boot disk, because 'as is' this thing is almost worthless to you! One very useful program I'd add is a disk editor and some of the utility programs that Microsoft erased from the original diskette! Take this link for all the details of the Windows XP Startup Disk. ]
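For readers curious about what a recursive mode like md5deep's -r involves, here is a rough approximation. It is a sketch only, assuming Python 3; md5_tree is a made-up name and this is not how md5deep itself is written. It walks a directory tree and produces lines in md5deep's style, with two spaces between the sum and the path:

```python
import hashlib
import os

def md5_tree(root):
    """Walk root recursively and return 'sum  path' lines for regular files."""
    lines = []
    for folder, subdirs, files in os.walk(root):
        subdirs.sort()                      # make the traversal order stable
        for name in sorted(files):
            path = os.path.join(folder, name)
            digest = hashlib.md5()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            lines.append(f"{digest.hexdigest()}  {path}")
    return lines
```

Any zero-length file found along the way shows up with the d41d8cd98f00b204e9800998ecf8427e sum discussed above.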


An md5.exe DOS (16-Bit) Program (Compiled by 3L Ltd.)

See the comments below (under MD5 Tools) for a link to 3L Limited. This program is an adaptation of the source code presented in RFC1321 by Ronald Rivest and conforms to the parameters listed in that document.

Although the output of this program does not conform to the format discussed above, it can be very useful when you're limited to running under DOS (real 16-bit mode). It will, however, also function correctly in a 32-bit DOS-box under Windows.

If you run this program against a copy of itself by entering: md5 md5.exe, the output will appear as follows:

MD5 (md5.exe) = ab5f8d3485f9a0e660dd93f151f8c03c

Since this program conforms to the parameter usage of the code in RFC1321, you also have the choice of entering "-s ", "-x " or "-t " on the command line.

For the -s (string) parameter, you must place a string of characters immediately after the "-s" with no intervening spaces (unless you use double-quote marks around the string)! Here are some examples to clarify this usage:

C:\TEMP>md5 -snospaceshere
MD5 ("nospaceshere") = 201fbc7441fa2a66b72bbed1247d1379

C:\TEMP>md5 -s"this string uses quote marks"
MD5 ("this string uses quote marks") = d3581f9f0a3ef0b5dd342603e7824fcd

C:\TEMP>md5 -s"message digest"
MD5 ("message digest") = f96b697d7cb7938d525a2f31aaf161d0

C:\TEMP>md5 -s
MD5 ("") = d41d8cd98f00b204e9800998ecf8427e

For -x, you should see the following on your display; except for the ' / '(red slash) where the lines were wrapped:

C:\TEMP>md5 -x
MD5 test suite:
MD5 ("") = d41d8cd98f00b204e9800998ecf8427e
MD5 ("a") = 0cc175b9c0f1b6a831c399e269772661
MD5 ("abc") = 900150983cd24fb0d6963f7d28e17f72
MD5 ("message digest") = f96b697d7cb7938d525a2f31aaf161d0
MD5 ("abcdefghijklmnopqrstuvwxyz") = c3fcd3d76192e4007dfb496cca67e13b
MD5 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789")/
= d174ab98d277d9f5a5611c2c9f419d9f
MD5 ("1234567890123456789012345678901234567890123456789012345678901234/
5678901234567890") = 57edf4a22be3c955ac49da2e2107b67a

and something similar (for -t ; most likely with a different time and speed) to this:

C:\TEMP>md5 -t
MD5 time trial. Digesting 10000 5000-byte blocks ... done
Digest = 7870442513c89405972bca276a153ca3
Time = 6 seconds
Speed = 8333333 bytes/second
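The -x test suite shown above is also a handy way to check any other MD5 implementation. As a sketch, assuming Python 3 and its hashlib module (TEST_SUITE and run_test_suite are names chosen for this example), the seven strings and their sums can be verified like this:

```python
import hashlib

# The seven test strings and sums from the RFC 1321 test suite.
TEST_SUITE = [
    ("", "d41d8cd98f00b204e9800998ecf8427e"),
    ("a", "0cc175b9c0f1b6a831c399e269772661"),
    ("abc", "900150983cd24fb0d6963f7d28e17f72"),
    ("message digest", "f96b697d7cb7938d525a2f31aaf161d0"),
    ("abcdefghijklmnopqrstuvwxyz", "c3fcd3d76192e4007dfb496cca67e13b"),
    ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
     "d174ab98d277d9f5a5611c2c9f419d9f"),
    ("1234567890" * 8, "57edf4a22be3c955ac49da2e2107b67a"),
]

def run_test_suite():
    """Return the test strings whose computed sum does not match."""
    return [text for text, expected in TEST_SUITE
            if hashlib.md5(text.encode("ascii")).hexdigest() != expected]

failures = run_test_suite()
print("all sums match" if not failures else f"mismatches: {failures}")
```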

If you simply execute the program without any parameters (md5 ENTER) and enter text data from the standard input device, MD5.EXE will simply output a single MD5 sum on the next line (multiple lines of text are acceptable too). For example, if you type in the text "message digest" (without the quote marks) and enter the appropriate End of File character for MS-DOS / Windows; a ^Z (use the CTRL and z keys) immediately after the text (on the same line), you should get the MD5 sum for a file that contains only that text. Here's the output as it should appear on your screen:

C:\TEMP>md5
message digest^Z
f96b697d7cb7938d525a2f31aaf161d0

Although this program will not accept 'wildcards' in the file name, you can compute the MD5 sums for a small number of files at the same time by entering multiple file names (separated by spaces) on the same command line. There is no usage help built into the program!


References

RFC 1321 (Request for Comments # 1321 by R. Rivest, MIT Laboratory for Computer Science and RSA Data Security, Inc. April 1992). Here are a couple sites with copies of this document (if you do a Google search for 'rfc1321.txt' or 'rfc1321.html' you'll find many others): http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1321.txt,
ftp://ftp.isi.edu/in-notes/rfc1321.txt or http://www.fourmilab.ch/md5/rfc1321.html .

*Practically Unique: It is not possible to prove with 100% certainty that any two files could never have the same MD5 sum; with only 128 bits of output and no limit on the input, different inputs that share a sum must exist. As a matter of fact, researchers showed in 2004 that such "collisions" can be constructed deliberately. But the odds of two files having the same sum by accident are so small that, for the purpose of detecting accidental changes to a file, you might as well consider it impossible. In order to understand some of the remarks that have been made about the security of MD5, it's necessary to cover some terminology from the field of cryptography:

First, MD5 is called a hash function. These functions can have an input (a message or digital data) of any length, but their output (called hash values) will always have the same fixed length for a given type of function. [Although it's part of its name (MD5), these values should be called a message digest only when they result from applying a hash function to a message; thus, the terms 'sum' or 'checksum' being more general are the ones often used in relation to the MD5 of computer programs and other files.]

Although I'm still interested in providing an explanation for more of the technical aspects of MD5 in my own words, for now you'll have to study the following pages on your own (along with reading RFC 1321) for all of the details:
The RSA Security Crypto FAQ: http://www.rsasecurity.com/rsalabs/faq/.html "RSA Laboratories' Frequently Asked Questions About Today's Cryptography, Version 4.1"
The following sections of that FAQ pertain to MD5 or terminology used about it:
Section: 3.6.6 "What are MD2, MD4, and MD5?",
Section: 2.1.6 "What is a hash function?" and in the
CryptoBytes Technical Newsletter: Volume 2, No. 2 - Summer 1996 (Acrobat .PDF, 357k)
"The Status of MD5 After a Recent Attack" by Hans Dobbertin. Add to that another article in the RSA Laboratories’ Bulletin News and advice from RSA Laboratories: Number 4 - November 12, 1996 (Acrobat .PDF, 235k) "On Recent Results for MD2, MD4, and MD5" by M.J.B. Robshaw of RSA Labs, and you'll probably have enough to get a very good idea about MD5's usefulness even if it can't be proved with 100% certainty that no two programs will have the same checksums.


MD5 Tools:

1. hkSFV For Win9x/2000/XP -- I used Version 2.0.1 (build 84); dated, October 30, 2002. I'm hoping there will be a new one soon that allows the creation of MD5 sums from a CD! (Old link: http://www.big-o-software.com/products/hksfv/) The original site no longer exists, but you can still download it from DOWNLOAD.COM here:
New Link: Download hkSFV from here!

2. md5sum.exe (Note: Caution! There are many programs by this name!) This one is a command-line tool for Windows by bruce@gridpoint.com.
Link: http://www.etree.org/md5com.html


3. md5deep.exe (This program can be downloaded as a compiled binary for Windows, or as Source Code which compiles easily under Linux / FreeBSD / Solaris and other *nix systems.) The Program was written by Special Agent Jesse Kornblum of the United States Air Force Office of Special Investigations.
Link: http://md5deep.sourceforge.net/ (Note: MD5deep is now at version 1.12, but we haven't had time to check the commands and functionality; except for the fact that it now includes SHA1, SHA256, Whirlpool and Tiger hashes as well. SHA256 is the new FIPS standard!)


4. md5.exe (The same MD5 program found in my BCTEST.ZIP download, which is part of my Basic Course in Forensics/Data Recovery. It will function under either real 16-bit DOS or in a DOS-box under Windows.) The Program is an adaptation of Ron Rivest's original code in RFC1321 and as such its output does not conform to the "Commonly Accepted Format" for an .MD5 file as described above! However, it can still be very useful in finding the MD5 Sum for a small number of files, especially if you must do so under a Real (16-bit) DOS environment! Although I have an e-mail from the 3L Ltd. company stating I can distribute this file with my own software, I thought it only fair to reference them as the party who actually compiled this program: http://www.shen.myby.co.uk/threel/tech/tools/md5.htm [dead link!].


5. More links to be added at a later date.


Links to Other MD5 Resources:

The ("unofficial") MD5 Homepage:
http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html.


Last Updated: 28 OCT 2006 (28/10/2006).


 
