Imaging Using dcfldd

In this example, a 128MB USB thumb drive will be imaged on a Linux system using dcfldd onto a 1GB USB thumb drive.  dcfldd is an improved version of dd; most of the syntax is identical, with a few functions added.  The first step is to find the names that Linux uses to refer to the two USB drives involved in the imaging process.  This can be done by entering sudo fdisk -l in a terminal window, which lists all the disks that Linux sees along with where each one is located in the /dev directory.  In this example, the USB drive that will be imaged is located at /dev/sdb, and the drive that the image will be saved on is /dev/sdc.

Figure 1: Displaying disk names

It is important to write-protect the drive to be imaged as soon as possible after it has been attached to the computer.  While a properly configured forensic Linux machine will not write to the evidence disk, it is good to take precautions to block write attempts, both from the system and from the user.  Now that the drive’s location is known, the next step is to change the permissions.

The command ls -lha /dev | grep sd will list all the files in the /dev folder whose names contain the letters sd.  Since all the disks being used contain sd in the name, this filters out the devices that are not of interest.  This command allows the user to view the permissions of the drives; as it is now, write permission is set on sdb.  To change this, use the chmod command.  Entering sudo chmod 440 /dev/sdb sets the permissions on sdb so that the owner and group of the device file can only read it, not write to it.  Enter ls -lha /dev | grep sd again to view the new permissions and verify that this is the case.  Keep in mind that processes running as root are not bound by these permission bits, so this step is a precaution rather than a substitute for a write blocker.
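The visual check of the permissions above can also be scripted.  The following is a minimal sketch, assuming GNU stat is available; the function name no_write_bits is hypothetical and is not part of dcfldd or any standard tool.

```shell
# Hypothetical helper: succeed only if no write bit is set on the given path.
# Assumes GNU coreutils stat (the -c option).
no_write_bits() {
    # %A prints the symbolic mode, e.g. "brw-rw----" for a writable block device.
    mode=$(stat -c '%A' "$1") || return 2
    case "$mode" in
        *w*) return 1 ;;   # owner, group, or other still has write permission
        *)   return 0 ;;   # no write bits set
    esac
}
```

After running sudo chmod 440 /dev/sdb, a call such as no_write_bits /dev/sdb && echo "no write bits set" should print the message.  This only inspects the permission bits; it does not prove that writes are impossible.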

Figure 2: Displaying permissions

The next step is to use the dcfldd utility to create a copy of the drive.  In this case, an image will be created of the first partition on the sdb device, so the source will be /dev/sdb1.  By invoking the mount command, it can be seen that the destination drive has been mounted as /media/disk.  The command to create the image is as follows (enter as one line):

dcfldd if=/dev/sdb1 of=/media/disk/test_image.dd hash=md5,sha1 hashlog=/media/disk/hashlog.txt

Next, each of the options in this dcfldd command will be discussed.  The if parameter identifies the source of the data to be imaged, in this case /dev/sdb1.  The of option tells dcfldd where to write the acquired data.  One nice feature of dcfldd is that the of option can be specified more than once, allowing multiple copies of the image to be created simultaneously.  This is useful if the examiner wants to create a local copy of the image and a remote backup or archival copy on a network file server or local tape drive.

Special caution should be used when specifying if and of.  If the write blocking fails or is not used at all, switching these two parameters will result in the blank destination drive being copied on top of, and overwriting, the evidence drive.  Because of the dire consequences of such a mix-up, dd is jokingly said to stand for ‘data destroyer.’

The next parameters are what make dcfldd so much better suited to forensic work than dd.  The hash option lets the user specify which cryptographic hash algorithms will be applied to the data.  The default is MD5, but in this example both MD5 and SHA-1 are used.  The final option, hashlog, specifies where the output of the hashing should be written; in this case, it is a text file in the same directory as the disk image.
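One way to reduce the risk of swapping if and of is to run a small pre-flight check before invoking dcfldd.  The sketch below is illustrative only: check_image_args is a hypothetical helper, and it assumes the image is being written to a new file rather than directly to a device.

```shell
# Hypothetical pre-flight check to run before dcfldd or dd.
# Usage: check_image_args SOURCE DESTINATION
check_image_args() {
    src=$1
    dst=$2
    if [ ! -r "$src" ]; then
        echo "source is not readable: $src" >&2
        return 1
    fi
    # An image file should never be a device node; this catches a swapped if/of.
    if [ -b "$dst" ]; then
        echo "refusing: destination is a block device: $dst" >&2
        return 1
    fi
    # Never silently overwrite an existing file.
    if [ -e "$dst" ]; then
        echo "refusing: destination already exists: $dst" >&2
        return 1
    fi
    return 0
}
```

With the paths used in this example, the check could be chained in front of the imaging command: check_image_args /dev/sdb1 /media/disk/test_image.dd && dcfldd if=/dev/sdb1 of=/media/disk/test_image.dd hash=md5,sha1 hashlog=/media/disk/hashlog.txt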

While the image is being created, dcfldd will display a line that shows how many blocks have been written, and how many megabytes that corresponds to.  Once the imaging process has completed, a message will appear indicating how many complete blocks were copied.  The block size can be set by adding bs=[block_size] to the dcfldd command; dd defaults to 512 bytes, while dcfldd may use a larger default (the man page for the installed version gives the exact value).  If the number of blocks is followed by a +0, then exactly that many complete blocks of data were written.  If the number is followed by a +1, then that many complete blocks of data were written, plus one partial block.
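The +0 and +1 notation follows directly from dividing the amount of data by the block size.  The arithmetic can be sketched as below; record_summary is a hypothetical helper written for illustration, not something dcfldd provides.

```shell
# Hypothetical helper: print the "full+partial" record count that a dd-style
# tool would report for SIZE bytes copied with block size BS.
# Usage: record_summary SIZE BS
record_summary() {
    size=$1
    bs=$2
    full=$((size / bs))                 # number of complete blocks
    if [ $((size % bs)) -eq 0 ]; then
        partial=0                       # data ended exactly on a block boundary
    else
        partial=1                       # leftover bytes form one partial block
    fi
    echo "${full}+${partial}"
}
```

For example, record_summary 1000000 512 prints 1953+1, because 1953 complete 512-byte blocks cover 999,936 bytes and the remaining 64 bytes form one partial block.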

Once the image has been created, it is very important to verify that it is indeed an exact, bit-for-bit copy of the original data.  There are a few ways that this can be done.  One method is to use dcfldd again.  The following command hashes both the source (specified with if) and the file given by vf, and reports whether their hash values match.  If they are the same, it will report Match; if not, it will report Mismatch.

dcfldd if=/dev/sdb1 vf=/media/disk/test_image.dd verifylog=/media/disk/verifylog.txt

Another method to verify that the two are identical is to hash both directly and compare.  The programs md5sum and sha1sum (installed as md5 and sha1 on some systems) each apply their respective hash function to the files specified.  Referring back to the image created earlier in this example, if the user were to enter sudo md5sum /media/disk/test_image.dd /dev/sdb1 and compare the two returned hash values, they should be the same.  Also, because the hash option was set when dcfldd was run, the hashlog file already holds the calculated hash values, so those may be referenced as well.  If the hashes match, the image creation process was successful.  Otherwise, the whole process can be repeated; sometimes errors in copying the data will cause verification to fail.  Note that if even one bit of data has been altered, the two sets of data will have drastically different hash values.
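Comparing long hash values by eye is error-prone, so the comparison can be left to the shell.  This is a minimal sketch, assuming GNU coreutils sha1sum is installed; same_sha1 is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if both paths have the same SHA-1 digest.
# Assumes GNU coreutils sha1sum; reading a device node usually requires root.
# Usage: same_sha1 PATH_A PATH_B
same_sha1() {
    # Reading from stdin keeps the file name out of the output.
    a=$(sha1sum < "$1" | cut -d ' ' -f 1)
    b=$(sha1sum < "$2" | cut -d ' ' -f 1)
    # An empty digest means a read failed, which is treated as a mismatch.
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

With the paths from this example, the check could be run as: same_sha1 /dev/sdb1 /media/disk/test_image.dd && echo Match || echo Mismatch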

Figure 3: Hashing and comparing values


Creating a Forensically-Sound Image

The first step in any forensic data recovery operation or computer forensic investigation is to create an exact duplicate of the media to be examined.  As a rule, analysis should not be performed on the original media, as the investigative process can make irrecoverable changes to the source data.  Since the original cannot be used, it becomes imperative to make an exact copy of the original that investigators can examine.  This is commonly referred to as making a bitstream image.  It is also possible to simply copy each bit of data from one hard disk to another, but problems arise when the hard disks are not exactly the same size, as it can be hard to tell where the copied data ends and the data that was already on the disk begins.  This problem is averted by copying the data into a file called an image.  Files are much easier to handle, and can be split and recombined if necessary for transport or storage.  Some image formats use compression to decrease the size of the image, while others do not.
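Splitting and recombining a raw image can be done with standard tools.  The sketch below assumes the coreutils split and cmp commands and a raw (uncompressed) image file; split_and_check is a hypothetical helper name, and the .part_ prefix is an arbitrary choice.

```shell
# Hypothetical helper: split IMAGE into CHUNK-sized pieces named
# IMAGE.part_aa, IMAGE.part_ab, ..., then confirm that the pieces
# concatenate back into a byte-for-byte copy of the original.
# Usage: split_and_check IMAGE CHUNK   (CHUNK is a size accepted by split -b)
split_and_check() {
    image=$1
    chunk=$2
    split -b "$chunk" "$image" "$image.part_" || return 1
    # The shell expands the glob in sorted order, which matches split's suffixes.
    cat "$image".part_* | cmp -s - "$image"
}
```

For instance, split_and_check /media/disk/test_image.dd 1000000 would cut the image from the earlier example into pieces of 1,000,000 bytes each.  With the default two-letter suffixes, split can produce at most 676 pieces, so the chunk size has to be chosen accordingly.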

There are a myriad of computer programs available to create these images, but by far the most ubiquitous is a program known as dd.  dd began as a Unix utility that was used to perform low-level reads and writes between various devices.  It has spawned many offspring, including ports to Windows and newer, improved versions for Linux such as dcfldd, which was developed specifically with forensic purposes in mind.  dcfldd’s primary improvement over dd is the ability to hash the data to ensure that the copy is exactly the same as the original.

Cryptographic hash functions are mainstays in the computer forensic field.  These functions take a block of data of any size, apply a one-way algorithm to it, and return a fixed-length string called a hash value.  They are constructed so that if even a single bit differs between two files, the files will have drastically different hash values.  The two most popular hashing algorithms are known as MD5 and SHA-1.  Both of these have been around for over ten years, and some weaknesses have been discovered in them, but they are still widely used by the forensic community.  After a bitstream image has been made of the original data, one or both of these hash functions are commonly run on both sets of data, and the hash values compared.  If they match, the image is an exact bit-for-bit copy of the original.  This hash value is often referred to as a digital fingerprint, as the chance of running across two different sets of data with the same hash value by accident is vanishingly small.
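Both properties described above, the fixed-length output and the sensitivity to a single changed bit, are easy to observe from a terminal.  The following is a small demonstration, assuming GNU coreutils sha1sum; sha1_of is a hypothetical helper and the input strings are made up for illustration.

```shell
# Hypothetical helper: print the SHA-1 digest of a string as hexadecimal.
# Assumes GNU coreutils sha1sum.
sha1_of() {
    printf '%s' "$1" | sha1sum | cut -d ' ' -f 1
}

# "0" (0x30) and "1" (0x31) differ in exactly one bit in ASCII,
# so these two inputs differ by a single bit.
d1=$(sha1_of "evidence0")
d2=$(sha1_of "evidence1")
# A much longer input still yields a digest of the same length.
d3=$(sha1_of "a much longer input still produces a digest of the same length")

echo "$d1"
echo "$d2"
echo "$d3"
```

Each printed digest is 40 hexadecimal characters long, and the first two bear no visible resemblance to each other even though their inputs differ by one bit.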

One of the cardinal rules of a forensic analyst is to never alter the original evidence.  Since most common operating systems, namely Windows, are constantly making changes to the active hard drive without the knowledge of the average user, it becomes important to take other steps to ensure that the original data is not inadvertently changed.  Most forensic analysts would put forth that the safest way to image a hard drive is to physically remove it from the suspect machine and attach it to an examination machine via a hardware write blocker.  A hardware write blocker is like a one-way valve; it only allows information to flow from the hard drive to the computer, not the other way around.  The computer can attempt to write to the drive, and on some operating systems may even appear to do so, but a hardware write blocker does not allow any data to actually be modified.  Some flash drives, memory cards and floppy disks have limited built-in hardware write blocking in the form of a switch or tab on the side of the media.  While this works in most cases, it is always seen as good practice to use write blocking devices that have been vetted by the forensic community.  In any case, it is preferable to have some sort of backup write blocking to ensure that no data is inadvertently altered.

While hardware write blockers are preferable in the vast majority of cases, they tend to be rather expensive and their price puts them out of reach for most students or the casual forensic investigator.  The next most viable option is known as software write blocking.  Software write blocking can be achieved in many ways.  For example, in Linux it is possible to mount a device as read only; this blocks nearly all write attempts, but as there is nothing physically preventing writes, they still are possible.  Some Linux distributions made with forensics in mind mount all devices as read-only by default; Helix takes this approach.  If the investigator would like to perform the examination from a machine running Windows, it is possible to use the registry to write-protect all USB storage devices.  When the hard drive is attached via a USB cable or USB card reader, this approach can be an inexpensive and effective alternative.
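On Linux, whether a device really was mounted read-only can be confirmed from the kernel's mount table.  The sketch below is an illustration, assuming the /proc/mounts format (device, mount point, file system type, options); mounted_read_only is a hypothetical helper, and its optional second argument exists only so that the function can be tried against a saved copy of the table.

```shell
# Hypothetical helper: succeed if MOUNTPOINT is listed with the "ro" option.
# Usage: mounted_read_only MOUNTPOINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts (Linux).
mounted_read_only() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 holds the comma-separated mount options, e.g. "ro,nosuid".
            found = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}
```

For example, after mounting an evidence partition with sudo mount -o ro /dev/sdb1 /mnt/evidence (the mount point here is a made-up path), mounted_read_only /mnt/evidence should succeed.  A mount point that is not listed at all is reported as a failure, the same as one mounted read-write.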

All the methods outlined above fall under the category of imaging colloquially known as ‘dead imaging.’  Dead imaging is when the hard drive that is to be imaged has been powered off by some means and will be imaged by the investigator outside of its original computer.  This method has the advantage of performing the image with a set of known-good programs that have not been tampered with.  If the imaging were to be done on the original computer, advanced users may have altered basic processes to hide data or a rootkit may interfere with the imaging process.  Dead imaging when done properly is very unlikely to alter any of the original data.

The alternative to dead imaging is ‘live imaging.’  Live imaging is when the investigator is able to get to the computer to be imaged while the hard drive is still in the computer and powered on.  This method, in which it is nearly impossible not to alter any of the original data, is sometimes the only one available.  For example, the disk may be encrypted, and if it is powered off all the data on it will be shielded from the investigator by the encryption.  Also, if the system is powered off, all the data in the RAM will be lost to the investigator, because RAM is a form of volatile storage.  Whether data should be acquired by live or dead imaging should be evaluated on a case-by-case basis, as each incident and the circumstances around it are unique.  All the examples of imaging in this paper are of the dead imaging variety.

Another important consideration is where the image will be stored.  It is considered a good practice in the forensic community to image to a wiped storage device to avoid any possibility of data contamination.  While saving an image file on a disk with preexisting data on it is perfectly fine if the hash value of the image matches the original, it is generally seen as a better practice to eliminate all other data prior to imaging to the drive to quash any question of the image being tainted.  Hence, many investigators use a completely separate hard drive for each image, never imaging to the hard drive containing the operating system for their examination machine.
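Whether a destination has actually been wiped can be checked before imaging to it.  The sketch below assumes that "wiped" means overwritten entirely with zero bytes, which is only one of several wiping conventions; is_all_zero is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given file or device contains nothing
# but zero bytes. Assumes a zero-fill wipe; other wipe patterns will fail.
# Usage: is_all_zero PATH
is_all_zero() {
    [ -r "$1" ] || return 2
    # Delete every NUL byte; if any byte survives, the target is not wiped.
    leftover=$(tr -d '\000' < "$1" | head -c 1 | wc -c)
    [ "$leftover" -eq 0 ]
}
```

In this example the destination drive is /dev/sdc, so the check would be run as sudo sh -c with the function defined, or from a root shell: is_all_zero /dev/sdc && echo "destination is zero-filled".  The whole device is read until the first non-zero byte is found, so a fully wiped large drive takes a while to confirm.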