e2undel:  home page  /  download

How e2undel works

First, you have to understand that file undeletion is primarily a question of the file system, not the operating system. I don't talk about file undeletion under Linux, but about recovering deleted files on the ext2 file system. Meanwhile, there are many file systems available für Linux (ext2, ext3, ReiserFS, XFS, JFS, and many more). The following applies only to ext2. Even ext3, despite its compatibility with ext2, behaves in one crucial point differently from ext2 (see below), so undeleting files on ext3 requires a completely different approach.

 The ext2 file system 

ext2 is a traditional Unix file system. If a file is stored on disk, there are three parts the must be looked at: The file content; metadata like the file's creation date, owner, access rights, and where on disk the file content is stored; and its name.

Data are stored in "raw" blocks on disks, with a block size of typically 4 kByte. This is the smallest size of data ext2 can address on disk: Even a file containing only one byte occupies 4 kByte (one block) on disk. Files larger than 4 kByte occupy several 4k blocks.

The file's metadata are stored in inodes. An inode contains all information about a file except of its name in an 128 Byte structure (see struct ext2_inode in /usr/include/linux/ext2_fs.h for the layout of an inode on disk): its file type and access rights; user and group id of the owner; acces, creation, modification, and deletion time; the number of used data blocks; and a list of blocks the file content is stored at.

An inode stores up to 12 "direct" block numbers, summing up to a file size of 48 kByte. Number 13 points to a block with up to 1024 block numbers of 32 Bit size each ("indirect blocks"), summing up to a file size of 4 MByte. Number 14 points to a block with numbers of blocks containing numbers of data blocks ("double indirect", up to 4 GByte); and number 15 points to "triple indirect" blocks (up to 4 TByte).

Finally, directories store the file names and the associated inodes. Directories are stored as "normal files" on disk, but are marked as being a directory in the inode. They store a chained list of file names and their associated file names.

 Deleting a file 

If a file is deleted, its name is removed from the directory, and the deletion time in the inode is set. In ext2, the complete inode information is preserved: If you know the inode number, you can use the ext2 low level tool debugfs to get all file information (expect its name), including the block numbers of the data blocks, and assemble the file content by hand from these blocks. ext3 behaves differently: It deletes the block numbers in the inode; so after deleting a file, you can't find its content from the inode information. That is why ext2 undeletion tools usually don't work with the ext3 file system.

Besides the directory and inode action, ext2 marks the blocks used by the file as free in its block bitmap, and it marks the inode as free in its inode bitmap. But because inode data and data blocks are not immediately overwritten, all data necessary to recover a file -- with the exception of the file name -- remain on disk. They might be, however, overwritten at some pipnt of time in the future. How long it takes until this happens depends on factors like file the intensity of file system usage and the available free space on the file system.

 What e2undel does 

e2undel (and also the "lsdel" function of debufs) scans the inode table for inodes with deletion time set and assembles a list of deleted files. For each deleted inode found, it gets the data block numbers and checks in the block bitmap if these blocks are still marked as free or are used by another file. If the number of free data blocks is larger than zero, the file content can be recovered at least partially be simply reading these blocks and writing them to a new file.

This is the point where e2undel and the "lsdel" function of debugfs fail on ext3: Because ext3 removes the block numbers in the inodes of deleted files, the number of blocks is always zero for inodes with deletion time set.

When building the list of deleted files, e2undel reads the first 1024 Byte of each deleted file and uses routines from the file(5) program to determine its file type (if "-t" is given on the command line). Together with the inode information (owner, size, deletion date), the file type is all the user has to identify the files he wants to recover.

 What libundel does 

As said before, the file name is lost when a file is deleted. Because humans usually think of their files in names and not inode or block numbers, libundel tries to preserve the names of deleted files.

The library hooks into the system calls unlink(2) and remove(3). When loaded by each process that deletes file, e.g. by adding it to the environment variable $LD_PRELOAD, it writes the name and the associated inode of each file that is deleted to a log file. This log file is read and evaluated by e2undel. The program than matches this list against the list of deleted inodes found on disk and presents deleted files by name.