HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode that manages the file system metadata and DataNodes that store the actual data.
HDFS supports the fsck command to check for various inconsistencies. It is designed for reporting problems with various files, for example, missing blocks for a file or under-replicated blocks. Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally the NameNode automatically corrects most of the recoverable failures. By default fsck ignores open files but provides an option to select all files during reporting. The HDFS fsck command is not a Hadoop shell command. It can be run as `bin/hadoop fsck`.
Runs the HDFS filesystem checking utility.

Usage: hadoop fsck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
| COMMAND_OPTION | Description |
|----------------|-------------|
| <path> | Start checking from this path. |
| -move | Move corrupted files to /lost+found. |
| -delete | Delete corrupted files. |
| -openforwrite | Print out files opened for write. |
| -files | Print out files being checked. |
| -blocks | Print out the block report. |
| -locations | Print out locations for every block. |
| -racks | Print out network topology for DataNode locations. |
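Because fsck only reports problems and leaves recovery to the NameNode, a common pattern is to run it periodically and scan the report for problem markers such as CORRUPT or "Under replicated". The sketch below filters a saved report; the report contents here are a hypothetical sample for illustration, and in practice the report would come from a command like `bin/hadoop fsck / -files -blocks` against a running cluster.

```shell
#!/bin/sh
# Hypothetical saved fsck report, standing in for the output of:
#   bin/hadoop fsck / -files -blocks
cat > /tmp/fsck_report.txt <<'EOF'
/user/alice/data.txt: CORRUPT 1 blocks of total size 1048576 B
/user/alice/ok.txt 1024 bytes, 1 block(s):  OK
Status: CORRUPT
EOF

# Count lines flagging corruption; a healthy tree reports "Status: HEALTHY".
grep -c 'CORRUPT' /tmp/fsck_report.txt
```

A nonzero count is a signal to investigate with the more verbose options (-files -blocks -locations) before deciding between -move and -delete.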