HDFS fsck command

HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode that manages the file system metadata and DataNodes that store the actual data.

HDFS supports the fsck command to check for various inconsistencies. It is designed for reporting problems with files, for example, missing blocks for a file or under-replicated blocks. Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally, the NameNode automatically corrects most recoverable failures. By default, fsck ignores open files, but it provides an option to include all files during reporting. The HDFS fsck command is not a Hadoop shell command; it can be run as ‘bin/hadoop fsck’.
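Because fsck only reports problems, a monitoring script usually has to read the report and decide what to do with it. The sketch below classifies a report read from standard input. It assumes the summary line contains the phrase "is HEALTHY" or "is CORRUPT", which is how the report typically ends; check this against the output of your own Hadoop version. The function name classify_fsck_report is hypothetical.

```shell
# classify_fsck_report: read an fsck report on stdin and print a one-word verdict.
# Assumption: the summary line contains "is HEALTHY" or "is CORRUPT".
classify_fsck_report() {
  report=$(cat)
  case "$report" in
    *"is CORRUPT"*) echo corrupt ;;
    *"is HEALTHY"*) echo healthy ;;
    *)              echo unknown ;;
  esac
}

# Example use (needs a running cluster, so it is left commented out):
# bin/hadoop fsck / | classify_fsck_report
```

A verdict of "unknown" is kept separate on purpose, so that an empty or truncated report is not mistaken for a healthy file system.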

Runs an HDFS filesystem checking utility.

hadoop fsck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
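The nested brackets in the usage line suggest that each reporting flag builds on the previous one: -blocks refines -files, and -locations or -racks refines -blocks. As a rough illustration, the hypothetical helper below maps a detail level to the matching set of flags. The level numbers are an invention of this sketch and are not part of Hadoop.

```shell
# fsck_detail_args LEVEL: print the reporting flags for a given detail level.
# The levels (0-3) are hypothetical; only the flags come from the usage line.
fsck_detail_args() {
  case "$1" in
    0) echo "" ;;                              # summary only
    1) echo "-files" ;;                        # list each file checked
    2) echo "-files -blocks" ;;                # plus the block report
    3) echo "-files -blocks -locations" ;;     # plus block locations (-racks is the alternative)
    *) echo "unknown level: $1" >&2; return 1 ;;
  esac
}
```

With this helper, a call such as bin/hadoop fsck /user $(fsck_detail_args 2) would expand to bin/hadoop fsck /user -files -blocks.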

COMMAND_OPTION    Description
<path>            Start checking from this path.
-move             Move corrupted files to /lost+found.
-delete           Delete corrupted files.
-openforwrite     Print out files opened for write.
-files            Print out files being checked.
-blocks           Print out block report.
-locations        Print out locations for every block.
-racks            Print out network topology for data-node locations.
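The -move and -delete options change the namespace, so it can help to preview the exact command before running it. The following is a minimal sketch of a dry-run wrapper. The names run_fsck, HADOOP_BIN and FSCK_RUN are hypothetical and introduced only for this example, and the paths are placeholders.

```shell
# run_fsck: print the fsck command, and execute it only when FSCK_RUN=1.
# HADOOP_BIN and FSCK_RUN are hypothetical variables used by this sketch.
run_fsck() {
  echo "${HADOOP_BIN:-hadoop} fsck $*"
  if [ "${FSCK_RUN:-0}" = "1" ]; then
    "${HADOOP_BIN:-hadoop}" fsck "$@"
  fi
}

run_fsck /                                 # report only, whole namespace
run_fsck /user -files -blocks -locations   # per-file block placement
run_fsck /user -openforwrite               # print files opened for write
run_fsck /user -move                       # move corrupted files to /lost+found
```

By default the wrapper only echoes each command; setting FSCK_RUN=1 in the environment makes it run them as well.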

Source – https://hadoop.apache.org/docs/r1.2.1/commands_manual.html
