HDFS Job command

Command to interact with Map Reduce Jobs.

hadoop jobĀ  [-submit <job-file>] | [-status <job-id>] | [-counter <job-id> <group-name> <counter-name>] | [-kill <job-id>] | [-events <job-id> <from-event-#> <#-of-events>] | [-history [all] <jobOutputDir>] | [-list [all]] | [-kill-task <task-id>] | [-fail-task <task-id>] | [-set-priority <job-id> <priority>]

-submit <job-file> Submits the job.
-status <job-id> Prints the map and reduce completion percentage and all job counters.
-counter <job-id> <group-name> <counter-name> Prints the counter value.
-kill <job-id> Kills the job.
-events <job-id> <from-event-#> <#-of-events> Prints the events’ details received by jobtracker for the given range.
-history [all] <jobOutputDir> -history <jobOutputDir> prints job details, failed and killed tip details. More details about the job such as successful tasks and task attempts made for each task can be viewed by specifying the [all] option.
-list [all] -list all displays all jobs. -list displays only jobs which are yet to complete.
-kill-task <task-id> Kills the task. Killed tasks are NOT counted against failed attempts.
-fail-task <task-id> Fails the task. Failed tasks are counted against failed attempts.
-set-priority <job-id> <priority> Changes the priority of the job. Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW

Source: https://hadoop.apache.org/docs/r1.2.1/commands_manual.html

HDFS fsck command

HDFS is the primary distributed storage used by Hadoop applications. A HDFS cluster primarily consists of a NameNode that manages the file system metadata and DataNodes that store the actual data.

HDFS supports the fsck command to check for various inconsistencies. It it is designed for reporting problems with various files, for example, missing blocks for a file or under-replicated blocks. Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally NameNode automatically corrects most of the recoverable failures. By default fsck ignores open files but provides an option to select all files during reporting. The HDFS fsck command is not a Hadoop shell command. It can be run as ‘bin/hadoop fsck‘.

Runs a HDFS filesystem checking utility.

hadoop fsck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]

<path> Start checking from this path.
-move Move corrupted files to /lost+found
-delete Delete corrupted files.
-openforwrite Print out files opened for write.
-files Print out files being checked.
-blocks Print out block report.
-locations Print out locations for every block.
-racks Print out network topology for data-node locations.

Source – https://hadoop.apache.org/docs/r1.2.1/commands_manual.html