Hadoop: How to find which file is healthy

Hadoop provides a file system health check utility called "fsck". It checks the health of all the files under a given path; passing '/' (the root) checks every file in the file system.
  • bin/hadoop fsck / - checks the health of all the files in the file system
  • bin/hadoop fsck /test/ - checks the health of the files under /test/
By default, fsck takes no action on under-replicated or over-replicated blocks; Hadoop heals those blocks itself.

 How to find which file is healthy

  • It prints a dot for each healthy file.
  • For each file that is not healthy, it prints a message instead. This covers under-replicated blocks, over-replicated blocks, mis-replicated blocks, and corrupted blocks.
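The dot-per-healthy-file convention described above can be tallied with a small script. This is a minimal sketch, not an official parser: the function name `summarize_fsck_output` is hypothetical, the sample text is made up for illustration, and the code assumes that progress dots appear at the start of a line, that each problem message begins with the file's HDFS path, and that the summary contains a "Status:" line. Real output varies between Hadoop versions, and the path-based rule does not hold when -files is used.

```python
import re


def summarize_fsck_output(text):
    """Return (healthy_count, problem_lines, status) from fsck text output.

    Assumptions (not guaranteed by Hadoop): dots lead a line, problem
    messages start with an HDFS path ("/..."), summary has "Status: X".
    """
    healthy = 0
    problems = []
    status = None
    for line in text.splitlines():
        rest = line
        dots = re.match(r"\.+", line)
        if dots:
            # One dot per healthy file.
            healthy += len(dots.group(0))
            rest = line[dots.end():]
        rest = rest.strip()
        if not rest:
            continue
        if rest.startswith("Status:"):
            status = rest.split(":", 1)[1].strip()
        elif rest.startswith("/"):
            # A per-file message, e.g. about under-replicated blocks.
            problems.append(rest)
    return healthy, problems, status


# Made-up sample, only loosely modeled on fsck output.
SAMPLE = (
    "....\n"
    "/test/a.txt: Under replicated blk_1.\n"
    "..Status: HEALTHY\n"
)

healthy, problems, status = summarize_fsck_output(SAMPLE)
print(healthy, len(problems), status)
```

In practice the text would come from capturing the output of bin/hadoop fsck on a machine with Hadoop installed.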

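How to delete or move files with corrupted blocks

  • bin/hadoop fsck <path> -delete
  • It deletes the corrupted files under the path (fsck takes a path, not block names)
  • bin/hadoop fsck <path> -move
  • It moves the corrupted files to the /lost+found directory
  • Other options we can use with fsck:
    • -files - prints the files being checked
    • -blocks - prints the block report (used together with -files)
    • -locations - prints the location of every block (used together with -files -blocks)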
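To show how these options combine on one command line, here is a minimal sketch that only assembles the argument list and does not run Hadoop. The helper name `build_fsck_command` and its parameters are hypothetical. It assumes the `bin/hadoop fsck <path>` form used in this post, and that -locations is used with -blocks, which in turn is used with -files.

```python
def build_fsck_command(path, action=None, files=False, blocks=False,
                       locations=False, hadoop_bin="bin/hadoop"):
    """Build an fsck argument list (hypothetical helper, nothing is executed)."""
    if action not in (None, "move", "delete"):
        raise ValueError("action must be None, 'move' or 'delete'")
    cmd = [hadoop_bin, "fsck", path]
    if action:
        cmd.append("-" + action)
    # -locations is used with -blocks, and -blocks with -files.
    if files or blocks or locations:
        cmd.append("-files")
    if blocks or locations:
        cmd.append("-blocks")
    if locations:
        cmd.append("-locations")
    return cmd


print(" ".join(build_fsck_command("/test/", locations=True)))
print(" ".join(build_fsck_command("/", action="move")))
```

The resulting list could be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where Hadoop is installed. Since -delete removes data, running a read-only check first is the safer order.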
