
How to Use Help Command in HDFS

Sometimes, as a Hadoop developer, it is difficult to remember all the Hadoop commands. The help command is useful for checking the correct syntax.

----

How to List all HDFS Commands

hadoop fs

Running hadoop fs with no arguments lists all the HDFS shell commands.

Help Command in HDFS

Hadoop commands have the flavor of UNIX commands. If you want to see a description of any command, you can use the Hadoop help command, as shown below.

hadoop fs -help ls
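Besides -help, the fs shell also supports a -usage option that prints only the one-line syntax summary. A minimal sketch, assuming a working Hadoop installation on the PATH (the command names shown are standard fs shell commands):

```shell
# Full description of the ls command, including all options
hadoop fs -help ls

# Just the one-line syntax summary, without the full description
hadoop fs -usage ls

# Help works the same way for any other fs shell command
hadoop fs -help put
hadoop fs -help rm
```

Use -usage when you only need a quick reminder of the argument order, and -help when you need the meaning of each option.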

Deleting a File in Hadoop HDFS

The below command shows how to delete a file from a Hadoop cluster.

hadoop fs -rm example.txt
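The -rm command also takes options for directories and for bypassing the trash. A short sketch, assuming a running cluster; the paths below are hypothetical examples:

```shell
# Delete a single file (moved to trash first if trash is enabled)
hadoop fs -rm /user/hadoop/example.txt

# Delete a directory and all of its contents recursively
hadoop fs -rm -r /user/hadoop/old_data

# Bypass the trash and delete the file permanently
hadoop fs -rm -skipTrash /user/hadoop/example.txt
```

Note that with trash enabled, a plain -rm only moves the file to the .Trash directory, so -skipTrash is the option that actually frees space immediately.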
