Featured Post

Top Questions People Ask About Pandas, NumPy, Matplotlib & Scikit-learn — Answered!

Image
 Whether you're a beginner or brushing up on your skills, these are the real-world questions Python learners ask most about key libraries in data science. Let’s dive in! 🐍 🐼 Pandas: Data Manipulation Made Easy 1. How do I handle missing data in a DataFrame? df.fillna( 0 ) # Replace NaNs with 0 df.dropna() # Remove rows with NaNs df.isna(). sum () # Count missing values per column 2. How can I merge or join two DataFrames? pd.merge(df1, df2, on= 'id' , how= 'inner' ) # inner, left, right, outer 3. What is the difference between loc[] and iloc[] ? loc[] uses labels (e.g., column names) iloc[] uses integer positions df.loc[ 0 , 'name' ] # label-based df.iloc[ 0 , 1 ] # index-based 4. How do I group data and perform aggregation? df.groupby( 'category' )[ 'sales' ]. sum () 5. How can I convert a column to datetime format? df[ 'date' ] = pd.to_datetime(df[ 'date' ]) ...

Sqoop Real Use in Hadoop Framework

Why Sqoop you need while working on Hadoop-The Sqoop and its primary reason is to import data from structural data sources such as Oracle/DB2 into HDFS(also called Hadoop file system).



To our readers, I have collected a good video from Edureka which helps you to understand the functionality of Sqoop.

The comparison between Sqoop and Flume

Sqoop

How name come for Sqoop

Sqoop word came from SQL+HADOOP=SQOOP. And Sqoop is a data transfer tool. The main use of Sqoop is to import and export a large amount of data from RDBMS to HDFS and vice versa.


List of basic Sqoop commands

  1. Codegen- It helps to generate code to interact with database records.
  2. Create-hive-table- It helps to Import a table definition into a hive
  3. Eval- It helps to evaluate SQL statement and display the results
  4. Export-It helps to export an HDFS directory into a database table
  5. Help- It helps to list the available commands
  6. Import- It helps to import a table from a database to HDFS
  7. Import-all-tables- It helps to import tables from a database to HDFS
  8. List-databases- It helps to list available databases on a server
  9. List-tables-It helps to list tables in a database
  10. Version-It helps to display the version information

Comments

Popular posts from this blog

SQL Query: 3 Methods for Calculating Cumulative SUM

5 SQL Queries That Popularly Used in Data Analysis

Big Data: Top Cloud Computing Interview Questions (1 of 4)