Top Key Architecture Components in HIVE

- October 11, 2015

Hadoop Hive has five architectural components:

– Shell: allows interactive queries, like a MySQL shell connected to a database; also supports web and JDBC clients
– Driver: session handles, fetch, execute
– Compiler: parse, plan, optimize
– Execution engine: DAG of stages (M/R, HDFS, or metadata)
– Metastore: schema, location in HDFS, SerDe
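To make the "DAG of stages" idea concrete, here is a minimal Python sketch of how an execution engine might order dependent stages. It is not Hive's code: the stage names and the `plan` dictionary are hypothetical, and only the standard-library `graphlib` module (Python 3.9+) is used.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each stage maps to the set of stages it depends on.
# The names are illustrative only, not Hive's real stage identifiers.
plan = {
    "scan_and_filter (M/R)": set(),
    "aggregate (M/R)": {"scan_and_filter (M/R)"},
    "move_result (HDFS)": {"aggregate (M/R)"},
    "update_table (metadata)": {"move_result (HDFS)"},
}


def execution_order(stages):
    """Return the stages in an order where every dependency runs first."""
    return list(TopologicalSorter(stages).static_order())


order = execution_order(plan)
for stage in order:
    print(stage)
```

The example plan is a simple chain, so there is only one valid order. A real plan may contain independent stages that can run in parallel.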

Data Model of Hive:

Tables

– Typed columns (int, float, string, date, boolean)
– Also list and map types (for JSON-like data)

Partitions

– e.g., to range-partition tables by date

Buckets

– Hash partitions within ranges (useful for sampling, join optimization)
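The bucketing idea above can be sketched in a few lines of Python: a row goes to the bucket given by the hash of its bucketing column modulo the number of buckets. The hash function here is a stand-in (integers hash to themselves, strings use a Java-style `hashCode`); Hive's own hashing depends on the column type and version, so treat `java_string_hash` and `bucket_for` as hypothetical helpers.

```python
def java_string_hash(s: str) -> int:
    """Java-style String.hashCode as a signed 32-bit int.

    Assumption: characters are in the Basic Multilingual Plane, so each
    Python character corresponds to one UTF-16 code unit.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def bucket_for(key, num_buckets: int) -> int:
    """Map a key to a bucket index in the range [0, num_buckets)."""
    h = key if isinstance(key, int) else java_string_hash(str(key))
    # Mask off the sign bit so negative hashes still give a valid index.
    return (h & 0x7FFFFFFF) % num_buckets


# Rows with the same key always land in the same bucket.
print(bucket_for(10, 4))
print(bucket_for("alice", 4))
```

Because equal keys always map to the same bucket, two tables bucketed the same way on the join key can be joined bucket by bucket, and reading a single bucket gives a roughly uniform sample of the keys.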

HIVE Metastore

Database: namespace containing a set of tables
Holds table definitions (column types, physical layout)
Partition data
Uses JPOX ORM for implementation; can be stored in Derby, MySQL, many other relational databases

Physical Layout of HIVE

Warehouse directory in HDFS

– e.g., /user/hive/warehouse (the default location)

Tables stored in subdirectories of warehouse

– Partitions form subdirectories of tables; buckets are stored as files within them
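As a rough illustration of this layout, the Python sketch below builds the path for a table partition and a bucket file. It assumes the usual `key=value` naming for partition directories and a zero-padded name such as `000003_0` for bucket files. The warehouse root, the table name, and the helpers `partition_dir` and `bucket_file` are all hypothetical.

```python
import posixpath


def partition_dir(warehouse, table, partition_spec):
    """Build the directory for one partition of a table.

    partition_spec is an ordered list of (column, value) pairs; each pair
    becomes one "column=value" directory level under the table directory.
    """
    levels = [f"{col}={val}" for col, val in partition_spec]
    return posixpath.join(warehouse, table, *levels)


def bucket_file(directory, bucket_index):
    """Build the path of a bucket file (assumed naming: 000000_0, 000001_0, ...)."""
    return posixpath.join(directory, f"{bucket_index:06d}_0")


# Placeholder warehouse root and table name, for illustration only.
d = partition_dir("/path/to/warehouse", "page_views", [("dt", "2015-10-11")])
print(d)
print(bucket_file(d, 3))
```

A query that filters on the partition column only needs to read the matching directories, which is why range-partitioning by date is so common.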

Actual data stored in flat files

– Control char-delimited text, or SequenceFiles
– With custom SerDe, can use arbitrary format
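To show what control-char-delimited text looks like in practice, here is a small Python sketch that parses one row. It assumes the common default delimiters of Hive's text format (Ctrl-A between fields, Ctrl-B between collection items, Ctrl-C between a map key and its value). The four-column layout (id, name, tags, props) and the `parse_row` helper are hypothetical.

```python
FIELD_DELIM = "\x01"    # Ctrl-A: separates columns
ITEM_DELIM = "\x02"     # Ctrl-B: separates items in a list or map
MAP_KEY_DELIM = "\x03"  # Ctrl-C: separates a map key from its value


def parse_row(line):
    """Parse one text row with hypothetical columns: id, name, tags, props."""
    user_id, name, tags, props = line.rstrip("\n").split(FIELD_DELIM)
    tag_list = tags.split(ITEM_DELIM) if tags else []
    prop_map = {}
    if props:
        for item in props.split(ITEM_DELIM):
            key, value = item.split(MAP_KEY_DELIM, 1)
            prop_map[key] = value
    return {"id": int(user_id), "name": name, "tags": tag_list, "props": prop_map}


row = parse_row("42\x01alice\x01a\x02b\x01k1\x03v1\x02k2\x03v2\n")
print(row)
```

A SerDe plays the same role inside Hive: it turns the stored bytes into typed columns on read and back into bytes on write, which is what lets a custom SerDe support an arbitrary file format.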
