What is HCatalog?

HCatalog is a table and storage management layer that sits between HDFS and the tools used to process the data (Pig, Hive, MapReduce, etc.). It presents users with a relational view of the data, so it can be thought of as a data abstraction layer.

One advantage of using HCatalog is that the user does not have to worry about the format in which the data is stored. The data can be plain text, RCFile, etc. The user also does not need to know where the data is stored.
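As a rough illustration of that point, a Pig script can load an HCatalog table by name alone, with no path or file format in the script. The sketch below writes such a script from the shell. The table name `mytable` and the file name `load_example.pig` are made up for this example, and the loader's package name is an assumption that depends on the installed version (older releases use `org.apache.hcatalog.pig.HCatLoader`, newer ones `org.apache.hive.hcatalog.pig.HCatLoader`).

```shell
#!/bin/sh
# Write a small Pig script that reads an HCatalog table by name only.
# 'mytable' is a hypothetical table; no HDFS path or file format appears here.
cat > load_example.pig <<'EOF'
-- Older HCatalog releases use this package name; newer ones use
-- org.apache.hive.hcatalog.pig.HCatLoader instead.
A = LOAD 'mytable' USING org.apache.hcatalog.pig.HCatLoader();
B = LIMIT A 10;
DUMP B;
EOF

# Run it only if Pig is installed; -useHCatalog puts the HCatalog jars on the classpath.
if command -v pig >/dev/null 2>&1; then
    pig -useHCatalog load_example.pig
else
    echo "pig not found; would run: pig -useHCatalog load_example.pig"
fi
```

If the table's storage format or location later changes, a script like this should not need to change, since it only refers to the table name.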

Data in HCatalog is organized into tables, and these tables can be placed in databases.

HCatalog also has a CLI. For example:

hcat.py -f myscript.hcatalog (executes the script file myscript.hcatalog)
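A script file of this kind holds DDL statements. Here is a minimal sketch of what `myscript.hcatalog` might contain and how it could be run. The database, table and column names are invented for illustration, and the launcher is assumed to be on the PATH as `hcat` (the command above uses `hcat.py`, so set `HCAT_CMD` to match your installation).

```shell
#!/bin/sh
# Launcher name is an assumption; override with HCAT_CMD=hcat.py if needed.
HCAT_CMD="${HCAT_CMD:-hcat}"

# Hypothetical script contents: a database with one table placed in it.
cat > myscript.hcatalog <<'EOF'
CREATE DATABASE IF NOT EXISTS demo_db;
CREATE TABLE IF NOT EXISTS demo_db.page_views (
  user_id STRING,
  url     STRING,
  view_ts STRING
);
SHOW TABLES IN demo_db;
EOF

# Run every statement in the file, but only if the launcher is installed.
if command -v "$HCAT_CMD" >/dev/null 2>&1; then
    "$HCAT_CMD" -f myscript.hcatalog
else
    echo "$HCAT_CMD not found; would run: $HCAT_CMD -f myscript.hcatalog"
fi
```

A single statement can also be passed inline with the `-e` option instead of a file, for example `hcat -e "SHOW TABLES;"`.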

To create a new table in HCatalog via the GUI (these steps use the Hortonworks Sandbox):

1) Select “Create new table from file”

2) Enter the table name

3) Click “Choose a file” and browse to the data file in HDFS

4) Change the file options (delimiters, encoding, etc.), if necessary

5) In the table preview, change the column names or column data types, if necessary

6) Click “Create Table”
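
For comparison, the GUI steps above correspond roughly to a table definition that could be run through the CLI instead. This is a sketch under assumptions, not what the Sandbox itself generates: the table name, columns, comma delimiter and the HDFS directory `/user/hue/sample_data` are all hypothetical, and it uses an external table that points at the directory holding the file rather than importing the file.

```shell
#!/bin/sh
# Rough mapping from the GUI steps to the DDL clauses below:
#   steps 2 and 5 -> table name, column names and column types
#   step 4        -> ROW FORMAT / STORED AS (here: comma-delimited text)
#   step 3        -> LOCATION (the HDFS directory containing the data file)
cat > create_from_file.hcatalog <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS sample_table (
  id    INT,
  name  STRING,
  score DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hue/sample_data';
EOF

echo "Wrote create_from_file.hcatalog; run it with the HCatalog CLI using the -f option."
```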




