What is HCatalog?

HCatalog is a table and storage management layer that sits between HDFS and the tools used to process the data (Pig, Hive, MapReduce, etc.). It presents users with a relational view of the data and can be thought of as a data abstraction layer.

One advantage of using HCatalog is that the user does not have to worry about what format the data is stored in; it can be text, RCFile, etc. The user also does not need to know where the data is stored.

Data in HCatalog is organized into tables, and these tables can be placed in databases.

HCatalog also has a CLI. For example:

hcat.py -f myscript.hcatalog (executes the myscript.hcatalog script file)
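
As a rough illustration of what such a script file might contain, the Python sketch below writes a one-statement DDL script and assembles the command line shown above. The table name web_logs, its columns, and the helper names write_script and build_hcat_command are hypothetical. The sketch only builds the command and does not run it, since that requires an HCatalog installation.

```python
import os
import tempfile

# Hypothetical DDL for illustration; the table and column names are made up.
DDL = """CREATE TABLE IF NOT EXISTS web_logs (
    ip STRING,
    url STRING,
    status INT
)
STORED AS RCFILE;
"""


def write_script(directory, name="myscript.hcatalog", ddl=DDL):
    """Write the DDL to a script file and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(ddl)
    return path


def build_hcat_command(script_path, executable="hcat.py"):
    """Return the argument list for running a script file with -f."""
    return [executable, "-f", script_path]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    script = write_script(workdir)
    # Print the command instead of running it; no HCatalog install is assumed.
    print(" ".join(build_hcat_command(script)))
```

On a machine where the CLI is on the PATH, the returned list could be handed to subprocess.run to execute the script.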

To create a new table in HCatalog via the GUI (these steps use the Hortonworks Sandbox):

1) Select “Create new table from file”

2) Enter the table name

3) Click “Choose a file” and browse to the data file in HDFS

4) Change the file options (delimiters, encoding, etc.), if necessary

5) In the table preview, change the column names or column data types, if necessary

6) Click “Create Table”
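
To make steps 4 to 6 more concrete, here is a minimal Python sketch of the general idea: read a delimited file with a header row, guess a type for each column, allow the guesses to be overridden, and produce a CREATE TABLE statement. This is not how the Sandbox GUI is implemented; the function names guess_type and preview_to_ddl, the type-guessing rules, and the sample data are all assumptions made for illustration.

```python
import csv
import io


def guess_type(values):
    """Guess a Hive column type from sample string values."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "INT"
    if values and all_parse(float):
        return "DOUBLE"
    return "STRING"


def preview_to_ddl(table_name, text, delimiter=",", overrides=None):
    """Build a CREATE TABLE statement from delimited text with a header row.

    overrides maps a column name to a type, like editing the table preview.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, samples = rows[0], rows[1:]
    overrides = overrides or {}
    columns = []
    for index, name in enumerate(header):
        values = [row[index] for row in samples]
        col_type = overrides.get(name, guess_type(values))
        columns.append(f"{name} {col_type}")
    return f"CREATE TABLE {table_name} ({', '.join(columns)});"


sample = "id,name,price\n1,apple,0.5\n2,pear,0.75\n"
print(preview_to_ddl("fruit", sample))
# prints: CREATE TABLE fruit (id INT, name STRING, price DOUBLE);
```

Passing a different delimiter or an overrides dictionary corresponds loosely to changing the file options in step 4 and the column data types in step 5.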





