My Big Data: October 2015

Monday, October 12, 2015

Hive basics for Beginners #1

1. How to open the Hive shell (CLI)
$ hive

2. Command to display the databases in Hive
hive> show databases;

Note: Hive comes with a database named 'default'. If you don't select a database, Hive uses 'default'.

3. Command to switch to a database

hive> use <database_name>;

4. Command to list tables in a database

hive> show tables;

5. Create table syntax in Hive

hive> create table emp(eid int, ename string, salary int, gender string, dept_no int);

6. Load data into a Hive table from the local file system

hive> load data local inpath '<Local_Directory_Path>' into table <hive_table_name>;

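emp.txt is a comma-delimited file, but the emp table was created without a ROW FORMAT clause, so Hive expects fields separated by its default delimiter, Ctrl-A (\001). Each whole line therefore lands in the first column, where it cannot be converted to int, and the remaining columns receive no value, so querying emp returns NULL in every column. To fix this, declare the delimiter when creating the table, as in the next step.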

7. Create a Hive table that accepts the comma-delimited file

hive> create table emp_temp(eid int, ename string, salary int, gender string, dept_no int)
    > row format delimited fields terminated by ',';
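To see why emp returned NULLs while emp_temp parses correctly, here is a minimal Python sketch that mimics how a delimited-text table splits a line and casts each field. It simulates the behaviour only and is not Hive's actual SerDe code; the sample record and the parse_row helper are hypothetical.

```python
# Simulation of how a delimited-text table turns one raw line into typed columns.
# Not Hive code: parse_row and the sample record are hypothetical illustrations.

COLUMNS = [("eid", int), ("ename", str), ("salary", int),
           ("gender", str), ("dept_no", int)]

HIVE_DEFAULT_DELIM = "\x01"  # Ctrl-A, the delimiter used when no ROW FORMAT is given


def parse_row(line, delimiter):
    """Split a line and cast each field; missing or uncastable fields become None (NULL)."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for i, (name, cast) in enumerate(COLUMNS):
        if i >= len(fields):
            row[name] = None          # no field for this column -> NULL
            continue
        try:
            row[name] = cast(fields[i])
        except ValueError:
            row[name] = None          # value cannot be cast to the column type -> NULL
    return row


sample = "1,Alice,50000,F,10"  # hypothetical comma-delimited record

print(parse_row(sample, HIVE_DEFAULT_DELIM))  # like table emp: every column is None
print(parse_row(sample, ","))                 # like table emp_temp: values parse correctly
```

With the default delimiter the whole line is one field, so eid fails the int cast and the other columns are missing; with ',' each value lands in its own column.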

8. Load data into the table created above

hive> load data local inpath '/tmp/HadoopPractice/emp.txt' into table emp_temp;

Note: By default, the Hive CLI prints query results without headers (column names). To print the headers, run:

hive> set hive.cli.print.header=true;

9. Now that the table has data, let's do some analysis on it

hive> select sum(salary) as salaries_sum from emp_temp;
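As a rough Python analogue of the query above, SUM adds up the non-NULL salaries and skips NULLs (and yields NULL if there is nothing to add). The rows and the sql_sum helper are hypothetical illustrations, not Hive code.

```python
# Hypothetical rows standing in for emp_temp; None plays the role of NULL.
rows = [
    {"eid": 1, "salary": 50000},
    {"eid": 2, "salary": 42000},
    {"eid": 3, "salary": None},   # NULL salary: ignored by SUM
]


def sql_sum(values):
    """SQL-style SUM: skip NULLs; return NULL (None) when no non-NULL values remain."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


salaries_sum = sql_sum(r["salary"] for r in rows)
print(salaries_sum)  # 92000
```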

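10. Copy data from one Hive table to another (the target table must already exist with columns matching the SELECT list; OVERWRITE replaces any data already in it)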

hive> insert overwrite table <to_table_name> select eid,ename from <from_table_name>;
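The effect of INSERT OVERWRITE can be sketched in Python as "discard the target's rows, then write the selected columns of the source". The tables and the insert_overwrite helper are hypothetical; real Hive writes new files into the table's directory rather than mutating a list.

```python
def insert_overwrite(target_rows, source_rows, columns):
    """Replace all rows of target_rows with the chosen columns of source_rows."""
    projected = [{c: row[c] for c in columns} for row in source_rows]
    target_rows[:] = projected   # OVERWRITE: existing target rows are discarded
    return target_rows


source = [  # hypothetical <from_table_name> contents
    {"eid": 1, "ename": "Alice", "salary": 50000},
    {"eid": 2, "ename": "Bob", "salary": 42000},
]
target = [{"eid": 99, "ename": "Old row"}]  # hypothetical existing <to_table_name> data

insert_overwrite(target, source, ["eid", "ename"])
print(target)  # [{'eid': 1, 'ename': 'Alice'}, {'eid': 2, 'ename': 'Bob'}]
```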

________________________________________________________________________________

More Hive Commands

hive> describe <table_name>;

hive> describe extended <table_name>;

hive> show functions;

hive> set hive.cli.print.header=true;

hive> describe function <function_name>;

hive> load data inpath '/tmp/HadoopPractice/emp.txt' into table emp_temp; (Loads data from HDFS into Hive: without the LOCAL keyword, the path is an HDFS path.)

Note: When you load data from HDFS into Hive, the file is moved rather than copied. It is relocated into the table's directory, so it no longer exists at its original HDFS path.
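The difference between the two LOAD DATA forms can be sketched in Python with local directories: the LOCAL form behaves like a copy, while the HDFS form behaves like a move. This is a simplified model that assumes a temporary directory stands in for both file systems and for the table's directory; load_data is a hypothetical helper, not a Hive API.

```python
import os
import shutil
import tempfile


def load_data(src_path, table_dir, local):
    """Hypothetical model of LOAD DATA: LOCAL copies the file, the HDFS form moves it."""
    os.makedirs(table_dir, exist_ok=True)
    dest = os.path.join(table_dir, os.path.basename(src_path))
    if local:
        shutil.copy(src_path, dest)   # like LOAD DATA LOCAL INPATH: source file stays
    else:
        shutil.move(src_path, dest)   # like LOAD DATA INPATH: source file is relocated
    return dest


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "emp.txt")
    with open(src, "w") as f:
        f.write("1,Alice,50000,F,10\n")   # hypothetical record

    load_data(src, os.path.join(root, "warehouse", "emp_local"), local=True)
    kept_after_local_load = os.path.exists(src)

    moved_to = load_data(src, os.path.join(root, "warehouse", "emp_hdfs"), local=False)
    kept_after_hdfs_load = os.path.exists(src)
    landed_in_table = os.path.exists(moved_to)

print(kept_after_local_load)  # True
print(kept_after_hdfs_load)   # False
```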

By default, Hive stores its databases and tables in HDFS under /user/hive/warehouse/ (the value of the hive.metastore.warehouse.dir setting).

[training@localhost /]$ hadoop fs -ls /user/hive/warehouse/
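A small Python sketch of the default directory layout, assuming managed tables and default settings: tables in the default database sit directly under the warehouse directory, while tables in other databases sit under a <database>.db subdirectory. Tables created with a LOCATION clause (for example, external tables) can live elsewhere; table_location is a hypothetical helper.

```python
import posixpath

WAREHOUSE_DIR = "/user/hive/warehouse"  # default value of hive.metastore.warehouse.dir


def table_location(table, database="default"):
    """Default HDFS directory of a managed table (hypothetical helper, default settings)."""
    if database == "default":
        return posixpath.join(WAREHOUSE_DIR, table)
    return posixpath.join(WAREHOUSE_DIR, f"{database}.db", table)


print(table_location("emp_temp"))   # /user/hive/warehouse/emp_temp
print(table_location("emp", "hr"))  # /user/hive/warehouse/hr.db/emp
```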

Thursday, October 1, 2015

Basic HDFS Commands for Beginners

1. To display files and directories in HDFS (with no path, this lists your HDFS home directory, /user/<username>)
$ hadoop fs -ls

2. To create a directory in HDFS
$ hadoop fs -mkdir HDFSPractice

3. To display the contents of a directory
$ hadoop fs -ls <Directory_Name>

Since our directory is new, it doesn't contain any files yet, so let's add some.

4. Copy files from the local file system into HDFS
$ hadoop fs -copyFromLocal /usr/hadoopPractice/employee.txt HDFSPractice/

Note: Hadoop commands are case-sensitive, so copyFromLocal is not the same as copyfromLocal.

Now let's display the contents of the HDFSPractice directory, as in step 3.

5. Remove (delete) a file from HDFS
$ hadoop fs -rm HDFSPractice/employee.txt

6. Copy a file from HDFS to the local file system (step 5 deleted employee.txt, so copy it into HDFS again with step 4 before running this)
$ hadoop fs -copyToLocal HDFSPractice/employee.txt /usr/

Goals of HDFS
1. Very large distributed file system
---- 10k nodes, 10 PB of data, 100 million files
2. Runs in user space on heterogeneous operating systems
3. Optimized for batch processing
---- Data locations are exposed so that computation can move to where the data resides
---- Provides very high aggregate bandwidth
4. Assumes commodity hardware
---- Files are replicated to handle hardware failures
---- Detects failures and recovers from them
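The replication goal can be illustrated with a simplified Python model. It assumes a replication factor of 3 (the HDFS default for dfs.replication) and places replicas on random distinct nodes; real HDFS uses a rack-aware placement policy, and its NameNode schedules re-replication when DataNodes stop sending heartbeats. The node names, block IDs and helper functions are hypothetical.

```python
import random

REPLICATION = 3  # assumed replication factor (HDFS default for dfs.replication)


def place_replicas(block_ids, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes, chosen at random (not rack-aware)."""
    rng = random.Random(seed)
    return {b: rng.sample(nodes, replication) for b in block_ids}


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return {b for b, locs in placement.items() if any(n not in failed for n in locs)}


def re_replicate(placement, failed, nodes, seed=1):
    """Replace replicas lost on failed nodes with copies on other live nodes."""
    rng = random.Random(seed)
    live = [n for n in nodes if n not in failed]
    repaired = {}
    for b, locs in placement.items():
        kept = [n for n in locs if n not in failed]
        if not kept:
            repaired[b] = []          # every replica lost: nothing left to copy from
            continue
        candidates = [n for n in live if n not in kept]
        needed = len(locs) - len(kept)
        repaired[b] = kept + rng.sample(candidates, min(needed, len(candidates)))
    return repaired


nodes = [f"node{i}" for i in range(1, 6)]   # hypothetical 5-node cluster
blocks = [f"blk_{i}" for i in range(10)]    # hypothetical block IDs
placement = place_replicas(blocks, nodes)

failed = {"node1", "node2"}
# Each block has 3 replicas on distinct nodes, so losing any 2 nodes loses no data.
print(len(readable_blocks(placement, failed)) == len(blocks))  # True

repaired = re_replicate(placement, failed, nodes)
# After recovery every block is back to 3 replicas, all on live nodes.
print(all(len(set(locs)) == 3 and not set(locs) & failed for locs in repaired.values()))  # True
```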