Monday, October 12, 2015

Hive basics for Begginers #1


 1. How to get into hive terminal
        $ hive



 2. Command to display the databases in hive 
    hive> show databases;


    Note:  Hive has a default database 'default', if you don't specify any databases it takes the default database.

3. Command to use database;
      hive> use <database_name>
4. Command to list tables in a database
     hive> show tables;
 5. Create table syntax in hive
   hive> create table emp(eid int, ename string, salary int, gender string, dept_no int ); 
6.Load data into Hive table from Local File System 
  hive> load data local inpath '<Local_Directory_Path>' into table <hive_table_name>;



  emp.txt is a coma delimited file, hence we are getting Null's in all the columns. To overcome this we have to modify our query as below

 7.  Creating a table in hive which can accept the coma delimited file
   hive>create table emp_temp(eid int, ename string, salary int, gender string, dept_no int )
           > row format delimited fields terminated by ',';


   8. Loading data into the above created hive table
      hive>load data local inpath '/tmp/HadoopPractice/emp.txt' into table emp_temp;
     

 Note: In the above output you don't see headers(colum names) for the column. So use the below command to set the headers
  hive> set hive.cli.print.header=true;

9. Now as we have data in our table lets do some analysis on the data
hive> select sum(salary) as salaries_sum from emp_temp;


10. Import data from one hive table to another hive table;
     hive>    insert overwrite table <to_table_name> select eid,ename from <from_table_name>;


     
________________________________________________________________________________

More Hive Commands

hive> describe <table_name>;
hive>describe extended <table_name>;
hive>show functions;
hive> set hive.cli.print.header=true;
hive> describe function <function_name>
hive> load data  inpath '/tmp/HadoopPractice/emp.txt' into table emp_temp; (Loading data from HDFS to Hive).

Note: when you load data from HDFS to Hive, the file in HDFS is actually deleted.


By default all the hive databases are stored in  user/hive/warehouse/
[training@localhost /]$  hadoop fs -ls /user/hive/warehouse/;


















Thursday, October 1, 2015

Basic HDFS Commands for Begginers


1. To display files and directories in HDFS
  $ hadoop fs -ls



 2. To create a directory in HDFS
$ hadoop fs -mkdir HDFSPractice



3. To dispaly the contents of the directory
$ hadoop fs -ls <Directory_Name>  
           


Since, our directory is new we don't have any files in the directory. So let's add some files to our directory.



4. Loading files into HDFS from our local file system
 $hadoop fs -copyFromLocal /usr/hadoopPractice/employee.txt HDFSPractice/
   


Note:  1. hadoop is case sensitive so copyFromLocal is different from copyfromLocal.
      
Now lets display the contents of the HDFSPractice directory as in point 3. 


5. Remove/Delete file from HDFS
$ hadoop fs -rm HDFSPractice/employee.txt






6. Loading the file in our HDFS to our local file system
$  hadoop fs -copyToLocal HDFSPractice/employee.txt /usr/





Goals of HDFS
1. Very large distributed file system
          ---- 10k nodes, 10PB data, 100 million files.
2. User Space, runs on heterogeneous OS
3.Optimized for batch processing
          -----  Locations of data exposed so that the computations can move to where data resides
          -----  Provides very high aggregate bandwidth
4.Assumes commodity hardware
         -----  Files are replicated to handle hardware failures.
         -----  Detects the failures and recovers from them.