Tuesday, December 1, 2015

Cassandra cqlsh




1. cqlsh is a command line utility for issuing query statements to Cassandra or altering schemas in Cassandra.
                    install/bin/cqlsh
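For example, cqlsh can be launched against a local node and used to run a simple query (the host is an assumption; 9042 is the default native protocol port in recent versions):

```
$ bin/cqlsh localhost 9042
cqlsh> SELECT release_version FROM system.local;
cqlsh> EXIT;
```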

2. Some of the options that you can pass to the cqlsh command are:
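A few commonly used options, as a sketch (the username, password, keyspace, and file names here are placeholders):

```
$ bin/cqlsh -u myuser -p mypassword      # authenticate as a user
$ bin/cqlsh -k mykeyspace                # start with a keyspace already selected
$ bin/cqlsh -f statements.cql            # execute the statements in a file, then exit
$ bin/cqlsh --cqlversion "3.1.1" host    # force a specific CQL version
```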


  
3. cqlsh has some commands that are not part of the Cassandra Query Language.
       Below are some such commands:
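A quick sketch of some cqlsh-only (shell) commands (the file names are placeholders):

```
cqlsh> DESCRIBE KEYSPACES;        -- list all keyspaces
cqlsh> DESCRIBE TABLE system.local;
cqlsh> CONSISTENCY QUORUM;        -- set the consistency level for the session
cqlsh> TRACING ON;                -- enable request tracing
cqlsh> SOURCE 'statements.cql';   -- execute statements from a file
cqlsh> CAPTURE 'out.txt';         -- send query output to a file
```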

4. Copying data between a specified table and a CSV file
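For example (the keyspace, table, and column names are assumptions):

```
cqlsh> COPY musicdb.songs (id, title, artist) TO '/tmp/songs.csv' WITH HEADER = TRUE;
cqlsh> COPY musicdb.songs (id, title, artist) FROM '/tmp/songs.csv' WITH HEADER = TRUE;
```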
       

5. Default keyspaces in Cassandra:
             a) system_traces
             b) system
          


6.  cqlsh commands and CQL commands
     






Monday, November 30, 2015

Cassandra Nodetool


Introduction to Cassandra Nodetool:

1. Nodetool is the command line utility for managing a Cassandra cluster.
             /install/bin/nodetool

2. To connect to a node other than the one you are currently on, use the below command.
           $ bin/nodetool -h 'hostname' -p 'jmx_port' [command] [options]
               > jmx_port is configured in cassandra-env.sh
               > default jmx port is 7199.
          Example:  
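A sketch of connecting to a remote node (the hostname is an assumption; 7199 is the default JMX port):

```
$ bin/nodetool -h 192.168.1.50 -p 7199 status
```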


3. Nodetool supports over 60 commands including:
         > status
         > info
         > ring 
            Example:
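Typical invocations against the local node:

```
$ bin/nodetool status   # state (Up/Down), load, and ownership of every node in the cluster
$ bin/nodetool info     # statistics for this node: uptime, load, heap usage, caches
$ bin/nodetool ring     # the token ring and which node owns which token ranges
```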
4. Sample output of the nodetool info command.



5. Additional nodetool commands
    















Cassandra



Installing, Configuring and Running Cassandra locally

1.  Prepare the Operating System
            a)  Install latest Java 7
            b)  Configure JAVA_HOME 
            c)   Install JNA ( Java Native Access) libraries
            d)   Synchronize clocks on each node by using NTP  protocol.
            e)  Disable swap      (sudo swapoff --all)
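A rough sketch of steps b), d) and e) on a Debian-style system (the paths and package names are assumptions and vary by platform):

```
$ export JAVA_HOME=/usr/lib/jvm/java-7-oracle   # b) point JAVA_HOME at the Java 7 install
$ sudo apt-get install ntp                      # d) keep clocks synchronized via NTP
$ sudo swapoff --all                            # e) disable swap
```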
         

 2. Select and install a Cassandra distribution.
         There are three distributions 
            a)  Cassandra Open Source (Apache Cassandra)
            b)  DSE(Datastax Enterprise)
            c)  DSC(Datastax Community)

                Directory Structure after you install Cassandra

   3. Configure Cassandra for a single node
          Configuration files include
             a) cassandra.yaml
                
             b) cassandra-env.sh
                        
             c)  logback.xml
                      
             d) cassandra-rackdc.properties
             e) cassandra-topology.properties
             f) bin/cassandra.in.sh
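As a minimal illustration, the handful of cassandra.yaml settings most often touched for a single local node (the values shown are the common defaults):

```yaml
cluster_name: 'Test Cluster'
num_tokens: 256
listen_address: localhost
rpc_address: localhost
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
```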

4. Start and stop the Cassandra instance.

                a) Starting the instance

                b)  Stopping the instance
                       

            c) System logs
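A sketch of steps a) through c) for a tarball install (the paths are assumptions):

```
$ bin/cassandra -f                 # a) start in the foreground; Ctrl+C stops it
$ bin/cassandra -p cassandra.pid   # a) or start in the background, writing a pid file
$ kill $(cat cassandra.pid)        # b) stop the background instance
$ tail -f logs/system.log          # c) watch the system log
```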



         Summary






Saturday, November 28, 2015

Key Features And Benefits Of Cassandra


 Cassandra provides the following features and benefits.

  1. Massively scalable architecture
  2. Active everywhere design
  3. Linear scalable performance
  4. Continuous availability
  5. Transparent fault detection and recovery
  6. Flexible and dynamic data model
  7. Strong data protection
  8. Tunable data consistency
  9. Multi-data center replication
  10. Data compression
  11. CQL

Tuesday, November 3, 2015

Hive Basics for Beginners #2



11. Loading multi-delimiter data into a hive table
       There are four steps that I follow to load this kind of data    
       a)  Creating a single column table in hive
           hive> create table multi_temp(content String);         


        b) Loading data from local file system to the hive single column table.




       c) Creating the desired table in hive
     

        d)  Loading the data from single column table to the desired table 
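The four steps can be sketched as follows, assuming the raw file uses '#' as the delimiter and the target is an employee table (the file path, delimiter, and column names are assumptions):

```sql
-- a) single-column staging table
create table multi_temp(content string);

-- b) load the raw file into the staging table
load data local inpath '/tmp/HadoopPractice/emp_multi.txt' into table multi_temp;

-- c) the desired target table
create table emp_multi(eid int, ename string, salary int, gender string, dept_no int);

-- d) split each line on the delimiter and populate the target table
insert overwrite table emp_multi
select cast(split(content, '#')[0] as int),
       split(content, '#')[1],
       cast(split(content, '#')[2] as int),
       split(content, '#')[3],
       cast(split(content, '#')[4] as int)
from multi_temp;
```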
         

12. Loading XML data into a hive table.
  We can use the same four step approach that we used for the multi-delimiter data, of which the first three steps are the same.
        d) Loading data from the single column table to the desired table.

   Till now we have executed all the queries and performed operations in the hive terminal. We can also do this by writing a script and executing it from the local terminal.

13. Loading nested XML data into hive table.





14. Creating and Executing Hive Scripts.
    a) creating the hive script
   


     b) Executing the hive script.
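A minimal sketch of both steps, assuming a script file named hive_script.hql:

```
$ cat hive_script.hql
use default;
select count(*) from emp_temp;

$ hive -f hive_script.hql
```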











Monday, October 12, 2015

Hive basics for Beginners #1


 1. How to get into hive terminal
        $ hive



 2. Command to display the databases in hive 
    hive> show databases;


    Note:  Hive has a default database named 'default'; if you don't specify a database, it uses the default one.

3. Command to use a database
      hive> use <database_name>;
4. Command to list tables in a database
     hive> show tables;
 5. Create table syntax in hive
   hive> create table emp(eid int, ename string, salary int, gender string, dept_no int ); 
6. Load data into a Hive table from the Local File System
  hive> load data local inpath '<Local_Directory_Path>' into table <hive_table_name>;



  emp.txt is a comma-delimited file, but the table was created with Hive's default field delimiter, hence we are getting NULLs in all the columns. To overcome this we have to modify our query as below

 7.  Creating a table in hive which can accept the comma-delimited file
   hive>create table emp_temp(eid int, ename string, salary int, gender string, dept_no int )
           > row format delimited fields terminated by ',';


   8. Loading data into the above created hive table
      hive>load data local inpath '/tmp/HadoopPractice/emp.txt' into table emp_temp;
     

 Note: In the above output you don't see headers (column names) for the columns. So use the below command to set the headers
  hive> set hive.cli.print.header=true;

9. Now that we have data in our table, let's do some analysis on the data
hive> select sum(salary) as salaries_sum from emp_temp;


10. Import data from one hive table to another hive table;
     hive>    insert overwrite table <to_table_name> select eid,ename from <from_table_name>;


     
________________________________________________________________________________

More Hive Commands

hive> describe <table_name>;
hive>describe extended <table_name>;
hive>show functions;
hive> set hive.cli.print.header=true;
hive> describe function <function_name>
hive> load data  inpath '/tmp/HadoopPractice/emp.txt' into table emp_temp; (Loading data from HDFS to Hive).

Note: when you load data from HDFS into Hive, the file is moved from its original HDFS location into the Hive warehouse directory, so it no longer exists at the source path.


By default all the hive databases are stored in /user/hive/warehouse/
[training@localhost /]$  hadoop fs -ls /user/hive/warehouse/;


















Thursday, October 1, 2015

Basic HDFS Commands for Beginners


1. To display files and directories in HDFS
  $ hadoop fs -ls



 2. To create a directory in HDFS
$ hadoop fs -mkdir HDFSPractice



3. To display the contents of a directory
$ hadoop fs -ls <Directory_Name>  
           


Since our directory is new, we don't have any files in it. So let's add some files to our directory.



4. Loading files into HDFS from our local file system
 $hadoop fs -copyFromLocal /usr/hadoopPractice/employee.txt HDFSPractice/
   


Note:  1. Hadoop is case sensitive, so copyFromLocal is different from copyfromlocal.
      
Now let's display the contents of the HDFSPractice directory as in point 3.


5. Remove/Delete file from HDFS
$ hadoop fs -rm HDFSPractice/employee.txt






6. Loading the file in our HDFS to our local file system
$  hadoop fs -copyToLocal HDFSPractice/employee.txt /usr/





Goals of HDFS
1. Very large distributed file system
          ---- 10k nodes, 10PB data, 100 million files.
2. User Space, runs on heterogeneous OS
3. Optimized for batch processing
          -----  Locations of data exposed so that the computations can move to where data resides
          -----  Provides very high aggregate bandwidth
4. Assumes commodity hardware
         -----  Files are replicated to handle hardware failures.
         -----  Detects the failures and recovers from them.

Thursday, September 3, 2015

Hive

How to handle XML data ?
Using the hive functions xpath and xpath_string
step 1:  Create single column table
create table xmldataTable(col1 string);
step 2 : Load data into single column table
load data local inpath 'xmldata' into table xmldataTable;
step 3: create the required table
create table xml_table2(name string, age int,gend string);

step 4: load data from single column table to final table
insert overwrite table xml_table2 select xpath_string(col1,'rec/name'), xpath_int(col1,'rec/age'), xpath_string(col1,'rec/sex') from xmldataTable;
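For the xpath expressions above to match, each row of the xmldata file should hold one record like the following (the element names follow the query; the values are made up for illustration):

```
<rec><name>John</name><age>30</age><sex>M</sex></rec>
<rec><name>Mary</name><age>28</age><sex>F</sex></rec>
```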





hive> select * from (select * from emp union all select * from emp2) e;



[cloudera@localhost ~]> hive -e 'select eid from emp'


Running hive scripts
[cloudera@localhost ~]$ gedit hivesc.hive
       hivesc.hive file
            use nareshdb;


[cloudera@localhost ~]$ hive -f hivesc.hive