My Big Data: November 2015

Monday, November 30, 2015

Cassandra Nodetool

Introduction to Cassandra Nodetool:

1. Nodetool is the command-line utility for managing a Cassandra cluster.

/install/bin/nodetool

2. To connect to a node other than the one you are currently on, use the command below.

$ bin/nodetool -h 'hostname' -p 'jmx_port' [command] [options]
> jmx_port is configured in cassandra-env.sh
> default jmx port is 7199.
Example:
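A minimal sketch of the remote form above, wrapped in a small shell function. The host name cass-node2.example.com and the NODETOOL override variable are assumptions made for illustration; 7199 is the default JMX port noted above.

```shell
# Path to the nodetool binary; override NODETOOL when Cassandra lives elsewhere.
NODETOOL="${NODETOOL:-bin/nodetool}"

# Run a nodetool command against a remote node over JMX.
# Usage: remote_nodetool <host> <jmx_port> <command> [options...]
remote_nodetool() {
    host="$1"
    port="$2"
    shift 2
    "$NODETOOL" -h "$host" -p "$port" "$@"
}
```

For instance, remote_nodetool cass-node2.example.com 7199 status would ask that node for the cluster status.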

3. Nodetool supports over 60 commands including:
> status
> info
> ring
Example:
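The three commands listed above can be run one after another against the local node. This is only a sketch: the function name cluster_overview and the NODETOOL variable are hypothetical, and the output depends entirely on your cluster.

```shell
# Path to the nodetool binary; override NODETOOL when Cassandra lives elsewhere.
NODETOOL="${NODETOOL:-bin/nodetool}"

# Print a labelled report from three read-only nodetool commands.
cluster_overview() {
    for cmd in status info ring; do
        echo "== nodetool $cmd =="
        "$NODETOOL" "$cmd"
    done
}
```

Calling cluster_overview from the Cassandra install directory prints each command's output under its own header.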

4. Sample output of the nodetool info command.

5. Additional nodetool commands

Installing, Configuring and Running Cassandra locally

1. Prepare the Operating System
a) Install latest Java 7
b) Configure JAVA_HOME

c) Install the JNA (Java Native Access) libraries

d) Synchronize the clocks on each node using NTP.

e) Disable swap (sudo swapoff --all)
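A small sanity check for step b) might look like the sketch below. The function name check_java_home and the JDK path in the comments are assumptions; adjust them to your machine.

```shell
# Verify that JAVA_HOME points at a usable Java installation.
check_java_home() {
    if [ -n "$JAVA_HOME" ] && [ -x "$JAVA_HOME/bin/java" ]; then
        echo "JAVA_HOME ok: $JAVA_HOME"
    else
        echo "JAVA_HOME is not set or has no bin/java" >&2
        return 1
    fi
}

# Typical preparation commands, shown as comments because they need root
# privileges and a path that matches your machine (the JDK path is an assumption):
#   export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
#   sudo swapoff --all
```

Running check_java_home after exporting JAVA_HOME confirms the variable is usable before Cassandra is started.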

2. Select and install a Cassandra distribution.

There are three distributions:

a) Open-source Apache Cassandra

b) DSE (DataStax Enterprise)

c) DSC (DataStax Community)

Directory Structure after you install Cassandra

3. Configure Cassandra for a single node

Configuration files include:

a) cassandra.yaml

b) cassandra-env.sh

c) logback.xml

d) cassandra-rackdc.properties

e) cassandra-topology.properties

f) bin/cassandra-in.sh
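For a single node, the cassandra.yaml settings most often reviewed are the cluster name, the listen and RPC addresses, and the seed list. The sketch below simply prints those lines from a given file. The function name is hypothetical, and the conf/cassandra.yaml path in the usage note assumes a tarball install.

```shell
# Print the uncommented lines of a cassandra.yaml that set the cluster name,
# listen address, RPC address, and seed list.
show_single_node_settings() {
    grep -E '^[[:space:]]*(-[[:space:]]*)?(cluster_name|listen_address|rpc_address|seeds):' "$1"
}
```

Usage: show_single_node_settings conf/cassandra.yaml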

4. Start and stop the Cassandra instance.

a) Starting the instance

b) Stopping the instance

c) System logs
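The three steps above can be sketched as shell functions. This assumes a tarball install whose root is CASSANDRA_HOME, with the system log under logs/system.log; package installs usually log to /var/log/cassandra instead. The PID file location is hypothetical.

```shell
CASSANDRA_HOME="${CASSANDRA_HOME:-.}"        # assumption: tarball install root
PID_FILE="${PID_FILE:-/tmp/cassandra.pid}"   # hypothetical PID file location

# a) Start the instance in the background; -p records the process ID in a file.
start_cassandra() {
    "$CASSANDRA_HOME/bin/cassandra" -p "$PID_FILE"
}

# b) Stop the instance by sending SIGTERM to the recorded process ID.
stop_cassandra() {
    kill "$(cat "$PID_FILE")"
}

# c) Show the last N lines (default 50) of the system log.
tail_system_log() {
    tail -n "${1:-50}" "$CASSANDRA_HOME/logs/system.log"
}
```

To keep Cassandra in the foreground while experimenting, bin/cassandra -f can be used instead; Ctrl-C then stops it.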

Summary

Saturday, November 28, 2015

Key Features And Benefits Of Cassandra

Cassandra provides the following features and benefits.

Massively scalable architecture
Active everywhere design
Linear scalable performance
Continuous availability
Transparent fault detection and recovery
Flexible and dynamic data model
Strong data protection
Tunable data consistency
Multi-data center replication
Data compression
CQL

Tuesday, November 3, 2015

11. Loading multi-delimiter data into a Hive table
There are four steps that I follow to load this kind of data:
a) Creating a single-column table in Hive
hive> create table multi_temp(content String);

b) Loading data from the local file system into the single-column Hive table.

c) Creating the desired table in Hive

d) Loading the data from the single-column table into the desired table
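The four steps above can be sketched as a Hive script written from the shell. Only the multi_temp table comes from this post. The input path, the target table multi_final and its columns, and the sample line layout (for example 1,Alice|NY, mixing ',' and '|') are assumptions made for illustration.

```shell
# Write the four steps to a Hive script file in the current directory.
cat > load_multi_delim.hql <<'EOF'
-- a) single-column staging table
CREATE TABLE IF NOT EXISTS multi_temp (content STRING);

-- b) load the raw file from the local file system (hypothetical path)
LOAD DATA LOCAL INPATH '/tmp/multi_delim.txt' INTO TABLE multi_temp;

-- c) the desired target table (hypothetical columns)
CREATE TABLE IF NOT EXISTS multi_final (id INT, name STRING, city STRING);

-- d) split each line on either delimiter (',' or '|') and insert the parts
INSERT OVERWRITE TABLE multi_final
SELECT CAST(parts[0] AS INT), parts[1], parts[2]
FROM (SELECT split(content, '[,|]') AS parts FROM multi_temp) t;
EOF

# Run it on a machine where Hive is installed:
#   hive -f load_multi_delim.hql
```

The key step is d): Hive's split() takes a regular expression, so a character class such as [,|] handles several delimiters in one pass.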

12. Loading XML data into a Hive table.
We can use the same four-step approach that we used for the multi-delimiter data; the first three steps are the same.
d) Loading data from the single-column table into the desired table.
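A possible version of the last two steps uses Hive's built-in XPath functions. Everything specific here is an assumption: the staging table xml_temp(content STRING) is taken to be already loaded with one complete record per row, such as <employee><id>1</id><name>Alice</name></employee>, and the target table xml_final is hypothetical.

```shell
# Write the final two steps to a Hive script file in the current directory.
cat > load_xml.hql <<'EOF'
-- c) the desired target table (hypothetical columns)
CREATE TABLE IF NOT EXISTS xml_final (id INT, name STRING);

-- d) extract fields from each XML record with the built-in XPath UDFs
INSERT OVERWRITE TABLE xml_final
SELECT xpath_int(content, 'employee/id'),
       xpath_string(content, 'employee/name')
FROM xml_temp;
EOF

# Run it on a machine where Hive is installed:
#   hive -f load_xml.hql
```

Because LOAD DATA treats each line of the input file as one row, this sketch only works when every XML record sits on a single line.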

Until now we have run all the queries and operations in the Hive terminal. We can also write them in a script and execute it from the local terminal.

13. Loading nested XML data into a Hive table.

14. Creating and Executing Hive Scripts.
a) Creating the Hive script

b) Executing the Hive script.
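Both steps can be sketched from the local terminal as follows. The script name sample.hql, its contents, and the HIVE_BIN override variable are hypothetical; hive -f is the standard way to run a script file.

```shell
# a) Create the Hive script (table name and queries are hypothetical).
cat > sample.hql <<'EOF'
CREATE TABLE IF NOT EXISTS script_demo (id INT, name STRING);
SHOW TABLES;
EOF

# b) Execute a Hive script from the local terminal.
# Usage: run_hive_script <script.hql>
run_hive_script() {
    "${HIVE_BIN:-hive}" -f "$1"
}
```

On a machine with Hive installed, run_hive_script sample.hql executes every statement in the file in order. For a single short statement, hive -e 'SHOW TABLES;' avoids the need for a file.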