Migrating a Cassandra Cluster


So, what in the name of god is Cassandra? Well, it is officially defined as: “Cassandra is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra’s support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.”

Most of our infrastructure runs on containers, and Cassandra is no exception. We have three Cassandra nodes running in the datacenter, using plain Docker with volumes mapped onto the host. Easy, right? That’s basically it. When going to production, we needed to migrate all the data from the test environments (composed of that Cassandra cluster) to the live environments (more Cassandra nodes), now running on both AWS instances and other hosts in the datacenter. We migrated 25 tables very easily by leveraging Amazon S3 and Docker. We just ran the following commands for each table and voilà, we moved all the data in around 1 hour.

The Problem

Of course, it only took 1 hour after perfecting the process. Without the following cheat sheet, the task would have taken us no less than 4 to 5 hours, as we were moving 25 tables across multiple Cassandra nodes running in 4 Docker containers. If you know containers, you know that their IDs can be very long and quite confusing once you have around 30 containers running on a single host. Either way, we were able to do it in the end by doing the following.
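One way to sidestep the long container IDs is to look containers up by name. This is only a minimal sketch: it assumes the Cassandra containers have "cassandra" somewhere in their names, which may not match your setup, and pick_containers is a hypothetical helper that simply filters the two-column output of docker ps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "ID NAME" lines on stdin and print the IDs
# of the containers whose name contains the given pattern.
pick_containers() {
  local pattern="$1"
  awk -v pat="$pattern" 'index($2, pat) > 0 { print $1 }'
}

# Usage against a live Docker host (the name pattern is an assumption):
#   docker ps --format '{{.ID}} {{.Names}}' | pick_containers cassandra
```

The resulting ID can then be passed to docker exec instead of copying it by hand from a long docker ps listing.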

The Solution

All the magic happens here:

For the table named keyword_volumes_index:

1 cd /cassandra/keywords/data/apollo
2 docker exec a9aab8e4-57ce-4315-974d-4d851520c4ff nodetool snapshot -t backup1 -cf keyword_volumes_index apollo
3 find . -name backup1
4 cd keyword_volumes_index-0fedef70f7bd11e68bc1c595be0001ec/snapshots
5 mv backup1 keyword_volumes_index
6 aws s3 cp . s3://NAME-OF-YOUR-BUCKET/keyword_volumes_index/ --recursive
7 rm -r keyword_volumes_index
8 On the target cluster, create the keyspace and then the table, using the same schema as the existing one
9 cd /var/lib/cassandra/data/apollo/keyword_volumes_index-cfe4ca2024f311e78d16052016a317fb
10 aws s3 cp s3://NAME-OF-YOUR-BUCKET/keyword_volumes_index/ . --recursive
11 nodetool refresh apollo keyword_volumes_index
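Notice that the table directory carries a different ID suffix on the source and on the target, so the paths cannot simply be copied from one host to the other. A small sketch of how the directory could be looked up by table name instead; table_dir is a hypothetical helper, and the data directory passed to it is whatever your installation uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the on-disk directory of a table.
# Cassandra names it <table>-<id>, and the id differs per cluster.
table_dir() {
  local data_dir="$1" keyspace="$2" table="$3"
  local dir
  for dir in "$data_dir/$keyspace/$table"-*/; do
    # If nothing matches, the glob stays literal and this check skips it.
    [ -d "$dir" ] || continue
    printf '%s\n' "${dir%/}"
    return 0
  done
  echo "no directory found for $keyspace.$table" >&2
  return 1
}

# Usage on the target host (path taken from the steps in this post):
#   cd "$(table_dir /var/lib/cassandra/data apollo keyword_volumes_index)"
```

If a table was dropped and recreated there may be more than one matching directory; this sketch just returns the first match, so check by hand in that case.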

In step 2 we use docker exec to run “nodetool snapshot -t backup1 -cf keyword_volumes_index apollo” inside the Docker container; nodetool is the tool that actually takes the backup. We name the snapshot “backup1” and take it of the table “keyword_volumes_index” in the keyspace “apollo” (the keyspace is the name of the database).
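Since the same commands are repeated for every table, the cheat sheet can be generated instead of typed. The sketch below only prints the Cassandra and S3 commands for each table (the cd steps are left out) and runs nothing; export_plan and import_plan are hypothetical helpers, and CONTAINER_ID, NAME-OF-YOUR-BUCKET and another_table are placeholders.

```shell
#!/usr/bin/env bash
# Print (not run) the source-side commands for one table.
# All argument values are placeholders supplied by the caller.
export_plan() {
  local container="$1" keyspace="$2" table="$3" bucket="$4"
  printf 'docker exec %s nodetool snapshot -t backup1 -cf %s %s\n' \
    "$container" "$table" "$keyspace"
  printf 'aws s3 cp . s3://%s/%s/ --recursive\n' "$bucket" "$table"
}

# Print (not run) the target-side commands for one table.
import_plan() {
  local keyspace="$1" table="$2" bucket="$3"
  printf 'aws s3 cp s3://%s/%s/ . --recursive\n' "$bucket" "$table"
  printf 'nodetool refresh %s %s\n' "$keyspace" "$table"
}

# Example: plan for two tables; only the first name comes from this post.
for table in keyword_volumes_index another_table; do
  export_plan CONTAINER_ID apollo "$table" NAME-OF-YOUR-BUCKET
  import_plan apollo "$table" NAME-OF-YOUR-BUCKET
done
```

Printing first and running by hand keeps a human in the loop, which felt safer to us than a fully automated script touching live data.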

The End

After all of this, we successfully migrated a Cassandra cluster with no pain and no tears!

I have created a new post about how to monitor Cassandra using Prometheus, because once you have Cassandra (or any other service) running, you need to not just monitor it but truly manage and monitor it!