Out of the box, HBase uses its built-in automatic splitting to split large regions. Essentially, after a major compaction each region has a single StoreFile per column family, and StoreFiles store their data internally in the HFile format. When a StoreFile grows beyond hbase.hregion.max.filesize, the region server splits it in the middle (on a block boundary), turning the one StoreFile into two new ones.
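To make the split rule concrete, here is a deliberately simplified Python model, not HBase's actual code. It assumes a StoreFile can be reduced to its total size plus the first row key of each HFile block, and it assumes the split point is the middle entry of that block index, which is what keeps the split on a block boundary. The names `StoreFile`, `split_point` and `daughters` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoreFile:
    # Hypothetical simplification: first row key of every HFile block, sorted.
    block_keys: List[bytes]
    size_bytes: int


def split_point(store_files: List[StoreFile], max_filesize: int) -> Optional[bytes]:
    """Return the row key to split at, or None if no StoreFile exceeds the limit."""
    largest = max(store_files, key=lambda sf: sf.size_bytes, default=None)
    if largest is None or largest.size_bytes <= max_filesize:
        return None
    keys = largest.block_keys
    if not keys:
        return None
    # Assumed midpoint rule: middle entry of the block index, so the
    # split always falls on a block boundary.
    return keys[(len(keys) - 1) // 2]


def daughters(start: bytes, end: bytes, mid: bytes) -> Tuple[Tuple[bytes, bytes], Tuple[bytes, bytes]]:
    # The parent key range [start, end) becomes [start, mid) and [mid, end).
    return (start, mid), (mid, end)


MB = 1024 * 1024
sf = StoreFile(block_keys=[b"a", b"f", b"m", b"t"], size_bytes=300 * MB)
print(split_point([sf], max_filesize=256 * MB))  # b'f'
```

Raising `max_filesize` above the size of the largest StoreFile makes `split_point` return `None`, which is the effect the next paragraph relies on.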
Because of our key design and the irregular growth of our data, we end up with many regions per table, ranging in size from 200MB to 40GB. This high region count makes things noticeably slower: the recommended number of regions per region server is somewhere between 20 and the low hundreds.
As stated in a previous post, one way to deal with this issue is to disable automatic splitting and manage splits manually. To turn off automatic splitting, set hbase.hregion.max.filesize to a high value such as 40GB or more. That’s what we did.
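hbase.hregion.max.filesize is given in bytes, so 40GB has to be written out as 40 × 1024³ = 42949672960. The small Python helper below (the function name `max_filesize_property` is hypothetical) just does that arithmetic and renders the standard `<property>` entry; it is a sketch, and the entry is assumed to go inside the `<configuration>` element of hbase-site.xml on the region servers.

```python
def max_filesize_property(gigabytes: int) -> str:
    """Render an hbase-site.xml <property> entry; HBase expects the value in bytes."""
    size_bytes = gigabytes * 1024 ** 3  # GB -> bytes
    return (
        "<property>\n"
        "  <name>hbase.hregion.max.filesize</name>\n"
        f"  <value>{size_bytes}</value>\n"
        "</property>"
    )


print(max_filesize_property(40))
```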
But how do you monitor region growth and distribution across the cluster?
And, even more importantly, how evenly sized are the regions of each table?
What is going on in particular regions? How long do compactions take?
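Two of these questions boil down to simple aggregations over per-region sizes: how many regions each server carries, and how widely region sizes spread within each table. The Python sketch below assumes you already have one record per region (table, hosting server, store file size in MB); the `RegionInfo` record and both function names are hypothetical, and how you collect the data from the cluster is left out.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass(frozen=True)
class RegionInfo:
    # Hypothetical input record, one per region.
    table: str
    server: str
    size_mb: int  # total store file size of the region


def regions_per_server(regions: List[RegionInfo]) -> Dict[str, int]:
    # Region count per region server, to compare against the 20..low-hundreds guideline.
    return dict(Counter(r.server for r in regions))


def size_spread_per_table(regions: List[RegionInfo]) -> Dict[str, dict]:
    # Group region sizes by table, then summarise how uneven they are.
    sizes: Dict[str, List[int]] = defaultdict(list)
    for r in regions:
        sizes[r.table].append(r.size_mb)
    return {
        table: {
            "regions": len(s),
            "min_mb": min(s),
            "median_mb": median(s),
            "max_mb": max(s),
        }
        for table, s in sizes.items()
    }


regions = [
    RegionInfo("t1", "rs1", 200),
    RegionInfo("t1", "rs1", 40960),
    RegionInfo("t1", "rs2", 1024),
    RegionInfo("t2", "rs2", 500),
]
print(regions_per_server(regions))
print(size_spread_per_table(regions))
```

A table whose `max_mb` is orders of magnitude above its `median_mb`, like `t1` here, is the kind of table where manual splitting is worth a closer look.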
Since neither the tools shipped with HBase nor Ganglia (the monitoring tool we use) could answer these questions, we built our own solution: Hannibal, a web-based tool that visualises region sizes, their distribution and the compaction history. The visualisations help you make decisions about manual splitting.
Hannibal is open source and implemented in Scala. Its current version supports HBase 0.90; support for versions newer than 0.90 is planned and will be added soon. Please have a look at the GitHub project, install Hannibal, play around with it and let us know what you think, what you like, what you don’t, and what additional features you would like to see.
My talk “Operating HBase: Things you need to know” at ApacheCon Europe will cover the above topics and introduce Hannibal as well.
Which HBase region splitting mechanism do you use and how do you solve the problems that arise from it?
We just uploaded a video tutorial.
Let me hear what you think.