In HBase Architecture: Part-1, we started our discussion of the architecture by describing RegionServers instead of the MasterServer, which may have surprised you. The term RegionServer would seem to imply that it depends on (and is secondary to) the MasterServer and that we should therefore describe the MasterServer first.
The RegionServers do depend on the MasterServer for certain functions, but not in the sense of a master-slave relationship for data storage and retrieval. In the upper-left corner of the HBase Architecture diagram, notice that the clients do not point to the MasterServer, but point instead to the Zookeeper cluster and RegionServers.
The MasterServer isn’t in the path for data storage and access. That job belongs to the RegionServers and the Zookeeper cluster. We’ll cover Zookeeper in the next part and describe client interaction there; for now, take a careful look at the primary features of the MasterServer, which is also a software process (or daemon) like the RegionServers. The MasterServer helps to:
Monitor the RegionServers in the HBase cluster: The MasterServer maintains and monitors a list of active RegionServers in the HBase cluster.
Handle metadata operations: When a table is created in HBase or its attributes are modified (compression settings, cache settings, versioning, and more), the MasterServer handles these operations and stores the required metadata.
Assign regions: The MasterServer is also responsible for assigning regions to RegionServers.
Manage RegionServer failover: As with any distributed cluster, we hope that node failures don’t occur, but we plan for them anyway. Whenever RegionServers fail, Zookeeper notifies the MasterServer so that failover and restore operations can be initiated. We discuss this topic in greater detail in the next part.
Oversee load balancing of regions across all available RegionServers: We may recall that tables consist of regions, which are evenly distributed across all available RegionServers. Keeping that distribution even is the job of the balancer thread (or chore, if we prefer), which the MasterServer periodically activates.
Manage (and clean) catalog tables: Two key catalog tables (named -ROOT- and .META.) are used by the HBase system to help a client find a particular key-value pair in the system.
• The -ROOT- table keeps track of the .META. table’s location in the cluster.
• The .META. table keeps track of where each region is located in the cluster.
The MasterServer manages these important tables on behalf of the overall HBase system.
Clear the WAL: The MasterServer works with the write-ahead log (WAL) at the time of RegionServer failover and periodically cleans up old logs.
Provide a coprocessor framework for observing master operations: Here’s another new term for our growing HBase glossary. Coprocessors execute in the context of the MasterServer or RegionServers. For instance, a MasterServer observer coprocessor enables us to change or extend the normal functionality of the server when operations like table creation or table deletion take place. Coprocessors are often used to manage table indexes for advanced HBase systems.
A coprocessor, which runs in the context of the MasterServer, the RegionServers, or both, can be used to improve security, create secondary indexes, and more.
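To make a few of these duties more concrete, here is a minimal Python sketch of the bookkeeping behind monitoring, region assignment, failover, and balancing. It is a toy model, not HBase code: the class ToyMaster, its method names, and the "fewest regions first" placement rule are all assumptions made for illustration, and the real MasterServer is considerably more sophisticated.

```python
class ToyMaster:
    """Toy model of MasterServer bookkeeping. Hypothetical, not HBase code."""

    def __init__(self):
        self.live_servers = set()   # RegionServers currently being monitored
        self.assignments = {}       # region name -> RegionServer name

    def server_joined(self, server):
        self.live_servers.add(server)

    def regions_by_server(self):
        """Group assigned regions under each live server."""
        grouped = {server: [] for server in self.live_servers}
        for region, server in self.assignments.items():
            grouped[server].append(region)
        return grouped

    def assign(self, region):
        """Assign a region to the live server holding the fewest regions."""
        if not self.live_servers:
            raise RuntimeError("no live RegionServers to assign to")
        grouped = self.regions_by_server()
        # sorted() makes tie-breaking deterministic (alphabetical order)
        target = min(sorted(grouped), key=lambda s: len(grouped[s]))
        self.assignments[region] = target
        return target

    def server_failed(self, server):
        """Failover: forget the dead server and reassign its regions."""
        self.live_servers.discard(server)
        orphaned = sorted(r for r, s in self.assignments.items() if s == server)
        for region in orphaned:
            del self.assignments[region]
        for region in orphaned:
            self.assign(region)
        return orphaned

    def balance(self):
        """Move regions from the fullest to the emptiest server until the
        counts differ by at most one. Returns the number of moves made."""
        moves = 0
        while self.live_servers:
            grouped = self.regions_by_server()
            servers = sorted(grouped)
            fullest = max(servers, key=lambda s: len(grouped[s]))
            emptiest = min(servers, key=lambda s: len(grouped[s]))
            if len(grouped[fullest]) - len(grouped[emptiest]) <= 1:
                break
            self.assignments[min(grouped[fullest])] = emptiest
            moves += 1
        return moves


master = ToyMaster()
for name in ("rs1", "rs2"):
    master.server_joined(name)
for i in range(6):
    master.assign(f"region-{i}")

master.server_joined("rs3")             # a new, empty RegionServer appears
moves = master.balance()                # the balancer chore evens things out
orphaned = master.server_failed("rs1")  # in HBase, Zookeeper reports the failure
```

Notice that nothing in this sketch reads or writes table data. The toy master only tracks which server hosts which region, which mirrors the point that the MasterServer coordinates the cluster rather than serving client requests.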
As with all open source Hadoop technologies, MasterServer operations will likely change over time as the community of engineers works on innovations designed to enhance HBase. As of now, however, we have a fairly thorough list that serves as a high-level reference for the MasterServer.
Finally, we have one more important point to make about the HBase MasterServer. There can and should be a backup MasterServer in any HBase cluster. (Refer to the HBase Architecture diagram above.) Only one MasterServer is active at any given time, so the backup MasterServer exists for failover purposes. We may recall that the MasterServer isn’t in the data access path for HBase clients. However, we may also recall (from the list of functions) that the MasterServer is responsible for actions such as RegionServer failover and load balancing. The good news is that clients can continue to query the HBase cluster if the master goes down, but for normal cluster operations, the master should not remain down for long.
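The following Python sketch illustrates why reads keep working while the master is down: the lookup path runs through Zookeeper, the catalog tables, and the RegionServers, and never touches the MasterServer. Everything here is hypothetical and simplified. The ToyCluster class, the server names, and the row keys are invented, and the assumption that Zookeeper hands out the location of -ROOT- is ours, since client interaction is covered in the next part.

```python
import bisect


class ToyCluster:
    """Toy model of the client lookup path. Hypothetical, not HBase code."""

    def __init__(self):
        self.master_up = True
        # Assumption: Zookeeper tells clients which server hosts -ROOT-.
        self.zookeeper = {"root_location": "rs1"}
        # -ROOT- records which server hosts .META.
        self.root_table = {".META.": "rs2"}
        # .META. records (region start key, hosting server), sorted by key.
        self.meta_table = [("", "rs1"), ("m", "rs2"), ("t", "rs3")]
        # Rows stored on each RegionServer.
        self.region_servers = {
            "rs1": {"apple": 1},
            "rs2": {"mango": 2},
            "rs3": {"tomato": 3},
        }

    def locate(self, row_key):
        """Return the servers visited to find row_key, ending at its host."""
        root_host = self.zookeeper["root_location"]
        meta_host = self.root_table[".META."]
        start_keys = [start for start, _ in self.meta_table]
        # Pick the last region whose start key is <= row_key.
        index = bisect.bisect_right(start_keys, row_key) - 1
        data_host = self.meta_table[index][1]
        return [root_host, meta_host, data_host]

    def get(self, row_key):
        """Read path: never consults self.master_up."""
        data_host = self.locate(row_key)[-1]
        return self.region_servers[data_host].get(row_key)

    def create_table(self, name):
        """Metadata operation: needs an active master."""
        if not self.master_up:
            raise RuntimeError("no active MasterServer for metadata operations")
        return f"created {name}"


cluster = ToyCluster()
cluster.master_up = False        # simulate the master going down

value = cluster.get("mango")     # the read still succeeds
try:
    cluster.create_table("t1")   # the metadata operation does not
    error = None
except RuntimeError as exc:
    error = str(exc)
```

In this model the read succeeds with the master down, while the table creation fails until a master is active again. That is the gap a backup MasterServer is meant to close quickly.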