In HBase Architecture: Part-1, we started our discussion of the architecture by describing
RegionServers instead of the MasterServer, which may have surprised you. The term
RegionServer would seem to imply that it depends on (and is secondary to) the
MasterServer and that we should therefore describe the MasterServer first. The
RegionServers do depend on the MasterServer for certain functions, but not
in the sense of a master-slave relationship for data storage and retrieval. In
the upper-left corner of the HBase Architecture diagram, notice that the clients do
not point to the MasterServer, but point instead to the ZooKeeper cluster and RegionServers.
The MasterServer isn’t in the path for data storage and access; that’s the job of
the RegionServers and the ZooKeeper cluster. We’ll cover ZooKeeper in the next
part and describe client interaction there; for now, take a careful look at the
primary features of the MasterServer, which is also a software process (or
daemon) like the RegionServers. The MasterServer helps to:
Monitor the RegionServers in the HBase cluster: The
MasterServer maintains and monitors a list of active RegionServers in the HBase
cluster.
Handle metadata operations: When a table is created in
HBase or its attributes are modified (compression settings, cache settings,
versioning, and more), the MasterServer handles all these operations and stores
the required metadata.
Assign regions: The MasterServer is also responsible
for assigning regions to RegionServers.
Manage RegionServer failover: As with any distributed
cluster, we hope that node failures don’t occur, but we plan for them anyway.
Whenever RegionServers fail, ZooKeeper notifies the MasterServer so that
failover and restore operations can be initiated. We discuss this topic in
greater detail in the next part.
Oversee load balancing of regions across all available RegionServers:
We may recall that tables consist of regions, which are evenly distributed
across all available RegionServers. This is the job of the balancer thread (or
chore, if we prefer), which the MasterServer periodically activates.
Manage (and clean) catalog tables: Two key catalog
tables, labelled -ROOT- and .META., are used by the HBase system to help a
client find a particular key-value pair in the system.
The -ROOT- table keeps track of the .META. table’s location in the cluster.
The .META. table keeps track of where each region is located in the cluster.
The MasterServer manages these important tables on behalf of the
overall HBase system.
Clean up the WAL: The MasterServer works with the
WAL (write-ahead log) at the time of RegionServer failover and periodically cleans the logs.
Provide a coprocessor framework for observing master
operations: Here’s another new term for our growing
HBase glossary. Coprocessors execute in the context of the MasterServer or
RegionServers. For instance, a MasterServer observer coprocessor enables us to
change or extend the normal functionality of the server when operations like
table creation or table deletion take place. Coprocessors are often used to manage
table indexes for advanced HBase systems.
A coprocessor, which runs in the context of the MasterServer, the RegionServers,
or both, can be used to improve security, create secondary indexes,
and much more.
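To make the observer idea concrete, here is a minimal sketch in Python of a hook that runs before table creation and can reject the request. It is a toy model only: `ToyMaster`, `register_observer`, and `require_compression` are hypothetical names invented for illustration and are not part of HBase, where master observers are written in Java.

```python
from typing import Callable, Dict, List


class ToyMaster:
    """Hypothetical stand-in for a master process; not the HBase API."""

    def __init__(self) -> None:
        self.tables: Dict[str, dict] = {}
        # Hooks called before a table is created; any hook may veto the call.
        self.pre_create_observers: List[Callable[[str, dict], None]] = []

    def register_observer(self, hook: Callable[[str, dict], None]) -> None:
        self.pre_create_observers.append(hook)

    def create_table(self, name: str, attributes: dict) -> None:
        # Run every observer first; an exception aborts the operation.
        for hook in self.pre_create_observers:
            hook(name, attributes)
        self.tables[name] = attributes


def require_compression(name: str, attributes: dict) -> None:
    """Example policy: reject tables that do not declare a compression codec."""
    if "compression" not in attributes:
        raise PermissionError(f"table {name!r} must declare compression")


master = ToyMaster()
master.register_observer(require_compression)
master.create_table("orders", {"compression": "snappy"})

try:
    master.create_table("scratch", {})
    rejected = False
except PermissionError:
    rejected = True
```

The point of the design is that the policy lives outside the master's own code path: the master only promises to call the hook at a well-defined moment, and the hook decides whether the operation proceeds.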
As with all open source Hadoop technologies, MasterServer operations will likely
change over time as the community of engineers works on innovations designed to
enhance HBase. As of now, however, we have a fairly thorough list that serves
as a high-level reference for the MasterServer.
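The two-step catalog lookup from the list above can be sketched as follows. This is a simplified illustration, not HBase client code: the table contents, server names, and the `locate_region` helper are all assumptions made up for the example, and a region is assumed to cover every row key from its start key up to the next region's start key.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical contents: -ROOT- records which server hosts the .META. table.
ROOT_TABLE: Dict[str, str] = {".META.": "regionserver-2"}

# Hypothetical .META. rows for one table: (region start key, hosting server),
# sorted by start key.
META_TABLE: Dict[str, List[Tuple[str, str]]] = {
    "orders": [
        ("", "regionserver-1"),
        ("g", "regionserver-3"),
        ("p", "regionserver-2"),
    ]
}


def locate_region(table: str, row_key: str) -> Tuple[str, str]:
    """Return (server hosting .META., server hosting the row's region)."""
    # Step 1: ask -ROOT- where the .META. table lives.
    meta_server = ROOT_TABLE[".META."]
    # Step 2: read the table's region rows from .META.
    regions = META_TABLE[table]
    start_keys = [start for start, _ in regions]
    # The owning region is the last one whose start key is <= row_key.
    index = bisect.bisect_right(start_keys, row_key) - 1
    return meta_server, regions[index][1]


meta_host, apple_host = locate_region("orders", "apple")
_, grape_host = locate_region("orders", "grape")
_, zebra_host = locate_region("orders", "zebra")
```

Note that nothing in this lookup asks the MasterServer for anything, which matches the earlier observation that the master is not in the data access path.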
Finally, we have one more important point to make about the HBase MasterServer. There
can and should be a backup MasterServer in any HBase cluster. (Refer to the HBase
Architecture diagram above.) Only one MasterServer can be active at
any given time, so the backup MasterServer exists for failover purposes. We may
recall that the MasterServer isn’t in the data access path for HBase clients.
However, we may also recall (from the list of functions) that the MasterServer
is responsible for actions such as RegionServer failover and load balancing.
The good news is that clients can continue to query the HBase
cluster if the master goes down. For normal cluster operations, however, the master
should not remain down for long.
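The trade-off described here can be illustrated with a small simulation. It is a sketch under stated assumptions, not a model of HBase internals: `ToyCluster` and all of its methods are hypothetical, the way a backup master gets chosen is left out, and reassignment simply moves regions to the first live server.

```python
from typing import Dict, List, Optional


class ToyCluster:
    """Hypothetical model of the roles described above; not the HBase API."""

    def __init__(self) -> None:
        self.masters: List[str] = ["master-a", "master-b"]
        self.active_master: Optional[str] = "master-a"
        # Region -> RegionServer currently hosting it.
        self.assignments: Dict[str, str] = {"r1": "rs-1", "r2": "rs-2"}
        self.live_servers = {"rs-1", "rs-2"}

    def read(self, region: str) -> str:
        # Reads go straight to the hosting RegionServer; no master involved.
        server = self.assignments[region]
        if server not in self.live_servers:
            raise ConnectionError(f"{region} is offline on {server}")
        return f"data from {region} via {server}"

    def fail_master(self) -> None:
        self.masters.remove(self.active_master)
        self.active_master = None

    def promote_backup(self) -> None:
        # The backup takes over and catches up on pending recovery work.
        self.active_master = self.masters[0]
        self._reassign_offline_regions()

    def fail_region_server(self, server: str) -> None:
        self.live_servers.discard(server)
        if self.active_master is not None:
            self._reassign_offline_regions()

    def _reassign_offline_regions(self) -> None:
        # Master-only duty: move regions off dead servers onto a live one.
        if not self.live_servers:
            return
        target = sorted(self.live_servers)[0]
        for region, server in self.assignments.items():
            if server not in self.live_servers:
                self.assignments[region] = target


cluster = ToyCluster()
cluster.fail_master()
still_readable = cluster.read("r1")   # works with no active master
cluster.fail_region_server("rs-2")    # no master is available to reassign r2
```

In this toy model, reads of `r1` keep working while the master is down, but `r2` stays offline after its RegionServer fails until `promote_backup()` is called and the new master reassigns it.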