Monday, February 2, 2015

OpenTSDB : Understanding OpenTSDB tables

I found most of the basic information about OpenTSDB on the internet. However, it took me some time to understand the table schema and how the UID mapping is done and used across tables.

As of OpenTSDB 1.x there are 4 tables:
hbase(main):001:0> list

Let's understand the two most important tables 'tsdb-uid' and 'tsdb'.

Consider a write operation using telnet

'put sys.mem 1234567890'

sys.mem  is the metric name
1234567890  is the timestamp (in seconds)
host is the tag1 name is the tag1 value
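
To make the shape of a put line concrete, here is a minimal sketch that formats one. The helper name format_put is hypothetical (it is not part of OpenTSDB), and the value and tags are borrowed from the sys.cpu.user example in the summary further down.

```python
def format_put(metric, timestamp, value, tags):
    """Build one line of the telnet-style 'put' command.

    format_put is a hypothetical helper for illustration only.
    tags is a dict of tag name -> tag value.
    """
    if not tags:
        # A data point needs at least one tag to be accepted.
        raise ValueError("at least one tag is required")
    tag_text = " ".join(f"{name}={val}" for name, val in tags.items())
    return f"put {metric} {timestamp} {value} {tag_text}"


line = format_put("sys.cpu.user", 1234567890, 42, {"host": "web01", "cpu": "0"})
print(line)  # put sys.cpu.user 1234567890 42 host=web01 cpu=0
```

The resulting line, terminated with a newline, is what would be typed into the telnet session connected to a tsd server.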

Understanding 'tsdb-uid'

The tsdb-uid table maintains the name-to-UID and UID-to-name mappings.
Let's consider the following UIDs mapped to each name:
sys.mem  => x00x00x01 
host => x00x00x01 => x00x00x01

The tsdb-uid shows the name to uid and the uid to name mapping.
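
The two-way mapping can be modelled with a pair of dictionaries. This is only a toy in-memory sketch, not how OpenTSDB stores UIDs in HBase: the class name UidTable and its method are made up for illustration. It assumes each kind of name (metric, tag name, tag value) has its own counter, which would explain why sys.mem and host can both map to x00x00x01.

```python
class UidTable:
    """Toy stand-in for the UID mapping of one kind of name (hypothetical)."""

    def __init__(self, width=3):
        self.width = width          # UIDs are 3 bytes wide in this post
        self.name_to_uid = {}       # forward mapping: name -> uid
        self.uid_to_name = {}       # reverse mapping: uid -> name

    def get_or_create(self, name):
        """Return the UID for name, assigning the next free one if needed."""
        if name in self.name_to_uid:
            return self.name_to_uid[name]
        uid = (len(self.name_to_uid) + 1).to_bytes(self.width, "big")
        self.name_to_uid[name] = uid
        self.uid_to_name[uid] = name
        return uid


# One table per kind of name, so the counters are independent (assumption).
metrics = UidTable()
tag_names = UidTable()

print(metrics.get_or_create("sys.mem"))  # b'\x00\x00\x01'
print(tag_names.get_or_create("host"))   # b'\x00\x00\x01'
print(metrics.uid_to_name[b"\x00\x00\x01"])  # sys.mem
```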

Understanding 'tsdb'

The tsdb table stores the data points. Its row key is built from the UIDs generated for each name, and each data point is stored under a column key (qualifier) in the column family 't'.

The rowkey is composed of
3 bytes : metric uid
4 bytes : timestamp, rounded down to the hour. Example: 12:32 is rounded down to 12:00.
3 bytes : tag1 name uid
3 bytes : tag1 value uid

row key =  metric uid + partial time + tag1 name uid + tag1 value uid
row key =  x00x00x01 + 12:00 + x00x00x01 + x00x00x01
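
The concatenation above can be sketched in a few lines. The function names are hypothetical, the UIDs are passed in as raw 3-byte values, and sorting the tag pairs by tag name UID is my assumption about how the key is kept stable regardless of the order the tags were written in.

```python
import struct


def base_time(timestamp):
    """Round a Unix timestamp (seconds) down to the start of its hour."""
    return timestamp - (timestamp % 3600)


def build_row_key(metric_uid, timestamp, tag_uids):
    """Concatenate metric uid + 4-byte base time + (tagk uid + tagv uid) pairs.

    tag_uids is a list of (tag_name_uid, tag_value_uid) byte pairs.
    """
    key = metric_uid + struct.pack(">I", base_time(timestamp))
    for tagk_uid, tagv_uid in sorted(tag_uids):  # assumed ordering
        key += tagk_uid + tagv_uid
    return key


key = build_row_key(b"\x00\x00\x01", 1234567890,
                    [(b"\x00\x00\x01", b"\x00\x00\x01")])
print(key.hex())  # 0000014995fb70000001000001
```

With one tag the key is 3 + 4 + 3 + 3 = 13 bytes long; every additional tag adds 6 bytes.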

The column key (qualifier) in the column family 't' holds the offset of the precise timestamp from the rounded time. Example: 12:32 is rounded down to 12:00, so the remaining 32 minutes (1920 seconds) are stored in the column key under 't'.
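
Here is a rough sketch of how that offset could be packed into the column key. It reflects my understanding of the 1.x layout: a 2-byte qualifier with the offset in seconds in the upper 12 bits and 4 flag bits below it (a float flag plus the value length minus one). Treat the exact bit layout as an assumption; the function names are made up.

```python
import struct


def encode_qualifier(timestamp, value_length=8, is_float=False):
    """Pack the seconds past the hour and the value flags into 2 bytes."""
    delta = timestamp % 3600             # 0..3599, fits in 12 bits
    flags = (value_length - 1) & 0x7     # lower 3 bits: value length - 1
    if is_float:
        flags |= 0x8                     # 4th bit: floating point value
    return struct.pack(">H", (delta << 4) | flags)


def decode_delta(qualifier):
    """Recover the seconds past the hour from a 2-byte qualifier."""
    (raw,) = struct.unpack(">H", qualifier)
    return raw >> 4


q = encode_qualifier(1234567890, value_length=1)
print(q.hex())          # 7620
print(decode_delta(q))  # 1890
```

Adding the decoded offset back to the base time stored in the row key gives the original timestamp of the data point.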

Let's see how this looks in the 'tsdb' table

To summarize:
Row key is a concatenation of UIDs and time:
  • metric + timestamp + tagk1 + tagv1… + tagkN + tagvN
  • Example: sys.cpu.user 1234567890 42 host=web01 cpu=0 has the row key x00x00x01x49x95xFBx70x00x00x01x00x00x01x00x00x02x00x00x02
  • Timestamp normalized on 1 hour boundaries
  • All data points for an hour are stored in one row
  • Enables fast scans of all time series for a metric
  • …or pass a row key regex filter for specific time series with particular tags 
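
To check the summary example, the row key can be split back into its parts. This is a sketch under the same assumptions as the rest of the post (3-byte UIDs, 4-byte base time); parse_row_key is a hypothetical helper, not an OpenTSDB API.

```python
import struct

UID_WIDTH = 3   # bytes per uid, as described above
TS_WIDTH = 4    # bytes for the base time


def parse_row_key(key):
    """Split a row key into (metric uid, base time, [(tagk uid, tagv uid)])."""
    metric_uid = key[:UID_WIDTH]
    (base,) = struct.unpack(">I", key[UID_WIDTH:UID_WIDTH + TS_WIDTH])
    rest = key[UID_WIDTH + TS_WIDTH:]
    pair = 2 * UID_WIDTH
    if len(rest) % pair != 0:
        raise ValueError("tag section is not a whole number of uid pairs")
    tags = [(rest[i:i + UID_WIDTH], rest[i + UID_WIDTH:i + pair])
            for i in range(0, len(rest), pair)]
    return metric_uid, base, tags


# The row key from the sys.cpu.user example, written as plain hex.
key = bytes.fromhex("0000014995FB70000001000001000002000002")
metric_uid, base, tags = parse_row_key(key)
print(metric_uid.hex())   # 000001
print(base)               # 1234566000
print(1234567890 - base)  # 1890
print(len(tags))          # 2
```

The base time 1234566000 is the start of the hour containing 1234567890, and the remaining 1890 seconds are what ends up in the column key.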

OpenTSDB Architecture

tcollector : A Python collection framework used to collect thousands of metrics from Linux 2.6, Apache's HTTPd, MySQL, HBase, memcached, Varnish and more. It also posts the data to the tsd servers.

asynchbase : An HBase client library used by tsd for all of its interaction with HBase, which it performs asynchronously.