推特 阿里云技术文档正文

TSQL10分钟快速入门_TSQL 开发指南_时序数据库 TSDB_时序时空数据库TSDB

admin 阿里云技术文档 2020-02-11 187 0
阿里云服务器优惠

TSQL10分钟快速入门

概要描述

我们现在以一个机房性能监控为应用场景,通过一个工具产生的样本数据,来展示如何TSQL来完成时序查询。

样本数据

这个样本数据使用了一个时序性能测试工具benchmark (https://github.com/influxdata/influxdb-comparisons)中的样本数据生成工具。在安装并编译这个工具后,你可以使用下面两步生成数据并加载到TSDB引擎中。

生成数据

  1. cd influxdb-comparisons/cmds
  2. bulk_data_gen/bulk_data_gen --seed=123 --use-case=devops --scale-var=10 --format=opentsdb --timestamp-start="2019-03-01T00:00:00Z" --timestamp-end="2019-03-01T00:10:00Z" > tsdb_devops_sf10_10m_seed123.json

下面显示了样本数据的抽样:

  1. {"metric":"redis.evicted_keys","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":2951}
  2. {"metric":"redis.keyspace_hits","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":2945}
  3. {"metric":"redis.keyspace_misses","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":2944}
  4. {"metric":"redis.instantaneous_ops_per_sec","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":65}
  5. {"metric":"redis.instantaneous_input_kbps","timestamp":1551398990000,"tags":{"arch":"x86","datacenter":"us-east-1b","hostname":"host_9","os":"Ubuntu16.10","port":"1470","rack":"7","region":"us-east-1","server":"redis_29176","service":"14","service_environment":"production","service_version":"0","team":"LON"},"value":58}

加载数据

  1. cat tsdb_devops_sf10_10m_seed123.json | bulk_load_opentsdb/bulk_load_opentsdb --urls=http://your_tsdb_host:port_num -workers=5

查询

  • 时间范围查询:查看一个metric在一个时间段内的所有的列,包括值,时间戳,以及对应的tag key的值
  1. select *
  2. from tsdb.`cpu.usage_system`
  3. where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
  • 时间范围查询:查询一个metric在一个时间段内的指定的列,包括值,时间戳,以及具体某几个tag key的值
  1. select `value`, `timestamp`, hostname, datacenter
  2. from tsdb.`cpu.usage_system`
  3. where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
  • 时间范围 + tagkey条件查询:查询一个metric在时间段内的值,时间戳, 并且hostname满足 IN-LIST的条件
  1. select `value`, `timestamp`, hostname, datacenter
  2. from tsdb.`cpu.usage_system`
  3. where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10' and
  4. hostname in ('host_0', 'host_2', 'host_4')
  • 查询结果按时间戳排序:查看一个metric在一个时间段内的值,时间戳,以及对应的tag key的值
  1. select *
  2. from tsdb.`cpu.usage_system`
  3. where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
  4. order by `timestamp`
  • 含数学计算表达式的值过滤条件查询: 查看一个metric在一个时间段内的值,时间戳,以及对应的tag key的值, 值满足其平方根> 1.5
  1. select *
  2. from tsdb.`cpu.usage_system`
  3. where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10' and
  4. sqrt(`value`) > 1.5
  • 分组聚合查询: 按照hostname, datacenter来分组,计算每个分组最大值,最小值,平均值
  1. select
  2. hostname,
  3. datacenter,
  4. max(`value`) as maxV,
  5. min(`value`) as minV,
  6. avg(`value`) as avgV
  7. from tsdb.`cpu.usage_system`
  8. where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:00:10'
  9. group by hostname, datacenter
  • 分组分时间段聚合查询,按照hostname, datacenter来分组,并且进一步按照2分钟的间隔分组,计算最大值,最小值,平均值
  1. select
  2. hostname,
  3. datacenter,
  4. tumble(`timestamp`, interval '2' minute) as ts,
  5. max(`value`) as maxV,
  6. min(`value`) as minV,
  7. avg(`value`) as avgV
  8. from tsdb.`cpu.usage_system`
  9. where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:10:00'
  10. group by hostname, datacenter, ts
  • 分组分时间段聚合查询, 计算聚合后的表达式的值max(value) - min(value) + 0.5* avg(value`)
  1. select
  2. hostname,
  3. datacenter,
  4. tumble(`timestamp`, interval '2' minute) as ts,
  5. max(`value`) - min(`value`) + 0.5* avg(`value`) as compV
  6. from tsdb.`cpu.usage_system`
  7. where `timestamp` between '2019-03-01 00:00:00' and '2019-03-01 00:10:00'
  8. group by hostname, datacenter, ts
  • 计算每台机器上,相邻两个时间戳上的记录值之间的差值

下面的例子使用window function lag(), lag()函数使用一个基于hostname做分组,时间戳排序的窗口frame, 返回在同一个窗口内当前记录的前一条记录的值,通过计算两者之差,获取每台主机上相邻时间戳的记录值上的差值。

  1. select hostname, `timestamp`, `value`,
  2. `value` - lag(`value`) over(partition by hostname order by `timestamp`) as diff
  3. from tsdb.`cpu.usage_system`
  4. where `timestamp` between '2019-03-01' and '2019-03-01 00:10:00'
  • 计算每台机器上,相邻两个时间戳上的记录值之间的差值, 如果差值超过异常值,则重设成0

下面的例子,把上面的查询放在一个子查询中,在子查询之外用了一个case表达式,来表达如果查询中的差值超过50.0 (认为是异常值), 则重设成0.0

  1. select hostname, `timestamp`, `value`,
  2. case when diff > 50.0 then 0.0
  3. else diff
  4. end
  5. from (
  6. select hostname, `timestamp`, `value`,
  7. `value` - lag(`value`) over(partition by hostname order by `timestamp`) as diff
  8. from tsdb.`cpu.usage_system`
  9. where `timestamp` between '2019-03-01' and '2019-03-01 00:10:00'
  10. );
  • 计算每台机器上一分钟内的记录的最大值,以及相邻分钟之间最大值的差值

下面的查询区别于之前差值计算,在于要获取每分钟的最大值的差值。我们可以在子查询中计算每分钟的最大值,在子查询外用窗口函数lag()计算获得相邻分钟之间最大值的差值。

  1. select hostname, ts, maxValue,
  2. maxValue - lag(maxValue) over(partition by hostname order by ts) as diff
  3. from (
  4. select hostname,
  5. tumble(`timestamp`, interval '1' minute) ts, max(`value`) maxValue
  6. from tsdb.`cpu.usage_system`
  7. where `timestamp` between '2019-03-01' and '2019-03-01 00:10:00'
  8. group by hostname, ts)
版权声明

本文仅代表作者观点,不代表本站立场。
本文系作者授权发表,未经许可,不得转载。

评论

-----