数据类型
TSDB 2.0中, 数据会根据数据类型, 组织为不同的格式进行存储. 数据类型在数据的插入,存储和查询过程中属于关键的特征. 合理的设置数据类型是正确使用TSDB数据库的关键.
TSDB 2.0 中, 数据类型的名称被视为关键词, 如果column的名称与数据类型重名, 需要对该名称进行转义.
类型分类
基础类型
基础类型描述具备原子性(atomic),不可再分割的基础元素.
Boolean
基础的布尔值, 接受true
与false
两个枚举值.
cr> create table my_bool_table (
... first_column boolean
... );
CREATE OK, 1 row affected (... sec)
Text
包含多个unicode字符的字符串
cr> create table my_table2 (
... first_column text
... );
CREATE OK, 1 row affected (... sec)
注意: text文本也支持 创建全文索引, 支持模糊搜索.
字符串的长度被限制为32766字节(编码为UTF-8). 但当字符串被全文索引或者列存(Column store)模式被禁用时除外.
Numeric
名称 | 大小 | 描述 | 范围 |
---|---|---|---|
smallint |
2 bytes | 小范围整数 | -32,768 - 32,767 |
integer |
4 bytes | 整数 | -2^31 - 2^31-1 |
bigint |
8 bytes | 大范围整数 | -2^63 - 2^63-1 |
real |
4 bytes | 可变精度 | 6位精度浮点数 |
double precision |
8 bytes | 可变精度 | 15位精度浮点数 |
real
与double precision
类型是变长存储,非精确的数值类型, 他们内部是尽量保持浮点数精度. 因此针对这些浮点数的存储,计算和查询均不保证其提供精确的浮点值精度. 因此, 诸如比较浮点数, 对浮点数做聚合(sum/avg)可能会产生非精确的结果.
特殊浮点数引用
TSDB 2.0支持对real
与double precision
类型的浮点值进行特殊值引用, 引用标准遵循 IEEE 754.支持的特殊浮点数应用: NaN
, Infinity
, -Infinity
(negative infinity) 以及 -0
(signed zero)
cr> SELECT 0.0 / 0.0, 1.0 / 0.0, 1.0 / -0.0;
+-------------+-------------+---------------+
| (0.0 / 0.0) | (1.0 / 0.0) | (1.0 / - 0.0) |
+-------------+-------------+---------------+
| NaN | Infinity | -Infinity |
+-------------+-------------+---------------+
SELECT 1 row in set (... sec)
这些特殊浮点值的引用也可以在数据插入的使用使用.
cr> create table my_table3 (
... first_column integer,
... second_column bigint,
... third_column smallint,
... fourth_column double precision,
... fifth_column real,
... sixth_column char
... );
CREATE OK, 1 row affected (... sec)
cr> INSERT INTO my_table3 (fourth_column, fifth_column)
... VALUES ('NaN', 'Infinity');
INSERT OK, 1 row affected (... sec)
IP
IP类型支持将IPv4或IPv6地址进行高效存储和查询. 在内部,IP地址会转化为bigint进行存储. 支持高效排序,过滤和聚合.
cr> create table my_table_ips (
... fqdn text,
... ip_addr ip
... );
CREATE OK, 1 row affected (... sec)
cr> insert into my_table_ips (fqdn, ip_addr)
... values ('localhost', '127.0.0.1'),
... ('router.local', '0:0:0:0:0:ffff:c0a8:64');
INSERT OK, 2 rows affected (... sec)
cr> insert into my_table_ips (fqdn, ip_addr)
... values ('localhost', 'not.a.real.ip');
SQLActionException[ColumnValidationException: Validation failed for ip_addr: Cannot cast 'not.a.real.ip' to type ip]
Ip 地址类型支持二进制操作符 <<
来进行 CIDR 子网标记. 语法为 [ip address/prefix_length]. e.g. ‘192.168.1.5’ << ‘192.168.1/24’.
日期/时间
名称 | 大小 | 描述 | 范围 |
---|---|---|---|
timestamp with time zone |
8 bytes | 带时区的时间戳 | 292275054BC - 292278993AD |
timestamp without time zone |
8 bytes | 不带时区的时间戳 | 292275054BC - 292278993AD |
时间戳标记
时间戳由 日期和时间共同组成, 并且可选的带上一个时区信息. 在内部, 时间戳存储为从 1970-01-01T00:00:00Z
至今的UTC 毫秒计数器(bigint). 因此, 时间戳可以合法的与bigint
进行转换.
时间戳的标记语法:
date-element [time-separator [time-element [offset]]]
time-separator: 'T' | ' '
date-element: yyyy-MM-dd
time-element: HH:mm:ss [fraction]
fraction: '.' digit+
offset: {+ | -} HH [:mm] | 'Z'
注意, 当写入的 timestamp 小于
-999999999999999
(-29719-04-05T22:13:20.001Z) 或者大于999999999999999
(33658-09-27T01:46:39.999Z) 时, 将会出现溢出问题.由于内部的时间解析存在限制, 不是所有bigint的数值范围均受到支持, 目前支持的合理解析范围为 (
292275054BC
-292278993AD
).
带时区的时间戳标记
带时间信息的时间戳将会被转换为时间偏移后的UTC时间戳存储.
cr> select '1970-01-02T00:00:00+0100'::timestamp with time zone as ts_z,
... '1970-01-02T00:00:00Z'::timestamp with time zone ts_z,
... '1970-01-02T00:00:00'::timestamp with time zone ts_z,
... '1970-01-02 00:00:00'::timestamp with time zone ts_z_sql_format;
+----------+----------+----------+-----------------+
| ts_z | ts_z | ts_z | ts_z_sql_format |
+----------+----------+----------+-----------------+
| 82800000 | 86400000 | 86400000 | 86400000 |
+----------+----------+----------+-----------------+
SELECT 1 row in set (... sec)
不带时区信息的时间戳标记
cr> select '1970-01-02T00:00:00+0200'::timestamp without time zone as ts,
... '1970-01-02T00:00:00+0400'::timestamp without time zone as ts,
... '1970-01-02T00:00:00Z'::timestamp without time zone as ts,
... '1970-01-02 00:00:00Z'::timestamp without time zone as ts_sql_format;
+----------+----------+----------+---------------+
| ts | ts | ts | ts_sql_format |
+----------+----------+----------+---------------+
| 86400000 | 86400000 | 86400000 | 86400000 |
+----------+----------+----------+---------------+
SELECT 1 row in set (... sec)
Interval
时间间隔标记时间间隔标记标识一段时间范围, 例如 年-月.
<interval_literal> ::=
INTERVAL [ <sign> ] <string_literal> <interval_qualifier>
<interval_qualifier> ::=
<start_field> [ TO <end_field>]
<start_field> ::= <datetime_field>
<end_field> ::= <datetime_field>
<datetime_field> ::=
YEAR
| MONTH
| DAY
| HOUR
| MINUTE
| SECOND
year-month
year-month
包含年/月信息.
<year_month_literal> ::=
INTERVAL [ {+ | -} ]'yy' <interval_qualifier> |
INTERVAL [ {+ | -} ]'[ yy- ] mm' <interval_qualifier>
cr> select INTERVAL '01-02' YEAR TO MONTH;
+--------------------------------+
| INTERVAL '01-02' YEAR TO MONTH |
+--------------------------------+
| 1 year 2 mons 00:00:00 |
+--------------------------------+
SELECT 1 row in set (... sec)
day-time
<day_time_literal> ::=
INTERVAL [ {+ | -} ]'dd [ <space> hh [ :mm [ :ss ]]]' <interval_qualifier>
INTERVAL [ {+ | -} ]'hh [ :mm [ :ss [ .nn ]]]' <interval_qualifier>
INTERVAL [ {+ | -} ]'mm [ :ss [ .nn ]]' <interval_qualifier>
INTERVAL [ {+ | -} ]'ss [ .nn ]' <interval_qualifier>
cr> select INTERVAL '10 23:10' DAY TO MINUTE;
+-----------------------------------+
| INTERVAL '10 23:10' DAY TO MINUTE |
+-----------------------------------+
| 1 weeks 3 days 23:10:00 |
+-----------------------------------+
SELECT 1 row in set (... sec)
字符串标记
cr> select INTERVAL '1-2 3 4:5:6';
+---------------------------------+
| CAST('1-2 3 4:5:6' AS interval) |
+---------------------------------+
| 1 year 2 mons 3 days 04:05:06 |
+---------------------------------+
SELECT 1 row in set (... sec)
ISO-8601 标记
cr> select INTERVAL 'P1Y2M3DT4H5M6S';
+------------------------------------+
| CAST('P1Y2M3DT4H5M6S' AS interval) |
+------------------------------------+
| 1 year 2 mons 3 days 04:05:06 |
+------------------------------------+
SELECT 1 row in set (... sec)
PostgreSQL 标记
cr> select INTERVAL '1 year 2 months 3 days 4 hours 5 minutes 6 seconds';
+------------------------------------------------------------------------+
| CAST('1 year 2 months 3 days 4 hours 5 minutes 6 seconds' AS interval) |
+------------------------------------------------------------------------+
| 1 year 2 mons 3 days 04:05:06 |
+------------------------------------------------------------------------+
SELECT 1 row in set (... sec)
时间计算
时间戳支持简单的加减计算.
Operand | Operator | Operand |
---|---|---|
timestamp |
- |
timestamp |
interval |
+ |
timestamp |
timestamp |
+ , - |
interval |
interval |
+ , - |
interval |
空间类型
空间形状(geo_shape)
空间形状类型存储地理图形, 标识为GeoJSON对象.一个空间形状列(coumn)支持存储不同类型的GeoJSON地理图形对象.(例如, 同时存储LineString或MultiPolygon图形)
类型定义
<columnName> geo_shape
为了快速查询空间地理信息, 需要同时指定空间索引.
<columnName> geo_shape INDEX USING geohash WITH (precision='50m', distance_error_pct=0.025)
常见的空间索引结构为 geohash
(默认) 和 quadtree
.
复杂类型
对象(object)
对象类型允许定义复杂的嵌套结构.每个对象可以包含一个或多个任意类型的字段(field). 对象自身的定义可以是动态类型(schemaless)或者强制地理对象结构(schema). 对象本身也可以定义为JSON二进制块.
定义语法
<columnName> OBJECT [ ({DYNAMIC|STRICT|IGNORED}) ] [ AS ( <columnDefinition>* ) ]
cr> create table my_table11 (
... title text,
... col1 object,
... col3 object(strict) as (
... age integer,
... name text,
... col31 object as (
... birthday timestamp with time zone
... )
... )
... );
CREATE OK, 1 row affected (... sec)
Strict
每个field可以声明为强制模式(strict), 这意味着新写入的对象必须要包含对象声明式标记为strict的field.
cr> create table my_table12 (
... title text,
... author object(strict) as (
... name text,
... birthday timestamp with time zone
... )
... );
CREATE OK, 1 row affected (... sec)
Dynamic
另外一个field选项为动态标记(dynamic),这意味着该字段可以在生命周期内动态添加字段信息.
cr> create table my_table14 (
... title text,
... author object(dynamic) as (
... name text,
... birthday timestamp with time zone
... )
... );
CREATE OK, 1 row affected (... sec)
对象标记
{ [ ident = expr [ , ... ] ] }
空对象
{}
包含Boolean的对象
{ my_bool_column = true }
包含文本的对象
{ my_str_col = 'this is a text value' }
包含数值的对象
{ my_int_col = 1234, my_float_col = 5.6 }
包含集合的对象
{ my_array_column = ['v', 'a', 'l', 'u', 'e'] }
驼峰标记必须引用
{ "CamelCaseColumn" = 'this is a text value' }
包含嵌套的对象
{ nested_obj_colmn = { int_col = 1234, str_col = 'text value' } }
包含占位符的对象
{ my_other_column = ? }
组合示例:
{ id = 1, name = 'foo', tags = ['apple', 'banana', 'pear'], size = 3.1415, valid = ? }
数组
[]
[null]
[1, 2, 3, 4, 5, 6, 7, 8]
['Zaphod', 'Ford', 'Arthur']
[?]
ARRAY[true, false]
ARRAY[column_a, column_b]
ARRAY[ARRAY[1, 2, 1 + 2], ARRAY[3, 4, 3 + 4]]
类型转换
可以使用 cast
关键词完成类型之间的转换.
cast(expression as type)
expression::type
cr> select cast(port['http'] as boolean) from sys.nodes limit 1;
+-------------------------------+
| CAST(port['http'] AS boolean) |
+-------------------------------+
| TRUE |
+-------------------------------+
SELECT 1 row in set (... sec)
cr> select (2+10)/2::text;
+------------------------------+
| ((2 + 10) / CAST(2 AS text)) |
+------------------------------+
| 6 |
+------------------------------+
SELECT 1 row in set (... sec)
同时, 也支持将array转换为其他类型:
cr> select cast([0,1,5] as array(boolean)) as
... active_threads from sys.nodes limit 1;
+---------------------+
| active_threads |
+---------------------+
| [false, true, true] |
+---------------------+
SELECT 1 row in set (... sec)
此外, 用户可以用 try_cast
关键词来进行类型转换, try_cast
在转换失败时返回 null
try_cast(expression as type)
cr> select try_cast('true' as boolean) from sys.nodes limit 1;
+-----------------------------+
| TRY_CAST('true' AS boolean) |
+-----------------------------+
| TRUE |
+-----------------------------+
SELECT 1 row in set (... sec)
cr> select try_cast(name as integer) from sys.nodes limit 1;
+---------------------------+
| TRY_CAST(name AS integer) |
+---------------------------+
| NULL |
+---------------------------+
SELECT 1 row in set (... sec)
类型别称
为了提供与PostgreSQL的类型兼容性, TSDB 2.0的类型也支持别名.
别名 | 真实类型 |
---|---|
int2 | smallint |
short | smallint |
int | integer |
int4 | integer |
int8 | bigint |
long | bigint |
string | text |
name | text |
byte | char |
float | real |
double | double precision |
timestamp | timestamp with time zone |
timestamptz | timestamp with time zone |
版权声明
本文仅代表作者观点,不代表本站立场。
本文系作者授权发表,未经许可,不得转载。
评论