子查询
与 MySQL 子查询的差异
在 MySQL 支持的子查询基础上,增加了以下限制,使用时需要注意。
- 支持 WHERE 条件中的子查询以及列子查询,不支持HAVING 子句中的子查询,JOIN ON 条件中的子查询;
- 等号操作行符的标量子查询(The Subquery as Scalar Operand)不支持 ROW 语法。例如:
支持:select * from tb1 where id in (select id from tb2)
支持:select * from tb1 where (id, name) in (select id, name from tb2)
支持:select * from tb1 where row(id, name) in (select id, name from tb2)
支持:select * from tb1 where row(id, name) not in (select id, name from tb2)
不支持:select * from tb1 where row(id, name) = (select id, name from tb2)
- 目前仅支持 SELECT 语句中的子查询。 DELETE 语句中的子查询,目前不支持。例如:
DELETE FROM t1 WHERE ROW(c1,c2) IN (
SELECT c1, c2 FROM t2
);
其它 INSERT, UPDATE, SET 中的子查询皆不支持。此处不再举出示例。
效率
DRDS 子查询部分只能转化为比较低效的 APPLY 执行器执行,在实际使用中请尽量避免以下类型的子查询。
WHERE 条件中 OR 与子查询共存时,执行效率会依外表数据情况大幅降低。例如:
高效:select * from tb1 where id in (select id from tb2)
高效:select * from tb1 where id in (select id from tb2) and id>3
低效:select * from tb1 where id in (select id from tb2) or id>3
关联子查询(Correlated Subqueries)的关联项中带函数或非等号运算符。例如:
高效:select * from tb1 a where id in
(select id from tb2 b where a.name=b.name)
低效:select * from tb1 a where id in
(select id from tb2 b where UPPER(a.name)=b.name)
低效:select * from tb1 a where id in
(select id from tb2 b where a.decimal_test=abs(b.decimal_test))
低效:select * from tb1 a where id in
(select id from tb2 b where a.name!=b.name)
低效:select * from tb1 a where id in
(select id from tb2 b where a.name>=b.name)
关联子查询(Correlated Subqueries)关联项与其它条件的逻辑运算符为 OR。例如:
高效:select * from tb1 a where id in
(select id from tb2 b where a.name=b.name
and b.date_test<'2015-12-02')
低效:select * from tb1 a where id in
(select id from tb2 b where a.name=b.name
or b.date_test<'2015-12-02')
低效:select * from tb1 a where id in
(select id from tb2 b where a.name=b.name
or b.date_test=a.date_test)
标量子查询(The Subquery as Scalar Operand)带关联项。例如:
高效:select * from tb1 a where id >
(select id from tb2 b where b.date_test<'2015-12-02')
低效:select * from tb1 a where id >
(select id from tb2 b where a.name=b.name
and b.date_test<'2015-12-02')
跨嵌套层的子查询关联项。例如:
SQL 多层嵌套,但每层子查询关联项仅与次级上层关联,此类高效。
高效:select * from tb1 a where id in(select id from tb2 b
where a.name=b.name and
exists (select name from tb3 c where b.address=c.address))
SQL 多层嵌套,但是表 c 的子查询关联项中与表 a 的列进行了关联,此类低效。
低效:select * from tb1 a where id in(select id from tb2 b
where a.name=b.name and
exists (select name from tb3 c where a.address=c.address))
子查询中包含 GROUP BY,请确保 GROUP BY 的分组列包含关联项。例如:
SQL 子查询中包含聚合函数和关联项,关联项 b.pk 被包含中分组列(pk)中,此类 SQL 高效。
高效:select * from tb1 a where exists
(select pk from tb2 b
where a.pk=b.pk and b.date_test='2003-04-05'
group by pk);
SQL 子查询中包含聚合函数和关联项,关联项 b.date_test 不被包含中分组列(pk)中,此类 SQL 低效。
低效:select * from tb1 a where exists
(select pk from tb2 b
where a.date_test=b.date_test and b.date_test='2003-04-05'
group by pk);
目前支持的子查询类别
Comparisons Using Subqueries
最常见的子查询类别,语法为:
non_subquery_operand comparison_operator (subquery)
comparison_operator : = > < >= <= <> != <=> like
例如:
... WHERE 'a' = (SELECT column1 FROM t1)
注意:目前仅支持子查询在比较运算符的右边。
Subqueries with ANY, ALL, IN/NOT IN, EXISTS/NOT EXISTS
语法:
operand comparison_operator ANY (subquery)
operand comparison_operator ALL (subquery)
operand IN (subquery)
operand NOT IN (subquery)
operand EXISTS (subquery)
operand NOT EXISTS (subquery)
comparison_operator:= > < >= <= <> !=
ANY : 如果子查询返回的任意一行满足 ANY 前的表达式,返回 TRUE,否则返回 FALSE。ALL:如果子查询返回所有行都满足 ALL 前的表达式,返回 TRUE,否则返回 FALSE。IN : IN 在子查询前使用时,与 =ANY 是等价的。例如:
SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 IN (SELECT s1 FROM t2);
NOT IN:NOT IN 在子查询前使用时,与 <>ALL 是等价的。例如:
SELECT s1 FROM t1 WHERE s1 <> ALL (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 NOT IN (SELECT s1 FROM t2);
EXISTS/NOT EXISTS:如果子查询返回任意行,EXISTS 子查询结果为 TRUE,NOT EXISTS 子查询结果为 FALSE。如果返回为空,则结果反过来。例如:
SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);
如果子查询中包含任意行,即使只包含 NULL 的行值,WHERE 条件也会返回 TRUE。
Row Subqueries
Row Subqueries 支持以下表达式:
= > < >= <= <> != <=>
例如:
SELECT * FROM t1
WHERE (col1,col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);
SELECT * FROM t1
WHERE ROW(col1,col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);
以上两个 SQL 是等价的。只有满足以下条件时,t1 表的数据行才会返回:
- 子查询 (SELECT col3, col4 FROM t2 WHERE id = 10 ) 返回的仅有一行记录。返回多行会报错。
- 子查询返回的 col3, col4 与主表的 col1, col2 相等。
Correlated Subqueries
Correlated Subqueries 是指子查询中包含对外层查询表的引用。例如:
SELECT * FROM t1
WHERE column1 = ANY (SELECT column1 FROM t2
WHERE t2.column2 = t1.column2);
示例 SQL 子查询中并没有包含 t1 表及其列名 column2,此时会向上一层寻找 t1 表的引用。
注意:将 Correlated Subqueries 改写为 JOIN 有可能会提高其性能。
Derived Tables (Subqueries in the FROM Clause)
Derived Tables 是指在 FROM 子句中的子查询:
SELECT ... FROM (subquery) [AS] tbl_name ...
例如:
CREATE TABLE t1 (s1 INT, s2 CHAR(5), s3 FLOAT);
INSERT INTO t1 VALUES (1,'1',1.0);
INSERT INTO t1 VALUES (2,'2',2.0);
SELECT sb1,sb2,sb3
FROM (SELECT s1 AS sb1, s2 AS sb2, s3*2 AS sb3 FROM t1) AS sb
WHERE sb1 > 1;
查询的结果为:2, '2', 4
从场景举例:假如现在需要分组数据 SUM 后的平均值,直接使用以下 SQL 无法得到想要的结果。
SELECT AVG(SUM(column1)) FROM t1 GROUP BY column1;
此时可使用 Derived Tables 拿到需要的信息:
SELECT AVG(sum_column1)
FROM (SELECT SUM(column1) AS sum_column1
FROM t1 GROUP BY column1) AS t1;
注意:
- Derived Tables 必须拥有一个别名。
- Derived Tables 可以返回一个标量,列,行或表。
- Derived Tables 不可以成为 Correlated Subqueries,即不能包含子查询外部表的引用。
版权声明
本文仅代表作者观点,不代表本站立场。
本文系作者授权发表,未经许可,不得转载。
评论