Lec2-SQL

本文最后更新于:2022年5月29日 凌晨

SQL 语法基础

[toc]

Lecture #02: Intermediate SQL

1 Relational Languages

关系代数基于set(无序,没有重复),SQL基于bag(无序,有重复)。

Relational algebra is based on sets (unordered, no duplicates). SQL is based on bags (unordered, allows duplicates).

2 SQL History

用于关系型数据库的声明式查询语言,最初是在 20 世纪 70 年代作为 IBM System R 项目的一部分开发的。IBM 最初称为 “SEQUEL”(Structured English Query Language 结构化英文查询语言)。在20世纪80年代改名为“SQL”(Structured Query Language 结构化查询语言)。

这个语言由不同种类的命令组成:

  1. Data Manipulation Language (DML): SELECT, INSERT, UPDATE, and DELETE statements.
  2. Data Definition Language (DDL): Schema definitions for tables, indexes, views, and other objects.
  3. Data Control Language (DCL): Security, access controls.

DML数据操纵语言, DDL数据定义语言, DCL数据控制语言

3 Aggregates(聚合)

聚合函数接受一批元组作为输入并产生单个标量作为输出。聚合函数只能用于SELECT的输出列。

An aggregation function takes in a bag of tuples as its input and then produces a single scalar value as its output. Aggregate functions can only be used in SELECT output list.

COUNT, SUM, AVG, MAX, MIN

1
2
3
SELECT COUNT(*) FROM student WHERE login LIKE '%@cs';
SELECT COUNT(login) FROM student WHERE login LIKE '%@cs';
SELECT COUNT(1) FROM student WHERE login LIKE '%@cs';

这三条 sql 语句等价

可以使用多个聚合函数在一个SELECT语句块中:

1
2
SELECT AVG(gpa), COUNT(sid)
FROM student WHERE login LIKE '%@cs';

一些聚合函数支持DISTINCT关键字:

1
2
SELECT COUNT(DISTINCT login)
FROM student WHERE login LIKE '%@cs';

同时输出聚合函数列和非聚合函数列是未定义行为(e.cid在这里是未定义行为)

注:未定义行为指该行为标准中未定义,不同数据库可能会返回不同的结果(甚至可能会报错)

1
2
3
SELECT AVG(s.gpa), e.cid
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid;

因此,其他非聚合函数的列必须使用GROUP BY命令进行聚合:

1
2
3
4
SELECT AVG(s.gpa), e.cid
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid
GROUP BY e.cid;

HAVING:聚合后 进行过滤,像WHERE子句中使用GROUP BY过滤这样

1
2
3
4
5
SELECT AVG(s.gpa) AS avg_gpa, e.cid
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid
GROUP BY e.cid
HAVING avg_gpa > 3.9;

4 String Operations

SQL 标准规定字符串大小写敏感只能使用单引号

LIKE 关键字中, % 匹配任意字符(包括空), _ 匹配任意一个字符

|| 连接两个字符串

The SQL standard says that strings are case sensitive and single-quotes only. There are functions to manipulate strings that can be used in any part of a query.

Pattern Matching: The LIKE keyword is used for string matching in predicates.

• “%” matches any substrings (including empty).

• “_” matches any one character.

Concatenation: Two vertical bars (“||”) will concatenate two or more strings together into a single string.

5 Output Redirection

除了可以直接给用户返回结果,你还可以告诉数据库存储结果到另外一张表,然后你可以在后续查询中访问这些数据。

  • New Table, 存储查询的输出到新表中。
1
SELECT DISTINCT cid INTO CourseIds FROM enrolled;
  • Existing Table:存储查询的输入到已经存在表中。目标表必须有相同的列数和相同的类型,列名可以和输出列不匹配。
1
INSERT INTO CourseIds (SELECT DISTINCT cid FROM enrolled);

6 Output Control

由于SQL是无序的,你必须使用ORDER BY子句去对结果进行排序:

1
2
SELECT sid FROM enrolled WHERE cid = '15-721'
ORDER BY grade DESC;

你可以使用多个ORDER BY子句做更复杂的排序:

1
2
SELECT sid FROM enrolled WHERE cid = '15-721'
ORDER BY grade DESC, sid ASC;

你可以使用任意的表达式在ORDER BY子句中:

1
2
SELECT sid FROM enrolled WHERE cid = '15-721'
ORDER BY UPPER(grade) DESC, sid + 1 ASC;

通常,数据库会返回所有符合条件的元组。你可以使用LIMIT子句去限制返回元组的数量:

1
2
SELECT sid, name FROM student WHERE login LIKE '%@cs'
LIMIT 10;

同时也可以提供offset去获得一个结果的区间:

1
2
SELECT sid, name FROM student WHERE login LIKE '%@cs'
LIMIT 10 OFFSET 20;

除非你使用一个ORDER BY子句搭配LIMIT,否则数据库可能在每次查询会返回不同的元组,因为关系模型不在意顺序。

7 Nested Queries

Invoke queries inside of other queries to execute more complex logic within a single query. The scope of outer query is included in inner query (i.e. inner query can access attributes from outer query), but not the other way around.

内部查询可以出现在查询的几乎所有地方:

  1. SELECT输出目标:
1
SELECT (SELECT 1) AS one FROM student;
  1. FROM子句:
1
2
3
SELECT name
FROM student AS s, (SELECT sid FROM enrolled) AS e
WHERE s.sid = e.sid;
  1. WHERE子句:
1
2
SELECT name FROM student
WHERE sid IN ( SELECT sid FROM enrolled );

例子:获取所有报名“15-445”的学生名字

1
2
SELECT name FROM student
WHERE sid IN ( SELECT sid FROM enrolled WHERE cid = '15-445' );

注意,根据sid出现的不同位置,它拥有不同的作用域。

Nest Query Results Expressions(嵌套查询表达式):

  • ALL:必须满足子查询中所有的行
  • ANY:必须满足子查询中至少1行
  • IN:等价于ANY()
  • EXISTS:至少1行被返回

8 Window Functions

跨元组执行“移动”计算。和聚合一样,但它依然返回原始元组。

Performs “moving” calculation across set of tuples. Like an aggregation but it still returns the original tuples.

Functions(函数):窗口函数可以是上面讨论的任意的聚合函数。同时还有一些特殊的窗口函数:

  1. ROW_NUMBER:当前行的number
  2. RANK:当前行的排序

Grouping(分组):OVER子句指定当计算窗口函数的时候如何分组。使用PARTITION BY去指定分组。

1
2
SELECT cid, sid, ROW_NUMBER() OVER (PARTITION BY cid)
FROM enrolled ORDER BY cid;

你也可以在OVER中使用ORDER BY去确定结果的顺序,即使数据库内部存储结构发生改变也不会影响结果。

1
2
SELECT *, ROW_NUMBER() OVER (ORDER BY cid)
FROM enrolled ORDER BY cid;

重点:RANK函数在窗口函数排序后计算,ROW_NUBMER函数在排序前计算。

9 Common Table Expressions

公用表表达式(CTE)是窗口函数和嵌套查询的替代方法去编写更加复杂的查询。可以认为CTE是在单次查询的一个临时表。

WITH子句将内部查询的输出绑定到临时的结果。

例子:生成一个 CTE 叫 cteName 包含 1 个 tuple 的 1 个属性,属性值为 1 。这个查询返回了 cteName 的所有属性及值。

1
2
3
4
WITH cteName AS (
SELECT 1
)
SELECT * FROM cteName;

你可以绑定输出列的名字在AS之前:

1
2
3
4
WITH cteName (col1, col2) AS (
SELECT 1, 2
)
SELECT col1 + col2 FROM cteName;

单个查询可以包括多个CTE的定义:

1
2
3
4
5
6
7
WITH cte1 (col1) AS (
SELECT 1
),
cte2 (col2) AS (
SELECT 2
)
SELECT * FROM cte1, cte2;

在 WITH 后添加 RECURSIVE 关键字允许 CTE 引用自己。

例子:打印1到10的序列

1
2
3
4
5
6
7
WITH RECURSIVE cteSource (counter) AS (
(SELECT 1)
UNION
(SELECT counter + 1 FROM cteSource
WHERE counter < 10)
)
SELECT * FROM cteSource;

本博客所有文章除特别声明外,均采用 CC BY-SA 4.0 协议 ,转载请注明出处!