Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- {
- "cells": [
- {
- "cell_type": "markdown",
- "id": "779c90b0",
- "metadata": {},
- "source": [
- "# 利用公共表表达式(CTE)及自连接查找重复元素"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "3c7e4a21",
- "metadata": {},
- "source": [
- "> 不同数据库实现略有不同,这里仅以 Oracle 及 PostgreSQL 为例"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "b2d31279",
- "metadata": {},
- "source": [
- "## Oracle"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c7a42d1f",
- "metadata": {},
- "source": [
- "```sql\n",
- "DROP TABLE T_CONTACTS;\n",
- "\n",
- "CREATE TABLE T_CONTACTS(\n",
- " EMAIL VARCHAR2(255)\n",
- ");\n",
- "\n",
- "INSERT ALL\n",
- " -- x1\n",
- " INTO T_CONTACTS VALUES('uncautiously@phenocryst.org')\n",
- " INTO T_CONTACTS VALUES('innocuous@grosz.net')\n",
- " -- dup x2\n",
- " INTO T_CONTACTS VALUES('hopvine@dilligrout.edu')\n",
- " INTO T_CONTACTS VALUES('hopvine@dilligrout.edu')\n",
- " -- dup x3\n",
- " INTO T_CONTACTS VALUES('chamaesiphon@provisor.org')\n",
- " INTO T_CONTACTS VALUES('chamaesiphon@provisor.org')\n",
- " INTO T_CONTACTS VALUES('chamaesiphon@provisor.org')\n",
- " -- dup x4\n",
- " INTO T_CONTACTS VALUES('miltonian@soarability.edu')\n",
- " INTO T_CONTACTS VALUES('miltonian@soarability.edu')\n",
- " INTO T_CONTACTS VALUES('miltonian@soarability.edu')\n",
- " INTO T_CONTACTS VALUES('miltonian@soarability.edu')\n",
- "SELECT 1 FROM DUAL;\n",
- "\n",
- "WITH Y AS (SELECT EMAIL, SYS_GUID() RN FROM T_CONTACTS),\n",
- " T AS (SELECT * FROM Y)\n",
- " SELECT DISTINCT T1.EMAIL\n",
- " FROM T T1\n",
- "INNER JOIN T T2\n",
- " ON T1.EMAIL = T2.EMAIL\n",
- " AND T1.RN <> T2.RN;\n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "f53269d2",
- "metadata": {},
- "source": [
- "注意最后的查询中临时表 `Y` 增加了一列字段 `rn`,但在 Oracle 中 CTE 在后续的查询中引用中会重复计算(暂不清楚是否与易失性函数如 `SYS_GUID` 或 `DBMS_RANDOM.VALUE` 有关),所以加了一个临时表 `T` 用于固化 `Y` 的结果,最后自关联采用 `EMAIL` 相等且存在 `RN` 不等判断出重复值"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "688a7072",
- "metadata": {},
- "source": [
- "---"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "1055780b",
- "metadata": {},
- "source": [
- "## PostgreSQL"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "b5818644",
- "metadata": {},
- "source": [
- "```sql\n",
- "DROP TABLE IF EXISTS DATALAB.I_CONTACTS;\n",
- "\n",
- "CREATE TABLE DATALAB.I_CONTACTS(\n",
- " EMAIL VARCHAR(255)\n",
- ");\n",
- "\n",
- "INSERT INTO DATALAB.I_CONTACTS VALUES\n",
- " ('uncautiously@phenocryst.org'),\n",
- " ('innocuous@grosz.net'),\n",
- " ('hopvine@dilligrout.edu'),\n",
- " ('hopvine@dilligrout.edu'),\n",
- " ('chamaesiphon@provisor.org'),\n",
- " ('chamaesiphon@provisor.org'),\n",
- " ('chamaesiphon@provisor.org'),\n",
- " ('miltonian@soarability.edu'),\n",
- " ('miltonian@soarability.edu'),\n",
- " ('miltonian@soarability.edu'),\n",
- " ('miltonian@soarability.edu');\n",
- "\n",
- "WITH T AS (SELECT EMAIL, RANDOM() RN FROM DATALAB.I_CONTACTS)\n",
- " SELECT DISTINCT T1.EMAIL\n",
- " FROM T T1\n",
- "INNER JOIN T T2\n",
- " ON T1.EMAIL = T2.EMAIL\n",
- " AND T1.RN <> T2.RN;\n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "ca5a4705",
- "metadata": {},
- "source": [
- "注意 PostgreSQL 就没有 Oracle 中的问题了,可能与 CTE 的执行与实现原理有关"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "d678657a",
- "metadata": {},
- "source": [
- "---"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "12db6b55",
- "metadata": {},
- "source": [
- "以上结果均为:\n",
- "\n",
- "| Email |\n",
- "| ----- |\n",
- "| hopvine@dilligrout.edu |\n",
- "| chamaesiphon@provisor.org |\n",
- "| miltonian@soarability.edu |\n",
- "\n",
- "(顺序可能有不同)"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.9.16"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
- }
Add Comment
Please, Sign In to add comment