SQL, which stands for Structured Query Language

SQL (Structured Query Language) is the standard language for managing and manipulating relational databases. Virtually all relational database management systems (RDBMS) use SQL as their query language. Popular database systems that use SQL include:

  1. MySQL: MySQL is one of the most popular open-source relational database management systems. It is used by many web applications and is known for its speed and reliability. SQL is used for querying and managing MySQL databases.

  2. Oracle Database: Oracle Database is a robust and widely used commercial RDBMS. SQL is the primary language used for interacting with Oracle databases. Oracle also provides its own extensions to the standard SQL language.

  3. Microsoft SQL Server: Microsoft SQL Server is a comprehensive, enterprise-class database management system. T-SQL (Transact-SQL) is the extension of SQL used in Microsoft SQL Server, incorporating additional features specific to the Microsoft ecosystem.

  4. PostgreSQL: PostgreSQL is a powerful open-source object-relational database system. It supports advanced data types and offers many features found in commercial databases. SQL is the language used for querying and managing PostgreSQL databases.

  5. SQLite: SQLite is a lightweight, file-based relational database engine. It is embedded in many applications, especially in mobile devices and desktop software. SQL is used to interact with SQLite databases.

  6. IBM Db2: IBM Db2 is a family of data management products, including database servers, developed by IBM. SQL is the standard language used to work with Db2 databases.

  7. MariaDB: MariaDB is a community-developed fork of MySQL, created to keep the project open source. It remains largely compatible with MySQL, although the two have diverged in some features, and it uses SQL for database operations.

These are just a few examples; many other RDBMS products use SQL as their query language. SQL provides a standardized way to interact with and manage data in relational databases, although each product adds its own extensions, so some statements differ between systems.

In the context of data analysis and data science, SQL plays a crucial role in handling structured data, enabling professionals to interact with databases, extract relevant information, and perform complex operations efficiently. Here’s an introduction to SQL for data analysis and data science:

**1. Data Retrieval:**
SQL allows users to retrieve specific data from databases using queries. With SELECT statements, users can specify the columns they want to retrieve and apply filtering conditions to fetch only the necessary data.

SQL Syntax:

SELECT column1, column2
FROM table_name
WHERE condition;
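
As a minimal sketch of the SELECT pattern above, the following uses Python's built-in sqlite3 module with an in-memory database. The `products` table, its columns, and its rows are hypothetical and exist only for illustration; other database systems would run the same SELECT through their own client libraries.

```python
import sqlite3

# In-memory SQLite database; the table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Pen", "office", 1.5), ("Desk", "furniture", 120.0), ("Lamp", "furniture", 35.0)],
)

# SELECT names the columns to return; WHERE keeps only the matching rows.
rows = conn.execute(
    "SELECT name, price FROM products WHERE category = ?", ("furniture",)
).fetchall()
print(rows)
conn.close()
```

The `?` placeholder passes the filter value separately from the SQL text, which avoids building queries by string concatenation.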

**2. Data Filtering and Sorting:** SQL enables users to filter data based on specific criteria and sort the results in ascending or descending order. This functionality is crucial for narrowing down datasets and arranging the output in a meaningful way.

SQL Syntax:

SELECT column1, column2
FROM table_name
WHERE condition
ORDER BY column_name ASC|DESC;
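
A short sketch of filtering and sorting together, again using sqlite3 with a hypothetical `products` table. The price threshold of 10 is an arbitrary value chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Pen", 1.5), ("Desk", 120.0), ("Lamp", 35.0), ("Chair", 60.0)],
)

# WHERE filters first; ORDER BY ... DESC then sorts the remaining rows
# from the highest price to the lowest.
sorted_rows = conn.execute(
    "SELECT name, price FROM products WHERE price > 10 ORDER BY price DESC"
).fetchall()
print(sorted_rows)
conn.close()
```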

**3. Data Aggregation:** SQL provides aggregate functions such as SUM, AVG, COUNT, MIN, and MAX, which allow users to perform calculations on groups of rows. These functions are essential for summarizing data and generating meaningful insights.

SQL Syntax:

SELECT grouping_column, AVG(column_name)
FROM table_name
GROUP BY grouping_column;
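
The aggregation pattern can be sketched as follows. The `orders` table and its `region` and `amount` columns are assumptions made for this example. Selecting the grouping column alongside the aggregate shows which group each result belongs to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 5)],
)

# One output row per region: the average amount and the number of orders.
summary = conn.execute(
    "SELECT region, AVG(amount), COUNT(*) "
    "FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(summary)
conn.close()
```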

**4. Data Joining:** SQL supports JOIN operations to combine data from multiple tables based on common columns. This feature is invaluable when working with databases that have related information stored in different tables.

SQL Syntax:

SELECT column1, column2
FROM table1
INNER JOIN table2
ON table1.common_column = table2.common_column;
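
Here is a small sketch of an INNER JOIN, using two hypothetical tables, `customers` and `orders`, linked by a customer id. An inner join returns only rows that have a match on both sides, so a customer with no orders does not appear in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben"), (3, "Cy")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
)

# Match each order to its customer through the shared customer id.
joined = conn.execute(
    "SELECT customers.name, orders.total "
    "FROM customers "
    "INNER JOIN orders ON customers.id = orders.customer_id "
    "ORDER BY orders.id"
).fetchall()
print(joined)  # "Cy" has no orders and is therefore absent
conn.close()
```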

**5. Data Modification:** SQL includes statements for modifying data in a database. The UPDATE statement changes existing records, the INSERT statement adds new records, and the DELETE statement removes records. Use these statements with care: an UPDATE or DELETE without a WHERE clause affects every row in the table.

SQL Syntax:

UPDATE table_name
SET column1 = value1
WHERE condition;

INSERT INTO table_name (column1, column2)
VALUES (value1, value2);

DELETE FROM table_name
WHERE condition;
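
The caution about modification statements can be illustrated with a sketch like the one below. It assumes a hypothetical `sellers` table and relies on the sqlite3 module's default behavior of opening a transaction before data-modifying statements, so an accidental DELETE can be rolled back before it is committed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sellers (seller_id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO sellers (seller_id, city) VALUES (?, ?)",
    [(1, "Toronto"), (2, "Ottawa"), (3, "Toronto")],
)
conn.commit()

# A WHERE clause limits the change; rowcount reports how many rows it touched.
updated = conn.execute(
    "UPDATE sellers SET city = ? WHERE seller_id = ?", ("Montreal", 1)
).rowcount
conn.commit()

# DELETE without WHERE removes every row. Because the change has not been
# committed yet, rolling back the open transaction restores the rows.
conn.execute("DELETE FROM sellers")
rows_before_rollback = conn.execute("SELECT COUNT(*) FROM sellers").fetchone()[0]
conn.rollback()
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM sellers").fetchone()[0]
print(updated, rows_before_rollback, rows_after_rollback)
conn.close()
```

Checking the affected row count before committing is one simple safeguard; transaction handling differs between database systems and client libraries, so treat this as an illustration rather than a general recipe.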

**6. Data Constraints and Indexing:** SQL allows the definition of constraints such as primary keys, unique constraints, and foreign keys, ensuring data integrity and relational integrity. Indexing can also be applied to improve query performance, especially on large datasets.

SQL Syntax:

CREATE TABLE table_name (
column1 datatype CONSTRAINT constraint_name PRIMARY KEY,
column2 datatype UNIQUE,
column3 datatype,
FOREIGN KEY (column3) REFERENCES another_table(column_name)
);

CREATE INDEX index_name
ON table_name(column_name);
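
To show what constraints do in practice, this sketch creates two hypothetical tables in SQLite and then attempts two inserts that violate them. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; most other systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE sellers (seller_id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE products ("
    "prod_id INTEGER PRIMARY KEY, "
    "seller_id INTEGER, "
    "FOREIGN KEY (seller_id) REFERENCES sellers(seller_id))"
)
# An index on the join column can speed up lookups by seller.
conn.execute("CREATE INDEX idx_products_seller ON products(seller_id)")
conn.execute("INSERT INTO sellers VALUES (1, 'a@example.com')")

errors = []
for sql in (
    "INSERT INTO sellers VALUES (2, 'a@example.com')",  # duplicate email
    "INSERT INTO products VALUES (100, 99)",            # seller 99 does not exist
):
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

print(errors)
conn.close()
```

Both inserts are rejected by the database itself, so invalid rows never reach the tables regardless of which application issued the statement.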

Proficiency in SQL is essential for anyone working with databases, including data analysts and data scientists. It empowers professionals to extract, transform, and analyze structured data efficiently, making it a fundamental skill in the field of data analysis and data science. Familiarity with SQL, combined with other data analysis tools and techniques, enhances the capability to derive valuable insights from large datasets.

Commonly used SQL Statements

SQL DATABASE

CREATE DATABASE Statement:

CREATE DATABASE databasename;

DROP DATABASE Statement:

DROP DATABASE databasename;

BACKUP DATABASE Statement (SQL Server):

BACKUP DATABASE databasename
TO DISK = 'filepath';

Example:

BACKUP DATABASE exampleDB
TO DISK = 'F:\backups\exampleDB.bak';

SQL TABLE

CREATE TABLE Statement:

CREATE TABLE table_name (
column1 datatype,
column2 datatype,
column3 datatype,
….
);

Example:

CREATE TABLE customers (
custID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);

DROP TABLE Statement:

DROP TABLE table_name;

TRUNCATE TABLE Statement (deletes all rows but keeps the table itself):

TRUNCATE TABLE table_name;

Example:

TRUNCATE TABLE products;

SQL ALTER TABLE Statement:

ALTER TABLE table_name
ADD column_name datatype;

Example to add new column:

ALTER TABLE Customers
ADD phonenumber varchar(20);

SQL Create Constraints:

CREATE TABLE table_name (
column1 datatype constraint,
column2 datatype constraint,
column3 datatype constraint,
….
);

SQL NOT NULL Constraint:

Example:

CREATE TABLE products (
ID int NOT NULL,
productname varchar(255) NOT NULL,
description varchar(255) NOT NULL,
barcode int
);

SQL UNIQUE Constraint:

Example:

CREATE TABLE customers (
custID int NOT NULL UNIQUE,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);

SQL PRIMARY KEY Constraint:

Example:

CREATE TABLE customers (
custID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
PRIMARY KEY (custID)
);

SQL FOREIGN KEY Constraint:

Example:

CREATE TABLE products (
prodID int NOT NULL,
productname varchar(255) NOT NULL,
description varchar(255) NOT NULL,
barcode int,
sellerID int,
PRIMARY KEY (prodID),
FOREIGN KEY (sellerID) REFERENCES sellers(sellerID)
);

The SQL COUNT() Function:

SELECT COUNT(column_name)
FROM table_name
WHERE condition;

Example:
SELECT COUNT(*)
FROM Orders;

Example:
SELECT COUNT(ProductID)
FROM Products
WHERE Price > 80;

The SQL WHERE Clause:

SELECT column1, column2, …
FROM table_name
WHERE condition;

Example:

SELECT * FROM travelagent
WHERE Country = 'Canada';

The SQL ORDER BY:

SELECT column1, column2, …
FROM table_name
ORDER BY column1, column2, … ASC|DESC;

Example:

SELECT * FROM Products
ORDER BY category;

The SQL AND Operator:

SELECT column1, column2, …
FROM table_name
WHERE condition1 AND condition2 AND condition3 …;

Example:

SELECT *
FROM Buyers
WHERE Country = 'Japan' AND CustomerName LIKE 'K%';

The SQL OR Operator:

SELECT column1, column2, …
FROM table_name
WHERE condition1 OR condition2 OR condition3 …;

Example:

SELECT *
FROM Buyers
WHERE Country = 'Japan' OR Country = 'India';

The NOT Operator:

SELECT column1, column2, …
FROM table_name
WHERE NOT condition;

Example:

SELECT *
FROM Buyers
WHERE NOT Country = 'Peru';
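
The three logical operators can be compared side by side with a sketch like this one. The `buyers` table and its rows are hypothetical, and SQLite stands in for whichever database is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE buyers (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO buyers VALUES (?, ?)",
    [("Kenji", "Japan"), ("Aiko", "Japan"), ("Ravi", "India"), ("Luis", "Peru")],
)

def names(where_clause):
    """Return buyer names matching the given WHERE clause, sorted by name."""
    sql = f"SELECT name FROM buyers WHERE {where_clause} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

# AND: both conditions must hold.
and_result = names("country = 'Japan' AND name LIKE 'K%'")
# OR: either condition may hold.
or_result = names("country = 'Japan' OR country = 'India'")
# NOT: the condition must be false.
not_result = names("NOT country = 'Peru'")

print(and_result, or_result, not_result)
conn.close()
```

The `names` helper interpolates fixed strings written in the example itself; with user-supplied values, parameter placeholders would be the safer choice.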

The SQL INSERT INTO Statement:

INSERT INTO table_name (column1, column2, column3, …)
VALUES (value1, value2, value3, …);

Example:

INSERT INTO CarType (Carmodel, Manufacturer, Year)
VALUES ('Silverado', 'Chevrolet', '2014');

The IS NULL Operator:

Example:

SELECT sellername, contactname, address
FROM Sellers
WHERE Address IS NULL;

The IS NOT NULL Operator:

Example:

SELECT sellername, contactname, address
FROM Sellers
WHERE Address IS NOT NULL;
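
A common pitfall is comparing against NULL with `=`. The sketch below, which uses a hypothetical `sellers` table in SQLite, shows why IS NULL is needed: a comparison with NULL is never true, so `= NULL` matches no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sellers (sellername TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO sellers VALUES (?, ?)",
    [("Acme", "1 Main St"), ("Birch", None), ("Cedar", None)],
)

def count(where_clause):
    """Count the sellers matching the given WHERE clause."""
    sql = f"SELECT COUNT(*) FROM sellers WHERE {where_clause}"
    return conn.execute(sql).fetchone()[0]

is_null = count("address IS NULL")          # rows with no address
equals_null = count("address = NULL")       # never true, so no rows match
is_not_null = count("address IS NOT NULL")  # rows with an address

print(is_null, equals_null, is_not_null)
conn.close()
```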

The SQL UPDATE Statement:

UPDATE table_name
SET column1 = value1, column2 = value2, …
WHERE condition;

Example:

UPDATE Sellers
SET contactname = 'David White', City = 'Ottawa'
WHERE sellerID = 184;

The SQL DELETE Statement:

DELETE FROM table_name WHERE condition;

Example:

DELETE FROM products WHERE prodID = 46345;

The SQL SELECT TOP Clause:

SQL Server / MS Access Syntax:
SELECT TOP number|percent column_name(s)
FROM table_name
WHERE condition;

Example, selects the first 50% of the records from the “Buyers” table:

SELECT TOP 50 PERCENT * FROM Buyers;

The SQL LIMIT and FETCH FIRST Clauses (alternatives to SELECT TOP):

MySQL Syntax:

SELECT column_name(s)
FROM table_name
WHERE condition
LIMIT number;

Example (MySQL), select the first 3 records of the Sellers table:
SELECT * FROM Sellers
LIMIT 3;

Example (SQL Server / MS Access), select the first 3 records of the Sellers table:
SELECT TOP 3 * FROM Sellers;

Example (Oracle 12c and later, Db2, PostgreSQL), select the first 3 records of the Buyers table:
SELECT * FROM Buyers
FETCH FIRST 3 ROWS ONLY;
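
Row-limiting clauses are usually paired with ORDER BY, because without an explicit order the database does not guarantee which rows count as "first". The sketch below uses SQLite, which supports the LIMIT form, with a hypothetical `sellers` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sellers (name TEXT)")
conn.executemany(
    "INSERT INTO sellers VALUES (?)",
    [("Dana",), ("Alex",), ("Chris",), ("Blake",), ("Evan",)],
)

# ORDER BY fixes the row order, so LIMIT 3 returns a well-defined result.
first_three = [
    row[0]
    for row in conn.execute("SELECT name FROM sellers ORDER BY name LIMIT 3")
]
print(first_three)
conn.close()
```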

The SQL MIN() and MAX() Functions:

SELECT MIN(column_name)
FROM table_name
WHERE condition;

SELECT MAX(column_name)
FROM table_name
WHERE condition;

Example, Find the lowest price:

SELECT MIN(Price)
FROM Products;

Example, Find the highest price:

SELECT MAX(Price)
FROM Products;

The SQL SUM() Function:

SELECT SUM(column_name)
FROM table_name
WHERE condition;

Example:


SELECT SUM(Quantity)
FROM Carorderdetails;

The SQL AVG() Function:

SELECT AVG(column_name)
FROM table_name
WHERE condition;

Example:

SELECT AVG(Price)
FROM Products;

The SQL BETWEEN Operator:

SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;

Example:

SELECT * FROM Products
WHERE Price BETWEEN 50 AND 100;
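
BETWEEN is inclusive at both ends, which the following sketch makes visible. The `products` table and its prices are hypothetical values chosen to sit on and around the boundaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("a", 49), ("b", 50), ("c", 75), ("d", 100), ("e", 101)],
)

# Prices equal to 50 or 100 are included; 49 and 101 fall outside the range.
in_range = [
    row[0]
    for row in conn.execute(
        "SELECT price FROM products WHERE price BETWEEN 50 AND 100 ORDER BY price"
    )
]
print(in_range)
conn.close()
```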

The SQL GROUP BY Statement:

SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);

Example:

SELECT COUNT(carID), Model
FROM Carorders
GROUP BY Model
ORDER BY COUNT(carID) DESC;
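
As a runnable sketch of grouping and then ordering by the aggregate, the example below counts orders per model in a hypothetical `car_orders` table. Every selected column is either in the GROUP BY list or wrapped in an aggregate function, which is what standard SQL requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_orders (car_id INTEGER, model TEXT)")
conn.executemany(
    "INSERT INTO car_orders VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, "C"), (5, "A"), (6, "C")],
)

# One row per model, most frequently ordered model first.
per_model = conn.execute(
    "SELECT model, COUNT(car_id) "
    "FROM car_orders "
    "GROUP BY model "
    "ORDER BY COUNT(car_id) DESC"
).fetchall()
print(per_model)
conn.close()
```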