Database and donuts

September 2, 2023

(1y ago)

Hello There🐼

Let's talk Database management.

In this blog we will talk about Basics and advanced concepts related to database management and SQL.

We have a lot of things in DBMS. we are going to cover few important ones here.

Cardinallity and it's ratio
Keys
Joins
Functional dependency
SQL

Cardinallity and it's ratio

Cardinallity is a measure to represt how the entities are related to each other.
There are 4 types of cardinality ratio:
1. One-to-One
2. One-to-Many
3. Many-to-One
4. Many-to-Many

Extension and Intension

Extension	Intension
THe stae of the database at a given time.	The internal structure and architecture of the database while creating it.

Keys

Keys play an important role in Database.
There are various keys:

Super key
Candidate key
Primary key
Unique key
Foreign key

Super key

All possible key combinations in a table or a realtion are known as super keys.
It's like a superset of all keys.

Candidate key

It is a subset of all minimal super keys.
All candidate keys as super keys.

Primary key

Subset of candidate keys who's value can not be null.
A unique attribute in a table which can be used to identify all other keys in the table.
All primary keys are Candidate keys.
The value can not be null.

Unique key

Subset of Candidate keys which are not primary key
We can modify the values in this columns.
Value can be null.

Foreign key

The attribute from one table which is refferenced in other table.
It can only take values which are defined in the parent table.
A foreign key is allways reffering a primary key of other table.

Locks

Shared Lock	Exclusive Lock
We can have mnay readers reading the same data.	We have only one writer writing the data.
We allow other transactions or clients to read the data at the same time	We do not give access to any reader or writer on the data which is being written by a transactions at that moment.

Prime and Non-Prime attributes

Prime attributes	Non Prime attributes
The attributes which are part of a candidate key.	The attributes which are not part of a candidate key.
Say Candidate Key is `AB-->CD`.	Say Candidate Key is `AB-->CD`.
`AB` are Prime attributes.	`CD` are Non Prime attributes.

Partial dependency

For a Candidate key AB-->CD if there exists a relation B-->C or B-->D or A-->C or A-->D then this is called partial dependency.
Basically all non prime attributes should completely depend only on Candidate key.

Transitive dependency

For a given candidate key A-->BC we have realtions B-->C.
A is prime attribute.
B and C are non prime attributes.
B being a non prime attribute is identifying C which is also a non prime attribute.
This kind of situation in Transitive dependency.

Joins

Joining one or more tables together on some common attribute or some comparison of columns from both the tables.
There are multiple types of join:
1. Natural Join
2. Equi Join
3. Self Join
4. Outter Join:

Natural Join

Joining two tables based on the common attributes which is in both the tables.

Equi Join

It is same of natural join but here you can join the tables based on any attribute.
The joining attribute need not be the same.

Self Join

Here we join the table with itself.

Left outter Join

After joining two tables we get the common attributes from both the table along with that we get all the attributes from left table.
Syntax:SELECT * FROM emp LEFT OUTTER JOIN(dept) ON(emp.id = dept.id)

Right outter Join

After joining two tables we get the common attributes from both the table along with that we get all the attributes from right table.
Syntax:SELECT * FROM emp RIGHT OUTTER JOIN(dept) ON(emp.id = dept.id)

FULL outter Join

After joining two tables we get the common attributes from both the table along with that we get all the attributes from BOTH tables.
Syntax:SELECT * FROM emp FULL OUTTER JOIN(dept) ON(emp.id = dept.id)

Functional dependency

A given attribute relation X-->Y. Here Y is said to be dependent on X.
If X can uniquely identify Y for any value of X.

Trivial Functional dependency	Non-Trivial Functional dependency
`X-->Y`	`X-->Y`
`Y ⊆ X`	`Y ⊈ X`
EX: `AB-->A`	EX: `AB-->C`

Closure set

It is a set of all attribute which can be identied by a given attribute.
(A)⁺=ABC

Normalization

Breaking down a bigger table into smaller tables based on some rules is normalization.
Need for normalization is to get rid of problem realted with INSERT, UPDATE and DELETE anomalies.
Or in simple terms u can say that you want to get rid of redundancy.
Various normalization forms:
1. 1NF
2. 2NF
3. 3NF
4. BCNF
5. 4NF

1NF

Each cell in a table should have only atomic or one value.
Or you can say that each cell should not have multivalued attribute.
How to convert a table into 1NF if it is not in 1NF form:
- Make one more row for each multivalued attribute in a cell.

2NF

We need to know the concept of Prime and Non-Prime attributes to understand 2NF.
Consider Candidate keys AB-->CD, B-->C and AB-->D.
Here CD are Non Prime attributes and AB are Prime attributes.
C is entirely not dependent on the prime attributes or candidate keys.
C can be derived from a subset of Prime attribute B.
This is known as Partial dependency.
Partial dependency is to be avoided in 2NF.
In simple terms we can say that non prime attributes should entirely depend on candidate key and not a subset of the candidate key.
How to make a table into 2NF:
- Make another table with candidate key and all the attributes which are completely dependent on the candidate key.
- Make separate table for all the other keys which are defying the rule of partial dependency.

3NF

We Need to avoid Transitive dependency.
For any given functional dependency X--->Y If we satisfy either of the two rules then the table is in 3NF
1. X should be a super key.
2. Y should be a Prime attribute.
How to make table into 3NF:
Whoever is voilating the rule make separate table for them along with all the attributes which it is identifying.

BCNF

Only one condition is there:
LHS should be a Super key.

ACID

A-->Atomicity
- For a given instruction or a transaction it should take place completely or not at all.
- Transaction management system is responsible for Atomicity behaviour.
C-->Consistency
- For a given data Transaction the database should go from one consistent state to another consistent state.
- There is no system that holds takes care of Consistency.
I-->Isolation
- Each transaction or a set of instruction should run or execute independently without interfering with each other.
- Concurrancy control component is responsible for Isolation behaviour.
D-->Durability
- Whatever the data stored in the database it should be permanent.
- Recovery management component is responsible for Durability behaviour.

If any of the above properties is voilated by a transaction then the database automatically performs roll back operation.

Roll back operation takes the database to it's previous consistent state.

Transaction

It is a set of instruction that are executed on the database.

There are few concepts you need to understand for transactions and schedule:

Dirty read problem
Unrepeatable read problem
Phantom read problem
Lost update problem

Dirty read problem

T1	T2
R(A)
W(A)
	R(A)
	Commit
Commit

If a transaction modifies some data, Here in this case it is A.
If some other transaction reads that data before commiting, Then that transaction is reading out of time data.
This is only dirty read problem.

Unrepeatable read problem

T1	T2
R(A)
	R(A)
w(A)
	W(B)
	R(A)
Commit
	Commit

The same data which is read at two different times during a transaction is not the same.
This is only Unrepeatable read problem.

Phantom read problem

T1	T2
R(A)
	R(A)
DELETE(A)
	R(B)
	Commit
Commit

Here a transaction is trying to read a data which has been deleted.

Lost update problem

T1	T2
R(A)
	R(A)
W(A)
	W(A)
	Commit
Commit

Here the data written by Transaction-T1 is overwritten by Transaction-T2.

Schedule

It is just a collection of transactions.
If there are n transactions in a given Schedule then there are n! ways we can arrange these transactions.

Serail Schedule	Non Serail Schedule
The transactions in a given Schedule takes place one after the other serially.	The transactions in a given Schedule takes place interleaving one after the other.

Serail Schedules are always consistent.

Serializability

It is a technique to see if a non Serail Schedule can produce the same result if it is run as Serail Schedule.
There are two methods to check if a Schedule is Serializable or not

Conflict Serializable	View Serializable
We try to swap the order of the instructions and check if it produces the same output.	It is a sueprset of Conflict Serializable.
If two or more transactions are working on the same data and any of the following situation occurs then it is not not Conflict Serializable.	If a schedule is Conflict Serializable then it is view Serializable.
If `Read-Write`, `Write-Read` or `Write-Write`.	If a schedule is not Conflict Serializable then we have to perform some checks.

Recoverable schedule

When there is a failure in the system ehen some transaction is reading or writing some data to the database, we need to be able to recover the data or bring back the database to it's previous state.

recoverable

Strict schedule

If a transaction is working on some data item, then don't allow other transactions to get their hands on that data item till it commits.

Concurrancy control protocols

Timestamp protocol	Lock based
Assign a timestamp to all the transaction that takes place, so that when conflict occurs we can resolve it based on timestamp.	We assign lock on the the data.

Views

It is a virtual table that contains data from one or more tables.
Created using CREATE VIEW view_name as ....
Whenever there is a change in the underlying tables the view table changes.

TRIGGER

It is a special type of procedure which gets invoked automatically in a response to an event.
Pprocedures are basically stored SQL queries which can be executed anytime.

Index in SQL

It is a disc structure associated with a table or a view.
It is used to retrieve tha data faster as we will know where the required data is located.
Increases the data retrievel speed by reducing the number of pages that need to be searched to get the required data.

SQL

Structured Query Language.
Used to talk with the database.
Multiple types of commands are there in SQL, Mainly
1. DDL (Data Definition Language)
2. DML (Data Manipulation Language)
3. DCL (Data Control Language)
4. TCL (Transaction Control Language)

DDL	DML	DCL	TCL
Commands which are used to defines the data. Like TABLES, DATABASE, VIEWS, TRIGGERS	Commands which are used to Manipulate the existing data. Like UPDATE, INSERT	Commands which are used to control the behaviour or charecteristics of the data. Like GRANT	Commands which are used to control transactions in the database. Like COMMMIT
CREATE	INSERT	GRANT	COMMIT
DROP	UPDATE	REVOKE	--
ALTER	DELETE	--	--

Good Day 🐼