Union and intersection function in
this post.
Relational algebra is a procedural
query language. It requires to specify what data to retrieve as well as how to
retrieve those data. It requires user to instruct the system to perform a
sequence of operations on the database to retrieve the desired data.
Relational algebra provides the base
for commercial query languages. It illustrates fundamental techniques for
extracting data from the database.
Relational algebra consists of a set
of different operations. These operations take one or two relations as input
and produce a new relation as output. As the output is in form of relation, it
can become input to some another operation.
These operations
can be divided in two categories:
1.
Fundamental
operations of algebra (Fundamental operations):
Such as select, project, union,
set-difference, Cartesian product, and
rename.
2.
Secondary
operations:
Such as set-Intersection, Natural
Join, Division and Assignment.
Based on the number of input
relations, these operations can also be categorized as given below:
1.
Unary
operations:
These operations operate on single
relation.
Examples:
select, project, and rename operations.
2.
Binary
operations:
These operations operate on pairs of
relations.Such as union, set-difference, set-intersection,
Cartesian product, etc.
Among all these operations, some
operations use the concept of set theory of mathematics. Based on this concept,
these operations can be categorized as below:
1.
Set-theoretic
operations:
Considers that,
relations are essentially sets of rows.
Examples: union,
set-difference, set-intersection, Cartesian product.
2.
Native
relational operations:
These operations focus on the
structures of the rows.
Such as select, project, natural join,
division, etc.
These operations are described in
following sub-sections.
A sample database for a banking
system is given in following figure. This database is used to explain various
queries and their results. To keep this database simple, limited relations and
attributes are considered.
ano
|
balance
|
bname
|
A01
|
5000
|
VVN
|
A02
|
6000
|
KSAD
|
A03
|
7000
|
ANAND
|
A04
|
8000
|
KSAD
|
A05
|
6000
|
VVN
|
Bname
|
Baddress
|
VVN
|
MOTA BAZAR,
VVNAGAR
|
KSAD
|
CHHOTA BAZAR,
KARAMSAD
|
ANAND
|
NANA BAZAR,
ANAND
|
Database |
Here,
the Account relation contains five tuples representing five different accounts.
The branch relation represents three different branches of a bank.
CID
|
NAME
|
C01
|
RIYA
|
C02
|
JIYA
|
C03
|
PIYA
|
C04
|
DIYA
|
C05
|
TIYA
|
C06
|
PALAK
|
C07
|
TARAL
|
Customer
EID
|
NAME
|
MANAGER_ID
|
E01
|
PALAK
|
E03
|
E02
|
TARAL
|
E03
|
E03
|
JIYA
|
NULL
|
E04
|
SARAL
|
E03
|
E05
|
ZALAK
|
E03
|
E06
|
KAMAL
|
E03
|
E07
|
RIYA
|
E03
|
Employee
The
customer relation has only two attributes – CID and NAME – representing
customer ID and NAME. the Employee relation represents seven different employee
along with employee id of their manager. An employee with ID E03 has no any
manager as he is himself a manager. So, its MANAGER_ID field is null. Also
observe that the same person can be customer as well as employee.
Date Conversion in SQL
1.
The
select operations (select operation in dbms) :
Operation: select tuples from a relation that
satisfy a given condition (predicate).
Symbol : (Sigma) Notation: Sigma (condition)
<Relation>
Operators: the following operators can be used
to form a condition.
=,<,>,<=,>=,AND,OR
Ex: find out all accounts which belong to
branch named “VVN”.
The following
query returns all tuples which contains =”VVN” as an branch name.
Query: Sigma bname = “VVN”
(Account)
Output: ANO BALANCE BNAME
A01 5000 VVN
A05 6000 VVN
2.
The
set-difference (minus) operation (-)
Operation:
select tuples those are in one relation but not in another relation.
Symbol:
-(minus) Notation:
relation1 – relation2
Requirement:
Set-difference must be taken between
compatible relations.
Relations R and S are compatible, if
–
Both have same arity,
i.e. total number of attributes, and
Domains of ith
attribute of R and S are same.
Ex: list out all persons who are
customers but not employee.
Query: PI (NAME) (Customer) – PI (NAME)
(Employee)
Output: NAME
PIYA
DIYA
TIYA
3.
The
union operation (U)
Operation: selects tuple those are in either or
both of the relations.
Symbol: U (union) Notation: relation1 U
relation2
Requirement:
Union must be taken between
compatible relations.
Relations R and S are compatible, if
–
Both have same arity,
i.e. total number of attributes, and
Domains of ith
attribute of R and S are same.
SQL Aggregate Functions
Ex: list out all persons who are either customers
or employees of the bank.
Query: PI (NAME) (Customer) U PI (NAME)
(Employee)
Here, Customer and Employee are not
compatible relations. But, the resultant relations of project operations
contain only single attribute, name, with similar domain. So, it is possible to
apply union operation after applying such project operation even though the
original relations are not compatible.
4.
The
project operation
Operation: selects specified attributes of a
relation.
Symbol: PI Notation:
PI (attribute list) <Relation>
Ex: list out all accounts with account
numbers and balance.
Query: PI (ANO, BALANCE)
(Account)
Ex: lest out all account numbers having
balance less than 7000.
Query: PI (ANO) (SIGMA(sign) BALANCE
< 7000 (Account))
Output: ANO
A01
A02
A05
5.
The
rename operation in dbms
Operation: renames relations as well as its
attributes.
Symbol: P (rho) Notation:
Px (R) : Renames relation R to X.
Px (A1,A2,…,An)(R) : Renames relation
R to x and its attributes to A1, A2, …., An.
The following
example clarify the use of rename operation.
Ex: find largest balance from Account
relation.
Step-1:
Compare all balances with each other
and find balances which are not highest one.
For this, take Cartesian product of
Account relation with itself and compare balances, as shown below.
Rename is useful here to distinguish
two balances, as shown below.
Query: PI (A1.BALANCE)(Sigma sign
A1.BALANCE < A2.BALANCE (Pa1 (Account) X Pa2
(Account)))
Output of Step-1:
A1.BALANCE
5000
6000
7000
Step2:
Now, use the set-difference
operation to find out the largest balance.
Query: PI (BALANCE) (Account) –
PI (A1.BALANCE)(Sigma
sign A1.BALANCE < A2.BALANCE (Pa1 (Account) X Pa2
(Account)))
Output of Step-2:
BALANCE
8000
Ex: find the name of manager for an
employee having ID ‘E01’:
Here, there is a need to find
employee id of a manager of given employee.
This requires to compare each tuple
with all other tuples of the same
relation.
These can be performed by forming
Cartesian product on Employee
relation after renaming it as e1 and
e2, as shown below.
Query:
PI (e2.NAME)
(Sigma sign e1.EID = “E01” (Sigma sign e1.MANAGER_ID = e2.EID(Pe1
(Employee) XPe2 (Employee)))
Output:
E2.NAME
JIYA
In this example, a relation Employee
itself contains the information about manager of other employees. So, it is
used twice in single operation. In such cases, there is a need to distinguish
both occurrences of the same relation. This can be achieved by rename
operation.
6.
The
set-intersection operation
Operation: selects tuples those are in both
relations.
Symbol: intersection
Notation: relation1 Intersection Sign relation2
Requirement:
Both have same arity, i.e. total
number of attributes, and
Domains of ith attribute
of R and S are same.
Ex: list out all persons who are
customers as well as employees.
Query: PI (NAME) (Customer) Intersection
sign PI (NAME) (Employee)
Output: NAME
RIYA
JIYA
PALAK
TARAL
7.
The
Cartesian-product operation (X)
Operation: combines information of two
relations. It is also known as cross-
product operation and similar to
mathematical Cartesian-product operation.
Symbol: X (cross) Notation: relation1 X
relation2
Resultant
relation:
If relation 1 and relation 2 have n1
and n2 attributes respectively, then resultant relation will have n1 + n2
attributes, combining attributes from both the input relations.
If both relations have some attribute
having same name, it can be distinguished by combining ‘relation-name
Attribute-name’.
If relation1 and relation2 have n1 and
n2 tuples respectively, then resultant relation will have n1*n2 attributes,
combing each possible pair of tuples from both the input relations.
Ex: combine information from Account
and Branch relation.
ANO
|
BALANCE
|
ACCOUNT.
BNAME
|
BRANCH.
BNAME
|
ADDRESS
|
A01
|
5000
|
VVN
|
VVN
|
MOTA BAZAR, VVNAGAR
|
A01
|
5000
|
VVN
|
KSAD
|
CHHOTA BAZAR, KARAMSAD
|
A01
|
5000
|
VVN
|
ANAND
|
NANA BAZAR, ANAND
|
A02
|
6000
|
KSAD
|
VVN
|
MOTA BAZAR, VVNAGAR
|
A02
|
6000
|
KSAD
|
KSAD
|
CHHOTA BAZAR, KARAMSAD
|
A02
|
6000
|
KSAD
|
ANAND
|
NANA BAZAR, ANAND
|
A03
|
7000
|
ANAND
|
VVN
|
MOTA BAZAR, VVNAGAR
|
A03
|
7000
|
ANAND
|
KSAD
|
CHHOTA BAZAR, KARAMSAD
|
A03
|
7000
|
ANAND
|
ANAND
|
NANA BAZAR, ANAND
|
A04
|
8000
|
KSAD
|
VVN
|
MOTA BAZAR, VVNAGAR
|
A04
|
8000
|
KSAD
|
KSAD
|
CHHOTA BAZAR, KARAMSAD
|
A04
|
8000
|
KSAD
|
ANAND
|
NANA BAZAR, ANAND
|
A05
|
6000
|
VVN
|
VVN
|
MOTA BAZAR, VVNAGAR
|
A05
|
6000
|
VVN
|
KSAD
|
CHHOTA BAZAR, KARAMSAD
|
A05
|
6000
|
VVN
|
ANAND
|
NANA BAZAR, ANAND
|
Query: ACCOUNT
X BRANCH
As shown in figure, Cartesian product
combines information from Account
and Branch
relations. But here, for many tuples, values of branch names are not same.
For example, second row combines
account of ‘VVN’ branch with ‘KSAD’ branch. Such tuples provide inconsistent
data and they should be removed from the resultant relation.
To remove unnecessary tuples and to
retrieve only useful information, Cartesian product can be combined with select
and project operations. As bname is common attribute between two relations, if
can be used for equality check in select operation.
1.
The
Natural join operation
To retrieve consistent and useful
information from multiple relations, it was
necessary to
combine Cartesian production operation with select operation for equality check
on common attributes, as described below.
The natural join operation simplifies
such type of queries. It combines
following three operations into one operation.
The natural join operation –
1.
Forms
a Cartesian product on its argument relations,
2.
Performs
a selection for equality check on common attributes to remove unnecessary
tuples, and
3.
Removes
duplicate attributes.
The natural join
is denoted by symbol natural sign (join).
The notation to
perform this operation can be given as –
Relation1
join relation2
The following
example clarify working of natural join operation. These examples are same as
considered in Cartesian product operation.
advantage and disadvantage of DBMS (Database management system)
Ex: combine only consistent information
from Account and Branch relation.
Query: (ACCOUNT JOIN (sign) BRANCH)
Output:
ANO BALANCE BNAME BADDRESS
A01 5000 VVN MOTA
BAZAR, VVNAGAR
A02 6000 KSAD CHHOTA
BAZAR, KARAMSAD
A03 7000 ANAND NANA
BAZAR, ANAND
A04 8000 KSAD CHHOTA
BAZAR, KARAMSAD
A05 6000 VVN MOTA
BAZAR, VVNAGAR
As shown in figure, natural join
operation yield only consistent and useful information. It removes unnecessary
tuples as well as duplicate attributes. This makes the retrieval of information
from multiple relations very easy and convenient.
Consider another example which
retrieves specific information.
2.
The
outer join operation
The outer join operation is an
extension of the join operation. Join operations,
such as natural
join, may result in information loss sometimes. The outer join deals with such
kind of missing information.
To understand the concept of outer
join, consider the following two relations – college and hostel. The college
relation yields academic information about student studying in a college, such
as student name, id number, and department. The hostel relation yield
information about student staying in hostel such as student name, hostel name,
room number.
NAME
|
ID
|
DEPARTMENT
|
MANISHA
|
S01
|
COMPUTER
|
ANISHA
|
S02
|
COMPUTER
|
NISHA
|
S03
|
I.T.
|
COLLEGE
NAME
|
HOSTEL_NAME
|
ROOM_NO
|
ANISHA
|
KAVERI HOSTEL
|
K01
|
NISHA
|
GODAVARI HOSTEL
|
G07
|
ISHA
|
KAVERI HOSTEL
|
K02
|
HOSTEL
Now, consider a natural join
operation – college join hostel – to combine academic and hostel related
information about a student. The following figure represents the result of such
join operation.
NAME ID DEPARTMENT HOSTEL_NAME ROOM_NO
ANISHA S02 COMPUTER KAVERI HOSTEL K01
NISHA S03 I.T. GODAVARI
HOSTEL G07
The above resultant relation does
not contain any tuple regarding student
‘MANISHA’ and ‘ISHA’. The reason behind this is there is no matching tuple for
these two students in other relation. And so, academic information of ‘MANISHA’
and hostel related information of ‘ISHA’ is lost.
So, though natural join yields
useful and consistent information, it may omit some information, resulting in
information lost.
The outer join operation deals with
such kind of information loss.
The outer join
operation can be divided into three different forms:
1.
Left
outer join
2.
Right
outer join
3.
Full
outer join
These operations are described below:
1.
Left
outer join
The lift outer join retains all the tuples of
the left relation even though
three is no
matching tuple in the right relation.
For such kind of typle, the attributes of
right relation will be padded with
null in resultant
relation.
NAME ID DEPARTMENT HOSTEL_NAME ROOM_NO
MANISHA S01 COMPUTR NULL NULL
ANISHA S02 COMPUTER KAVERI HOSTEL K01
NISHA S03 I.T. GODAVARI HOSTEL G07
The left outer join operation may
not retain all the tuples of the right relation.
2.
Right
outer join
The right outer join retains all the tuples of
the right relation even though
there is no
matching tuples in the left relation.
For such kind of tuple, the attributes of left
relation will be padded with null
in resultant
relation.
The following figure illustrates the result of
right outer join on college and
hostel relation.
NAME
|
ID
|
DEPARTMENT
|
HOSTEL_NAME
|
ROOM_NO
|
ANISHA
|
S02
|
COMPUTER
|
KAVERI HOSTEL
|
K01
|
NISHA
|
S03
|
I.T.
|
GODAARI HOSTEL
|
G07
|
ISHA
|
NULL
|
NULL
|
KAVERI HOSTEL
|
K02
|
The right outer join operation may
not retain all the tuples of the relations.
3.
Full
outer join
The full outer join retains all the tuples of
both of the relations. It also pads
null values
whenever required.
The following figure illustrates the result of
full outer join on college and
hostel relation.
NAME ID DEPARTMENT HOSTEL_NAME ROOM_NO
MANISHA S01 COMPUTER
NULL NULL
ANISHA S02 COMPUTER KAVERI HOSTEL K01
NISHA S03 I.T. GODAVARI
HOSTEL G07
ISHA NULL NULL KAVERI
HOSTEL K02
1.
Assignment operation
Operation:
assigns result of a relational algebra expression to a relation
variable.
Symbol: left arrow
Notation: RelationVariable left arrow (sign)
Relational algebra expression
With
assignment operation, complex queries can be written as a series of smaller
expressions. Each expression assigns its result to a relation variable. This
variable can be used in subsequent expressions. Such kind of assignment must be
made to temporary relation variable. Assignment to permanent relation modifies
a database. The following example illustrates the use of the assignment
operation.
Ex: find largest balance from Account relation.
Query:
Temp1 left arrow (sign) PI (A1.BALANCE)
(SIGMA (sign) A1.BALANCE<A2.BALANCE (Pa1(ACCOUNT) X
Pa2(ACCOUNT))))
Temp2 left arrow (sign) PI (BALANCE)
(ACCOUNT)
Result left arrow (sing) temp2-temp1
So, the assignment operation provides
a convenient way to express comples
queries.
2.
Division
operation (/)
The division operation is suited for a special
kind of query. It is useful in queries that include “for all” phrase.
This operation is denoted by / (division).
The following example given in figure explains
the division operation.
R:
A
|
B
|
A1
|
B1
|
A1
|
B2
|
A2
|
B1
|
A3
|
B1
|
A3
|
B2
|
A4
|
B2
|
T:
B
|
B1
|
B2
|
A
|
A1
|
A3
|
Observe
that for all tuples of S all possible pairs – (A1, B1) and (A1, B2) –arepresent
in R for value A1 of attribute A. So, A1 is included in resultant relation T.
Similarly,
for all tuples of S, all possible pairs – (A3, B1) and (A3, B2) – are present
in R for value A3 of attribute A. So, A3 is also included in T.
But,
for A2, the combination (A2, B1) is present in R, but, (A2, B2) is not present.
So, A2 is not included in T.
Similarly,
for A4, the combination (A4, B2) is present in R, but, (A4, B1) is not present.
So, A4 is not included in T.
Now,
consider the real life application of division operation. Two sample relations
Employee and Works_on are described in figure.
The employee relation represents
employee ID and NAME of the employees. The works_on relation yields information
that which employee is working on which project. For sake of simplicity, only
limited attributes are considered in both the relations.
Employee
EID
|
NAME
|
|||
E01
|
ASMITA
|
|||
E02
|
SMITA
|
|||
E03
|
MITA
|
|||
E04
|
MITAL
|
|||
EID
|
PNO
|
|||
E01
|
P01
|
|||
E01
|
P03
|
|||
E02
|
P02
|
|||
E03
|
P04
|
|||
E04
|
P01
|
|||
E04
|
P03
|
Suppose,
there is a need to retrieve information as stated in following example.
Ex: find out id numbers of employees who work on all the projects that ‘ASMITA’ works.
Ex: find out id numbers of employees who work on all the projects that ‘ASMITA’ works.
Step-1: retrieve the list of projects on
which ‘ASMITA’ works.
Query: temp left arrow sign PI (PNO) (SIGMA
NAME = “ASMITA” (Employee join Works_on))
Result of step-1:
PNO
P01
P03
Step-2: apply the division operation between
works_on and temp relations to retrieve required information.
Query: Works_on / Temp
Result of step-2:
EID
E01
E04
Post a Comment
Please do not enter any spam link in the comment box.