CSE 382 Algorithms and Data Structures

Barbara Nostrand, Ph.D.

Electrical Engineering and Computer Science


Laboratory 4 - Relational Databases

Introduction

There are three main types of database:  flat-file,  network,  and relational.  Which is the best one to use for a particular job will depend on factors such as the type and the amount of data to be processed;  not to mention how frequently it will be used. 

Software Engineering

You are to design and build a simple database system in Java.  Begin by reading What is a Relational Database? and Introduction to Relational Databases.  Your system must support databases with a variety of schemes.  We will make several simplifying assumptions to keep the project managable: 

  1. All data will fit in main memory. 
  2. The columns of a relation will be numbered (and thus ordered),  but not named.  To make life simple in Java column numbers will start at zero. 
  3. All data can be stored as character strings. 

Relation abstract data type

You should begin by creating a Java class (ADT) to represent a relation.  Your relation class should have two constructors.  The first will create an empty relation with a given number of columns.  The second will take a second argument that specifies a file name from which initial data for the relation should be read.  You may assume that tuples in the file are separated by newline characters and that fields (attributes) within a tuple are separated by tabs.  You should deal gracefully with invalid file contents (e.g. a line with too few or too many attributes),  printing an appropriate error message and producing a well-defined result. 

Internally, you will probably want to represent a relation as an array of Java strings.  We will count on Java garbage collection to reclaim the space consumed by relations no longer in use. 

In addition to the constructors,  your ADT should provide the following methods: 

print
print the relation as lines of tab-separated fields on the screen. 
insert
insert a given tuple into the relation. 
delete
remove from the relation all tuples matching a given predicate
lookup
(a.k.a. select) return a new relation containing all tuples from the original relation that match a given predicate.
project
return a new relation that is the projection of the given relation onto a given list of fields.  If the original relation has six fields, for example, and [1, 4, 2] is the list onto which to project,  tuple (a, b, c, d, e, f) in the original relation would become tuple (b, e, c) in the new. 
join
return the join of the original relation and a relation passed as a argument,  equating the last field of the original relation with the first field of the argument,  in such a way that the fields of the original relation precede those of the argument in the result.  Note that this very simplistic notion of join (without attribute names) will generally need to be used in conjunction with projection operations that reorder the fields of the arguments and result. 
union
return the union of the original relation and a relation passed as an argument. 
intersect
return the intersection of the original relation and a relation passed as an argument. 

Several of these methods will be easier to implement if your relation class supports the standard Enumeration or Iterator interface. 

For the purposes of delete and lookup methods, a predicate is a triple consisting of

  1. attribute identifier (field number)
  2. comparison test (<=, <, >=, >, ==, !=) — to be performed lexicographically on strings
  3. attribute identifier or string constant
Note that lexicographic comparisons of the strings representing integers will do the "right" thing numerically if (and only if) you ensure that all tuples use the same number of digits to represent a given attribute.  For example,  "00123" < "01200", but "123" > "1200". 

Query language

To allow a user to manipulate the database,  you will need a command language and a way to refer to relations.  You should maintain,  internally, a mapping from single-character relation names (lower case letters will suffice) to the relation (if any) named by that character.  This mapping can be a simple array.  Your system should then read a sequence of commands from standard input, one command per line.  Valid commands are as follows: 

You should of course detect and handle invalid commands.  You should print a helpful message,  ignore the command,  and continue. 

CollaborationYou will complete this project with your lab partner. 
ImplementationYou must implement your system as a Java application.  You may use Jgrasp.  Do not use Netbeans,  Visual Studio,  or produce a Java applet. 
EnvironmentYour program must execute on the podium computer in our classroom.
Turn inYou must turn in both a report containing complete program sources and an executable electronic version of your program.

Last modified: 2008 JAN 28
bnostran@syr.edu