CSE 382 Algorithms and Data Structures
Barbara Nostrand, Ph.D.
Electrical Engineering and Computer Science
There are three main types of database: flat-file, network, and relational. Which is the best one to use for a particular job will depend on factors such as the type and the amount of data to be processed; not to mention how frequently it will be used.
Flat-file databases are ideal for small amounts of data that needs to be human readable or edited by hand. Essentially all they are made up of is a set of strings in one or more files that can be parsed to get the information they store; great for storing simple lists and data values, but can get complicated when you try to replicate more complex data structures. That's not to say that it is impossible to store complex data in a flat-file database; just that doing so can be more costly in time and processing power compared to a relational database. The methods used for storing the more complex data types, are also likely to render the file unreadable and un-editable to anyone looking after the database.
Hierarchical databases organizes data in a tree structure. There is a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. Data in a series of records, which have a set of field values attached to it. It collects all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, and with the individual records being the equivalent of rows. To create links between these record types, the hierarchical model uses Parent Child Relationships. These are a 1:N mapping between record types. This is done by using trees, like set theory used in the relational model, "borrowed" from maths. For example, an organization might store information about an employee, such as name, employee number, department, salary. The organization might also store information about an employee's children, such as name and date of birth. The employee and children data forms a hierarchy, where the employee data represents the parent segment and the children data represents the child segment. If an employee has three children, then there would be three child segments associated with one employee segment. In a hierarchical database the parent-child relationship is one to many. This restricts a child segment to having only one parent segment. Hierarchical DBMSs were popular from the late 1960s.
Network databases were also popular in the 1960s. Some data records were more naturally modeled with more than one parent per child. So, the network model permitted the modeling of many-to-many relationships in data. The basic data modeling construct in the network model is the set construct. A set consists of an owner record type, a set name, and a member record type. A member record type can have that role in more than one set, hence the multiparent concept is supported. An owner record type can also be a member or owner in another set. The data model is a simple network, and link and intersection record types may exist, as well as sets between them. Thus, the complete network of relationships is represented by several pairwise sets; in each set some (one) record type is owner (at the tail of the network arrow) and one or more record types are members (at the head of the relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted. The CODASYL (Conference on Data Systems Languages) network model is based on mathematical set theory.
Associative databases divide the real-world things about which data is to be recorded into two sorts:
Relational databases are an improvement on the network mode. A mathematical relation is a subset of the cartesian product of a collection of sets X1, …, Xk.
Relational databases have the following properties:
Relational databases such as MySQL, Microsoft SQL Server, and Oracle, have a much more logical structure in the way that it stores data. Tables can be used to represent real world objects, with each field acting like an attribute. For example, a table called books could have the columns title, author and ISBN, which describe the details of each book where each row in the table is a new book.
The "relation" comes from the fact that the tables can be linked to each other, for example the author of a book could be cross-referenced with the authors table (assuming there was one) to provide more information about the author. These kind of relations can be quite complex in nature, and would be hard to replicate in the standard flat-file format.
You are to design and build a simple database system in Java. Begin by reading What is a Relational Database? and Introduction to Relational Databases. Your system must support databases with a variety of schemes. We will make several simplifying assumptions to keep the project managable:
You should begin by creating a Java class (ADT) to represent a relation.
relation class should have two constructors. The
first will create an empty relation with a given number of columns.
The second will take a second argument that specifies a file name from
which initial data for the relation should be read.
You may assume that tuples in the file are separated by newline
characters and that fields (attributes) within a tuple are separated by tabs.
You should deal gracefully with invalid file contents (e.g. a line with
too few or too many attributes), printing an appropriate error message
and producing a well-defined result.
Internally, you will probably want to represent a relation as an array of Java strings. We will count on Java garbage collection to reclaim the space consumed by relations no longer in use.
In addition to the constructors, your ADT should provide the following methods:
Several of these methods will be easier to implement if your
relation class supports the standard
For the purposes of delete and lookup methods, a predicate is a triple consisting of
!=) — to be performed lexicographically on strings
To allow a user to manipulate the database, you will need a command language and a way to refer to relations. You should maintain, internally, a mapping from single-character relation names (lower case letters will suffice) to the relation (if any) named by that character. This mapping can be a simple array. Your system should then read a sequence of commands from standard input, one command per line. Valid commands are as follows:
c a 4 filenameThis command creates a new relation a with 4 fields, and reads the data for the relation from the (optional) file specified.
a b "foo" "3" "hi, mom"This command adds a tuple into the existing relation b. The number of additional arguments after the b must equal the number of attributes in relation b.
o aThis command will print relation a created above.
p a 2 3 4 bThis command will project relation a onto 2 fields (fields numbers 3 and 4 in relation a) producing relation b.
j a b cThis command joins relations a and b on the final field of a and the initial field of b, producing relation c. If a has
nfields and b has
m, c will have
s a 1 == 2 bThis command selects those tuples from relation a where field 1 == field 2, and produces a new relation b containing those tuples.
s a 1 != "foo" bThis command selects those tuples from relation a where field 1 != "foo", and produces a new relation b containing those tuples.
The operators allowed in the select command can be any of the comparison operators supported by your implementation of predicates.
d a 1 == 3This command deletes those tuples in relation a where field 1 == field 3.
As with the select command, the second operand of the predicate may be a string, and all of the comparison operations must be supported.
u a b cThis command creates relation c, giving it all tuples found in a or b. Relations a and b must have the same number of fields.
i a b cThis command creates relation c, giving it all tuples found in both a and b. Relations a and b must have the same number of fields.
You should of course detect and handle invalid commands. You should print a helpful message, ignore the command, and continue.
|Collaboration||You will complete this project with your lab partner.|
|Implementation||You must implement your system as a Java application. You may use Jgrasp. Do not use Netbeans, Visual Studio, or produce a Java applet.|
|Environment||Your program must execute on the podium computer in our classroom.|
|Turn in||You must turn in both a report containing complete program sources and an executable electronic version of your program.|