Database Files

A database is an organized collection of data.  Data is typically formatted into records (aka rows) and fields (aka columns).  The format that we choose for our records and fields determines the file format that will be used when we save and retrieve our data to/from disk.

Text Databases

The easiest way to create a database is to use plain ASCII text.  Fields are separated by spaces or tabs, and records are separated by newline characters (so that one record is saved per line).  C++ treats spaces, tabs and newlines as whitespace, and the built-in iostream extraction operators skip any leading whitespace before reading a value, so the separators need no special handling when we read the file back.
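As a quick illustration of that whitespace handling, the sketch below reads one record from a string whose fields are separated by a mix of spaces, a tab and a newline; each >> skips all of them. The helper name readRecord is hypothetical, and an istringstream stands in for an ifstream so the example needs no file on disk.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extract a name and two integers from a text
//  record.  operator>> skips leading spaces, tabs and newlines before
//  each field, so the mixed separators are harmless.
bool readRecord(const std::string & text, std::string & name,
                int & wins, int & losses)
{
	std::istringstream in(text);	// stands in for an ifstream
	in >> name >> wins >> losses;
	return !in.fail();		// false if any field failed to parse
}
```

For example, readRecord("Toronto \t 32\n20", name, w, l) leaves name == "Toronto", w == 32 and l == 20.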

We use an ofstream to write the file, and an ifstream to read it back in.  We write the record count (number of records in the database) as the first line of the file, so that we know how many records to read in later.
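A minimal sketch of the count-first layout, assuming one integer per record to keep it short. The function names writeAll and readAll are hypothetical, and a stringstream plays the part of the file. Because the count comes first, the reader can size its storage before it reads a single record.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Write the record count on the first line, then one record per line.
void writeAll(std::ostream & out, const std::vector<int> & records)
{
	out << records.size() << '\n';
	for (int r : records)
		out << r << '\n';
}

// Read the count first, so we know how many records follow.
std::vector<int> readAll(std::istream & in)
{
	std::size_t n = 0;
	in >> n;
	std::vector<int> records;
	records.reserve(n);		// allocate once, up front
	for (std::size_t i = 0; i < n; ++i)
	{
		int r;
		if (!(in >> r))
			break;		// stop early on a short or corrupt file
		records.push_back(r);
	}
	return records;
}
```

Writing {68, 54, 67} produces the text "3\n68\n54\n67\n", and reading it back yields the same three values.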

We declare a data structure, team, that stores one database record.  Each member of the structure represents one field in the record.  We also define an insertion operator and an extraction operator to save and retrieve single records.  We looked at insertion operators previously; the general form for an extraction operator is:

istream & operator>>( istream &, type & );

where type is your data type.  The data must be passed by reference, because the function has to store whatever it reads from the stream in the caller's variable.  If we passed by value, the function could only change a copy of the data, not the original.  The stream is also passed and returned by reference, so that extractions can be chained, as in in >> a >> b.
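For example, here is the general form applied to a small hypothetical record type that holds only a wins/losses pair (it is not the team structure from the listing). Both parameters are references, and returning the stream is what makes chaining work.

```cpp
#include <istream>
#include <sstream>

// Hypothetical two-field record, used only for illustration.
struct record
{
	int wins;
	int losses;
};

// Extraction operator in the general form: both arguments are
//  references, and the stream is returned so calls can be chained.
std::istream & operator>>(std::istream & istr, record & r)
{
	istr >> r.wins >> r.losses;
	return istr;
}
```

With an istringstream holding "32 20 23 28", the single statement in >> a >> b fills a with 32/20 and b with 23/28.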

Like the insertion operators, the iostream library (iostream.h in older compilers, <iostream> in standard C++) declares several "built-in" extraction operators.  For example,

istream & operator>>( istream &, char * );
istream & operator>>( istream &, char & );
istream & operator>>( istream &, int & );
istream & operator>>( istream &, double & );

Note that the string extraction operator takes a pointer (*) instead of a reference (&).  That's OK: a reference is usually implemented as a pointer anyway, and since a string is an array of characters, a pointer to the start of that array lets the operator change the entire array.  The catch is that a pointer carries no size information, so the operator cannot tell how large the array is.
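Because of that missing size, the char * extractor keeps storing characters until it reaches whitespace, however small the array is. The standard setw manipulator supplies the limit: after setw(n), the next string extraction stores at most n - 1 characters followed by '\0'. The sketch below uses a hypothetical helper, readWord, written as a template so the array size is deduced rather than typed by hand. (C++20 replaced the pointer overload with one taking a reference to a char array, which applies the array size automatically; with earlier standards setw is what prevents the overrun.)

```cpp
#include <cstddef>
#include <iomanip>
#include <istream>
#include <sstream>

// Hypothetical helper: read one whitespace-delimited word into a
//  fixed-size char array without overrunning it.  setw(N) limits the
//  extraction to N - 1 characters plus the terminating '\0'.
template <std::size_t N>
bool readWord(std::istream & in, char (&buf)[N])
{
	buf[0] = '\0';			// defined contents even if nothing is read
	in >> std::setw(static_cast<int>(N)) >> buf;
	return !in.fail();
}
```

Reading "Philadelphia 30" into a char[8] stores only "Philade"; the rest of the word stays in the stream.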

Listing 1a.  Simple text database

// Simple text database using streams

#include <cstring>
#include <fstream>
#include <iomanip>
#include <iostream>

using namespace std;

struct team
{
	const char * name;	// const, so it can point at a string literal
	int gp;
	int wins;
	int losses;
	int ties;
	int gf;
	int ga;
	int pts;
};

// Write a team to an ostream (cout, ofstream, etc.) in text form

ostream &
operator<<(ostream & ostr, const team & theTeam)
{
	// write only the essential data, i.e., that which cannot
	//  be calculated

	// in our database, spaces are field (column) separators,
	//  so a team name can't have spaces

	ostr << theTeam.name << ' '
	     << theTeam.wins << ' '
	     << theTeam.losses << ' '
	     << theTeam.ties << ' '
	     << theTeam.gf << ' '
	     << theTeam.ga;

	return ostr;
}

// Read text-form team data from an istream (cin, ifstream, etc.)
//  into a team.  The name is allocated with new [], so the caller
//  must eventually delete [] it.

istream &
operator>>(istream & istr, team & theTeam)
{
	// assume no team name is longer than 19 letters; setw(20)
	//  guarantees that a longer one can't overflow the buffer

	char tmpname[20] = "";
	istr >> setw(20) >> tmpname;

	// keep only as many bytes as the name actually needs

	char * name = new char [strlen(tmpname) + 1];
	strcpy(name, tmpname);
	theTeam.name = name;

	// read the important numeric data

	istr >> theTeam.wins
	     >> theTeam.losses
	     >> theTeam.ties
	     >> theTeam.gf
	     >> theTeam.ga;

	// calculate the others on-the-fly (2 points per win, 1 per tie)

	theTeam.gp = theTeam.wins + theTeam.losses + theTeam.ties;
	theTeam.pts = theTeam.wins + theTeam.wins + theTeam.ties;

	return istr;
}

int
main()
{
	const int N_TEAMS = 6;

	// data taken from the Hamilton Spectator Sports section,
	//  Monday, February 22, 1999

	const team NHL[N_TEAMS] =
	{
		{ "Toronto",  56, 32, 20, 4, 181, 168, 68 },
		{ "Montreal", 59, 23, 28, 8, 139, 154, 54 },
		{ "Detroit",  59, 31, 23, 5, 175, 147, 67 },
		{ "NewYork",  57, 23, 27, 7, 158, 159, 53 },
		{ "Chicago",  56, 16, 32, 8, 131, 190, 40 },
		{ "Boston",   56, 23, 24, 9, 142, 132, 55 }
	};

	const char * const FILENAME = "nhl.dat";

	// Open an output stream to the file and overwrite (throw away) any
	//  previous contents

	ofstream out(FILENAME, ios::out | ios::trunc);
	if (!out)
	{
		cerr << "error saving " << '"' << FILENAME << '"' << endl;
	}
	else
	{
		out << N_TEAMS << endl;		// write the number of records

		for (int i = 0; i < N_TEAMS; i++)
		{
			out << NHL[i] << endl;	// write each record,
						// terminate with a newline
		}
	}
	out.close();		// close (and flush) the stream, so we can
				//  re-open the file for reading
	ifstream in(FILENAME);
	if (!in)
	{
		cerr << "error restoring " << '"' << FILENAME << '"' << endl;
	}
	else
	{
		// we don't know in advance how many records there will
		//  be, so we have to allocate the array size dynamically

		team * teamArr;
		int n_teams = 0;

		// read in number of teams (the array size)

		in >> n_teams;
		teamArr = new team [n_teams];

		for (int i = 0; i < n_teams; i++)
		{
			in >> teamArr[i];	// read in each record

			// write the team to cout, but we have to
			//  add the "games played" and "points" fields,
			//  because our insertion operator (<<) doesn't
			//  write them

			cout << teamArr[i] << ' '
			     << teamArr[i].gp << ' '
			     << teamArr[i].pts << endl;
		}

		// release the names allocated by operator>>, then the array

		for (int i = 0; i < n_teams; i++)
		{
			delete [] teamArr[i].name;
		}
		delete [] teamArr;
	}

	return 0;
}


Listing 1b.  Simple text database (contents of nhl.dat)

6
Toronto 32 20 4 181 168
Montreal 23 28 8 139 154
Detroit 31 23 5 175 147
NewYork 23 27 7 158 159
Chicago 16 32 8 131 190
Boston 23 24 9 142 132

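If we look at a hex dump of the database file, we see that every character, including each digit of the numbers, is stored as its ASCII code (the number 32, for example, is stored as the two bytes 33H 32H).  The field separators are spaces (ASCII 20H), and the record separators are carriage return (ASCII 0DH) and line feed (ASCII 0AH) pairs.  The pairs appear because the program ran under Windows 95, where a file stream in text mode translates each '\n' written by endl into CR LF; on UNIX each record would end with a single line feed.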

Listing 1c.  Hex dump of simple text database

00000000:  36 0d 0a 54 6f 72 6f 6e 74 6f 20 33 32 20 32 30   6..Toronto 32 20
00000010:  20 34 20 31 38 31 20 31 36 38 0d 0a 4d 6f 6e 74    4 181 168..Mont
00000020:  72 65 61 6c 20 32 33 20 32 38 20 38 20 31 33 39   real 23 28 8 139
00000030:  20 31 35 34 0d 0a 44 65 74 72 6f 69 74 20 33 31    154..Detroit 31
00000040:  20 32 33 20 35 20 31 37 35 20 31 34 37 0d 0a 4e    23 5 175 147..N
00000050:  65 77 59 6f 72 6b 20 32 33 20 32 37 20 37 20 31   ewYork 23 27 7 1
00000060:  35 38 20 31 35 39 0d 0a 43 68 69 63 61 67 6f 20   58 159..Chicago 
00000070:  31 36 20 33 32 20 38 20 31 33 31 20 31 39 30 0d   16 32 8 131 190.
00000080:  0a 42 6f 73 74 6f 6e 20 32 33 20 32 34 20 39 20   .Boston 23 24 9 
00000090:  31 34 32 20 31 33 32 0d 0a                        142 132..
00000099
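A dump like Listing 1c is easy to produce. The sketch below (hexDump is a hypothetical name) formats each byte of a string as two hex digits. It only shows the bytes it is given, so to see the CR LF pairs in a real file you would read the file with std::ios::binary, which turns off newline translation.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format every byte of 'data' as two lowercase
//  hex digits separated by spaces, e.g. "6\r\n" -> "36 0d 0a".
std::string hexDump(const std::string & data)
{
	std::string out;
	char byte[4];
	for (unsigned char c : data)
	{
		std::snprintf(byte, sizeof byte, "%02x",
		              static_cast<unsigned>(c));
		if (!out.empty())
			out += ' ';
		out += byte;
	}
	return out;
}
```

hexDump("Boston ") gives "42 6f 73 74 6f 6e 20", matching the bytes at offset 81H in Listing 1c.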

One limitation of our simple ASCII database is that string fields cannot contain spaces.  Because spaces separate fields, the program can't tell whether a space is a field separator or part of a field; that is why Listing 1a stores "NewYork" as a single word.  To allow spaces, we have to change the file format.
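To see the problem concretely, the sketch below (parseNameWins is a hypothetical name) feeds a record for New York to the same kind of >> chain used in Listing 1a. The name stops at the first space, so "York" is offered to the integer extraction, which fails and leaves the stream in a fail state.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: parse "name wins" with plain >> extractions,
//  as Listing 1a does.  Returns true only if both fields were read.
bool parseNameWins(const std::string & line, std::string & name, int & wins)
{
	std::istringstream in(line);
	in >> name >> wins;
	return !in.fail();
}
```

parseNameWins("New York 23", ...) fails with name == "New", while "NewYork 23" parses cleanly.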

If we write the length of each string field, as an integer, just before the field itself, the reading program knows exactly how many characters belong to that field, spaces included.  So, by adding one extra field per string to our database, we make it more flexible.
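A minimal sketch of the writing side, assuming std::string rather than the char * used in the listings (writeField is a hypothetical name): write the length, one separator, then the characters themselves.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write a string field as "<length> <characters>",
//  so the reader can tell where the field ends even if it has spaces.
std::ostream & writeField(std::ostream & out, const std::string & s)
{
	return out << s.size() << ' ' << s;
}
```

Since the stream is returned, the remaining fields can follow directly: writeField(out, "New York") << ' ' << 23 produces "8 New York 23".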

Listing 2a.  Writing a text database with spaces in fields

#include <cstring>
#include <fstream>
#include <iostream>

using namespace std;

struct teamStruct
{
	const char * name;	// const, so it can point at a string literal
	int wins;
	int losses;
	int ties;
	int gf;
	int ga;
	int gp;
};

int
main()
{
	const int N_TEAMS = 6;

	// data taken from the Hamilton Spectator Sports section,
	//  Monday, February 22, 1999
	// the last value is games played (gp), matching the struct

	const teamStruct NHL[N_TEAMS] =
	{
		{ "Toronto",  32, 20, 4, 181, 168, 56 },
		{ "Montreal", 23, 28, 8, 139, 154, 59 },
		{ "Detroit",  31, 23, 5, 175, 147, 59 },
		{ "New York", 23, 27, 7, 158, 159, 57 },
		{ "Chicago",  16, 32, 8, 131, 190, 56 },
		{ "Boston",   23, 24, 9, 142, 132, 56 }
	};

	ofstream out("nhl.db");
	if (!out)
	{
		cerr << "error opening \"nhl.db\" for writing" << endl;
		return 1;
	}

	out << N_TEAMS << endl;		// number of records

	for ( int i = 0 ; i < N_TEAMS ; i++ )
	{
		// write the name's length first, so the reader knows where it ends

		out << strlen( NHL[i].name ) << ' ' << NHL[i].name << ' '
		    << NHL[i].wins << ' '
		    << NHL[i].losses << ' '
		    << NHL[i].ties << ' '
		    << NHL[i].gf << ' '
		    << NHL[i].ga << ' '
		    << ( NHL[i].wins + NHL[i].losses + NHL[i].ties )	// games played
		    << endl;
	}

	return 0;
}


Listing 2b.  Text database allowing spaces in fields (contents of nhl.db)

6
7 Toronto 32 20 4 181 168 56
8 Montreal 23 28 8 139 154 59
7 Detroit 31 23 5 175 147 59
8 New York 23 27 7 158 159 57
7 Chicago 16 32 8 131 190 56
6 Boston 23 24 9 142 132 56

To read each record in such a database, we first read in the length of the string field.  This tells us not only how many characters are in the field, but also how much memory to allocate dynamically with new (length + 1 characters, leaving room for the terminating '\0').  So the format is both flexible (names may contain spaces) and efficient (each name uses only the memory it needs).

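We then use the ws manipulator (in >> ws) to skip the field separator (whitespace), call get() to read the name, and apply ws again to skip the next field separator.  The call in.get(buffer, length + 1) reads at most length characters and appends the terminating '\0'; it stops early only if it meets a newline.  Note that ws also skips any spaces at the start of a name, so leading spaces in a field would be lost.  We can then read all of the other fields as before.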
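The same reading step can be sketched with std::string and istream::read, which transfers exactly the requested number of characters and, unlike get(), does not stop at a newline. This is a swapped-in variant, not the method used in Listing 2c, and readField is a hypothetical name. Instead of ws, it consumes exactly one separator character after the length, so a name that begins with a space would survive.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read a field written as "<length> <characters>".
//  Returns false if the length is missing or negative, or the stream
//  ends before all the characters have been read.
bool readField(std::istream & in, std::string & s)
{
	long length = 0;
	if (!(in >> length) || length < 0)
		return false;
	in.get();				// consume the single ' ' separator
	s.assign(static_cast<std::size_t>(length), '\0');
	if (length > 0)
		in.read(&s[0], length);		// exactly 'length' characters
	return !in.fail();
}
```

Reading "8 New York 23" yields "New York", and a following in >> wins picks up 23 as usual.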

Listing 2c.  Reading a text database with spaces in fields

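#include <fstream>
#include <iomanip>
#include <iostream>

using namespace std;

struct teamStruct
{
	char * name;
	int wins;
	int losses;
	int ties;
	int gf;
	int ga;
	int gp;
};

int
main()
{
	teamStruct team;
	int n = 0, length = 0;
	ifstream in("nhl.db");
	if (!in)
	{
		cerr << "error opening \"nhl.db\" for reading" << endl;
		return 1;
	}
	in >> n;			// number of records
	cout << n << endl;
	for (int i = 0; i < n; i++)
	{
		in >> length;		// length of the name field
		cout << length << endl;
		team.name = new char [length + 1];	// +1 for the '\0'
		if (length > 0)
		{
			in >> ws;	// skip the separator before the name
			in.get(team.name, length + 1);	// at most length chars
			in >> ws;	// skip the separator after the name
		}
		team.name[length] = '\0';	// also covers an empty name
		// read the fields in the same order they were written
		in >> team.wins;
		in >> team.losses;
		in >> team.ties;
		in >> team.gf;
		in >> team.ga;
		in >> team.gp;
		cout << '"' << team.name << '"' << setw(3)
		     << team.wins << setw(3) << team.losses << setw(3) << team.ties
		     << setw(4) << team.gf << setw(4) << team.ga << setw(3) << team.gp << endl;
		delete [] team.name;
	}
	return 0;
}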



Back to the COMP435 page