Theory of Operation



DATAMAN is a general-purpose database and record manager. It consists of a database server that runs on a Linux/Unix system and a library of routines that are bound into an application program and called at run time. The package provides both indexed and sequential record access. Each file that DATAMAN handles has a physical size limit of approximately 9.22 x 10^18 bytes. The routines also provide multi-user record locking: internal file and record locking is performed invisibly to the user during file and index updates, and co-operative record locking is provided at the user level. Data records may contain zero or more fields that are blobs, which means that a data record may have both a structured and an unstructured part. The structured part of a data record may be up to 32 KB, and a record with blob data may be up to 2 GB.
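The per-file limit quoted above (roughly 9.22 x 10^18 bytes) matches the largest value a signed 64-bit byte offset can hold, 2^63 - 1. That this is where the limit comes from is our assumption; the text does not say. The small C sketch below only checks the arithmetic with the standard INT64_MAX constant; max_file_bytes and print_file_limit are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: the per-file limit is the largest byte offset a signed
 * 64-bit integer (such as a 64-bit off_t) can hold, i.e. 2^63 - 1. */
int64_t max_file_bytes(void)
{
    return INT64_MAX;   /* 9223372036854775807 */
}

/* Prints: 9223372036854775807 bytes (~9.22 x 10^18) */
void print_file_limit(void)
{
    printf("%lld bytes (~%.2f x 10^18)\n",
           (long long)max_file_bytes(), (double)max_file_bytes() / 1e18);
}
```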

General Information
Database Structure
Fields and Files
Indexes
Master vs Work Files
Datafile I/O
Designing a Datafile Layout
Building a Database
Index and File Pointers
Transaction Processing
Windowing

General Information
In general, the Dataman package tries to do everything it can for the user. It predefines records and fields for the user. When any database access occurs, the record is placed into a variable already defined for the user. The package knows how many fields are in a record, and when a record is retrieved it is parsed into its constituent fields. Records that are retrieved by referencing an index are placed in what is called the 'master file record'. In each of the language APIs, the most common way to access the data is as a string (appropriate for the language). In the C API this is an array of strings called mfld. In C++ it is a record class with an array of data fields called master. Java is similar: there is a predefined class named Dataman.master. In PHP it is simply an array named master. In C, you need to use conversion routines to change the data from a string to the type you want to use. In C++, some automatic conversion between strings, integers, and floats is done, depending on the operations you perform on them. Java has appropriate get and put routines. PHP, of course, uses its own type juggling to do its conversions.
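Dataman's own C conversion routines are not named here, so the sketch below uses the standard strtol and strtod to show the kind of string-to-number conversion a C program performs on a field. The helper names field_to_long and field_to_double are hypothetical, and tolerating trailing blanks is an assumption made in case a field is space-padded.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Accept the conversion only if the rest of the field is blank; trailing
 * spaces are tolerated in case a field is space-padded (an assumption). */
static bool rest_is_blank(const char *end)
{
    while (*end == ' ')
        end++;
    return *end == '\0';
}

/* Convert string field text to a long. Returns false on empty input,
 * non-numeric text, or overflow; *out is left untouched on failure. */
bool field_to_long(const char *text, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (end == text || errno == ERANGE || !rest_is_blank(end))
        return false;
    *out = v;
    return true;
}

/* Same idea for floating-point fields. */
bool field_to_double(const char *text, double *out)
{
    char *end;
    errno = 0;
    double v = strtod(text, &end);
    if (end == text || errno == ERANGE || !rest_is_blank(end))
        return false;
    *out = v;
    return true;
}
```

In a Dataman C program the argument would be one of the mfld strings, for example mfld[2] if (hypothetically) field 2 held a quantity. Returning a success flag rather than the value keeps a malformed field from silently becoming 0.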

Currently, in C, C++, and Java, the user must also use what the old Nixdorf/Entrex system called a workfile. This has traditionally been a 'scratch' space, used to keep disparate data from different records in one place. Running without a work file will soon be made a command-line option. The work file record is stored in the array wfld in C, in workfile in C++, and in Dataman.workfile in Java. The first record of the workfile is implicitly read at system initialization. In very simple applications the workfile is generally a dummy or placeholder file and is not really used.

Database Structure
There may, of course, be several databases on any one system, including copies of the same database. A database is identified by the directory in which it resides: on every connection to the server, the application must state where the root directory of the database is. So each separate root gives you a completely separate database. Under this database root there are three required directories: files, index, and blobs. The names are meant to be self-explanatory: files contains the data files for that database, index contains the indexes that refer to those files, and blobs contains the blob data that records in the database refer to. For example, for an accounting database you might define the root to be /var/databases/accounting. It doesn't matter whether the accounting database is the only one there, but a root of its own helps keep things logically separate.
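The text does not say whether the server or a utility creates these directories. As a minimal sketch under that uncertainty, the C function below creates a database root and its three required subdirectories by hand with POSIX mkdir(2). The function name make_db_skeleton and the 0755 permissions are our own choices.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* The subdirectories every database root must contain. */
static const char *const required_dirs[] = { "files", "index", "blobs" };

/* Create root and its required subdirectories with mode 0755.
 * Returns 0 on success, -1 on error. Directories that already exist are
 * accepted (note: EEXIST is also returned if a plain file has the name). */
int make_db_skeleton(const char *root)
{
    char path[4096];

    if (mkdir(root, 0755) != 0 && errno != EEXIST)
        return -1;
    for (size_t i = 0; i < sizeof required_dirs / sizeof required_dirs[0]; i++) {
        int n = snprintf(path, sizeof path, "%s/%s", root, required_dirs[i]);
        if (n < 0 || (size_t)n >= sizeof path)
            return -1;              /* path would be truncated */
        if (mkdir(path, 0755) != 0 && errno != EEXIST)
            return -1;
    }
    return 0;
}
```

mkdir is not recursive, so for the root /var/databases/accounting the parent /var/databases must already exist. Calling the function twice is harmless because existing directories are accepted.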

Fields and Files
A database may consist of one or more data files. A data file is somewhat analogous to a table in a relational database. A record format describes the data contained in a record, and is analogous to a table description: you define the fields (columns) for each record format, and the size of each field. The difference is that a single data file may have up to 63 different record formats defined in it. This means that a table may contain 'sub-tables', if you want to think of it that way. Another name for this structure is 'jagged tables'.

For example, the packaged pharmacy example program has a file containing the English translations of the Latin instructions used on prescriptions. This file would contain one record format, holding the English translation, the Latin SIG, and possibly a code number. Fields 1, 2, and 3 would each be 20 characters long, to hold all of the possible translation text; field 4 would be 7 characters long and hold the Latin SIG; and field 5 would be 3 bytes long and hold a numeric code. Now suppose you lived in an area heavily populated with Spanish-speaking people, and wanted the same information in Spanish as well. You designate a second record format, and insert a record of that format immediately after each record of the first. Because you know where the second record is, you don't need the Latin SIG or the numeric code in that format, or even a key on the second record: all that is needed is the Spanish translation. By defining the right formats, and because the structure of the database is determinate, you can establish relationships between records, or ownership of one record by another.
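To make the 'jagged table' idea concrete, the C sketch below describes the two record formats of the SIG file as plain data. The struct layout and names are hypothetical (this is not how Dataman stores format definitions), and the Spanish format's three 20-character fields are an assumption, since the text gives no widths for it. Widths are kept 1-based to match the field numbering.

```c
#include <stddef.h>

#define MAX_FIELDS 8      /* hypothetical cap for this sketch */

/* A record format: a format number plus the width of each field.
 * Widths are stored 1-based to match field numbering (index 0 unused). */
struct record_format {
    int    number;                    /* 1..63 within one data file */
    int    nfields;
    size_t width[MAX_FIELDS + 1];
};

/* Format 1: English translation (fields 1-3), Latin SIG (4), code (5). */
static const struct record_format sig_english = {
    .number = 1, .nfields = 5,
    .width = { 0, 20, 20, 20, 7, 3 },
};

/* Format 2 (assumed widths): Spanish translation only. It is stored right
 * after its format-1 record, so it needs no SIG, code, or key of its own. */
static const struct record_format sig_spanish = {
    .number = 2, .nfields = 3,
    .width = { 0, 20, 20, 20 },
};

/* Total bytes of structured data a format describes. */
size_t format_size(const struct record_format *f)
{
    size_t total = 0;
    for (int i = 1; i <= f->nfields; i++)
        total += f->width[i];
    return total;
}
```

Both formats live in the same data file; the Spanish record is 10 bytes smaller because it drops the SIG and code fields that its position already implies.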

Indexes
The index subdirectory contains the indexes to the database. All data is initially retrieved from master data files by referring to a key in an index: you cannot retrieve any data without first opening an index and performing one of the get routines. Currently an index may refer to data contained in up to 99 different files, and the API allows an application to open and reference up to six different indexes at any one time. As a result, up to 594 data files may be referenced at once.
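The 594 figure is the product of the two limits stated above. It is an upper bound, reached only if no two open indexes refer to the same data file:

```latex
\underbrace{6}_{\text{open indexes}} \times \underbrace{99}_{\text{files per index}} = 594 \ \text{data files}
```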

Internally, all keys in an index are the same length: any key shorter than the maximum length defined for that index is padded with null characters. If space is a concern, it may be advisable to have several indexes and to group keys of approximately the same length into the same index. The package also allows duplicate keys. Duplicate keys are differentiated by the order in which the data records they refer to were inserted into their data files.
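The C sketch below models the two rules just described: keys are stored at a fixed length with null padding, and duplicates are ordered by insertion. It is an illustration, not Dataman's on-disk format. KEY_LEN, struct index_entry, and the seq counter are hypothetical. Because every entry occupies KEY_LEN bytes whatever the key's real length, a separate index with a smaller KEY_LEN for short keys is where the space saving comes from.

```c
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 12   /* hypothetical maximum key length for one index */

/* One index entry: a fixed-length, null-padded key plus the insertion
 * sequence of the data record it points to (used to order duplicates). */
struct index_entry {
    char          key[KEY_LEN];
    unsigned long seq;
};

/* Copy src into a KEY_LEN buffer and pad the remainder with '\0'.
 * Keys longer than KEY_LEN are truncated. */
void pad_key(char dst[KEY_LEN], const char *src)
{
    size_t n = strlen(src);
    if (n > KEY_LEN)
        n = KEY_LEN;
    memset(dst, '\0', KEY_LEN);
    memcpy(dst, src, n);
}

/* Order by the full fixed-length key; break ties by insertion order. */
int entry_cmp(const void *a, const void *b)
{
    const struct index_entry *x = a, *y = b;
    int c = memcmp(x->key, y->key, KEY_LEN);
    if (c != 0)
        return c;
    return (x->seq > y->seq) - (x->seq < y->seq);
}

void sort_entries(struct index_entry *e, size_t n)
{
    qsort(e, n, sizeof *e, entry_cmp);
}
```

Null padding makes memcmp order a shorter key before any longer key it prefixes, just as strcmp would.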

Master vs Work Files
There are two types of files used in the Dataman database. The first is referred to as the MASTER file. This is a file that is currently being pointed to by an open index. Records may be retrieved from these files only by referencing the index name. These files are opened and closed as needed when the index refers to them, and a process may refer to any of the MASTER files during a session. A WORK file is a remnant of the requirements of the original EDITOR language, and is actually required only in C, C++, and Java. It is opened at the beginning of the process, and the first record of that file is read into memory. This file is never referenced by an index name, and there is always only one work file. The only exception to the latter is the sort routines: these routines create indexes, and require that all files to be included be named as work files. The procedures release and sort operate only on work files. In C a master record is referred to through the mfld array, and the work file record through the wfld array. In C++ and Java the two instantiated record class objects are named master and workfile. In PHP the master record is a global array named $master. These arrays and classes are global and never need to be initialized by the user.

Datafile I/O
Input and output from data files is simplified: just by moving to a new record in the database, the user writes the current record and retrieves the new one. A data record is returned as strings in the master file or work file arrays. Because these arrays are indexed by field number, and fields are numbered from 1, the zeroth array element is not used. Do not refer to element 0 of these arrays.

For a record to actually be sent to the server and written to the data file, it must be 'marked'. There are macros and functions that will mark a record, and after you modify any data in a record it is suggested that you call the appropriate marking routine. This way, there is no network overhead transferring records that do not need to be updated. Also remember that the C library does no error checking for data written beyond the end of a data field; this is consistent with the C language. In all cases, the user is responsible for the integrity of the data fields.
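The source does not name the marking macros, so the C sketch below uses a hypothetical stand-in record to show the three ideas of this section together: fields are numbered from 1 (element 0 is never touched), writes are kept within the field's declared width (a check the C library itself leaves to you), and only a marked record is 'sent' when the application moves on. struct record, set_field, and flush_if_marked are illustrative names, not Dataman API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NFIELDS   3
#define FIELD_MAX 32   /* hypothetical buffer size per field */

/* Stand-in for a record buffer: fields are numbered from 1, so element 0
 * of each array is never used, mirroring the mfld/wfld convention. */
struct record {
    char   field[NFIELDS + 1][FIELD_MAX];
    size_t width[NFIELDS + 1];       /* declared width of each field */
    bool   marked;                   /* true once the record needs writing */
};

/* Store text in field n, truncated to the field's declared width, and
 * mark the record as modified. */
void set_field(struct record *r, int n, const char *text)
{
    assert(n >= 1 && n <= NFIELDS);  /* never touch element 0 */
    size_t w = r->width[n];
    assert(w < FIELD_MAX);
    snprintf(r->field[n], w + 1, "%s", text);
    r->marked = true;
}

/* Called when the application moves to another record: only a marked
 * record would be sent to the server. Returns whether a write happened. */
bool flush_if_marked(struct record *r)
{
    if (!r->marked)
        return false;                /* no network traffic for clean records */
    /* ...a real client would send the record to the server here... */
    r->marked = false;
    return true;
}
```

Truncating inside set_field is one design choice; rejecting over-long input would be equally valid. The point is that the check has to live in your code.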

Designing a Datafile Layout
The first thing to do when implementing a database application is to determine the data elements that will be used. In an extremely simple charge account system, the necessary data would include the account name, address, city, state, zip code, and account number. There would also be transaction records, including the transaction date, amount, location, and perhaps some sort of transaction code. Note that we are already thinking in terms of different record formats: the first would be a demographic record and the second the charge record. The most logical way to organize this database is to have each demographic record immediately followed by multiple occurrences of the charge record. This removes the need for keys on every record and groups related data together. All of these considerations should be taken into account when designing your database.
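The C sketch below shows why the layout just described needs keys only on the demographic records: the charges belonging to an account are simply the run of charge records that follow it in the file. The array stands in for the data file in file order; the format numbers, struct file_record, and account_balance are hypothetical, and only the amount field is modelled.

```c
#include <stddef.h>

/* Hypothetical record-format numbers for the charge account file. */
enum { FMT_DEMOGRAPHIC = 1, FMT_CHARGE = 2 };

/* A record as it sits in the data file: its format number and, for
 * charges, the amount in cents (other fields omitted in this sketch). */
struct file_record {
    int  format;
    long amount_cents;
};

/* Starting at the demographic record at index owner, total the charge
 * records that follow it, stopping at the first record that is not a
 * charge (normally the next demographic record) or at the end of the
 * file. No keys are needed: ownership is positional. */
long account_balance(const struct file_record *recs, size_t n, size_t owner)
{
    long total = 0;
    for (size_t i = owner + 1; i < n && recs[i].format == FMT_CHARGE; i++)
        total += recs[i].amount_cents;
    return total;
}
```

In a real application the demographic record would be found through its key, and the following charge records read sequentially from there.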

Building a Database
After all of the data elements needed have been defined, group related data into different files. In the charge account system described above there would be only one file. In a large system, such as the example prescription-filling program, there are several files: one each for doctors, patients, drugs, and so forth. Determine the structure of each file and the data required for its initial record, since a data file must have at least one record in it. Then use the mkdf utility to build the data file.

After the data files are built, you need to determine which record formats need index keys and how those keys will be built. Again drawing on the prescription-filling example: since an index can refer to data in multiple files, a sort character is prepended to each key depending on which file the record is in. This convention is entirely arbitrary and up to the developer; it is simply how it was done for -this- particular application. Next the developer needs to write a sort routine. This routine loops through each record and enters keys into the database as needed for the application. You now have a database that is ready for use.
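The key-building half of such a sort routine might look like the C sketch below. The sort characters ('D', 'P', 'R'), the file identifiers, KEY_LEN, and the function names are all hypothetical. The Dataman call that actually enters a key into an index is not named in this section, so the sketch collects the keys in an array instead of inventing that call.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define KEY_LEN 16   /* hypothetical maximum key length, including '\0' */

/* Hypothetical files sharing one index, and the sort character chosen for
 * each. As the text says, the choice of characters is arbitrary. */
enum file_id { FILE_DOCTORS, FILE_PATIENTS, FILE_DRUGS };
static const char sort_char[] = { 'D', 'P', 'R' };

/* Prefix the record's key field with its file's sort character. */
void build_key(char key[KEY_LEN], enum file_id file, const char *key_field)
{
    snprintf(key, KEY_LEN, "%c%s", sort_char[file], key_field);
}

/* Skeleton of a sort routine for one file: loop over its records and build
 * a key for every record whose format carries one (NULL = no key). A real
 * routine would pass each key to the Dataman call that enters it into the
 * index; here the keys are collected in out instead. Returns the count. */
size_t collect_keys(enum file_id file, const char *const key_fields[],
                    size_t nrecs, char out[][KEY_LEN])
{
    size_t count = 0;
    for (size_t r = 0; r < nrecs; r++) {
        if (key_fields[r] == NULL)
            continue;                /* e.g. a positional sub-record */
        build_key(out[count++], file, key_fields[r]);
    }
    return count;
}
```

Because every patient key starts with the same character, all patients sort together in the shared index, separate from doctors and drugs.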

Index and File Pointers
Each open index has a key pointer (KP) and a master file record pointer (MFRP). The KP refers to the key retrieved by the last get routine. The MFRP refers to the data record that that particular index last retrieved. Because you can retrieve records independently of key retrieval, the MFRP does not necessarily point to the record that was current when the KP was set. The following table shows how the KP and MFRP are affected on successful completion of each database function.

Command        KP                   MFRP
-----------    -----------------    ---------
get            updated              updated
get_first      updated              updated
get_last       updated              updated
get_next       updated              updated
get_prior      updated              updated
get_current    updated              updated
forward        unchanged            updated
back           unchanged            updated
insert         unchanged            updated
delete         unchanged/invalid    updated
include        updated              unchanged
remove         unchanged/invalid    unchanged
protect        unchanged            unchanged
clear          unchanged            unchanged
save           unchanged            unchanged
restore        updated              updated
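Client code that needs to reason about which pointer a call moves could keep the table as data. The C sketch below transcribes it row for row and adds a lookup. The enum, struct, and lookup_effect are hypothetical helpers, not part of the Dataman API.

```c
#include <stddef.h>
#include <string.h>

enum effect { UNCHANGED, UPDATED, UNCHANGED_OR_INVALID };

struct pointer_effect {
    const char *command;
    enum effect kp;     /* key pointer */
    enum effect mfrp;   /* master file record pointer */
};

/* The KP/MFRP table, row for row (effects after successful completion). */
static const struct pointer_effect effects[] = {
    { "get",         UPDATED,              UPDATED   },
    { "get_first",   UPDATED,              UPDATED   },
    { "get_last",    UPDATED,              UPDATED   },
    { "get_next",    UPDATED,              UPDATED   },
    { "get_prior",   UPDATED,              UPDATED   },
    { "get_current", UPDATED,              UPDATED   },
    { "forward",     UNCHANGED,            UPDATED   },
    { "back",        UNCHANGED,            UPDATED   },
    { "insert",      UNCHANGED,            UPDATED   },
    { "delete",      UNCHANGED_OR_INVALID, UPDATED   },
    { "include",     UPDATED,              UNCHANGED },
    { "remove",      UNCHANGED_OR_INVALID, UNCHANGED },
    { "protect",     UNCHANGED,            UNCHANGED },
    { "clear",       UNCHANGED,            UNCHANGED },
    { "save",        UNCHANGED,            UNCHANGED },
    { "restore",     UPDATED,              UPDATED   },
};

/* Look up a command's row; returns NULL for an unknown command. */
const struct pointer_effect *lookup_effect(const char *command)
{
    for (size_t i = 0; i < sizeof effects / sizeof effects[0]; i++)
        if (strcmp(effects[i].command, command) == 0)
            return &effects[i];
    return NULL;
}
```

For example, lookup_effect("forward") reports that the KP is unchanged while the MFRP moves, which is exactly the case in which the two pointers stop agreeing.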


Transaction Processing
There is now a basic form of transaction processing, which introduces new verbs to the Dataman package. These are not part of the EDITOR language, so they deserve a separate section. The new verbs are (with variations appropriate for their language) start_transaction, rollback, and commit. Starting a transaction notifies the application that we are now in a transaction, and that nothing done until the corresponding rollback or commit is to be written to the database. Starting a transaction does not perform an implicit protect() call on a record. If you are in a multi-user environment, you still need to use the protect and clear verbs: if you are modifying a record and someone else modifies the same record, the two applications can overwrite each other's changes. Also note that the work record is -not- part of the transaction. If you are trying to update the workfile record, do not do it in a transaction.

When rollback is called, the transaction is terminated and the database is not updated. When commit is called, the transaction is terminated and all parts of the transaction are performed atomically: if any part fails, all parts fail, and it is as if rollback had been called. At present, the transaction processing is not fully ACID compliant, but it behaves in much the same way.

The fact that nothing is written to the database until commit is called may cause some confusion. Even the user who made the changes cannot see the modified data once she successfully performs any database call that retrieves a new key or record. Only after the commit does the new data become generally available to -all- users.
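The C sketch below is a toy, single-process model of the behaviour described in this section, not Dataman's implementation. Inside a transaction, writes are only queued. Rollback discards the queue. Commit validates every queued write before applying any, so one bad part fails the whole commit. Reads see only committed data. The store, its sizes, and the txn_* names are hypothetical, chosen so as not to be confused with the real start_transaction, rollback, and commit verbs. The model also simplifies the visibility rule: here pending writes are never readable, even before another record is fetched.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NSLOTS   8    /* hypothetical: a tiny store of 8 records */
#define MAX_PEND 8    /* hypothetical: at most 8 writes per transaction */
#define REC_LEN  16

struct pending_write {
    int  slot;
    char data[REC_LEN];
};

struct store {
    char committed[NSLOTS][REC_LEN];      /* what every user can see */
    struct pending_write pending[MAX_PEND];
    int  npending;
    bool in_txn;
};

void txn_start(struct store *s)
{
    s->in_txn = true;
    s->npending = 0;
}

/* Outside a transaction a write is applied at once; inside one it is only
 * queued, and its target slot is not checked until commit. */
bool txn_write(struct store *s, int slot, const char *data)
{
    if (!s->in_txn) {
        if (slot < 0 || slot >= NSLOTS)
            return false;
        snprintf(s->committed[slot], REC_LEN, "%s", data);
        return true;
    }
    if (s->npending == MAX_PEND)
        return false;
    s->pending[s->npending].slot = slot;
    snprintf(s->pending[s->npending].data, REC_LEN, "%s", data);
    s->npending++;
    return true;
}

/* Reads see committed data only (slot must be valid). */
const char *txn_read(const struct store *s, int slot)
{
    return s->committed[slot];
}

void txn_rollback(struct store *s)
{
    s->npending = 0;                      /* discard every queued write */
    s->in_txn = false;
}

/* All or nothing: check every queued write first; if any would fail,
 * behave exactly like a rollback. Otherwise apply them all. */
bool txn_commit(struct store *s)
{
    for (int i = 0; i < s->npending; i++) {
        if (s->pending[i].slot < 0 || s->pending[i].slot >= NSLOTS) {
            txn_rollback(s);
            return false;
        }
    }
    for (int i = 0; i < s->npending; i++)
        memcpy(s->committed[s->pending[i].slot], s->pending[i].data, REC_LEN);
    s->npending = 0;
    s->in_txn = false;
    return true;
}
```

Validating everything before touching committed data is the simplest way to get all-or-nothing behaviour. A real server must also survive crashes mid-apply, which is part of what full ACID compliance would add.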

Windowing
The C and C++ API libraries contain a very rudimentary windowing system. These routines can create simple windows, perform cursor-positioned reads and writes, and display system warnings that require end-user input to continue. If you built the libraries with DWINDOW defined in config.h.in, the system is initialized automatically when you call either of the basic Dataman initialization routines, init_dataman or mkidx.

If you use this system, there is a keystroke that will return from a system warning, or make a positioned read return false. Historically this key was labelled 'HELP' on the NIXDORF keyboard, so here it is called the HELP key. It defaults to the 'Home' key on modern keyboards. If you want it to be something else, set the environment variable HELP to an appropriate value. You can define it with a numeric value: for example, if you want the HELP key to be <ctrl>A, you would set HELP to the numeric value '1'. You could also set it to the string "^A".
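How Dataman itself parses HELP is not documented here. The C sketch below shows one way to accept both forms the text describes, a decimal key code such as "1" or caret notation such as "^A", falling back to a default otherwise. HELP_DEFAULT is a hypothetical sentinel standing in for the Home key (a curses program would use KEY_HOME), and restricting numeric codes to 0 to 255 is our choice.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical sentinel for "the Home key", the documented default; a real
 * terminal layer would use its own code (e.g. curses KEY_HOME). */
#define HELP_DEFAULT (-1)

/* Interpret a HELP setting: "^A".."^Z" (control-letter notation) or a
 * decimal key code such as "1". Anything else falls back to the default. */
int parse_help_key(const char *value)
{
    if (value == NULL || *value == '\0')
        return HELP_DEFAULT;
    if (value[0] == '^' && isalpha((unsigned char)value[1]) && value[2] == '\0')
        return toupper((unsigned char)value[1]) & 0x1f;   /* ^A -> 1 */
    char *end;
    long code = strtol(value, &end, 10);
    if (*end == '\0' && code >= 0 && code <= 255)
        return (int)code;
    return HELP_DEFAULT;
}

/* Read the setting from the environment. */
int help_key(void)
{
    return parse_help_key(getenv("HELP"));
}
```

Masking the upper-case letter with 0x1f is the standard mapping from a control-letter to its ASCII code, which is why "^A" and "1" select the same key.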

Window and screen positions begin in the upper left corner at co-ordinate 0,0. All positions are relative to the display's upper left corner, not to the 'window' you are in. A window is merely an area on the screen that can be drawn with a border and a colored background. The area under the window is saved, and when the window is popped, that area is restored.
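The save-and-restore behaviour can be modelled with a character array standing in for the terminal, as in the C sketch below. Pushing a window copies the rectangle underneath it, using absolute screen co-ordinates with 0,0 at the upper left, and then paints it. Popping copies the saved area back. The 24x80 screen, the fill character used in place of a colored background, and the window_push/window_pop names are hypothetical; borders are omitted.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define ROWS 24
#define COLS 80

static char screen[ROWS][COLS];   /* stand-in for the terminal contents */

/* A pushed window remembers the rectangle it covered. Coordinates are
 * absolute screen positions, with (0,0) at the upper left corner. */
struct window {
    int   row, col, height, width;
    char *saved;                   /* height * width bytes that were under it */
};

/* Save the area under the window, then paint it with fill. */
bool window_push(struct window *w, int row, int col, int height, int width,
                 char fill)
{
    if (row < 0 || col < 0 || height <= 0 || width <= 0 ||
        row + height > ROWS || col + width > COLS)
        return false;
    w->saved = malloc((size_t)height * (size_t)width);
    if (w->saved == NULL)
        return false;
    w->row = row; w->col = col; w->height = height; w->width = width;
    for (int r = 0; r < height; r++) {
        memcpy(w->saved + (size_t)r * width, &screen[row + r][col], (size_t)width);
        memset(&screen[row + r][col], fill, (size_t)width);
    }
    return true;
}

/* Restore exactly what was under the window and free the save buffer. */
void window_pop(struct window *w)
{
    for (int r = 0; r < w->height; r++)
        memcpy(&screen[w->row + r][w->col], w->saved + (size_t)r * w->width,
               (size_t)w->width);
    free(w->saved);
    w->saved = NULL;
}
```

Overlapping windows must be popped in the reverse order of pushing; otherwise a restore can bring back pixels that belonged to a window that is still open.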