E. F. Codd's formal definition of the relational model and his insight that a database was really a formal logic system made up of propositions was published just before the advent of modern operating systems. SQL is a (more or less) standardized language for querying a database. It was developed by IBM for their System R relational database management system and was originally called SQUARE and SEQUAL. When SQL was developed, UNIX was not in existence and the original SQL developers were unable to contemplate a query language integrated with the underlying operating system (except for PICK, which still enjoys some aficionados. However, Pick Systems was acquired by OMNIS in December, 2000 and is now subsumed into Raining Data).

Every implementation of SQL is slightly different, and each manages to obscure the underlying concepts in a unique and imaginative way:

I used to think that UNIX integrators introduced frivolous incompatibilities because they simply didn't understand the costs. But I've realized that the problem runs deeper: they like the costs. UNIX integrators are competing with each other for new UNIX users; each integrator wants its users to stick to its system. As in Garrett Hardin's ``Tragedy of the Commons,'' every integrator sees an immediate benefit in introducing an incompatibility, even though these benefits eventually add up to a giant loss.

--- Dan Bernstein

The most important reason to use the operator/stream paradigm is that it most closely models the facilities of the underlying operating system. SQL and other (third generation, procedural) paradigms put up artificial walls of abstraction between the user and the actual operation of the underlying system, providing a gilded software cage which stunts imagination and innovation, to say nothing of performance.

There are several positive reasons for using SQL:

  • 1) you already know it.
  • 2) you have to use it where it's already being used.
  • 3) you need a scalable database with indices, and flat files don't offer the flexibility and performance that "real" database systems provide.

    1) and 2) are darned good reasons. If you already know SQL, and/or you or others have a lot of time or effort invested in SQL code written for yourself or others, then there is still a learning curve in order to properly understand and use /rdb in conjunction with the UNIX operating system. This is not an added cost without prior SQL experience.

    Is /rdb a "real" database management system?

    A specific formal implementation of a relational query language is said to be relationally complete if it can be used to express any query that the relational algebra supports. Further, applying any relational operator to a table (relation) must produce a well-formed table (another relation). /rdb implements "relations" as tables of rows (tuples) and columns (attributes), and provides the shell-level operators

    ACID

    Constraints

    Proper normalization is a first step in enforcing constraints. Then, specifying and applying rules and guaranteeing their enforcement prior to any write operation allows a formal proof of "correctness", though most people are more cavalier. Chris Date's commentary defines what we can say about our stuff, and how to make sure that what we say is true, or correct, and provides an execllent tutorial about constraints. Another way of thinking about the combination of a relational database and its constraints is to regard the column names as predicates and each row as a "true" statement using them. Constraints are expressed well in a declarative language like Prolog. /rdb has several facilities to assist integration with Prolog.

    What html scripting language supports /rdb?

    Here's an example using mod_dtcl:

    Does /rdb support indices?

    An indexed search is a high-speed natural join where the first table consists only of keys (the search argument(s)). /rdb supports five types of indexing methods. including hash. Once a hash index is created, additions, deletions and searches take an inconsequential time. The overhead for /rdb's hash index is ten bytes/record, and the time to create the index involves reading the original table, writing 10 bytes of hash index per record, and computation. I/O overhead comprises only 2% of the indexing CPU overhead, while computation overhead takes the other 98%: indexing overhead is reduced nearly linearly with increased cpu speed.

    Is /rdb scalable?

    Using the operator/stream paradigm results in "embarrassing parallelism" and benefits effortlessly from SMP multiprocessor systems. It is the work already done by the implementors of SMP aware operating systems that allows near linear speedup as additional processors are added. Developments in grid configurations like MOSIX are especially scalable.

    No unnecessary thought need be applied to questions of threads, thread-safety, or thread awareness. Unless, of course, you need to descend to using the C-callable subroutine library. Threads are seen by some as a bad idea.

    Connection Pooling

    The description of the problem doesn't get any better than at what used to be ars digita till Philip unloaded it to RedHat.

    Fastcgi as well as the cgi daemon approach to Apache web serving minimize the impact of server forked cgi processes that listen for keys on their standard input and write rows to their standard output, allowing the latency to access the database to approach a theoretical minimum, i.e. interprocess bandwidth (plus some constant). The least optimized design only involves several more forks per query. A 166mhz pc forks slightly under 1,000 times/second, or every .001 seconds. Using the operator/stream query paradigm rather than the third generation procedural SQL paradigm enables optimization and measurement. As traffic through a pipeline is decreased, the time to complete a given query can improve to as little as one or two forks per query operator, plus bandwidth latency. Scalability is not a problem.

    Why not use C? (or C++, or C#, or ...)

    ... it is strongly recommended that you think twice before writing C or other language programs. The shell, UNIX tools, and /rdb commands are so powerful, fast, and easy to develop, you seldom need to bother with the old third generation programming languages. If you think of writing a program in C by habit, try to break the habit. Always try to do things in the UNIX shell and /rdb first. Only resort to C when there is a compelling reason, as opposed to a compulsion. We consider resorting to C as a failure of imagination, or lack of knowledge or insight, in most cases. If you think you need a C program, you may really need to know more shell programming tricks. Think of your problems as a more general problem, and check to see if there is a UNIX or /rdb program that will do the job.

    Speed is the most common excuse for descending to C coding. Remember that computers are getting faster and cheaper. Modern UNIX shells are so fast that the speed penalty with respect to C programming is significantly narrowed. You can see if it is too slow before coding. Often our intuitions are wrong in these matters. Also, the users might change their minds and decide on a different way to approach the problem. Your fast prototype will then save you a lot of unnecessary C coding.

    If you do descend to C, try to write small programs that can be used in future shell programming. Only write what is necessary. Read from the standard-in and write to the standard-out. Make your programs table driven and use database table and list formats so that the full power of UNIX and the database can manipulate the tables that drive your programs. Of course, if you are a devotee of pain and suffering, forget this advice.

    Developers and hackers point out that "the more you can avoid programming in C the more productive you will be".