Database Stuff

C.J. Date says...

Relational systems should be used like simple access methods. Don't do things like code loops and joins by hand; let the relational system do it.

Avoid the use of any SQL constructs that reveal the implementation.

Don't fall into the common trap of thinking the term "relation" means a base relation specifically---or, in SQL terms, thinking the term "table" means a base table specifically. (Manni's note: but it's hard to think of a view as not a table when you know it's not, and you know there is a performance hit in current implementations.)

Do your best to avoid type coercions wherever possible.

Ensure that columns with the same name are always of the same type.

Use varchar over char in particular, and, in general, avoid using types that pad space.

Avoid possibly non-deterministic expressions as much as you can.

Don't use pointers or reference values. (Pointers are prohibited in the relational model.)

Since null is not a value, the phrase "null value" is a contradiction; don't use the phrase "null value" say "null" instead.

Appartently, entire rows can be compared in SQL. (Manni recommends never comparing rows, just individual scalars/items/cells.) Unless the rows being compared are of degree one (and thus effectively scalars), don't use the comparison operators "<", "<=", ">", and ">="; they rely on left-to-right column ordering (which the relational model does not support), they have no straightforward counterpart in the relational model, and in any case they're seriously error prone.

In the relational model, because there is no column ordering, the union operator requires its operands to have the same heading (and hence the same attribute names), and the result then has the same heading as well. As a prerequisite to enforcing the same headers on unioned relations, if two columns in SQL represent "the same kind of information", give them the same name wherever possible. Conversely, if two columns represent different kinds of information, it's usually a good idea to give them different names.

Never write SQL that relies on the ordinal positioning of columns. The relational model does not support column order.

Make sure you know when SQL eliminates duplicates for you, and when it does not. In those cases where you do have to specify duplicate removal, ensure you know whether it matters if you don't. In those cases where it matters, specify DISTINCT. And never specify ALL!

WHOLE LIST OF RECOMMENDATIONS ON AVOIDING NULLS IN SQL

Specify NOT NULL, explicitly or implicitly, for every column in every base table.

Don't use the keyword NULL in any other context whatsoever (i.e., anywhere other than a NOT NULL constraint).

Don't use the keyword UNKNOWN in any context whatsoever.

Don't omit the ELSE clause from a CASE expression unless you're certain it would never be reached anyway.

Don't use NULLIF.

Don't use outer join, and don't use the keywords OUTER, FULL, LEFT, and RIGHT. (Manni: However, for generating some reports, left outer join is way too useful; where possible, use COALESCE so that the resulting relation has no nulls.)

Don't use union join.

Don't specify either PARTIAL or FULL on MATCH (they have meaning only when nulls are present). For similar reasons, don't use the MATCH option on foreign key constraints, and don't use IS DISTINCT FROM. (In the absense of nulls, the expression x IS DISTINCT FROM y is equivalent to the expression x <> y.)

Don't use IS TRUE, IS NOT TRUE, IS FALSE, or IS NOT FALSE. The reason is that, if bx is a boolean expression, then the following equivalences are invalid only if nulls are present:

bx IS TRUE      = bx
bx IS NOT TRUE  = NOT bx
bx IS FALSE     = NOT bx
bx IS NOT FALSE = bx

In other words, IS TRUE and the rest are distractions at best, in the absence of nulls.

Use COALESCE on every scalar expression that might otherwise evaluate to null.

END OF NULL SECTION

Don't do SQL updates through a cursor.

Try to avoid operations that are inherently row-level. This recommendation doesn't prohibit set level operations in which the set just happens to be of cardinality one, as in the following example:

update s
set    city = 'New York'
wehre  sno = 'S5';

Always name the columns in an insert. Hence, prefer

insert into sp (pno, sno, qty) values ( 1, 2, 3 );

insert into sp values ( 1, 2, 3 );

In SQL, never lie to the system by defining as a key some column combination that you know isn't irreducible.

For base tables, use PRIMARY KEY to ensure that every such table has at least one key.

Ensure that foreign key columns have the same name as the corresponding key columns wherever possible. (Manni: I like my naming convention better.)

Two situations where this recommendation cannot be followed:

when some table T has a foreign key matching some key of T itself
when some table T2 has two distinct foreign keys both matching the same key K in table T1

Don't use any operation that violates closure if you want the result to be amenable to further relational processing.

JOINS (I don't like these recommendations because I like explicit joins)

Use NATURAL JOIN in preference to other methods of joining.

Avoid JOIN ON because it's guaranteed to produce a result with duplicate column names. (Ensure you do appropriate column renaming where you do use JOIN ON.)

In fact, generally ensure there are not duplicate column names. (Manni: makes sense because tuples, unlike rows, cannot have duplicate column names. This would also explain why Python's DB API default returns rows indexed by column ordinal location and not mapped by column names; because duplicate names are allowed.)

END JOINS

UNIONS

Ensure corresponding columns have the same name and type.

Always specify CORRESPONDING if possible, or, barring that (or your RDBMS's support for CORRESPONDING) ensure columns line up properly by specifying them in the select statements. (Manni: *always* speciify return columns, and *always* alias/name them, whether using union or not, and *never* use 'select *'.)

Never specify ALL.

END UNIONS

If you use GROUP BY or HAVING, make sure the table you're summarising is the one you really want to summarise. Also, be on the lookout for the possibility that some summarisation is being done on an empty set, and use COALESCE wherever necessary.

Given the state of today's SQL products, some constraint checking will almost certainly have to be deferred, even though logically speaking it should be immediate. In such a case, you should do whatever it takes---which in practice probably means terminating the transaction---to force the check to be done before performing any operation that might rely on the constraint being satisfied. (Manni: this seems a bit draconian, but definitely follow the spirit of this.)

Do not bother listing the column names in CREATE VIEW, because the sql in your view definition should specify the column names anyway. (Manni: and, again, *always* specify return columns and their aliases/names.)

CREATE VIEW also allows WITH CHECK OPTION to be specified if it regards the view as updateable. Always specify this option if possible. (Manni: I don't like updating views; I don't trust it to work; perhaps I should?)

Although SQL does not allow constraints on views, include such specifications in the form of comments in your view creation code. (Manni: seems superfluous.)

Specify WITH CASCADED CHECK OPTION on view definitions wherever possible.

Never use the term "view", unqualified, to mean a snapshot; never use the term "materialised view". (Manni: whatever.)

Don't use ALL or ANY comparisons---they're error-prone, and, in any case, their effect can always be achieved by other methods.

Favour the use of explicity range variables, expecially in "complex" expressons.