Eric’s Axioms of Data Management #1

Nov 6 2006

If supporting foreseeable business changes requires adding tables or columns to your database, the design is deficient.

Over the years I have learned many principles regarding database design. Some I discovered on my own, usually through my own mistakes and pain. Some passed to me through a mentor or seminar. But the very first of all these is still in my Top Ten list of database design ideas:

If supporting foreseeable business changes requires changing the structure of your database, you have a problem with the design.

The database design should not have to change when obvious expected things happen. Things like adding a new customer, or updating a product's price, or . . . well, August.

This nugget came to me from a coworker in my first position where I dealt with databases although I don't think he realized it at the time. I had recently migrated an Excel-centric department reporting application over to Microsoft Access 2.0. I gained a bit of notoriety in the division as having some expertise with Microsoft Office in general and others frequently asked for tips or help troubleshooting.

So a co-worker asks me one day, "Eric, is there a limit to the number of tables you can have in a UNION query?"

"Um..."

I didn't know! At first I was somewhat flustered that I didn't know. But then I also couldn't think of a scenario where it would matter too much. So I responded:

"Why do you ask?"

He drew my attention to his monitor where he pointed at the tables in his database, a work-in-progress. They were:

MonthlyResults_Jun94
MonthlyResults_Jul94
MonthlyResults_Aug94
MonthlyResults_Sep94

As dense as I can be, I could even see where this was going.

So I introduced him to my friend, the "date" type and showed how, with a simple yet magical column called "EffectiveMonth," one big table can actually store all these results and you can just flag each one by what month the results were for.

"Oh." He paused, bewildered. "Wow, that's neat!"

Indeed it is.

I have since come to believe this principle is one of the clearest marks of good database design. Or good software design in general. Think about likely changes and build a structure that won't make the obvious changes a nightmare to handle.

A "senior" developer I worked with more recently in my career was fond of saying, "You can't anticipate future requirements." I argue instead: You must anticipate future requirements if you call yourself a professional. Otherwise, you are just a typist who knows some syntax.

(As an aside, check out what has to be the worst example of this I have seen to date.)

Database Design , ,

Microsoft Word is a Legal Document

Feb 25 2005

To say "Access is a relational database" is as incorrect as saying, "Word is a legal document." We need to be more clear than this as data professionals.

Terminology confusion abounds in the software industry and I believe the worst examples are in database management in particular. So I shouldn't have been surprised to read the nonsense I discuss here... but still I was. The following quote opens up edition 6.14 of Woody’s Access Watch, a normally fine e-newsletter focused on building Microsoft Access solutions:

Access is a normalized database (except for purists who don’t count a database as normalized unless it adheres to all 13 of the Codd & Date rules), and that means that you should store linked data in different tables.

If you don’t reel from an affront to your sensibilities from that one sentence, you should. This is no different, to me, than saying:

Microsoft Word is a legal document.

Sounds silly, right?

But did the Access Watch quote sound as odd to you? It should. It reveals many misunderstandings about database technology and practices and is confused and muddled, something we of all people should avoid.

Ignoring the poor grammar, here are the technical reasons this WOW statement should offend you as a technology professional:

 

1. Microsoft Access is not a database.

Read that again. Access is not a database. A database is an organized collection of information accessible for some purpose. Is Access an organized collection of information? No! It is a piece of software. Many people are far too cavalier with the word “database.” Database professionals—those in our society who are responsible for (and presumably trained for) organizing and classifying the world’s information—should be the least so.

Fundamentally, Access is a “database management system” or “DBMS” for short. You could call it “database software” or even, loosely, a “database system.” But it not a database. “Northwind” and “pubs” are databases. IMDB.com is (largely) a database. The card catalog at the library is a database. But Microsoft Access is a DBMS.

Calling Access a database is like calling Microsoft Word a “document.”

 

2. Nothing guarantees that any particular database implemented within Access will be normalized or not normalized . . .

. . . any more than using Word to write documents guarantees they will be legally enforceable documents.

In my experience databases implemented with Access frequently violate all sorts of normalization rules. One recent example I saw had a single column, “[City, State]” (yes, including the comma and space!). This clearly violates first normal form (“1NF”).

And don’t mistake this for an Access limitation or problem. No DBMS forces you to follow good database design practices. I have seen equally good and bad designs in Access, Oracle, and SQL Server. (Okay, the more advanced tools are somewhat more likely to be used by better trained professionals. But the edge is slight.)

To think of Access as a “normalized” database tool or even a “relational” DBMS would be akin to thinking of Word as a tool that automatically produced legally enforceable documents regardless of what you typed.

Being an expert at Access (or any database tool) no more qualifies you to implement correct, high quality database solutions than being a Word expert would qualify you to be a Pulitzer prize-winning author. It helps get the job done faster, but you still have to know how to do the job. (Most hiring managers do not understand this!)

 

3. Normalization does not mean adhering to Codd’s 12 rules.

Note: the guidelines are normally listed as "12" rules and were proposed by Ted Codd (not "and Date"). They are much quoted, including of course in Date's standard database text. See this site for a quick summary. Since there is a "Rule Zero," it would be fine to say the "13 Rules," but most texts do not call them such.

Codd’s 12 rules for relational database systems were geared toward those hoping to implement a relational DBMS and, as such, dictate certain behaviors of those software systems.

Normalization, in stark contrast, applies to the table-level design of one particular database.

See why it’s so important to be clear about “database” v. “DBMS”? I hope so.

(As an aside, being fully “normalized” also does not mean being “relational.” Adhering to 1NF is required. Beyond that they are simply design recommendations. See C.J. Date’s excellent article “What First Normal Form Really Means” for details.)

 

4. Normalization does not per se mean storing data in different tables and then linking back together.

Granted, such a design is usually a consequence of a normalized design. But normalization rules are strictly about removing certain types of redundancy from a certain table. Usually this is accomplished by decomposing the table in question into two or more related tables, for example splitting out a “Children” column into a separate “Child” table containing one row per child. But it’s important to note the difference.

It is especially important to note that just because you have separated data tables and relationships between them does not mean you have a normalized database or a relational database.

Legal jargon is probably necessary to produce a legal document, but just a glut of legalese scattered in my document does not mean it will be legally enforceable.

But possibly the biggest problem with the Access Watch quote is that it comes from a supposed industry “expert.” Unfortunately such ignorance of our field is rampant. You must be careful and pay attention to the fundamentals.

Keep in mind:

If you do not know the history of your own profession you can never hope to be but an amateur.

—anonymous

 

Note: For much deeper and longer treatment of many of the industry's database confusions, see Fabian Pascal's site, www.dbdebunk.com. If you regularly work with databases, you owe it to yourself to be familiar with his and Chris Date’s writings available at this site.

Data Management , ,