Reading Recommendations to Accompany “Relational Principles” Training

Feb 5 2008

This is an ordered list (not a relation!) of my reading recommendations that emphasize relational fundamentals and building upon that to create great database models and implementations. Many other specific recommendations are scattered throughout the Training Slides (in the notes), for those who have copies of them.

1. C.J. Date, Database In Depth: Relational Theory for Practitioners. O'Reilly Media: 2005. ISBN: 0-596-10012-4. (Amazon link.)

This should be required reading for anyone who is a database professional. This really is a "greatest hits" of the core concepts Date has been espousing for years and which too few people really understand. If you read only one "theory" book, make it this one!

2. C.J. Date, "What First Normal Form Really Means." (Article)
You can get this directly (for a fee) via www.dbdebunk.com, or as Chapter 8 in the recently released compilation of Chris's writings (see below, #11).

This has given me more comprehension, per page of reading, about the relational model than any other thing I have read. It is so much easier to understand this one concept deeply, once, than to read and re-read the pithy two-paragraph spouting found in most modeling books about "eliminating repeating groups."

3. Graeme C. Simsion and Graham C. Witt, Data Modeling Essentials (3rd ed.). Morgan Kaufmann: 2004. ISBN: 0-12-644551-6. (Amazon link.)

  This is my favorite data modeling book and has a very pragmatic balance over theory for theory's sake. It also does an excellent job explaining the more advanced normalization issues. (Note, it does not include anything on 6NF.)
  If you have more a coding bias in your database work, I'd put Code Complete (next) higher on the list than this.

4. Steve McConnell, Code Complete (2nd ed.). Microsoft Press: 2004. ISBN: 0-7356-19670. (Amazon link.)

  Simply one of the best software engineering books ever published. This will help you become a better programmer no matter what language you use.
  If you have more a modeling/design bias in your database work, I'd put Data Modeling Essentials (above) higher on your list.

5. Joe Celko, SQL For Smarties: Advanced SQL Programming (3rd ed.). Morgan-Kaufmann: 2005. ISBN: 0123693799. (Amazon link.)

  Celko is the main guy for coding great SQL and solving problems the "SQL way" that might not be otherwise intuitive. I recommend having and referencing 2 or 3 of his books as needed.
  Note, however, he is no expert at pure relational ideas and often contradicts what I think is a mor academically-correct approach. Read Date for "what's right" and read Celko for "how to get the job done."

6. Stuart R. Faulk, "Software Requirements: A Tutorial." In Software Requirements Engineering, 2nd ed., 1997. (Amazon link for the whole book, or just the article can be found here and other places (just do a search).)

Not only an excellent summary of the Requirements problem as a whole, but an excellent article on how to design work-products (like requirements or models) to be useful to the audience.

7. Robin Williams, The Non-Designer's Design Book, 3rd ed. Peachpit Press: 2008. ISBN: 0-321-19385-7. (Not that Robin Williams!) (Amazon link.)

Your models are only as good as they communicate. This book is a great primer for generally making designed things (documents in particular) good without going through college as a graphic designer. I think anyone who produces materials for others to consume should read this. It's pretty short and fun.

8. Hugh Darwin, "The Askew Wall" (presentation). (Available here.)

Darwin walks through just some of the most egregious problems with the SQL language. Covers both non-relational aspects of SQL as well as just language design issues.


The rest of the list . . .

. . . are better for a team reference library and shared as needed. Most are a read-once, or an occasional reference, but very useful.

9. C.J. Date, The Database Relational Model: A Retrospective Review and Analysis. Addison Wesley: 2001. ISBN: 0-201-61294-1. (Amazon link.)

Reading this short book is probably more useful than reading Codd's original papers. Chris explains those two seminal papers and comments on areas that have changed (few!) and how they stand up to today's criticisms.

10. C.J. Date, An Introduction to Database Systems, 8th ed. Addison-Wesley: 2004. ISBN: 0321197844. (Amazon link.)

This is a de-facto text book for advanced undergraduate or graduate level database courses and is really what put Chris Date "on the map" for his expertise and clear writing. The 8th edition is worth having if for no other reason than Appendix A which shows by example some of why the "Transrelational Model" could be so revolutionary in implementing truly relational products.

11. C.J. Date, Date on Database: Writings 2000-2006. Apress: 2006. ISBN: 1-59059-746-X. (Amazon link.)

  Chris's shorter writings (articles, whitepapers) become available occasionally in collections like this one. Buying this one book gives you "What First Normal Form Really Means" (#2 above) in Chapter 8, and some great retrospective on Ted Codd in Chapter 1, among many other great topics.
  After purchase, you can spend an extra $10 to get the downloadable PDF version of the book from Apress.

12. Len Silverston, The Data Model Resource Book. (2 volumes.) Wiley: 2001. ISBN (vol. 1): 0-471-38023-7, (vol. 2): 0-471-35378-5. (Amazon links to Volume 1 and Volume 2.)

Len's is the best compilation of ready-made database designs covering various common business scenarios (Volume 1) and some specific industries (Volume 2). I don't advocate using the designs verbatim, but they are a great reference and idea source. It's useful to cross-check your own designs with ones this veteran of the field has assembled as a sanity check.

Database, Database Design, Software Engineering , , ,

Intro to Source-Control for SQL Server Development

Mar 9 2007

[Minor updates 2009-07-16 during re-posting.]

It seems developers coming from a “compiled code” background into database management struggle at first with how to version-control the database artifacts. Turns out, it isn’t really that much different. The first step is to reframe the question: “How do I version control the scripts that create my database?”

Since my good friend Chris recently asked me to remind him how we did it at a company where we worked together, I thought I’d write up a poor-man’s starter way of doing source control for database development. This works no matter what development tool or source-control system you use.

First, organize your source files:

  • One file for bootstrapping. This includes "create database" and any other necessary artifacts like filegroups, custom roles, etc.
  • One file per “object” like procedures, views, triggers, functions. Anything you’d want to have an independent log of changes on. For me this is usually everything except "schema" changes, by which I mean Tables with their associated constraints and indexes.
  • One file per "schema version update." I tend to roll together any table/index updates with whatever DML is needed to transform and move the data into the new schema. I keep a "Version" table in the database itself (created during bootstrap) that can track these updates.
  • Then, once per version update, create a little *.bat or *.vbs file that combines the necessary files in the right order for deployment.

This general strategy has worked well regardless of the source-control system I'm using.

(Aside: I recently started a little .NET utility to make this pattern easier to apply; it relies an a single XML manifest file listing the scripts in dependency order each with an attribute of which 'version' they were last modified in. The DB has a 'version' table, and provides a 2-proc interface for "GetCurrentVersion" and "UpdateVersion", then the *.exe can determine which scripts to run in which order. Updating objects in source is relatively painless because they can be checked in (n) times during iterations and the manifest is only updated when I'm confident enough in it to include in the official build/release/rollout.)

If you use the full Team Suite editions of Visual Studio, I'd also recommend Visual Studio for Database Professionals, which automates shredding the DB structure into a script&local-only environment for editing and builds deployment scripts for you. My only gripes with it are (a) it obviously doesn't handle any custom data transforms you might need during a schema update; (b) it shreds the DB a bit too granularly for my tastes -- each column, each index, each constraint, etc.

Some general tips for painless source-control:

  • Make scripts "rerunnable". For DDL, I always use some variation of "if not exists... then /* do something */". For objects like procedures and triggers I use the pattern "if not exists, create <some-stub-object>; ... alter <real-object-code>". This emulates Oracle's "create or replace package" syntax which I always liked. The script works the very first time and is re-runnable.
  • Include copious "print" statements for easily seeing what happened or logging your rollout to a file.
  • At the top of files, include a statement that raises a severe error if you're connected to Master or something. Like this:

if (db_name() in ('master','model','msdb','tempdb','Northwind','pubs','AdventureWorks'))
    raiserror('Must not target a built-in or sample database.', 20, 1) with log

  • Create ad hoc batch files to apply handfuls of updates as needed during development. This especially helps coordinate with other developers during tight iterations where you don't want the "real build" to see the changes yet.
  • Use CodeSmith or some other tool to code-generate most of the cute little change-script stubs. I plug in TableName, ColumnName and Index Options, and get a nice little rollout script.

Some DB source-control anti-patterns:

  • Don’t deploy new databases by (a) restoring a backup or (b) attaching a whole DB set of files. Ick. You lose the ability to track edits over time and you bloat your source control immensely!
  • Don’t combine all schema and object changes into one huge monolithic script to update the version. You might have checkin history on that one file, but every change ever is in one file (needle in haystack problem—good luck finding the version when you messed up the tax-calculation logic!) and multiple developers working together is messy or impossible.
  • Don’t develop against a 'live' database then using some—hopefully robust—third party tool to magically figure out your changes for you. I've never seen this lazy approach work robustly. Usually more time is spent validating the changes than it would have taken to just write the change scripts by hand.


I hope to post more in the future on this, including some sample files. I'm interested to hear your thoughts on what works well for you.

Database, MS SQL Server ,