SQL Server fails to run after R2 upgrade from SP1

May 20 2010

Ah, the joys of testing Microsoft installations…

We recently got corrupted installs and an unstartable service after upgrading from SQL Server 2008 SP1 to the R2 release. The service wasn’t started after the upgrade completed, and attempting to start it produced these errors in the Windows event log:

Script level upgrade for database 'master' failed because upgrade step 'sqlagent100_msdb_upgrade.sql' encountered error 598, state 1, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.

And

Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.

Both with the ever so helpful link to http://go.microsoft.com/fwlink/events.asp, possibly the most useless link on the entire interweb.

Calling upon Lord Google for help turned up lots of red herrings, mostly about cases where people had renamed the built-in SA account. These didn’t apply to our situation.

Searching for “sqlagent100_msdb_upgrade 598” turned up (as of this morning) only about 3 hits, none in English. The powers of Google again came to the rescue, translating this page from Portuguese to English for me.

The Problem…

Turns out one of the upgrade scripts will fail to run if the default data directory set for the SQL Instance doesn’t actually exist at the time of upgrade.

What the …?

How could you get into this predicament? you may ask.

Easy:

  1. Specify a custom default data folder during the original installation.
  2. Have it be somewhere different from where SQL Server puts its own databases.
  3. Never actually use it.

#3 is easy, especially if you code explicit paths for your user databases when you create them, a good practice for production work.

With this recipe, the default data folder never gets created.

Along comes the magic of SQL Server 2008 R2’s upgrade logic which doesn’t bother checking if the folder exists. At some point during the upgrade it tries writing to this folder and … predictably fails.

Miserably.

It leaves SQL unable to run and the upgrade in a partial state. Even starting the service with “net start MSSQLSERVER /f /T3608” so it skips the upgrade scripts doesn’t put it in a responsive state. Hosification.

The Solution…

Easy. Find where it thinks this directory is and create it prior to starting the upgrade.

In SSMS, go to the Server node, right-click, select Properties. In the Properties dialog select the “Database Settings” page and look for the “Database default locations” near the bottom.

Or, run this script:

declare @SmoDefaultFile nvarchar(512)
exec master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\MSSQLServer', N'DefaultData', @SmoDefaultFile OUTPUT

declare @SmoDefaultLog nvarchar(512)
exec master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\MSSQLServer', N'DefaultLog', @SmoDefaultLog OUTPUT

SELECT
    ISNULL(@SmoDefaultFile, N'') AS [DefaultFile],
    ISNULL(@SmoDefaultLog, N'') AS [DefaultLog]

Then create that data directory and the upgrade should work.
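For example, if the query reports a default data path of D:\SQLData (a made-up path for illustration), you can create it from the still-running instance before kicking off the upgrade. The xp_create_subdir extended procedure (the same one maintenance plans use) does the job, or just “md” the folder from a command prompt:

-- Example path only; substitute the DefaultFile/DefaultLog values
-- returned by the query above.
exec master.dbo.xp_create_subdir N'D:\SQLData';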

Also note: I found you can also create the directory after a failed upgrade. The next time the service starts, it completes the previously failed scripts and comes back up and running.

MS SQL Server

Enabling SQL Server 2008 TCP/IP connections via script

Mar 10 2010

As part of allowing users to connect to SQL Server remotely, we often need to enable TCP/IP connections, which are off by default in Express editions and some other scenarios. While it’s simple to use SQL Server’s “Configuration Manager” to enable or disable protocols, sometimes we need to do it in a script, such as from an installer.

This solution uses SQL PowerShell, which is installed automatically with SQL Server 2008. I show how to do it directly and from a wrapper script such as VBScript or a DOS batch file.

Enabling TCP/IP through PowerShell

The basic PowerShell (“PS”) commands for doing this are:

$MachineObject = new-object ('Microsoft.SqlServer.Management.Smo.WMI.ManagedComputer') .
$ProtocolUri = "ManagedComputer[@Name='" + (get-item env:\computername).Value + "']/ServerInstance[@Name='instance-name']/ServerProtocol"
$tcp = $MachineObject.getsmoobject($ProtocolUri + "[@Name='Tcp']")
$tcp.IsEnabled = $true
$tcp.alter()

Where “instance-name” is the name of the SQL Server instance to make this adjustment for.

This can be used to set up other protocols too, like Named Pipes, by changing the protocol name in the getsmoobject filter. Google around for these commands for the proper syntax.
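For instance, a sketch of the Named Pipes variant, reusing $MachineObject and $ProtocolUri from above (“Np” is the name SMO uses for the Named Pipes protocol):

# Same pattern as TCP, just a different protocol name.
$np = $MachineObject.getsmoobject($ProtocolUri + "[@Name='Np']")
$np.IsEnabled = $true
$np.alter()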

Scripting the Change

The real challenge I ran into using this was that it needed to be run from a DOS batch file, not directly from within PowerShell.

Step #1 – invoking a PowerShell script from the command line.

Turns out powershell.exe expects its arguments to be actual PS commands, not a script filename. So to invoke the executable and have it execute a saved script, you need to use syntax like this:

powershell.exe -noexit &'c:\scripts\my powershell script.ps1'

  • “&” is the call operator; it’s what lets PowerShell execute a script whose quoted path contains spaces.
  • -noexit tells PowerShell not to exit after running the script. (Intuitive, I know.)

Of course you can do this through normal VBScript shell commands and other techniques as well:


Set objShell = CreateObject("Wscript.Shell")
objShell.Run("powershell.exe -noexit &'c:\scripts\my powershell script.ps1'")

If you want commands like this to run without annoying users, check out Scott Hanselman’s blog entry on using a third-party utility library for running scripts in hidden windows.

Step #2 – running from DOS.

What I really wanted was a single line that could be run from within a DOS-style batch file without referencing separate scripts. Turns out, you can do exactly that.

  • Multiple PS commands can be combined in one line, separated with semi-colons (;).
  • We use single quote marks (') instead of doubles because the whole command needs to be a single argument to the PowerShell.exe invocation. In the case of nested quotes, escape the singles with 2-single quotes (''). Yeah, this can get a bit messy.

Here’s the 5-line PowerShell script from above, invoked by a single command from DOS. (This is all one line.)


"C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLPS.exe" "$MachineObject = new-object ('Microsoft.SqlServer.Management.Smo.WMI.ManagedComputer') .;$ProtocolUri = 'ManagedComputer[@Name=''' + (get-item env:\computername).Value + ''']/ServerInstance[@Name=''instance-name'']/ServerProtocol';$tcp = $MachineObject.getsmoobject($ProtocolUri + '[@Name=''Tcp'']');$tcp.IsEnabled = $true;$tcp.alter();"


Voila.

MS SQL Server

Portland SQL Saturday Presentation

Jun 5 2009

The presentation given yesterday at SQL Saturday Portland is up on the download page. Or directly at this link.

MS SQL Server, Database Design, Presentations

SQL Server 2008 Table-Valued Parameters and NULL

May 4 2009

So it turns out table-valued parameters can be optional with no problem. But it’s non-standard syntax. Normally, you’d have this:

create procedure myProc @Param int = null

The "= null" tells us it’s optional.

With table-valued parameters, you have to specify “readonly” in every case (not related to optionality), and they are always optional too, without specifying "= null". Like this:

create procedure myProc @TblValParam MyTableType readonly

This works, and @TblValParam does not have to be passed in the call. It seems, conceptually, that the “table value” is always implicitly there; if not passed in, it just contains zero rows. You can still "select *" from it and get the header. Makes sense.
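A quick sketch showing both call styles (MyTableType here is a stand-in type definition, not from the original post):

create type MyTableType as table (ID int not null);
go
create procedure myProc @TblValParam MyTableType readonly
as
    select count(*) as RowsPassed from @TblValParam;
go
declare @t MyTableType;
insert @t values (1), (2);
exec myProc @TblValParam = @t;  -- returns 2
go
exec myProc;  -- parameter omitted: counts zero rows, no error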

Treating it like a normal parameter, like this:

create procedure myProc @TblValParam MyTableType = null readonly

... gets you this fun error:

   Operand type clash: void type is incompatible with MyTableType

Fun.

I love when Microsoft products are internally inconsistent. (And by "love" I mean “hate with a passion.”)

MS SQL Server

Cleaned Up 600 MB in One Command

May 4 2009

I know, this sounds like the start of a bad weight-loss ad. Well, except for me not earning a commission, it’s sorta like that…

In my adventures yesterday trying to free up some room, I found a couple of interesting things I thought I’d share.

1. MS has a little utility, the “Windows Installer CleanUp” app, which lets you remove old, defunct ‘uninstalls’ from the Add/Remove Programs list. Unlike most tools that just delete the Registry entry, this one also deletes any garbage left over (the $PatchCache$ and such folders), which can actually recover some space.

You can grab it here: http://support.microsoft.com/kb/290301

2. If you go to where you installed it, you’ll also see a command-line utility, “MsiZap.exe”, which it uses under the hood. This utility is cool in its own right. (You can google it for a more detailed description.) One thing it can do is auto-delete any “orphaned” uninstalls along with all their garbage. To do that, run it with the “G!” switches:

MsiZap.exe G!

That step alone cleared up about 500-600 MB off my laptop.

Enjoy.

SQL Saturday Presentation on XML and Relational Databases

Oct 6 2008

What fun it was presenting at SQL Saturday #5. Officially the first time I've presented at a technical conference. We had a great turnout and the event planners did a great job putting the whole thing together. Thanks to those who attended and for the great questions you asked!

Grab the slides and code for the presentation here.

(If link broken, just navigate over to the download page.)

MS SQL Server, Database Design, Presentations

Testing for Failure in Database Unit Tests

May 28 2008

(Using Visual Studio Team Suite edition for Database Professionals. A.k.a. “data-dude” because someone at Microsoft forgot to drink their creative juice that morning.)

Although I enjoy the unit-testing features of Visual Studio "Database Professional" edition (or whatever they're calling it these days), it is a ghastly oversight that you can't easily test for a procedure producing a specific expected error result.

Instead, if the database code itself experiences an error, the whole test blows chunks—that is, it fails. Never mind that it got exactly the error you expected. I concede it's (all too) common to have little/poor/no error handling in stored procedures—but come on, Microsoft! Let's encourage better practices.

I would expect to have one of the "Test Conditions" be for checking that a specific error code was produced. This would be about the 2nd most useful one (after checking for a Scalar value, which handles just about everything anyway).

(Yeah, I know, it's "extensible" and I'm free to spend a bunch of time developing my own Test Condition and plugging in it and sharing that code with everyone on the planet. But seriously? Who buys this latest MS excuse for not thinking about how to build what the community needs in the first place?)

<sigh>

So an aha! moment hit the other night which led me to this simple workaround. Let's use the trusty Scalar Value test condition combined with a TRY/CATCH block within the SQL code to test itself. Like this:

declare @rc int;  -- return code from the procedure under test

begin try
    exec @rc = dbo.SomeProcedure @Param='some value';
    select @@error as ErrorNbr;  -- success path: returns 0
end try
begin catch
    select @@error as ErrorNbr;  -- failure path: returns the error number that fired
end catch

Then, we just wire up a Scalar Value to test the 1st column / 1st row for the specific Error Number you expect this to generate.

(Don't get me started on using ordinal row/column numbers for this—wow, MS has completely lost sight of what relational DBs are supposed to be!)

On my current project, we're returning a specifically formatted XML string from database procedure errors, which include a number of fixed attributes we then parse on the client. Works great. But it has a side effect of all custom-raised errors being #50000. So in this case I'm using a slightly more complex way of testing for expected errors in the CATCH part:

declare @rc int;  -- return code from the procedure under test

begin try
    exec @rc = dbo.SomeProcedure @Param='some value';
    select @@error as ErrorNbr;  -- success path: returns 0
end try
begin catch
    -- our custom errors carry an XML payload; pull the real number out of it
    declare @errmsg xml; set @errmsg = cast(error_message() as xml);
    select @errmsg.value('(//@ErrorNbr)[1]', 'int') as ErrorNbr;
end catch
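For context, here’s a hypothetical sample of the kind of XML-formatted error string that CATCH block would parse (the attribute names are invented for illustration; the real project’s format isn’t shown in this post):

-- Somewhere inside dbo.SomeProcedure:
raiserror('<Error ErrorNbr="50123" Msg="Duplicate customer" />', 16, 1);

After the cast to XML, the value('(//@ErrorNbr)[1]', 'int') call returns 50123 even though the raised error itself is #50000.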

Happy error hunting!

Software Engineering, MS SQL Server

Cast ‘True’ as Bit

May 7 2008

Okay, I'm a bit slow on the uptake. It took the SQLServerCentral.com question of the day to point this out to me...

Apparently MS made the values "true" and "false" implicitly convertible to BIT (as 1 and 0 respectively) starting with SQL 2005. For many years I've defined variables in stored procedure scripts that emulate the constants TRUE and FALSE... Something like this:

declare @TRUE bit; set @TRUE = 1;
declare @FALSE bit; set @FALSE = 0;

But now, instead, I can just directly reference 'true' or 'false' where necessary, such as in a where clause. For example:

declare @t table (ID int not null, IsFlagged bit not null);
insert @t
      select 1, 0
union select 2, 1
union select 3, 1
;
select count(*) from @t where IsFlagged = 'true';

Or, more plainly:

select cast('true' as bit) as Result;

Of course, this isn't true Boolean type support. And don't get me started on the truly relational equivalents of TRUE and FALSE. But at least it's a step closer. Certainly less typing.

Also note: this conversion only works to the BIT type; it does not work directly to INT types, even though BIT is theoretically a sub-type of the general integer class.
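A quick sketch of that asymmetry:

select cast('true' as bit) as BitResult;  -- succeeds: returns 1
-- The following fails with a conversion error:
-- select cast('true' as int);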

MS SQL Server

Reading Recommendations to Accompany “Relational Principles” Training

Feb 5 2008

This is an ordered list (not a relation!) of my reading recommendations that emphasize relational fundamentals and building upon that to create great database models and implementations. Many other specific recommendations are scattered throughout the Training Slides (in the notes), for those who have copies of them.

1. C.J. Date, Database In Depth: Relational Theory for Practitioners. O'Reilly Media: 2005. ISBN: 0-596-10012-4. (Amazon link.)

This should be required reading for anyone who is a database professional. This really is a "greatest hits" of the core concepts Date has been espousing for years and which too few people really understand. If you read only one "theory" book, make it this one!

2. C.J. Date, "What First Normal Form Really Means." (Article)
You can get this directly (for a fee) via www.dbdebunk.com, or as Chapter 8 in the recently released compilation of Chris's writings (see below, #11).

This has given me more comprehension, per page of reading, about the relational model than any other thing I have read. It is so much easier to understand this one concept deeply, once, than to read and re-read the pithy two-paragraph spouting found in most modeling books about "eliminating repeating groups."

3. Graeme C. Simsion and Graham C. Witt, Data Modeling Essentials (3rd ed.). Morgan Kaufmann: 2004. ISBN: 0-12-644551-6. (Amazon link.)

  This is my favorite data modeling book and has a very pragmatic balance over theory for theory's sake. It also does an excellent job explaining the more advanced normalization issues. (Note, it does not include anything on 6NF.)
  If you have more of a coding bias in your database work, I'd put Code Complete (next) higher on the list than this.

4. Steve McConnell, Code Complete (2nd ed.). Microsoft Press: 2004. ISBN: 0-7356-1967-0. (Amazon link.)

  Simply one of the best software engineering books ever published. This will help you become a better programmer no matter what language you use.
  If you have more of a modeling/design bias in your database work, I'd put Data Modeling Essentials (above) higher on your list.

5. Joe Celko, SQL For Smarties: Advanced SQL Programming (3rd ed.). Morgan-Kaufmann: 2005. ISBN: 0123693799. (Amazon link.)

  Celko is the main guy for coding great SQL and solving problems the "SQL way" that might not be otherwise intuitive. I recommend having and referencing 2 or 3 of his books as needed.
  Note, however, he is no expert at pure relational ideas and often contradicts what I think is a more academically correct approach. Read Date for "what's right" and read Celko for "how to get the job done."

6. Stuart R. Faulk, "Software Requirements: A Tutorial." In Software Requirements Engineering, 2nd ed., 1997. (Amazon link is for the whole book; the article alone can be found online with a quick search.)

Not only an excellent summary of the Requirements problem as a whole, but an excellent article on how to design work-products (like requirements or models) to be useful to the audience.

7. Robin Williams, The Non-Designer's Design Book, 3rd ed. Peachpit Press: 2008. ISBN: 0-321-19385-7. (Not that Robin Williams!) (Amazon link.)

Your models are only as good as they communicate. This book is a great primer for generally making designed things (documents in particular) good without going through college as a graphic designer. I think anyone who produces materials for others to consume should read this. It's pretty short and fun.

8. Hugh Darwen, "The Askew Wall" (presentation). (Available here.)

Darwen walks through just some of the most egregious problems with the SQL language, covering both non-relational aspects of SQL and plain language-design issues.


The rest of the list . . .

. . . is better kept in a team reference library and shared as needed. Most are a read-once or an occasional reference, but very useful.

9. C.J. Date, The Database Relational Model: A Retrospective Review and Analysis. Addison Wesley: 2001. ISBN: 0-201-61294-1. (Amazon link.)

Reading this short book is probably more useful than reading Codd's original papers. Chris explains those two seminal papers and comments on areas that have changed (few!) and how they stand up to today's criticisms.

10. C.J. Date, An Introduction to Database Systems, 8th ed. Addison-Wesley: 2004. ISBN: 0321197844. (Amazon link.)

This is a de-facto text book for advanced undergraduate or graduate level database courses and is really what put Chris Date "on the map" for his expertise and clear writing. The 8th edition is worth having if for no other reason than Appendix A which shows by example some of why the "Transrelational Model" could be so revolutionary in implementing truly relational products.

11. C.J. Date, Date on Database: Writings 2000-2006. Apress: 2006. ISBN: 1-59059-746-X. (Amazon link.)

  Chris's shorter writings (articles, whitepapers) become available occasionally in collections like this one. Buying this one book gives you "What First Normal Form Really Means" (#2 above) in Chapter 8, and some great retrospective on Ted Codd in Chapter 1, among many other great topics.
  After purchase, you can spend an extra $10 to get the downloadable PDF version of the book from Apress.

12. Len Silverston, The Data Model Resource Book. (2 volumes.) Wiley: 2001. ISBN (vol. 1): 0-471-38023-7, (vol. 2): 0-471-35378-5. (Amazon links to Volume 1 and Volume 2.)

Len's is the best compilation of ready-made database designs covering various common business scenarios (Volume 1) and some specific industries (Volume 2). I don't advocate using the designs verbatim, but they are a great reference and idea source. It's useful to cross-check your own designs with ones this veteran of the field has assembled as a sanity check.

Database, Database Design, Software Engineering

Intro to Source-Control for SQL Server Development

Mar 9 2007

[Minor updates 2009-07-16 during re-posting.]

It seems developers coming from a “compiled code” background into database management struggle at first with how to version-control the database artifacts. Turns out, it isn’t really that much different. The first step is to reframe the question: “How do I version control the scripts that create my database?”

Since my good friend Chris recently asked me to remind him how we did it at a company where we worked together, I thought I’d write up a poor-man’s starter way of doing source control for database development. This works no matter what development tool or source-control system you use.

First, organize your source files:

  • One file for bootstrapping. This includes "create database" and any other necessary artifacts like filegroups, custom roles, etc.
  • One file per “object” like procedures, views, triggers, functions. Anything you’d want to have an independent log of changes on. For me this is usually everything except "schema" changes, by which I mean Tables with their associated constraints and indexes.
  • One file per "schema version update." I tend to roll together any table/index updates with whatever DML is needed to transform and move the data into the new schema. I keep a "Version" table in the database itself (created during bootstrap) that can track these updates.
  • Then, once per version update, create a little *.bat or *.vbs file that combines the necessary files in the right order for deployment.
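As an illustration, a per-version rollout batch file can be as simple as this (sqlcmd syntax; all server, database, and file names below are made up):

@echo off
rem Apply schema version 1.4, scripts in dependency order.
set SRV=localhost
set DB=MyAppDb
sqlcmd -S %SRV% -d %DB% -b -i SchemaUpdate_v1.4.sql || goto :fail
sqlcmd -S %SRV% -d %DB% -b -i dbo.CustomerSearch.proc.sql || goto :fail
sqlcmd -S %SRV% -d %DB% -b -i SetVersion_v1.4.sql || goto :fail
echo Version 1.4 applied.
exit /b 0
:fail
echo Rollout failed; fix and re-run (the scripts are rerunnable).
exit /b 1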

This general strategy has worked well regardless of the source-control system I'm using.

(Aside: I recently started a little .NET utility to make this pattern easier to apply; it relies on a single XML manifest file listing the scripts in dependency order, each with an attribute of which 'version' they were last modified in. The DB has a 'version' table and provides a 2-proc interface of "GetCurrentVersion" and "UpdateVersion"; the *.exe can then determine which scripts to run in which order. Updating objects in source is relatively painless because they can be checked in (n) times during iterations, and the manifest is only updated when I'm confident enough to include it in the official build/release/rollout.)
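A minimal sketch of that version-tracking interface (these names are my illustration, not the utility's actual API):

create table dbo.DbVersion (
    VersionNumber varchar(20) not null,
    AppliedUtc    datetime    not null default (getutcdate())
);
go
create procedure dbo.GetCurrentVersion
as
    -- most recently applied version wins
    select top 1 VersionNumber from dbo.DbVersion order by AppliedUtc desc;
go
create procedure dbo.UpdateVersion @VersionNumber varchar(20)
as
    insert dbo.DbVersion (VersionNumber) values (@VersionNumber);
go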

If you use the full Team Suite editions of Visual Studio, I'd also recommend Visual Studio for Database Professionals, which automates shredding the DB structure into a scripted, local-only environment for editing and builds deployment scripts for you. My only gripes with it are (a) it obviously doesn't handle any custom data transforms you might need during a schema update; (b) it shreds the DB a bit too granularly for my tastes -- each column, each index, each constraint, etc.

Some general tips for painless source-control:

  • Make scripts "rerunnable". For DDL, I always use some variation of "if not exists... then /* do something */". For objects like procedures and triggers I use the pattern "if not exists, create <some-stub-object>; ... alter <real-object-code>". This emulates Oracle's "create or replace package" syntax, which I always liked. The script works the very first time and is re-runnable. (See the sketch after this list.)
  • Include copious "print" statements for easily seeing what happened or logging your rollout to a file.
  • At the top of files, include a statement that raises a severe error if you're connected to Master or something. Like this:

if (db_name() in ('master','model','msdb','tempdb','Northwind','pubs','AdventureWorks'))
    raiserror('Must not target a built-in or sample database.', 20, 1) with log

  • Create ad hoc batch files to apply handfuls of updates as needed during development. This especially helps coordinate with other developers during tight iterations where you don't want the "real build" to see the changes yet.
  • Use CodeSmith or some other tool to code-generate most of the cute little change-script stubs. I plug in TableName, ColumnName and Index Options, and get a nice little rollout script.
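Here's the stub-then-alter pattern from the first tip, sketched for a procedure (the object and column names are illustrative only):

if object_id(N'dbo.CustomerSearch', N'P') is null
    exec('create procedure dbo.CustomerSearch as return 0');  -- stub on first run
go
alter procedure dbo.CustomerSearch
    @Name varchar(100)
as
    select CustomerID, Name
    from dbo.Customer
    where Name like @Name + '%';
go
print 'dbo.CustomerSearch updated.';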

Some DB source-control anti-patterns:

  • Don’t deploy new databases by (a) restoring a backup or (b) attaching a whole set of DB files. Ick. You lose the ability to track edits over time and you bloat your source control immensely!
  • Don’t combine all schema and object changes into one huge monolithic script to update the version. You might have checkin history on that one file, but every change ever is in one file (needle in haystack problem—good luck finding the version when you messed up the tax-calculation logic!) and multiple developers working together is messy or impossible.
  • Don’t develop against a 'live' database and then use some—hopefully robust—third-party tool to magically figure out your changes for you. I've never seen this lazy approach work robustly. Usually more time is spent validating the changes than it would have taken to just write the change scripts by hand.


I hope to post more in the future on this, including some sample files. I'm interested to hear your thoughts on what works well for you.

Database, MS SQL Server

Eric’s Axioms of Data Management #1

Nov 6 2006

If supporting foreseeable business changes requires adding tables or columns to your database, the design is deficient.

Over the years I have learned many principles regarding database design. Some I discovered on my own, usually through my own mistakes and pain. Some passed to me through a mentor or seminar. But the very first of all these is still in my Top Ten list of database design ideas:

If supporting foreseeable business changes requires changing the structure of your database, you have a problem with the design.

The database design should not have to change when obvious expected things happen. Things like adding a new customer, or updating a product's price, or . . . well, August.

This nugget came to me from a coworker in my first position where I dealt with databases although I don't think he realized it at the time. I had recently migrated an Excel-centric department reporting application over to Microsoft Access 2.0. I gained a bit of notoriety in the division as having some expertise with Microsoft Office in general and others frequently asked for tips or help troubleshooting.

So a co-worker asks me one day, "Eric, is there a limit to the number of tables you can have in a UNION query?"

"Um..."

I didn't know! At first I was somewhat flustered that I didn't know. But then I also couldn't think of a scenario where it would matter too much. So I responded:

"Why do you ask?"

He drew my attention to his monitor where he pointed at the tables in his database, a work-in-progress. They were:

MonthlyResults_Jun94
MonthlyResults_Jul94
MonthlyResults_Aug94
MonthlyResults_Sep94

As dense as I can be, even I could see where this was going.

So I introduced him to my friend, the "date" type and showed how, with a simple yet magical column called "EffectiveMonth," one big table can actually store all these results and you can just flag each one by what month the results were for.

"Oh." He paused, bewildered. "Wow, that's neat!"

Indeed it is.
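In schema terms, the fix looked something like this (a hypothetical reconstruction; his actual columns are long forgotten):

-- Instead of MonthlyResults_Jun94, MonthlyResults_Jul94, ...
create table MonthlyResults (
    EffectiveMonth datetime      not null,  -- e.g. '1994-08-01'
    Department     varchar(50)   not null,
    Amount         decimal(12,2) not null,
    primary key (EffectiveMonth, Department)
);
-- August shows up as a row, not as a schema change:
insert MonthlyResults values ('19940801', 'Widgets', 12345.67);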

I have since come to believe this principle is one of the clearest marks of good database design. Or good software design in general. Think about likely changes and build a structure that won't make the obvious changes a nightmare to handle.

A "senior" developer I worked with more recently in my career was fond of saying, "You can't anticipate future requirements." I argue instead: You must anticipate future requirements if you call yourself a professional. Otherwise, you are just a typist who knows some syntax.

(As an aside, check out what has to be the worst example of this I have seen to date.)

Database Design

Microsoft Word is a Legal Document

Feb 25 2005

To say "Access is a relational database" is as incorrect as saying, "Word is a legal document." We need to be more clear than this as data professionals.

Terminology confusion abounds in the software industry and I believe the worst examples are in database management in particular. So I shouldn't have been surprised to read the nonsense I discuss here... but still I was. The following quote opens up edition 6.14 of Woody’s Access Watch, a normally fine e-newsletter focused on building Microsoft Access solutions:

Access is a normalized database (except for purists who don’t count a database as normalized unless it adheres to all 13 of the Codd & Date rules), and that means that you should store linked data in different tables.

If you don’t reel at the affront to your sensibilities in that one sentence, you should. This is no different, to me, than saying:

Microsoft Word is a legal document.

Sounds silly, right?

But did the Access Watch quote sound as odd to you? It should. It reveals many misunderstandings about database technology and practices and is confused and muddled, something we of all people should avoid.

Ignoring the poor grammar, here are the technical reasons this WOW statement should offend you as a technology professional:


1. Microsoft Access is not a database.

Read that again. Access is not a database. A database is an organized collection of information accessible for some purpose. Is Access an organized collection of information? No! It is a piece of software. Many people are far too cavalier with the word “database.” Database professionals—those in our society who are responsible for (and presumably trained for) organizing and classifying the world’s information—should be the least so.

Fundamentally, Access is a “database management system” or “DBMS” for short. You could call it “database software” or even, loosely, a “database system.” But it is not a database. “Northwind” and “pubs” are databases. IMDB.com is (largely) a database. The card catalog at the library is a database. But Microsoft Access is a DBMS.

Calling Access a database is like calling Microsoft Word a “document.”


2. Nothing guarantees that any particular database implemented within Access will be normalized or not normalized . . .

. . . any more than using Word to write documents guarantees they will be legally enforceable documents.

In my experience databases implemented with Access frequently violate all sorts of normalization rules. One recent example I saw had a single column, “[City, State]” (yes, including the comma and space!). This clearly violates first normal form (“1NF”).

And don’t mistake this for an Access limitation or problem. No DBMS forces you to follow good database design practices. I have seen equally good and bad designs in Access, Oracle, and SQL Server. (Okay, the more advanced tools are somewhat more likely to be used by better trained professionals. But the edge is slight.)

To think of Access as a “normalized” database tool or even a “relational” DBMS would be akin to thinking of Word as a tool that automatically produced legally enforceable documents regardless of what you typed.

Being an expert at Access (or any database tool) no more qualifies you to implement correct, high quality database solutions than being a Word expert would qualify you to be a Pulitzer prize-winning author. It helps get the job done faster, but you still have to know how to do the job. (Most hiring managers do not understand this!)


3. Normalization does not mean adhering to Codd’s 12 rules.

Note: the guidelines are normally listed as "12" rules and were proposed by Ted Codd (not "and Date"). They are much quoted, including of course in Date's standard database text. See this site for a quick summary. Since there is a "Rule Zero," it would be fine to say the "13 Rules," but most texts do not call them such.

Codd’s 12 rules for relational database systems were geared toward those hoping to implement a relational DBMS and, as such, dictate certain behaviors of those software systems.

Normalization, in stark contrast, applies to the table-level design of one particular database.

See why it’s so important to be clear about “database” v. “DBMS”? I hope so.

(As an aside, being fully “normalized” also does not mean being “relational.” Adhering to 1NF is required. Beyond that they are simply design recommendations. See C.J. Date’s excellent article “What First Normal Form Really Means” for details.)


4. Normalization does not per se mean storing data in different tables and then linking back together.

Granted, such a design is usually a consequence of a normalized design. But normalization rules are strictly about removing certain types of redundancy from a certain table. Usually this is accomplished by decomposing the table in question into two or more related tables, for example splitting out a “Children” column into a separate “Child” table containing one row per child. But it’s important to note the difference.
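Sketched as tables (the names here are hypothetical):

-- Before: a repeating group crammed into one column
--   Person(PersonID, Name, Children = 'Ann; Bob; Cam')
-- After: one row per child
create table Person (
    PersonID int          not null primary key,
    Name     varchar(100) not null
);
create table Child (
    PersonID  int          not null references Person (PersonID),
    ChildName varchar(100) not null,
    primary key (PersonID, ChildName)
);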

It is especially important to note that just because you have separated data tables and relationships between them does not mean you have a normalized database or a relational database.

Legal jargon is probably necessary to produce a legal document, but just a glut of legalese scattered in my document does not mean it will be legally enforceable.

But possibly the biggest problem with the Access Watch quote is that it comes from a supposed industry “expert.” Unfortunately such ignorance of our field is rampant. You must be careful and pay attention to the fundamentals.

Keep in mind:

If you do not know the history of your own profession you can never hope to be but an amateur.

—anonymous


Note: For a much deeper and longer treatment of many of the industry's database confusions, see Fabian Pascal's site, www.dbdebunk.com. If you regularly work with databases, you owe it to yourself to be familiar with his and Chris Date’s writings available at this site.

Data Management