Only surrogate primary keys

Share this topic:



Link to this posting

Postby Ursego » 22 Feb 2013, 08:10

Each table must have a primary key consisting of one field which has one and only one meaning - to identify records uniquely, with no other real-life context. The only exception - tables, whose purpose is establishing relation between other tables: their PKs consist of fields which are Foreign Keys pointing to the related tables.

Obey this rule even if you are sure that the values, which could be used as a PK, are unique and required in the business. Such keys are named "natural keys".

ALL PRIMARY KEYS MUST SIMPLY BE NUMERATORS (1, 2, 3 ETC.) OR MNEMONIC CODES (ONLY IN CATALOG TABLES) HAVING NO OTHER CONTEXT BESIDES BEING RECORDS' UNIQUE IDENTIFIERS! Such keys are named "surrogate (synthetic) keys".

For example, you are creating an information system for a company where each worker has an employee number. Don't use that number as the employees table's PK! Instead, create a field like emp_id. In addition, you can create a field to store the existing employee number, and "hang" on it the NOT NULL and UNIQUE constraints (as well as for other "real-life" fields like Social Insurance Number). The database theory calls such fields "candidate keys" - they are like candidates to be elected as primary keys, but don't allow them to win the elections!

Many data architects don't realize which problems are waiting for developers if natural keys are used.

PROBLEM #1: combination of many fields (sometimes 5-8!!!) as a primary key. As a result, the developers are forced to:

1. Declare and pass between objects multiple variables or even whole structures/classes while ONE PRIMITIVE VARIABLE is enough to carry out the work - to identify the record.
2. Write SQL queries with extra-inflated, haircurling WHERE clauses having massifs of additional lines written only for the sake of joining tables. Important business conditions, restricting the retrieved/updated population, can simply be lost in that mess - bugs are welcome!

PROBLEM #2: the information system needs to be changed to support a change in the business. Examples of such a situation:

1. The company stops to manage employee numbers or changes them (for example, after a merge with another company).
2. Duplicate ID cards numbers found (that has happened in one of the countries I have worked in).
3. An entity was defined badly during the system analysis and the correct solution is found when a part of the system has already been developed.

In any of those cases, you will change the functionality (relationships between tables as well as database and GUI application's objects) spending time to re-build and re-test the system. But if you have used surrogate PKs then you don't need to perform so "black" work. You are OK because the primary key's meaning has not been changed: it was a primitive numerator yesterday, and it will stay it after years through any business changes!

I wonder why in developers' forums there are discussions which primary keys - surrogate or natural - should be used. We have always to think what will happen in the worst case, and I have described above what will happen if you use a natural key - a lot of head ache and more chances of bugs. And what will happen in the worst case if surrogate keys are used? In that case the table will have one extra field. Not a big trouble - we don't have the goal to save maximum disk space: these days we can leave in a restaurant more money than a hard disk costs.
User avatar
Ursego
Site Admin
 
Posts: 111
Joined: 19 Feb 2013, 20:33

Return to Ocean of Databases

Who is online

Users browsing this forum: No registered users and 1 guest


Only surrogate primary keys

Share this topic:


If you think that this site is not too bad, please LIKE it in Facebook. Thanks!





cron
free counters

eXTReMe Tracker