Tech Tip: The concept of atomicity in table design

PRODUCT: 4D | VERSION: 6.8 | PLATFORM: Mac & Win

Published On: June 13, 2002

Summary: Tables must be designed following the principle that the data in a record is decomposed into individual facts, each of which is stored in its own field, with each field having only one fact.

One of the fundamental concepts of relational table design is that tables must be designed so that the individual data values entered in fields will be atomic. Another way of stating this is that tables must be designed so that fields will contain a single fact.

This statement appears simple enough that no further explanation might seem necessary. However, that simplicity is somewhat misleading. Complexity and uncertainty enter the picture because for each field we must determine what actually constitutes a single fact. The database literature defines a fact as "the smallest semantic unit of data"; in other words, a fact is the smallest unit of data that has meaning. What at first seems obvious becomes uncertain because the smallest unit of data that has meaning is defined by the context of the business or activity for which you are designing your database.

Let's see what's actually meant by the phrase "smallest unit of data that has meaning." To illustrate this, let's use the activity of reading and the phrase "unit of data that has meaning" as an example. If I take the word "meaning" and consider each letter individually, the letters themselves carry no meaning. If I consider the letters "m-e-a-n-i-n-g" as a group, or word, the letters that individually had no meaning now carry a meaning. So the smallest unit of data that has meaning, for the activity of reading, is the individual word. And while a word can be broken down into its component letters, those letters are not the smallest unit of data that has meaning for the activity of reading.

However, if my activity were searching this document to count every instance of the letter 'e', then the smallest unit of meaning would be the individual letter.

Conversely, if my activity were searching this document to count every instance of the phrase "unit of data that has meaning," then for that particular search the smallest unit of meaning would not be the single letter, or even the individual words, but the entire phrase.
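
The three activities described above can be sketched in a few lines of Python. This is only an illustration, not part of any 4D API: the function names and the sample text are hypothetical, chosen to show that the same text yields a different "smallest unit" depending on what is being counted.

```python
import re

# Hypothetical sample text, used only for illustration.
SAMPLE = (
    "A fact is the smallest unit of data that has meaning. "
    "What counts as a unit of data that has meaning depends on the activity."
)

def count_letter(text, letter):
    """Activity: counting a letter. The unit of meaning is a single character."""
    return text.lower().count(letter.lower())

def count_word(text, word):
    """Activity: reading. The unit of meaning is a whole word."""
    words = re.findall(r"[a-z']+", text.lower())
    return words.count(word.lower())

def count_phrase(text, phrase):
    """Activity: searching for a phrase. The unit of meaning is the entire phrase."""
    return text.lower().count(phrase.lower())

print(count_letter(SAMPLE, "e"))
print(count_word(SAMPLE, "meaning"))
print(count_phrase(SAMPLE, "unit of data that has meaning"))
```

Note that count_word treats "meaningful" and "meaning" as different units, whereas a plain substring search would not. The choice of unit changes the answer, which is the point of the example.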

So, what constitutes the smallest unit of data that has meaning is determined by the context or activity.

Equivalent reasoning explains why the word "atomic" is used to describe a data value that is equal to a single fact. An atom is a single unit, or element, of matter. Atoms are composed of protons, electrons and possibly neutrons. The makeup of an atom determines what kind of matter the atom is; in other words, its "meaning" as an element. So, an atom composed of one proton and one electron has the "meaning" hydrogen. But if you consider the component proton or electron of that atom separately, they do not have the meaning "hydrogen." So an atom is the smallest unit that has meaning as an element, and when it is broken down into its component parts, that meaning is lost. Conversely, atoms can be combined to form molecules, but for the purpose of considering the elemental building blocks of matter, a molecule would not be the smallest unit of data that has meaning, since a molecule can be broken down into component atoms. Hence the use of the word "atomic" to describe a data value that is a single value that cannot be broken down further (decomposed) into other meaningful values. In other words, the field contains a single fact (a single unit of data that has meaning), not a combination of facts.

Now, let's apply these concepts to storing data in a database table, specifically, storing street addresses.

In a contacts database, it is customary for the building number and street name to be stored in one field. This makes sense because for the purpose of the activity of storing contact information, the combination of a building number and a street name has the meaning of being a street address -- it is a single fact. Therefore in such a database, this data value would be "atomic," that is to say it would be the smallest unit of data that retained meaning as a street address.

Now suppose that you are storing data for a mapping program. In that case, you would probably want to store the building number and the street name in separate fields, because each of these is a separate meaningful fact for a mapping program.
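
The contrast between the two designs can be sketched with Python's built-in sqlite3 module. This is a minimal illustration rather than a 4D structure: the table names, field names, sample rows, and the buildings_on_block helper are all hypothetical, and SQLite simply stands in for whatever database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Contacts design: the whole street address is one fact, so one field.
conn.execute("CREATE TABLE contacts (name TEXT, street_address TEXT)")
conn.execute("INSERT INTO contacts VALUES ('A. Reader', '120 Main Street')")

# Mapping design: building number and street name are separate facts,
# so each gets its own field.
conn.execute(
    "CREATE TABLE map_locations (building_number INTEGER, street_name TEXT)"
)
conn.executemany(
    "INSERT INTO map_locations VALUES (?, ?)",
    [
        (95, "Main Street"),
        (120, "Main Street"),
        (310, "Main Street"),
        (120, "Oak Avenue"),
    ],
)

def buildings_on_block(conn, street_name, low, high):
    """Return the building numbers on one street within a numeric range."""
    rows = conn.execute(
        "SELECT building_number FROM map_locations "
        "WHERE street_name = ? AND building_number BETWEEN ? AND ? "
        "ORDER BY building_number",
        (street_name, low, high),
    )
    return [row[0] for row in rows]

print(buildings_on_block(conn, "Main Street", 100, 199))
```

A range query like this is straightforward only because the building number is stored as its own field. Against the contacts table, the same question would require parsing the number out of the combined street_address text, which is a sign that the field is not atomic for the mapping activity even though it is atomic for the contacts activity.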

From this example we can see that what constitutes the smallest unit of meaning depends upon the use or activity for which the database is designed. Taking nothing for granted and clarifying the details of the activities that people engage in is therefore critical, because what constitutes a fact is determined by the way people will use the information.

Remember, tables must be designed following the principle that the data to be stored in a record is decomposed into individual facts, each of which is stored in its own field, with each field having only one fact.