KNOWLEDGE BASE
Log In    |    Knowledge Base    |    4D Home
Tech Tip: How does 4D optimize Set storage?
PRODUCT: 4D | VERSION: 13.1 | PLATFORM: Mac & Win
Published On: September 12, 2012

Please note: the format of set storage in 4D changed in 4D v11 SQL. In other words this information applies to 4D v11 SQL and newer.

Conceptually sets can be thought of as an array, with each "bit" of the array corresponding to a record in the table. If the bit is "1" the record is in the set. If it is "0", the record is not in the set.



In practice this design is very poor for several reasons:

  • Requires contiguous memory. The larger the set (i.e. the greater the number of records) the larger the contiguous chunk of memory must be.
  • Modern CPU's are not bit-aligned. They are designed to access data in fixed widths (called a "word"). It's much more efficient to access a "word" than it is to access individual bits.
  • Inefficient if most of the records are in, or not in, the set.

4D Sets are stored internally as an object. It can still be imagined as a kind of array, but each element represents 32,768 records. The element value can mean one of three things:

  • If all 32,768 records for the current element are in the set the element simply indicates these are "all in".
  • If all 32,768 records for the current element are not in the set the element indicates these are "all not in".
  • If the element represents some records that are in the set and some that are not, it contains the address of a "page" in memory. A page is 4096 bytes (or 32,768 bits). Each bit in the page represents the "in" or "out" status of the corresponding record.



This diagram is in fact too simple. Not only do the elemnts of the set point to pages but, in fact, the set is also broken up in the same way. A set then is not really an array but more a tree; each node is either an "element" as described above, or points to another node of the set.

This design is extremely efficient:

  • It compresses the set without a performance penalty for large amounts of records either in or not in the set.
  • It is easy to accomodate in RAM because it does not require large contiguous chunks of memory.

See Also: