Calgary Indexes

Calgary database and index loads will generally take full advantage of the hardware they are created with.  A minimum of 2 CPU Cores is recommended for building indexes, where 4 or more CPUs are considered optimal.

Normal Indexes

Strings that are indexed are limited to somewhat under 8192 characters.  Less than 100 or so characters would be highly recommended.  The index will fail to build if there is a string longer than that.  Shorter fixed length strings are recommended.

NULL values are allowed in indexes.  By definition, there is only 1.  

There are two different modes for creating non spatial indexes:

  • High Selectivity Index. This is used when the values are mostly unique.  Fuzzy Match Key, Full Name, etc...  Bool, Byte & Int16 do not support the high selectivity index because they can't have that many possible values.
  • Low Selectivity Index:  This is use when the values each contain lots of records. State, Gender Code, ZIP Code...
  • There is also an auto mode which starts building both types and then after 1,000,000 non-null records throws out the larger of the 2. The auto can guess wrong if the nature of the data changes significantly from the 1st million records.

Field Types and Indexes:

  • Int32 & Int64 fields have a special case index that compresses the integer portion of the index. They should be very efficient for storage.
  • Fixed Decimal will be converted to Int64 with the appropriate multiplication and treated as an Int index. High numbers of decimal places is not recommended and more than 6 is not supported. Also Fixed Decimals longer than 20 characters are not supported.
  • Bool, Byte, Int16, Float, and Double all use the same underlying index structure. An index built on a double field has a very good chance of being very large. Consider using a float instead, or even an integer if you can figure out how to convert.
  • String, WString, V_String, V_WString, Date, Time, and DateTime are indexed as String. Wide Strings are converted via UTF-8 before building, hence they are very efficient for storage if there only occasional wide characters, but not as efficient as could be if they are all wide characters, like Chinese or something.
  • Blob: Indexing is not supported for Blobs.

Spatial Indexes

This index is very different than all the others internally.

Only point level indexes are supported. Other types will be converted to a point via centroid.

NULL and 0,0 points are dropped. There is no way to query on NULL values in the spatial index.

The Calgary spatial index uses 5 decimal places of accuracy for compression and speed.  The YXDB spatial index uses 6 decimal places.  This adds an addition round off error of up to a maximum of 1.8 feet to Calgary indexes.  In other words, it is possible that a point can be 1.8 feet inside of a polygon and be found as outside in Calgary.

Like any index type, you can have multiple spatial indexes associated with each record.  An example of this would be a residential address & a work address so you could query geographically for daytime or nighttime pop.  These could be combined in a query with an OR so you could get all the people that live or work within your trade area.

Advanced Indexes

Advanced  indexes get loaded via the CalgaryAdvancedIndex tool and the load is separate from the regular data load/index build.  If you wanted to combine advanced index loads with your regular index build, it would be recommended to select only the fields you require for the advanced index build and then use a Block Until Done Tool such that you are not trying to build these indexes simultaneously with the other Calgary Data Load.

Full Text (Contains)

This is a special index that searches anywhere in the field as opposed to the whole field or starting at the beginning.  By necessity, the index is much bigger and takes longer to load than any other index type, but the query speed is nearly as fast as a regular whole field query.  When querying with a Contains style index, type='range' and allowPartial='T' are not valid.

Because of the nature of this index, it is not recommended for really long fields if you have a lot of records 50 chars & 100M records would be good approximate limits.  With more records, you might not want to index such a large field.  It has an absolute limit of 255 characters fields.

This is a string only index.  It works similarly well for narrow strings as wide strings, but wide strings will result in a larger slower index.

Raw Indexes

This is a special build mode for standard indexes.  With this mode, you can specify the field value and the record ID, which must be created by hand as a 0 based int32).  This is handy for creating indexes where 1 record might be indexed with multiple values.  

For example, if you had a file that has fields for Person #1 through Person #6, you could create 1 index that contained all the names from all the fields such that you could search on a single name and then you would be able to query by this index for a name and find the records whichever fields they are in.  It can also be useful for creating indexes on fields that are not in the Calgary Database, like Fuzzy Match keys. See Fuzzy Match Edit Match Options .  Any type of index is supported, although SpatialObj probably does not have a valid use case.

See Building a Raw Index in Calgary Loader.