Calgary database and index loads generally take full advantage of the hardware they are created with. A minimum of 2 CPU Cores is recommended for building indexes, where 4 or more CPUs are considered optimal.
Strings that are indexed are limited to somewhat under 8192 characters. Less than 100 or so characters are highly recommended. The index fails to build if there is a string longer than that. Shorter fixed-length strings are recommended.
NULL values are allowed in indexes. By definition, there is only 1.
There are 2 different modes for creating non-spatial indexes:
High Selectivity Index. This is used when the values are mostly unique. Fuzzy Match Key, Full Name, etc...Bool, Byte, and Int16 do not support the high selectivity index because they can't have that many possible values.
Low Selectivity Index: This is used when the values each contain lots of records. State, Gender Code, ZIP Code.
There is also an auto mode that starts building both types and after 1,000,000 non-null records, it throws out the larger of the 2. The auto mode can guess wrong if the nature of the data changes significantly from the 1st million records.
Field Types and Indexes:
Int32 and Int64 fields have a special case index that compresses the integer portion of the index. They should be very efficient for storage.
Fixed Decimal is converted to Int64 with the appropriate multiplication and treated as an Int index. High numbers of decimal places are not recommended and more than 6 is not supported. Also, Fixed Decimals longer than 20 characters are not supported.
Bool, Byte, Int16, Float, and Double all use the same underlying index structure. An index built on a double field has a very good chance of being very large. Consider a float instead, or even an integer if you can figure out how to convert.
String, WString, V_String, V_WString, Date, Time, and DateTime are indexed as String. Wide Strings are converted via UTF-8 before building, hence they are very efficient for storage if there are only occasional wide characters, but not as efficient as could be if they are all wide characters, like Chinese or something similar.
Blob: Indexing is not supported for Blobs.
This index is very different than all the others internally.
Only point-level indexes are supported. Other types are converted to a point via centroid.
NULL and 0,0 points are dropped. There is no way to query on NULL values in the spatial index.
The Calgary spatial index uses 5 decimal places of accuracy for compression and speed. The YXDB spatial index uses 6 decimal places. This adds an additional round-off error of up to a maximum of 1.8 feet to Calgary indexes. In other words, it is possible that a point can be 1.8 feet inside of a polygon and be found outside in Calgary.
Like any index type, you can have multiple spatial indexes associated with each record. An example of this would be a residential address and a work address so you could query geographically for daytime or nighttime pop. These could be combined in a query with an OR so you could get all the people that live or work within your trade area.
Advanced indexes get loaded via the CalgaryAdvancedIndex tool and the load is separate from the regular data load/index build. If you wanted to combine advanced index loads with your regular index build, it's recommended to select only the fields you require for the advanced index build and then use a Block Until Done tool such that you are not trying to build these indexes simultaneously with the other Calgary Data Load.
Full Text (Contains)
This is a special index that searches anywhere in the field as opposed to the whole field or starting at the beginning. By necessity, the index is much bigger and takes longer to load than any other index type, but the query speed is nearly as fast as a regular whole-field query. When querying with a Contains style index, type='range'
and allowPartial='T'
are not valid.
Because of the nature of this index, it's not recommended for really long fields if you have a lot of records 50 chars and 100M records would be good approximate limits. With more records, you might not want to index such a large field. It has an absolute limit of 255 characters fields.
This is a string-only index. It works similarly well for narrow strings as wide strings, but wide strings will result in a larger slower index.
This is a special build mode for standard indexes. With this mode, you can specify the field value and the record ID, which must be created by hand as a 0-based int32). This is handy for creating indexes where 1 record might be indexed with multiple values.
For example, if you had a file that has fields for Person #1 through Person #6, you could create 1 index that contained all the names from all the fields such that you could search on a single name and then you would be able to query by this index for a name and find the records whichever fields they are in. It can also be useful for creating indexes on fields that are not in the Calgary Database, like Fuzzy Match keys. Go to Fuzzy Match Edit Match Options for more information. Any type of index is supported, although SpatialObj probably does not have a valid use case.
Go to Building a Raw Index in Calgary Loader for more information.