Calgary Data Loads
Data Loads
Data should always be sorted with maximum locality. You would typically sort on multiple fields by descending selectivity. The reason you pick multiple fields is such that if any fields are not populated, you still get a decent amount of locality. A typical sort would be:
- State
- City
- ZIP
- Block
- Point
It is HIGHLY recommended that the Auto Field Tool is used before doing the load. Maximizing the efficiency of packing in the fields can be huge. Before running the Auto Field Tool, data may need to be trimmed via a Formula Tool. Especially when data comes from flat files, it can be padded with spaces. Removing these spaces can improve compression & indexing.
There is a 2^31 limit on the number of records in a Calgary Database (approximately 2 Billion). It is designed for a sweet spot of around 100-300 million records. If the database contains more than 300M records, the 64bit version of Alteryx Designer would be highly recommended.
Record Size
The Calgary Database (*.cydb) is tuned for smaller records while at the same time having no limit to ultimate record size. Records over 2K bytes will incur both a compression and performance penalty compared to smaller records.
There is a record size limit of 2GB in a 64-bit environment.
Compression
The initial implementation of the Calgary Database file (*.cydb) has minimal compression. It is designed for load & read speed as well as random access. In initial test results show the compression achieving factors of between 1:1 & 4:1 (25 % of original size) compared to csv files. Presuming the load was done with a good geolocated sort, between 2:1 & 3:1 can be expected. It is similar, but slightly larger, than the yxdb format without the spatial index.
The Calgary spatial index uses 5 decimal places of accuracy for compression and speed. The .yxdb spatial index uses 6 decimal places. This adds an addition round off error of up to a maximum of 1.8 feet to Calgary indexes. In other words, it is possible that a point can be 1.8 feet inside of a polygon and be found as outside in Calgary.
Loading
When loading Calgary Databases, you can choose to load or not load any set of fields, as well as choosing to index or not index any set of fields. Only fields that are chosen to be indexed will be available for queries and only fields chose for data will be available for retrieval. There is no requirement that fields selected for indexes have to be loaded for retrieval and visa versa. An example of this might be the Spatial Object (Point). You might index this field so you can retrieve records spatially, but not include it in the data because you also have lat/long fields.