TS Filler Tool

The Time Series Filler tool takes a data stream of time series data and fills in any gaps in the series.

This tool is used primarily as a preparation step for using downstream time series-related tools and macros. Some time series tools will produce unexpected results or errors if the data stream contains gaps in the time series, e.g. you have a series of data that is supposed to contain measurements every 5 minutes, but you don’t actually have measurements covering every 5 minutes. Use this macro whenever you suspect your data stream may have gaps in it.

This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.

Input

The macro accepts any data stream with a date or datetime column.

Configuration Properties

Configuration

For example, to generate a series covering “every 3 weeks” you would choose “Week” for the interval and 3 for the increment.

Output

Every row entering the macro will appear on the output. In addition, the macro will evaluate if any periods in the time series are missing, and if so, will generate a row to ‘fill’ the gap.

The tool appends two data columns to the incoming data stream:

For source data rows passing through the macro:

For the rows that were generated to fill gaps:

      1. The FlagGeneratedRow column will read “True” to indicate that it is a row generated by the macro, not a source data row.

      2. The [yourDateTimeColumn] column will have a populated date or datetime value generated by the macro which “fills” the gap identified in the series.

      3. All other columns will be null.

Period Starting/Ending

Per the most common business usage scenarios, time series periods are handled as follows:

See table for examples.

DateTime “Rounding”

The only modification that the macro makes to the incoming data is: a “rounding” is applied to the [yourDateTimeColumn]. It is based on three key elements:

First, all fractions of the chosen interval are truncated off. For example, if you choose “Hour” as the interval, the macro will truncate any minutes and seconds from the datetime value, so e.g. 1:58pm will become 1:00pm.

After truncating the fractions of the chosen interval, the macro then determines what the valid periods are. The earliest datetime value entering the macro forms the basis of the series. Continuing the above “hour-series” example, if the earliest datetime in the [yourDateTimeColumn] was 1:58pm, then the first period is 1:00pm. If the chosen increment is 5, the subsequent periods are generated based on successively adding the chosen increment—5 hours—to the previous period, resulting in this case to 1:00pm, 6:00pm, 11:00pm etc.

Finally, now that all valid periods have been determined, all datetime values are then adjusted to the appropriate valid period. So a raw value of 5:59pm in the [yourDateTimeColumn] would become 1:00pm, 6:02pm would become 6:00pm, etc.

Period “Labels”

After all datetime adjustments have been applied by the macro, the resulting [yourDateTimeColumn] is now effectively a “label” for that period. However, the datatype is a date or datetime column (rather than a string), and as such this “label” may not be entirely intuitive. The key is to recognize that any portion of the date or datetime “label” that is “fractional” to the chosen interval should be ignored. So if you interval is “year”, than an example period could be “2018-01-01” but it’s the year that matters in the label; the fractional portions of the label—the “01-01”—should be ignored, meaning that the period is simply the year 2018, or “any time in 2018”. See table below.

Data Preservation

Note that, aside from the datetime “rounding” described above, the macro does not alter any incoming data. It doesn’t do any combining, summarizing, uniquing or any type of validations on the incoming data. However, because of the date rounding, multiple measurements within the same period will now carry the same period label. As such, the data is now cleaner and can more readily be summarized because data can be “grouped by” this consistent, rounded period label.

Daylight Saving Time

Note that the macro does not account for Daylight Saving Time, so if you have time-level series data that crosses a Daylight Saving Time cutover, it’s possible that the macro will generate an hour row (or minute rows) representing an invalid hour. For example, areas that “Spring forward” one hour in March will have row(s) generated for the 2am hour on the Spring forward day even though that hour never actually occurred. (The clock switches from 1:59:59 to 3:00:00.) This hour can easily be removed with a filter tool after the macro.

Use Case Example Table

To clarify, the following table lists the interval options available, followed by what datatype is returned, and whether the period for that interval type is a “period starting” or “period ending” interval. The table then displays an example increment and “first raw value” in the series, to illustrate what the resulting period “Label” would be and an English description of what the period label actually means. Finally, the table notes which portion of the period label should essentially be ignored, i.e. the portions that are fractional to the interval.

Interval Data Type Returned Period Starting/Ending Example Increment Example First Raw Value Result First Period Label Period Description Ignore On Period Label

Minute

DateTime Starting 2 2015-04-09 02:16:33 2015-04-09 02:16:00 "The 2 minute period STARTING at 2:16, covering all of 2:16 and 2:17" Seconds

Hour

DateTime Starting 4 2015-04-09 02:16:33 2015-04-09 02:00:00 "The 4 hour period STARTING at 2, covering all of the 2, 3, 4, and 5 o'clock hours." Minutes or less

Day

Date Ending 6 2015-04-09 02:16:33 2015-04-09 "The 6 day period ENDING on 4/9, covering all of 4/4 - 4/9." Hours or less

Week

Date Ending 1 2015-04-09 02:16:33 2015-04-09 "The 1 week period ENDING on 4/9, covering all of 4/3 - 4/9." Hours or less

Month

Date Ending 3 2015-04-09 02:16:33 2015-04-01 "The 3 month period ENDING in April, covering all of February, March, and April." Days or less

Year

Date Ending 2 2015-04-09 02:16:33 2015-04-01 "The 2 year period ENDING in 2015, covering all of 2014 and 2015." Months or less