TS Filler Tool
The Time Series Filler tool takes a data stream of time series data and fills in any gaps in the series.
This tool is used primarily as a preparation step for using downstream time series-related tools and macros. Some time series tools produce unexpected results or errors if the data stream contains gaps in the time series, e.g. you have a series of data that is supposed to contain measurements every 5 minutes, but you don’t actually have measurements covering every 5 minutes. Use this macro whenever you suspect your data stream may have gaps in it.
This tool uses the R tool. Go to Options > Download Predictive Tools and sign in to the Alteryx Downloads and Licenses portal to install R and the packages used by the R Tool. See Download and Use Predictive Tools.
Connect an input
The TS Filler tool requires a Designer data stream that has a column containing Date or DateTime values.
Configure the tool
Use the Configuration tab to set the controls for how the time series is filled.
- Select Date or DateTime column: Choose the date or datetime column indicating when the data row was collected.
- Interval: Select the interval by which the time series is measured. Options include Minute, Hour, Day, Week, Month, and Year.
- Increment: Select the increment by which each unique time series period should be separated. You can set the increment to any integer from 1 to 100.
View the output
Every row entering the macro appears in the output. In addition, the macro evaluates if any periods in the time series are missing, and if so, generates a row to fill the gap.
The tool appends two data columns to the incoming data stream:
- OriginalDateTime: A DateTime column containing the original, unaltered DateTime value
- FlagGeneratedRow: A Boolean flag column, with “True” indicating that the data row was generated by the macro to fill a time series gap; “False” indicating it’s an source data row.
For source data rows passing through the macro:
- The FlagGeneratedRow column will read “False” to indicate that it is a source data row; not a row generated by the macro.
- The [yourDateTimeColumn] column will be “rounded” to the appropriate period. See “Rounding” section below.
- All other source columns will pass through unaltered.
For the rows that were generated to fill gaps:
- The FlagGeneratedRow column will read “True” to indicate that it is a row generated by the macro, not a source data row.
- The [yourDateTimeColumn] column will have a populated date or datetime value generated by the macro which “fills” the gap identified in the series.
- All other columns will be null.
Per the most common business usage scenarios, time series periods are handled as follows:
- “Time-level”: For intervals
- “Date-level”: For intervals >= a day, the period is “period ending.”
The only modification that the macro makes to the incoming data is: a “rounding” is applied to the [yourDateTimeColumn]. It is based on three key elements:
- Interval
- Earliest datetime value in data stream
- Increment
First, all fractions of the chosen interval are truncated off. For example, if you choose “Hour” as the interval, the macro will truncate any minutes and seconds from the datetime value, so e.g. 1:58pm will become 1:00pm.
After truncating the fractions of the chosen interval, the macro then determines what the valid periods are. The earliest datetime value entering the macro forms the basis of the series. Continuing the above “hour-series” example, if the earliest datetime in the [yourDateTimeColumn] was 1:58pm, then the first period is 1:00pm. If the chosen increment is 5, the subsequent periods are generated based on successively adding the chosen increment—5 hours—to the previous period, resulting in this case to 1:00pm, 6:00pm, 11:00pm etc.
Finally, now that all valid periods have been determined, all datetime values are then adjusted to the appropriate valid period. So a raw value of 5:59pm in the [yourDateTimeColumn] would become 1:00pm, 6:02pm would become 6:00pm, etc.
After all datetime adjustments have been applied by the macro, the resulting [yourDateTimeColumn] is now effectively a “label” for that period. However, the datatype is a date or datetime column (rather than a string), and as such this “label” may not be entirely intuitive. The key is to recognize that any portion of the date or datetime “label” that is “fractional” to the chosen interval should be ignored. So if you interval is “year”, than an example period could be “2018-01-01” but it’s the year that matters in the label; the fractional portions of the label—the “01-01”—should be ignored, meaning that the period is simply the year 2018, or “any time in 2018”. See table below.
Note that, aside from the datetime “rounding” described above, the macro does not alter any incoming data. It doesn’t do any combining, summarizing, uniquing or any type of validations on the incoming data. However, because of the date rounding, multiple measurements within the same period will now carry the same period label. As such, the data is now cleaner and can more readily be summarized because data can be “grouped by” this consistent, rounded period label.
Note that the macro does not account for Daylight Saving Time, so if you have time-level series data that crosses a Daylight Saving Time cutover, it’s possible that the macro will generate an hour row (or minute rows) representing an invalid hour. For example, areas that “Spring forward” one hour in March will have row(s) generated for the 2am hour on the Spring forward day even though that hour never actually occurred. (The clock switches from 1:59:59 to 3:00:00.) This hour can easily be removed with a filter tool after the macro.
To clarify, the following table lists the interval options available, followed by what datatype is returned, and whether the period for that interval type is a “period starting” or “period ending” interval. The table then displays an example increment and “first raw value” in the series, to illustrate what the resulting period “Label” would be and an English description of what the period label actually means. Finally, the table notes which portion of the period label should essentially be ignored, i.e. the portions that are fractional to the interval.
Interval | Data Type Returned | Period Starting/Ending | Example Increment | Example First Raw Value | Result First Period Label | Period Description | Ignore On Period Label |
---|---|---|---|---|---|---|---|
Minute |
DateTime | Starting | 2 | 2015-04-09 02:16:33 | 2015-04-09 02:16:00 | "The 2 minute period STARTING at 2:16, covering all of 2:16 and 2:17" | Seconds |
Hour |
DateTime | Starting | 4 | 2015-04-09 02:16:33 | 2015-04-09 02:00:00 | "The 4 hour period STARTING at 2, covering all of the 2, 3, 4, and 5 o'clock hours." | Minutes or less |
Day |
Date | Ending | 6 | 2015-04-09 02:16:33 | 2015-04-09 | "The 6 day period ENDING on 4/9, covering all of 4/4 - 4/9." | Hours or less |
Week |
Date | Ending | 1 | 2015-04-09 02:16:33 | 2015-04-09 | "The 1 week period ENDING on 4/9, covering all of 4/3 - 4/9." | Hours or less |
Month |
Date | Ending | 3 | 2015-04-09 02:16:33 | 2015-04-01 | "The 3 month period ENDING in April, covering all of February, March, and April." | Days or less |
Year |
Date | Ending | 2 | 2015-04-09 02:16:33 | 2015-04-01 | "The 2 year period ENDING in 2015, covering all of 2014 and 2015." | Months or less |