TS Filler Tool
Use Time Series Filler to take a data stream of time series data and fill in any gaps in the series.
This tool is used primarily as a preparation step for using downstream time series-related tools and macros. Some time-series tools produce unexpected results or errors if the data stream contains gaps in the time series, for example, you have a series of data that is supposed to contain measurements every 5 minutes, but you don’t actually have measurements covering every 5 minutes. Use this macro whenever you suspect your data stream may have gaps in it.
This tool uses the R tool. Go to Options > Download Predictive Tools and sign in to the Alteryx Downloads and Licenses portal to install R and the packages used by the R Tool. See Download and Use Predictive Tools.
Connect an Input
The TS Filler tool requires a Designer data stream that has a column containing date or date-time values.
Configure the Tool
Use the Configuration tab to set the controls for how the time series is filled.
- Select Date or DateTime column: Choose the date or date-time column that indicates when the data row was collected.
- Interval: Select the interval by which the time series is measured. Options include Minute, Hour, Day, Week, Month, and Year.
- Increment: Select the increment by which each unique time series period should be separated. You can set the increment to any integer from 1 to 100.
To generate a series covering “every 3 weeks” you would set Interval to Week and Increment to 3.
View the Output
Every row entering the macro appears in the output. In addition, the macro evaluates if any periods in the time series are missing, and if so, generates a row to fill the gap.
The tool appends two data columns to the incoming data stream:
- OriginalDateTime: A date-time column that contains the original, unaltered date-time value
- FlagGeneratedRow: A Boolean flag column, where "True" indicates that the data row was generated by the macro to fill a time series gap and "False" indicates it’s a source data row.
For source data rows passing through the macro:
- The FlagGeneratedRow column reads "False" to indicate that it is a source data row, not a row generated by the macro.
- The [yourDateTimeColumn] column is "rounded" to the appropriate period. See Rounding section below.
- All other source columns pass through unaltered.
For the rows that were generated to fill gaps:
- The FlagGeneratedRow column reads “True” to indicate that it is a row generated by the macro, not a source data row.
- The [yourDateTimeColumn] column has a populated date or date-time value generated by the macro which “fills” the gap identified in the series.
- All other columns are null.
Per the most common business usage scenarios, time series periods are handled as follows:
- "Time-level": For Intervals
- "Date-level": For intervals >= a day, the period is "period ending".
The only modification that the macro makes to the incoming data is: a "rounding" is applied to the [yourDateTimeColumn]. It is based on three key elements:
- Earliest Date-time Value in Data Stream
First, all fractions of the chosen interval are truncated off. For example, if you choose "Hour" as the interval, the macro will truncate any minutes and seconds from the date-time value, so for example. 1:58pm becomes 1:00pm.
After truncating the fractions of the chosen interval, the macro then determines what the valid periods are. The earliest date-time value entering the macro forms the basis of the series. Continuing the above "hour-series" example, if the earliest date-time in the [yourDateTimeColumn] is 1:58pm, then the first period is 1:00pm. If the chosen increment is 5, the subsequent periods are generated based on successively adding the chosen increment—5 hours—to the previous period, resulting in this case to 1:00pm, 6:00pm, 11:00pm, etc.
Finally, now that all valid periods have been determined, all date-time values are then adjusted to the appropriate valid period. So a raw value of 5:59pm in the [yourDateTimeColumn] becomes 5:00pm and 6:02pm becomes 6:00pm, etc.
After all date-time adjustments have been applied by the macro, the resulting [yourDateTimeColumn] is now effectively a "label" for that period. However, the datatype is a date or date-time column (rather than a string), and as such this "label" may not be entirely intuitive. The key is to recognize that any portion of the date or date-time "label" that is "fractional" to the chosen interval should be ignored. So if you interval is "year", then an example period could be "2018-01-01" but it’s the year that matters in the label, the fractional portions of the label—the "01-01"—should be ignored, meaning that the period is simply the year 2018, or "any time in 2018". See the table below.
Note that, aside from the date-time “rounding” described above, the macro does not alter any incoming data. It doesn’t do any combining, summarizing, uniquing, or any type of validations on the incoming data. However, because of the date rounding, multiple measurements within the same period will now carry the same period label. As such, the data is now cleaner and can more readily be summarized because data can be “grouped by” this consistent, rounded period label.
Daylight Saving Time
Note that the macro does not account for Daylight Saving Time, so if you have time-level series data that crosses a Daylight Saving Time cutover, it’s possible that the macro will generate an hour row (or minute rows) representing an invalid hour. For example, areas that "Spring forward" one hour in March will have rows generated for the 2am hour on the Spring forward day even though that hour never actually occurred (the clock switches from 1:59:59 to 3:00:00). This hour can easily be removed with a filter tool after the macro.
Use Case Example
To clarify, the following table lists the interval options available, followed by what datatype is returned, and whether the period for that interval type is a "period starting" or "period ending" interval. The table then displays an example increment and "first raw value" in the series, to illustrate what the resulting period "Label" would be and an English description of what the period label actually means. Finally, the table notes which portion of the period label should essentially be ignored, like the portions that are fractional to the interval.
|Interval||Data Type Returned||Period Starting/Ending||Example Increment||Example First Raw Value||Result First Period Label||Period Description||Ignore On Period Label|
|Minute||DateTime||Starting||2||2015-04-09 02:16:33||2015-04-09 02:16:00||"The 2 minute period STARTING at 2:16, covering all of 2:16 and 2:17"||Seconds|
|Hour||DateTime||Starting||4||2015-04-09 02:16:33||2015-04-09 02:00:00||"The 4 hour period STARTING at 2, covering all of the 2, 3, 4, and 5 o'clock hours."||Minutes or less|
|Day||Date||Ending||6||2015-04-09 02:16:33||2015-04-09||"The 6 day period ENDING on 4/9, covering all of 4/4 - 4/9."||Hours or less
|Week||Date||Ending||1||2015-04-09 02:16:33||2015-04-09||"The 1 week period ENDING on 4/9, covering all of 4/3 - 4/9."||Hours or less|
|Month||Date||Ending||3||2015-04-09 02:16:33||2015-04-01||"The 3 month period ENDING in April, covering all of February, March, and April."||Days or less|
|Year||Date||Ending||2||2015-04-09 02:16:33||2015-04-01||"The 2 year period ENDING in 2015, covering all of 2014 and 2015."||Months or less|