The Time Series Filler tool takes a data stream of time series data and fills in any gaps in the series.
This tool is used primarily as a preparation step for using downstream time series-related tools and macros. Some time series tools will produce unexpected results or errors if the data stream contains gaps in the time series, e.g. you have a series of data that is supposed to contain measurements every 5 minutes, but you donât actually have measurements covering every 5 minutes. Use this macro whenever you suspect your data stream may have gaps in it.
This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.
The macro accepts any data stream with a date or datetime column.
Select Date or DateTime column: Choose the date or datetime column indicating when the data row was collected.
Interval: Select the interval by which the time series is measured. Options:
Increment: Select the increment by which each unique time series period should be separated. Options:
Any integer from 1 to 100
For example, to generate a series covering âevery 3 weeksâ you would choose âWeekâ for the interval and 3 for the increment.
Every row entering the macro will appear on the output. In addition, the macro will evaluate if any periods in the time series are missing, and if so, will generate a row to âfillâ the gap.
The tool appends two data columns to the incoming data stream:
OriginalDateTime: A datetime column containing the original, unaltered datetime value
FlagGeneratedRow: A Boolean flag column, with âTrueâ indicating that the data row was generated by the macro to fill a time series gap; âFalseâ indicating itâs an source data row.
For source data rows passing through the macro:
The FlagGeneratedRow column will read âFalseâ to indicate that it is a source data row; not a row generated by the macro.
The [yourDateTimeColumn] column will be âroundedâ to the appropriate period. See âRoundingâ section below.
All other source columns will pass through unaltered.
For the rows that were generated to fill gaps:
The FlagGeneratedRow column will read âTrueâ to indicate that it is a row generated by the macro, not a source data row.
The [yourDateTimeColumn] column will have a populated date or datetime value generated by the macro which âfillsâ the gap identified in the series.
All other columns will be null.
Per the most common business usage scenarios, time series periods are handled as follows:
âTime-levelâ: For intervals
âDate-levelâ: For intervals >= a day, the period is âperiod ending.â
See table for examples.
The only modification that the macro makes to the incoming data is: a âroundingâ is applied to the [yourDateTimeColumn]. It is based on three key elements:
First, all fractions of the chosen interval are truncated off. For example, if you choose âHourâ as the interval, the macro will truncate any minutes and seconds from the datetime value, so e.g. 1:58pm will become 1:00pm.
After truncating the fractions of the chosen interval, the macro then determines what the valid periods are. The earliest datetime value entering the macro forms the basis of the series. Continuing the above âhour-seriesâ example, if the earliest datetime in the [yourDateTimeColumn] was 1:58pm, then the first period is 1:00pm. If the chosen increment is 5, the subsequent periods are generated based on successively adding the chosen incrementâ5 hoursâto the previous period, resulting in this case to 1:00pm, 6:00pm, 11:00pm etc.
Finally, now that all valid periods have been determined, all datetime values are then adjusted to the appropriate valid period. So a raw value of 5:59pm in the [yourDateTimeColumn] would become 1:00pm, 6:02pm would become 6:00pm, etc.
After all datetime adjustments have been applied by the macro, the resulting [yourDateTimeColumn] is now effectively a âlabelâ for that period. However, the datatype is a date or datetime column (rather than a string), and as such this âlabelâ may not be entirely intuitive. The key is to recognize that any portion of the date or datetime âlabelâ that is âfractionalâ to the chosen interval should be ignored. So if you interval is âyearâ, than an example period could be â2018-01-01â but itâs the year that matters in the label; the fractional portions of the labelâthe â01-01ââshould be ignored, meaning that the period is simply the year 2018, or âany time in 2018â. See table below.
Note that, aside from the datetime âroundingâ described above, the macro does not alter any incoming data. It doesnât do any combining, summarizing, uniquing or any type of validations on the incoming data. However, because of the date rounding, multiple measurements within the same period will now carry the same period label. As such, the data is now cleaner and can more readily be summarized because data can be âgrouped byâ this consistent, rounded period label.
Note that the macro does not account for Daylight Saving Time, so if you have time-level series data that crosses a Daylight Saving Time cutover, itâs possible that the macro will generate an hour row (or minute rows) representing an invalid hour. For example, areas that âSpring forwardâ one hour in March will have row(s) generated for the 2am hour on the Spring forward day even though that hour never actually occurred. (The clock switches from 1:59:59 to 3:00:00.) This hour can easily be removed with a filter tool after the macro.
To clarify, the following table lists the interval options available, followed by what datatype is returned, and whether the period for that interval type is a âperiod startingâ or âperiod endingâ interval. The table then displays an example increment and âfirst raw valueâ in the series, to illustrate what the resulting period âLabelâ would be and an English description of what the period label actually means. Finally, the table notes which portion of the period label should essentially be ignored, i.e. the portions that are fractional to the interval.
Interval | Data Type Returned | Period Starting/Ending | Example Increment | Example First Raw Value | Result First Period Label | Period Description | Ignore On Period Label |
---|---|---|---|---|---|---|---|
Minute |
DateTime | Starting | 2 | 2015-04-09 02:16:33 | 2015-04-09 02:16:00 | "The 2 minute period STARTING at 2:16, covering all of 2:16 and 2:17" | Seconds |
Hour |
DateTime | Starting | 4 | 2015-04-09 02:16:33 | 2015-04-09 02:00:00 | "The 4 hour period STARTING at 2, covering all of the 2, 3, 4, and 5 o'clock hours." | Minutes or less |
Day |
Date | Ending | 6 | 2015-04-09 02:16:33 | 2015-04-09 | "The 6 day period ENDING on 4/9, covering all of 4/4 - 4/9." | Hours or less |
Week |
Date | Ending | 1 | 2015-04-09 02:16:33 | 2015-04-09 | "The 1 week period ENDING on 4/9, covering all of 4/3 - 4/9." | Hours or less |
Month |
Date | Ending | 3 | 2015-04-09 02:16:33 | 2015-04-01 | "The 3 month period ENDING in April, covering all of February, March, and April." | Days or less |
Year |
Date | Ending | 2 | 2015-04-09 02:16:33 | 2015-04-01 | "The 2 year period ENDING in 2015, covering all of 2014 and 2015." | Months or less |
©2018 Alteryx, Inc., all rights reserved. Allocate®, Alteryx