Simulation Sampling Tool
The Simulation Sampling tool samples data parametrically from a distribution, from input data, or from the distribution that best fits the input data. If you lack data and are unsure of a distribution's parameters, you can also "draw" the data manually.
This tool uses the R programming language. Go to Options > Download Predictive Tools to install R and the packages used by the R Tool.
Input
- D: (Optional) Sample data. This input is required if you are sampling from raw or binned data.
- S: (Optional) Simulation data. If upstream simulation tools have been used, connect their output to this input to append the sampled data to it and to take the iteration count and seed from it.
Configuration
- Select sampling mechanism: Monte Carlo / Simple Sampling or Latin Hypercube / Stratified Sampling. For stratified sampling from data, the chunk size sets the maximum stratum size.
- Chunk size: The maximum amount of data to evaluate at a time. Use it to work around R's limitation of processing all data in memory. For stratified sampling from data, this is also the maximum stratum size.
- Seed: The random seed used for sampling. This option is not available if a dataset containing a seed field is connected to the S input, as that seed will be incremented and used instead.
- Number of iterations: The number of samples to select. This option is not available if a dataset is connected to the S input, as the size of that dataset determines the number of iterations.
- Select sampling mode: Parametric, or Sample from data.
Parametric
For parametric sampling, do not connect a data stream to the D input.
- Enter name for outgoing data: Specify the name of the output field.
- Select distribution: Select from the list of supported distributions. Along with the parameters, this determines the cumulative distribution function (CDF) plotted in the tool.
- Enter Parameters and Bounds: Set the distribution's parameters with the sliders or the up/down arrows, or type the values directly. Along with the selected distribution, the parameters determine the plotted CDF. You can optionally specify lower and upper bounds. If you do, rejection sampling is used so that every sample drawn falls between them. Bounds are inclusive.
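The two sampling mechanisms and the bounds option can be illustrated with a minimal, stdlib-only Python sketch. It assumes a normal distribution; the tool itself runs in R, and the function names, seed, and parameter values below are hypothetical, not the tool's implementation. Latin hypercube sampling here splits the cumulative-probability range into equal strata, draws one point per stratum, and maps each point through the inverse CDF. Bounds are enforced by rejection sampling, discarding any draw outside the inclusive range.

```python
import random
from statistics import NormalDist


def monte_carlo_sample(dist, n, rng):
    """Monte Carlo / simple sampling: every draw is independent of the others."""
    return [rng.gauss(dist.mean, dist.stdev) for _ in range(n)]


def latin_hypercube_unit(n, rng):
    """Split [0, 1) into n equal strata and draw one uniform point from each."""
    eps = 1e-12  # keep points strictly inside (0, 1), which inv_cdf requires
    points = [min(max((i + rng.random()) / n, eps), 1 - eps) for i in range(n)]
    rng.shuffle(points)  # randomize order so strata are not returned in sequence
    return points


def latin_hypercube_sample(dist, n, rng):
    """Latin hypercube / stratified sampling via the inverse CDF."""
    return [dist.inv_cdf(u) for u in latin_hypercube_unit(n, rng)]


def bounded_sample(draw, n, lower=None, upper=None, max_tries=1_000_000):
    """Rejection sampling: keep only draws within the inclusive bounds."""
    out = []
    for _ in range(max_tries):
        if len(out) == n:
            break
        x = draw()
        if (lower is None or x >= lower) and (upper is None or x <= upper):
            out.append(x)
    if len(out) < n:
        raise RuntimeError("bounds rejected too many draws; widen them or raise max_tries")
    return out


rng = random.Random(42)  # hypothetical seed
dist = NormalDist(mu=10, sigma=2)  # hypothetical distribution parameters
mc = monte_carlo_sample(dist, 1000, rng)
lhs = latin_hypercube_sample(dist, 1000, rng)
bounded = bounded_sample(lambda: rng.gauss(dist.mean, dist.stdev), 1000, lower=8, upper=12)
```

Because the Latin hypercube variant places exactly one draw in each probability stratum, its summary statistics (such as the mean) typically settle closer to the true values than simple sampling does for the same number of iterations.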
Sample from data
Either connect a data stream to the D input or sample via manual entry.
- Sample with replacement: Select this option to allow the same record or value to be drawn more than once.
- Specify kind of data:
- Raw Data (not binned): Select the fields to sample, and then a sampling strategy (see below).
- Binned Data: Requires an ID field whose values are equally spaced numbers identifying the bins, and a value field holding the value for each bin. Specify a name for the outgoing data (output field name), the sampling strategy (see below), the ID field for the binned data, and the value field for the binned data.
- Manual Entry: Enter data manually via a Roulette widget. Specify a name for the outgoing data (output field name) and the Roulette widget parameters. Use the Lower, Upper, Height, and Num Bins options to configure the size of the grid, then click in the grid to set the height of each bin.
- Select sampling strategy: Choose how to sample your data. You can sample entire rows, sample each column independently, or sample from the best-fitting distribution. If you sample from the best-fitting distribution, select the candidate distributions to fit to the data.
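The difference between sampling entire rows and sampling each column independently, with or without replacement, can be shown in a short Python sketch. This is an illustration only, assuming records are dictionaries keyed by field name; the function names, sample data, and seed are hypothetical and do not reflect the tool's R code. Row sampling keeps values from the same record together, while column-wise sampling breaks any relationship between fields.

```python
import random


def sample_rows(rows, n, rng, replace=True):
    """Draw whole records, so values from the same row stay together."""
    if replace:
        return rng.choices(rows, k=n)
    return rng.sample(rows, n)  # raises ValueError if n > len(rows)


def sample_columns_independently(rows, n, rng, replace=True):
    """Draw each field separately, which breaks correlation between fields."""
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    drawn = {}
    for name, values in columns.items():
        drawn[name] = rng.choices(values, k=n) if replace else rng.sample(values, n)
    # Reassemble the independently drawn columns into records.
    return [{name: drawn[name][i] for name in columns} for i in range(n)]


# Hypothetical input: two perfectly correlated fields.
data = [{"x": i, "y": 2 * i} for i in range(100)]
rng = random.Random(7)  # hypothetical seed

rows_out = sample_rows(data, 50, rng, replace=False)
cols_out = sample_columns_independently(data, 50, rng, replace=True)
```

In `rows_out` every record still satisfies `y == 2 * x`, and without replacement no record repeats; in `cols_out` the pairing between `x` and `y` is effectively random.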
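For binned data (and, by extension, bins drawn with the Roulette widget), one common approach is to pick a bin with probability proportional to its value and then spread the draw uniformly across that bin's width. The Python sketch below shows that approach under stated assumptions: IDs are treated as equally spaced bin centers, values as non-negative weights, and the function name, bins, and seed are hypothetical. The tool's actual handling of binned data may differ.

```python
import random


def sample_binned(bin_ids, bin_values, n, rng):
    """Sample from a histogram with equally spaced bin IDs (assumed bin centers)."""
    if len(bin_ids) < 2:
        raise ValueError("need at least two bins to infer the bin width")
    width = bin_ids[1] - bin_ids[0]  # assumes IDs are equally spaced
    # Choose bins with probability proportional to their values.
    centers = rng.choices(bin_ids, weights=bin_values, k=n)
    # Spread each draw uniformly across its bin instead of returning the center.
    return [c + rng.uniform(-width / 2, width / 2) for c in centers]


ids = [1.0, 2.0, 3.0, 4.0]       # hypothetical bin IDs
heights = [0.0, 5.0, 10.0, 0.0]  # hypothetical bin values / heights
rng = random.Random(123)         # hypothetical seed
draws = sample_binned(ids, heights, 1000, rng)
```

Bins with a value of zero are never selected, so every draw here falls between 1.5 and 3.5, with roughly twice as many draws in the bin centered on 3.0 as in the bin centered on 2.0.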
Output
- D: The data output, containing the results of the simulation.
©2017 Alteryx, Inc., all rights reserved. Allocate®, Alteryx®, Guzzler®, and Solocast® are registered trademarks of Alteryx, Inc.