Simulation Sampling Tool
The Simulation Sampling tool samples data parametrically from a distribution, from input data, or from a combination of the two: a distribution that best fits the input data. You can also "draw" the data manually if you are unsure of a distribution's parameters and lack data.
This tool uses the R tool. Go to Options > Download Predictive Tools and sign in to the Alteryx Downloads and Licenses portal to install R and the packages used by the R Tool. See Download and Use Predictive Tools.
Connect inputs
- D anchor: Sample data. Optional in general, but required if you are sampling from raw or binned data.
- S anchor: Optional. Simulation data. If the workflow uses earlier simulation tools, connect their output to this input to append the data and obtain an iteration count and seed.
Configure the tool
- Select sampling mechanism: Monte Carlo / Simple Sampling or Latin HyperCube / Stratified Sampling. For stratified sampling from data, the chunk size sets the maximum size of each stratum.
- Chunk size: The maximum amount of data to evaluate at a time. Use a smaller chunk size to work around R's in-memory processing limit. For stratified sampling from data, this is also the maximum size of each stratum.
- Seed: The random seed used for sampling. This option is not available if a dataset containing a seed field is connected to the S input, as that seed will be incremented and used instead.
- Number of iterations: The number of samples to select. This option is not available if a dataset is connected to the S input, as the size of that dataset determines the number of iterations.
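- Select sampling mode: Choose between parametric sampling (from a distribution) and sampling from data. The options that follow depend on the selected mode.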
- Enter name for outgoing data: Specify a field name for the output field.
- Select distribution: Select from the list of supported distributions. Along with the parameters, this determines the plot of the depicted cumulative density / mass function.
- Enter Parameters and Bounds: To define the parameters for the distribution, use the sliders or the up/down arrows to adjust the values or manually enter values. Along with the selected distribution, this determines the plot of the depicted cumulative density / mass function. You can optionally specify bounds for the distribution. If bounds are specified, rejection sampling is used to ensure that samples drawn are between the Lower Bound and Upper Bound. Bounds are inclusive.
- Sample with replacement: Select this option to sample with replacement.
- Specify kind of data: Select one of the data options.
  - Raw Data (not binned): Select the fields to sample, and then a sampling strategy (see below).
  - Binned Data: Requires an ID field and a value field with equally-spaced bins. (IDs are equally-spaced numbers.) Specify a name for the outgoing data (output field name), sampling strategy (see below), ID field for the binned data, and the value field for the binned data.
  - Manual Entry: Manually enter data via a Roulette widget. Enter a name for the outgoing data (output field name) and Roulette widget parameters. Use the Lower, Upper, Height, and Num Bins options to configure the size of the grid. Click in the grid to provide the height of the bins.
- Select fields to sample: Select columns to sample data from.
- Select sampling strategy: Choose how to sample your data. Sample entire rows, each column independently, or from the best-fitting distribution. If you choose to sample from the best-fitting distribution, select the distributions to which to fit the data.
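The difference between the two sampling mechanisms listed under Select sampling mechanism can be illustrated outside the tool. The following is a minimal Python sketch, not the tool's R implementation: the function names, the normal distribution, and the sample size of 10 are all assumptions chosen for illustration. Simple sampling draws every value independently, while Latin hypercube sampling draws exactly one value from each of n equal-probability strata.

```python
import random
from statistics import NormalDist


def monte_carlo_uniforms(n, rng):
    """Simple sampling: n independent uniform draws on [0, 1)."""
    return [rng.random() for _ in range(n)]


def latin_hypercube_uniforms(n, rng):
    """Stratified sampling: one uniform draw inside each of n equal-width strata."""
    draws = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(draws)  # randomize the order so strata are not emitted in sequence
    return draws


def to_normal(uniforms, mu=0.0, sigma=1.0):
    """Map uniform draws to a normal distribution through the inverse CDF."""
    dist = NormalDist(mu, sigma)
    eps = 1e-12  # inv_cdf is only defined strictly between 0 and 1
    return [dist.inv_cdf(min(max(u, eps), 1.0 - eps)) for u in uniforms]


rng = random.Random(42)  # hypothetical seed
mc = to_normal(monte_carlo_uniforms(10, rng))
lhs = to_normal(latin_hypercube_uniforms(10, rng))
```

With the same number of iterations, the stratified draws cover the whole range of the distribution, whereas simple sampling can leave some regions empty by chance.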
For parametric sampling, do not connect a data stream to the D input.
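When bounds are set for a parametric distribution, the rejection sampling described under Enter Parameters and Bounds can be pictured as follows. This is a hedged Python sketch under stated assumptions, not the tool's own code: `sample_bounded`, the `max_tries` guard, and the normal distribution with mean 10 and standard deviation 2 are hypothetical choices for illustration.

```python
import random


def sample_bounded(draw, lower, upper, n, max_tries=100_000):
    """Rejection sampling: keep drawing until n values land in [lower, upper]."""
    accepted = []
    tries = 0
    while len(accepted) < n:
        if tries >= max_tries:
            raise RuntimeError("bounds reject too many draws; widen them or raise max_tries")
        tries += 1
        x = draw()
        if lower <= x <= upper:  # bounds are inclusive on both ends
            accepted.append(x)
    return accepted


rng = random.Random(7)  # hypothetical seed
samples = sample_bounded(lambda: rng.gauss(10.0, 2.0), lower=8.0, upper=12.0, n=1000)
```

Because rejected draws are discarded, bounds that cover only a small part of the distribution require many more draws per accepted sample.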
For sampling from data, either connect a data stream to the D input or enter the data with the Manual Entry option.
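The row-based and column-based sampling strategies for raw data differ in whether values from the same input row stay together. The sketch below shows the idea in Python under stated assumptions: the `rows` table, its field names, and both helper functions are hypothetical, and the tool itself performs this step in R.

```python
import random

# Hypothetical input table: each row pairs a price with a quantity.
rows = [
    {"price": 10.0, "quantity": 1},
    {"price": 12.5, "quantity": 2},
    {"price": 15.0, "quantity": 3},
    {"price": 17.5, "quantity": 4},
]


def sample_rows(rows, n, rng, replace=True):
    """Sample entire rows, so values that occur together stay together."""
    if replace:
        return [dict(rng.choice(rows)) for _ in range(n)]
    return [dict(r) for r in rng.sample(rows, n)]  # requires n <= len(rows)


def sample_columns(rows, n, rng, replace=True):
    """Sample each column independently, which breaks cross-column pairing."""
    columns = {}
    for name in rows[0]:
        values = [r[name] for r in rows]
        columns[name] = rng.choices(values, k=n) if replace else rng.sample(values, n)
    return [{name: columns[name][i] for name in columns} for i in range(n)]


rng = random.Random(0)  # hypothetical seed
by_row = sample_rows(rows, 6, rng)
by_column = sample_columns(rows, 6, rng)
```

Sampling entire rows preserves any relationship between fields, while sampling each column independently treats the fields as unrelated. Without replacement, the number of samples cannot exceed the number of input rows.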
View the output
Connect a Browse tool to the output anchor to view results.
- D anchor: The data output. This is the result of the simulation.