MB Rules Tool
The MB Rules tool takes transaction data and, after transforming the data, creates either a set of association rules using the Apriori algorithm or frequent itemsets using either the Apriori or Eclat algorithms. A summary report of both the transaction data and the rules/itemsets is produced, along with a model object that can be further investigated in a downstream process.
Rules and itemsets differ in that an association rule states a specific, directional (though not necessarily causal) relationship between items in a group, while an itemset is simply a group of items that frequently co-occur in transactions. In an association rule, the presence of one subset of items in a transaction (the left-hand side items, or LHS) is associated with the presence of other items in the same transaction (the right-hand side items, or RHS).
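To illustrate the distinction, the Python sketch below takes one itemset and lists every way it could be read as a rule with a non-empty LHS and a non-empty RHS. It is an illustration only, not the R implementation the tool runs, and the function name `candidate_rules` and the grocery items are hypothetical. A single unordered group of three items corresponds to six distinct directional rules; which of them a rule miner actually reports depends on thresholds such as minimum support and confidence.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Return every (LHS, RHS) split of an itemset with both sides non-empty."""
    items = sorted(itemset)
    rules = []
    for k in range(1, len(items)):          # LHS sizes 1 .. n-1
        for lhs in combinations(items, k):
            rhs = tuple(i for i in items if i not in lhs)
            rules.append((lhs, rhs))
    return rules

# Hypothetical itemset: one unordered group of three items.
rules = candidate_rules({"bread", "butter", "milk"})
for lhs, rhs in rules:
    print(" + ".join(lhs), "=>", " + ".join(rhs))
print(len(rules))  # 6
```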
This tool uses the R tool. Go to Options > Download Predictive Tools and sign in to the Alteryx Downloads and Licenses portal to install R and the packages used by the R Tool. See Download and Use Predictive Tools.
Configure the tool
Input data structure
The input data can be in either of two formats. The first format consists of records that each contain a single item identifier along with an identifier for the transaction that contains the item (a set of unique item-transaction pairs). The second format consists of a single record per transaction that contains a delimited list of the items in the transaction.
- One item per record with a transaction key: This option corresponds to the unique item-transaction pairs. Under this option, the user needs to specify two fields in the data stream using the options:
  - Select the transaction key field: The integer or string field that contains the transaction identifier.
  - Select the field that contains the item identifier: The integer or string field that contains the item identifiers.
- One transaction per record with all items in a single (internally delimited) field: This option corresponds to the format where each transaction is contained in a single record. Under this option, the user needs to specify one field in the data stream and the delimiter, using the options:
  - Select the field with the delimited transaction items: The string field that contains the delimited transaction item lists.
  - Provide the delimiter character used to separate items in a transaction: The character that separates items within the field, such as a comma.
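The two formats carry the same information. As a rough illustration in plain Python (not part of the tool), the hypothetical helpers `pairs_to_delimited` and `delimited_to_pairs` convert between item-transaction pairs and one delimited record per transaction. The sketch assumes a comma delimiter, keeps each item only once per transaction, and strips whitespace around delimited items; the tool's own parsing rules may differ.

```python
from collections import defaultdict

def pairs_to_delimited(pairs, delimiter=","):
    """Group (transaction_key, item) pairs into one delimited string per transaction."""
    baskets = defaultdict(list)
    for key, item in pairs:
        if item not in baskets[key]:  # keep each item once per transaction
            baskets[key].append(item)
    return {key: delimiter.join(items) for key, items in baskets.items()}

def delimited_to_pairs(records, delimiter=","):
    """Split one-record-per-transaction data back into (transaction_key, item) pairs."""
    pairs = []
    for key, field in records.items():
        for item in field.split(delimiter):
            item = item.strip()  # assumption: surrounding spaces are not significant
            if item:
                pairs.append((key, item))
    return pairs

pairs = [(1, "bread"), (1, "milk"), (2, "bread"), (2, "butter"), (2, "milk")]
records = pairs_to_delimited(pairs)
print(records)                               # {1: 'bread,milk', 2: 'bread,butter,milk'}
print(delimited_to_pairs(records) == pairs)  # True
```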
Method to use
The tool provides the two most commonly used algorithms for finding association rules and frequent itemsets: Apriori and Eclat. The Apriori algorithm uses a level-wise search to find three types of frequent itemsets (frequent, maximally frequent, and closed frequent), association rules, or association hyperedgesets. The Eclat algorithm uses simple intersection operations for equivalence class clustering, together with bottom-up lattice traversal, to find the three types of frequent itemsets. In addition to selecting the method, the user needs to specify what to find (itemsets, rules, or hyperedgesets). The options are:
- Apriori: This option selects the Apriori algorithm. With this method, the user can find frequent itemsets, maximally frequent itemsets, closed frequent itemsets, association rules (the default), or association hyperedgesets.
- Eclat: This option selects the Eclat algorithm for finding itemsets. The user specifies whether frequent itemsets, maximally frequent itemsets, or closed frequent itemsets should be found.
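The following Python sketch contrasts the two search strategies on a small, made-up set of baskets. It is a simplified teaching version, not the R code the tool runs, and it covers only frequent, closed, and maximally frequent itemsets (not rules or hyperedgesets). The hypothetical `apriori` function grows candidates one level at a time and prunes any candidate that has an infrequent subset; the hypothetical `eclat` function stores a tid-list (the set of transaction IDs) for each item and obtains support by intersecting tid-lists during a bottom-up, depth-first traversal. The hypothetical `closed_and_maximal` function then keeps closed itemsets (no superset with the same support) and maximally frequent itemsets (no frequent superset).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise search: size-k candidates are built only from frequent size-(k-1) itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    level = {frozenset([i]) for b in baskets for i in b}
    level = {s for s in level if support(s) >= min_support}
    frequent, k = {}, 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

def eclat(transactions, min_support):
    """Depth-first search on vertical tid-lists; support comes from set intersections."""
    n = len(transactions)
    tidlists = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidlists.setdefault(item, set()).add(tid)
    frequent = {}

    def extend(prefix, members):
        # members: (item, tid-list) pairs that extend prefix into frequent itemsets
        for i, (item, tids) in enumerate(members):
            itemset = prefix | {item}
            frequent[itemset] = len(tids) / n
            # Intersect tid-lists to form the next equivalence class under itemset.
            nxt = [(other, tids & other_tids) for other, other_tids in members[i + 1:]]
            extend(itemset, [(o, t) for o, t in nxt if len(t) / n >= min_support])

    roots = [(i, t) for i, t in sorted(tidlists.items()) if len(t) / n >= min_support]
    extend(frozenset(), roots)
    return frequent

def closed_and_maximal(frequent):
    """Closed: no superset with equal support. Maximal: no frequent superset at all."""
    closed, maximal = set(), set()
    for s, sup in frequent.items():
        supersets = [t for t in frequent if s < t]
        if not supersets:
            maximal.add(s)
        if all(frequent[t] < sup for t in supersets):
            closed.add(s)
    return closed, maximal

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
]
found = apriori(baskets, min_support=0.5)
print(found == eclat(baskets, min_support=0.5))  # True
closed, maximal = closed_and_maximal(found)
print(sorted(sorted(s) for s in maximal))  # [['bread', 'butter'], ['bread', 'milk']]
print(frozenset({"butter"}) in closed)     # False: {bread, butter} has the same support
```

Both searches return the same itemsets and supports; they differ only in how they traverse the itemset lattice and compute support.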
Control parameters
The control parameters influence the nature of the association rules, frequent itemsets, or association hyperedgesets that are extracted from the transaction data. These parameters are:
- The allowable minimum number of items in a rule or itemset: This parameter limits the returned rules or itemsets to those containing at least the specified number of items. The default value is 1, which is also the minimum allowed value, but it can be set higher. The natural choices for this parameter are 1 or 2.
- The minimum required level of support for a rule or itemset: Support is the proportion of transactions that contain the items in the itemset or association rule. The default value is 0.02, and it can be set anywhere between 0.002 and 1. In general, the lower the value of this parameter, the more rules or itemsets are returned. In some instances, the number of returned rules or itemsets can exhaust the user's available system memory, so avoid setting this value too low.
- The minimum required level of confidence for a rule or itemset (valid only for Apriori): Confidence is the proportion of transactions containing the LHS items that also contain the RHS items. In other words, it estimates the probability that the RHS items are in a transaction given that the LHS items are in that transaction. This measure applies only to the Apriori algorithm. As with the support parameter, the lower the value of this parameter, the more rules are returned. In some instances, the number of returned rules can exhaust the user's available system memory, so avoid setting this value too low.
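The three control parameters can be expressed as simple checks on a single rule. The sketch below assumes transactions are Python sets and uses hypothetical helper names (`support`, `confidence`, `passes_controls`); it is not the tool's implementation. The minimum length and support defaults (1 and 0.02) come from this page, while no default confidence is given here, so the threshold must be passed explicitly.

```python
def support(itemset, transactions):
    """Proportion of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Proportion of transactions containing the LHS that also contain the RHS."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0  # the LHS never occurs, so the rule has no supporting evidence
    return support(set(lhs) | set(rhs), transactions) / lhs_support

def passes_controls(lhs, rhs, transactions, min_conf, min_len=1, min_support=0.02):
    """Hypothetical filter applying the three control parameters to one rule."""
    items = set(lhs) | set(rhs)
    return (len(items) >= min_len
            and support(items, transactions) >= min_support
            and confidence(lhs, rhs, transactions) >= min_conf)

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
]
print(support({"bread", "butter"}, baskets))                           # 0.5
print(confidence({"butter"}, {"bread"}, baskets))                      # 1.0
print(round(confidence({"bread"}, {"butter"}, baskets), 3))            # 0.667
print(passes_controls({"butter"}, {"bread"}, baskets, min_conf=0.8))   # True
print(passes_controls({"bread"}, {"butter"}, baskets, min_conf=0.8))   # False
```

In this made-up data every basket containing butter also contains bread, so butter => bread has confidence 1.0, while bread => butter has only about 0.667. A confidence threshold of 0.8 keeps the first rule and drops the second, even though both rules have the same support.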