Partition


Description

The Partition node randomly splits a data set into training and scoring rows, using a seed number. A new 'ROLE' column is created that holds, by default, the value 'TRAINING' or 'SCORING' for each row.
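As a rough illustration of the behaviour described above, the following Python sketch performs a seeded random split and adds a role column. It is not the node's implementation: the function name `partition_by_fraction`, its parameters, and the rounding of the training count are assumptions made for this example.

```python
import random


def partition_by_fraction(rows, percent_training=80, seed=1, column="ROLE",
                          training_value="TRAINING", scoring_value="SCORING"):
    """Return copies of `rows` with a role column added by a seeded random split.

    Hypothetical helper for illustration only; the rounding rule is an assumption.
    """
    rng = random.Random(seed)  # the same seed always yields the same split
    n_training = round(len(rows) * percent_training / 100)
    # Choose which row positions receive the training value.
    training_idx = set(rng.sample(range(len(rows)), n_training))
    return [
        {**row, column: training_value if i in training_idx else scoring_value}
        for i, row in enumerate(rows)
    ]


rows = [{"id": i} for i in range(10)]
result = partition_by_fraction(rows, percent_training=80, seed=42)
```

Running the sketch twice with the same seed gives the same assignment; changing the seed changes which rows are marked as training rows.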


Configuration Options

Basic Configuration Options

Setting: Description/Parameters
Partition Column Name: Name of the new column to be created (the 'ROLE' column). The value in this column identifies which of the two groups, training or scoring, each row is assigned to.
Partition Mode: Partition By Fraction splits data into training/scoring records by fraction, e.g., 80% of records become training and 20% become scoring. Partition By Record Count assigns an exact number of records to training/scoring, e.g., assign the first ten records to training, or the last five records to scoring.
Percent Training: Percent of the data set that has the Training Value assigned to it.
Percent Scoring: Percent of the data set that has the Scoring Value assigned to it.
Training Value: Text value assigned to rows in the Percent Training fraction.
Scoring Value: Text value assigned to rows in the Percent Scoring fraction.
Row Assignment: Select Training Records to set the exact number of training records; the rest become scoring records. Select Scoring Records to set the exact number of scoring records; the rest become training records.
Partition Records: Number of records assigned to the group selected in Row Assignment.
Sample Randomly: Sample Randomly assigns training/scoring rows at random throughout the data set. Split Time Series Data splits rows into training/scoring while maintaining their order.
Random Seed: If Static, the same seed value is used each time this node is run. If Random, a new seed value is used each time.
Seed Value: Number that seeds the random split between the training and scoring fractions. Changing this number changes which rows fall into each fraction.
Split Columns: The data set is split into subsets by the selected column(s). Each new subset is assigned to training/scoring.
Sort By Columns: Sort data rows by these columns.
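The record-count and ordered (time series) options can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the node's actual logic: the function `partition_by_record_count` and its parameter names are hypothetical, and it assumes that sorting happens before assignment, that an ordered training split takes the first records, and that an ordered scoring split takes the last records.

```python
import random


def partition_by_record_count(rows, partition_records, row_assignment="TRAINING",
                              sample_randomly=False, seed=1, sort_by=None,
                              column="ROLE"):
    """Assign exactly `partition_records` rows to `row_assignment`; the rest get the other role."""
    if sort_by is not None:
        # Assumption: rows are sorted before roles are assigned.
        rows = sorted(rows, key=lambda r: r[sort_by])
    n = len(rows)
    count = min(partition_records, n)
    if sample_randomly:
        # Random assignment throughout the data set, reproducible via the seed.
        chosen = set(random.Random(seed).sample(range(n), count))
    elif row_assignment == "TRAINING":
        chosen = set(range(count))         # ordered split: first `count` rows
    else:
        chosen = set(range(n - count, n))  # ordered split: last `count` rows
    other = "SCORING" if row_assignment == "TRAINING" else "TRAINING"
    return [
        {**row, column: row_assignment if i in chosen else other}
        for i, row in enumerate(rows)
    ]


# Ordered split: sort by the hypothetical time column "t", score the last three rows.
events = [{"t": t} for t in [5, 3, 1, 4, 2, 8, 7, 6]]
ordered = partition_by_record_count(events, 3, row_assignment="SCORING", sort_by="t")
```

In the ordered case the five earliest rows become training rows and the three latest become scoring rows, so no later record is used to train against an earlier one.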

Actions

Action: Description
Preview: Once the node is configured, the combined result set can be previewed at any time.