Description
The Partition node splits a data set randomly, using a predefined seed number. A new 'ROLE' column is created with the default values 'TRAINING' or 'SCORING' for each row.
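The behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the node's actual implementation: the function name `partition_by_fraction`, its parameters, and the list-of-dicts row format are assumptions made for the example. It shows a seeded random split that adds a 'ROLE' column with the value 'TRAINING' or 'SCORING' for each row.

```python
import random


def partition_by_fraction(rows, percent_training=80, seed=42,
                          training_value="TRAINING", scoring_value="SCORING",
                          column="ROLE"):
    """Hypothetical sketch: return copies of the rows with a role column added.

    rows is assumed to be a list of dicts; the real node's data model may differ.
    """
    n = len(rows)
    n_train = round(n * percent_training / 100)

    # A dedicated generator keeps the split reproducible for a given seed.
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    train_idx = set(order[:n_train])

    # Rows keep their original order; only the new column is added.
    return [
        {**row, column: training_value if i in train_idx else scoring_value}
        for i, row in enumerate(rows)
    ]


rows = [{"id": i} for i in range(10)]
partitioned = partition_by_fraction(rows, percent_training=80, seed=1)
```

Running the function twice with the same seed yields the same assignment; a different seed generally yields a different one.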
Configuration Options
Basic Configuration Options
Setting |
Description/Parameters |
Partition Column Name |
Name of the new column to be created. Data is partitioned into two groups, by default 'TRAINING' and 'SCORING'; the group a row belongs to is called its role. |
Partition Mode |
Partition By Fraction splits data into training/scoring records by fraction, e.g., 80% of records become training and 20% become scoring. Partition By Record Count assigns an exact number of records to training/scoring, e.g., the first ten records to training, or the last five records to scoring. |
Percent Training |
Percent of the data set that has the Training Value assigned to it. |
Percent Scoring |
Percent of the data set that has the Scoring Value assigned to it. |
Training Value |
Text value assigned to the Percent Training fraction. |
Scoring Value |
Text value assigned to the Percent Scoring fraction. |
Row Assignment |
Select Training Records to set the exact number of training records; the rest become scoring records. Select Scoring Records to set the exact number of scoring records; the rest become training records. |
Partition Records |
Number of records assigned to the role selected in Row Assignment. |
Sample Randomly |
Sample Randomly assigns training/scoring rows at random throughout the data set. Split Time Series Data splits rows into training/scoring while maintaining order. |
Random Seed |
If Static, use the same seed value each time this node is run. If Random, use a new seed value each time. |
Seed Value |
Number used to seed the random split between the training and scoring fractions. Changing this number changes which rows are assigned to each fraction. |
Split Columns |
The data set is split into subsets by the values of the selected column(s). Each new subset is assigned to training/scoring. |
Sort By Columns |
Sort data rows by these columns. |
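To make the interaction between Partition By Record Count, Row Assignment, Sample Randomly and Random Seed more concrete, here is a minimal Python sketch. It is an assumption-laden illustration rather than the node's real logic: the function `partition_by_count`, its parameter names, and the list-of-dicts row format are all hypothetical. Passing `seed=None` stands in for the Random seed option, and a fixed integer stands in for Static.

```python
import random


def partition_by_count(rows, partition_records, row_assignment="training",
                       sample_randomly=True, seed=None,
                       training_value="TRAINING", scoring_value="SCORING",
                       column="ROLE"):
    """Hypothetical sketch: assign an exact number of rows to one role."""
    n = len(rows)
    if row_assignment not in ("training", "scoring"):
        raise ValueError("row_assignment must be 'training' or 'scoring'")
    if not 0 <= partition_records <= n:
        raise ValueError("partition_records must be between 0 and len(rows)")

    # The selected role gets the exact count; the other role gets the rest.
    if row_assignment == "training":
        n_train = partition_records
    else:
        n_train = n - partition_records

    order = list(range(n))
    if sample_randomly:
        # seed=None draws a fresh seed on every run; an int makes it repeatable.
        random.Random(seed).shuffle(order)
    # Without shuffling, row order is kept: training rows first, scoring rows
    # last, as in a time series split.

    train_idx = set(order[:n_train])
    return [
        {**row, column: training_value if i in train_idx else scoring_value}
        for i, row in enumerate(rows)
    ]


series = [{"t": i} for i in range(15)]
# Ordered split: the last five records become scoring records.
ordered = partition_by_count(series, 5, row_assignment="scoring",
                             sample_randomly=False)
# Random split with a fixed seed: exactly ten training records.
shuffled = partition_by_count(series, 10, row_assignment="training", seed=7)
```

In the ordered case the first ten rows are labelled 'TRAINING' and the last five 'SCORING', which mirrors the record count examples given for Partition Mode.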
Actions
Action |
Description |
Preview |
Once the node is configured, the combined result set can be previewed at any time. |