Partition
  • 29 Sep 2022


Description

The Partition node splits a data set randomly into two groups, using a seed number to control the random assignment. It adds a new column (named ROLE by default) that holds either TRAINING or SCORING, the default role values, for each row.
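The following is a minimal Python sketch of a seeded random split under stated assumptions. It is not the node's implementation: `partition_by_fraction` and its parameters are hypothetical names, it assumes Percent Scoring is 100 minus Percent Training, and it uses Python's `random.Random`, so a given seed will not reproduce the node's actual assignment.

```python
import random

def partition_by_fraction(rows, percent_training=80.0, seed=1,
                          column="ROLE", training_value="TRAINING",
                          scoring_value="SCORING"):
    # Hypothetical sketch: names and parameters are illustrative, not the
    # node's API. Assumes Percent Scoring = 100 - Percent Training.
    rng = random.Random(seed)           # same seed -> same assignment
    order = list(range(len(rows)))
    rng.shuffle(order)                  # random order of row positions
    n_training = round(len(rows) * percent_training / 100)
    training = set(order[:n_training])  # first n shuffled positions train
    # Copy each row and add the role column; input rows are left unchanged.
    return [
        {**row, column: training_value if i in training else scoring_value}
        for i, row in enumerate(rows)
    ]

rows = [{"id": i} for i in range(10)]
tagged = partition_by_fraction(rows, percent_training=80, seed=7)
roles = [r["ROLE"] for r in tagged]
print(roles.count("TRAINING"), roles.count("SCORING"))  # 8 2
```

Calling the function again with the same seed returns the same assignment, which is the behavior the seed is meant to guarantee; changing the seed changes which rows are tagged TRAINING, but not how many.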


Configuration Options

Basic Configuration Options

Partition Column Name: Name of the new column to be created. Data is always partitioned into two groups, and the value in this column (the role) identifies the group each row belongs to.
Partition Mode: Partition By Fraction splits records into training/scoring by percentage, e.g., 80% of records become training and 20% become scoring. Partition By Record Count assigns an exact number of records to training or scoring, e.g., assign the first 10 records to training, or the last 5 records to scoring.
Percent Training: Percentage of the data set that is assigned the Training Value.
Percent Scoring: Percentage of the data set that is assigned the Scoring Value.
Training Value: Text value written to the partition column for rows in the Percent Training fraction (TRAINING by default).
Scoring Value: Text value written to the partition column for rows in the Percent Scoring fraction (SCORING by default).
Row Assignment: Select Training Records to set the exact number of training records; the remaining records become scoring records. Select Scoring Records to set the exact number of scoring records; the remaining records become training records.
Partition Records: Number of records to assign to the role selected in Row Assignment.
Sample Randomly: Sample Randomly assigns training/scoring roles to rows at random throughout the data set. Split Time Series Data splits rows into training/scoring while preserving their order.
Random Seed: If Static, the same seed value is used each time this node runs. If Random, a new seed value is used each time.
Seed Value: Number used to seed the random split between the training and scoring groups. Changing this number changes which rows are assigned to each group.
Split Columns: The selected column(s) are used to divide the data set into subsets, and each subset is assigned to training/scoring.
Sort By Columns: Columns by which the data rows are sorted.
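The record-count settings work together: Row Assignment picks which role receives exactly Partition Records rows, Sample Randomly chooses between a random and an order-preserving split, and Random Seed controls repeatability. The hypothetical Python sketch below models that interaction under assumptions this article does not state: Sort By Columns is applied before partitioning, Split Time Series Data places training rows ahead of scoring rows (consistent with the first/last examples under Partition Mode), and Split Columns is not modeled. The function name and parameters are illustrative only.

```python
import random

def partition_by_count(rows, partition_records, assign_to="training",
                       sample_randomly=True, seed=None, sort_by=(),
                       column="ROLE", training_value="TRAINING",
                       scoring_value="SCORING"):
    # Hypothetical sketch of Partition By Record Count, not the node's code.
    # assign_to: role that gets exactly partition_records rows (Row Assignment).
    # sample_randomly=False: order-preserving split, training rows first.
    # seed: an int mimics Static; None draws a fresh seed (Random).
    if assign_to not in ("training", "scoring"):
        raise ValueError("assign_to must be 'training' or 'scoring'")
    if sort_by:
        # Assumed semantics: sort by the given columns before partitioning.
        rows = sorted(rows, key=lambda r: tuple(r[c] for c in sort_by))
    n = len(rows)
    if not 0 <= partition_records <= n:
        raise ValueError("partition_records must be between 0 and len(rows)")
    if assign_to == "training":
        n_training = partition_records
    else:
        n_training = n - partition_records
    positions = list(range(n))
    if sample_randomly:
        random.Random(seed).shuffle(positions)  # random subset of positions
    training = set(positions[:n_training])       # unshuffled: first rows train
    return [
        {**row, column: training_value if i in training else scoring_value}
        for i, row in enumerate(rows)
    ]

events = [{"day": d} for d in (3, 1, 2, 5, 4)]
out = partition_by_count(events, 2, assign_to="scoring",
                         sample_randomly=False, sort_by=("day",))
print([(r["day"], r["ROLE"]) for r in out])
# [(1, 'TRAINING'), (2, 'TRAINING'), (3, 'TRAINING'), (4, 'SCORING'), (5, 'SCORING')]
```

With `sample_randomly=True` and an integer `seed`, repeated calls return the same assignment, mirroring the Static option; `seed=None` draws a fresh seed on every call, as the Random option does. When `sort_by` is given, the sketch returns the rows in sorted order.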

Actions

Preview: After configuring the node, click the Preview button to preview the combined result set.
