Partition
- 06 Jan 2023
Description
The Partition node splits a data set into two groups. By default the split is random and driven by a predefined seed number, so the same seed reproduces the same split. The node adds a new column (named 'ROLE' by default) that holds the value 'TRAINING' or 'SCORING' for each row.
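To make the description concrete, here is a minimal sketch of a seeded random split in plain Python. It is not the node's implementation: the function name `partition_by_fraction`, its parameters, and the rounding of the training count are assumptions made for illustration only.

```python
import random


def partition_by_fraction(rows, percent_training=80, seed=1,
                          column="ROLE",
                          training_value="TRAINING",
                          scoring_value="SCORING"):
    """Return copies of rows with a role column set by a seeded random split.

    Hypothetical helper: the real node's rounding and sampling rules
    are not documented here, so this only illustrates the idea.
    """
    rng = random.Random(seed)  # same seed -> same split
    n_training = round(len(rows) * percent_training / 100)
    # Pick which row positions become training; the rest become scoring.
    training_idx = set(rng.sample(range(len(rows)), n_training))
    return [
        {**row, column: training_value if i in training_idx else scoring_value}
        for i, row in enumerate(rows)
    ]


rows = [{"id": i} for i in range(10)]
first = partition_by_fraction(rows, percent_training=80, seed=42)
second = partition_by_fraction(rows, percent_training=80, seed=42)
training_count = sum(1 for r in first if r["ROLE"] == "TRAINING")
```

Running the function twice with the same seed gives identical role assignments, which is the behavior the seed number is meant to provide.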
Configuration Options
Basic Configuration Options
Setting | Description / Parameters |
---|---|
Partition Column Name | Name of the new column to be created. Data is always partitioned into two groups, referred to as roles. |
Partition Mode | Partition By Fraction splits data into training/scoring records by fraction, e.g., 80% of records become training and 20% become scoring. Partition By Record Count assigns an exact number of records to training/scoring, e.g., assign the first 10 records to training, or the last 5 records to scoring. |
Percent Training | Percent of the data set that will have the Training Value assigned to it. |
Percent Scoring | Percent of the data set that will have the Scoring Value assigned to it. |
Training Value | Text value to be assigned to the Percent Training fraction. |
Scoring Value | Text value to be assigned to the Percent Scoring fraction. |
Row Assignment | Select Training Records to set the exact number of training records; the rest will become scoring records. Select Scoring Records to set the exact number of scoring records; the rest will become training records. |
Partition Records | Number of records to assign to the selected Row Assignment setting. |
Sample Randomly | Sample Randomly assigns training/scoring rows at random throughout the data set. Split Time Series Data splits rows into training/scoring while maintaining order. |
Random Seed | If Static, use the same seed value each time this node is run. If Random, use a new seed value each time. |
Seed Value | Number used to seed the random split between the scoring and training fractions. Changing this number changes which rows are assigned to each group. |
Split Columns | The data set is split into subsets by the selected column(s). Each new subset will be assigned to training/scoring. |
Sort By Columns | Sort data rows by these columns. |
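The Partition By Record Count, Row Assignment, and Sample Randomly options can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the node's code: `partition_by_count` and its parameters are hypothetical names, and the ordered case assumes training rows come first and scoring rows come last, as in the "first 10 records to training, last 5 records to scoring" example above.

```python
import random


def partition_by_count(rows, n_records, assign="TRAINING",
                       sample_randomly=False, seed=None, column="ROLE"):
    """Assign exactly n_records rows to `assign`; the rest get the other role.

    Hypothetical helper. seed=None mimics a 'Random' seed (new split each
    run); a fixed integer mimics a 'Static' seed.
    """
    other = "SCORING" if assign == "TRAINING" else "TRAINING"
    n = min(n_records, len(rows))
    if sample_randomly:
        rng = random.Random(seed)
        chosen = set(rng.sample(range(len(rows)), n))
    elif assign == "TRAINING":
        # Ordered split: training rows are taken from the start.
        chosen = set(range(n))
    else:
        # Ordered split: scoring rows are taken from the end.
        chosen = set(range(len(rows) - n, len(rows)))
    return [
        {**row, column: assign if i in chosen else other}
        for i, row in enumerate(rows)
    ]


rows = [{"t": i} for i in range(8)]

# Ordered (time-series style) split: the last 3 records become scoring.
ordered = partition_by_count(rows, 3, assign="SCORING")
ordered_roles = [r["ROLE"] for r in ordered]

# Random split with a static seed: exactly 3 training records, reproducible.
seeded = partition_by_count(rows, 3, assign="TRAINING",
                            sample_randomly=True, seed=7)
seeded_again = partition_by_count(rows, 3, assign="TRAINING",
                                  sample_randomly=True, seed=7)
```

In the ordered case the row order is preserved, so earlier rows are used for training and later rows for scoring, which is usually what a time-series split needs.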
Actions
Action | Description |
---|---|
Preview | After configuring the node, the combined result set can be previewed by clicking the Preview button. |