Partition
- 13 Jan 2023
Description
The Partition node splits a data set randomly, using a predefined seed number. A new 'ROLE' column is created with the default values 'TRAINING' or 'SCORING' for each row.
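The seeded random split can be pictured with a short Python sketch. This is only an illustration and not the node's actual implementation: the function name `partition_by_fraction`, its parameters, and the rounding of the training count are assumptions chosen to mirror the settings described in this article.

```python
import random


def partition_by_fraction(rows, percent_training=80, seed=42, column="ROLE",
                          training_value="TRAINING", scoring_value="SCORING"):
    """Return copies of `rows` with a role column added by a seeded random split.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Assumption: the training count is the rounded percentage of all rows.
    n_training = round(len(rows) * percent_training / 100)
    # A fixed seed reproduces the same split on every run.
    rng = random.Random(seed)
    training_idx = set(rng.sample(range(len(rows)), n_training))
    return [
        {**row, column: training_value if i in training_idx else scoring_value}
        for i, row in enumerate(rows)
    ]


rows = [{"id": i} for i in range(10)]
result = partition_by_fraction(rows, percent_training=80, seed=42)
```

Calling the function again with the same seed yields the same assignment, while a different seed changes which rows land in each group.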
Configuration Options
Basic Configuration Options
Setting | Description/Parameters |
---|---|
Partition Column Name | Name of the new column to be created (default 'ROLE'). Data is partitioned into two groups, training and scoring, and this column records each row's group, called the Role. |
Partition Mode | Partition By Fraction splits data into training/scoring records by fraction, e.g., 80% of records become training and 20% become scoring. Partition By Record Count assigns an exact number of records to training/scoring, e.g., the first ten records to training or the last five records to scoring. |
Percent Training | Percent of the data set that has the Training Value assigned to it. |
Percent Scoring | Percent of the data set that has the Scoring Value assigned to it. |
Training Value | Text value assigned to the Percent Training fraction. |
Scoring Value | Text value assigned to the Percent Scoring fraction. |
Row Assignment | Select Training Records to set the exact number of training records; the rest become scoring records. Select Scoring Records to set the exact number of scoring records; the rest become training records. |
Partition Records | Number of records assigned to the group selected in Row Assignment. |
Sample Randomly | Sample Randomly assigns training/scoring rows at random throughout the data set. Split Time Series Data splits rows into training/scoring while maintaining order. |
Random Seed | If Static, use the same seed value each time this node is run. If Random, use a new seed value each time. |
Seed Value | Number used to seed the random split between the training and scoring fractions. Changing this number changes which rows are assigned to each group. |
Split Columns | The data set is split into subsets on the selected column(s). Each resulting subset is assigned to training/scoring. |
Sort By Columns | Sort data rows by these columns. |
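
As a rough illustration of Partition By Record Count combined with an order-preserving split, the sketch below labels rows by position. It is not the node's implementation: the function name `partition_by_record_count` and its parameters are hypothetical, and it assumes that training rows come first and scoring rows last, in line with the "first ten records to training, last five records to scoring" example in the table.

```python
def partition_by_record_count(rows, partition_records, row_assignment="training",
                              column="ROLE", training_value="TRAINING",
                              scoring_value="SCORING"):
    """Label rows by position without reordering them.

    Hypothetical helper: assumes training rows precede scoring rows.
    """
    if not 0 <= partition_records <= len(rows):
        raise ValueError("partition_records must be between 0 and len(rows)")
    if row_assignment == "training":
        # Exact number of training records; the rest become scoring.
        n_training = partition_records
    elif row_assignment == "scoring":
        # Exact number of scoring records; the rest become training.
        n_training = len(rows) - partition_records
    else:
        raise ValueError("row_assignment must be 'training' or 'scoring'")
    return [
        {**row, column: training_value if i < n_training else scoring_value}
        for i, row in enumerate(rows)
    ]


rows = [{"t": i} for i in range(12)]
first_ten = partition_by_record_count(rows, 10, row_assignment="training")
last_five = partition_by_record_count(rows, 5, row_assignment="scoring")
```

In this sketch, sorting the rows beforehand (as with Sort By Columns) determines which records fall at the start or end of the data set.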
Actions
Action | Description |
---|---|
Preview | Once the node is configured, the combined result set can be previewed at any time. |