Partition

Published on Jan 6, 2023

Description

The Partition node splits a data set into two groups, either randomly using a seed number or in row order. A new 'ROLE' column is created, and each row is assigned one of the default values 'TRAINING' or 'SCORING'.
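
The seeded random split described above can be illustrated with a minimal Python sketch. This is not the node's actual implementation: the function name `partition_by_fraction`, its parameters, and the rounding rule are assumptions made for illustration only.

```python
import random

def partition_by_fraction(rows, percent_training=80, seed=42,
                          column="ROLE",
                          training_value="TRAINING",
                          scoring_value="SCORING"):
    """Return copies of `rows` with a role column added by a seeded random split.

    Hypothetical helper: the name, parameters, and rounding are assumptions.
    """
    rng = random.Random(seed)  # same seed -> same assignment on every run
    # Assumption: the training count is the rounded share of all rows.
    n_training = round(len(rows) * percent_training / 100)
    training_idx = set(rng.sample(range(len(rows)), n_training))
    return [
        {**row, column: training_value if i in training_idx else scoring_value}
        for i, row in enumerate(rows)
    ]

rows = [{"id": i} for i in range(10)]
first = partition_by_fraction(rows, percent_training=80, seed=42)
second = partition_by_fraction(rows, percent_training=80, seed=42)
```

Because both calls use the same seed, `first` and `second` contain identical role assignments, which mirrors the behavior of a static seed. Passing a different seed would generally assign different rows to each role.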

Configuration Options

Basic Configuration Options

Setting	Description/Parameters
`Partition Column Name`	Name of the new column to be created. Data is always partitioned into two groups, referred to as roles.
`Partition Mode`	`Partition By Fraction` splits data into training/scoring records by fraction, e.g., 80% of records become training and 20% become scoring. `Partition By Record Count` assigns an exact number of records to training/scoring, e.g., assign the first 10 records to training, or the last 5 records to scoring.
`Percent Training`	Percent of the data set that will have the `Training Value` assigned to it.
`Percent Scoring`	Percent of the data set that will have the `Scoring Value` assigned to it.
`Training Value`	Text value to be assigned to the `Percent Training` fraction.
`Scoring Value`	Text value to be assigned to the `Percent Scoring` fraction.
`Row Assignment`	Select `Training Records` to set the exact number of training records; the rest will become scoring records. Select `Scoring Records` to set the exact number of scoring records; the rest will become training records.
`Partition Records`	Number of records to assign to the selected `Row Assignment` setting.
`Sample Randomly`	`Sample Randomly` assigns training/scoring rows at random throughout the data set. `Split Time Series Data` splits rows into training/scoring while maintaining the existing row order.
`Random Seed`	If `Static`, use the same seed value each time this node is run. If `Random`, use a new seed value each time.
`Seed Value`	Number used to seed the random split between the scoring and training fractions. Changing this number changes which rows are assigned to each fraction.
`Split Columns`	The data set is split into subsets based on the selected column(s). Each new subset will be assigned to training/scoring.
`Sort By Columns`	Sort data rows by these columns.
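
To show how `Partition By Record Count`, `Row Assignment`, and the ordered versus random choice might interact, here is a hedged Python sketch. It is not the node's implementation: `partition_by_count` and its parameter names are hypothetical, and the ordered branch assumes that training rows come first and scoring rows last, as in the "first 10 records to training, or the last 5 records to scoring" example in the table.

```python
import random

def partition_by_count(rows, partition_records, row_assignment="training",
                       sample_randomly=False, seed=None,
                       column="ROLE",
                       training_value="TRAINING",
                       scoring_value="SCORING"):
    """Assign an exact number of rows to one role; the rest get the other role.

    Hypothetical helper: names and ordering behavior are assumptions.
    """
    total = len(rows)
    if not 0 <= partition_records <= total:
        raise ValueError("partition_records must be between 0 and len(rows)")
    # The count applies to the selected role; derive the training count from it.
    if row_assignment == "training":
        n_training = partition_records
    else:
        n_training = total - partition_records
    if sample_randomly:
        # seed=None draws a fresh seed on each run; a fixed seed is repeatable.
        rng = random.Random(seed)
        training_idx = set(rng.sample(range(total), n_training))
    else:
        # Ordered split: keep row order, training first, scoring last.
        training_idx = set(range(n_training))
    return [
        {**row, column: training_value if i in training_idx else scoring_value}
        for i, row in enumerate(rows)
    ]

rows = [{"t": i} for i in range(8)]
# Ordered: the last 3 rows become scoring, the first 5 become training.
ordered = partition_by_count(rows, 3, row_assignment="scoring")
# Random: exactly 4 training rows, chosen with a fixed seed.
shuffled = partition_by_count(rows, 4, sample_randomly=True, seed=7)
```

The ordered branch is the one that matters for time series data, where shuffling rows would let later observations leak into the training group.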

Actions

Action	Description
`Preview`	After configuring the node, the combined result set can be previewed by clicking the `Preview` button.
