Clean and Aggregate Multiple Text Files
  • 23 Dec 2022
  • 1 Minute to read
  • Dark
    Light

Clean and Aggregate Multiple Text Files

  • Dark
    Light

Article summary

Goal: You're a sales manager tracking sales for a few stores. You receive multiple daily sales files every morning. Automate the aggregation of sales data at the total store level.


High-level Task: Combine the DailySales_Store1.csv, DailySales_Store2.csv and DailySales_Store3.csv files. Group by the "SKU" field to calculate sums for the "Quantity" and "Revenue" fields. Export to flat file.


Detailed Task:

  1. Create a new flow.
  2. Add a Flat File source node to the canvas. Open the DailySales_Store1.csv file.
  3. Under the Data Type Assignment tab, make sure the "SKU" field is the String data type.
  4. Repeat step 2 and 3 for the DailySales_Store2.csv and DailySales_Store3.csv files.
  5. Add a Filter node to the canvas. Attach the DailySales_Store3.csv flat file source node to it.
  6. Filter out the "Description" field to remove the unnecessary field.
  7. Add a Union node to the canvas. Attach the Filter node as well as the other flat file source nodes to it.
  8. Configure the node to perform a union between all three nodes.
  9. Add an Aggregate node to the canvas. Attach the Union node to it.
  10. Configure the node to aggregate on the "SKU" field, and calculate the aggregate SUM function on the other two fields (Quantity and Revenue).
  11. Attach this node to a Flat File export node. Configure to export to `SalesOutput.csv'.

Was this article helpful?

What's Next