CSV files are well-liked for their ease of use and compatibility with a wide array of software. They provide a straightforward and human-readable format for storing and sharing structured data, making them a favored choice for data exchange and manipulation across diverse fields, ranging from business and finance to scientific research and beyond. If you find yourself dealing with a substantial number of CSV files that require analysis, employing scripting services is a wise approach. Google Apps Script is one such service that enables you to seamlessly combine CSV files according to your specific criteria.
In this blog post, I will demonstrate how to effectively merge CSV files using Google Apps Script. Furthermore, I will provide insights into various techniques for managing CSV files prior to their combination, helping you achieve your desired results.
Table of Contents
Video tutorial
If you’d prefer written instructions with additional tips, just keep reading.
Storing the CSV files for Use in Apps Script
The optimal location for storing CSV files for use with Google Apps Script is in Google Drive. This is because Google Apps Script offers built-in support for accessing files within Google Drive, simplifying the process of reading and writing CSV files, and eliminating the need to concern yourself with the underlying file system.
After you’ve written the code, you only need to modify the folder ID to interact with files stored in a different folder. This simplifies working with multiple folders, and you can even extend the code to read files from other folders without manual ID adjustments
To start with the tutorial, upload your CSV files to a folder in your Google Drive.
How to combine CSV files in Google Drive using Google Apps Script
As mentioned earlier, accessing files stored in Google Drive through Google Apps Script is straightforward. To work with CSV files, you should begin by creating an array of CSV files and assigning it to a variable. This variable can then be used to access the files for various tasks.
Access the CSV files in Google Drive with Google Apps Script
You can utilize the following code to access CSV files in Google Drive. This code retrieves the folder, which contains the CSV files from Google Drive using the DriveApp service and the folder object is stored in the variable folder
. The folder is identified by the folder ID (The folder ID is a unique identifier for a Google Drive folder, which can be found in the folder’s URL after the “folders/” segment).
Then it retrieves a list of files from the folder
object that have the mime type MimeType.CSV
. Mime types are used to identify the type of data that is contained in a file. CSV files have the mime type text/csv
. The list of files is stored in the files
variable.
You can use this files
variable is used later in the combineCsv()
function to process the CSV files.
Combine CSV files with Google Apps Script
Once you have created the files
variable containing the list of CSV files, you need to iterate through each file in the variable and get its content. Then, combine the content of each CSV file into a single array. Finally, you can create a new CSV file with the combined data.
The following code combines all the CSV files in the designated Google Drive folder and generates a new file called “Combined.csv” within that same folder.
Lines 2 and 3 of this code were explained in the previous section.
The combinedData
is an empty array to store the combined data from all the CSV files.
The the while
loop is used to iterate through each file in the folder (files
variable). Then, for each file, reads the contents of the file as a string using file.getBlob().getDataAsString()
.
The string data is then parsed into an array of arrays (csvData
) using Utilities.parseCsv()
. The array of arrays is then concatenated with the combinedData
array using combinedData.concat(csvData)
.
Finally, the combined data is written to a new CSV file called “Combined.csv” in the same folder using folder.createFile()
.
Prevent the column headers from being repeated in the merged CSV file
The provided code appends the complete content of the CSV files. If there’s a header row, it will be repeated in the merged CSV file.
To avoid this, you can adjust the code as follows to prevent the header row from being repeated when merging the CSV files
In the above code, we introduce an additional variable named firstFile
and initialize it to TRUE
. The code then iterates through the files
object, parsing the CSV data from each file and appending it to a combined data array. However, unlike the previous approach, this method first checks whether it’s processing the first file. In the first iteration, since firstFile
is true, it adds all rows, including the header row, to the combined data array and it sets firstFile
to FALSE
. Then from the next iteration, it removes the header row before adding the remaining data to the combined array. This approach effectively ensures that the headers are only included once in the merged CSV file.
How to ensure all CSV files have the same columns before combining them?
If you are unsure whether the columns in your CSV files are the same, you can use the following method to ensure it.
First, you need to create a separate CSV file by extracting the header row of each CSV file. When creating this header list, you need to include the corresponding file name of each header that was taken in an additional column.
You can utilize the following code to extract the header row from each CSV file and create a new CSV file consisting of header rows.
You can open this file in Excel or Google Sheets. Once opened, you can compare the headers for any discrepancies with ease by using sorting or filtering options. In case there are any discrepancies, you can easily identify the relevant file as the corresponding file name is available in the first column.
How to use the above codes to combine CSV files
If you’re new to Google Apps Script, using the codes above might be confusing. In that situation, start by logging into your Google Account. After that, make a copy of the Google Apps Script file from the link provided below.
The above Apps Script contains all the functions given above. To run these functions,
- Create a folder in Google Drive and upload your CSV files to it.
- Copy the folder ID as explained previously, and then replace the current folder ID in the code for each function. Then, save the file.
- Now, click the downward arrow located next to the selected function name on the top navigation bar in the Apps Script Editor, and then select the function that you need to execute.
- Afterward, click on the “Run” button. If you are running this code for the first time, you will be prompted to grant authorization to run the function on your account.
- In the pop-up box, click the Review permission button.
- In the next dialog box, click on your email address
- Click the “Advanced” link
- Click on the link at the bottom with the name of your Apps Script project.
- Then click “Allow”
- Now, if you check the Google Drive folder, you can see the new “Combined.csv” file.
Wrapping UP
Combining CSV files with Google Apps Script is a straightforward and efficient method for streamlining data analysis when dealing with multiple CSV files. By utilizing the provided code snippets and following the step-by-step instructions, you can seamlessly merge CSV files, prevent header row duplication, and ensure consistent columns, eliminating the need for manual manipulation. Whether you’re a data analyst, researcher, or simply handling large amounts of data, this technique empowers you to manage and organize CSV files effectively.