Processing Spreadsheets with Vertical Headers in Laravel

1 year ago August 19, 2022 - 08:08 am

When dealing with clients, especially the bureaucratic kind, you'll inevitably deal with the holy grail of business documents, the spreadsheet. Most of the time, if you're lucky, they just want you to upload and be able to retrieve these spreadsheets through the application. Most likely, however, they want you to process the contents and save it to the database.

Libraries like rap2hpoutre/fast-excel and maatwebsite/excel do the job well in most cases, both having its fair share of pros and cons depending on your use case (i.e you might want to use Fast Excel if your application is somehow processing thousands or even millions of rows on the daily). The latter has served me well in crunching through rows of spreadsheet data.

But what if you encounter a format where the spreadsheet header is in the first column and subsequent columns contain data? No library (that I know of) ever supports this kind of processing right off the bat so manual intervention is needed to be able to process something like this.


For this brief guide, I'll be using the maatwebsite/excel library because of its ability to correctly process this kind of spreadsheet without missing any data.

Before we begin, the code used in this tutorial is available in its entirety along with a sample spreadsheet in a GitLab repository.

Now to start off, we must setup the Import class we'll use to extract the data from the spreadsheet and store into the database.

<?php
namespace App\Imports;

use Maatwebsite\Excel\Concerns\WithBatchInserts;
use Maatwebsite\Excel\Concerns\WithChunkReading;
use Maatwebsite\Excel\Concerns\SkipsOnError;
use Maatwebsite\Excel\Concerns\SkipsErrors;
use Maatwebsite\Excel\Concerns\SkipsFailures;
use Maatwebsite\Excel\Concerns\WithCalculatedFormulas;
use Maatwebsite\Excel\Concerns\WithMultipleSheets;
use Maatwebsite\Excel\Concerns\SkipsEmptyRows;
use Maatwebsite\Excel\Concerns\SkipsUnknownSheets;

class VerticalFormInputSheetImport implements WithBatchInserts, WithChunkReading, SkipsOnError, SkipsEmptyRows, WithCalculatedFormulas, WithMultipleSheets, SkipsUnknownSheets
{
   use SkipsFailures, SkipsErrors;

   public function sheets(): array
   {
       return [
           'USERDATA' => new VerticalFormInputSheetImport(),
       ];
   }

   public function batchSize(): int
   {
       return 250;
   }

   public function chunkSize(): int
   {
       return 250;
   }

   public function onUnknownSheet($sheetName)
   {
       // E.g. you can log that a sheet was not found.
       info("Sheet {$sheetName} was skipped");
   }
}

Now this Import class is pretty straightforward. For this example, we have a spreadsheet with a worksheet name called "USERDATA" and because of the SkipsUnknownSheets Concern, any other sheets present will be ignored. This approach helps with memory consumption and execution time.

Next is to create the function that will process the spreadsheet itself. For this we must create a function called processVerticalLayoutSpreadsheet(). You can change the name of the function of course, but this is what I will go with for this guide. This function will have two parameters.

1. $filepath - This is a string containing the location of a file in the disk

2. $header_column_index - Since there may be content before the header, an example of this is available in the sample file linked in the article, this parameter is needed to indicate to the system where the headers are. The value of this parameter is going to be the basis where the function will start when it goes through each of the spreadsheet's columns.

To start this function, we must load the spreadsheet rows and columns for processing. You can do this through Excel::toArray(). Other ways to import your data are available as well if it doesn't fit your use case. However you wish to approach this, it's crucial that the output is converted to an array for the subsequent steps.

    try{
      $excel_rows = Excel::toArray(new VerticalSheetImport(), $filepath)["USERDATA"];
    } catch(Exception $exception) {
      throw new Exception("Worksheet with the name USERDATA was not found. Please reupload the spreadsheet with this worksheet name where the equipment data is saved.");
    }

With the spreadsheet loaded, we can finally start going through the rows and columns. First off, we must get the contents of the header, which will be the indexes of the array when we get the data from each column after this. If there are any rows before the heading row, it will be removed from the array since it will interfere with later processes extracting the values from each following columns.

   $all_processed_columns = [];
   $heading_column = [];
   $number_of_data = 0;
   //Gets heading row and removes row data before it to avoid conflicting with later processes.
   foreach($excel_rows as $excel_row_key => $excel_row){
      $heading_column[] = str_replace("-", "_", Str::slug($excel_row[$header_column_index]));
      for($counter = 0; $counter <= $header_column_index; $counter++){
         unset($excel_rows[$excel_row_key][$counter]);
      }
      $excel_rows[$excel_row_key] = array_values($excel_rows[$excel_row_key]);
   }

The output of this piece of code would be $heading_column containing all of the header names in your vertical spreadsheet similar to the one below.

^ array:12 [▼
 0 => "first_name"
 1 => "middle_name"
 2 => "last_name"
 3 => "id_no"
 4 => "barangay"
 5 => "citymunicipality"
 6 => "province"
 7 => "street"
 8 => "full_address"
 9 => "civil_status"
 10 => "occupation"
 11 => "date_of_birth"
]

$excel_rows would also have the header column and the columns before it removed so it doesn't conflict when we go through the actual data inside the spreadsheet.

Before removal:

After removal:


Next is compiling all of the data in the spreadsheet and pairing them with the header column so it can be a proper associative array ready to be stored and processed.

   $number_of_data = count($heading_column);
   foreach($excel_rows as $excel_row_key => $excel_row){
      for($counter = 0; $counter < $number_of_data; $counter++){
         if (isset($heading_column[$excel_row_key]) && $heading_column[$excel_row_key] != ""){
            $all_processed_columns[$counter][$heading_column[$excel_row_key]] = $excel_row[$counter];
         }
      }
   }

In this part of the code, we first get the number of items in the $heading_column array. This is because we can assume that each item in the heading column matches the data it represents. So for example, we can assume that "first_name", "middle_name" and "last_name" would always be in the 0th, 1st and 2nd index of the array respectively.

Next, we use $number_of_data to get only the array items we need from $excel_rows. This uses $heading_row as the index as we grab the data from each column in $excel_rows.

^ array:12 [▼
  0 => array:12 [▼
    "first_name" => "Max "
    "middle_name" => "Richard"
    "last_name" => "Simmons"
    "id_no" => 12345
    "barangay" => "Brgy. Marcelo"
    "citymunicipality" => "Paranaque City"
    "province" => "Metro Manila"
    "street" => "Arial St."
    "full_address" => "2/F Brgy Marcelo Arial St. Paranaque City 1550"
    "civil_status" => "Single"
    "occupation" => "Data Analyst"
    "date_of_birth" => 35514
  ]
  1 => array:12 [▼
    "first_name" => "Samantha"
    "middle_name" => "Panganiban"
    "last_name" => "Lim"
    "id_no" => "12215"
    "barangay" => "Shangri-La Plaza Mall"
    "citymunicipality" => "Mandaluyong City"
    "province" => "Metro Manila"
    "street" => "Edsa Corner"
    "full_address" => "2/F Shangri-La Plaza Mall Edsa Corner Shaw Boulevard 1550"
    "civil_status" => "Single"
    "occupation" => "Civil Engineer"
    "date_of_birth" => 34337
  ]
  2 => array:12 [▼
    "first_name" => "Francis"
    "middle_name" => "Arturo"
    "last_name" => "Gimenez"
    "id_no" => "173378"
    "barangay" => "Legaspi Village"
    "citymunicipality" => "Makati City"
    "province" => "Metro Manila"
    "street" => "Salcedo St."
    "full_address" => "5/F Bloomingdale Building 205 Salcedo Street Legaspi Village 1200"
    "civil_status" => "Married"
    "occupation" => "Software Developer"
    "date_of_birth" => 36207
  ]
  3 => array:12 [▶]
  4 => array:12 [▶]
  5 => array:12 [▶]
  6 => array:12 [▶]
  7 => array:12 [▶]
  8 => array:12 [▶]
  9 => array:12 [▶]
  10 => array:12 [▶]
  11 => array:12 [▶]
]

Cleaning up the Data

Now you may have noticed that associative arrays with no data are also included in the final output.

^ array:12 [▼
 0 => array:12 [▼
   "first_name" => "Max "
   "middle_name" => "Richard"
   "last_name" => "Simmons"
   "id_no" => 12345
   "barangay" => "Brgy. Marcelo"
   "citymunicipality" => "Paranaque City"
   "province" => "Metro Manila"
   "street" => "Arial St."
   "full_address" => "2/F Brgy Marcelo Arial St. Paranaque City 1550"
   "civil_status" => "Single"
   "occupation" => "Data Analyst"
   "date_of_birth" => 35514
 ]
 1 => array:12 [▼
   "first_name" => "Samantha"
   "middle_name" => "Panganiban"
   "last_name" => "Lim"
   "id_no" => "12215"
   "barangay" => "Shangri-La Plaza Mall"
   "citymunicipality" => "Mandaluyong City"
   "province" => "Metro Manila"
   "street" => "Edsa Corner"
   "full_address" => "2/F Shangri-La Plaza Mall Edsa Corner Shaw Boulevard 1550"
   "civil_status" => "Single"
   "occupation" => "Civil Engineer"
   "date_of_birth" => 34337
 ]
 2 => array:12 [▼
   "first_name" => "Francis"
   "middle_name" => "Arturo"
   "last_name" => "Gimenez"
   "id_no" => "173378"
   "barangay" => "Legaspi Village"
   "citymunicipality" => "Makati City"
   "province" => "Metro Manila"
   "street" => "Salcedo St."
   "full_address" => "5/F Bloomingdale Building 205 Salcedo Street Legaspi Village 1200"
   "civil_status" => "Married"
   "occupation" => "Software Developer"
   "date_of_birth" => 36207
 ]
 3 => array:12 [▼
   "first_name" => null
   "middle_name" => null
   "last_name" => null
   "id_no" => null
   "barangay" => null
   "citymunicipality" => null
   "province" => null
   "street" => null
   "full_address" => null
   "civil_status" => null
   "occupation" => null
   "date_of_birth" => null
 ]
 4 => array:12 [▼
   "first_name" => null
   "middle_name" => null
   "last_name" => null
   "id_no" => null
   "barangay" => null
   "citymunicipality" => null
   "province" => null
   "street" => null
   "full_address" => null
   "civil_status" => null
   "occupation" => null
   "date_of_birth" => null
 ]
 5 => array:12 [▼
   "first_name" => null
   "middle_name" => null
   "last_name" => null
   "id_no" => null
   "barangay" => null
   "citymunicipality" => null
   "province" => null
   "street" => null
   "full_address" => null
   "civil_status" => null
   "occupation" => null
   "date_of_birth" => null
 ]
 6 => array:12 [▼
   "first_name" => null
   "middle_name" => null
   "last_name" => null
   "id_no" => null
   "barangay" => null
   "citymunicipality" => null
   "province" => null
   "street" => null
   "full_address" => null
   "civil_status" => null
   "occupation" => null
   "date_of_birth" => null
 ]
 7 => array:12 [▼
   "first_name" => null
   "middle_name" => null
   "last_name" => null
   "id_no" => null
   "barangay" => null
   "citymunicipality" => null
   "province" => null
   "street" => null
   "full_address" => null
   "civil_status" => null
   "occupation" => null
   "date_of_birth" => null
 ]
 8 => array:12 [▶]
 9 => array:12 [▶]
 10 => array:12 [▶]
 11 => array:12 [▶]
]

One of the solutions I use for this is using one or more headers as the basis for removing them from the output array. So, for instance, if a column does not contain an ID number in the data, it will be removed from the array through an unset() call.

   //Filters out all of the empty columns to get only the relevant data.
   foreach($all_processed_columns as $key => $column){
      if (!isset($column[Str::slug("ID_No", "_")])){
         unset($all_processed_columns[$key]);
      }
   }
   $all_processed_columns = array_values($all_processed_columns);
  return $all_processed_columns;

You can replace $column[Str::slug("ID_No", "_")] with an array of strings and iterate through it, but for this tutorial we'll keep it simple with a single string. Once you've got that added in, you're done! You now have a function that can process and save a spreadsheet with vertical headers.

^ array:3 [▼
 0 => array:12 [▼
   "first_name" => "Max "
   "middle_name" => "Richard"
   "last_name" => "Simmons"
   "id_no" => 12345
   "barangay" => "Brgy. Marcelo"
   "citymunicipality" => "Paranaque City"
   "province" => "Metro Manila"
   "street" => "Arial St."
   "full_address" => "2/F Brgy Marcelo Arial St. Paranaque City 1550"
   "civil_status" => "Single"
   "occupation" => "Data Analyst"
   "date_of_birth" => 35514
 ]
 1 => array:12 [▼
   "first_name" => "Samantha"
   "middle_name" => "Panganiban"
   "last_name" => "Lim"
   "id_no" => "12215"
   "barangay" => "Shangri-La Plaza Mall"
   "citymunicipality" => "Mandaluyong City"
   "province" => "Metro Manila"
   "street" => "Edsa Corner"
   "full_address" => "2/F Shangri-La Plaza Mall Edsa Corner Shaw Boulevard 1550"
   "civil_status" => "Single"
   "occupation" => "Civil Engineer"
   "date_of_birth" => 34337
 ]
 2 => array:12 [▼
   "first_name" => "Francis"
   "middle_name" => "Arturo"
   "last_name" => "Gimenez"
   "id_no" => "173378"
   "barangay" => "Legaspi Village"
   "citymunicipality" => "Makati City"
   "province" => "Metro Manila"
   "street" => "Salcedo St."
   "full_address" => "5/F Bloomingdale Building 205 Salcedo Street Legaspi Village 1200"
   "civil_status" => "Married"
   "occupation" => "Software Developer"
   "date_of_birth" => 36207
 ]
]

Another method is a simple array_map() function call. This could remove all of the empty arrays present in $all_processed_columns.

   $all_processed_columns = array_filter(array_map(function($column){
      if (!empty(array_filter($column))){
         return $column;
      }
   }, $all_processed_columns));

Simple as that.

If you have any ideas how to improve this function, do email me. This function is not 100% resource efficient and still has a lot of room for improvement.