Tuesday 13 March 2012

Handling Excel spreadsheets XLS and XLSX with dot NET

More than once in my projects the format for input and/or output is Excel spreadsheet either using Excel 97-2003 (.xls) or Excel 2007 OOXML (.xlsx). The open source community contains several libraries, for legacy files the most famous is NPOI, which in fact is the .NET version of POI Java project at http://poi.apache.org/. The result about modern OOXML were more diverse, ExcelPackage was the first but this project was abandoned late 2010. However this code wasn’t to the trash, EPPlus is the evolved version based on it and strongly optimized by the new team.
If we want to use indiscriminately the two formats, it’s necessary to have updated the two libraries and since they are developed by two different teams, they have two completely different interfaces and ways to be used. Despite the fact it’s open source and the hard work is done, it’s a mess to deal with this difference in our code.
Start writing if-else statement alongside the code is not an option! This will get us to Spaghetti code anti-pattern and the consequent difficulties for maintaining the resulting software. So the design patterns to the rescue. The quick solution for the immediate needs, it’s necessary a common interface for basic operation with Excel files and we have two different implementations. The adapter pattern (aka wrapper) is the best match. The interfaces created are:
public interface IExcelHandler : IDisposable
{
    void LoadExelFile(string fileName);
    IExcelSheet GetSheet(int index);
    void Save();
    void Save(string filename);
}

public interface IExcelSheet
{
    string GetCellValue(int row, string column);
    void SetCellValue(int row, string column, object value);
}
 
The xml comments were removed for brevity but the methods’ names are self-explanatory (or at least I think so), the only thing might be confusing is the column parameter which is defined as string because it’s intended to be invoked with the actual Excel column name, such as A, B, etc. The operations defined above were enough in order to solve my needs; I know this might be extended for more complex scenarios but this is a first version and I don’t like to make over engineering or as we say in my town: killing mosquitos with cannons.
Having the common interface and all the implementations is not enough for a reusable component and abstract the client from the concrete implementation, how is to be created the instances of each implementation according to the file format? Design patterns again, the Factory Method may be one solution, in fact I made a simplified version using an interface that defines a Create method who is the responsible for create the concrete IExcelHandler instance.
public interface IExcelHandlerFactory
{
    IExcelHandler Create(string filename);
}
 
This interface might be injected via your favorite IoC Container and linked to the default implementation I provide in the class ExcelHandlerFactory or if you aren’t using dependency injection it can be used as a Singleton through the static property Instance.
As NPOI and EPPlus are already packages in the NuGet Gallery, I’ve done a package with these codes in order to contribute to the community and to be easier to bootstrap when handling Excel files. The package is available on NuGet feed at https://nuget.org/packages/ExcelHandler, just execute the following command at NuGet Package Manager Console:
Install-Package ExcelHandler
This will add automatically as dependencies the NPOI and EPPlus packages with all the corresponding references in the project. An example on how to use it is as follows:
Console.WriteLine("Opening and modifying a file with Excel 2007+ format...");
using (var excelHandler = ExcelHandlerFactory.Instance.Create(@"..\..\modern_file.xlsx"))
{
    var sheet = excelHandler.GetSheet(1);
    Console.WriteLine(sheet.GetCellValue(1, "A"));
    sheet.SetCellValue(1, "A", "Test value in A1 cell");
    excelHandler.Save();
}
Console.WriteLine("... done.");
Console.WriteLine("Opening and modifying a file with Excel 97-2003 format...");
using (var excelHandler = ExcelHandlerFactory.Instance.Create(@"..\..\legacy_file.xls"))
{
    var sheet = excelHandler.GetSheet(1);
    Console.WriteLine(sheet.GetCellValue(1, "A"));
    sheet.SetCellValue(1, "A", "Test value in A1 cell");
    excelHandler.Save();
}
Console.WriteLine("... done."); 
 
This example just opens two Excel files, read the content from the cell A1 and set some text in the same cell. Since the interface IExcelHandler inherit from IDisposable it can be used in a using statement in order to simplifying the resource disposing. The source code is available at https://bitbucket.org/abelperezok/excelhandler.

2 comments:

  1. There is another .NET Library for Excel which you can use to manage your excel files. It offers many features, its not free but offers free trial and its worth trying.

    ReplyDelete
  2. Thank you for bringing up this library, I'll take a closer look, my goal with this idea was to unify the way to access Excel file content without being concerned about the actual Excel version for the file.

    ReplyDelete