Tuesday, 1 May 2012

Upload very large files - Part 1

In my last post I talked about the settings we have to configure server-side on IIS in order to allow large uploads, and I also mentioned the security risks involved. The solution I’ll talk about this time is becoming more common as HTML5 is implemented more and more widely.

The main problem was related to the size of the POST when uploading, but why not split the file into several small pieces and upload it piece by piece? Some time ago it may have seemed crazy to think about manipulating a file on the client side from JavaScript, and the only ways were Flash or Silverlight. Now the File API is more popular and almost all modern browsers implement it, except IE, which is always late implementing good stuff.

I’ve found some implementations such as plupload or http://hertzen.com/experiments/jsupload/ but I’m not 100 % satisfied with them, so I took the ideas and got hands-on. In this first part I’ll explain how it works to achieve a bulletproof mechanism. For example, the connection might be interrupted, or something can happen on the server and the worker thread gets shut down. The mechanism must be protected against these events.

I talked to a friend of mine and after a brainstorming session I decided to imitate a kind of TCP connection, in the sense of sending a packet and receiving its ACK (acknowledgement). The idea is implemented as follows:

  • The browser opens the file dialog and a file is selected; the initial packet is prepared with the file name, the type (MIME) and the size.
  • On the server a new GUID is generated for this file, which will be the file id for the whole process.
  • A new file named with this id is created as the metadata file, where the information from the client is stored for further use.
  • A new blank file named with this id and the extension of the original name is created as the data file.
  • The reply to this initial packet is a JSON with the generated GUID.
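To make the handshake a bit more concrete, here is a minimal TypeScript sketch of the initial packet, the reply and the two file names the server derives from the GUID. The field names, the `planUploadFiles` helper and the `.meta` suffix for the metadata file are my assumptions for illustration only; the steps above only say that both files are named after the id.

```typescript
// Hypothetical message shapes: the field names are assumptions.
interface InitialPacket {
  fileName: string; // original name chosen in the file dialog
  mimeType: string; // MIME type reported by the browser
  size: number;     // total size in bytes
}

interface InitialReply {
  fileId: string;   // GUID generated by the server
}

interface UploadFiles {
  metadataFile: string; // stores the client information
  dataFile: string;     // blank file that will receive the fragments
}

// Derives both file names from the id. The ".meta" suffix is an assumption,
// used here so the two names cannot collide when the original file has no
// extension.
function planUploadFiles(fileId: string, packet: InitialPacket): UploadFiles {
  const dot = packet.fileName.lastIndexOf(".");
  // dot > 0 ignores names like ".bashrc", which have no real extension
  const extension = dot > 0 ? packet.fileName.slice(dot) : "";
  return {
    metadataFile: fileId + ".meta",
    dataFile: fileId + extension,
  };
}

// The reply is just the id, serialized as JSON.
function buildInitialReply(fileId: string): string {
  const reply: InitialReply = { fileId };
  return JSON.stringify(reply);
}
```

The GUID itself would come from whatever the server platform provides; it is passed in here so the sketch stays independent of that choice.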

The following steps on the client side are:

  • Read the file fragment corresponding to the current packet number and send the server a data packet with the id, the current packet number and the total number of packets.
  • Each data packet sent receives its ACK, which indicates the number of the last packet received OK; the client then sends packet number last received + 1, unless that was the last one.
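The client-side arithmetic can be sketched roughly like this. I am assuming packets are numbered from 1 and have a fixed size; the function names and the `"restart"` / `"done"` markers are made up for the example.

```typescript
// Byte range of one fragment; `end` is exclusive, matching Blob.slice(start, end).
interface ByteRange {
  start: number;
  end: number;
}

// Number of packets needed for a file of the given size.
function totalPackets(fileSize: number, packetSize: number): number {
  return Math.ceil(fileSize / packetSize);
}

// Range of bytes carried by a 1-based packet number.
function packetRange(packetNumber: number, packetSize: number, fileSize: number): ByteRange {
  const start = (packetNumber - 1) * packetSize;
  const end = Math.min(start + packetSize, fileSize); // the last packet may be shorter
  return { start, end };
}

// Decides what to do after an ACK: resend from scratch, stop, or send ack + 1.
function nextPacket(ack: number, total: number): number | "restart" | "done" {
  if (ack < 0) return "restart";   // the server lost the file: start over
  if (ack >= total) return "done"; // the last packet was acknowledged
  return ack + 1;
}
```

In the browser the range would be handed to `file.slice(range.start, range.end)` to get the fragment to send.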

The server processes the data packet:

  • If the metadata file or the data file doesn’t exist, the result is an ACK with packet -1, which means the file must be uploaded from scratch.
  • If the data file doesn’t have the required size, that is (packet number - 1) multiplied by the packet size, the server returns the required value so that the client knows which fragment to resend.
  • If everything goes well, a success ACK with the current packet number is sent, prompting the client to send the next packet.
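The three server rules can be sketched as a single decision function. This is only an approximation under my own assumptions: the `ServerState` shape is hypothetical, and for the size mismatch I take "the required value" to be the number of complete packets already on disk, so that the client continues with that number + 1.

```typescript
// Hypothetical snapshot of what the server finds on disk for a file id.
interface ServerState {
  metadataExists: boolean;
  dataFileExists: boolean;
  dataFileSize: number; // bytes currently stored in the data file
}

const RESTART_ACK = -1; // tells the client to upload from scratch

// Returns the packet number to put in the ACK for an incoming data packet.
function acknowledge(state: ServerState, packetNumber: number, packetSize: number): number {
  // Rule 1: a missing metadata or data file forces a full restart.
  if (!state.metadataExists || !state.dataFileExists) {
    return RESTART_ACK;
  }
  // Rule 2: before packet n arrives the file must hold exactly n - 1 packets.
  const requiredSize = (packetNumber - 1) * packetSize;
  if (state.dataFileSize !== requiredSize) {
    // Assumption: acknowledge the last complete packet on disk. A partially
    // written packet would have to be truncated before the client resends it.
    return Math.floor(state.dataFileSize / packetSize);
  }
  // Rule 3: sizes match, so the caller appends the payload and ACKs this packet.
  return packetNumber;
}
```

Keeping the decision free of file I/O makes it easy to check every interruption scenario without touching the disk.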

I’ve implemented this using HTML5 techniques, but not all browsers are capable of it. Modernizr can be a great help here, since there is no single HTML5 detection, only detection feature by feature on demand. For now the browser needs to support the File API and, if you want to support retrying after a page refresh, it should also have access to localStorage, which allows storing data and retrieving it when necessary. The browser must also be able to use the XMLHttpRequest object to upload files.
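As a rough idea of how the retry after a refresh could work, the client can keep the file id and the last acknowledged packet in a key-value store. The sketch below is written against a minimal interface with the same `getItem` / `setItem` / `removeItem` methods as localStorage; the key format and the `ResumeState` shape are assumptions of mine.

```typescript
// Minimal subset of the Web Storage interface; window.localStorage fits it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical resume record kept per file.
interface ResumeState {
  fileId: string;    // GUID returned by the server for this upload
  lastAcked: number; // last packet number the server acknowledged
}

// Assumption: file name plus size is enough to recognise the same file again.
function resumeKey(fileName: string, size: number): string {
  return `upload:${fileName}:${size}`;
}

function saveProgress(store: KeyValueStore, fileName: string, size: number, state: ResumeState): void {
  store.setItem(resumeKey(fileName, size), JSON.stringify(state));
}

// Returns the stored state, or null when nothing usable is stored.
function loadProgress(store: KeyValueStore, fileName: string, size: number): ResumeState | null {
  const raw = store.getItem(resumeKey(fileName, size));
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw);
    if (
      parsed !== null &&
      typeof parsed === "object" &&
      typeof parsed.fileId === "string" &&
      typeof parsed.lastAcked === "number"
    ) {
      return { fileId: parsed.fileId, lastAcked: parsed.lastAcked };
    }
    return null; // wrong shape: treat as no saved progress
  } catch {
    return null; // corrupted JSON: treat as no saved progress
  }
}

function clearProgress(store: KeyValueStore, fileName: string, size: number): void {
  store.removeItem(resumeKey(fileName, size));
}
```

After a refresh, the user selects the same file again, `loadProgress` gives back the id and the client continues with `lastAcked + 1`; the server checks still decide whether that packet is really the right one.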

I haven’t implemented any fallback mechanism yet. It would be nice to have a third-party technology such as Silverlight substitute for the missing HTML5 features in the browser; a good example of an implementation using Silverlight can be found at HSS.Interlink. It covers all the aspects of reading a fragment of a file and sending the request to the server.

I am planning to distribute my implementation via nuget.org, containing a JS script and a DLL with the classes to be called from any handler/controller, without having to deal with all the verifications mentioned above.

In the next posts I’ll cover the implementation of this solution and provide a code sample to make it easier to understand.