Blob vs file system storage

Images can be stored as a database blob or in the file system. How do you decide what to choose? What are the performance impacts of each one?

Well, there are several reasons why you should not store binary data in your database:

  • The whole point of storing data in a SQL database, is to put some kind of ordering and structure on your data, as well as being able to search on these data. But how do you search in the binary data of a picture?
  • For large data sets, storing binary data will quickly run up the size of your database files, making it harder to control the size of the database.
  • In order to store binary data in the database, you must continually escape and unescape the data to ensure that nothing breaks.
  • Storing images on the file system has a marginally faster retrieval rate.

Now here are some reasons why you should:

  • There is one good reason why you might want to store the binary data in the database: Replication. Storing images in a database allows for all of your data to be central stored which is more portable, and easy to replicate.

Here's one solution that takes into account the points above:

Store a link (e.g. a file path) to the image file in the database. Whenever you need the image, use the link in whatever program you use to retrieve the file containing the image.

Or you could think of storing your images in the database to gain the benefits there (preferable for smaller images and for limited images), but also use file system caching of these to obtain the performance benefits.

Some tips for getting the best performance out of the file system:

  • Limit the number of images in any one directory.
  • Include not only an image identifier in the filename, but also a secret code.

Comments:

Exactly!

We did that a 2 years ago with a virtual voucher that would hold many pdf files. We also created directory structures / file names soo we could rebuild the database or at least understand the storing if it was lost/damaged/or not needed anymore. That was done by generating files names and directorys with unix timestamps and other classification information. This is indeed the best of two worlds. And i can add that it also helps to have the files hosted on a separate file server then the database server. Leaves the server free for the real database stuff instead of feeding binary content.

Posted by IpNextGen on March 20, 2008 at 04:04 AM PDT #

Another benefit to the database approach is transactions. If you're writing an image to a file system via NFS or FTP, and the server gets rebooted (for example) you won't get a chance to clean up the partially written file. But if you're writing to a database when it gets rebooted, the database should automatically rollback the transaction that didn't complete.

Posted by Dean on March 20, 2008 at 04:09 AM PDT #

Another plus for the DB is that you don't have to mess with server permissions as you would if you're were writing to a server share. Also, if you've got a proxy/load balancer storing on a file system would get messy very fast and is just generally not a very elegant or scalable approach IMO.

Posted by Sanjay on March 20, 2008 at 04:20 AM PDT #

The benefit you mentioned, with reguards to replication, is mute.

Since the image is stored on the file system and not in the database, it is independent of the database completely. As long as the application has access to the file through a share or NAS, it will be able to retrieve the file regardless of which database server it is hitting in the cluster.

Furthermore having the image stored in the database will increase the size of the database causing replication to take long.

There only are no speed or replication benefits for storing images or binary data in a database.

Posted by tony petruzzi on March 20, 2008 at 05:23 AM PDT #

I'm sorry...a secret code?? Ok, so I've added a column to my `images` table which encrypts "sorry for the inconvenience" in a randomized, 256 bit block cipher. Now what?

Also, limiting the number of files per directory is only an optimization on Windows and certain outdated Linux file systems. If you're using one of these for your "high performance" storage, you've got bigger problems than some academic distinction between files and BLOBs.

One of your reasons for not storing images in BLOBs is right on: DB bloat. Once you open that can of worms, your database will get very unwieldy very fast. Don't get me wrong, there are reasons to use BLOBs, but one must be very discriminating.

Oh, P.S. If you're using JDBC, you shouldn't have to escape anything (hint: java.sql.PreparedStatement).

Posted by Daniel Spiewak on March 20, 2008 at 01:40 PM PDT #

but what if the db and the soft consuming it are not on the same physical system ?
I am to store a lot bunch of binary data, but I can't decide how. If I store links, I will have to have a folder shared across all my services, so they access the same data, and the links are relevant.
I fear I'd face to many issues. But if I store BLOB, the DB is likely to explode.

and so is my weak brain.

Posted by brice laurencin on March 20, 2008 at 05:49 PM PDT #

Er, backup?

You will have to backup db and pictures. Furthermore, if you got a db and a replication db, you could be up (in case the server broke down) in minutes switching the db. If you use filesystem you would need a replicated db plus a replicated filesystem of the pictures.

Nevertheless, storing many pictures in the filesystem will have OS limitations. You need to split the load in more and more directories to guarantee that the system will not slow down (the "inode-problem").

And most db's do have special and faster treatment for BLOBs nowadays. I'd prefer to store all information in one place.

Just my 0,02$, Georgi

Posted by Georgi on March 20, 2008 at 11:24 PM PDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

manveen

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today