By user13366078 on Mar 25, 2009
No, this is not going to be another "Remember to do snapshots" post. I'm also not going to talk about backups. Instead, let's look at some very practical aspects of deleting files.
So, why delete a file? "Trivial", you think, "so I can save space!". Sure, dear reader, but at the expense of what?
Let's stop and think for a minute. We try to center our lives on doing cool, worthwhile, meaningful, useful stuff. Deleting files is neither cool nor fun; it's a chore we're forced into. Don't you hate it when that dreaded "Your startup disk is almost full" message appears while you're in the middle of downloading new photos from your latest exciting vacation trip?
Actually, the seemingly simple act of deleting is really a challenge: "Will I need this again?", "Wouldn't it be better to archive this instead?", "Last time I was really glad I kept that email from 2 years ago, so why delete this one?". Sometimes I catch myself thinking for a long time before I actually press that "OK" button or hit "Enter" after the "rm".
The reality is: Storage is cheap, so why delete stuff in the first place?
To put things in perspective, let's try an ROI analysis of deleting files. Let's say we need about 6 seconds of thinking time before we can decide whether a particular file can really be deleted without regret. Let's also assign some value to our time, say $12 per hour (I hope you're getting paid much more than that, but this is just to keep the numbers simple).
Storage is cheap, and last time I checked, a 1 TB USB hard drive cost about $100 at a major electronics retailer, with prices falling by the hour.
Now, how much space does the act of deleting a file need to free up so it justifies the effort of deciding whether to delete or keep it?
Well, our $12 per hour conveniently breaks down to $0.20 per minute, which allows us to perform 10 delete-it-or-not decisions per minute at $0.02 each. Fine. Deleting seems to be cheap, doesn't it?
Now, for that $0.02 you can buy 1/5000th of that $100, 1 TB hard drive ($100 / $0.02 = 5000). Wait a minute, 1 TB/5000 still amounts to 200 MB of data per $0.02! That's more than you need to store a 10-minute video, or a full CD of music, compressed at high quality! Or 20 presentations at 10 MB each! Not to mention countless emails, source code and other files!
So, unless the file you're pondering is bigger than 200 MB, it's not even worth considering whether to delete it. I'll call this 200 MB boundary the "Destructive Utility Heuristic (DUH)".
The result is therefore: Save your time, buy more hard disk space (or upgrade your old hard drive to a bigger one before it dies) and move on. Life's too precious to waste it on deleting stuff. Create good stuff instead! Only think about deleting stuff if the file in question is bigger than 200 MB.
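For readers who want to plug in their own numbers, here is a minimal sketch of the arithmetic above. The function name and its parameters are my own invention for illustration; the defaults are the figures used in this post ($12 per hour, $100 for a 1 TB drive, with 1 TB taken as 10^12 bytes).

```python
SECONDS_PER_HOUR = 3600


def break_even_bytes(decision_seconds, hourly_rate=12.0,
                     drive_price=100.0, drive_bytes=10**12):
    """Return how many bytes of storage cost as much as one delete-or-keep decision.

    Cost of one decision = hourly_rate * decision_seconds / 3600.
    Price of one byte    = drive_price / drive_bytes.
    The break-even size is the first divided by the second.
    """
    return (drive_bytes * hourly_rate * decision_seconds
            / (SECONDS_PER_HOUR * drive_price))


if __name__ == "__main__":
    # Try a slow decider (6 seconds) and a fast one (1 second).
    for seconds in (6, 1):
        megabytes = break_even_bytes(seconds) / 10**6
        print(f"{seconds} s per decision -> about {megabytes:.0f} MB")
```

With the defaults, a 6-second decision breaks even at 200 MB. Change the hourly rate or the drive price to match your own situation and the threshold moves accordingly.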
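If you'd rather let the computer find the few files that are worth thinking about, something like the following sketch could do it. It assumes the 200 MB threshold from above in decimal megabytes; the function name and the `DUH_BYTES` constant are hypothetical, not part of any existing tool.

```python
import os

DUH_BYTES = 200 * 10**6  # the 200 MB threshold from this post (assumed decimal MB)


def files_above_threshold(root, threshold=DUH_BYTES):
    """Yield (path, size_in_bytes) for every file under root larger than threshold."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            if size > threshold:
                yield path, size


if __name__ == "__main__":
    # List the candidates in your home directory, biggest first.
    home = os.path.expanduser("~")
    candidates = sorted(files_above_threshold(home), key=lambda item: -item[1])
    for path, size in candidates:
        print(f"{size / 10**6:10.0f} MB  {path}")
```

Everything this doesn't list is, by the reasoning above, not worth your 6 seconds.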
I can hear some "Wait, but!"s in the audience. OK, one at a time:
"But I can delete much faster than 6 seconds!"
No big deal. So you can delete 1 file per second. At $12 per hour, that second costs a third of a cent, which still buys a threshold of 33 MB: more than 5 songs' worth, or even the biggest practical business presentation, or the source code to a major open source project. And hard disks are getting cheaper every day, while your time will become more and more precious as you age. Yes, if you're dead sure that file is useless junk and you don't need to think about it, go ahead and delete it, but why did you save it in the first place?
"But I like my directories to be clean and tidy!"
Congratulations, that's a good habit! Keeping files organized doesn't mean you need to delete stuff, though. Set up an "Archive" folder somewhere and dump everything you think you may or may not use again there. Use one archive folder for each year if you want. File search technology is pretty advanced these days, so you should be able to find your archived files in less time than you'd take to decide which ones you'll never want to find again. Later, you can still decide to delete your whole archive from 3 years ago because you never used it. That will likely make some sense, because its size may be above the destructive utility heuristic. But chances are you won't really care, because storage will have become even cheaper after those 3 years, so you won't save much, relatively speaking.
"That still doesn't help me when that damn 'Your startup disk is almost full' message comes!"
You're right. The point is: It's often hard to sift through data and decide what to keep and what not. That's why we dread deleting stuff and instead wait until that message comes. I'm only offering relief to those who feel that having to delete stuff isn't really rewarding, and it isn't (at least while you're below the DUH). Go buy a bigger hard drive for your laptop; it's really the cost-effective option. Use the numbers above to help you justify that to your finance department.
"I'm still not convinced. I actually kinda like going through my files and deleting them once in a while..."
Sure, go ahead. Just know that you could use that time to do more productive stuff, such as checking out the Sun cloud, installing OpenSolaris or testing our new Sun OpenStorage products.
"Wait, aren't you supposed to write about OpenSolaris, ZFS and this stuff anyway?"
I'm glad you mentioned that :). Actually, OpenSolaris and ZFS make it even easier for you to stop caring about deleting stuff while still keeping your files organized. The amazing ZFS auto snapshot SMF service will create snapshots of your data automagically every 15 minutes, so it won't matter whether you delete files or not. You can then choose to either not delete them at all and just move them to some archive, or you can delete whatever you want, without the 6 seconds of thinking (just to keep stuff tidy), knowing that you'll always be able to recover those files with Time Slider later. You could then use zfs send/receive to dump your data incrementally to a file server as a backup mechanism, and the hooks are already there to automate this.
See, once you think about it, there's not really a need to delete files at all any more. At least not for mere mortals like us, whose files are below the destructive utility heuristic of currently 200 MB (and rising...) most of the time. Music has already reached the point where a song can be stored at studio quality with lossless compression at manageable file sizes, so that kind of data won't see significant growth any more. And photos and videos will soon follow. This means we'll need to care less and less about restricting personal data storage. Instead, we now need to focus more on managing personal storage.
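To give a rough idea of what such an incremental dump could look like, here is a small sketch that builds the `zfs snapshot`, `zfs send -i` and `zfs receive` command lines and pipes one into the other. This is not how the auto snapshot service itself does it; the helper names, the dataset names, the snapshot names and the host name are all assumptions made up for illustration.

```python
import subprocess


def snapshot_cmd(dataset, name):
    """Command that creates the snapshot dataset@name."""
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def incremental_send_cmd(dataset, prev, curr):
    """Command that streams only the changes between two snapshots."""
    return ["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{curr}"]


def receive_cmd(target, host=None):
    """Command that applies a stream to target, optionally on a remote host via ssh."""
    cmd = ["zfs", "receive", target]
    return ["ssh", host] + cmd if host else cmd


def run_incremental_backup(dataset, prev, curr, target, host=None):
    """Pipe the incremental stream into the receiving side; True on success."""
    send = subprocess.Popen(incremental_send_cmd(dataset, prev, curr),
                            stdout=subprocess.PIPE)
    recv = subprocess.run(receive_cmd(target, host), stdin=send.stdout)
    send.stdout.close()
    return send.wait() == 0 and recv.returncode == 0


if __name__ == "__main__":
    # Hypothetical names: adjust dataset, snapshots, target and host to your setup.
    print(" ".join(incremental_send_cmd("rpool/export/home", "monday", "tuesday")),
          "|", " ".join(receive_cmd("backup/home", host="fileserver")))
```

Running the script only prints the pipeline; `run_incremental_backup` is what would actually execute it, and it needs a machine with ZFS and suitable privileges.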
Now there's a completely different problem that'll keep us entertained for some time...