During autumn 2002, the file servers for students at the Norwegian university NTNU began to fill up. With several file servers down and new solutions not yet ready for use, disk capacity was more limited than normal. A quick analysis showed that while our normal disk quota enforcement, «Period deletion», was disabled during the holidays, many students had filled their home areas with large amounts of «unnecessary» files, such as mp3s and DVD rips of Lord of the Rings. We needed a more effective way to control disk usage. Some users even showed up in our logs as repeat disk abusers, filling up several GB even after receiving warnings from us. This is where our thoughts on daddyQ started to form.
NTNU's existing solution for controlling students' disk usage was a system called «Periodeslett», or «Period deletion». It was a Perl script, written and maintained over several years by students working in our computer department. Period deletion worked by checking the disk usage of each user twice a month. A general quota (at the time only 60 MB) was set, along with a list of exceptions for users with valid claims to extended quotas (such as graduate students writing their theses). Such extensions were granted manually by editing a configuration file.
If a user was found to be over quota on the dates the check was run (publicly known: the 1st and 15th of each month), a warning email was sent, giving the user 5 days to clean up. After 5 days the check was run again, and files were selected for deletion in a largest-file-first fashion until the user was below quota. The user had no way of knowing which files would be deleted, except for a similar list generated in the warning. But if the user had recently received many emails with attachments, the INBOX could suddenly be the largest file and be selected for deletion, even if it had not been listed in the initial warning.
This solution seemed nice and simple, and users only needed to be 'afraid' of the deletion process twice a month, on given dates. There was little administration apart from manual recovery of 'deleted' files (in fact, we moved the files to hidden directories for a period before actually deleting them). However, since all users were checked on the same day, twice a month the system administrators could receive several hundred such recovery requests from regretful users.
This solution had many disadvantages. One is that checking only on given dates makes it possible for «advanced» users to clean up just before these dates and otherwise fill up several GB without being noticed. This might not sound like a problem as long as enough disk space is available, but it gives the false impression that more disk capacity is needed in total, pushing forward large investments. In addition, backup costs increase, especially with students doing «file exchange work», as files change every night and force ever more backup.
Not only did many users «live well» by knowing how Period deletion worked, but normal users suffered. If you were unlucky, you would be over quota only on the day of the check, for instance after receiving large amounts of spam email. In many cases your largest file would be INBOX, which means the novice user loses all his email. (Novice users don't separate email into several folders, might not delete large amounts of old spam and attachments, and certainly don't use maildir. One might argue that these users deserve to get their INBOX deleted, but not more often than disk abusers.)
One quick fix for this problem was to skip INBOX in the file selection process until all other means had been tried. This had the unfortunate effect that users who were over quota simply because of a very large INBOX had ALL their other files removed, and since even then they were not under quota, their INBOX was deleted as well. These users were left with totally blank home directories, with no configuration files, address books or anything else.
Period deletion created a lot of extra work. Users whose INBOX was deleted even though they were only slightly over quota naturally asked to get their email back, resulting in a lot of manual work. Many users requested extended quota to avoid all this hassle, again causing manual work. Sometimes file servers filled to the brim between the normal deletion runs, and the system administrators had to run manual deletion on disk abusers after analyzing logs to find out who might be a proper candidate.
On this basis, Stian Søiland and Dyre Meen began thinking of a new way to control disk usage. During autumn 2002 most of the principles were formed, and the system began to find its shape around Christmas 2002. (Not much development was done during the winter, though, as our employer found other projects more important at the time.)
We quickly decided that the quality of the old «Period deletion» was too poor to continue working on; it was a large set of patches and not flexible enough for our new ideas. We concluded that we needed to write a completely new solution, giving us more freedom of choice.
The idea we found most appealing was to introduce a measure of total all-time overuse for each user. Inspired by the constantly rising electricity prices, we created the concept of megabyte-days (MBD). 500 MBD means a user has been over quota by 1 MB for 500 days, by 500 MB for 1 day, or anything in between. We could then use this measure to decide whether or not to perform deletion on a user.
In practice, we check every user's total disk usage each day. (A user's disk usage is the size of the files in his personal areas, such as his home directory and web area. We ignore the ownership of files.) If a user is above his quota, we add the overuse to his grand total of overuse. To be kind and 'forgetting', we reduce the existing grand total by some amount each day, say by multiplying it by 0.95.
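The daily bookkeeping can be sketched in a few lines of Python. This is only an illustration, not the daddyQ code: the function names are made up, and we assume here that the grand total is decayed before the day's overuse is added (the order is not fixed by the description above).

```python
QUOTA_MB = 60   # the general quota mentioned earlier
DECAY = 0.95    # the daily 'forgetting' factor from the example


def update_grand_total(grand_total_mbd, usage_mb, quota_mb=QUOTA_MB, decay=DECAY):
    """Return the new grand total (in MB-days) after one daily check.

    Assumption: the old total is decayed first, then today's overuse is added.
    """
    overuse_mb = max(0.0, usage_mb - quota_mb)  # only usage above quota counts
    return grand_total_mbd * decay + overuse_mb


def simulate(daily_usage_mb, quota_mb=QUOTA_MB, decay=DECAY):
    """Run the daily check over a list of usage figures, one per day."""
    total = 0.0
    history = []
    for usage in daily_usage_mb:
        total = update_grand_total(total, usage, quota_mb, decay)
        history.append(total)
    return history


if __name__ == "__main__":
    # Two days at 160 MB (100 MB over quota), then back under quota.
    print(simulate([160, 160, 40]))
```

With a 60 MB quota, two days at 160 MB followed by a day at 40 MB gives grand totals of roughly 100, 195 and 185.25 MBD: the total keeps shrinking once the user is back under quota, but only slowly.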
If a user's grand total is over a given limit, say 5 GB-days, we start a warning and deletion process in the same style as the old «Period deletion», but we reduce the number of days between warning and deletion based on how large the grand total is. This makes for a tough world for people with excessive disk usage or a history of it. A user who was previously a «big fish» will be under constant watch; just tipping over his quota by a few MB will immediately trigger a deletion warning. This is the equivalent of the real-life police keeping an extra eye on «old acquaintances».
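How the warning period shrinks is a matter of policy. The sketch below shows one possible rule, and every number in it except the 5 GB-day limit is an assumption made for illustration: a user just over the limit gets the 5 days of the old system, each further multiple of the limit removes one day, and there is a floor of 1 day.

```python
LIMIT_MBD = 5 * 1024      # '5 GB-days' in MB-days, assuming 1 GB = 1024 MB
MAX_WARNING_DAYS = 5      # assumed: same grace period as the old Period deletion
MIN_WARNING_DAYS = 1      # assumed floor, so there is always some warning


def warning_days(grand_total_mbd):
    """Return the days between warning and deletion, or None if no warning.

    Hypothetical rule: one day less for every full multiple of the limit.
    """
    if grand_total_mbd <= LIMIT_MBD:
        return None  # below the limit, no deletion process is started
    multiples = int(grand_total_mbd // LIMIT_MBD)  # 1 when just over the limit
    return max(MIN_WARNING_DAYS, MAX_WARNING_DAYS - (multiples - 1))


if __name__ == "__main__":
    for total in (1000, 5121, 3 * 5120, 100 * 5120):
        print(total, warning_days(total))
```

Any monotonically decreasing function of the grand total would serve the same purpose; the stepwise rule is chosen here only because it is easy to explain to users.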
Since not all people are disk abusers, but merely overrun their quota now and then, we also keep a record of every time a user is above quota. Normally no action is taken even though the user is over quota, but a user with more than 10 such records in the last 30 days will also get a deletion warning. However, he will not get as short a warning period as the disk abusers.
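The record check is a simple sliding window. A minimal sketch, assuming the records are stored as one date per day the user was found over quota (the storage format and the function name are our own choices for this illustration):

```python
from datetime import date, timedelta

MAX_OVERRUNS = 10   # more than this many records triggers a warning
WINDOW_DAYS = 30    # only records from the last 30 days count


def needs_warning(overrun_dates, today):
    """Return True if the user has more than MAX_OVERRUNS records in the window."""
    window_start = today - timedelta(days=WINDOW_DAYS)
    recent = [d for d in overrun_dates if window_start < d <= today]
    return len(recent) > MAX_OVERRUNS


if __name__ == "__main__":
    today = date(2003, 3, 31)
    # Over quota on each of the last 11 days, today included.
    records = [today - timedelta(days=i) for i in range(11)]
    print(needs_warning(records, today))
```

Records older than the window simply stop counting, so a user who cleans up is forgiven after a month without any administrator involvement.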
We also wanted a new way to select files for deletion, to avoid deleting email when not necessary. We got help from our colleague Steinar Hamre, who came up with a pretty smart algorithm we will refer to as the bucket method.
Instead of thinking about which files to delete, Hamre proposed considering which files to keep. First we split our file list (in our implementation represented as tuples of file size and file name) into different buckets, according to priority. Each bucket is then sorted, smallest files on top. We then pick the files we want to keep. We start with the bucket of highest priority, picking the smallest file first and keeping a running total of the size of the files chosen. If choosing the smallest remaining file in the bucket would overrun the user's quota, we cannot pick anything more from this bucket, and move on to the bucket with the next lower priority, where we repeat the procedure.
Example: We have a quota of 50 MB, and sort files into different buckets. In our 'high' bucket the selection process has placed important documents and some mailboxes. We start with the smallest files, and decide to keep 'thesis.doc' (3 MB) and 'INBOX' (30 MB), 33 MB in total. However, the file 'sent-mail' (42 MB) will not fit into our quota at this point and cannot be selected. We skip to the next bucket, where we find 'labreport.doc' (5 MB) and 'exercise1.pdf' (5 MB), bringing the total to 43 MB. In this bucket we also find the file 'britney.mp3' (8 MB), which we cannot include.
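The bucket method and the example above can be written down as a short Python sketch. It is a simplified illustration rather than the actual daddyQ implementation; the function name is invented, and how files are assigned to buckets is left out entirely.

```python
def select_files_to_keep(buckets, quota_mb):
    """Pick the files to keep; everything else is a deletion candidate.

    buckets: list of buckets, highest priority first; each bucket is a
             list of (size_mb, filename) tuples.
    Returns (kept_filenames, total_size_kept).
    """
    kept = []
    used = 0
    for bucket in buckets:
        # Smallest files first within each bucket.
        for size, name in sorted(bucket):
            if used + size > quota_mb:
                # The smallest remaining file does not fit, so nothing
                # more from this bucket can; go to the next bucket.
                break
            kept.append(name)
            used += size
    return kept, used


if __name__ == "__main__":
    high = [(3, "thesis.doc"), (30, "INBOX"), (42, "sent-mail")]
    lower = [(5, "labreport.doc"), (5, "exercise1.pdf"), (8, "britney.mp3")]
    kept, used = select_files_to_keep([high, lower], quota_mb=50)
    print(kept, used)
```

Run on the example, the sketch keeps 'thesis.doc', 'INBOX', 'labreport.doc' and 'exercise1.pdf' for a total of 43 MB, leaving 'sent-mail' and 'britney.mp3' as the files to delete.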
This simplified example illustrates how the removal of 'important' files does not necessarily mean the removal of less important files. We got to keep both our lab report and our exercise. «Period deletion» would in this example have considered the total usage of 93 MB, and to get down to 50 MB we would need to delete 43 MB. Selecting the largest files first, it would have deleted both 'sent-mail' and 'INBOX', but kept 'britney.mp3'.
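For comparison, the largest-file-first selection of the old system can be sketched the same way. This is a reconstruction from the description in this text, not the original Perl script, and the function name is our own.

```python
def largest_first_deletion(files, quota_mb):
    """Delete the largest files until usage is at or below quota.

    files: list of (size_mb, filename) tuples.
    Returns (deleted_filenames, remaining_usage).
    """
    used = sum(size for size, _ in files)
    deleted = []
    for size, name in sorted(files, reverse=True):  # largest files first
        if used <= quota_mb:
            break
        deleted.append(name)
        used -= size
    return deleted, used


if __name__ == "__main__":
    files = [
        (3, "thesis.doc"), (30, "INBOX"), (42, "sent-mail"),
        (5, "labreport.doc"), (5, "exercise1.pdf"), (8, "britney.mp3"),
    ]
    print(largest_first_deletion(files, quota_mb=50))
```

On the same six files this deletes 'sent-mail' (leaving 51 MB, still 1 MB over) and then 'INBOX', ending at 21 MB. The user loses 72 MB of mail to get rid of a 43 MB excess, while the mp3 survives.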
We wanted simple day-to-day operation for the system administrators and for the users. By not running deletion on fixed dates we avoid large numbers of requests for restores and extended quotas arriving at the same time; such requests will be independent of each other. In addition, we wanted to simplify the processing of these requests.
With daddyQ, users log into a 'customer center', where they can check their current disk usage and status (such as their grand total), and press a button to apply for extended quota or retrieval of deleted files. Similarly, for the system administrators, the web system presents the requests, giving the administrator the option of accepting or declining. Upon acceptance, the user's quota is extended or the files are moved back automagically. This even makes it possible for first-line support to handle such requests, as they don't require special system rights (i.e. root login).