Why does an Operating System need a File System and what does it do?
While you are working on a document, an audio file, or a photo, the File
System is dormant (unless you have instructed the application you
are working in to automatically make a backup copy). All your activity
is going on in RAM (Random Access Memory) which is volatile, meaning
your work will disappear when you shut down the computer. That may be
just what you want if you were only goofing around. More likely,
you would like to save those hours of creativity and retrieve the
work at a future date. In the case of “hard copy,” the document
would be in a notebook or on sheets of paper that you would put
in a folder in your file cabinet. That's just what a File System
does – it organizes and manages your work so you can get at it later
without having to reenter all that data.
The File System can save your work as (you guessed it) a file on
non-volatile media. Files can be of many types: document, music, movie,
graphic, etc. The Operating System and the File System work together so
that when you open your work file, if it is a registered filetype, the
Operating System knows the appropriate application to bring up. The File
System is your PC's Post Office, but a very superior one. If the USPS
tears your letter up, you have little chance of recovery. If your mail
box overflows and letters blow away, tough. An efficient File System
keeps bookkeeping information that helps it recover files after such
mishaps.
The non-volatile media mentioned can be of many types: the hard drive,
a floppy, a CD (Compact Disc), a ZIP drive, a flash memory card,
etc. For this article, we will concentrate on the hard drive. Okay,
but what's a Hard Drive? It is a recording device composed
of platters and read/write heads; think of a miniature jukebox.
The platters are stacked on a spindle and spin at a high speed.
This high speed spinning creates a laminar air flow over the platters
that supports the read/write heads just above the surface. Never move
or jostle your PC while the HD is reading or writing to the disks; a
head could strike a platter and cause permanent damage, known as a
head crash. The musical jukebox handles one platter at a time and reads
a continuous spiral of data via a needle tracking a groove in the vinyl
LP record. If our HDs behaved the same way, we would soon fill them and
take an inordinately long time to write or read our files. As HDs grew
to capacities unimagined when they were first introduced, a different
scheme of organization was required. For comparison, IBM, in 1956, had
the RAMAC 305 (Random Access Method of Accounting and Control). It
stored 5 MB on 50 24-inch disks, cost about $50,000, and was as large
as two refrigerators.
The stacked platters are formatted as cylinders: a collection of
circular tracks, on each side of the individual platters, at a fixed
radial distance, like tubes within tubes. They are consecutively
numbered from the edge to the hub. These cylinders are further divided
into sectors; think of a slice of a many layered cake. The sectors are
combined to form a cluster. A cluster is the smallest unit of storage
the File System can hand out, a quantum of storage. Each cluster can
hold data from only one file, but a large file can occupy more than one
cluster. Enter the File System: it fills the clusters, records their
addresses, and adds other noteworthy information to the directory,
such as the date and time of creation and modification, attributes,
and (most importantly) where the various fragments of a file that
did not fit in a single cluster are located.
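As a rough illustration of that bookkeeping, here is a minimal Python sketch of a FAT-style lookup. It assumes a simplified scheme in which the directory stores a file's first cluster and an allocation table stores, for each cluster, the next cluster of the same file. The file names, cluster numbers, and the END marker are made up for the example; a real file system stores much more.

```python
# Simplified, FAT-style bookkeeping (illustrative only).
END = -1  # hypothetical marker for "last cluster of the file"

# Hypothetical directory: file name -> first cluster of the file.
directory = {"letter.doc": 2, "photo.jpg": 3}

# Hypothetical allocation table: cluster -> next cluster in the same file.
allocation_table = {2: 5, 5: 6, 6: END, 3: 4, 4: 9, 9: END}


def clusters_of(name):
    """Follow the chain from the directory entry and list the file's clusters."""
    chain = []
    cluster = directory[name]
    while cluster != END:
        chain.append(cluster)
        cluster = allocation_table[cluster]
    return chain


print(clusters_of("letter.doc"))  # [2, 5, 6]
print(clusters_of("photo.jpg"))   # [3, 4, 9]
```

Note that letter.doc is split: cluster 2 is followed by clusters 5 and 6, and only the table knows where the pieces are.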
There are three Windows file systems, developed historically as Hard
Drive capacities grew: FAT/FAT16 (File Allocation Table, 16 bits),
which can handle drives up to 4 GB; FAT32, which can handle drives
over 32 GB; and NTFS (New Technology File System). The major parameter
to the user is the cluster size. The table below shows the required
cluster size for a hard drive's (or partition's) capacity. In the
specification for FAT16 only 16 bits are allowed for the cluster
address. With 2^16 = 65,536 available addresses (clusters), the size of
the cluster is thus determined by the drive capacity. As we move to
FAT32 we get far more addresses (about 268 million, since 28 of the 32
bits are used for the cluster address), and cluster size for this file
system can be reduced for the larger drives. NTFS is the most efficient
in reducing slack space, the unused portion of a cluster.
FAT/FAT16                     | FAT32                         | NTFS
Partition Size | Cluster Size | Partition Size | Cluster Size | Partition Size | Cluster Size
127 MB         | 2 KB         | 8 GB           | 4 KB         | <= 0.5 GB      | 512 B
255 MB         | 4 KB         | 16 GB          | 8 KB         | 1 GB           | 1 KB
511 MB         | 8 KB         | 32 GB          | 16 KB        | 2 GB           | 2 KB
1 GB           | 16 KB        | >32 GB         | 32 KB        | >2 GB          | 4 KB
2 GB           | 32 KB        |                |              |                |
4 GB           | 64 KB        |                |              |                |
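The FAT/FAT16 column of the table follows from the 16-bit address limit, and the arithmetic can be sketched in a few lines of Python. This is only an approximation: it assumes 512-byte sectors, power-of-two cluster sizes, and all 65,536 addresses usable (a real FAT16 volume reserves a few, which is presumably why the table lists 127, 255, and 511 MB). The function name is invented for the example.

```python
FAT16_ADDRESSES = 2 ** 16  # 65,536 cluster addresses
SECTOR_BYTES = 512         # assumed sector size
MB = 1024 ** 2
GB = 1024 ** 3


def min_cluster_bytes(partition_bytes, addresses=FAT16_ADDRESSES):
    """Smallest power-of-two cluster size that lets the addresses cover the partition."""
    size = SECTOR_BYTES
    while size * addresses < partition_bytes:
        size *= 2
    return size


for label, capacity in [("127 MB", 127 * MB), ("511 MB", 511 * MB),
                        ("2 GB", 2 * GB), ("4 GB", 4 * GB)]:
    print(label, "->", min_cluster_bytes(capacity) // 1024, "KB")
# 127 MB -> 2 KB
# 511 MB -> 8 KB
# 2 GB -> 32 KB
# 4 GB -> 64 KB
```

The FAT32 and NTFS columns are not derived this way; with so many more addresses available, their cluster sizes are a design choice rather than a hard limit.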
Consider cookies: most are only 1 KB in size, yet on a 20 GB hard drive
(which is considered small in today's latest computers) a cookie
would waste 15 KB of storage in the FAT32 File System, whose clusters
are 16 KB at that size, but only 3 KB in the NTFS system, whose
clusters are 4 KB. When you consider that it does not take a lot
of Web surfing to pick up tens of cookies, you can see how much
of a storage hog the older file system can be. It is also
a good reason to dump those cookies periodically.
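The cookie arithmetic can be checked with a short Python sketch. The cluster sizes are taken from the table above for a 20 GB drive (16 KB for FAT32, 4 KB for NTFS); the function names and the figure of 50 cookies are invented for the example.

```python
KB = 1024


def clusters_needed(file_bytes, cluster_bytes):
    """Whole clusters required to hold the file (ceiling division)."""
    return -(-file_bytes // cluster_bytes)


def slack_bytes(file_bytes, cluster_bytes):
    """Space allocated to the file but not used by it."""
    return clusters_needed(file_bytes, cluster_bytes) * cluster_bytes - file_bytes


cookie = 1 * KB
print(slack_bytes(cookie, 16 * KB) // KB)  # 15  (FAT32, 16 KB clusters)
print(slack_bytes(cookie, 4 * KB) // KB)   # 3   (NTFS, 4 KB clusters)

# Fifty such cookies, as an arbitrary example:
print(50 * slack_bytes(cookie, 16 * KB) // KB)  # 750 KB wasted on FAT32
print(50 * slack_bytes(cookie, 4 * KB) // KB)   # 150 KB wasted on NTFS
```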
The NTFS File System
Key elements of the NTFS system are designed to be more secure in an
enterprise. Home users benefit from better reliability and flexibility
in a home network. Granting access to individual files allows family
members to share and yet keep unauthorized eyes off sensitive files.
The file organization stores file attributes within a Master File
Table (MFT). Its metadata files include a record of bad clusters on
the hard drive, a Cluster Allocation Bitmap to apportion space as
needed, and an MFT Mirror file. The mirror file is a copy of the
first 16 MFT files as a backup. The MFT is on the first 12% of the
drive and the 16 system files are at the beginning of that space.
The MFT Mirror is in the center of the other 88%. With Windows 2000,
Microsoft added disk quotas to NTFS, and Win XP carries them forward.
Quotas give you control over how much space each user can occupy on a
drive. Handy when a family member tries to make a giant jukebox out of
the family PC.
FRAGMENTATION
When you start out with a fresh new drive, whether it is a brand new
device or a recently created partition, there is plenty of room
(typically) for many, many files. A run-of-the-mill Word document
will average maybe 50 KB, not much storage space when you consider
Gigabytes (1 GB is roughly 1,000,000 KB). Now consider the cluster size
(let's use the NTFS system) of 4 KB. Fifty divided by four = 12.5, and
a cluster cannot be shared, so we need 13 clusters to store the
document. Changing your mind about
the whole thing, you delete the file and proceed to write a new
document. Your “deleted document space” is available but will not
be used if there are brand new clusters available. Deleting says
the tenant no longer is in residence but does not move the furnishings
out. Eventually you would fill the new clusters and the file system
would look for those “empty” clusters. With our example of a 20
GB drive and considering only Word documents and the like, it would
take a considerable amount of time to get to that point. However,
consider how we use our computers lately. Digital photos, digital
movies, and digital music – we are no longer talking KB sized files.
It is common for these types to be MB and, in the case of movies,
even GB in size.
We can solve limited storage by “off loading” files: burn them to
CDs and DVDs and delete them from the drives. Back to the analogy
of an apartment building: we then have newly available space, but
not like when the building was first available for occupancy. Some
apartments are now occupied. A new tenant with a family of ten will
not fit in a two bedroom flat; they will have to rent more than
one apartment within the building, each with its own apartment number.
It is not likely that the apartments will be adjacent (contiguous, in
terms of space on a drive). The family will have to fragment itself
into multiple domiciles.
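The apartment analogy can be played out in a small Python sketch. It models the drive as a list of ten clusters and assumes a deliberately naive allocator that hands out free clusters in address order; real file systems use more elaborate policies. The function names and file labels are invented for the example.

```python
def allocate(disk, name, count):
    """Give `name` the first `count` free clusters, wherever they happen to be."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < count:
        raise ValueError("not enough free clusters")
    for i in free[:count]:
        disk[i] = name
    return free[:count]


def delete(disk, name):
    """Mark every cluster owned by `name` as free again."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def fragments(clusters):
    """Count runs of adjacent cluster numbers in a sorted cluster list."""
    return sum(1 for k, c in enumerate(clusters)
               if k == 0 or c != clusters[k - 1] + 1)


disk = [None] * 10          # a tiny 10-cluster "drive"
allocate(disk, "A", 3)      # clusters 0-2
allocate(disk, "B", 2)      # clusters 3-4
allocate(disk, "C", 3)      # clusters 5-7
delete(disk, "B")           # B is off-loaded; clusters 3-4 are free again
d = allocate(disk, "D", 4)  # the "family of ten" moves in

print(d)             # [3, 4, 8, 9]
print(fragments(d))  # 2
print(disk)          # ['A', 'A', 'A', 'D', 'D', 'C', 'C', 'C', 'D', 'D']
```

File D does not fit in the gap B left behind, so it ends up in two separate pieces.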
Can you say defragmenter?
Defragging is the reading and rewriting of data on the drive to sort
files into contiguous space. You must have enough free room on the
drive to temporarily hold pieces of files until they can be written
contiguously. The file system supplies the address data and is then
updated with the rearranged locations of the files. The oft-asked
question is: when and how often should I defrag? Today's high speed
CPUs, fast rotating drives, and plentiful RAM at reasonable prices all
serve to alleviate the problems of fragmentation. It is highly unlikely
that you would notice a difference in performance before and after a
defrag, with some notable exceptions: audio, video, and photo editing.
Unless you have an unusually large amount of RAM (a Gigabyte or more),
there is a good chance that the editing programs are going to make
use of swap files (Win XP calls it a paging file) to store
vast amounts of data, including multiple Undos, info on layers,
etc. If you have not done so recently, it might be advisable to run
a defrag program prior to a session of large file editing.
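What a defragmenter aims for can be sketched in Python, using a list of clusters in which each entry names the file that owns the cluster (None means free). This toy version only computes the tidy target layout; a real defragmenter moves the data piece by piece through free space and updates the file system's address records as it goes. The function name and sample layout are invented for the example.

```python
def defragment(disk):
    """Return a layout with each file's clusters contiguous and free space at the end."""
    # Files in order of first appearance on the drive.
    order = []
    for owner in disk:
        if owner is not None and owner not in order:
            order.append(owner)

    # Lay each file's clusters down side by side.
    packed = []
    for name in order:
        packed.extend([name] * disk.count(name))

    # Whatever is left over is free space.
    return packed + [None] * (len(disk) - len(packed))


before = ["A", "A", "A", "D", "D", "C", "C", "C", "D", "D", None, None]
after = defragment(before)
print(after)
# ['A', 'A', 'A', 'D', 'D', 'D', 'D', 'C', 'C', 'C', None, None]
```

File D, which was split in two, now sits in one contiguous run.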
The Future File System
In case you think mastering the NTFS file system will be the last word,
here's a wake-up call. Waiting in the wings to be introduced with
Longhorn, the next generation of Windows OS, is WinFS. WinFS is set to
put an entirely new look on Windows file management with a user
interface similar to Google. It is being built atop NTFS with
additional functions. At Comdex in November 2003, a concept called
Implicit Query was introduced. As you compose a document, IQ searches
your files for key phrases as you type them in. Examples were a search
of your email for From: Joe when you enter “Joe” in the To: line of a
new message, a check of your Calendar for any appointments with Joe,
and a connection to the Web if you enter words that could be part of a
URL, like an organization, geographic entity, or retail company.
The data model for WinFS is more complex, using types with
properties, fields, and relationships. A person type could
contain fields for name and address. The relationships
are to work like those in a relational database.
As we have seen in the past, when it comes to PCs – there is nothing
more constant than change.