Hand-in-Hand:
File Systems and Hard Drives
by Lee Alexander
General
As with most components of a computer system, there are two aspects:
hardware and software. In order to have a "permanent"
memory, your PC employs a Hard Drive and a File System. We put "permanent"
in quotes because things can happen; even tombstones can wear out.
Where do we get the term Hard Drive? The early nonvolatile memory
media were referred to as Floppy Disks, soon to be (if not already)
obsolete. The later versions of floppy disks were 3.5 inch flexible
plastic discs encased in a hard plastic case. The rigid platters of a
hard drive stand in contrast to those flexible discs. By nonvolatile
memory we mean that the data remains when the power is removed.
A hard drive resembles a mini juke box, consisting of multiple platters
and read/write heads analogous to vinyl records and phonograph pickups.
The "hard" platter is typically a highly polished aluminum
disk coated with a very thin magnetic film on both sides. Instead
of a single pickup, there are two read/write heads for each disk,
one for each side. Disks are stacked on a single spindle and all
rotate at the same speed. Rotational speeds vary from 4500 to 15,000
rpm; today's common standard is 7200 rpm.
Formatting
As a verb, to format has multiple interpretations: in graphics
we can change the format of an image (BMP, JPG, TIFF,
etc.); in word processing we can change a document's format (font,
margins, paragraph layout, etc.). Our topic is Hard Drives, and that
restricts our formatting to the physical and electronic
properties of the disks.
There are two levels of formatting: Low Level and High Level. Low Level
formatting defines the tracks and sectors (and thus the cylinders)
of the stacked platters. This is done by the manufacturer (I will
forgo the formatting of floppies). The electronics, at this point,
will label bad sectors (nothing is perfect) to avoid writing data
to these areas.
High Level formatting organizes the disk's cylinders and sectors into
usable areas and creates a file allocation table, FAT, to assign addresses
to clusters. A cluster is the smallest unit of the HD that can be
addressed. It is composed of sectors; how many depends upon the
File System. More on that subject later. A cylinder consists of the
tracks at the same position on each of the stacked disks, which rotate
in unison. To see these parameters go to Start | Run and enter
msinfo32.exe. In the left hand pane, expand Components, Storage, and
click on Disks.
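The relationship between these parameters can be sketched with a little arithmetic. The following is a minimal illustration, assuming 512-byte sectors; the geometry numbers and the function names are hypothetical examples, not values taken from any particular drive.

```python
BYTES_PER_SECTOR = 512  # assumed sector size; check the value your system reports


def disk_capacity(cylinders, heads, sectors_per_track,
                  bytes_per_sector=BYTES_PER_SECTOR):
    """Capacity in bytes = cylinders x heads x sectors per track x bytes per sector."""
    return cylinders * heads * sectors_per_track * bytes_per_sector


def clusters_needed(file_size, sectors_per_cluster,
                    bytes_per_sector=BYTES_PER_SECTOR):
    """Number of whole clusters a file of file_size bytes occupies."""
    cluster_bytes = sectors_per_cluster * bytes_per_sector
    return -(-file_size // cluster_bytes)  # ceiling division


# Hypothetical geometry: 16,383 cylinders, 16 heads, 63 sectors per track.
print(disk_capacity(16383, 16, 63))  # 8455200768

# A 5,000-byte file on a disk with 8 sectors (4,096 bytes) per cluster.
print(clusters_needed(5000, 8))      # 2
```

Because a cluster is the smallest addressable unit, the 5,000-byte file in this example occupies two full clusters (8,192 bytes) even though it needs only part of the second.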
Communication Software
The connection between the HD and the computer is over a bus – ATA (Advanced
Technology Attachment). As the technical aspects of HDs have gone
through generational changes, so has the ATA Bus. ATA also goes by a
few other names: IDE (Integrated Drive Electronics), so called because
the control features are built into the drive; EIDE (Enhanced IDE); and
DMA (Direct Memory Access) or Ultra DMA, after its transfer modes.
The first version, ATA-1, was a 16-bit parallel connection
introduced by Compaq in 1986. It supported two devices on the same
connection and had a transfer (read/write) speed of 4 MBps (Megabytes
per second). Jumping ahead, ATA-4 introduced ATAPI (ATA Packet
Interface), which allowed other devices (CD and DVD drives) to coexist
on the same bus. ATA-5 gave us a rate of 66.6 MBps.
However, here is a note of warning. For ATA bus connections, if two devices
are on the same bus, the speed will be dictated by the slower device.
Most PCs have two buses and drive controllers. Use this feature
to separate fast hard drives from slower removable media drives.
On the way to ATA-6, expected to be capable of addressing more than
137 GB at a speed greater than 100 MBps, we get SATA – Serial ATA.
Counter-intuitively, we get higher transfer rates with a thin
serial cable than with a parallel multi-conductor ribbon cable.
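The 137 GB figure is not arbitrary. As a minimal sketch, assuming the older ATA versions address sectors with a 28-bit number and that each sector holds 512 bytes, the ceiling works out as follows (the function name is just for illustration):

```python
SECTOR_BYTES = 512  # assumed sector size


def max_addressable_bytes(address_bits, sector_bytes=SECTOR_BYTES):
    """Largest capacity reachable with a sector address of address_bits bits."""
    return (2 ** address_bits) * sector_bytes


limit_28 = max_addressable_bytes(28)
print(limit_28)                      # 137438953472
print(round(limit_28 / 10**9, 1))    # 137.4 (decimal GB)
```

Widening the sector address is what lifts this ceiling; with 48 bits, the same calculation gives 2^57 bytes.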
RAID
The acronym originally stood for Redundant Array of Inexpensive Disks;
then it became Redundant Array of Independent Disks; and now, Redundant
Array of Independent Devices. Once the province of enterprise systems,
this feature has now propagated to home PCs, admittedly high-end
units. The basic concept of RAID was twofold: to surmount the expense
of large capacity hard drives and address limitations, and to provide
data protection by mirroring data on separate physical drives. RAID
comes in a variety of configurations:
RAID 0 – "striping" - data is split into blocks (stripes) that are written
to two or more physical drives in turn. The array of drives appears as a
single large logical disk to the computer.
RAID 1 – “mirroring” - two physical drives receive the same data,
essentially backing up each other.
RAID 3 - involves error correcting technology, requiring a minimum
of three drives. Two drives are used as in RAID 0, interlacing data
to create a larger volume. The third drive stores parity information
calculated from the data on the other two, so that the contents of a
failed drive can be rebuilt. Parity works much like a checksum, a number
calculated by an algorithm that validates a block of data. The beep you
hear in a supermarket as an item is scanned confirms, by means of a
checksum algorithm, that the bar code was read successfully.
RAID 5 - also requires a minimum of three physical drives (although
most systems use five drives). Generally used with large databases that
have a low tolerance for faults, it differs from RAID 3 in that all data
and error correcting codes are distributed among all drives.
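Redundancy in arrays of this kind is commonly built on XOR parity. The following is a toy sketch of that idea, not the firmware of any real controller: two byte strings stand in for data drives, a third holds their parity, and a "failed" drive is rebuilt from the survivors. The function name and sample data are invented for illustration.

```python
def xor_blocks(*blocks):
    """XOR any number of equal-length byte blocks together."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    result = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Two data "drives" and one parity "drive" (toy data).
drive_a = b"HELLO123"
drive_b = b"WORLD456"
parity = xor_blocks(drive_a, drive_b)

# Pretend drive_b has failed: rebuild it from drive_a and the parity.
rebuilt_b = xor_blocks(drive_a, parity)
print(rebuilt_b == drive_b)  # True
```

The same trick works with more than two data drives: XOR-ing all surviving blocks with the parity block yields the missing one.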
For home computers, the most commonly used system is RAID 0 or RAID 1.
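The difference between the two home-computer favorites can be shown with a toy model in which byte strings stand in for drives. This is only a sketch of the concepts; the function names and the two-byte stripe size are arbitrary choices for illustration.

```python
def stripe(data, num_drives, stripe_size):
    """RAID 0 idea: deal fixed-size chunks of data out to the drives in turn."""
    drives = [bytearray() for _ in range(num_drives)]
    for n, start in enumerate(range(0, len(data), stripe_size)):
        drives[n % num_drives] += data[start:start + stripe_size]
    return [bytes(drive) for drive in drives]


def mirror(data, num_drives=2):
    """RAID 1 idea: every drive receives a full copy of the data."""
    return [bytes(data) for _ in range(num_drives)]


print(stripe(b"ABCDEFGHIJ", 2, 2))  # [b'ABEFIJ', b'CDGH']
print(mirror(b"ABCDEFGHIJ"))        # [b'ABCDEFGHIJ', b'ABCDEFGHIJ']
```

Striping gives the combined capacity of both drives but no protection: lose either drive and half the chunks are gone. Mirroring gives the capacity of one drive, with a complete spare copy on the other.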
There are two methods for implementing RAID - hardware and software. The
hardware implementation uses a special disk controller with its
own CPU to control traffic. The array of hard drives appears to
the computer as a single logical drive. The drives can be external
to the host computer (in their own case) and connected via a network
connection or special interface card.
In the case of software RAID, the array is created through special
software. As an application, it occupies memory on the host system,
consumes CPU cycles, and is system-dependent in operation. The host
computer will take a performance hit when using software RAID.
File Systems
Windows presently has three File Systems: FAT (actually FAT16) [File
Allocation Table], using a 16 bit word address scheme; FAT32, using 32 bits;
and NTFS [New Technology File System], a significant improvement
over the other schemes. In the binary machine language of computers,
a 16 bit word – 2^16 – affords 65,536 distinct numbers for
addresses. A 32 bit word – 2^32 – allows for 4,294,967,296 [4+
billion] addresses.
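These address counts translate directly into a ceiling on volume size, since each address points to one cluster. Here is a minimal sketch; the 32 KB cluster size is an assumption chosen for the example, and the function names are hypothetical.

```python
def address_count(bits):
    """Number of distinct addresses available in a word of the given width."""
    return 2 ** bits


def max_volume_bytes(bits, cluster_bytes):
    """Upper bound on volume size: one cluster per available address."""
    return address_count(bits) * cluster_bytes


print(address_count(16))  # 65536
print(address_count(32))  # 4294967296

CLUSTER_BYTES = 32 * 1024  # assumed cluster size of 32 KB
print(max_volume_bytes(16, CLUSTER_BYTES))  # 2147483648
```

With 16-bit addresses and 32 KB clusters the ceiling is about 2 GB, which shows why a wider address word was needed as drives grew.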
NTFS is a very complicated system that removes most of the restrictions
of FAT systems. As a Novice topic, we will try to cover just the
principal advantages of NTFS over FAT. If you crave more details
try www.digit-life.com/articles/ntfs
for an article by Dmitry Mikhailov. The English is a little
fractured but the facts and sense are all there.
FAT is available for MS-DOS and all versions of Windows; FAT32 is available
for OSs starting with Windows 95 OSR2. NTFS is only available for
Win XP and Win 2000. For Win NT SP4 and later, it is a hit or miss
proposition. The only reason not to use NTFS is for backward compatibility
with older files. The earlier file systems never anticipated today's
HD capacities nor the uses to which we would put our PCs' storage.
FAT32's limit of a volume size of 2 TB (Terabytes, 1 TB = 1,000 GB)
was deemed almost limitless. Digital audio, imaging, photography,
and video can consume Gigabytes of storage like army ants at a picnic.
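To put a number on that appetite, here is a rough sketch for uncompressed CD-quality audio (44,100 samples per second, 2 bytes per sample, 2 channels). The function names are invented for illustration, and the 2 TB figure uses the decimal definition given above.

```python
SAMPLE_RATE_HZ = 44_100  # CD audio samples per second, per channel
BYTES_PER_SAMPLE = 2     # 16-bit samples
CHANNELS = 2             # stereo


def audio_bytes(seconds):
    """Storage needed for uncompressed CD-quality audio of the given length."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * seconds


def hours_that_fit(volume_bytes):
    """Whole hours of such audio that fit in a volume of volume_bytes."""
    return volume_bytes // audio_bytes(3600)


print(audio_bytes(3600))  # 635040000 bytes for one hour

TWO_TB = 2 * 1000 * 10**9  # 2 TB, using 1 TB = 1,000 GB
print(hours_that_fit(TWO_TB))
```

One hour of audio already takes more than 600 MB, and video uses storage far faster.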
Clusters
A cluster is the smallest
unit area that can serve as a storage pigeon hole. It is composed
of a minimum number of sectors. The file system and the
HD capacity determine that number in terms of Kilobytes. Sector size
is 512 bytes. DOS and Windows versions prior to NT/2000/XP had a maximum
|