An index
lets you impose order on a file without rearranging the file
Data file:
It is not a requirement for the record to be in orderIndex file:
The record must be in orderA simple index
is simply an array of (key, reference) pairs which is ordered by the field of the file.Reference:
Stores the address of the recordIndexing
gives us keyed access to variable-length record files
Index
is called an access path
on the fieldThe index
file occupies less disk blocks than
the data file because its entries are much smaller
dense
or sparse
A dense index
: has an index entry for every search key value
(every record) in the data file.A Sparse index
: has an index entry for only some search
key value (some records) in the data file.Primary index
, Clustering index
, Secondary index
The fields within each record
are also of variable-length
but are separated by delimiters
primary key
by concatenating the record company label code and the record’s ID number. This should form a unique identifier
rapid keyed access
, we build a simple index
with a key field
associated with a reference field
Reference field
provides the address of the first byte
of the corresponding data record.entry sequenced
: the record occurs in the order they are entered in the file.The index
is easier to use than the data file
becausefixed-length records
smaller than the data file
carry more information than
the key and reference fields
(e.g., we could keep the length of each data file record in the index as well)index is small enough
to be held in memory.Create
the original empty index and data file.Load
the index into memory before using it
Rewrite
the index file from memory after using it
Add
records to the data file and indexDelete
records from the data fileUpdate
records in the data fileas an array of records
The loading
into memory can be done sequentially
, reading a large number of index records
(which are short) at once
if the index changes
, but its rewriting does not take place or takes place
incompletely?Use a mechanism
for indicating whether or not the index is out of date.reconstructs the index from the data file
in case it is out of date.data file and the index
should be updated
In the data file
, the record can be added anywhere
However, the byte-offset
of the new record should be saved
shift all the records
that belong after the one we are inserting to open up space for the new record. However, this operation is not too costly
as it is performed in memoryThe index record
corresponding to the data record
being deleted must also be deleted.If the update changes the value of the key field
, then both the index and data file may need to be reordered.If the update does not affect the key field
, then the index does not need reordering, but the data file may.If Indexes are too large to hold in memory
, thenBinary searching requires several seeks
rather than being performed at memory speed.Index rearrangement requires shifting or sorting records
on secondary storage ==> Extremely time consuming.Solutions:
You should
hashed
organizationtree-structured
index (e.g., a B-Tree)Given the following data file Employee (NAME, SSN, Address, JOB, ………………). Suppose that: record size = 150 bytes
, Block size = 512 bytes
, Number of records in the file = 30000 Find number of file blocks
?
For an index on the SSN field. Assume the SSN field size is 9 bytes
, and the record pointer size is 7 bytes
Primary Index | Clustering Index | Secondary Index |
---|---|---|
Ordered file | Ordered file | Ordered file a secondary means of accessing a file. |
Data file is ordered on a key field (distinct value for each record) | Data file is ordered on a non-key field (no distinct value for each record) | Data file is ordered maybe on (1 - candidate key has a unique value), (2 - a non-key with duplicate values) |
One index entry for each disk block. key field value is the first record in the block, which is called the block anchor. | One index entry for each distinct value of the field. The index entry points to the first data block that contains records with that field value. | The index is an ordered file with two fields: (1 - field value.), (2 - it is either a block pointer or a record pointer.) |
Non dense (sparse) index | Non dense (sparse) index | If key -> dense — If non key -> dense or sparse index |