In this lecture, we continue to focus on file organization, but with a different motivation.
This time we look at ways to organize or re-organize files in order to improve performance.
Data Compression: how to make files smaller
Reclaiming Space: in files that have undergone deletions and updates
Sorting Files: in order to support binary searching ==> Internal Sorting
A Better Sorting Method: Keysorting

Mary Ames
123 Maple
Stillwater, OK 74075

Alan Mason
90 Eastgate
Ada, OK 74820
Consider a file that stores records like these in several fixed-length fields, including LastName, State, and Zipcode.
Fixed-length fields like these are good candidates for compression.
Irreversible Compression
Irreversible compression is based on the assumption that some information can be sacrificed. Irreversible compression is also called entropy reduction.
Example:
Shrinking a raster image from 400-by-400 pixels to 100-by-100 pixels. The new image contains 1 pixel for every 16 pixels in the original image, so the original image cannot be recovered exactly.
Lempel-Ziv (compress and uncompress)
Principle:
Compression of an arbitrary sequence of bits can be achieved by always coding a series of 0s and 1s as some previous such string (the prefix string) plus one new bit. The new string formed by adding the new bit to the prefix string then becomes a potential prefix string for future strings.
Example:
Encode 101011011010101011
Answer:
00010000001000110101011110101101
Step 1:
Parse the input string into comma-separated phrases, each of which can be represented as a previous phrase (its prefix) plus 1 bit.
Step 2:
Assign a code to each distinct phrase (except the last one) using a minimal binary representation, starting with code 0 for the null phrase.
Step 3:
Write the encoded string by listing, for each phrase, the code of its prefix phrase followed by the new bit needed to create the phrase.
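The three steps can be sketched in Python as follows. This is a minimal illustration, not the code behind the compress and uncompress utilities. It assumes the input is a string of '0' and '1' characters and that, as in the example, every code is written with one fixed width chosen from the number of codes assigned. The function names parse_phrases, lz_encode, and lz_decode are hypothetical, and the decoder is assumed to be told that width.

```python
def parse_phrases(bits: str) -> list[str]:
    """Step 1: split the input into phrases, each a previously seen
    phrase (its prefix) plus one new bit."""
    seen = {""}            # the null phrase is always available as a prefix
    phrases = []
    current = ""
    for bit in bits:
        current += bit
        if current not in seen:    # shortest string not seen yet: a new phrase
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:                    # leftover bits that repeat an earlier phrase
        phrases.append(current)
    return phrases


def lz_encode(bits: str) -> tuple[str, int]:
    """Return (encoded bit string, code width in bits)."""
    phrases = parse_phrases(bits)
    # Step 2: code 0 is the null phrase; every phrase except the last
    # gets the next code in order.
    codes = {"": 0}
    for phrase in phrases[:-1]:
        codes[phrase] = len(codes)
    # Minimal fixed width that can hold the largest code (at least 1 bit).
    width = max(1, (len(codes) - 1).bit_length())
    # Step 3: for each phrase, emit the prefix's code followed by the new bit.
    out = []
    for phrase in phrases:
        prefix, new_bit = phrase[:-1], phrase[-1]
        out.append(format(codes[prefix], f"0{width}b") + new_bit)
    return "".join(out), width


def lz_decode(encoded: str, width: int) -> str:
    """Rebuild the phrases in order; assumes the code width is known."""
    phrases = [""]                 # index 0 is the null phrase
    out = []
    step = width + 1               # each unit is one code plus one new bit
    for i in range(0, len(encoded), step):
        unit = encoded[i:i + step]
        code, new_bit = int(unit[:width], 2), unit[width]
        phrase = phrases[code] + new_bit
        phrases.append(phrase)     # phrase k gets code k, matching the encoder
        out.append(phrase)
    return "".join(out)


bits = "101011011010101011"
print(parse_phrases(bits))         # ['1', '0', '10', '11', '01', '101', '010', '1011']
encoded, width = lz_encode(bits)
print(encoded, width)              # 00010000001000110101011110101101 3
print(lz_decode(encoded, width) == bits)  # True
```

For the lecture's input, Step 1 yields the phrases 1, 0, 10, 11, 01, 101, 010, 1011. The null phrase and the first seven phrases receive codes 0 through 7, so 3 bits suffice, and each phrase becomes a 3-bit prefix code plus 1 new bit, reproducing the answer above. On an input this short the output (32 bits) is longer than the input (18 bits); the savings appear only on longer inputs where phrases grow long enough that one code stands in for many bits.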