What is a file?
A file is a sequence of binary data, i.e. a sequence of 0s and 1s. It can be stored in order to keep a record of that data. A text file is a file made up of characters stored as bytes.
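To make the idea of characters stored as bytes concrete, here is a minimal Python sketch. The helper name to_bits is invented for this illustration, and it assumes plain ASCII text; other encodings may use more than one byte per character.

```python
def to_bits(text: str, encoding: str = "ascii") -> str:
    """Return the bytes of `text` as groups of 8 bits separated by spaces."""
    data = text.encode(encoding)  # characters -> bytes
    return " ".join(f"{byte:08b}" for byte in data)  # bytes -> 0s and 1s


print(to_bits("Hi"))  # 01001000 01101001
```

Each group of eight binary digits is one byte, and in ASCII each byte represents exactly one character.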
A file is saved on the hard drive under a name of the form "name_of_file.ext".
".ext" is the extension; it is a way of recognizing the type of program needed to open the file (note: the extension does not guarantee the file type, and changing the extension does not change the type of the file's contents!).
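Because the extension is only a hint, a program can look at the first bytes of the file instead. The following Python sketch checks a handful of well-known signatures; the function names guess_type and guess_file_type are hypothetical, and the table is deliberately tiny, so treat it as an illustration rather than a complete detector.

```python
# A few well-known signatures ("magic numbers") found at the start of files.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
}


def guess_type(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring the file name."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


def guess_file_type(path: str) -> str:
    """Read the first bytes of the file at `path` and guess its type."""
    with open(path, "rb") as f:
        return guess_type(f.read(16))
```

A PNG image renamed to "picture.txt" would still be reported as a PNG image, since only the contents are examined.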
The maximum length of the name and the extension varies depending on the operating system:
- 8 characters for the name and 3 for the extension in DOS and Windows 3.1
- 255 characters for the name and the extension together in Windows 95, 98 and NT
- 255 characters in most Unix systems
As a result, in DOS or Windows 3.1, a file with a long name from Windows 9x appears under a truncated name made up of the first 6 characters of the name followed by ~x, where x is a digit that is incremented each time another file would otherwise receive the same name. So if a file named "file~1" already exists, the next one will be named "file~2".
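The truncation rule described above can be sketched in Python as follows. This is a simplified illustration, not the exact algorithm used by Windows: the function name short_name is hypothetical, only spaces and extra dots are stripped, and the suffix is limited to a single digit.

```python
def short_name(long_name: str, existing: set[str]) -> str:
    """Return a DOS-style 8.3 alias for `long_name` that is not in `existing`."""
    base, dot, ext = long_name.rpartition(".")
    if not dot:  # no dot: the whole name is the base, there is no extension
        base, ext = long_name, ""
    stem = base.replace(" ", "").replace(".", "").upper()[:6]  # first 6 characters
    ext = ext.replace(" ", "").upper()[:3]                     # first 3 characters
    for x in range(1, 10):  # ~1, ~2, ... until a free name is found
        candidate = f"{stem}~{x}" + (f".{ext}" if ext else "")
        if candidate not in existing:
            return candidate
    raise ValueError("no free short name with a single-digit suffix")


taken = {"MYLONG~1.HTM"}
print(short_name("My long document.html", set()))  # MYLONG~1.HTM
print(short_name("My long document.html", taken))  # MYLONG~2.HTM
```

The set passed as existing stands in for the short names already present in the directory.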
Furthermore, a file often begins with a header, which stores additional information such as the file type and, in particular, the size. On some older systems (CP/M, and DOS for text files), it also ends with a special character indicating that whatever follows this character no longer belongs to the file.
You may be wondering what the point is of storing the file size in the file header when the size is already known.
Here are two examples that show why:
You have no doubt downloaded a file from the Internet at some point, only to see the browser freeze or the server hosting the file cut off the connection.
If the file is a text file, you will simply be missing the end of the text, but if it is a binary file (an executable program, for example), running it could be dangerous because part of its content is missing. Software can therefore compare the file's actual size with the size indicated in the header to check that the file is valid. This property is generally referred to as integrity. In practice, the check is usually performed with a more robust method known as a CRC (cyclic redundancy check).
When a file is infected with a virus, the virus adds code to it. The size recorded in the header will therefore no longer match the actual size (unless the virus is programmed to change the header as well), and the infection can be detected.
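Both situations, a truncated download and a file with extra code added, can be illustrated with a short Python sketch that records a file's size and CRC-32 and compares them later. The helper names fingerprint and is_intact and the sample bytes are invented for this example; zlib.crc32 is the standard library's CRC-32 function.

```python
import zlib


def fingerprint(data: bytes) -> tuple[int, int]:
    """Return (size in bytes, CRC-32) for the given file contents."""
    return len(data), zlib.crc32(data)


def is_intact(data: bytes, expected: tuple[int, int]) -> bool:
    """True if `data` still has the recorded size and checksum."""
    return fingerprint(data) == expected


original = b"MZ" + bytes(range(256)) * 4  # stand-in for a program's bytes
expected = fingerprint(original)          # what a header might record

truncated = original[:500]                # download cut off part-way
infected = original + b"\x90" * 32        # extra bytes appended to the file
same_size = b"XX" + original[2:]          # modified, but the size is unchanged

for label, data in [("original", original), ("truncated", truncated),
                    ("infected", infected), ("same size", same_size)]:
    print(label, is_intact(data, expected))
```

The size alone is enough to catch the truncated and the enlarged file, while the checksum also catches the modification that leaves the size unchanged. A CRC guards against accidental damage; it is not designed to resist deliberate tampering.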