How are EOF markers handled in files

Binary files



Hex editor

Earlier we wrote text, i.e. strings, to files. Strings usually contain only printable ASCII characters, that is, characters whose codes lie between 32 and 127. What is actually written to the file are these numerical values. You can easily check this with a so-called hex editor. We will need one several times in what follows to check our results.

A hex editor is an editor that displays the contents of a file not as characters (which only makes sense for text files) but as numerical values. And since it usually does this in the hexadecimal system, it is called a hex editor. You will often find such an editor (or lister) bundled with comfortable editors or file managers. I recommend downloading Total Commander, a not entirely free, but very cheap and powerful file manager. You don't have to pay for it right away; it only reminds you to do so each time the program is started. When you open it, you have two directories in front of you. If you go to a file and then press "F3", you get a view of the file in a lister. And if you choose "options" and then "hexadecimal" in the menu, you will see something that looks like this:

The address (byte position) in the file is given on the far left, in front of the colon. Each line of the display shows 16 bytes. The numerical values of these 16 bytes are given in the middle of each line. On the far right you can see a character representation of the bytes.

In this way, we can then "look into" a file and check what we have done if the file cannot be read with a text editor. By the end of this chapter we will be able to write our own hex editor!
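To make the layout described above concrete, here is a minimal sketch of a hex lister in QBasic. It already uses the BINARY mode and the GET command that are only explained later in this chapter, so treat it as a preview. The file name "a.dat" and the limit of 256 displayed bytes are assumptions made for this example.

```basic
' Minimal hex lister sketch. "a.dat" is an assumed file name.
CONST BYTESPERLINE = 16
DIM addr AS LONG, remaining AS LONG
DIM chunk AS INTEGER, k AS INTEGER, code AS INTEGER

OPEN "a.dat" FOR BINARY AS #1
addr = 0
DO WHILE addr < LOF(1) AND addr < 256      ' show only the first 256 bytes
    remaining = LOF(1) - addr
    IF remaining > BYTESPERLINE THEN chunk = BYTESPERLINE ELSE chunk = remaining
    buffer$ = SPACE$(chunk)                ' GET fills exactly LEN(buffer$) bytes
    GET #1, addr + 1, buffer$              ' byte positions in the file start at 1

    ' address column, padded to 6 hex digits, followed by a colon
    PRINT RIGHT$("00000" + HEX$(addr), 6); ": ";
    text$ = ""
    FOR k = 1 TO chunk
        code = ASC(MID$(buffer$, k, 1))
        PRINT RIGHT$("0" + HEX$(code), 2); " ";
        ' only printable ASCII goes into the character column
        IF code >= 32 AND code <= 126 THEN text$ = text$ + CHR$(code) ELSE text$ = text$ + "."
    NEXT k
    PRINT SPACE$(3 * (BYTESPERLINE - chunk)); text$
    addr = addr + BYTESPERLINE
LOOP
CLOSE #1
```

Non-printable bytes are shown as a dot in the right-hand column, which is a common convention but a choice made for this sketch.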

First binary attempts

Real binary files contain not only bytes between 32 and 127, but all values between 0 and 255. Even with the means you have learned so far, writing arbitrary bytes of this kind should not be a problem. With the CHR$ function we can convert any number between 0 and 255 into a char, a logical (even if perhaps illegible) "character". Accordingly, the program

OPEN "a.dat" FOR OUTPUT AS #1
FOR i = 0 TO 255
    FOR j = 0 TO 255
        PRINT #1, CHR$(j);
    NEXT j
NEXT i
CLOSE #1

holds no big surprises either. We simply write all bytes from 0 to 255 one after the other to a file, 256 times in a row, which gives 256 x 256 = 65536 bytes (64K). The semicolon at the end of the PRINT statement prevents a carriage return/line feed sequence (hex 0D 0A) from slipping in between.

Did it work? Check it out with the hex editor!

Looks pretty good.

We want to read it in again to check it. But how? Sure, we open a file for reading ("FOR INPUT AS ..."). But with which instruction do we read the bytes? INPUT? Then we'd need something like lines. But there is no such thing here.

Here we get to know a new function: INPUT$. Syntax:

INPUT$(n[, [#]filenumber%])

n             The number of characters (bytes) to read.
filenumber%   The number of an open file. If filenumber% is omitted, INPUT$ reads from the keyboard.

With the INPUT$ function we can simply read in a fixed number of bytes. INPUT$(200, 1) means: "Give me the next 200 characters from file 1." Ideal for our purpose.
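As a small illustration of INPUT$, the following sketch writes five bytes to a file and reads them back in one go. The file name "demo.txt" is hypothetical and only used for this example.

```basic
' Sketch: read a fixed number of bytes with INPUT$.
' "demo.txt" is a hypothetical file name used only for this example.
OPEN "demo.txt" FOR OUTPUT AS #1
PRINT #1, "HELLO";              ' semicolon: no CR/LF is appended
CLOSE #1

OPEN "demo.txt" FOR INPUT AS #1
a$ = INPUT$(5, #1)              ' the next 5 bytes from file 1
CLOSE #1

FOR k = 1 TO LEN(a$)
    PRINT ASC(MID$(a$, k, 1));  ' numeric value of each byte
NEXT k
PRINT
```

The program prints the codes 72, 69, 76, 76 and 79, i.e. the ASCII values of the letters H, E, L, L and O.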

Now think of a program with which you can read in all 64K that we have just written out!

But don't fret if you stumble upon the message "Input past end of file"! Just read on right here:

Let's look at this variant:

OPEN "a.dat" FOR INPUT AS #1
OPEN "b.dat" FOR OUTPUT AS #2
FOR i = 0 TO 255
    a$ = INPUT$(256, 1)          ' read the next 256 bytes from file 1
    FOR j = 0 TO 255
        k1$ = MID$(a$, j + 1, 1)
        k = ASC(k1$)
        PRINT #2, i, j, k
    NEXT j
NEXT i
CLOSE #1, #2

The runtime error described above is raised by INPUT$. What does that mean? Well, it's very simple: in a file opened FOR INPUT, one particular byte value marks the point at which the file ends. We already got to know the EOF() function. It does nothing other than watch whether this particular byte has just been reached. If so, it returns TRUE. We therefore call this byte the EOF byte. It has the value hex 1A, decimal 26. Our file contains exactly this value as its 27th byte, because we wrote the values 0, 1, 2, ... in order. So when INPUT$ reaches the 27th byte, it recognizes the EOF marker. But INPUT$ has been instructed to read further bytes, i.e. BEHIND the EOF character. And of course that is forbidden. Unless we can say to the program: "Listen, dear program, there is NO EOF character! There are no special control characters at all. Forget it!" And we do that by opening the file in BINARY mode:

OPEN "a.dat" FOR BINARY AS #1

And then it works and we can get a flawless list in the b.dat file!
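The difference between the two modes can be tried out on a tiny file. The sketch below writes five bytes with a hex 1A in the middle and then reads them once in INPUT mode and once in BINARY mode. The file name "eof.dat" and the label names are assumptions of this example; going by the description above, the first read should end in the error "Input past end of file" (error number 62), while the second one should return all five bytes.

```basic
' Sketch: the same file read in INPUT mode and in BINARY mode.
' "eof.dat" is a hypothetical file name.
OPEN "eof.dat" FOR OUTPUT AS #1
PRINT #1, "AB"; CHR$(26); "CD";   ' 5 bytes, the third one is hex 1A
CLOSE #1

ON ERROR GOTO ReadFailed
OPEN "eof.dat" FOR INPUT AS #1
a$ = INPUT$(5, #1)                ' byte 26 counts as end of file here
PRINT "INPUT mode read"; LEN(a$); "bytes"
AfterInput:
CLOSE #1
ON ERROR GOTO 0                   ' switch the error handler off again

OPEN "eof.dat" FOR BINARY AS #1
b$ = INPUT$(5, #1)                ' no byte has a special meaning here
CLOSE #1
PRINT "BINARY mode read"; LEN(b$); "bytes"
END

ReadFailed:
PRINT "INPUT mode failed with error"; ERR
RESUME AfterInput
```

The error handler is only there so that the program can continue after the failed read and show the second case.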

Binary files - what for?

One example is graphic files. We can not only create graphics on the screen, but also write them to a file. The advantage of this is that we are not limited in the size and color depth of the file. Do you want to create a 3000x3000 dot fractal? No problem! Write the corresponding binary data in a graphic file and then view this file with an image viewer.

Of course, you have to know the special format of the graphic file. There are different formats, depending on the type of compression. The simplest format is of course the one without compression, the bmp format. I wanted to deal with this in this chapter for demo purposes as well, but haven't gotten around to it yet. Never mind: We'll deal with it in detail in Part III.

Random access to binary files

"Random access" does not mean access at random, i.e. by chance, but access at will; one could also say "targeted access". It is the opposite of sequential access, in which we can only ever read what comes next in the file. With sequential access you have to think of the file as a tape. There is a file pointer, which plays the role of the tape head. When reading or writing, a small piece is processed, then the "tape head", i.e. the file pointer, moves forward. The only influence we have on the position of the file pointer is to close and reopen the file, which moves the file pointer back to the beginning of the file.

With random access things look different: here we can move the file pointer to any byte in the file. So we can basically access the file in the same way as we access the main memory. All we have to do is say at which address we want to read a byte from the file or to which address we want to write a byte in the file. And we have to be extremely careful that the address is not outside the file, i.e. before the beginning or after the end of the file!
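Moving the file pointer and checking the address can be sketched with the SEEK statement, the SEEK function and the LOF function (length of file). The address 300 is an arbitrary value chosen for this example, and "a.dat" is assumed to be the 64K file written earlier.

```basic
' Sketch: move the file pointer and guard against addresses outside the file.
' "a.dat" is assumed to exist already; 300 is an arbitrary example address.
DIM address AS LONG
OPEN "a.dat" FOR BINARY AS #1
address = 300                       ' byte positions count from 1

IF address >= 1 AND address <= LOF(1) THEN
    SEEK #1, address                ' place the file pointer on that byte
    a$ = INPUT$(1, #1)              ' read one byte; the pointer moves on by one
    PRINT "Byte at"; address; "is"; ASC(a$)
    PRINT "File pointer now at"; SEEK(1)
ELSE
    PRINT "Address"; address; "is outside the file, length is"; LOF(1)
END IF
CLOSE #1
```

With the a.dat from the first example, position 300 holds the value 43, because the byte sequence 0 to 255 starts over after 256 bytes.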

Create the swap file

If we want to access a file byte by byte, we have to think of the whole thing as accessing a working memory. And in order to be able to access such a memory, it must first physically exist: we have to create the file first. The best way to do this is with the well-known OPEN FOR OUTPUT command. Then we write a number of bytes and close the file again. This file now represents our "working memory". Such an outsourced memory is called "swap" memory or simply "swap".
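Creating such a file can be sketched as follows. The file name "swap.dat", the size of 64K and the block size of 1024 bytes are assumptions of this example; writing in blocks instead of single bytes is only done to save time.

```basic
' Sketch: create a swap file of a given size, filled with zero bytes.
' "swap.dat" and the size of 64K are assumptions for this example.
DIM total AS LONG, written AS LONG
total = 64 * 1024&                  ' the & keeps the arithmetic in LONG range
block$ = STRING$(1024, 0)           ' 1024 bytes with the value 0

OPEN "swap.dat" FOR OUTPUT AS #1
written = 0
DO WHILE written < total
    PRINT #1, block$;               ' semicolon: no CR/LF between the blocks
    written = written + LEN(block$)
LOOP
CLOSE #1

' Check the result: LOF returns the length of the file in bytes.
OPEN "swap.dat" FOR BINARY AS #1
PRINT "Swap file size:"; LOF(1); "bytes"
CLOSE #1
```

The final check should report 65536 bytes.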

Access to the swap file

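To access the swap file byte by byte, we open it with OPEN in the BINARY access mode and use the PUT and GET commands. PUT #filenumber, position, variable writes the contents of the variable to the file, starting at the given byte position; the first byte of the file has position 1.

With GET it is the same, only that the bytes are read from the file into the variable. With a string variable, GET reads as many bytes as the string currently contains characters.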

So, now a small example program:

OPEN "R:\a.txt" FOR OUTPUT AS #1
FOR i = 0 TO 255
    a$ = CHR$(i)
    PRINT #1, a$;
NEXT i
CLOSE #1

OPEN "R:\a.txt" FOR BINARY AS #1
FOR i = 1 TO 25
    j = i * 10
    GET #1, j + 1, a$            ' a$ is one character long, so one byte is read
    PRINT j, ASC(a$)
NEXT i
CLOSE #1

As before, this little program writes 256 bytes to the swap file. The file is then opened in binary mode, and we read one byte at every 10th address. Since byte positions in the file count from 1, position j + 1 holds the value j, so both printed columns should agree.
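The example above only reads. Writing to a chosen address works the same way with PUT, as the following sketch shows. The file name "swap.dat", the position 5 and the value 65 are assumptions made for this example.

```basic
' Sketch: overwrite a single byte with PUT and read it back with GET.
' "swap.dat" is a hypothetical file name; it is created here with 16 zero bytes.
OPEN "swap.dat" FOR OUTPUT AS #1
PRINT #1, STRING$(16, 0);
CLOSE #1

OPEN "swap.dat" FOR BINARY AS #1
a$ = CHR$(65)                  ' one byte to write (the letter A)
PUT #1, 5, a$                  ' write it to byte position 5 (positions start at 1)

b$ = " "                       ' one character long, so GET reads exactly one byte
GET #1, 5, b$
PRINT "Position 5 now holds"; ASC(b$)
PRINT "File length is still"; LOF(1)
CLOSE #1
```

The program reports the value 65 at position 5 and an unchanged file length of 16: PUT overwrites bytes in place and does not insert anything.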

What can you use swap files for?

In modern software development, swap files are no longer as important as they used to be, as it is easily possible to keep many hundreds of megabytes in main memory. But they still exist: in measurement data processing and statistics or in video and audio processing.

Application example 1: Maps

QBasic itself only allows the use of up to 160K main memory and each variable, i.e. each array or each string can only be a maximum of 64K in size. In games, however, a lot of graphics are often needed, especially so-called "Maps", maps on which the player can move. The player only ever sees a section, which he then moves over the actual map. The map itself can contain many thousands by many thousands of pixels. It is now possible without any problems to save this map in a swap file and use GET and PUT to always load the exact image section into main memory that the player is currently seeing. We don't want to pursue this any further here, but the following variant of the above sample program shows how you can create and use a 4 MB swap file with QBASIC:

DIM i AS LONG
OPEN "R:\a.txt" FOR OUTPUT AS #1
FOR i = 0 TO 4 * 1024& * 1024 - 1
    a$ = CHR$(INT(RND * 256))    ' random byte value between 0 and 255
    PRINT #1, a$;
NEXT i
CLOSE #1

OPEN "R:\a.txt" FOR BINARY AS #1
FOR i = 1 TO 25
    j = i * 10
    GET #1, j + 1, a$            ' a$ is one character long, so one byte is read
    PRINT j, ASC(a$)
NEXT i
CLOSE #1

The special thing about this little program is the first line with the "DIM" and the "&" sign behind a number in the third line. Both ensure that the loop counter i is a 4-byte LONG integer and that the upper limit is calculated in LONG arithmetic. A LONG covers a range of roughly -2 billion to +2 billion, whereas an INTEGER, which takes 2 bytes, only reaches from -32768 to 32767. That is far too little to count 4 million bytes, and 4 * 1024 * 1024 calculated with plain INTEGER constants would overflow. We will come back to these "types" and "declarations" in more detail later.
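The map idea itself can be sketched on a small scale. The following example assumes a map that is stored row by row with one byte per cell; the file name "map.dat", the map size of 200 x 100 cells, the window size and the window position are all made up for this illustration. It first builds a test map and then loads only the visible window with GET.

```basic
' Sketch: load only the visible window of a map stored one byte per cell.
' File name, map size and window position are assumptions for this example.
CONST MAPW = 200, MAPH = 100         ' map size in cells
CONST WINW = 20, WINH = 10           ' size of the visible window
DIM addr AS LONG, x AS INTEGER, y AS INTEGER
DIM win$(1 TO WINH)

' Build a test map: the cell at column x, row y holds (x + y) MOD 256.
OPEN "map.dat" FOR OUTPUT AS #1
FOR y = 0 TO MAPH - 1
    cells$ = ""
    FOR x = 0 TO MAPW - 1
        cells$ = cells$ + CHR$((x + y) MOD 256)
    NEXT x
    PRINT #1, cells$;
NEXT y
CLOSE #1

' Load the window whose upper left corner is at column 100, row 50.
winX% = 100: winY% = 50
OPEN "map.dat" FOR BINARY AS #1
FOR y = 1 TO WINH
    ' byte position of the first visible cell in this map row (positions start at 1)
    addr = CLNG(winY% + y - 1) * MAPW + winX% + 1
    cells$ = SPACE$(WINW)            ' GET reads as many bytes as the string is long
    GET #1, addr, cells$
    win$(y) = cells$
NEXT y
CLOSE #1

PRINT "Upper left cell of the window:"; ASC(win$(1))
```

The printed value is 150, i.e. 100 + 50, which confirms that the window starts at the intended cell. Only 10 x 20 bytes are held in memory, no matter how large the map file is.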

Application example 2 / exercise: Text files with random access

We already tackled this problem when we encountered "fast line management" in the previous chapter: How do I write an editor that can quickly insert another line between the 2000th and 2001st line, without having to move megabytes of data in the process? The answer was: we need an index that contains a pointer to each line. Back then, the pointers were the indexes of an array. Now we can use the byte position in a file as a pointer. We collect the pointers in a separate array, the index. There we can look up at which byte position line 0, 1, 7 or 134 begins. The following graphic shows the scheme.

In this way we can easily write an editor that manages texts as large as the main memory! Get to work!
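As a starting point for the exercise, here is one possible sketch of the index idea; it is not a complete editor. It assumes a file name "text.txt", at most 1000 lines, and lines that end with a CR/LF pair (which is what PRINT # writes). Unlike the text above, the lines are counted from 1 here.

```basic
' Sketch: build an index of line start positions and fetch one line directly.
' "text.txt", the limit of 1000 lines and the CR/LF line ends are assumptions.
CONST MAXLINES = 1000
DIM idx(1 TO MAXLINES + 1) AS LONG
DIM count AS INTEGER, nextPos AS LONG

' A small test file with three lines.
OPEN "text.txt" FOR OUTPUT AS #1
PRINT #1, "first line"
PRINT #1, "second line"
PRINT #1, "third line"
CLOSE #1

' Pass 1: read the file once and note where every line starts.
OPEN "text.txt" FOR INPUT AS #1
count = 0
nextPos = 1                           ' byte positions start at 1
DO WHILE NOT EOF(1) AND count < MAXLINES
    LINE INPUT #1, a$
    count = count + 1
    idx(count) = nextPos
    nextPos = nextPos + LEN(a$) + 2   ' +2 for the CR/LF pair
LOOP
CLOSE #1
idx(count + 1) = nextPos              ' one entry past the end marks the last line's end

' Pass 2: fetch line 3 directly, without reading lines 1 and 2 again.
n% = 3
OPEN "text.txt" FOR BINARY AS #1
a$ = SPACE$(idx(n% + 1) - idx(n%) - 2)   ' line length without CR/LF
GET #1, idx(n%), a$
CLOSE #1
PRINT "Line"; n%; "starts at byte"; idx(n%); "and reads: "; a$
```

For the test file, line 3 starts at byte 26 and reads "third line". Inserting a line then only means appending its text to the file and inserting one number into the index array.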