File Handling: Sequential Files


SEQUENTIAL FILE HANDLING
File Handling:
- COBOL is generally used to process record-based files.
- In record-based files, the terms used have the following meanings:
  - File: a collection of one or more occurrences (instances) of a record type (template). A file usually holds many records.
  - Record: a collection of logically related fields that hold information about a particular object. A record has several fields, all describing the same object.
  - Field: the unit of information about a particular object.
- Understand the difference between a record occurrence (the values of a record) and the record template (the structure of the record).
  Each record occurrence in a file holds its own values, but all occurrences of a record type share the same structure.
- There are two record-based file organizations:
  1) Sequential (also called serial) files.
  2) Direct access files (relative and indexed files).

How is a file processed by a program?

A record-based file may contain millions of records. To be processed, data must be loaded into RAM, and it is usually not possible to load the entire file in one go. Hence the file is read, and processed, one record at a time.
However, the program does not know how big a record is or which fields it contains, so it is the programmer's responsibility to declare the record structure to the program. The computer uses the programmer's description of the record to set aside enough memory to store one instance of the record. This memory area is called the "record buffer", and every file processed by the program needs its own record buffer.

A record buffer can store only one record instance at a time. To process a file, a program must read the records one at a time into the record buffer and process each one there.

The moral: the record buffer acts as the connection between the program and the file, and it is the programmer's responsibility to declare it in the program.

- When reading a file, each record instance is copied (read) from the file into the record buffer and then processed.
- When writing a file, each record is placed into the record buffer and then written from there to the file.
- To transfer a record from an input file to an output file, we read the record into the input record buffer, move it to the output record buffer, and then write it to the output file from the output record buffer.

How does a programmer define the file structure to the program?

The record buffer for every file used in the program is described in the FILE SECTION by an FD (File Description) entry. The FD entry consists of the letters FD followed by the internal file name that the programmer chooses. This internal name is declared in the FILE-CONTROL paragraph of the ENVIRONMENT DIVISION.

The internal file name is declared with the SELECT…ASSIGN clause. It is not the actual name of the file; it is used only inside the COBOL program. The SELECT and ASSIGN clause connects the internal file name used in the FD entry to an external file (the actual physical file on disk or tape). The SELECT and ASSIGN clause is an entry in the FILE-CONTROL paragraph of the INPUT-OUTPUT SECTION in the ENVIRONMENT DIVISION.

Suppose our input file contains data like the following:
---------------------------------------
0000001mahadik Su03031988AX2BM
0000002kalamkaDa03041876AS2B
.
.
.
---------------------------------------
The following table shows how the data is arranged in each record:

Field            Length   Example
Student id       7        0000001
Name             10       mahadik Su
Date of birth    8        03031988 (DDMMYYYY)
Course code      4        AX2B
Gender           1        M

For this data we can create the following record structure:

01 StudentRec.
   02 StudentId         PIC 9(7).
   02 StudentName       PIC X(10).
   02 DateOfBirth       PIC 9(8).
   02 CourseCode        PIC X(4).
   02 Gender            PIC X.

The record description above is correct as far as it goes: it reserves the correct amount of storage (30 characters) for the record buffer. However, it is limited, because we cannot access the day, month or year of birth (or the surname and initials) individually.

To allow us to access these fields individually we need to declare the record as follows:

01 StudentRec.
   02 StudentId         PIC 9(7).
   02 StudentName.
      03 Surname        PIC X(8).
      03 Initials       PIC XX.
   02 DateOfBirth.
      03 DOBirth        PIC 99.
      03 MOBirth        PIC 99.
      03 YOBirth        PIC 9(4).
   02 CourseCode        PIC X(4).
   02 Gender            PIC X.

Now that we have decided on the structure of the record buffer, this is how we code the SELECT…ASSIGN clause and the FD entry:

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
   SELECT StudentdataFile ASSIGN TO "STUDENTS.TXT".
DATA DIVISION.
FILE SECTION.
FD StudentdataFile.
01 StudentRec.
   02 StudentId         PIC 9(7).
   02 StudentName.
      03 Surname        PIC X(8).
      03 Initials       PIC XX.
   02 DateOfBirth.
      03 DOBirth        PIC 99.
      03 MOBirth        PIC 99.
      03 YOBirth        PIC 9(4).
   02 CourseCode        PIC X(4).
   02 Gender            PIC X.

In the above, STUDENTS.TXT is the actual physical file name and StudentdataFile is the internal file name, which the COBOL program uses to refer to the file throughout.
Also note that the FD entry uses the internal file name we declared.

The moral: the FD entry describes the structure of the record buffer to the program. The program uses this structure to reserve space in memory and to know which fields the file contains. The SELECT…ASSIGN clause connects the internal file, and so its record buffer, with the actual physical file.
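To see what the detailed layout buys us, the sketch below moves the first sample line straight into StudentRec and displays its parts. It assumes GnuCOBOL with free-format source; the program name SubfieldDemo is hypothetical, and the date sub-fields are declared in day-month-year order so that they match the sample value 03031988. A MOVE is used instead of a READ so that only the layout is on show.

```cobol
       >>SOURCE FORMAT IS FREE
*> Sketch: fill the detailed StudentRec layout from one 30-character
*> sample line (date assumed DDMMYYYY) and use its parts separately.
IDENTIFICATION DIVISION.
PROGRAM-ID. SubfieldDemo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 StudentRec.
   02 StudentId         PIC 9(7).
   02 StudentName.
      03 Surname        PIC X(8).
      03 Initials       PIC XX.
   02 DateOfBirth.
      03 DOBirth        PIC 99.
      03 MOBirth        PIC 99.
      03 YOBirth        PIC 9(4).
   02 CourseCode        PIC X(4).
   02 Gender            PIC X.
PROCEDURE DIVISION.
    *> StudentRec is a group item, so this MOVE copies the 30 characters
    *> byte for byte; each elementary item then sees its own slice.
    MOVE "0000001mahadik Su03031988AX2BM" TO StudentRec
    DISPLAY "Surname  : " Surname
    DISPLAY "Initials : " Initials
    DISPLAY "Born     : " DOBirth "/" MOBirth "/" YOBirth
    DISPLAY "Course   : " CourseCode
    DISPLAY "Gender   : " Gender
    STOP RUN.
```

A READ fills the record buffer in the same way, so after a successful READ the program can use Surname, YOBirth and the other items directly. This sketch should print the surname and initials, then 03/03/1988, AX2B and M.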


How to code the SELECT…ASSIGN clause?

- The SELECT…ASSIGN clause is coded in the FILE-CONTROL paragraph of the INPUT-OUTPUT SECTION of the ENVIRONMENT DIVISION.

- Syntax:
  SELECT Filename ASSIGN TO ExternalFile
      ORGANIZATION IS RECORD/LINE SEQUENTIAL

- This assigns the internal name Filename to the actual file ExternalFile.
- On the mainframe, ExternalFile is the DDNAME specified in the JCL. Note that it is the DDNAME in the JCL, not the actual dataset name.

- On Windows and Unix, if ExternalFile is a plain file name, the file is looked for in the directory from which the program is run. Alternatively, we can give a fully qualified path in the SELECT…ASSIGN clause.

- Note the ORGANIZATION clause. It indicates the organization of the file being accessed.
  If the file is serial (sequential), we use ORGANIZATION IS SEQUENTIAL, which is also the default when the clause is omitted.
  If the file is indexed we use INDEXED, and if it is relative we use RELATIVE.

- The advantage of an internal file name is that programs become more readable and easier to maintain. If the location of the file, or the medium on which it is held, changes, the only change needed in the program is to the SELECT and ASSIGN clause.

- On Windows and Unix there are two types of sequential organization:
1. Line sequential: used for ordinary text files, where each record is followed by a line terminator (carriage return and line feed on Windows, a line feed on Unix).
Example:
---------------------------------------
0000001mahadik Su03031988AX2BM
0000002kalamkaDa03041876AS2BF
.
.
.
---------------------------------------

2. Record sequential: used when the file is a continuous stream of bytes. The records follow one another with no carriage return or line feed between them; only the known record length tells us where one record ends and the next begins.
Example:
---------------------------------------
0000001mahadik Su03031988AX2BM0000002kalamkaDa03041876AS2B...
---------------------------------------

- Another clause commonly used with SELECT…ASSIGN is ACCESS MODE IS SEQUENTIAL/RANDOM/DYNAMIC. However, with sequential files only ACCESS MODE IS SEQUENTIAL can be used; it is also the default, so we can leave it out.
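A minimal sketch of both kinds of sequential SELECT, assuming GnuCOBOL: there, ORGANIZATION IS LINE SEQUENTIAL is a vendor extension, and plain ORGANIZATION IS SEQUENTIAL with fixed-length records plays the role of record sequential. The program name OrgDemo and the file names STUDENTS.TXT and STUDENTS.DAT are hypothetical. The same two sample lines are written to both files so that their layouts can be compared in a file viewer.

```cobol
       >>SOURCE FORMAT IS FREE
*> Sketch (GnuCOBOL): the same two records written to a LINE SEQUENTIAL
*> file and to a fixed-length record SEQUENTIAL file.
IDENTIFICATION DIVISION.
PROGRAM-ID. OrgDemo.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    *> Hypothetical file names; a name without a path is normally
    *> looked up in the current directory.
    SELECT LineFile ASSIGN TO "STUDENTS.TXT"
        ORGANIZATION IS LINE SEQUENTIAL
        ACCESS MODE IS SEQUENTIAL.
    SELECT RecFile  ASSIGN TO "STUDENTS.DAT"
        ORGANIZATION IS SEQUENTIAL
        ACCESS MODE IS SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD LineFile.
01 LineRec              PIC X(30).
FD RecFile.
01 RecRec               PIC X(30).
PROCEDURE DIVISION.
    OPEN OUTPUT LineFile RecFile
    MOVE "0000001mahadik Su03031988AX2BM" TO LineRec RecRec
    WRITE LineRec
    WRITE RecRec
    MOVE "0000002kalamkaDa03041876AS2BF" TO LineRec RecRec
    WRITE LineRec
    WRITE RecRec
    CLOSE LineFile RecFile
    *> STUDENTS.TXT: two lines, each ended by a line terminator.
    *> STUDENTS.DAT: 60 bytes, no separator between the two records.
    STOP RUN.
```

With GnuCOBOL this can be built with `cobc -x OrgDemo.cob` and run from the directory where the files should appear.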

What comes after coding the FD entry and the SELECT…ASSIGN clause? How do I actually work with the records in a file?

To access files and the data inside them, we use the verbs provided by the COBOL language.
Programs that process sequential files use the following verbs: OPEN, CLOSE, READ, WRITE and REWRITE.


OPEN verb:
Before we start working with a file, we need to open it.
A file can be opened in any of the following modes: INPUT, OUTPUT, EXTEND or I-O.
Opening a file does not transfer any data to the record buffer; it simply provides access to the file.

Syntax:
OPEN INPUT/OUTPUT/EXTEND/I-O Filename.

INPUT: To read from the file we need to use INPUT mode. When a file is opened for INPUT, the Next Record Pointer is positioned at the beginning of the file.

OUTPUT: To write to a file we use OUTPUT mode. If the file already contains records, they are lost: the file is overwritten (a file that does not exist yet is created).

EXTEND: To append data to an existing file we use EXTEND mode.
When a file is opened for EXTEND, the Next Record Pointer is positioned after the last record in the file, so new records are added at the end.

I-O: To update a file (read a record and then write it back), we use I-O mode.
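The effect of the modes can be seen in a short sketch, assuming GnuCOBOL and a hypothetical line sequential file MODES.TXT: OUTPUT writes two records, EXTEND adds a third after them, and INPUT reads all three back from the beginning. Running it a second time still shows three records, because OPEN OUTPUT discards the earlier contents.

```cobol
       >>SOURCE FORMAT IS FREE
*> Sketch (GnuCOBOL): OPEN OUTPUT starts the file from scratch, OPEN EXTEND
*> appends after the last record, OPEN INPUT reads from the beginning.
IDENTIFICATION DIVISION.
PROGRAM-ID. OpenModes.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT DemoFile ASSIGN TO "MODES.TXT"
        ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD DemoFile.
01 DemoRec              PIC X(10).
WORKING-STORAGE SECTION.
01 WS-Count             PIC 9(3) VALUE ZERO.
01 WS-EOF               PIC X    VALUE "N".
   88 EndOfFile                  VALUE "Y".
PROCEDURE DIVISION.
    *> OUTPUT: any previous contents of MODES.TXT are discarded.
    OPEN OUTPUT DemoFile
    MOVE "first" TO DemoRec
    WRITE DemoRec
    MOVE "second" TO DemoRec
    WRITE DemoRec
    CLOSE DemoFile

    *> EXTEND: the new record goes after "second".
    OPEN EXTEND DemoFile
    MOVE "third" TO DemoRec
    WRITE DemoRec
    CLOSE DemoFile

    *> INPUT: read everything back, starting at the first record.
    OPEN INPUT DemoFile
    PERFORM UNTIL EndOfFile
        READ DemoFile
            AT END SET EndOfFile TO TRUE
            NOT AT END
                ADD 1 TO WS-Count
                DISPLAY WS-Count ": " DemoRec
        END-READ
    END-PERFORM
    CLOSE DemoFile
    STOP RUN.
```

The three lines displayed are numbered 001 to 003 and show first, second and third in that order.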


CLOSE verb:
Every file that we open, in any mode, must be closed.
The syntax for closing a file is simple:
CLOSE followed by the file name.

Example: CLOSE Filename.

READ verb:
Once the file is opened in INPUT (or I-O) mode, we can read its records with the READ verb, one record at a time.
READ copies a record occurrence (instance) from the file into the record buffer (defined in the FD), where we can then access it.

Syntax for READ for sequential files:
READ filename [INTO Identifier]
AT END StatementBlock
END-READ.

- When READ tries to read past the last record, the end of the file has been reached: the AT END condition is triggered and the StatementBlock following AT END is executed.

- The INTO Identifier clause causes the record to be read into the record buffer and then copied from there into Identifier, in one operation. Afterwards there are two copies of the data: one in the record buffer (FD) and one in the identifier. It is equivalent to a READ followed by a MOVE of the record to the identifier.

- When reading sequential files we should specify the AT END clause to say what has to be done once the whole file has been read.

- Also note the END-READ delimiter, which marks where the READ statement ends.
  Do not put a period after any of the statements in the AT END clause. If we put a period there, it will be taken as the end of the READ statement.

- Also note that the READ verb uses the file name, not the record name.
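A sketch of the usual read loop, assuming GnuCOBOL; the file READDEMO.TXT, the program name and the WS- names are hypothetical, and the program first writes one sample record so it can run on its own. It uses READ…INTO, keeps the AT END block free of periods, and closes the statement with END-READ. NOT AT END, which runs only when a record was actually read, is an optional phrase of the same statement.

```cobol
       >>SOURCE FORMAT IS FREE
*> Sketch (GnuCOBOL): READ ... INTO with AT END / NOT AT END and END-READ.
IDENTIFICATION DIVISION.
PROGRAM-ID. ReadDemo.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentFile ASSIGN TO "READDEMO.TXT"
        ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec           PIC X(30).
WORKING-STORAGE SECTION.
*> Second copy of the record, filled by READ ... INTO.
01 WS-Student.
   02 WS-Id             PIC 9(7).
   02 WS-Name           PIC X(10).
   02 FILLER            PIC X(13).
01 WS-EOF               PIC X VALUE "N".
   88 EndOfStudents           VALUE "Y".
PROCEDURE DIVISION.
    *> Create a small input file so the sketch runs on its own.
    OPEN OUTPUT StudentFile
    MOVE "0000001mahadik Su03031988AX2BM" TO StudentRec
    WRITE StudentRec
    CLOSE StudentFile

    OPEN INPUT StudentFile
    PERFORM UNTIL EndOfStudents
        READ StudentFile INTO WS-Student
            AT END
                SET EndOfStudents TO TRUE
            NOT AT END
                *> No periods inside the READ: END-READ closes it.
                DISPLAY "Id " WS-Id " name " WS-Name
        END-READ
    END-PERFORM
    CLOSE StudentFile
    STOP RUN.
```

It should display `Id 0000001 name mahadik Su`; the same data is also still available in StudentRec, the record buffer.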

WRITE verb (writing to a sequential file):

- The WRITE verb copies data from the record buffer (in RAM) to the file.
- Syntax:
  WRITE RecordName [FROM Identifier]

- To write data to a file, we move the data into the record buffer (declared in the FD entry) and then WRITE the contents of the record buffer to the file.

- When FROM is used, the data in Identifier is copied into the record buffer and then written to the file. WRITE…FROM is equivalent to a MOVE Identifier TO RecordBuffer statement followed by a WRITE RecordBuffer statement.

- Note that we WRITE a record but READ a file. The reason is that a file can contain several different types of record.
  When we read a record from, say, a transaction file, we do not know which type will be supplied, so we must READ Filename. It is then the programmer's responsibility to find out which type of record was supplied.
  When we write a record to a file, we have to specify which record type we want to write, so we must WRITE RecordName.

- If the file is opened in OUTPUT mode, the writes replace the previous contents of the file.
  If the file is opened in EXTEND mode, the writes add records to the end of the existing file.
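A sketch combining WRITE…FROM with the point about record types, assuming GnuCOBOL. The file TRANS.TXT, its two layouts and the one-character type code in column 1 are hypothetical. Both 01 levels under the FD share the same record buffer: the program chooses which record name to WRITE, and when reading it uses the type code to decide how to interpret what READ returned.

```cobol
       >>SOURCE FORMAT IS FREE
*> Sketch (GnuCOBOL): one FD with two record layouts sharing one buffer.
*> We WRITE a chosen record name (here via FROM) but READ the file.
IDENTIFICATION DIVISION.
PROGRAM-ID. WriteDemo.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT TransFile ASSIGN TO "TRANS.TXT"
        ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD TransFile.
*> Hypothetical layouts: column 1 holds a record-type code.
01 HeaderRec.
   02 HdrType           PIC X.
   02 HdrDate           PIC X(8).
   02 FILLER            PIC X(11).
01 DetailRec.
   02 DtlType           PIC X.
   02 DtlStudentId      PIC 9(7).
   02 DtlCourse         PIC X(4).
   02 FILLER            PIC X(8).
WORKING-STORAGE SECTION.
01 WS-Header.
   02 FILLER            PIC X    VALUE "H".
   02 FILLER            PIC X(8) VALUE "20240101".
01 WS-Detail.
   02 FILLER            PIC X    VALUE "D".
   02 WS-DtlId          PIC 9(7) VALUE 1.
   02 WS-DtlCourse      PIC X(4) VALUE "AX2B".
01 WS-EOF               PIC X    VALUE "N".
   88 EndOfTrans                 VALUE "Y".
PROCEDURE DIVISION.
    OPEN OUTPUT TransFile
    *> WRITE ... FROM = MOVE identifier TO record, then WRITE record.
    WRITE HeaderRec FROM WS-Header
    WRITE DetailRec FROM WS-Detail
    CLOSE TransFile

    OPEN INPUT TransFile
    PERFORM UNTIL EndOfTrans
        READ TransFile
            AT END SET EndOfTrans TO TRUE
            NOT AT END
                *> HdrType and DtlType are the same byte of the buffer;
                *> its value tells us which layout applies.
                EVALUATE HdrType
                    WHEN "H" DISPLAY "Header dated " HdrDate
                    WHEN "D" DISPLAY "Student " DtlStudentId " course " DtlCourse
                END-EVALUATE
        END-READ
    END-PERFORM
    CLOSE TransFile
    STOP RUN.
```

Because WS-Header and WS-Detail are shorter than the 20-character records, the MOVE implied by FROM pads the rest of the buffer with spaces.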

REWRITE verb (rewriting a record in a sequential file):
REWRITE is used to update a record.
We first read the record using the READ verb, then change the contents of the record buffer, and then perform the REWRITE to replace the record in the file.
For this, the file must be opened in I-O mode.

Syntax: REWRITE RecordName [FROM Identifier]

Note that with sequential files, a record must be read before it can be rewritten: REWRITE replaces the record returned by the most recent READ, and the new record must be the same length as the one it replaces.
The process is slightly different for direct access files.
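A sketch of an in-place update, assuming GnuCOBOL. It uses ORGANIZATION IS SEQUENTIAL (fixed-length records) rather than LINE SEQUENTIAL, because support for OPEN I-O and REWRITE on line sequential files differs between compilers and versions. The file COURSES.DAT, its layout and the rule of renaming course AS2B to AS3B are hypothetical.

```cobol
       >>SOURCE FORMAT IS FREE
*> Sketch (GnuCOBOL): update records in place with OPEN I-O, READ, REWRITE.
IDENTIFICATION DIVISION.
PROGRAM-ID. RewriteDemo.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT CourseFile ASSIGN TO "COURSES.DAT"
        ORGANIZATION IS SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD CourseFile.
01 CourseRec.
   02 StudentId         PIC 9(7).
   02 CourseCode        PIC X(4).
WORKING-STORAGE SECTION.
01 WS-EOF               PIC X VALUE "N".
   88 EndOfCourses            VALUE "Y".
PROCEDURE DIVISION.
    *> Create two sample records so the sketch runs on its own.
    OPEN OUTPUT CourseFile
    MOVE 1 TO StudentId
    MOVE "AX2B" TO CourseCode
    WRITE CourseRec
    MOVE 2 TO StudentId
    MOVE "AS2B" TO CourseCode
    WRITE CourseRec
    CLOSE CourseFile

    *> Hypothetical rule: rename course AS2B to AS3B.
    OPEN I-O CourseFile
    PERFORM UNTIL EndOfCourses
        READ CourseFile
            AT END SET EndOfCourses TO TRUE
            NOT AT END
                IF CourseCode = "AS2B"
                    MOVE "AS3B" TO CourseCode
                    *> Replaces the record returned by the READ above.
                    REWRITE CourseRec
                END-IF
        END-READ
    END-PERFORM
    CLOSE CourseFile

    *> Read the file again to show the change.
    MOVE "N" TO WS-EOF
    OPEN INPUT CourseFile
    PERFORM UNTIL EndOfCourses
        READ CourseFile
            AT END SET EndOfCourses TO TRUE
            NOT AT END DISPLAY StudentId " " CourseCode
        END-READ
    END-PERFORM
    CLOSE CourseFile
    STOP RUN.
```

The final loop should display `0000001 AX2B` and `0000002 AS3B`.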


Example Program:
Write a program that reads file A sequentially and writes sequentially to file B, opened in OUTPUT mode.
It should also write sequentially to file C, which already contains records, using EXTEND mode.
Finally, it should rewrite the records of file A, changing every "A" in the file to "B".
[PENDING]
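One possible sketch of this example, assuming GnuCOBOL and fixed-length record sequential files (so that REWRITE is available on file A). The file names A.DAT, B.DAT and C.DAT, the 10-character record and the sample data are hypothetical, and since the task does not say what is written to C, this sketch assumes C receives the same records as B. The CreateSampleFiles paragraph builds A and a non-empty C so that the program can run on its own.

```cobol
       >>SOURCE FORMAT IS FREE
*> Sketch (GnuCOBOL): copy A to B (OUTPUT), append A to C (EXTEND),
*> then REWRITE A replacing every "A" with "B".
*> File names, record size and sample data are hypothetical.
IDENTIFICATION DIVISION.
PROGRAM-ID. SeqExample.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT FileA ASSIGN TO "A.DAT" ORGANIZATION IS SEQUENTIAL.
    SELECT FileB ASSIGN TO "B.DAT" ORGANIZATION IS SEQUENTIAL.
    SELECT FileC ASSIGN TO "C.DAT" ORGANIZATION IS SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD FileA.
01 RecA                 PIC X(10).
FD FileB.
01 RecB                 PIC X(10).
FD FileC.
01 RecC                 PIC X(10).
WORKING-STORAGE SECTION.
01 WS-EOF               PIC X VALUE "N".
   88 EndOfA                  VALUE "Y".
PROCEDURE DIVISION.
MainLogic.
    PERFORM CreateSampleFiles

    *> Steps 1 and 2: read A once; copy each record to B and append it to C.
    OPEN INPUT FileA  OUTPUT FileB  EXTEND FileC
    PERFORM UNTIL EndOfA
        READ FileA
            AT END SET EndOfA TO TRUE
            NOT AT END
                MOVE RecA TO RecB
                WRITE RecB
                WRITE RecC FROM RecA
        END-READ
    END-PERFORM
    CLOSE FileA FileB FileC

    *> Step 3: update A in place (READ, change buffer, REWRITE).
    MOVE "N" TO WS-EOF
    OPEN I-O FileA
    PERFORM UNTIL EndOfA
        READ FileA
            AT END SET EndOfA TO TRUE
            NOT AT END
                INSPECT RecA REPLACING ALL "A" BY "B"
                REWRITE RecA
        END-READ
    END-PERFORM
    CLOSE FileA
    STOP RUN.

CreateSampleFiles.
    *> A gets two records; C already holds one record before the EXTEND.
    OPEN OUTPUT FileA
    MOVE "ALPHA" TO RecA
    WRITE RecA
    MOVE "BANANA" TO RecA
    WRITE RecA
    CLOSE FileA
    OPEN OUTPUT FileC
    MOVE "OLD-C" TO RecC
    WRITE RecC
    CLOSE FileC.
```

After a run, B.DAT holds ALPHA and BANANA, C.DAT holds OLD-C followed by the same two records, and A.DAT holds BLPHB and BBNBNB. INSPECT…REPLACING is case-sensitive, so only uppercase A is changed.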


###############################################################



More on Sequential file processing:

The records in a sequential file are organized serially, but they may be ordered or unordered.
- In an ordered sequential file, the records are stored in the sequence of some key value.
- In an unordered sequential file, the records are not arranged in any particular sequence.
Note that in an ordered sequential file, the system does nothing to maintain the sequence or the integrity of the file. Maintaining the sequence is the programmer's responsibility.

Whether the records are ordered or unordered has a big effect on how they are processed.

Processing Unordered Sequential files:

- In an unordered sequential file, data can be added to the end of the file using EXTEND mode.
  Inserting records into an unordered sequential file is easy, because they can be added without maintaining any sequence.
  Example: add the records of file A to the unordered file B using EXTEND mode.
  [Pending]
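A minimal sketch of this example, assuming GnuCOBOL; A.TXT, B.TXT, the 10-character record and the sample keys are hypothetical. The program first creates both files, then opens B in EXTEND mode and copies every record of A after B's existing records, without any check on order.

```cobol
       >>SOURCE FORMAT IS FREE
*> Sketch (GnuCOBOL): append every record of A to the unordered file B.
*> File names, record size and sample data are hypothetical.
IDENTIFICATION DIVISION.
PROGRAM-ID. AppendDemo.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT FileA ASSIGN TO "A.TXT" ORGANIZATION IS LINE SEQUENTIAL.
    SELECT FileB ASSIGN TO "B.TXT" ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD FileA.
01 RecA                 PIC X(10).
FD FileB.
01 RecB                 PIC X(10).
WORKING-STORAGE SECTION.
01 WS-EOF               PIC X VALUE "N".
   88 EndOfA                  VALUE "Y".
PROCEDURE DIVISION.
    *> Sample data: B already holds 300; A holds 100 and 200.
    OPEN OUTPUT FileB
    MOVE "300" TO RecB
    WRITE RecB
    CLOSE FileB
    OPEN OUTPUT FileA
    MOVE "100" TO RecA
    WRITE RecA
    MOVE "200" TO RecA
    WRITE RecA
    CLOSE FileA

    *> EXTEND positions B after its last record; order is not checked.
    OPEN INPUT FileA  EXTEND FileB
    PERFORM UNTIL EndOfA
        READ FileA
            AT END SET EndOfA TO TRUE
            NOT AT END WRITE RecB FROM RecA
        END-READ
    END-PERFORM
    CLOSE FileA FileB
    *> B.TXT now holds 300, 100, 200 (in that order).
    STOP RUN.
```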

- Deleting and updating records in a sequential file, however, is harder. Updates can still be done with the REWRITE verb, but a record cannot be deleted from a sequential file directly.
  The only way to delete records from a sequential file is to create a new file that leaves out the records that are not needed.

  Example: delete from file B the records that are also present in file A.
  Both files A and B are unordered.
  [Pending]
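A sketch of the "new file" approach, assuming GnuCOBOL; the file names, the 10-character record and the sample data are hypothetical. Because neither file is in order, A is reopened and scanned from the start for every record of B, which is simple but reads A once per record of B. Records of B that are not found in A are written to NEWB.TXT; replacing B with NEWB.TXT afterwards (for example by renaming it) is left outside the program.

```cobol
       >>SOURCE FORMAT IS FREE
*> Sketch (GnuCOBOL): build NEWB.TXT = records of B that do not occur in A.
*> Both files are unordered, so A is re-scanned for every record of B.
*> File names, record size and sample data are hypothetical.
IDENTIFICATION DIVISION.
PROGRAM-ID. DeleteDemo.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT FileA ASSIGN TO "A.TXT"    ORGANIZATION IS LINE SEQUENTIAL.
    SELECT FileB ASSIGN TO "B.TXT"    ORGANIZATION IS LINE SEQUENTIAL.
    SELECT NewB  ASSIGN TO "NEWB.TXT" ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD FileA.
01 RecA                 PIC X(10).
FD FileB.
01 RecB                 PIC X(10).
FD NewB.
01 RecNewB              PIC X(10).
WORKING-STORAGE SECTION.
01 WS-EOF-A             PIC X.
   88 EndOfA                  VALUE "Y".
01 WS-EOF-B             PIC X VALUE "N".
   88 EndOfB                  VALUE "Y".
01 WS-Found             PIC X.
   88 FoundInA                VALUE "Y".
PROCEDURE DIVISION.
MainLogic.
    *> Sample data: B = 300, 100, 200; A = 100.
    OPEN OUTPUT FileB
    MOVE "300" TO RecB
    WRITE RecB
    MOVE "100" TO RecB
    WRITE RecB
    MOVE "200" TO RecB
    WRITE RecB
    CLOSE FileB
    OPEN OUTPUT FileA
    MOVE "100" TO RecA
    WRITE RecA
    CLOSE FileA

    OPEN INPUT FileB  OUTPUT NewB
    PERFORM UNTIL EndOfB
        READ FileB
            AT END SET EndOfB TO TRUE
            NOT AT END
                PERFORM SearchA
                *> Keep only the records that are not in A.
                IF NOT FoundInA
                    WRITE RecNewB FROM RecB
                END-IF
        END-READ
    END-PERFORM
    CLOSE FileB NewB
    *> NEWB.TXT now holds 300 and 200.
    STOP RUN.

SearchA.
    *> Reopen A so that each search starts at its first record.
    MOVE "N" TO WS-EOF-A WS-Found
    OPEN INPUT FileA
    PERFORM UNTIL EndOfA OR FoundInA
        READ FileA
            AT END SET EndOfA TO TRUE
            NOT AT END
                IF RecA = RecB
                    SET FoundInA TO TRUE
                END-IF
        END-READ
    END-PERFORM
    CLOSE FileA.
```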

Processing Ordered Files:

- Ordered files are files whose records are stored in the order of some field (the key). Note that for ordered sequential files this order is not maintained by the system; the programmer maintains it through the program.

- As with unordered files, deleting records from an ordered sequential file is difficult: we need to create a new file without the records to be deleted.

- Inserting records is also difficult with ordered sequential files, because the order must be preserved and each record inserted at its correct position. Here too we create a new output file containing the old and the new records in the correct order.

The following pseudo logic maintains the sequence while merging two files ordered on the same key, FILEA and FILEB, into a new file FILEC:

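*> DATA-A, DATA-B and DATA-C are the records of FILEA, FILEB and FILEC.
*> At end of file the key is set to HIGH-VALUES, so an exhausted file
*> always compares high and the other file's records are copied.
READ FILEA AT END MOVE HIGH-VALUES TO KEYA END-READ
READ FILEB AT END MOVE HIGH-VALUES TO KEYB END-READ
PERFORM UNTIL KEYA = HIGH-VALUES AND KEYB = HIGH-VALUES
   *> Equal keys are taken from FILEA first, so the loop always advances.
   IF KEYA <= KEYB
      MOVE DATA-A TO DATA-C
      WRITE DATA-C
      READ FILEA AT END MOVE HIGH-VALUES TO KEYA END-READ
   ELSE
      MOVE DATA-B TO DATA-C
      WRITE DATA-C
      READ FILEB AT END MOVE HIGH-VALUES TO KEYB END-READ
   END-IF
END-PERFORM.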

Example: create a program that reads file A and file B and creates an output file in the correct sequence.
[PENDING]
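A runnable sketch of the merge logic above, assuming GnuCOBOL; the file names, the 3-character key followed by 7 characters of data, and the sample records are hypothetical. Both input files are already in key order. At end of file the key is set to HIGH-VALUES, so the exhausted file always compares high and the loop stops only when both keys hold HIGH-VALUES. Records with equal keys are all kept, the one from FILEA first.

```cobol
       >>SOURCE FORMAT IS FREE
*> Sketch (GnuCOBOL): merge two files ordered on a 3-character key into
*> FILEC.TXT, keeping key order. Names and data are hypothetical.
IDENTIFICATION DIVISION.
PROGRAM-ID. MergeDemo.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT FileA ASSIGN TO "FILEA.TXT" ORGANIZATION IS LINE SEQUENTIAL.
    SELECT FileB ASSIGN TO "FILEB.TXT" ORGANIZATION IS LINE SEQUENTIAL.
    SELECT FileC ASSIGN TO "FILEC.TXT" ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD FileA.
01 RecA.
   02 KeyA              PIC X(3).
   02 DataA             PIC X(7).
FD FileB.
01 RecB.
   02 KeyB              PIC X(3).
   02 DataB             PIC X(7).
FD FileC.
01 RecC                 PIC X(10).
PROCEDURE DIVISION.
MainLogic.
    *> Sample ordered inputs: A = 100, 300; B = 200, 400.
    OPEN OUTPUT FileA FileB
    MOVE "100old" TO RecA
    WRITE RecA
    MOVE "300old" TO RecA
    WRITE RecA
    MOVE "200new" TO RecB
    WRITE RecB
    MOVE "400new" TO RecB
    WRITE RecB
    CLOSE FileA FileB

    OPEN INPUT FileA FileB  OUTPUT FileC
    PERFORM ReadA
    PERFORM ReadB
    PERFORM UNTIL KeyA = HIGH-VALUES AND KeyB = HIGH-VALUES
        *> Equal keys are taken from A first, so the loop always advances.
        IF KeyA <= KeyB
            WRITE RecC FROM RecA
            PERFORM ReadA
        ELSE
            WRITE RecC FROM RecB
            PERFORM ReadB
        END-IF
    END-PERFORM
    CLOSE FileA FileB FileC
    *> FILEC.TXT: 100old, 200new, 300old, 400new.
    STOP RUN.

ReadA.
    READ FileA AT END MOVE HIGH-VALUES TO KeyA END-READ.

ReadB.
    READ FileB AT END MOVE HIGH-VALUES TO KeyB END-READ.
```

HIGH-VALUES works as an end marker here because the keys are alphanumeric (PIC X), so no real key can compare higher.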
