SEQUENTIAL FILE HANDLING
File Handling:
- COBOL is generally used to process record-based files.
- In record-based files the terms have the following meanings:
  - File: A collection of one or more occurrences (instances) of a record type (template). A file will usually have many records.
  - Record: A collection of logically related fields that record information about a particular object. A record will have several fields describing the same object.
  - Field: A single unit of information about a particular object.
- Understand that there is a difference between a record occurrence (i.e. the values of a record) and the record template (i.e. the structure of the record). Each record occurrence in a file will have different values, but every record in the file will have the same structure.
- There are two record-based file organizations:
  1) Sequential files (also called serial files).
  2) Direct access files.
How is a file processed by a program?
A record-based file may contain millions of records. To be processed, the data must be loaded into RAM. It is usually not practical to load the entire file in one go, so the file is read and processed one record at a time.
However, the program does not know how big the record is or what fields it contains. It is therefore the programmer's responsibility to declare the record structure to the program. The computer uses the programmer's description of the record to set aside enough memory to store one instance of the record. This memory area is called the "record buffer". Every file processed by the program needs its own record buffer.
A record buffer can hold the data for only one instance of the record. To process a file, a program must read the records one at a time into the record buffer and process each one there.
The moral: the "record buffer" acts as the connection between the program and the file, and it is the programmer's responsibility to declare it in the program.
- While reading a file, each record instance is copied (read) from the file into the record buffer and then processed.
- While writing a file, each record is placed into the record buffer and then written to the file from there.
- While transferring a record from an input file to an output file, we must read the record into the input record buffer, transfer it to the output record buffer and then write the data to the output file from the output record buffer.
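The three steps of a file-to-file transfer can be sketched as a small program. This is only an illustration: the file names, the 30-character record length (taken from the student layout used in this article) and the use of ORGANIZATION IS LINE SEQUENTIAL (as offered by GnuCOBOL or Micro Focus style compilers) are assumptions, and the input file must already exist.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CopyStudents.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *> Both file names are placeholders for this sketch.
           SELECT InFile ASSIGN TO "STUDENTS.TXT"
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT OutFile ASSIGN TO "COPY.TXT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
      *> One record buffer per file.
       FD  InFile.
       01  InRec               PIC X(30).
       FD  OutFile.
       01  OutRec              PIC X(30).
       WORKING-STORAGE SECTION.
       01  EndOfFileFlag       PIC X VALUE "N".
           88  EndOfFile       VALUE "Y".
       PROCEDURE DIVISION.
       Main.
           OPEN INPUT InFile
           OPEN OUTPUT OutFile
      *> Priming read: fills the input record buffer once.
           READ InFile
               AT END SET EndOfFile TO TRUE
           END-READ
           PERFORM UNTIL EndOfFile
      *> Input buffer -> output buffer -> output file.
               MOVE InRec TO OutRec
               WRITE OutRec
               READ InFile
                   AT END SET EndOfFile TO TRUE
               END-READ
           END-PERFORM
           CLOSE InFile OutFile
           STOP RUN.
```

Only one record is ever held in each buffer, so the same loop works whether the file has ten records or ten million.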
How does a programmer define the file structure to the program?
The record buffer for every file used in a program is described in the FILE SECTION of the DATA DIVISION by means of an FD (File Description) entry. The FD entry consists of the letters FD followed by the internal file name that the programmer assigns. The internal file name is not the actual name of the file; it is used only inside the COBOL program. It is connected to an external file (the actual physical file on disk or tape) by the SELECT and ASSIGN clause, which is an entry in the FILE-CONTROL paragraph of the INPUT-OUTPUT SECTION in the ENVIRONMENT DIVISION.
Suppose our input file contains data like this:
---------------------------------------
0000001mahadik Su03031988AX2BM
0000002kalamkaDa03041876AS2B
.
.
.
---------------------------------------
The following table shows how the data is arranged in the file.
Field         | Digit/character count | Example
Student id    | 7                     | 0000001
Name          | 10                    | mahadik Su
Date of birth | 8                     | 03031988
Course code   | 4                     | AX2B
Gender        | 1                     | M
For the data shown we can create a record structure as below:
01 StudentRec.
   02 StudentId   PIC 9(7).
   02 StudentName PIC X(10).
   02 DateOfBirth PIC 9(8).
   02 CourseCode  PIC X(4).
   02 Gender      PIC X.
The record description above is correct as far as it goes. It reserves the correct amount of storage for the record buffer. However, it is restrictive: we cannot access the surname and initials, or the year, month and day of birth, individually.
To allow us to access these fields individually we need to declare the record as follows:
01 StudentRec.
   02 StudentId   PIC 9(7).
   02 StudentName.
      03 Surname  PIC X(8).
      03 Initials PIC XX.
   02 DateOfBirth.
      03 YOBirth  PIC 9(4).
      03 MOBirth  PIC 99.
      03 DOBirth  PIC 99.
   02 CourseCode  PIC X(4).
   02 Gender      PIC X.
Now that we have decided on the structure of the record buffer, this is how we code the FD entry and the SELECT…ASSIGN clause:
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentdataFile ASSIGN TO "STUDENTS.TXT".
DATA DIVISION.
FILE SECTION.
FD StudentdataFile.
01 StudentRec.
   02 StudentId   PIC 9(7).
   02 StudentName.
      03 Surname  PIC X(8).
      03 Initials PIC XX.
   02 DateOfBirth.
      03 YOBirth  PIC 9(4).
      03 MOBirth  PIC 99.
      03 DOBirth  PIC 99.
   02 CourseCode  PIC X(4).
   02 Gender      PIC X.
In the above, STUDENTS.TXT is the actual physical file name and "StudentdataFile" is the internal file name, which the COBOL program uses whenever it refers to the file.
Also note that the FD entry uses the internal file name that we created.
The moral: the FD entry describes the structure of the record buffer to the program. The program uses this structure to reserve space in memory and to know what fields the record contains. The SELECT…ASSIGN clause connects the internal file name, and therefore the record buffer, to the actual physical file.
How to Code the SELECT… ASSIGN clause?
- This is defined in the FILE-CONTROL paragraph of the INPUT-OUTPUT SECTION of the ENVIRONMENT DIVISION.
- Syntax:
SELECT Filename ASSIGN TO ExternalFile
    ORGANIZATION IS RECORD/LINE SEQUENTIAL
- This assigns the internal "Filename" to the actual file "ExternalFile".
- On the mainframe platform, ExternalFile is the DDNAME that we specify in the JCL. Note that it is the DDNAME from the JCL and not the actual dataset name.
- On Windows and Unix platforms, a plain file name given as "ExternalFile" is looked for in the directory from which the program is run. Alternatively, we can provide a fully qualified path in the SELECT…ASSIGN clause.
- Note the ORGANIZATION clause. It indicates the organization of the file being accessed. If the file is serial (sequential) we use ORGANIZATION IS SEQUENTIAL. If the file is indexed we use INDEXED, and if the file is relative we use RELATIVE.
- The advantage of assigning an internal file name is that programs become more readable and easier to maintain. If the location of the file, or the medium on which the file is held, changes, the only change we need to make to our program is to the entry in the SELECT and ASSIGN clause.
- On Windows and Unix platforms there are two types of sequential organization:
1. Line Sequential: This is used for ordinary text files where each record is followed by a line terminator (carriage return and line feed on Windows, line feed only on Unix).
Example:
---------------------------------------
0000001mahadik Su03031988AX2BM
0000002kalamkaDa03041876AS2BF
.
.
.
---------------------------------------
2. Record Sequential: This is used when the file is a plain stream of bytes. The records follow one another with no line terminator in between. Only the known record length tells us where one record ends and the next begins.
Example:
---------------------------------------
0000001mahadik Su03031988AX2BM0000002kalamkaDa03041876AS2B...
---------------------------------------
- Another clause generally used with the SELECT…ASSIGN clause is "ACCESS MODE IS SEQUENTIAL/RANDOM/DYNAMIC". With sequential files the only access mode allowed is SEQUENTIAL, which is also the default, so we can omit it.
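As an illustration, here is a minimal sketch of a complete SELECT…ASSIGN entry with the ORGANIZATION and ACCESS MODE clauses written out, inside a small program that just counts the records in the file. The file name, the 30-character record and the LINE SEQUENTIAL organization (a Windows/Unix compiler feature, e.g. GnuCOBOL) are assumptions made for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CountStudents.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *> "STUDENTS.TXT" is looked up in the current directory;
      *> a full path could be given instead.
           SELECT StudentFile ASSIGN TO "STUDENTS.TXT"
               ORGANIZATION IS LINE SEQUENTIAL
               ACCESS MODE IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  StudentFile.
       01  StudentRec          PIC X(30).
       WORKING-STORAGE SECTION.
       01  RecordCount         PIC 9(7) VALUE ZERO.
       01  EndOfFileFlag       PIC X VALUE "N".
           88  EndOfFile       VALUE "Y".
       PROCEDURE DIVISION.
       Main.
           OPEN INPUT StudentFile
           PERFORM UNTIL EndOfFile
               READ StudentFile
                   AT END SET EndOfFile TO TRUE
                   NOT AT END ADD 1 TO RecordCount
               END-READ
           END-PERFORM
           CLOSE StudentFile
           DISPLAY "Records read: " RecordCount
           STOP RUN.
```

If the file later moves to another location, only the literal in the SELECT entry has to change; the rest of the program keeps using the internal name StudentFile.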
What comes after coding the FD entry and SELECT…ASSIGN clause? How do I actually work with the records in the files?
To access files and the data inside them we use the verbs provided by the COBOL language.
To write programs that process sequential files we use the following verbs: OPEN, CLOSE, READ, WRITE and REWRITE.
OPEN verb:
Before we start working with a file, we need to open it.
A file can be opened in any of the following modes: INPUT, OUTPUT, EXTEND or I-O.
Opening a file does not transfer any data to the record buffer; it simply provides access.
Syntax:
OPEN INPUT/OUTPUT/EXTEND/I-O Filename.
INPUT: To read from the file we need to use the INPUT mode. When a file is opened for INPUT, the Next Record Pointer is positioned at the beginning of the file.
OUTPUT: To write into the file we need to use the OUTPUT mode. If records already exist in the file, they are overwritten.
EXTEND: To append data to an existing file we need to use the EXTEND mode. When the file is opened for EXTEND, the Next Record Pointer is positioned after the last record in the file.
I-O: To update records (read a record and then write it back) we need to use the I-O mode.
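The difference between the OUTPUT, EXTEND and INPUT modes can be seen in a short sketch like the one below. The file name "NOTES.TXT" and the 20-character record are hypothetical, and LINE SEQUENTIAL organization is assumed to be available in the compiler.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OpenModes.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *> "NOTES.TXT" is a hypothetical file name.
           SELECT NotesFile ASSIGN TO "NOTES.TXT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  NotesFile.
       01  NoteRec             PIC X(20).
       PROCEDURE DIVISION.
       Main.
      *> OUTPUT creates the file, replacing any old contents.
           OPEN OUTPUT NotesFile
           MOVE "first line" TO NoteRec
           WRITE NoteRec
           CLOSE NotesFile
      *> EXTEND positions after the last record, so this
      *> WRITE appends instead of overwriting.
           OPEN EXTEND NotesFile
           MOVE "second line" TO NoteRec
           WRITE NoteRec
           CLOSE NotesFile
      *> INPUT positions at the first record for reading.
           OPEN INPUT NotesFile
           READ NotesFile
               AT END DISPLAY "file is empty"
               NOT AT END DISPLAY "first record: " NoteRec
           END-READ
           CLOSE NotesFile
           STOP RUN.
```

Running the sketch a second time starts again from an empty file, because the first OPEN OUTPUT discards whatever the previous run left behind.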
CLOSE verb:
Every file that we open, in any mode, must be closed.
The syntax for closing a file is simple: CLOSE followed by the file name.
Example: CLOSE Filename.
READ verb:
Once the file is opened in INPUT (or I-O) mode we can read its records with the READ verb, one record at a time.
READ copies one record occurrence (instance) from the file into the record buffer (defined using the FD entry), where we can then access it.
Syntax of READ for sequential files:
READ Filename [INTO Identifier]
    AT END StatementBlock
END-READ.
- When the READ reaches the end of the file, AT END is triggered and the StatementBlock following AT END is executed. This happens when the READ tries to read past the last record.
- The INTO Identifier clause causes the data to be read into the record buffer and then copied from there to the Identifier, in one operation. When it is used there are two copies of the data: one in the record buffer (FD) and another in the Identifier. This is equivalent to reading the record with the READ verb and then moving it to the Identifier.
- When reading sequential files we should specify the AT END clause to indicate what has to be done once the end of the file is reached.
- Also note the END-READ delimiter. It indicates where the READ statement ends. Do not put a period after any of the statements in the AT END clause. If we put a period there, it is taken as the end of the READ statement.
- Also note that the READ verb takes the file name and not the record name.
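A minimal sketch of READ with the INTO clause is shown below. WorkRec and its fields are hypothetical working-storage names introduced for the example; the file name and the LINE SEQUENTIAL organization are likewise assumptions.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ListStudents.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT StudentdataFile ASSIGN TO "STUDENTS.TXT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  StudentdataFile.
       01  StudentRec          PIC X(30).
       WORKING-STORAGE SECTION.
      *> WorkRec is a hypothetical working copy of the record.
       01  WorkRec.
           02  WorkId          PIC 9(7).
           02  WorkName        PIC X(10).
           02  FILLER          PIC X(13).
       01  EndOfFileFlag       PIC X VALUE "N".
           88  EndOfFile       VALUE "Y".
       PROCEDURE DIVISION.
       Main.
           OPEN INPUT StudentdataFile
      *> READ names the file; INTO also copies the buffer
      *> (StudentRec) to WorkRec in the same operation.
           READ StudentdataFile INTO WorkRec
               AT END SET EndOfFile TO TRUE
           END-READ
           PERFORM UNTIL EndOfFile
               DISPLAY WorkId " " WorkName
               READ StudentdataFile INTO WorkRec
                   AT END SET EndOfFile TO TRUE
               END-READ
           END-PERFORM
           CLOSE StudentdataFile
           STOP RUN.
```

Notice that no statement inside the AT END clause ends with a period; the END-READ delimiter closes each READ.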
WRITE verb to write to a sequential file:
- The WRITE verb is used to copy data from the record buffer (RAM) to the file.
- Syntax:
WRITE RecordName [FROM Identifier]
- To WRITE data to a file we must move the data to the record buffer (declared in the FD entry) and then WRITE the contents of the record buffer to the file.
- When FROM is used, the data contained in the Identifier is copied into the record buffer and is then written to the file. WRITE…FROM is the equivalent of a MOVE Identifier TO RecordBuffer statement followed by a WRITE RecordBuffer statement.
- Note that we WRITE a record and READ a file. The reason we read a file but write a record is that a file can contain a number of different types of record.
When we read a record from the file we don't know which of the types will be supplied, so we must READ Filename. It is the programmer's responsibility to discover what type of record has been supplied.
When we write a record to a file we have to specify which of the record types we want to write, so we must WRITE RecordName.
- If the file is opened in OUTPUT mode, the records we write replace the existing contents of the file.
If the file is opened in EXTEND mode, the records we write are added to the end of the existing file.
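The two ways of writing a record, MOVE followed by WRITE and the shorter WRITE…FROM, might look like this in practice. The file name "NEWSTUDENTS.TXT" and the WorkRec fields are made up for the sketch, and the sample values simply reuse the first student record shown earlier.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. WriteFrom.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *> "NEWSTUDENTS.TXT" is a hypothetical output file.
           SELECT NewStudentFile ASSIGN TO "NEWSTUDENTS.TXT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  NewStudentFile.
       01  NewStudentRec       PIC X(30).
       WORKING-STORAGE SECTION.
       01  WorkRec.
           02  WorkId          PIC 9(7)  VALUE 1.
           02  WorkName        PIC X(10) VALUE "mahadik Su".
           02  WorkBirth       PIC 9(8)  VALUE 03031988.
           02  WorkCourse      PIC X(4)  VALUE "AX2B".
           02  WorkGender      PIC X     VALUE "M".
       PROCEDURE DIVISION.
       Main.
           OPEN OUTPUT NewStudentFile
      *> Long form: fill the record buffer, then write it.
           MOVE WorkRec TO NewStudentRec
           WRITE NewStudentRec
      *> Short form: WRITE ... FROM does the MOVE for us.
           MOVE 2 TO WorkId
           WRITE NewStudentRec FROM WorkRec
           CLOSE NewStudentFile
           STOP RUN.
```

Both WRITE statements name the record (NewStudentRec), never the file, in line with the rule that we read a file but write a record.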
REWRITE verb for rewriting a sequential file:
The REWRITE verb is used to update a record in place.
To do this we read the record using the READ verb, change the contents of the record and then perform the REWRITE operation to update it.
The file must be opened in I-O mode.
Syntax: REWRITE RecordName [FROM Identifier]
Note that with sequential files a record must first be read before it can be rewritten: REWRITE replaces the record retrieved by the last successful READ, and the new record must be the same length as the one it replaces.
The process is slightly different when using direct access files.
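A small sketch of the read-then-rewrite cycle follows. To keep it runnable on its own, it first creates a hypothetical fixed-length file "COURSES.DAT" with two made-up records and then reopens it in I-O mode; the record layout and the rule "change grade A to B" are assumptions chosen only for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RewriteDemo.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *> "COURSES.DAT" is a hypothetical fixed-length file.
           SELECT CourseFile ASSIGN TO "COURSES.DAT"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  CourseFile.
       01  CourseRec.
           02  CourseCode      PIC X(4).
           02  CourseGrade     PIC X.
       WORKING-STORAGE SECTION.
       01  EndOfFileFlag       PIC X VALUE "N".
           88  EndOfFile       VALUE "Y".
       PROCEDURE DIVISION.
       Main.
      *> Create two sample records so the sketch can run alone.
           OPEN OUTPUT CourseFile
           MOVE "AX2BA" TO CourseRec
           WRITE CourseRec
           MOVE "AS2BC" TO CourseRec
           WRITE CourseRec
           CLOSE CourseFile
      *> I-O mode allows READ followed by REWRITE.
           OPEN I-O CourseFile
           PERFORM UNTIL EndOfFile
               READ CourseFile
                   AT END SET EndOfFile TO TRUE
                   NOT AT END
                       IF CourseGrade = "A"
                           MOVE "B" TO CourseGrade
      *> Replaces the record that was just read.
                           REWRITE CourseRec
                       END-IF
               END-READ
           END-PERFORM
           CLOSE CourseFile
           STOP RUN.
```

Record sequential organization is used here because the rewritten record has exactly the same length as the original, which is what REWRITE on a sequential file expects.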
Example Program:
Write a program to READ sequentially from a file A and write sequentially to file B using OUTPUT mode.
Write sequentially to file C, which is already populated, using EXTEND mode.
Then rewrite records in file A to change all "A" in the file to "B".
[PENDING]
###############################################################
###############################################################
More on Sequential file processing:
The records in a sequential file are organized in a serial manner; however, the records may be ordered or unordered.
- Ordered sequential file means the records are stored in the sequence of some key value.
- Unordered sequential file means that the records are not arranged in any sequence.
Note that in an ordered sequential file the system does nothing to maintain the sequence and the integrity of the file. It is the programmer's responsibility to maintain the sequence.
The way the records are organized (ordered or unordered) has a big effect on the way the records are processed.
Processing Unordered Sequential files:
- In unordered sequential files data can be added to the end of the file by using the EXTEND mode. Inserting records into unordered sequential files is very easy, as they can be added to the file without having to maintain any sequence.
Example: Add data from file A to unordered file B using EXTEND mode
[Pending]
- But deleting and updating records in sequential files is quite difficult. Updates can still be done by making use of the REWRITE verb; however, it is not possible to directly delete a record from a sequential file. The only way to delete records from a sequential file is to create a new file without the records that are not needed.
Example: Delete records from file B that are present in file A. Both files A and B are unordered.
[Pending]
Processing Ordered File:
- Ordered files are files where the records are stored in a specific order of some field. Note that in ordered sequential files this sequence is not maintained by the system. Hence it is the programmer who does this maintenance through the program.
- As with unordered files, deleting records from ordered sequential files is a difficult task. We need to create a new file without the records that should be deleted.
- Inserting is also a difficult task with ordered sequential files, as we need to preserve the order and insert records at specific positions only. Here too we create a new output file with the old as well as the new records in the correct order.
Following is the pseudo logic that we need to use to maintain the sequence.
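* DATA-C is the record of the output file FILEC.
* A key of HIGH-VALUES marks that a file has reached its end.
READ FILEA AT END MOVE HIGH-VALUES TO KEYA END-READ
READ FILEB AT END MOVE HIGH-VALUES TO KEYB END-READ
PERFORM UNTIL KEYA = HIGH-VALUES AND KEYB = HIGH-VALUES
    IF KEYA <= KEYB
        MOVE DATA-A TO DATA-C
        WRITE DATA-C
        READ FILEA AT END MOVE HIGH-VALUES TO KEYA END-READ
    ELSE
        MOVE DATA-B TO DATA-C
        WRITE DATA-C
        READ FILEB AT END MOVE HIGH-VALUES TO KEYB END-READ
    END-IF
END-PERFORM.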
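A merge of two ordered files into a third could be sketched as the program below. It is one possible approach rather than the only one: it assumes a 7-character alphanumeric key at the start of each 30-character record, uses HIGH-VALUES as an end-of-file marker for each input, and all three file names are placeholders. Both input files must exist and already be in key order.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MergeFiles.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *> All three file names are placeholders.
           SELECT FileA ASSIGN TO "FILEA.TXT"
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT FileB ASSIGN TO "FILEB.TXT"
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT FileC ASSIGN TO "FILEC.TXT"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  FileA.
       01  RecA                PIC X(30).
       FD  FileB.
       01  RecB                PIC X(30).
       FD  FileC.
       01  RecC                PIC X(30).
       WORKING-STORAGE SECTION.
      *> Working copies hold the current record of each input.
       01  WorkA.
           02  KeyA            PIC X(7).
           02  FILLER          PIC X(23).
       01  WorkB.
           02  KeyB            PIC X(7).
           02  FILLER          PIC X(23).
       PROCEDURE DIVISION.
       Main.
           OPEN INPUT FileA FileB
           OPEN OUTPUT FileC
           PERFORM ReadA
           PERFORM ReadB
           PERFORM UNTIL KeyA = HIGH-VALUES
                     AND KeyB = HIGH-VALUES
      *> Write whichever record has the lower key, then
      *> read the next record from that same file.
               IF KeyA <= KeyB
                   WRITE RecC FROM WorkA
                   PERFORM ReadA
               ELSE
                   WRITE RecC FROM WorkB
                   PERFORM ReadB
               END-IF
           END-PERFORM
           CLOSE FileA FileB FileC
           STOP RUN.
      *> At end of file the key becomes HIGH-VALUES, so the
      *> other file's remaining records always compare lower.
       ReadA.
           READ FileA INTO WorkA
               AT END MOVE HIGH-VALUES TO KeyA
           END-READ.
       ReadB.
           READ FileB INTO WorkB
               AT END MOVE HIGH-VALUES TO KeyB
           END-READ.
```

When both files contain the same key, this sketch writes the record from file A first and the one from file B right after it; a real program would have to decide whether such duplicates are allowed.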
Example: Create a program to read from file A and file B and create an output file in the correct sequence.
[PENDING]