Flat Files

Many managers would feel more comfortable if clinical data could be stored and viewed in a format with which they are already familiar, an Excel spreadsheet, for example (see Figure 11.1). At first glance, the spreadsheet format seems ideal: each row constitutes a different patient record, and each column a different field or variable. But as we start to fill in a mockup of our spreadsheet, two difficulties arise: first, as the number of columns exceeds the width of the screen, we may easily forget just where a particular data item is located; second, as the trial continues, we begin to accumulate multiple records for each patient—pretreatment or baseline, one-week follow-up, one-month follow-up, and so forth. Will we run out of space?

Recurrent Ischemia

PAGE Date Adverse Event R2/R3

2PATID EVENT PAGE

3002-1121 1MON 4005-1121 6MON

8/15/98 FATIGUE 8/15/98 FATIGUE

5002-1122 6MON 117

117_HOSP 1/31/99 EPIGASTRIC PAIN

6002-1122 YEAR 7002-1124 2WEK

8/26/99 UTI, HEMATURIA 9/30/98 Allergic Reaction

FIGURE 11.1 Spreadsheet as an Example of a Flat File.

The first of these difficulties is correctable, not by Excel, but by a more-advanced flat-file manager that would allow us to search for columns by name.

The second difficulty presents more of a challenge, particularly when the different follow-ups involve different examinations and, thus, different sets of variables. While each patient's baseline record contains a host of information including demographic variables, baseline data, and laboratory values, the various follow-ups may contain only a few data items. On the other hand, the adverse event record contains many items that are not in the baseline record. When we create a column for each variable that "might" occur, the result is a worksheet made up primarily of space-consuming blank entries.

Obviously we will need several spreadsheets to store our data, perhaps one for each record type or each set of screens. But then how are we to link them in such a way that we can search and retrieve information from several worksheets at a time? Moreover, as the number and size of our worksheets grow, access times increase and corrections become more difficult.

Suppose that a follow-up exam file includes fields for the date, patient name, patient ID, patient address, plus the observations on that patient on that date. Each record must repeat the name, ID, and address of the patient increasing the amount of storage required and perhaps doubling or even tripling the time required for data retrieval.

A 10-column spreadsheet with 2000 entries requires about 200Kbytes of storage and takes only a few seconds to sort. But a typical clinical database requires 200,000Kbytes of storage and 1000 to 10,000 seconds to sort if the sorting methods used by Excel (one of the fastest spreadsheets) are employed.

If the patient's address changes, it will have to be changed in multiple locations or risk irresolvable inconsistencies. If the patient's name is spelled differently in different places (e.g., Phil Good, Phillip Good), then we may fail to retrieve all the necessary records.

In summary, a flat-file database like a spreadsheet contains only one record structure, many of whose fields will be empty. Access to data is done in a sequential manner; access times are slow because the entire file must be scanned to locate the desired data. Complex queries—"how many patients who were heavy smokers suffered non-Q-wave MI's during the first three months after the stent was implanted?"—are virtually impossible as there are no links among separate records.

Other problems with a flat-file database include data redundancy, the difficulty of locating and updating records as the file size increases, and the near impossibility of maintaining data integrity.28 When the regulatory agency makes unexpected requests, will we be able to respond quickly?

Was this article helpful?

0 0

Post a comment