Database Compilation

A database is a collection of facts and information about the recurring occurrence of some event, incident, survey response or other common feature. One may compile a database of product failure reports, of climatological information such as daily high temperatures by date and locality, of consumer attitudes toward a specific product, of voting history by election and precinct, or of any other topic that lends itself to repeated documentation. A database can be thought of as a file of individual records, each made up of the same fields of information.

Among the issues we address in compiling a database are:

  • Relevance
  • Accuracy
  • Completeness
  • Ease of manipulation
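As a rough illustration of the completeness and accuracy checks listed above, the following Python sketch validates records from a hypothetical climatological file in which every record has the same fields. The `DailyHigh` layout, its field names, the temperature bounds and the sample records are all made up for illustration, and a range check is only a crude proxy for accuracy.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional

# Hypothetical record layout: every record carries the same fields.
@dataclass
class DailyHigh:
    locality: Optional[str]
    obs_date: Optional[date]
    high_temp_f: Optional[float]

# Illustrative plausibility bounds (an assumption, not a published standard).
TEMP_RANGE_F = (-80.0, 135.0)

def check_record(rec: DailyHigh) -> List[str]:
    """Return the problems found in one record (an empty list if none)."""
    problems = []
    # Completeness: every field must be populated.
    for f in fields(rec):
        if getattr(rec, f.name) is None:
            problems.append(f"missing {f.name}")
    # Accuracy (crude proxy): the temperature must fall in a plausible range.
    t = rec.high_temp_f
    if t is not None and not (TEMP_RANGE_F[0] <= t <= TEMP_RANGE_F[1]):
        problems.append(f"implausible high_temp_f {t}")
    return problems

# Made-up sample records.
records = [
    DailyHigh("Locality 1", date(2020, 7, 4), 91.0),
    DailyHigh("Locality 1", None, 88.0),
    DailyHigh("Locality 1", date(2020, 7, 6), 910.0),  # likely a keying error
]
for r in records:
    print(r.obs_date, check_record(r))
```

Flagging problems rather than silently dropping records leaves the decision about each questionable record to a reviewer.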

Some databases we have compiled over the years were derived from the responses to a survey designed to obtain information specific to a unique problem or application. In a few cases, a survey was conducted to expand the scope of an existing database. Frequently, a project required that a database be designed to link two or more existing collections of records.

Over the years, we have worked with and for many government statistical agencies, including the Bureau of the Census, the National Center for Health Statistics and the National Oceanic and Atmospheric Administration. We have also established working relationships with state agencies, universities and trade associations that compile statistical data.

We used Vehicle Identification Numbers (VINs) to link records of auto fires drawn from insurance company files, automobile manufacturers' files, government accident reports and other specialized sources. In another project, we analyzed the effects of climate and weather on exterior house siding by linking government meteorological data files, on the basis of ZIP codes, with the insurance claims files of homeowners reporting siding problems.
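A minimal sketch of a many-to-one linkage on ZIP code, in the spirit of the siding study: each claim picks up the weather summary for its ZIP code, and claims with no match are set aside for review. The field names (`freeze_thaw_days`, `annual_precip_in`), the truncation of ZIP+4 values to five digits and the sample values are assumptions, not details of the original study.

```python
# Hypothetical government weather summaries keyed by 5-digit ZIP code
# (made-up values).
weather_by_zip = {
    "30301": {"freeze_thaw_days": 12, "annual_precip_in": 50.2},
    "85001": {"freeze_thaw_days": 0, "annual_precip_in": 8.0},
}
# Hypothetical homeowner siding claims carrying the policyholder's ZIP code.
claims = [
    {"claim_id": "C1", "zip": "30301", "problem": "warping"},
    {"claim_id": "C2", "zip": "85001-1234", "problem": "fading"},
    {"claim_id": "C3", "zip": "99999", "problem": "cracking"},
]

def normalize_zip(z: str) -> str:
    """Reduce a ZIP+4 or padded value to the 5-digit ZIP used as the key."""
    return z.strip()[:5]

def link_claims_to_weather(claims, weather_by_zip):
    """Attach weather fields to each claim; collect IDs of unmatched claims."""
    linked, unmatched = [], []
    for c in claims:
        w = weather_by_zip.get(normalize_zip(c["zip"]))
        if w is None:
            unmatched.append(c["claim_id"])
        else:
            linked.append({**c, **w})
    return linked, unmatched

linked, unmatched = link_claims_to_weather(claims, weather_by_zip)
print(len(linked), "linked;", unmatched, "unmatched")
```

Reporting unmatched claims explicitly, rather than letting them fall out of the join, helps keep the linked file complete.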

In a study of the effects of Agent Orange, we compiled an extensive database from information we abstracted from military service records and from death certificates that we identified and obtained from state vital records registrars. We linked these data with estimates of the Agent Orange exposure of armed services personnel, derived from records of where Agent Orange was used and how intensively it was applied. We developed a spraying algorithm to estimate the areas covered and the doses received on the ground. The linkage was based on the location of each person's assigned unit at the time spraying took place.
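The unit-and-date linkage can be sketched as follows. This is not the spraying algorithm itself: the sketch assumes that such an algorithm has already produced an estimated ground dose for each grid cell sprayed on a given day, and that unit locations have been reduced to the same hypothetical grid cells. All unit names, cell IDs, dates and doses are invented.

```python
from datetime import date

# Hypothetical spray events: (date, grid cell, estimated ground dose in
# arbitrary units), assumed to come from an upstream spraying model.
spray_events = [
    (date(1968, 3, 1), "G12", 4.0),
    (date(1968, 3, 1), "G13", 1.5),
    (date(1968, 5, 9), "G12", 2.0),
]
# Hypothetical unit locations: (unit, date) -> grid cell occupied that day.
unit_location = {
    ("Unit A", date(1968, 3, 1)): "G12",
    ("Unit B", date(1968, 3, 1)): "G40",
    ("Unit B", date(1968, 5, 9)): "G12",
}
# Hypothetical assignments abstracted from service records:
# person -> list of (unit, start date, end date), both ends inclusive.
assignments = {
    "P001": [("Unit A", date(1968, 1, 1), date(1968, 12, 31))],
    "P002": [("Unit B", date(1968, 4, 1), date(1968, 6, 30))],
    "P003": [("Unit B", date(1968, 7, 1), date(1968, 9, 30))],
}

def estimate_exposure(assignments, unit_location, spray_events):
    """Sum the estimated dose of every spray event in the cell that the
    person's assigned unit occupied on the day of spraying. A day with no
    recorded unit location is treated as no exposure."""
    totals = {person: 0.0 for person in assignments}
    for person, stints in assignments.items():
        for unit, start, end in stints:
            for day, cell, dose in spray_events:
                if start <= day <= end and unit_location.get((unit, day)) == cell:
                    totals[person] += dose
    return totals

exposure = estimate_exposure(assignments, unit_location, spray_events)
print(exposure)
```

The nested loops keep the logic easy to audit; a file of realistic size would index spray events by date instead of scanning them for every assignment.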

In most applications, duplicate records must be eliminated and records that are not germane deleted. For example, in the vehicle fire application described earlier, the VIN enabled us to verify the make, model and year of each automobile to ensure that the record referred to a vehicle within the definitions of a class action suit. The VIN was also the key for identifying duplicate records, such as a single vehicle reported under the names of both husband and wife, or of both an individual and a business.
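A sketch of this clean-up step in Python. It assumes each record already carries the make, model and year decoded from its VIN (the decoding tables are not shown), and `CLASS_DEFINITION` is a hypothetical stand-in for the class definition in a real suit. VINs are normalized, rejected unless they are 17 characters from the permitted alphabet (which excludes I, O and Q), filtered against the class, and then collapsed so that each vehicle appears once along with every name under which it was reported.

```python
import re

# A 17-character VIN never uses the letters I, O or Q.
VIN_PATTERN = re.compile(r"^[A-HJ-NPR-Z0-9]{17}$")

# Hypothetical class definition for the vehicles covered by the suit.
CLASS_DEFINITION = {
    "make": "ExampleMotors",
    "models": {"Alpha", "Beta"},
    "years": range(1995, 2001),  # model years 1995 through 2000
}

def normalize_vin(vin: str) -> str:
    """Strip whitespace and hyphens and upper-case the VIN."""
    return re.sub(r"[\s-]", "", vin).upper()

def in_class(rec) -> bool:
    return (rec["make"] == CLASS_DEFINITION["make"]
            and rec["model"] in CLASS_DEFINITION["models"]
            and rec["year"] in CLASS_DEFINITION["years"])

def dedupe_by_vin(records):
    """Keep one record per valid, in-class VIN, merging reporting names."""
    merged, rejected = {}, []
    for rec in records:
        vin = normalize_vin(rec["vin"])
        if not VIN_PATTERN.match(vin) or not in_class(rec):
            rejected.append(rec)  # malformed VIN or outside the class
            continue
        if vin in merged:
            merged[vin]["reported_by"].add(rec["name"])
        else:
            merged[vin] = {**rec, "vin": vin, "reported_by": {rec["name"]}}
    return list(merged.values()), rejected

# Made-up sample records: two reports of one vehicle, one malformed VIN,
# and one vehicle outside the class.
sample = [
    {"vin": "1abcd23efgh456789", "name": "Pat Smith", "make": "ExampleMotors", "model": "Alpha", "year": 1998},
    {"vin": "1ABCD23EFGH456789", "name": "Smith Plumbing LLC", "make": "ExampleMotors", "model": "Alpha", "year": 1998},
    {"vin": "1ABCD23EFGH45678", "name": "Lee Jones", "make": "ExampleMotors", "model": "Beta", "year": 1997},
    {"vin": "2ABCD23EFGH456789", "name": "Ann Brown", "make": "ExampleMotors", "model": "Gamma", "year": 1998},
]
kept, rejected = dedupe_by_vin(sample)
print(len(kept), "kept;", len(rejected), "rejected")
```

Keeping the set of reporting names, rather than discarding duplicates outright, preserves a trail back to each source record.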