Cleaning operations
Raw Data
=======
The data collection phase was followed by the data processing stage, which was carried out through the following procedures:
1- Data coding
This stage involved turning the text describing occupation, economic activity, educational attainment and geographic localities into numeric codes. Since one of the major objectives of this project was to compare the data with the results of the 1988 labor survey, the research team decided to use the 1986 coding manuals for occupations and economic activities, even though CAPMAS had issued more recent coding manuals. For the coding of localities (administrative units) and educational attainment, however, the 1996 coding manuals were used, while ensuring that the equivalent 1986 codes could be obtained.
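The mapping from 1996 codes back to their 1986 equivalents can be pictured as a lookup table. The following is only a minimal sketch: the code values and the names `CROSSWALK_1996_TO_1986`, `to_1986_code` and `unmapped_codes` are invented for illustration and are not taken from the CAPMAS manuals or from the project's own tools.

```python
# Hypothetical crosswalk from 1996 locality codes to their 1986 equivalents.
# The code values are invented placeholders, not real CAPMAS codes.
CROSSWALK_1996_TO_1986 = {
    "0101": "011",
    "0102": "011",  # two 1996 units that formed a single unit in 1986
    "0201": "021",
}


def to_1986_code(code_1996):
    """Return the 1986 equivalent of a 1996 code, or None if none is known."""
    return CROSSWALK_1996_TO_1986.get(code_1996)


def unmapped_codes(codes_1996):
    """Return the distinct 1996 codes that have no 1986 equivalent, sorted."""
    return sorted({c for c in codes_1996 if c not in CROSSWALK_1996_TO_1986})
```

Listing the unmapped codes first makes it possible to confirm that every 1996 code in use has a 1986 equivalent before any comparison with the 1988 survey is attempted.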
2- Office checking
Office checkers had several tasks. First, they had to review the consistency of replies across the different sections of the questionnaires for each household. Second, they had to translate the options chosen under "other" according to the lists generated by the coding team. Third, they had to prepare the questionnaires for the data entry stage. This included entering -9 and ?? in place of missing data, deleting replies that were not applicable, and making sure that the person number was written on all pages of the individual questionnaires and the project number on the family enterprise questionnaire. The last task for the office checking team was to compile a list of the total output of each field interviewer and reviewer by counting the household questionnaires, the individuals interviewed (six years old and above) and the family enterprises for each reviewer and interviewer.
3- Data entry
Data entry started before the end of the office checking stage. It lasted from February 16 to April 8, 1999 and took place at CAPMAS premises within the Statistics Department, using the PCs and the LAN provided by ERF. This was not the regular arrangement, since CAPMAS has a dedicated department for computer data processing. However, the arrangement proved to be significantly more efficient, particularly in comparison with the 1988 experience, in which the data processing stage took more than a year (Fergany, 1990:9).
4- Data validation
The data validation process works as follows. First, the program produces lists of likely or mandatory errors in each questionnaire, identifying the question number and the individual person number (pn). The four supervisors, in consultation with the two reviewers, read each program message carefully and consult the questionnaire to validate the data. One of two measures is then taken: either the data are changed after reviewing the questionnaire, or a handwritten note is added to the list stating that, although the data may appear inconsistent, the case at hand is unique and the data should remain as they are. The reviewer and the supervisor both sign their names on the program printout beside the message and the decision they reached. If changes need to be made, data entry clerks are given directions to enter them.
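The first step of this process, producing a listing of mandatory and likely errors keyed by question and person number, can be sketched as follows. This is an illustrative outline only, not the CAPMAS program: the field names (`age`, `ever_married`), the thresholds and the message wording are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    household: str
    pn: int        # individual person number
    question: str
    severity: str  # "mandatory" or "likely"
    message: str


def validate_person(household, person):
    """Return the flags raised for one individual record (hypothetical rules)."""
    flags = []
    age = person.get("age")
    # Mandatory error: the value cannot be correct as entered.
    if age is None or not 0 <= age <= 120:
        flags.append(Flag(household, person["pn"], "age", "mandatory",
                          "age missing or out of range"))
    # Likely error: unusual but possible, so a supervisor decides.
    elif person.get("ever_married") and age < 12:
        flags.append(Flag(household, person["pn"], "ever_married", "likely",
                          "ever married reported for a child under 12"))
    return flags


def error_listing(households):
    """Build the printable listing, one line per flag, in input order."""
    lines = []
    for hh_id, persons in households.items():
        for person in persons:
            for f in validate_person(hh_id, person):
                lines.append(f"{f.household} pn={f.pn} {f.question}: "
                             f"[{f.severity}] {f.message}")
    return lines
```

Separating the two severities mirrors the two outcomes described above: a mandatory error has to be corrected, whereas a likely error may be left as is once the supervisor and reviewer have confirmed it against the questionnaire.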
During the data validation stage, the program pointed to discrepancies in the way occupations and economic activities had been coded. As noted earlier, the ERF team decided to use the 1986 coding system to ensure comparability with the 1988 survey. However, the program pinpointed some codes that were inconsistent with data in other parts of the questionnaire. These discrepant codes had mistakenly been assigned according to the 1996 coding manual. As a result, two CAPMAS coding specialists were stationed in the data entry room to screen the coding of occupations and economic activities in all questionnaires. Moreover, CAPMAS programmers designed a program to point out every occupation and economic activity code that had been assigned using the 1996 coding system.
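One simple way to picture such a screening program is a membership test against the two code lists. The sketch below is an assumption about how this could be done, not a description of the CAPMAS program; the code sets and the names `classify_code` and `records_to_recode` are hypothetical.

```python
# Hypothetical code sets; the real lists would come from the coding manuals.
VALID_1986 = {"111", "212", "621"}
VALID_1996 = {"1110", "2120", "6210"}


def classify_code(code):
    """Classify a code as '1986', '1996', 'both' or 'unknown'."""
    in_86 = code in VALID_1986
    in_96 = code in VALID_1996
    if in_86 and in_96:
        return "both"
    if in_86:
        return "1986"
    if in_96:
        return "1996"
    return "unknown"


def records_to_recode(records, field="occupation"):
    """Return (household, pn) keys whose code is valid only in the 1996 list."""
    return [(r["household"], r["pn"])
            for r in records
            if classify_code(r[field]) == "1996"]
```

A check of this kind cannot resolve codes that exist in both systems with different meanings; those cases would still need the cross-checks against other questionnaire answers and the manual screening described above.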
As a final data cleaning step, CAPMAS programmers re-applied the skip patterns and range rules to the data set at the end of the validation stage. The new program was designed to ensure that, where records had been changed during the validation process, the changes were consistent with the rest of the information in the same record.
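Re-applying skip patterns and range rules to edited records can be sketched as below. The rules, field names and limits are invented for illustration; only the -9 missing-data code is taken from the office checking step described earlier.

```python
MISSING = -9  # missing-data code used during office checking

# Hypothetical range rules: field -> (lowest, highest) allowed value.
RANGE_RULES = {"age": (0, 120), "hours_worked": (0, 112)}

# Hypothetical skip rules: (trigger field, trigger value, fields to be skipped).
SKIP_RULES = [("worked_last_week", 2, ["hours_worked", "occupation"])]


def check_record(record):
    """Return a list of rule violations for one record (empty if clean)."""
    problems = []
    # Range rules: every recorded, non-missing value must lie within its limits.
    for field, (low, high) in RANGE_RULES.items():
        value = record.get(field)
        if value is None or value == MISSING:
            continue
        if not low <= value <= high:
            problems.append(f"{field}={value} outside {low}..{high}")
    # Skip patterns: skipped questions must be blank once the trigger applies.
    for trigger, trigger_value, skipped in SKIP_RULES:
        if record.get(trigger) == trigger_value:
            for field in skipped:
                if record.get(field) is not None:
                    problems.append(
                        f"{field} should be blank when {trigger}={trigger_value}")
    return problems
```

Running the same rules again after validation catches cases where a correction to one answer, such as changing an employment status, leaves dependent answers in the record that should no longer be there.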