The Republic of Iraq was once considered a leader in household expenditure and income surveys. Its first was conducted in 1946, with follow-up surveys in 1954 and 1961. After the establishment of the Central Statistical Organization (CSO, the precursor to COSIT), household expenditure and income surveys were carried out every three to five years (in 1971/1972, 1976, 1979, 1984/1985, 1988, and 1993), covering all Iraqi governorates (except the 1993 survey, which could not cover the three governorates in Kurdistan Region of Iraq: Sulaimaniya, Erbil, and Duhouk). At the beginning of July 2002, CSO began a socio-economic household survey for 2002/2003 that again excluded the governorates in Kurdistan Region. The survey was designed for a full year, but CSO lost most of its survey questionnaires and the database because of the war and its aftermath. The only usable data were for the months of July, August, and September 2002.
With no complete household or expenditure surveys undertaken in more than 14 years, the Central Organization for Statistics and Information Technology (COSIT) and the Kurdistan Region Statistics Organization (KRSO) launched fieldwork on the Iraq Household Socio-Economic Survey (IHSES) on November 1, 2006. The survey was carried out over a full year, covering all governorates including those in Kurdistan Region.
The World Bank provided financial support in addition to technical consultation in defining project objectives, the questionnaire, sample design, and the output tables. The Bank also provided substantial technical support for capacity building of COSIT and KRSO staff involved in fieldwork implementation, preparation of data entry programs, and analysis of the survey indicators using the Statistical Package for the Social Sciences (SPSS).
The Iraqi side prepared the fieldwork implementation plan and mechanism; contributed to the questionnaire and sample design; selected the households; prepared and trained the fieldworkers; updated the lists and maps; and implemented the fieldwork, data entry, and results generation.
IHSES constitutes the first component of the Poverty Reduction Strategy Project, which the Republic of Iraq is implementing in cooperation with the World Bank. The overall project consists of four components: (i) data collection (IHSES), (ii) poverty and inequality assessment, (iii) analysis of impact of proposed policies, and (iv) a poverty reduction strategy.
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE CENTRAL ORGANIZATION FOR STATISTICS AND INFORMATION TECHNOLOGY (COSIT)
To develop effective poverty reduction policies and programs, Iraqi policy makers need to know how large the poverty problem is, what kind of people are poor, and what the causes and consequences of poverty are. Until recently, they had neither the data nor an official poverty line. (The last national income and expenditure survey was in 1988.)
In response to this situation, the Iraqi Ministry of Planning and Development Cooperation established the Household Survey and Policies for Poverty Reduction Project in 2006, with financial and technical support of the World Bank. The project has been led by the Iraqi Poverty Reduction Strategy High Committee, a group which includes representatives from Parliament, the prime minister's office, the Kurdistan Regional Government, and the ministries of Planning and Development Cooperation, Finance, Trade, Labor and Social Affairs, Education, Health, Women's Affairs, and Baghdad University.
The Project has consisted of three components:
- Collection of data which can provide a measurable indicator of welfare, i.e., the Iraq Household Socio-Economic Survey (IHSES).
- Establishment of an official poverty line (i.e. a cut off point below which people are considered poor) and analysis of poverty (how large the poverty problem is, what kind of people are poor and what are the causes and consequences of poverty).
- Development of a Poverty Reduction Strategy, based on a solid understanding of poverty in Iraq.
The survey has four main objectives. These are:
• To provide data that will help in the measurement and analysis of poverty.
• To provide data required to establish a new consumer price index (CPI) since the current outdated CPI is based on 1993 data and no longer applies to the country’s vastly changed circumstances.
• To provide data that meet the requirements and needs of national accounts.
• To provide other indicators, such as consumption expenditure, sources of income, human development, and time use.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009. Under this project, extensive efforts have been made to acquire, clean, harmonize, preserve, and disseminate micro data of existing household surveys in several Arab countries.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
V1.0: A cleaned and harmonized version of the survey dataset, produced by the Economic Research Forum for dissemination.
V2.0: A cleaned and harmonized version of the survey dataset, including all variables in V1.0 plus a number of new, detailed, or composite coded versions of the variables considered essential at the household and individual levels, produced by the Economic Research Forum for dissemination.
All documentation available for the original survey provided by the Statistical Agency, and for the harmonized datasets produced by the Economic Research Forum, has been published, along with a copy of all international classifications of expenditures, occupations and economic activities used during the harmonization process.
However, as far as the datasets are concerned, the Economic Research Forum produces and releases only the harmonized versions in both SPSS and STATA formats.
Household: Includes geographic, social, and economic characteristics of households, namely, household composition, dwelling characteristics, ownership of assets indicators, heads' and spouses' characteristics, annual household expenditure and income.
Individual: Includes demographic, migration, education, labor and health characteristics, as well as annual income for household members identified as earners. Moreover, fathers' and mothers' characteristics are generated for household members if possible.
National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Producers and sponsors
Economic Research Forum
Central Organization for Statistics and Information Technology (COSIT)
Kurdistan Regional Statistics Office (KRSO)
Government of Iraq
Funded the study
Multi-country trust fund
Funded the study
The World Bank
Funded the study
----> A. Total sample size and stratification
The total effective sample size of the IHSES 2007 is 17,822 households. The survey was nominally designed to visit 18,144 households - 324 in each of 56 major strata. The strata are the rural, urban and metropolitan sections of each of Iraq's 18 governorates, with the exception of Baghdad, which has three metropolitan strata. The IHSES 2007 and the MICS 2006 survey intended to visit the same nominal sample. Variable q0040 indicates whether this was indeed the case.
----> B. Sample frame
The 1997 population census frame was applied to the 15 governorates that participated in the census (the three governorates in Kurdistan Region of Iraq were excluded). For Sulaimaniya, the population frame prepared for the compulsory education project was adopted. For Erbil and Duhouk, the enumeration frame implemented in the 2004 Iraq Living Conditions Survey was updated and used.
The population covered by IHSES included all households residing in Iraq from November 1, 2006, to October 30, 2007, meaning that every household residing within Iraq's geographical boundaries during that period potentially could be selected for the sample.
----> C. Primary sampling units and the listing and mapping exercise
The 1997 population census frame provided a database for all households. The smallest enumeration unit was the village in rural areas and, in urban areas, the majal (census enumeration area), a collection of 15-25 urban households. The majals were merged to form Primary Sampling Units (PSUs), containing 70-100 households each. In Kurdistan, PSUs were created based on the maps and frames updated by the statistics offices. Villages in rural areas, especially those with few inhabitants, were merged to form PSUs.
Selecting a truly representative sample required that changes between 1997 and the pilot survey be accounted for. The names and addresses of the households in each sample point (that is, the selected PSU) were updated; and a map was drawn that defined the unit's borders, buildings, houses, and the streets and alleys passing through. All buildings were renumbered. A list of heads of household in each sample point was prepared from forms that were filled out and used as a frame for selecting the sample households.
----> D. Sampling strategy and sampling stages
The sample was selected in two stages, with groups of majals (Census Enumeration Areas) as Primary Sampling Units (PSUs) and households as Secondary Sampling Units. In the first stage, 54 PSUs were selected with probability proportional to size (pps) within each stratum, using the number of households recorded by the 1997 Census as a measure of size. In the second stage, six households were selected by systematic equal probability sampling (seps) within each PSU. For this purpose, a cartographic updating and household listing operation was conducted in 2006 in all 3,024 PSUs, without resorting to the segmentation of any large PSUs. The total sample is thus nominally composed of 6 households in each of 3,024 PSUs.
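The nominal design figures quoted in sections A and D can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check; the constant names are illustrative and do not come from the survey documentation.

```python
# Nominal IHSES 2007 design, as described in sections A and D.
GOVERNORATES = 18
AREA_TYPES = 3            # rural, urban, metropolitan
EXTRA_BAGHDAD_METRO = 2   # Baghdad has three metropolitan strata instead of one
PSUS_PER_STRATUM = 54
HOUSEHOLDS_PER_PSU = 6

strata = GOVERNORATES * AREA_TYPES + EXTRA_BAGHDAD_METRO
psus = strata * PSUS_PER_STRATUM
households_per_stratum = PSUS_PER_STRATUM * HOUSEHOLDS_PER_PSU
nominal_sample = psus * HOUSEHOLDS_PER_PSU

print(strata, psus, households_per_stratum, nominal_sample)
# prints: 56 3024 324 18144
```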
----> E. Sample point trios, teams, and survey waves
The PSUs selected in each governorate (270 in Baghdad and 162 in each of the other governorates) were sorted into groups of three neighboring PSUs called trios -- 90 trios in Baghdad and 54 per governorate elsewhere. The three PSUs in each trio do not necessarily belong to the same stratum.
The 12 months of the data collection period were divided into 18 periods of 20 or 21 days called survey waves. Fieldworkers were organized into teams of three interviewers, each team being responsible for interviewing one trio during a survey wave. The survey used 56 teams in total - 5 in Baghdad and 3 per governorate elsewhere. The 18 trios assigned to each team were allocated into survey waves at random.
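The team and trio counts above fit together exactly, which the following sketch checks before illustrating a random allocation of one team's 18 trios to the 18 waves. The documentation says only that the allocation was random; the use of a simple shuffle, the seed, and all names below are assumptions made for illustration.

```python
import random

OTHER_GOVERNORATES = 17   # all governorates except Baghdad
WAVES = 18

teams = 5 + 3 * OTHER_GOVERNORATES     # 5 teams in Baghdad, 3 elsewhere
trios = 90 + 54 * OTHER_GOVERNORATES   # 90 trios in Baghdad, 54 elsewhere
trios_per_team = trios // teams


def allocate_waves(trio_ids, rng):
    """Randomly assign trios to waves 1..len(trio_ids), one trio per wave."""
    shuffled = list(trio_ids)
    rng.shuffle(shuffled)
    return {wave: trio for wave, trio in enumerate(shuffled, start=1)}


rng = random.Random(2006)  # fixed seed only so that the sketch is reproducible
schedule = allocate_waves(range(1, WAVES + 1), rng)  # hypothetical trio ids 1..18

print(teams, trios, trios_per_team)
# prints: 56 1008 18
```

Each wave receives exactly one trio, so a team's workload is three PSUs (18 households) per wave.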
The 'time use' module was administered to two of the six households selected in each PSU: nominally the second and fifth households selected by the seps procedure in the PSU.
----> F. Time-use sample
The IHSES questionnaire on time use covered all household members aged 10 years and older. A subsample of one-third of the households was selected (the second and fifth of the six households in each sample point).
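As a rough illustration of systematic equal probability sampling and of the time-use subsample, the sketch below draws six households from a PSU listing with a random start and a fixed interval, then takes the second and fifth selections. The listing, the household labels, and the fractional-interval variant of the method are assumptions; the documentation does not describe the exact procedure used in the field.

```python
import random


def systematic_sample(listing, m, rng):
    """Select m units with equal probability: random start, fixed interval."""
    n = len(listing)
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= len(listing)")
    step = n / m                  # sampling interval (may be fractional)
    start = rng.random() * step   # random start within the first interval
    # The min() clamp only guards against floating-point rounding at the end.
    return [listing[min(int(start + i * step), n - 1)] for i in range(m)]


rng = random.Random(1)  # fixed seed only so that the sketch is reproducible
listing = [f"HH{i:03d}" for i in range(1, 86)]  # hypothetical listing of 85 households
selected = systematic_sample(listing, 6, rng)

# Time-use module: nominally the second and fifth selected households.
time_use = [selected[1], selected[4]]
```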
The second and fourth visits were designated for completion of the time-use sheet, which covered all activities performed by every member of the household.
A more detailed description of the allocation of the sample across governorates is provided in the tabulation report document available among external resources in both English and Arabic.
Deviations from the Sample Design
----> Exceptional Measures
The design did not consider the replacement of any of the randomly selected units (PSUs or households). However, sometimes a team could not visit a cluster during the allocated wave because of unsafe security conditions. When this happened, that cluster was swapped with another cluster from a randomly selected future wave that was considered more secure. If none were considered secure, a sample point was randomly selected from among those that had been visited already. The team then visited a new cluster within that sample point. (That is, the team visited six households that had not been previously interviewed.) The original cluster and the new cluster were both selected by systematic equal probability sampling.
This explains why the survey datasets only contain data from 2,876 of the 3,024 originally selected PSUs, whereas 55 of the PSUs contain more than the six households nominally dictated by the design.
The wave number in the survey datasets is always the nominal wave number, corresponding to the random allocation considered by the design. The effective interview dates can be found in questions 35 to 39 of the survey questionnaires.
Remarkably few of the original clusters could not be visited during the fieldwork. Nationally, less than 2 percent of the original clusters (55 of 3,024) had to be replaced. Of the original clusters, 20 of 54 (37 percent) could not be visited in the stratum of “Kirkuk/other urban” and 19 of 54 (35 percent) could not be visited in “Ninevah/other urban.” The other strata had far fewer clusters that could not be visited. In the city of Baghdad, all original clusters were visited; in the stratum “Baghdad/rural,” only 6 of 54 original clusters (11 percent) could not be visited and had to be replaced.
The required sample size of 54 clusters in Kirkuk was obtained through two means. First, eight new clusters were selected in previously visited sample points, using the approach described above. Second, 12 clusters were selected in new residential areas that had not existed at the time of the original sample frame. These 12 clusters were selected from among the newly identified PSUs using the same two-phase sampling method that had been used for the original clusters. All 54 clusters in Kirkuk were visited during the normal fieldwork period (that is, during waves 1 to 18). To account for the new residential area, the population of Kirkuk (used for constructing weights) was increased by 38,000, bringing the revised population to 1,129,000.
In Sulaimaniya, a new residential area was added that had not existed at the time of the original sample frame. Eighteen additional clusters were selected from among the newly identified PSUs with the same two-phase sampling method used for the original clusters. This brought the total number of clusters in Sulaimaniya to 72. The additional 18 clusters were visited after the completion of wave 18. The fieldwork for these additional 18 clusters is referred to as waves 19 and 20. The population of Sulaimaniya (used for constructing weights) was not increased because the population of the new residential areas moved from within the same governorate.
To identify the 30 PSUs resulting from these deviations in the survey datasets, their original 'cluster numbers' (ranging from 0001 to 3024) were increased by 5000.
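A minimal sketch of how an analyst might separate these flagged PSUs from the rest, assuming the cluster number is stored as an integer and that flagged values therefore fall between 5001 and 8024. The function name and the range checks are illustrative and are not part of the dataset documentation.

```python
OFFSET = 5000         # amount added to flag PSUs resulting from the deviations
MAX_ORIGINAL = 3024   # highest cluster number in the original design


def decode_cluster(cluster_no):
    """Return (base_number, is_deviation) for a cluster number in the datasets."""
    if 1 <= cluster_no <= MAX_ORIGINAL:
        return cluster_no, False
    if OFFSET + 1 <= cluster_no <= OFFSET + MAX_ORIGINAL:
        return cluster_no - OFFSET, True
    raise ValueError(f"unexpected cluster number: {cluster_no}")


print(decode_cluster(17))    # (17, False)
print(decode_cluster(5017))  # (17, True)
```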
In Erbil and Duhouk governorates, waves 3, 4, and 5 could not be implemented as planned for logistical reasons. The fieldwork for these three waves was deferred until wave 18 was completed. The fieldwork period was compressed by eliminating breaks so that the work of the three waves was completed in the time normally allocated to two waves. These additional waves are referred to as waves 19 and 20.
A more detailed description of the allocation of the original clusters that could not be visited is provided in table I-1 in the tabulation report document available among external resources in both English and Arabic.
IHSES reached a total of 18,144 households. Interviews were fully carried out for 98.62 percent of these households.
The highest interview rates were in Missan (99.8 percent), Al-Muthanna (99.7 percent), and Al-Najaf (99.6 percent) governorates. The lowest were in Duhouk (92.4 percent), Diala (92.8 percent), and Al-Anbar (94.3 percent) governorates. Among the 1.39 percent of interviews that were not fully completed, 0.55 percent were partially achieved; no usable information was obtained from 0.06 percent; 0.33 percent refused the interviews; 0.33 percent of the households could not be found; 0.01 percent of the houses could not be found; 0.08 percent of the housing units were found to be unoccupied; and 0.03 percent of the housing units turned out to be seasonal.
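The published non-completion categories can be summed as a quick consistency check. The sketch below works in hundredths of a percent to avoid floating-point noise; the category labels are paraphrased from the text. The components add up to 1.39 percent, so together with the 98.62 percent of completed interviews the total is 100.01 percent, a gap that is presumably due to rounding in the published figures.

```python
# Shares of the sampled households, in hundredths of a percent (figures from the text).
not_completed = {
    "partially achieved": 55,
    "no usable information": 6,
    "refused": 33,
    "household not found": 33,
    "house not found": 1,
    "housing unit unoccupied": 8,
    "seasonal housing unit": 3,
}
completed = 9862  # 98.62 percent

total_not_completed = sum(not_completed.values())

print(total_not_completed / 100)                # 1.39
print((completed + total_not_completed) / 100)  # 100.01
```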
The table below gives the response rates by stratum:
It should be noted that Baghdad has three metropolitan strata by design, whereas an additional metropolitan stratum appeared in Suleimaniya for reasons explained in the field "Deviations from Sample Design".
In Kirkuk the response rate is lower than average in the rural stratum and higher than 100 percent in the metropolitan stratum as a result of the special replacement procedures used there (certain insecure rural PSUs were replaced by metropolitan PSUs; see field "Deviations from Sample Design").
The selection probability p[hij] of household (hij) in PSU (hi) of stratum h is given by
p[hij] = (k[h] n[hi] / N[h]) × (m[hi] / n'[hi])
where
k[h] is the number of PSUs selected in stratum h;
n[hi] is the number of households in PSU hi, as per the 1997 Census;
N[h] is the total number of households in stratum h (also as per the 1997 Census);
m[hi] is the number of households selected in PSU hi; and
n'[hi] is the number of households in PSU hi, as per the 2006 listing operation.
k[h] is always 54, except in the extra metropolitan stratum in Suleimaniya (18 PSUs) and in the three Kirkuk strata (55 rural PSUs, 55 urban PSUs, and 64 metropolitan PSUs; see field "Deviations from Sample Design").
The nominal value of m[hi] is 2 for the time use module and 6 for all other modules.
The 'probability weight' w[hij] of household hij is the inverse of its selection probability p[hij].
In the survey datasets, the probability weights so obtained were adjusted by governorate-level coefficients so that the estimated populations match the corresponding projections used by the national food ration system.
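The selection probability and the probability weight defined above can be written as two small functions. This is a sketch of the formula only: the input values in the example are hypothetical, the parameter names are illustrative, and the governorate-level adjustment applied in the released datasets is not reproduced because its coefficients are not given here.

```python
def selection_probability(k_h, n_hi, N_h, m_hi, n_listed_hi):
    """p[hij] = (k[h] * n[hi] / N[h]) * (m[hi] / n'[hi])."""
    first_stage = k_h * n_hi / N_h     # PSU selected with probability proportional to size
    second_stage = m_hi / n_listed_hi  # household selected within the listed PSU
    return first_stage * second_stage


def probability_weight(k_h, n_hi, N_h, m_hi, n_listed_hi):
    """w[hij] is the inverse of the selection probability."""
    return 1.0 / selection_probability(k_h, n_hi, N_h, m_hi, n_listed_hi)


# Hypothetical PSU: 90 households in the 1997 Census, 100 in the 2006 listing,
# in a stratum with 48,600 census households.
p = selection_probability(k_h=54, n_hi=90, N_h=48600, m_hi=6, n_listed_hi=100)
w = probability_weight(k_h=54, n_hi=90, N_h=48600, m_hi=6, n_listed_hi=100)
print(round(p, 4), round(w, 2))
```

For the time-use module the same functions apply with m_hi set to 2 instead of 6, which triples the weight.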
Dates of Data Collection
Initially planned data collection period
Extension of data collection in Kurdistan region
Data Collection Mode
The interviewers were supervised by 56 local supervisors along with regular supervision visits by central supervisors.
Data Collection Notes
----> A. Field visit schedule
A time schedule was prepared to follow up on the recording of the daily household expenditures and to ensure accurate completion of the five-part questionnaire. Seven field visits were scheduled for each household. The schedule covered all tasks, from the first visit, when the daily expenditure diary was handed over to the household, to recovering the diary on the final visit. The interviewers delivered their finished questionnaires to the data entry operators for processing. When errors, gaps, or inconsistencies emerged, the data entry operators issued rejection reports. Interviewers would then revisit the households according to the schedule.
A more detailed description of the schedule of visits for collecting, entering, and correcting data is provided in the tabulation report document available among external resources in English.
----> B. Wave timetable
The survey was in the field from October 30, 2006, through November 8, 2007. Each interviewer worked 360 days. The first interviewer began on October 30, 2006, and ended on October 24, 2007. The third interviewer began on November 14, 2006, and ended on November 8, 2007. The end of the survey corresponded to completion of the third interviewer’s work.
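The dates above are consistent with a 360-day working period per interviewer if both the first and the last day are counted. The sketch below checks this with the standard library; counting inclusively is an assumption about how the 360 days were reckoned.

```python
from datetime import date


def days_inclusive(start, end):
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


first = days_inclusive(date(2006, 10, 30), date(2007, 10, 24))
third = days_inclusive(date(2006, 11, 14), date(2007, 11, 8))
stagger = (date(2006, 11, 14) - date(2006, 10, 30)).days

print(first, third, stagger)
# prints: 360 360 15
```

The 360 days also match 18 waves of 20 days each.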
An 18-wave timetable was prepared for the interviewer teams.
----> C. Training
The training of the main trainers was carried out in three phases. The first phase was carried out in Beirut in June 2006, including seven days of theoretical training. The second phase was implemented in Iraq. Trainees received applied training, with each trainee filling out all parts of the survey questionnaire for two randomly selected households. The third phase was implemented in Amman in July 2006. The main trainer teams were represented by the regional and governorate coordinators. They discussed the key challenges to be encountered in taking the questionnaire to the field, as well as the training of trainers who would then instruct the fieldworkers.
In September 2006, nine centers were opened across Iraq to train local supervisors, field interviewers, and data entry operators. The training, which was specifically designed and highly tailored to the circumstances of Iraq, continued for 23 days. Trainees received theoretical and applied lessons in data collection and data entry. Questionnaires completed during the training were used to test the data entry program.
Training centers were opened in Sulaimaniya, Erbil, Kirkuk, Ninevah, Baghdad (two centers), Al-Najaf, Al-Qadisiya, and Thi Qar. Altogether, 168 interviewers, 56 local supervisors, 56 data entry operators, and 18 governorate secretaries were trained. A number of staff from the statistics offices in the governorates were also trained (three from each governorate, five from Baghdad) as well as alternate field staff to cover emergencies and dropouts.
----> D. Decentralized data entry, field follow-up, and supervision forms
Fieldwork consisted of seven visits to each of nearly 18,000 households during 18 waves lasting 20 days each over 12 months. Given the breadth and complexity of this undertaking, a solid and continuous follow-up system was essential.
As soon as Part 1 of the questionnaire was completed and checked by a supervisor, it was handed off to the team’s data entry operator. The data entry operator entered the collected information and produced an approval/rejection report flagging anomalies. Reports were returned for follow-up and necessary corrections while the interviewers were still in the field working on Part 2 of the questionnaire. The completed Part 2 and corrected Part 1 were then returned to the data entry staff, with further rejection reports and follow-up as needed. This cycle was continuous for all parts of the survey.
The IHSES Core Team responsible for fieldwork supervision worked closely with World Bank technical consultants. Careful and continuous attention was paid to ensuring highly accurate indicators. When mistakes were detected, corrective measures were drafted and circulated to each governorate. To facilitate field follow-up, office review and data processing were decentralized to the governorate centers so that many potential mistakes were avoided during each wave cycle.
IHSES follow-up in the field was systematic but flexible, depending on the evidence provided by the following evaluation forms (Annex 5):
• Form 1. Office check of the questionnaires
• Form 2. Interviewer’s performance
• Form 3. Reinterview
• Form 4. Governorate coordinator
• Form 5. Regional supervisor’s regional control and checking form
• Form 6. Operations room assessment of the work performed in the governorates
In a nutshell, the IHSES collected data during a 12-month period, using 56 field teams distributed through all 18 Iraqi governorates. Each team consisted of one local supervisor, three interviewers, and one data entry operator, the latter being responsible for data entry at the governorate office (see "Data Processing" field). The 12 months of fieldwork were divided into 18 "waves" of 20 or 21 days each. Each field team was responsible for completing three clusters during one wave. Since 6 households were selected in each cluster (see "Sampling" field), the three interviewers of a team completed 18 households in 20-21 days.
Information on food purchases was recorded on a diary during 10 days in each household. During this period interviewers had to visit each household at least 7 times, to make sure that this diary was being properly recorded. During some of those visits, they administered other parts of the questionnaire in independent booklets called "forms." Data entry of the forms started a few days after the first visit to the household, and printouts with the inconsistencies found by the data entry program were sent back to the field teams, who corrected the inconsistencies during the following visits.
A pilot survey was carried out before the beginning of fieldwork to identify and solve operational hurdles.
For security reasons, some clusters in some governorates could not be visited when planned. In addition, fieldwork started with some delay in parts of Kurdistan. For those reasons, the fieldwork in those particular areas was extended by 2 extra waves (19 and 20).
After the end of each wave, the field teams transferred the datasets to the Data Manager at the survey's Operations Room. The decentralization of data entry and the integration of computer-based quality control into fieldwork allowed the Operations Room in Baghdad to assess and monitor the work of the field teams directly, without the need for intermediate management levels. Governorate coordinators and central supervisors were used, however, to facilitate logistics and finance.
Central Organization for Statistics and Information Technology
----> A. Preparation
A socio-economic survey questionnaire implemented by COSIT in 2002 served as version zero in creating the 2007 IHSES questionnaire. Version zero went through nine subsequent iterations before the final version emerged on June 6, 2006. Two rounds of pre-testing were carried out in September and November 2005. Revisions were made based on feedback from the field team, World Bank experts, and others. Seven other iterations took place before the final version was implemented in a pilot survey in March 2006. The questionnaire was revised again after the pilot survey. This process culminated with the final version of the questionnaire that was adopted and implemented for the actual survey.
----> B. The pre-test
A pre-test was necessary to test the questionnaire and the related field manual, and to determine the actual requirements for implementing the survey. The pre-test was carried out in two rounds in Baghdad and Diala governorates. In the first round, a sample of 12 households, selected across social levels, was tested on September 22–24 in Baghdad and in the rural areas of Diala.
The second round was conducted on October 31 and November 1 among 20 households in urban areas of Baghdad and rural areas of Diala.
COSIT prepared detailed reports covering implementation, teamwork, interview results, the time required to collect data, and comments on the questionnaire and manual. These reports were shared with the World Bank, which helped with a comprehensive questionnaire revision in coordination with technical consultants. A team of central supervisors and the staff of the Department of Living Conditions Statistics participated in the implementation of the pre-test.
----> C. Pilot survey
A pilot survey was conducted on March 15, 2006, to identify deficiencies and to ensure solid procedures for technical implementation and logistics. The pilot survey was carried out in Baghdad, Al-Qadisiya, Basrah, Sulaimaniya, and Duhouk.
A sample of 216 households was selected. Thirty-six households were selected in the urban and rural areas of each governorate (except Baghdad, where 72 households were selected because of its population weight). The reference period for household consumption expenditure was 10 days. Fieldwork was conducted over 18 days. This allowed all sections of the questionnaire to be completed and the household diary expenditure data to be exported as planned.
COSIT conducted a training course in Baghdad on March 6–9 for pilot survey staff. Seventy-one COSIT staff members participated, including 6 central supervisors, 5 governorate coordinators, 12 local supervisors, 36 interviewers, and 12 data entry operators. In addition, senior COSIT personnel from the Living Conditions Department participated. The field manual was explained. Data entry was performed at the centers of the pilot survey governorates, where COSIT provided the instructional equipment and materials. Following the survey, COSIT prepared a comprehensive report on technical and logistical challenges encountered during the implementation process. The recommendations in COSIT’s report were approved, after which the questionnaire and manual were amended in coordination with the Bank consultants.
----> D. Questionnaire parts
The questionnaire consists of five parts, each with several sections.
Part One—Socio-Economic Data
Section 1: Household Roster
Section 2: Rations Received and Consumption of Provisions
Section 3: Housing
Section 4: Education
Section 5: Health
Section 6: Activities, Entertainment, and Hobbies
Section 7: Job Search and Past Employment
Part Two—Monthly, Quarterly, and Annual Expenditures
Section 8: Expenditures on Nonfood Services and Commodities (past 30 days)
Section 9: Expenditures on Nonfood Services and Commodities (past 90 days)
Section 10: Expenditures on Nonfood Services and Commodities (past 12 months)
Part Three—Expenditure, Income, and Other
Section 11: Daily Expenditure on Repetitive Food and Nonfood Commodities
Section 12: Jobs during the Previous 12 Months
Section 13: Wage Earnings
Section 14: Nonwage Earning Activities
Section 15: Income from Property and Transfers
Section 16: Durable Goods
Section 17: Loans, Credits, and Assistance
Section 18: Risk
Part Four—Diary of Daily Expenditure on Food Commodities
Part Five—Time-Use Sheet
Data editing took place at a number of stages throughout the processing, including:
A. Software packages
The data processing system for the IHSES survey was constructed primarily with CSPro, a specialized package widely used for census and household surveys. In addition, Visual Basic was used to build the user’s menu for the system.
Validation rules were established for most fields, with screens to control the entered data. The objectives of these validation rules are to:
• Ensure accurate entry and editing of the questionnaire data.
• Check that all rules and instructions for filling out the questionnaire are followed—for example, skipping between fields and filtering the data.
• Provide capacity to detect, follow up, and correct inconsistencies.
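To make the idea of validation rules concrete, the sketch below shows a range check and a skip-pattern check that return rejection messages for one record. It is written in Python rather than CSPro, and the field names, age limits, and school-attendance rule are hypothetical; none of them are taken from the IHSES data entry program.

```python
def check_record(rec):
    """Return a list of rejection messages for one roster record (empty if clean)."""
    messages = []
    age = rec.get("age")
    if age is None or not 0 <= age <= 110:
        # Range rule: age must be present and plausible (hypothetical bounds).
        messages.append("age missing or outside 0-110")
        return messages
    attends = rec.get("attends_school")
    if age < 6 and attends is not None:
        # Skip rule: the (hypothetical) schooling question is skipped for children under 6.
        messages.append("skip violated: attends_school filled for child under 6")
    if age >= 6 and attends is None:
        messages.append("attends_school missing")
    return messages


print(check_record({"age": 30, "attends_school": "no"}))  # []
print(check_record({"age": 3, "attends_school": "yes"}))
```

A rejection report in this spirit would simply list the messages for every record that returns a non-empty result.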
Data entry, editing, and data processing employed the following programs:
• Data entry: CSPro was primarily used to write the system. Screens were built to conform with the numbering of the questionnaire items and the field names.
• Data editing and consistency: CSPro was used to create rejection reports in the three languages used in the survey (Arabic, Kurdish, and English). The programs were prepared to detect and report a total of 315 abnormal situations in the data.
• Exporting data to the system to produce output tables: SPSS was used to produce output tables. A separate program was designed to transfer the raw data into the SPSS databases for statistical analysis. The exporting process produced files corresponding to the parts of the questionnaire.
• Processing for remaining rejections: The STATA software package was used to create programs to check and correct unresolved errors or rejections in the data files after the fieldwork had ended. These programs relied on mathematical and statistical methods and comparisons among households and governorates. They were able to identify outliers and adjust values automatically. When these data checks were complete, the files were converted from STATA to SPSS in order to create the output tables.
• Remote access: Log-Me-In service through the Internet was used, allowing the data management team at a central location to follow up and download files from the data entry computers in the field.
B. Stages of data processing
To ensure accuracy and consistency, the data were edited at the following stages:
• Interviewer: Double-checks all answers on the household questionnaire, confirming that they are clear and correct. Writes in codes by hand for each field. Some calculations are made within the questionnaire.
• Local supervisor: Checks to make sure that questionnaire has been completed correctly before being forwarded to the data entry operator.
• Data management: During data entry, rejected items are flagged through editing and a consistency check program, based on validation rules and price ranges specified in the program. These controls are repeated, first during the entry sessions and then when the data is entirely entered. The same entry program is used, with adaptations for interactive work and for batch-runs without entry operators.
• Statistical analysis: After exporting the data files from CSPro to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or illogical values, in addition to auditing some variables.
• World Bank consultants in coordination with the COSIT data management team: The World Bank technical consultants use additional programs in SPSS and STATA to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected
- The SPSS package is used to clean and harmonize the datasets.
- The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
- All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
- A country-specific program is generated for each dataset to generate, compute, recode, rename, format, and label harmonized variables.
- A post-harmonization cleaning process is then conducted on the data.
- Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.
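The merge and rename steps listed above can be pictured with a small sketch. It is written in Python for illustration, although the actual work was done in SPSS, and it assumes a hypothetical household key `hhid` and a hypothetical rename map. None of these names come from the survey files.

```python
def harmonize(individuals, households, rename_map, hh_key="hhid"):
    """Attach household variables to each person, then rename variables.

    `individuals` and `households` are lists of dicts; `rename_map` maps
    raw variable names to harmonized names (all names are hypothetical).
    """
    hh_by_id = {hh[hh_key]: hh for hh in households}
    merged = []
    for person in individuals:
        # Start from the household record, then overlay the person's own
        # variables so that the result is one row per individual.
        row = dict(hh_by_id.get(person[hh_key], {}))
        row.update(person)
        # Apply the harmonized names; unmapped variables keep their name.
        merged.append({rename_map.get(k, k): v for k, v in row.items()})
    return merged
```

For example, `harmonize([{"hhid": 1, "pid": 1, "a05": 34}], [{"hhid": 1, "gov": 12}], {"a05": "age", "gov": "governorate"})` yields one individual-level row carrying both the person's and the household's variables under the harmonized names.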
The data collected in the field was entered, wave after wave, separately in each governorate. All the rejections issued by the entry programs were dealt with within each team. At the end of each of the 18 waves, the data was sent to (or centrally picked up by) the Data Management Team (DMT), which re-checked the information and sent back any incomplete or unacceptable data for fixing.
Then, the final consolidated data for a wave was exported to SPSS as a set of files delivered to the Data Analysis Unit (DAU) in a pack known as "generation 1" of the wave. DAU identified specific issues in the data and either requested further fixes from DMT or cleaned up the outliers and unacceptable cases itself. This activity produced a "generation 2" of the SPSS databases, which was used as input for adding variables such as expenditure and income aggregates and new classifications of households and persons, including unemployment descriptors, to produce a "generation 3". The latter was used to create a final "generation 4" of the databases, adding consumption aggregates, the classification of households by poverty status, and other poverty-related variables.
To deal with all the data management responsibilities, the DMT produced or acquired a number of software tools for better supporting the project.
The core piece of software, a data entry program (developed in CSPro 3.01), allowed entry operators to enter and validate the information collected in the field, with strong consistency checks for improving the quality of the data. Main controls included: (1) ranges for numeric variables, (2) demographic consistency within the household, including full control on education, health and labor data, (3) checks of unit values and measurement units for acquired items, (4) extensive use of control subtotals for critical sections, (5) checks of the household metadata against the sample, and (6) balance of calories per capita based on food transactions. The screens and error messages were displayed in three languages (Arabic, Kurdish and English) depending on the choice of the data entry operator.
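To make two of these controls concrete, the sketch below shows a range check and a control-subtotal check in Python. It is not the CSPro logic used in the survey: the field names (`age`, `items`, `amount`, `subtotal`), the ranges, and the tolerance are hypothetical and chosen only for illustration.

```python
def validate_record(record, ranges, tolerance=0.01):
    """Return a list of rejection messages for one questionnaire record.

    `ranges` maps a numeric field name to its allowed (low, high) bounds.
    """
    errors = []

    # Control (1): each numeric field must be present and within range.
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif not low <= value <= high:
            errors.append(f"{field}: {value} outside [{low}, {high}]")

    # Control (4): the declared subtotal must match the sum of the items.
    item_sum = sum(item["amount"] for item in record.get("items", []))
    declared = record.get("subtotal")
    if declared is not None and abs(item_sum - declared) > tolerance:
        errors.append(f"subtotal: declared {declared}, items sum to {item_sum}")

    return errors
```

An empty list means the record passes both checks; any message would correspond to a rejection that the entry operator or the team has to resolve.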
Time use sheets collected for one third of the surveyed households were converted into text files using scanners in each governorate. Despite the difficulties posed by the variety of formats and scan devices available, scanning was the only choice for recording the activities declared by the interviewees at a resolution of one quarter hour across the 24 hours of a day.
An export module, also in CSPro, was included for transferring data into SPSS and Stata. During the export process, the same consistency checks of the data entry program were run again, plus other controls that checked the completion of the work in each governorate after each wave. The scripted export module reduced the data to just 12 interlinkable files.
Friendly menus written in Visual Basic allowed for a simplified utilization of the different components of the entry tool.
Starting with the 7th wave, the data files of some governorates could be accessed and retrieved from a central location using remote internet access via LogMeIn. The remaining governorates kept sending their files by email, since there were technical problems that the data management team could not solve because of security constraints.
Processing ends when the data has been verified by both Data Management and DAU.
Estimates of Sampling Error
The estimation of standard errors must account for the design features of the sample.
The following variables, included in all datasets, are needed for the estimation of standard errors:
xweight: sampling weight
xstrat: sampling stratum
xcluster: primary sampling unit
Warning: Variable 'xbeea', also present in all datasets, identifies rural, urban and metropolitan environments for tabulation purposes; it is sometimes wrongly referred to as 'stratum', but it should not be used for the estimation of sampling errors. The variable that needs to be used for these purposes is 'xstrat', which identifies the 57 sampling strata, defined as the rural, urban and metropolitan sectors of each of the 18 governorates, with the exception of Baghdad (which has three metropolitan sectors) and Suleimaniya (which has two).
Estimates of sampling errors for the survey's most important results are presented in Annex 1 of the tabulation report, available among the external resources in both English and Arabic.
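As an illustration of how the three design variables enter the calculation, the following Python sketch estimates a weighted mean and its standard error by Taylor linearization, treating primary sampling units as sampled with replacement within strata and ignoring any finite population correction. These are assumptions of the sketch, not a statement of the method used for Annex 1, and the analysis variable name `y` is hypothetical. In practice, the survey facilities of a statistical package would normally be used instead.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_se(rows, y="y", weight="xweight",
                     strat="xstrat", cluster="xcluster"):
    """Weighted mean of `y` and its linearized standard error.

    `rows` is a list of dicts, one per observation.
    """
    total_w = sum(r[weight] for r in rows)
    mean = sum(r[weight] * r[y] for r in rows) / total_w

    # Linearized contribution of each observation, summed per PSU
    # within each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        z = r[weight] * (r[y] - mean) / total_w
        psu_totals[r[strat]][r[cluster]] += z

    # Between-PSU variance within each stratum, summed over strata.
    variance = 0.0
    for stratum, clusters in psu_totals.items():
        n = len(clusters)
        if n < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than 2 PSUs")
        zbar = sum(clusters.values()) / n
        variance += n / (n - 1) * sum((z - zbar) ** 2
                                      for z in clusters.values())
    return mean, sqrt(variance)
```

Note that the stratum argument is 'xstrat', not 'xbeea', in line with the warning above. Strata containing a single primary sampling unit raise an error here, since the between-PSU variance cannot be computed for them without a further assumption.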
Economic Research Forum
Economic Research Forum (ERF) - 21 Al-Sad Al-Aaly St., Dokki, Giza, Egypt
To access the micro data, researchers are required to register on the ERF website and comply with the data access agreement.
The data will be used only for scholarly research, or educational purposes. Users are prohibited from using data acquired from the Economic Research Forum in the pursuit of any commercial or private ventures.
Licensed datasets, accessible under conditions.
The users should cite the Economic Research Forum and the Central Organization for Statistics and Information Technology as follows:
OAMDI, 2016. Harmonized Household Income and Expenditure Surveys (HHIES), http://erf.org.eg/data-portal/. Version 2.0 of Licensed Data Files; IHSES 2006/2007- Central Organization for Statistics and Information Technology (COSIT). Egypt: Economic Research Forum (ERF).
Disclaimer and copyrights
The Economic Research Forum and the Central Organization for Statistics and Information Technology have granted the researcher access to relevant data following exhaustive efforts to protect the confidentiality of individual data. The researcher is solely responsible for any analysis or conclusions drawn from available data.
(c) 2016, Economic Research Forum | (c) 2007, COSIT, Iraq
DDI Document ID
Economic Research Forum
Cleaning and harmonizing raw data received from the Statistical Agency