## Egypt - Harmonized Survey of Young People in Egypt, HSYPE 2014

Data Appraisal

Estimates of Sampling Error

----> SYPE estimates for sampling error estimation

The accuracy of a survey estimate indicates how close the estimate is to its corresponding population value. The difference between the survey estimate and the population value is called the error of the survey estimate. The total error is made up of two types of error: sampling error and non-sampling error. Sampling error arises because the whole population is represented by only a part (sample) of it. All other errors in the survey estimate are called non-sampling errors, which may occur for different reasons during implementation of the sample survey. These reasons include target population misidentification, questionnaire problems, respondent bias, processing error, and time period bias, among others. Sampling error can be estimated statistically, whereas non-sampling error is difficult to measure.

----> Sampling error estimation

The standard error (SE) is usually used to measure the sampling error. The standard error is the square root of the variance of the estimate. Its calculation is straightforward for a simple random sample. However, since the SYPE data result from a stratified multistage sample design, a more complex formula is used. The STATA SVY module is used to calculate the standard errors for key estimates of SYPE 2014.

----> Precision measures

Precision measures of survey estimates may include:

• Standard error (described in the previous paragraph)
• Coefficient of variation
• Confidence interval
• Design effect

----> Coefficient of variation (CV)

The coefficient of variation is the relative standard error. It is calculated as the ratio of the standard error of the estimate to the value of the estimate. The reliability of a survey estimate is questionable if its CV exceeds 20%.

----> Confidence interval (CI)

A confidence interval is used to express the uncertainty of survey estimates.
The standard error is used to construct a confidence interval for the parameter of interest. A 95% confidence interval means that if the same sampling method, with the same sample size and design, were used to select different samples and a confidence interval were constructed from each sample, the true population parameter would be expected to fall within the interval in 95% of all samples. If the lower bound of the confidence interval for a non-negative parameter is negative, it is set to zero. Similarly, if the upper bound of the confidence interval for a proportion exceeds 1, it is set to 1.

----> Design effect (DEFF)

The design effect (DEFF) measures how much less precise the given sample design is than a simple random sample (SRS) of the same size. The DEFF is defined as the ratio of the variance of the estimate under the design used to its variance under an SRS of the same sample size; the corresponding ratio of standard errors is the square root of the DEFF. The DEFF shows how much information is gained (or lost) by using the present design compared to SRS. A DEFF of 2 indicates that the present sample conveys the same amount of information as an SRS of half its size, so the present design needs twice the sample size of an SRS to reach the same precision. A DEFF of 1 indicates that the present sample conveys the same amount of information as an SRS of the same size. The DEFF is usually greater than 1. However, in some cases the DEFF is less than 1, which may be due to the presence of outliers and/or a small sample size.

----> Precision estimates for SYPE 2014 key indicators

Standard errors and other precision measures are calculated for several selected SYPE 2014 key estimates. The selected indicators, the type of each estimate (mean, proportion, rate), and the base population are displayed in Table 1 in the final report available among the external resources. For each selected indicator, the following are reported: the estimate value (Estimate), its standard error (SE), the 95% confidence limits (estimate ± 2 SE), the coefficient of variation (CV = SE/estimate), and the design effect (DEFF).
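
As a rough illustration of how these measures relate to one another, the following Python sketch derives the confidence limits, CV, and design effect from an estimate and its standard error. It is not the procedure used for SYPE, whose standard errors come from the STATA SVY module. The function name `precision_measures`, the input values, and the use of p(1 - p)/n as the SRS variance of a proportion are all assumptions made for this example. The sketch returns both the variance ratio (`deff`) and its square root, the ratio of standard errors (`deft`).

```python
import math


def precision_measures(estimate, se, n=None, is_proportion=False, z=2.0):
    """Derive SE-based precision measures for a single survey estimate.

    Hypothetical helper for illustration only; `se` is assumed to be a
    design-based standard error computed elsewhere (e.g. by survey software).
    """
    lower = estimate - z * se
    upper = estimate + z * se
    if is_proportion:
        # Proportions lie in [0, 1], so the limits are truncated to that range.
        lower = max(lower, 0.0)
        upper = min(upper, 1.0)

    # Coefficient of variation: the standard error relative to the estimate.
    cv = se / estimate if estimate != 0 else math.nan

    deff = None
    deft = None
    if is_proportion and n is not None:
        # Assumed SRS variance of a proportion: p(1 - p) / n.
        srs_var = estimate * (1.0 - estimate) / n
        if srs_var > 0:
            deff = se ** 2 / srs_var   # ratio of variances
            deft = math.sqrt(deff)     # ratio of standard errors

    return {"ci": (lower, upper), "cv": cv, "deff": deff, "deft": deft}


# Made-up inputs, not SYPE results: a proportion of 0.20 with SE 0.02, n = 800.
result = precision_measures(estimate=0.20, se=0.02, n=800, is_proportion=True)
print(result)
```

With these made-up inputs the CV is 10%, below the 20% threshold mentioned above, and the variance ratio is 2.
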
For more information on the design effect, see tables S1-S39 and the listing on page 222 in the final report available among the external resources.

The stratifying variable used for variance estimation is defined as the intersection of residence type (urban, rural, informal urban area) and geographic region (Urban Governorates, urban Lower Egypt, rural Lower Egypt, urban Upper Egypt, rural Upper Egypt, and Frontier Governorates). This is the same stratification scheme that was followed in designing the SYPE sample, in which an independent sample was selected from each of the 10 substrata defined by the intersection of these two variables. The purpose of such stratification is to create strata that are as homogeneous as possible with regard to the survey variables, so that more precise survey estimates are attained.

The precision measures tables show that the standard errors of indicators are smaller for a population than for its subpopulations. Table 31 (in the final report available among the external resources), for example, shows that the standard error for the percentage of married male youth in urban residence (0.012) is smaller than the SE of the same indicator in urban Upper Egypt (0.029). Consequently, the 95% CI for the percentage of urban married male youth, (0.188, 0.237), is narrower than the corresponding 95% CI for the same indicator in urban Upper Egypt, (0.184, 0.297).

More information on the sampling errors of key indicators of SYPE is available in Appendix B in the final report available among the external resources.
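
The Table 31 figures quoted above can be cross-checked with a few lines of Python. This minimal sketch assumes each reported interval is symmetric, so its midpoint is taken as the implied point estimate; the estimates themselves are not quoted in this text. It then compares each half-width with the reported SE. The function name `implied_estimate_and_multiplier` is a hypothetical helper introduced only for this check.

```python
def implied_estimate_and_multiplier(lower, upper, se):
    """Back out the point estimate and SE multiplier from a reported CI.

    Assumes the interval is symmetric around the estimate (an assumption,
    since the point estimates are not quoted alongside the intervals).
    """
    midpoint = (lower + upper) / 2.0
    half_width = (upper - lower) / 2.0
    return midpoint, half_width, half_width / se


# (lower bound, upper bound, SE) as quoted from Table 31 of the final report.
reported = {
    "urban": (0.188, 0.237, 0.012),
    "urban Upper Egypt": (0.184, 0.297, 0.029),
}

for label, (lo, hi, se) in reported.items():
    mid, half, mult = implied_estimate_and_multiplier(lo, hi, se)
    print(
        f"{label}: implied estimate {mid:.4f}, half-width {half:.4f}, "
        f"half-width / SE = {mult:.2f}, implied CV = {se / mid:.1%}"
    )
```

Both half-widths come out at roughly twice the reported SE, in line with limits of the form estimate ± 2 SE. The wider interval for urban Upper Egypt follows directly from its larger standard error.
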