Sampling procedure
Algeria:
National sample size: 2,036 young people.
Final sample size in the data set after cleaning: 2,036. No observations were disregarded.
Sampling frame: the General Census of Population and Housing 2008, which provides a list of 1,198 districts and 21,502 households.
Sampling design: sub-sample of the Household Employment Survey (Enquête Emploi auprès des Ménages), conducted by the Office National des Statistiques in 2013.
Morocco:
National sample size: 2,000 young people.
Final sample size in the data set after cleaning: 1,854 young people. After data cleaning, 146 observations had to be excluded as they were out of the age range.
Sampling frame: data published by the High Commissioner for Planning (Haut-Commissariat au Plan, HCP) after the censuses carried out in 2004 and 2014 (http://www.hcp.ma/file/111366).
Sampling design: for details about the sampling design in Morocco, please refer to the “SAHWA Documentation Report-Final” in the documents.
Tunisia:
National sample size: 2,000 young people.
Final sample size in the data set after cleaning: 2,000 young people. No observations were removed.
Sampling frame: census data of the Institute of National Statistics, updated in 2014.
Sampling design: for details about the sampling design in Tunisia, please refer to the “SAHWA Documentation Report-Final” in the documents.
Egypt:
Initial national sample size before cleaning: 2,006.
Final sample size in the data set after cleaning: 1,970. During the cleaning process, 36 observations had to be disregarded for being out of the age range.
Sampling frame: the primary sampling units (PSUs) are selected from the master sample of the Central Agency for Public Mobilization and Statistics, based on the General Census 2006.
Sampling design: random selection from the SYPE 2014 sample, after excluding the border (frontier) governorates. The sample was selected in two stages. In the first stage, about 140 PSUs were selected from the pool of 451 PSUs of the SYPE 2014 sample. In the second stage, all individuals aged 15-29 (as of 2016) in the selected PSUs were interviewed.
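The two-stage selection described for Egypt can be sketched in Python as follows. This is only an illustration under stated assumptions: the roster structure, the PSU identifiers and the function name two_stage_sample are hypothetical, the first stage is assumed to be simple random sampling without replacement (the source does not state the selection method), and the toy data are made up.

```python
import random


def two_stage_sample(psu_roster, n_psus, age_min=15, age_max=29, seed=0):
    """Select PSUs at random, then keep every eligible individual in them.

    psu_roster: dict mapping a PSU id to a list of (person_id, age) tuples.
    """
    rng = random.Random(seed)
    # Stage 1: draw n_psus PSUs without replacement
    # (assumed simple random sampling; not stated in the source).
    selected_psus = rng.sample(sorted(psu_roster), n_psus)
    # Stage 2: take all individuals in the target age range in each selected PSU.
    sample = [
        (psu, person_id)
        for psu in selected_psus
        for person_id, age in psu_roster[psu]
        if age_min <= age <= age_max
    ]
    return selected_psus, sample


# Toy roster: 451 PSUs, each with three people aged 14, 20 and 30 (made-up data).
roster = {
    psu: [(f"{psu}-a", 14), (f"{psu}-b", 20), (f"{psu}-c", 30)]
    for psu in range(451)
}
psus, people = two_stage_sample(roster, n_psus=140)
print(len(psus), len(people))
```

Because every eligible person in a selected PSU is interviewed, the final sample size depends on how many 15-29 year-olds live in the selected PSUs rather than on a fixed quota.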
Lebanon:
National sample size: 2,000 young people.
Final sample size in the data set after cleaning: 2,000. No observations were removed.
Sampling frame: due to the lack of a household-based sampling frame, the Lebanese Statistical Agency uses geographical blocks in localities as the sampling units of analysis. These blocks are considered the basic unit in the sampling procedure. The Census of Buildings, Dwellings and Establishments (2004) and the Lebanese Household Budget Survey (2004) were used as sampling frames.
Sampling design: a two-stage sampling design based on the Census of Buildings, Dwellings and Establishments (2004) was used. The Lebanese territory is divided into 6 administrative regions, and these regions are divided into 26 smaller units or Caza. Each Caza in turn contains the primary sampling units, the blocks, which are bordered by streets and other barriers. The selection of the sampling units in each block is carried out according to the Census of Buildings (2004).
Step 1: the first stage consists of a random selection of blocks.
Step 2: random selection of primary sampling units in each block.
Weighting
The SAHWA YS dataset contains two different types of weighting variables: design or sampling weights and population weights.
Design or sampling weights (dweight variable in the SAHWA dataset): during the sampling process, most of the countries studied (Algeria, Egypt, Morocco and Tunisia) used complex sampling designs (either stratified or cluster). With these designs, some individuals or sub-regions may have a higher probability of being selected into the sample, which may lead to over-representation of some groups of respondents or sub-regions. Weighting the statistical analyses corrects for these inequalities, so that the results are not affected by possible sample bias and are representative of the total population.
To correct for unequal probabilities of selection in the target population due to complex sample designs, a design weight for each individual was calculated as:
dweight_ij = 1 / (probability of selection of the i-th individual in the j-th country)
Due to the self-weighted sampling design, no weighting is necessary in Lebanon. Therefore, all design weights for Lebanon are 1.
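As an illustration of the design weight defined above, the following Python sketch computes inverse-probability weights and compares an unweighted with a weighted mean. The selection probabilities and answers are made-up values, and the function names design_weight and weighted_mean are hypothetical helpers, not part of the SAHWA dataset or its documentation.

```python
def design_weight(selection_probability):
    """Inverse-probability design weight: 1 / P(selection)."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Made-up example: two respondents from an over-sampled group (p = 0.5)
# and one respondent from an under-sampled group (p = 0.25).
probabilities = [0.5, 0.5, 0.25]
answers = [1, 1, 0]  # e.g. a yes/no item coded 1/0

dweights = [design_weight(p) for p in probabilities]
unweighted = sum(answers) / len(answers)
weighted = weighted_mean(answers, dweights)
print(dweights, unweighted, weighted)
```

In a self-weighting design such as the one reported for Lebanon, every weight equals 1 and the weighted mean coincides with the unweighted mean.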
Population weights: in cross-national surveys, drawing samples of equal size from target populations of unequal size may lead to over-representation of smaller countries at the expense of larger ones. The target population size is the total number of people in the population under study. For the SAHWA YS, the target population is all young people between 15 and 29 years old. Population weights (pweight variable in the SAHWA dataset) adjust the data to ensure that each country is represented in proportion to its actual target population size. These weights aim to correct for the bias introduced by almost equal sample sizes but very different target population sizes.
They are calculated as: pweight = Target population in country / Sample size in country * 10
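The idea behind the population weight can be sketched in Python as below. This is an assumption-laden illustration: the population figures and country labels are invented, population_weight is a hypothetical helper, and the constant in the documented formula is represented by a hypothetical scale parameter because its exact placement is not clear from the formula as printed. A constant factor rescales all countries equally, so it does not change the relative weight of one country against another.

```python
def population_weight(target_population, sample_size, scale=1.0):
    """Ratio of target population to sample size, times an optional constant.

    `scale` is a hypothetical parameter standing in for the constant in the
    documented formula; it rescales all countries by the same factor.
    """
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    return scale * target_population / sample_size


# Made-up figures for two fictional countries with equal sample sizes.
countries = {
    "A": {"target_population": 10_000_000, "sample_size": 2_000},
    "B": {"target_population": 1_000_000, "sample_size": 2_000},
}
pweights = {name: population_weight(**info) for name, info in countries.items()}

# Each respondent in country A stands for ten times as many young people
# as a respondent in country B.
ratio = pweights["A"] / pweights["B"]
print(pweights, ratio)
```

For actual analyses, the pweight variable supplied in the dataset should be used rather than a recomputed value.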
Depending on the purpose of the investigation, the researcher needs to use one or both weighting variables:
• When analyzing data from a single country, only the design weight needs to be used.
• When analyzing data from two or more countries and the research interest is to compare scores across countries, only the design weight needs to be applied.
• When analyzing data to describe a group of countries or a region, without distinguishing across countries, both design and population weights need to be applied.
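The three rules above can be sketched in Python as follows. The respondent records are made up, and analysis_weight and weighted_share are hypothetical helpers. Combining the two weights by multiplying dweight and pweight is an assumption borrowed from common practice in cross-national surveys; the source only states that both weights need to be applied.

```python
def analysis_weight(dweight, pweight, pooled):
    """Return the weight to apply to one respondent.

    pooled=False: single-country analysis or cross-country comparison,
    so only the design weight is used.
    pooled=True: countries analysed together as one region, so both weights
    are used, combined here by multiplication (an assumption, not stated
    in the source).
    """
    return dweight * pweight if pooled else dweight


def weighted_share(rows, pooled):
    """Weighted share of respondents answering 1 on a 1/0 coded item."""
    weights = [analysis_weight(d, p, pooled) for _, d, p, _ in rows]
    answers = [a for *_, a in rows]
    return sum(w * a for w, a in zip(weights, answers)) / sum(weights)


# Made-up respondents: (country, dweight, pweight, answer coded 1/0).
respondents = [
    ("A", 1.0, 5.0, 1),
    ("A", 2.0, 5.0, 0),
    ("B", 1.0, 0.5, 1),
    ("B", 1.0, 0.5, 1),
]

# Single country: design weight only.
share_a = weighted_share([r for r in respondents if r[0] == "A"], pooled=False)
# Region as a whole: design and population weights together.
share_region = weighted_share(respondents, pooled=True)
print(share_a, share_region)
```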
For further details, please read the “SAHWA Documentation Report-Final” in the documentation tab.