  MTH222 Version: Spring 2004

Location: Lazenska 4 Campus, Room: 001
Time: Thursdays 2:45 - 5:30 pm

Course Content

Week  Date         Material to be covered
1     February 5   Introductory Session
2     February 12  Data Presentation
3     February 19  Central Tendency and Dispersion
4     February 26  Sampling and Data Collection
5     March 4      Index Numbers
6     March 11     Correlation
7     March 18     Chi-Squared Test
8     April 1      Introduction to Probability
9     April 8      Binomial Probability Distribution
10    April 15     Normal Probability Distribution
11    April 22     Introduction to Statistical Inference
12    April 29     Revision Session
13    May 6        Final Exam
14    May 13       Project Report Presentations

Week 1 - February 5, 2004

Introductory Session

Students will be informed about the requirements of the course and given advice on how to achieve the best results. Instructions for the Group Project will also be given.

The broader social context of statistical work will be illustrated using police and social service statistics.

Week 2 - February 12

Data Presentation

Key Concepts:

Raw data,
sample size n,
variables and their types,
grouped data,
categorization of data,
rules of unambiguity and comprehensiveness,
frequency tables,
pie charts,
bar charts (simple, compounded, multiple),
histograms and frequency polygons.

Table one - Student Housing

The following table of raw data comes from an internal coursebook used by a college in the North of England. I have included it for several reasons, one being that its shortcomings are quite illustrative as well. The table is a result of an internal survey of the housing situation conducted on a sample of 36 students.

Can you figure out the wording of the questionnaire used?

Gender  Age  Accommodation Type  Monthly Rent
M       18   Lodgings            125
M       20   Rented Property     130
F       18   Own or Parents'     120
F       19   Dormitory           128
F       18   Dormitory           137
F       21   Rented Property     141
M       25   Own or Parents'     162
F       18   Lodgings            153
M       18   Dormitory           136
M       18   Dormitory           143
F       19   Rented Property     129
M       18   Rented Property     138
M       19   Rented Property     135
M       19   Rented Property     141
F       20   Dormitory           152
F       18   Lodgings            140
F       20   Lodgings            133
M       21   Lodgings            129
M       18   Rented Property     157
M       18   Caravan             100
F       19   Rented Property     146
M       19   Rented Property     135
F       18   Dormitory           115
F       18   Dormitory           110
M       20   Own or Parents'     134
F       20   Dormitory           127
M       18   Dormitory           132
F       18   Rented Property     135
F       20   Lodgings            126
F       28   Own or Parents'     110
M       18   Own or Parents'     112
M       19   Rented Property     130
F       21   Lodgings            140
M       23   Dormitory           135
F       18   YWCA                145
M       20   Lodgings            132

1. Identify the types of variables and categories of data.

2. Are the rules of categorizing data kept?

3. Set up a frequency table for each variable. (Treat rent as a continuous variable.)

4. Cross-tabulate gender and age, and age and accommodation.

5. Draw compounded and multiple bar charts for the above tables.

6. Draw a pie chart for the variable of accommodation.

7. Draw a histogram and a frequency polygon for rent.

8. The data tell us a lot about the character of the college where they were collected as well as the community in which it is located. Try to describe these.

Example of a particularly deceptive truncated graph.

Exercise

The following is a table of distances (in miles) travelled by a sample of 40 company vehicles in a given week:

 138 164 150 144 125 149 157 146 158 140 147 136 148 152 144 168 126 138 176 163 119 154 165 146 173 142 147 135 153 140 135 161 145 135 142 150 156 145 128 133

Group the data into seven intervals and draw up a histogram. Answers
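The grouping step can be sketched in a few lines of Python. This is only an illustration: the exercise does not prescribe the class boundaries, so the seven classes of width 10 starting at 110 are an assumption, and the names `START`, `WIDTH` and `CLASSES` are made up for this sketch.

```python
# Distances (miles) for the 40 vehicles, copied from the exercise.
distances = [
    138, 164, 150, 144, 125, 149, 157, 146, 158, 140,
    147, 136, 148, 152, 144, 168, 126, 138, 176, 163,
    119, 154, 165, 146, 173, 142, 147, 135, 153, 140,
    135, 161, 145, 135, 142, 150, 156, 145, 128, 133,
]

# Assumed class layout (not given in the exercise):
# seven classes of width 10, the first one starting at 110.
START, WIDTH, CLASSES = 110, 10, 7

# Count how many distances fall into each class.
counts = [0] * CLASSES
for d in distances:
    counts[(d - START) // WIDTH] += 1

# Print the frequency table with a crude text histogram.
for i, f in enumerate(counts):
    lower = START + i * WIDTH
    upper = lower + WIDTH - 1
    print(f"{lower}-{upper}  {f:2d}  {'*' * f}")
```

A different choice of starting point or class width gives a different, equally valid table, which is worth trying before drawing the histogram by hand.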

Students will announce members of the Group Project teams.
Homework: Francis: Student exercises 1, 4, 6, 8, 9 on page 45

Week 3 - February 19

Central Tendency and Dispersion

Statistical measures of representative value (values typical of a data set) and spread of data

Key Concepts:
Central values (averages), mean, mode, median, range,
quartiles, interquartile range, ogive,
standard deviation,
coefficient of dispersion,
skew.

The following refers to the table of Student Housing

1. Find the mode, mean, and median for the variable of age.
Is there a skew? Which of the measures is the most representative one in this case?

2. Find the mean of both raw and grouped data for rent.
Compare the results and explain the difference. Is there a skew? Which is the modal class?

3. Draw up an ogive for rent to determine the median.

4. Determine the range of age and rent.

5. Find the respective interquartile ranges.

6. What is the relation between the median and the quartiles?

7. Find the standard deviation of age using the raw data and of rent using the grouped data.

8. Determine the coefficient of dispersion for both of the above.

9. What do the measures of dispersion tell us about the data?

Make sure you understand the role played by central values and measures of spread in statistics. It is absolutely critical for your understanding of the second half of the semester.

Exercises

The following refers to the table of distances travelled by company vehicles which was used in last week's class.

1. Calculate the mean, median, and mode by using

(i) the raw data
(ii) the frequency distribution

2. What do these statistics tell us about the distribution of data?

3. Using the raw data, calculate:

the range, interquartile range, variance, and standard deviation.

4. Calculate the standard deviation using the frequency distribution. Answers
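For checking hand calculations on the raw data, Python's standard `statistics` module may help. The sketch below is not the official answer sheet: it assumes Python 3.8 or later, and the quartile convention (`method="inclusive"`) is one of several in use, so the interquartile range may differ slightly from a textbook method. Both the population and the sample standard deviation are shown because the exercise does not say which divisor is intended.

```python
import statistics

# Distances (miles) for the 40 vehicles, copied from last week's exercise.
distances = [
    138, 164, 150, 144, 125, 149, 157, 146, 158, 140,
    147, 136, 148, 152, 144, 168, 126, 138, 176, 163,
    119, 154, 165, 146, 173, 142, 147, 135, 153, 140,
    135, 161, 145, 135, 142, 150, 156, 145, 128, 133,
]

# Central values.
mean = statistics.mean(distances)
median = statistics.median(distances)
modes = statistics.multimode(distances)  # list of all most frequent values

# Spread.
data_range = max(distances) - min(distances)
# Assumption: "inclusive" quartiles; other conventions give slightly different cut points.
q1, q2, q3 = statistics.quantiles(distances, n=4, method="inclusive")
iqr = q3 - q1
pop_variance = statistics.pvariance(distances)  # divides by n
pop_sd = statistics.pstdev(distances)           # divides by n
sample_sd = statistics.stdev(distances)         # divides by n - 1

print("mean", mean, "median", median, "mode(s)", modes)
print("range", data_range, "IQR", iqr)
print("variance", round(pop_variance, 2), "sd (n)", round(pop_sd, 2),
      "sd (n-1)", round(sample_sd, 2))
```

Comparing these raw-data figures with the ones obtained from the frequency distribution shows how much information is lost by grouping.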

Group Project: Each team will announce the subject chosen for their research.
Homework: Francis: all exercises on pages 75-6, 83-4, and exercises 5a and 6 on page 109.

Week 4 - February 26

Sampling and Data Collection

Students will bring draft questionnaires to be used in the Group Projects for discussion.

Key Concepts:
Population census, representative sample, primary and secondary data, sampling methods, simple and stratified random sample, non-random sampling methods (quota, cluster, multi-stage, systematic, convenience sampling), data collection methods (postal, telephone, face-to-face, observation), questionnaire design, dichotomous and multiple-choice questions, processing open-ended questions, sources of error and bias, planning a statistical survey.

Stages of a Statistical Survey

A. Planning Stage

1. Define the aims of your survey and state the key hypothesis.

2. Determine the population and identify the possible location of its members.
Check any relevant secondary data and population parameters.

3. Choose a suitable sample size and sampling scheme.

4. Decide upon the method of data collection.

5. Design a questionnaire. Remember that the questions are the operational definitions of your future variables.

6. Select and train any personnel involved in data collection.

B. Fieldwork

7. Select the sample and collect the data.

8. Follow up non-responses if possible.

9. Collate and code the answers.

10. Screen the data for recording errors and outliers.

11. Carry out any statistical computations.

12. Identify and note any possible sources of error and bias.

13. Interpret the practical implication of your results.

C. Publication

14. Elaborate on any relevant background information you have upon the subject.

15. Explain your methodology and include the questionnaire.

16. Summarize the data and elaborate on your conclusions.

The 2001 Census data on the Czech Republic in Czech and in English

Exercise

Find the table of Student Housing and the subsequent summaries.

1. What sampling method was most probably used? Was it appropriate?
2. Which method of data collection was most probably used? Answers

Homework: Francis: exercises 1-4 on page 16

Week 5 - March 4

Index Numbers

Measuring change in economic and business indicators.

Key Concepts:
Base period,
simple price index (price relative),
simple aggregate index,
weighted indices,
Paasche and Laspeyres index,
changing the base period,
comparing indices with different base periods.

Description of the PX 50 Index
Composition of PX-D

Table Two - Index Numbers/Metals

Product  Price p0  Price p1  Price p2  Quantity q0  Quantity q1  Quantity q2
Copper   600       650       700       100          150          170
Zinc     300       600       650       10           11           11

p0 - price in the base year, q0 - quantity in the base year

1. Find the simple price index for each commodity in years 1 and 2.

2. Find the simple quantity index for the same.

3. Find the simple aggregate price index.

4. Explain the importance of weighting indices.

5. Calculate the Laspeyres index and explain its limitations.

6. Calculate the Paasche index and show its weaknesses.

7. Explain what reasons there may be for changing the base period.

8. There is a simple price index for platinum with the value of:
in year 0: 156,
in year 1: 166,
in year 2: 165.
Compare the development in the price of platinum and the other three metals.

Exercise

              Prices (in US dollars)    Quantities (in thousands)
Product Name  1995    1996    1997      1995    1996    1997
Thingums      2.50    2.70    3.00      69      70      72
Buckles       4.00    4.50    5.00      43      39      33
Sew-ons       1.60    1.80    2.00      50      55      56

1. Taking 1995 as the base year, find for 1996 and 1997

Simple price index (price relative) for each product,

Simple quantity index for each product,

Simple aggregate price index, Laspeyres index, Paasche index

2. According to the Laspeyres index, what has been the percentage rise between

(i) 1995 and 1996
(ii) 1995 and 1997
(iii) 1996 and 1997?

3. The trade unions have compiled a wage index for the average worker in this factory. Its base is in 1990 and its later values were 1995 = 119, 1996 = 128, 1997 = 138. Adjusting the base year, use the Laspeyres index above and the development of workers' wages to see if the wage rises were the main factor that contributed to the price rises.

4. What was the mean annual gross income the factory received from the sales of the above three products? And the standard deviation?

5. Draw a compound bar chart showing the quantities of the three products made over the three years. Answers
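The weighted indices in this exercise follow the usual definitions: the Laspeyres index weights prices by base-year quantities, 100 * sum(pn*q0) / sum(p0*q0), and the Paasche index weights them by current-year quantities, 100 * sum(pn*qn) / sum(p0*qn). The Python sketch below applies them to the table above; the function and variable names are illustrative only, and the simple aggregate index is taken here as the ratio of unweighted price totals.

```python
# Prices (US dollars) and quantities (thousands) from the exercise table,
# in the order Thingums, Buckles, Sew-ons.
prices = {
    1995: [2.50, 4.00, 1.60],
    1996: [2.70, 4.50, 1.80],
    1997: [3.00, 5.00, 2.00],
}
quantities = {
    1995: [69, 43, 50],
    1996: [70, 39, 55],
    1997: [72, 33, 56],
}
BASE = 1995  # base year, as stated in the exercise


def weighted_sum(p, q):
    """Sum of price * quantity over all products."""
    return sum(pi * qi for pi, qi in zip(p, q))


def simple_aggregate(year):
    """Unweighted aggregate price index: total of prices relative to the base year."""
    return 100 * sum(prices[year]) / sum(prices[BASE])


def laspeyres(year):
    """Price index weighted by base-year quantities."""
    q0 = quantities[BASE]
    return 100 * weighted_sum(prices[year], q0) / weighted_sum(prices[BASE], q0)


def paasche(year):
    """Price index weighted by current-year quantities."""
    qn = quantities[year]
    return 100 * weighted_sum(prices[year], qn) / weighted_sum(prices[BASE], qn)


for year in (1996, 1997):
    print(year,
          round(simple_aggregate(year), 1),
          round(laspeyres(year), 1),
          round(paasche(year), 1))
```

The percentage rise between two non-base years in question 2 can then be read from the ratio of the two index values rather than from their difference.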

Homework: Francis: exercises 3 and 7 on page 179, suggested reading Part 5, Ch. 18 to 20.

Week 6 - March 11

Correlation
Measuring an association between two variables.

Minitest 1

Key Concepts:
Dependent and independent variables,
scatter diagrams,
linear and non-linear correlation,

rank and product-moment correlation coefficient,
critical values,
causality.

Exercises

Critical Values for Spearman Rank Coefficient: Table 1, Table 2
Critical Values for Pearson Product-Moment Coefficient: Table

1. Two groups of respondents were asked to try a set of seven brands of washing powder and rank them in order of preference. The results were as follows:

Brand    A  B  C  D  E  F  G
Group 1  1  3  4  7  6  2  5
Group 2  2  4  3  7  5  1  6

(a) Draw a scatter diagram to illustrate a possible link.

(b) Calculate a suitable correlation coefficient.

(c) Comment on the agreement between the two groups.

2. Eight people apply for a job and are assessed by an interview and by an aptitude test.

Candidate           A   B   C   D   E   F   G   H
Interviewer's rank  8   7   4   3   2   1   5   6
Percentage in test  50  30  70  60  70  80  60  60

(a) Calculate a suitable correlation coefficient.
(b) Comment on the agreement between the two forms of assessment.

3. Research grants are believed to be distributed in line with the number of successfully
completed projects in one year. The following data was recorded for seven institutions.

Value of grant in millions Kč   30  40  60  10  30  50  40
Number of projects published    5   4   9   2   4   6   8

(a) Draw a scatter diagram (be careful about x and y variables).

(b) Calculate the product-moment coefficient.

Here is what they replied (both in millions of dollars):

Advertising  15   4.5  8   88   0.9  71   5   255  7.7  6.3
Revenue      305  25   99  301  38   587  13  896  46   689

Using Pearson's correlation coefficient, show what link there is if any.

5. In the table of Student Housing try to identify a possible link between the student's age and rent paid. Use Pearson's coefficient.

6. In the table Index Numbers/Metals try to determine the relation between price and consumption in three separate calculations. Answers
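Both coefficients used in these exercises can be computed with a few lines of plain Python. The sketch below uses the common textbook forms: Spearman's coefficient as 1 - 6*sum(d^2) / (n(n^2 - 1)), which is only exact when there are no tied ranks, and Pearson's product-moment coefficient in its computational form. The function names are illustrative; the data are the washing powder ranks from exercise 1 and the grant figures from exercise 3.

```python
from math import sqrt


def spearman(rank_x, rank_y):
    """Spearman rank coefficient, 1 - 6*sum(d^2) / (n(n^2 - 1)).

    Assumes the inputs are already ranks and that there are no ties.
    """
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def pearson(x, y):
    """Pearson product-moment coefficient (computational formula)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Exercise 1: preference ranks for brands A to G.
group_1 = [1, 3, 4, 7, 6, 2, 5]
group_2 = [2, 4, 3, 7, 5, 1, 6]
r_s = spearman(group_1, group_2)

# Exercise 3: grant value (millions Kč) and number of projects published.
grants = [30, 40, 60, 10, 30, 50, 40]
projects = [5, 4, 9, 2, 4, 6, 8]
r = pearson(projects, grants)

print("Spearman:", round(r_s, 3))
print("Pearson: ", round(r, 3))
```

The coefficient alone does not settle significance; it still has to be compared with the critical value for the given sample size. Exercise 2 contains tied test percentages, so its ranks need averaging before the simple Spearman formula is applied.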

Homework: Francis: exercises 3 and 7 on page 147

Week 7 - March 18

Chi-Squared Distribution χ²

Looking for patterns in a crosstabulation of two sets of categorical data.

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$
with (r-1)(c-1) degrees of freedom

Key Concepts:
Null hypothesis,
confidence levels,
degree of freedom,
expected values,
calculated chi-square (test statistic),
critical values,
relative discrepancies against expectation,
identifying patterns

Exercises

Table 1 or Table 2

1. From two tosses of a fair coin, the following outcomes were found

207  146  121  86    Total: 560

Test the hypothesis that the coin is unbiased at 5% significance level.

2. A random sample of 30 men and 70 women were asked to test if they could tell between butter and margarine. The results were

Gender  Could tell  Couldn't tell
Male    21          9
Female  45          25

Do these results show that men are better at distinguishing between these two types of fat?

3. A national survey of consumers (USA, 1999)
concerning plans to purchase a new car in the next year revealed the following

Disposable family income
(in thousands of dollars)  Plan to purchase  Not decided  Don't plan  Total
Under 20                   60                45           308         413
20 to 60                   130               60           425         615
60 to 99                   140               52           361         553
over 100                   115               51           303         469
Total                      445               208          1397        2050

(a) Test the null hypothesis that there is no difference in the purchase plans of the consumers in the various income levels.
(b) Eliminate the under 20,000 class, so that there are only 1637 responses to be analysed and test the hypothesis again.

4. In the table of school-leavers in the Probability exercises, test for a possible association between the school-leaving age and the future career.

5. In the table of Student Housing see if there is any link between gender or age on the one hand and accommodation type on the other. Answers
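The test statistic for a crosstabulation can be computed mechanically once the expected values are known. The sketch below, in plain Python, derives each expected value as (row total × column total) / grand total and sums (o - e)² / e over all cells. It is a minimal illustration using the butter and margarine table from exercise 2; the function name is made up, and no continuity correction is applied, which some textbooks recommend for 2 × 2 tables.

```python
def chi_squared(observed):
    """Return (test statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected value under the null hypothesis
            stat += (o - e) ** 2 / e

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Exercise 2: rows are Male, Female; columns are Could tell, Couldn't tell.
butter = [
    [21, 9],
    [45, 25],
]
stat, df = chi_squared(butter)
print("chi-squared =", round(stat, 3), "with", df, "degree(s) of freedom")
# The statistic is then compared with the critical value from the tables,
# e.g. 3.841 for 1 degree of freedom at the 5% level.
```

The same function accepts larger tables, such as the 4 × 3 car purchase survey in exercise 3, without changes.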

Reading and exercises: there is nothing in Francis on this topic. Students are therefore required to do some independent library work and find relevant literature.

Advice: The chi-squared test is a very popular technique used in Group Projects.

March 25

Mid-Term Break

Students are expected to make some progress on their Group Projects.

Week 8 - April 1

Introduction to Probability

Mathematical expression of chance.

Key Concepts:
Events,
experimental and a-priori probability,
mutually exclusive and complementary events,
certainty and impossibility,
Venn diagrams,
independent events.

Exercises

1. Records of an airline show that a third of its stewardesses are blonde, a half are graduates, and three quarters have more than a year's service behind them.
What is the probability that a randomly selected stewardess is blonde and a graduate with less than a year's service (assuming independence)?

2. Three balls are drawn from a bag containing 3 red, 4 white and 5 black balls. Calculate the probabilities that the three balls are

(a) all black
(b) 1 red, 1 white and 1 black.

3. The probability that student A solves a problem is two thirds and for student B it is four fifths. If both students try, what chance is there that the problem will be solved?

4. The probability that a customer buys spaghetti is 0.1. If the customer buys spaghetti, the probability that he/she will also buy spaghetti sauce is 0.4. What is the probability that the customer will buy both spaghetti and sauce?

5. Consider the following table of school-leavers

                                    S = left school at 16  H = left at a higher age  Total
E = going into full-time education  14                     18                        32
J = going into employment           96                     44                        140
O = other (unemployed etc.)         15                     13                        28
Total                               125                    75                        200

Find the probability that a school-leaver selected at random:

(a) went into full-time education
(b) went into employment
(c) went into either education or employment
(d) left at the age of 16
(e) left at a higher age
(f) left at 16 and went into full-time education
(g) left at 16 or went into full-time education Answers
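Probabilities from a two-way table like this one are relative frequencies, and "or" questions use the addition rule P(A or B) = P(A) + P(B) - P(A and B). The sketch below works through a few of the parts with exact fractions; the dictionary layout and the helper `prob` are illustrative choices, not part of the course material.

```python
from fractions import Fraction

# Cell counts from the school-leavers table, keyed by (career, leaving age).
table = {
    ("E", "S"): 14, ("E", "H"): 18,
    ("J", "S"): 96, ("J", "H"): 44,
    ("O", "S"): 15, ("O", "H"): 13,
}
total = sum(table.values())


def prob(predicate):
    """Relative frequency of the cells for which predicate(career, age) is true."""
    favourable = sum(n for (career, age), n in table.items() if predicate(career, age))
    return Fraction(favourable, total)


p_E = prob(lambda career, age: career == "E")
p_S = prob(lambda career, age: age == "S")
p_E_and_S = prob(lambda career, age: career == "E" and age == "S")
p_E_or_S = prob(lambda career, age: career == "E" or age == "S")

print("P(E) =", p_E)
print("P(S) =", p_S)
print("P(E and S) =", p_E_and_S)
print("P(E or S) =", p_E_or_S)

# Addition rule: the overlap must be subtracted once.
print(p_E_or_S == p_E + p_S - p_E_and_S)
```

Because E and S are not mutually exclusive, simply adding P(E) and P(S) would count the 14 students in the overlap twice.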

Homework: Francis: exercises 1, 2, 6, 8 on page 298, and 5, 7 on page 309
recommended reading Chapters 29, 30, Sections 6-10 in 31

Week 9 - April 8

Binomial Probability Distributions

Probabilities of the values of a discrete random variable.
Minitest 2

Key Concepts:
Parameters of a distribution,
uniform and binomial probability distributions of discrete variables,
factorial numbers,
binomial coefficients,
the concepts of success (favoured event) and failure.

Exercises

1. A student is late for, on average, one lecture in four. If she has five lectures a week, what is the probability that she will

(a) be late for three of them?
(b) always come on time?
(c) be late for each lecture?

2. A manufacturer of Christmas crackers knows that three crackers in every twenty contain no paper hat. The crackers are sold in boxes of six. What is the probability that a randomly chosen box of crackers contains:

(a) no crackers without paper hats?
(b) exactly one cracker with no paper hat?
(c) fewer than three crackers without paper hats?
(d) three or more crackers without paper hats?
(e) six crackers without paper hats in them?

3. A company has found that on average 20% of its 50 accounts are outstanding at the end of the month.
What is the probability that at the end of a particular month:

(a) 10 accounts are outstanding, (b) 50 accounts are outstanding, (c) 20 are outstanding

4. If x is a binomial random variable, calculate the probability of x for each case:

(a) n = 4, x = 1, p = 0.3
(b) n = 3, x = 2, p = 0.8
(c) n = 2, x = 0, p = 0.25
(d) n = 5, x = 2, p = 0.33

5. Let x be a random variable with the following probability distribution:

x     0    1    2    3
P(x)  0.4  0.3  0.2  0.1

Answers

6. Consider an experiment of tossing an unbiased coin five times. Assess the probability of each of the possible outcomes.
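Binomial probabilities follow P(x) = C(n, x) p^x (1 - p)^(n - x), where n is the number of trials and p the probability of success. The short Python sketch below applies this to exercise 1 (five lectures, probability of being late 0.25), treating the lectures as independent trials, which the exercise implies but does not state. `math.comb` requires Python 3.8 or later; the function name is illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Exercise 1: "success" here means being late for a lecture.
n, p = 5, 0.25

late_three = binomial_pmf(3, n, p)   # (a) late for three lectures
never_late = binomial_pmf(0, n, p)   # (b) always on time
always_late = binomial_pmf(5, n, p)  # (c) late for each lecture

# The full distribution must add up to 1.
distribution = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(round(late_three, 4), round(never_late, 4), round(always_late, 4))
print(round(sum(distribution), 10))
```

Cumulative questions such as "fewer than three" in exercise 2 are answered by summing the relevant terms of the distribution.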

Week 10 - April 15

Normal Probability Distributions

Key Concepts:
Parameters of a normal probability distribution of a continuous variable,
Gaussian curve,
standard normal distribution,
the effects of μ and σ,
area under the normal distribution,
z values,
upper and lower tail probabilities,
normal approximation to the binomial.

Exercises
Table of Critical Values

1. In the standard normal distribution work out what percentage of z will

(a) lie between 0 and 1.
(b) be greater than 1.
(c) be less than 1.
(d) lie between 0.5 and 1.
(e) be less than 2.
(f) lie between -1 and 1.
(g) be less than -1.
(h) be more than -0.5.

Find the following probabilities:

(a) P (z < -0.46)
(b) P (z > -0.46)
(c) P (z < 1.37)
(d) P (z > 1.855)
(e) P (z > 1.083)
(f) P (-1.2 < z < 2.15)

2. Find the z values for the probabilities where the:

(a) upper tail probability is 0.05.
(b) lower tail one is 0.01.
(c) middle 95% lie.
(d) middle 99% lie.

3. A nationwide company finds that daily sales average 1050 units per salesperson with a standard deviation of 380.

(a) What proportion of salespersons sell fewer than 300 units a day?
(b) How much must one sell to be among the top 10% of the salesforce?

4. Last year, 10% of students failed an exam with an average mark of 50 and a pass mark of 40.

What is the standard deviation?

5. A market research team finds that only 1% of its approaches to potential customers result in a sale. What is the probability that out of 150 approaches:

(a) 3 sales result
(b) no more than 2 sales result
(c) more than 2 sales result
(d) no sales result Answers

Alternative versions of the Standard Normal Distribution Table: Version 1 and Version 2
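Table look-ups for the normal distribution can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The sketch below covers a few of the standard normal questions and the sales exercise; it assumes, as the exercise implicitly does, that daily sales are normally distributed. Printed tables are rounded, so small differences in the last digit are to be expected.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist()

p_between_0_and_1 = z.cdf(1) - z.cdf(0)          # area between z = 0 and z = 1
p_greater_than_1 = 1 - z.cdf(1)                  # upper tail beyond z = 1
p_between_minus1_and_1 = z.cdf(1) - z.cdf(-1)    # area between z = -1 and z = 1

z_upper_5 = z.inv_cdf(0.95)     # z value with an upper tail probability of 0.05
z_middle_95 = z.inv_cdf(0.975)  # the middle 95% lies between -z and +z

# Exercise 3: daily sales, assumed normal with mean 1050 and standard deviation 380.
sales = NormalDist(mu=1050, sigma=380)
p_below_300 = sales.cdf(300)            # proportion selling fewer than 300 units
top_10_threshold = sales.inv_cdf(0.90)  # sales needed to be in the top 10%

print(round(p_between_0_and_1, 4), round(p_greater_than_1, 4),
      round(p_between_minus1_and_1, 4))
print(round(z_upper_5, 3), round(z_middle_95, 3))
print(round(p_below_300, 4), round(top_10_threshold))
```

The same two calls, `cdf` for probabilities and `inv_cdf` for z values, correspond to reading the table forwards and backwards.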

Homework: Francis: exercises 3, 7 on page 323, and 6, 8 on page 336
suggested reading - relevant parts of Chapters 32 to 34

Group Projects
By now, students should have collected all the data and started processing them. If that is not the case, it is a reason for concern and for speeding up the work!

Week 11 - April 22

Introduction to Statistical Inference

Making claims about the population on the basis of sample data. Testing hypotheses about the population mean μ.

Key Concepts:
Sampling distribution of the mean,
standard error, confidence intervals for the population mean,
significance levels, type I (α) and type II (β) error,
null hypothesis (H0),
alternative hypothesis (H1),
one-tailed and two-tailed tests,
test statistic.

Exercises

1. A manufacturer knows that the variation in the tread life of the radial tyres they make is given by a standard deviation of 3000 miles. From a sample of 100 tyres tested, the sample mean is 32346 miles.

(a) Construct a 99% confidence interval estimate for the population mean tread life.
(b) Suppose that an estimate having the precision of ±100 miles is required. With what confidence can this precision be quoted?
(c) How many more tyres would have to be tested to have 99% confidence in an estimate of the same precision as in (b)?

2. The mean lifetime of 100 fluorescent light bulbs produced by a company is 1570 hours. The population standard deviation is known to be 120 hours. Test the hypothesis that the population mean lifetime of all the lights produced by the company is 1600 hours.

3. The data below refer to a random sample of 30 suburban homes in and around Prague and their valuations (in millions of crowns):

 3.8 4.9 5.4 2.9 6.2 6 4.7 2.4 5.1 6.2 3.5 3.4 3 4.3 4.3 7.2 6.1 5 5 5.2 4.1 3.1 3.6 5.9 4.6 4.3 4.8 1.9 4.2 6.5

(a) Estimate the population mean and variance of house prices in the region.
(b) Find 95% and 99% confidence intervals for the true mean price. It may be assumed that the standard deviation is a million crowns.
(c) A recent report suggested that the mean price is actually 5.1 million crowns. Test the claim assuming the standard deviation is a million crowns. Answers
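When the population standard deviation is known, both the confidence interval and the test for the mean rest on the standard error σ/√n. The Python sketch below illustrates this with the figures from exercises 1 and 2. The function names are made up for the example, and the test is taken to be two-tailed, which is an assumption because exercise 2 states neither the alternative hypothesis nor the significance level.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()  # standard normal distribution


def confidence_interval(sample_mean, sigma, n, level):
    """Confidence interval for the population mean when sigma is known."""
    standard_error = sigma / sqrt(n)
    z_crit = z.inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    margin = z_crit * standard_error
    return sample_mean - margin, sample_mean + margin


def z_test(sample_mean, mu0, sigma, n):
    """Test statistic and two-tailed p-value for H0: population mean = mu0."""
    stat = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - z.cdf(abs(stat)))  # assumption: two-tailed alternative
    return stat, p_value


# Exercise 1(a): tyre tread life, sigma = 3000, n = 100, sample mean = 32346.
low, high = confidence_interval(32346, 3000, 100, 0.99)
print("99% CI:", round(low), "to", round(high))

# Exercise 2: light bulbs, H0: mu = 1600, sample mean 1570, sigma = 120, n = 100.
stat, p_value = z_test(1570, 1600, 120, 100)
print("z =", round(stat, 2), "p-value =", round(p_value, 4))
```

Whether the null hypothesis is rejected depends on the significance level chosen beforehand: the p-value is compared with α, or equivalently the test statistic with the critical z value.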

Homework: Francis: Student exercises 10 and 15 in Chapter 34 Section 24
suggested reading Chapter 34, Sections 15 to 23

Week 12 - April 29

Revision Session

Make sure you bring all your problems and queries for discussion. There will not be another chance before the final exam and the project presentations.

Week 13 - May 6

Final Examination

Students are advised to arrive at least 10 minutes before the start. Late arrivals will not be admitted. Use the toilets before the exams - nobody will be allowed to re-enter the examination room. All bags, mobile phones, coats, books and notes must be placed along the front wall. The only things the student may take to the desk are a calculator, a ruler, pens and pencils (of several different colors), spare batteries and drinking water. Other items only at the discretion of the examiner. Mobile phones must be switched off - any disruption of the exam may lead to the student's disqualification. Questions are allowed only in the first ten minutes of the exams - so read the exam questions carefully before you start answering. The desks must be placed at least two meters (six feet) apart. No communication is allowed including swapping calculators. No dictionaries are allowed. Students may leave the room at any moment but cannot reenter.

Maximum time is 120 minutes. Failure to return the answer book in time may result in disqualification. The minimum requirement for the exam is 20 points. A lower score will result in an incomplete grade.

Only two out of the three questions will be graded. Any attempt to answer additional questions is a waste of time! Complete formulae sheets are provided attached to the answer books along with all the necessary statistical tables. Illegible answers, nonsensical sentences and incomplete graphs will be ignored. Extra points can be gained for well organized answers, clear explanations and logical interpretation of results.

Week 14 - May 13

Group Project Presentations
Queries: Jansen Raichl, February 1, 2004