MTH222 Version: Spring 2004 Location: Lazenska 4 Campus, Room: 001 |
---|
Course Content
Week | Date | Material to be covered |
---|---|---|
1 | February 5 | Introductory Session |
2 | February 12 | Data Presentation |
3 | February 19 | Central Tendency and Dispersion |
4 | February 26 | Sampling and Data Collection |
5 | March 4 | Index Numbers |
6 | March 11 | Correlation |
7 | March 18 | Chi-Squared Test |
8 | April 1 | Introduction to Probability |
9 | April 8 | Binomial Probability Distribution |
10 | April 15 | Normal Probability Distribution |
11 | April 22 | Introduction to Statistical Inference |
12 | April 29 | Revision Session |
13 | May 6 | Final Exam |
14 | May 13 | Project Report Presentations |
Week 1 - February 5, 2004
Introductory Session
Students will be informed about the requirements of the course and given advice on how to achieve the best results. Instructions for the Group Project will be given.
The broader social context of statistical work will be illustrated using Police and Social Service statistics.
Data Presentation
Key Concepts:
Raw data,
sample size n,
variables and their types,
grouped data,
categorization of data,
rules of unambiguity and comprehensiveness,
frequency tables,
pie charts,
bar charts (simple, compounded, multiple),
histograms and frequency polygons.
Table one - Student Housing
The following table of raw data comes from an internal coursebook used by a college in the North of England. I have included it for several reasons, one being that its shortcomings are quite illustrative as well. The table is a result of an internal survey of the housing situation conducted on a sample of 36 students.
Can you figure out the wording of the questionnaire used?
Gender | Age | Accommodation Type | Monthly Rent |
---|---|---|---|
M | 18 | Lodgings | 125 |
M | 20 | Rented Property | 130 |
F | 18 | Own or Parents' | 120 |
F | 19 | Dormitory | 128 |
F | 18 | Dormitory | 137 |
F | 21 | Rented Property | 141 |
M | 25 | Own or Parents' | 162 |
F | 18 | Lodgings | 153 |
M | 18 | Dormitory | 136 |
M | 18 | Dormitory | 143 |
F | 19 | Rented Property | 129 |
M | 18 | Rented Property | 138 |
M | 19 | Rented Property | 135 |
M | 19 | Rented Property | 141 |
F | 20 | Dormitory | 152 |
F | 18 | Lodgings | 140 |
F | 20 | Lodgings | 133 |
M | 21 | Lodgings | 129 |
M | 18 | Rented Property | 157 |
M | 18 | Caravan | 100 |
F | 19 | Rented Property | 146 |
M | 19 | Rented Property | 135 |
F | 18 | Dormitory | 115 |
F | 18 | Dormitory | 110 |
M | 20 | Own or Parents' | 134 |
F | 20 | Dormitory | 127 |
M | 18 | Dormitory | 132 |
F | 18 | Rented Property | 135 |
F | 20 | Lodgings | 126 |
F | 28 | Own or Parents' | 110 |
M | 18 | Own or Parents' | 112 |
M | 19 | Rented Property | 130 |
F | 21 | Lodgings | 140 |
M | 23 | Dormitory | 135 |
F | 18 | YWCA | 145 |
M | 20 | Lodgings | 132 |
Tasks
1. Identify the types of variables and categories of data.
2. Are the rules of categorizing data kept?
3. Set up a frequency table for each variable. (Treat rent as a continuous variable.)
4. Cross-tabulate sex and age, and age and accommodation.
5. Draw compounded and multiple bar charts for the above tables.
6. Draw a pie chart for the variable of accommodation.
7. Draw a histogram and a frequency polygon for rent.
8. The data tell us a lot about the character of the college where they were collected as well as the community in which it is located. Try to describe these.
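A minimal sketch of tasks 3 and 6 for the accommodation variable, assuming the one-letter codes below (introduced here only as shorthand) stand for the six accommodation types in table order. It builds a frequency table and the corresponding pie-chart sector angles.

```python
from collections import Counter

# Accommodation column of the Student Housing table, in table order.
# The one-letter codes are shorthand introduced for this sketch only.
LABELS = {
    "L": "Lodgings", "R": "Rented Property", "O": "Own or Parents'",
    "D": "Dormitory", "C": "Caravan", "Y": "YWCA",
}
codes = "LRODDROLDDRRRRDLLLRCRRDDODDRLOORLDYL"

n = len(codes)                       # sample size n
freq = Counter(codes)                # absolute frequencies per category
# Each category gets a share of the 360 degrees proportional to its frequency.
angles = {c: 360 * f / n for c, f in freq.items()}

for code, f in freq.most_common():
    print(f"{LABELS[code]:<16} {f:>3} {f / n:>6.1%} {angles[code]:>6.1f} deg")
```

The two single-member categories (Caravan, YWCA) are worth a second look when you answer task 2 on the rules of categorizing data.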
Example of a particularly deceptive truncated graph.
Exercise
The following is a table of distances (in miles) travelled by a sample of 40 company vehicles in a given week:
138 | 164 | 150 | 144 | 125 | 149 | 157 | 146 |
158 | 140 | 147 | 136 | 148 | 152 | 144 | 168 |
126 | 138 | 176 | 163 | 119 | 154 | 165 | 146 |
173 | 142 | 147 | 135 | 153 | 140 | 135 | 161 |
145 | 135 | 142 | 150 | 156 | 145 | 128 | 133 |
Group the data into seven intervals and draw up a histogram.
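One possible grouping is sketched below. The class limits (110-119 up to 170-179) are an assumption chosen to cover the observed range of 119 to 176 miles; other choices of seven intervals are equally legitimate. The text histogram is only a stand-in for a properly drawn one.

```python
# Distances (in miles) from the table above, row by row.
distances = [
    138, 164, 150, 144, 125, 149, 157, 146,
    158, 140, 147, 136, 148, 152, 144, 168,
    126, 138, 176, 163, 119, 154, 165, 146,
    173, 142, 147, 135, 153, 140, 135, 161,
    145, 135, 142, 150, 156, 145, 128, 133,
]

# Assumed class limits: 110-119, 120-129, ..., 170-179.
LOWER, WIDTH, CLASSES = 110, 10, 7

counts = [0] * CLASSES
for d in distances:
    counts[(d - LOWER) // WIDTH] += 1   # index of the class containing d

for i, c in enumerate(counts):
    start = LOWER + i * WIDTH
    print(f"{start}-{start + WIDTH - 1}: {'#' * c} ({c})")
```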
Students will announce members of the Group Project teams.
Homework: Francis: Student exercises 1, 4, 6, 8, 9 on page 45
Central Tendency and Dispersion
Statistical measures of representative value (values typical of a data set) and spread of data
Key Concepts:
Central values (averages), mean, mode, median, range,
quartiles, interquartile range, ogive,
standard deviation,
coefficient of dispersion,
skew.
Tasks
The following refers to the table of Student Housing
1. Find the mode, mean, and median for the variable of age.
Is there a skew? Which of the measures is the most representative in this case?
2. Find the mean of both raw and grouped data for rent.
Compare the results and explain the difference. Is there a skew? Which is the modal class?
3. Draw up an ogive for rent to determine the median.
4. Determine the range of age and rent.
5. Find the respective interquartile ranges.
6. What is the relation between the median and the quartiles?
7. Find the standard deviation of age using the raw data and of rent using the grouped data.
8. Determine the coefficient of dispersion for both of the above.
9. What do the measures of dispersion tell us about the data?
Make sure you understand the role played by central values and measures of spread in statistics. It is absolutely critical for your understanding of the second half of the semester.
Exercises
The following refers to the table of distances travelled by company vehicles which was used in last week's class.
1. Calculate the mean, median, and mode by using
(i) the raw data
(ii) the frequency distribution
2. What do these statistics tell us about the distribution of data?
3. Using the raw data, calculate:
the range, interquartile range, variance, and standard deviation.
4. Calculate the standard deviation using the frequency distribution.
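The raw-data parts of exercises 1 and 3 can be checked with Python's standard `statistics` module, as in the sketch below. Two assumptions are worth noting: quartile conventions differ between textbooks, so the interquartile range may not match a hand calculation exactly, and the variance here divides by n (the `p...` functions) rather than n - 1.

```python
import statistics as st

# Distances (in miles) travelled by the 40 company vehicles.
distances = [
    138, 164, 150, 144, 125, 149, 157, 146,
    158, 140, 147, 136, 148, 152, 144, 168,
    126, 138, 176, 163, 119, 154, 165, 146,
    173, 142, 147, 135, 153, 140, 135, 161,
    145, 135, 142, 150, 156, 145, 128, 133,
]

mean = st.mean(distances)
median = st.median(distances)
mode = st.mode(distances)
data_range = max(distances) - min(distances)

# Quartiles using the module's default method; textbooks may use another rule.
q1, _, q3 = st.quantiles(distances, n=4)
iqr = q3 - q1

# pvariance/pstdev divide by n; st.variance/st.stdev would divide by n - 1.
variance = st.pvariance(distances)
std_dev = st.pstdev(distances)

print(f"mean={mean:.3f} median={median} mode={mode}")
print(f"range={data_range} IQR={iqr:.2f} variance={variance:.2f} sd={std_dev:.2f}")
```

Comparing the mean, median and mode printed here is a quick way into exercise 2.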
Answers
Group Project: Each team will announce the subject chosen for their research.
Homework: Francis: all exercises on pages 75-6, 83-4, and exercises 5a and 6 on page 109.
Sampling and Data Collection
Students will bring draft questionnaires to be used in the Group Projects for discussion.
Key Concepts:
Population census, representative sample, primary and secondary data, sampling methods, simple and stratified random sample, non-random sampling methods (quota, cluster, multi-stage, systematic, convenience sampling), data collection methods (postal, telephone, face-to-face, observation), questionnaire design, dichotomous and multiple-choice questions, processing open-ended questions, sources of error and bias, planning a statistical survey.
Stages of a Statistical Survey
A. Planning Stage
1. Define the aims of your survey and state the key hypothesis.
2. Determine the population and identify the possible location of its members.
Check any relevant secondary data and population parameters.
3. Choose a suitable sample size and sampling scheme.
4. Decide upon the method of data collection.
5. Design a questionnaire. Remember that the questions are the operational definitions of your future variables.
6. Select and train any personnel involved in data collection.
B. Fieldwork
7. Select the sample and collect the data.
8. Follow up non-responses if possible.
9. Collate and code the answers.
10. Screen the data for recording errors and outliers.
11. Carry out any statistical computations.
12. Identify and note any possible sources of error and bias.
13. Interpret the practical implication of your results.
C. Publication
14. Elaborate on any relevant background information you have upon the subject.
15. Explain your methodology and include the questionnaire.
16. Summarize the data and elaborate on your conclusions.
Exercise
Find the table of Student Housing and the subsequent summaries.
1. What sampling method was most probably used? Was it appropriate?
2. Which method of data collection was most probably used?
Homework: Francis: exercises 1-4 on page 16
read Chapter 2, Sections 1-25
Index Numbers
Measuring change in economic and business indicators.
Key Concepts:
Base period,
simple price index (price relative),
simple aggregate index,
weighted indices,
Paasche and Laspeyres indices,
changing the base period,
comparing indices with different base periods.
Table Two - Index Numbers/Metals
Product | Price p_{0} | Price p_{1} | Price p_{2} | Quantity q_{0} | Quantity q_{1} | Quantity q_{2} |
---|---|---|---|---|---|---|
Copper | 600 | 650 | 700 | 100 | 150 | 170 |
Lead | 200 | 250 | 300 | 50 | 50 | 60 |
Zinc | 300 | 600 | 650 | 10 | 11 | 11 |
p0 - price in the base year, q0 - quantity in the base year
Tasks
1. Find the simple price index for each commodity in years 1 and 2.
2. Find the simple quantity index for the same.
3. Find the simple aggregate price index.
4. Explain the importance of weighting indices.
5. Calculate the Laspeyres index and explain its limitations.
6. Calculate the Paasche index and show its weaknesses.
7. Explain what reasons there may be for changing the base period.
8. There is a simple price index for platinum with the value of:
in year 0: 156,
in year 1: 166,
in year 2: 165.
Compare the development in the price of platinum and the other three metals.
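The index calculations in tasks 1, 3, 5 and 6 follow a common pattern, sketched below for Table Two. The function names are illustrative only. The Laspeyres index weights prices by base-year quantities, 100 × Σ(p_n·q_0) / Σ(p_0·q_0), while the Paasche index uses current-year quantities, 100 × Σ(p_n·q_n) / Σ(p_0·q_n).

```python
# Table Two: prices and quantities for years 0 (base), 1 and 2.
prices = {
    "Copper": (600, 650, 700),
    "Lead":   (200, 250, 300),
    "Zinc":   (300, 600, 650),
}
quantities = {
    "Copper": (100, 150, 170),
    "Lead":   (50, 50, 60),
    "Zinc":   (10, 11, 11),
}

def price_relative(product, year):
    """Simple price index of one product, base year = 100."""
    p = prices[product]
    return 100 * p[year] / p[0]

def simple_aggregate(year):
    """Unweighted ratio of summed prices, base year = 100."""
    return (100 * sum(p[year] for p in prices.values())
            / sum(p[0] for p in prices.values()))

def laspeyres(year):
    """Price index weighted by base-year quantities."""
    num = sum(prices[k][year] * quantities[k][0] for k in prices)
    den = sum(prices[k][0] * quantities[k][0] for k in prices)
    return 100 * num / den

def paasche(year):
    """Price index weighted by current-year quantities."""
    num = sum(prices[k][year] * quantities[k][year] for k in prices)
    den = sum(prices[k][0] * quantities[k][year] for k in prices)
    return 100 * num / den

for year in (1, 2):
    print(f"year {year}: aggregate={simple_aggregate(year):.1f} "
          f"Laspeyres={laspeyres(year):.1f} Paasche={paasche(year):.1f}")
```

Note how far the unweighted aggregate index sits from the two weighted ones: zinc doubles in price but is bought in small quantities, which is the point of task 4.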
Exercise
Product Name | Price 1995 (US$) | Price 1996 (US$) | Price 1997 (US$) | Quantity 1995 (thousands) | Quantity 1996 (thousands) | Quantity 1997 (thousands) |
---|---|---|---|---|---|---|
Thingums | 2.50 | 2.70 | 3.00 | 69 | 70 | 72 |
Buckles | 4.00 | 4.50 | 5.00 | 43 | 39 | 33 |
Sew-ons | 1.60 | 1.80 | 2.00 | 50 | 55 | 56 |
1. Taking 1995 as the base year, find for 1996 and 1997
Simple price index (price relative) for each product,
Simple quantity index for each product,
Simple aggregate price index, Laspeyres index, Paasche index
2. According to the Laspeyres index, what has been the percentage rise between
(i) 1995 and 1996
(ii) 1995 and 1997
(iii) 1996 and 1997?
3. The trade unions have compiled a wage index for the average worker in this factory.
Its base is in 1990 and its later values were 1995 = 119, 1996 = 128, 1997 = 138.
Adjusting the base year, use the Laspeyres index above and the development of workers'
wages to see whether wage rises were the main factor contributing to the price rises.
4. What was the mean annual gross income the factory received from the sales of the above three products? And the standard deviation?
5. Draw a compound bar chart showing the quantities of the three products made over the three years.
Answers
Homework: Francis: exercises 3 and 7 on page 179, suggested reading Part 5, Ch. 18 to 20.
Correlation
Measuring an association between two variables.
Key Concepts:
Dependent and independent variables,
scatter diagrams,
linear and non-linear correlation,
rank and product-moment correlation coefficient,
critical values,
causality.
Exercises
Critical Values for Spearman Rank Coefficient: Table 1, Table 2
1. Two groups of respondents were asked to try a set of seven brands of washing powder and rank them in order of preference. The results were as follows:
Brand | A | B | C | D | E | F | G |
---|---|---|---|---|---|---|---|
Group 1 | 1 | 3 | 4 | 7 | 6 | 2 | 5 |
Group 2 | 2 | 4 | 3 | 7 | 5 | 1 | 6 |
(a) Draw a scatter diagram to illustrate a possible link.
(b) Calculate a suitable correlation coefficient.
(c) Comment on the agreement between the two groups.
2. Eight people apply for a job and are assessed by an interview and by an aptitude test.
Candidate | A | B | C | D | E | F | G | H |
---|---|---|---|---|---|---|---|---|
Interviewer's rank | 8 | 7 | 4 | 3 | 2 | 1 | 5 | 6 |
Percentage in test | 50 | 30 | 70 | 60 | 70 | 80 | 60 | 60 |
(a) Calculate a suitable correlation coefficient.
(b) Comment on the agreement between the two forms of assessment.
3. Research grants are believed to be distributed in line with the number of successfully
completed projects in one year. The following data was recorded for seven institutions.
Value of grant in millions Kč | 30 | 40 | 60 | 10 | 30 | 50 | 40 |
---|---|---|---|---|---|---|---|
Number of projects published | 5 | 4 | 9 | 2 | 4 | 6 | 8 |
(a) Draw a scatter diagram (be careful about x and y variables).
(b) Calculate the product-moment coefficient.
4. Directors of ten large companies were asked about their advertising budget and sales in 2000.
Here is what they replied (both in millions of dollars):
Advertising | 15 | 4.5 | 8 | 88 | 0.9 | 71 | 5 | 255 | 7.7 | 6.3 |
---|---|---|---|---|---|---|---|---|---|---|
Revenue | 305 | 25 | 99 | 301 | 38 | 587 | 13 | 896 | 46 | 689 |
Using Pearson's correlation coefficient, show what link there is if any.
5. In the table of Student Housing try to identify a possible link between the student's age and rent paid. Use Pearson's coefficient.
6. In the table Index Numbers/Metals try to determine the relation between price and consumption in three separate calculations.
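Both coefficients used in these exercises can be computed in a few lines. The sketch below applies the Spearman rank formula to exercise 1 and the product-moment (Pearson) formula to exercise 3; the function names are illustrative. The simple Spearman formula assumes there are no tied ranks, which holds in exercise 1 but not for the test percentages in exercise 2.

```python
from math import sqrt

def spearman(rank_x, rank_y):
    """r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)); assumes no tied ranks."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def pearson(x, y):
    """Product-moment correlation coefficient from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Exercise 1: washing powder rankings for brands A to G.
group1 = [1, 3, 4, 7, 6, 2, 5]
group2 = [2, 4, 3, 7, 5, 1, 6]

# Exercise 3: number of projects and grant value (millions Kc).
projects = [5, 4, 9, 2, 4, 6, 8]
grants = [30, 40, 60, 10, 30, 50, 40]

print(f"Spearman r_s = {spearman(group1, group2):.3f}")
print(f"Pearson r    = {pearson(projects, grants):.3f}")
```

A coefficient on its own says nothing about significance or causality; compare it with the critical values before commenting.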
Homework: Francis: exercises 3 and 7 on page 147
Suggested reading Chapter 14
Chi-Squared Distribution χ²
Looking for patterns in a crosstabulation of two sets of categorical data.
$$\chi^{2} = \sum \frac{(o - e)^{2}}{e}$$
where o is an observed frequency and e is the corresponding expected frequency.
Key Concepts:
Null hypothesis,
confidence levels,
degrees of freedom,
expected values,
calculated chi-square (test statistic),
critical values,
relative discrepancies against expectation,
identifying patterns
Exercises
Table 1 or Table 2
1. From two tosses of a fair coin, the following outcomes were found
Two Heads | Head/Tails | Tails/Head | Two Tails | Total |
---|---|---|---|---|
207 | 146 | 121 | 86 | 560 |
Test the hypothesis that the coin is unbiased at 5% significance level.
2. A random sample of 30 men and 70 women were asked to test whether they could tell the difference between butter and margarine. The results were
Gender | Could tell | Couldn't tell |
---|---|---|
Male | 21 | 9 |
Female | 45 | 25 |
Do these results prove that men are better at distinguishing between these two types of fat?
3. A national survey of consumers (USA, 1999)
concerning plans to purchase a new car in the next year revealed the following
Disposable family income (in thousands of dollars) | Plan to purchase | Not decided | Don't plan | Total |
---|---|---|---|---|
Under 20 | 60 | 45 | 308 | 413 |
20 to 60 | 130 | 60 | 425 | 615 |
60 to 99 | 140 | 52 | 361 | 553 |
over 100 | 115 | 51 | 303 | 469 |
Total | 445 | 208 | 1397 | 2050 |
(a) Test the null hypothesis that there is no difference in the purchase plans of the consumers in the various income levels.
(b) Eliminate the under 20,000 class, so that there are only 1637 responses to be analysed and test the hypothesis again.
4. In the table of school leavers in the Probability exercises, test for a possible association between the age at which students left school and their future career.
5. In the table of Student Housing see if there is any link between gender or age on the one hand and accommodation type on the other.
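The test statistic for any crosstabulation can be computed as sketched below, here for the butter and margarine table of exercise 2. Expected values come from row total × column total / grand total. The function name is illustrative, no continuity correction is applied (an assumption; some textbooks recommend one for 2 × 2 tables), and 3.841 is the tabulated 5% critical value for one degree of freedom.

```python
def chi_squared(observed):
    """Return (test statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total   # expected frequency
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Rows: male, female. Columns: could tell, couldn't tell.
butter = [[21, 9], [45, 25]]
stat, df = chi_squared(butter)

CRITICAL_5PCT_DF1 = 3.841   # tabulated critical value, 5% level, 1 d.f.
reject_null = stat > CRITICAL_5PCT_DF1
print(f"chi-squared = {stat:.3f}, df = {df}, reject H0: {reject_null}")
```

The same function accepts the 4 × 3 table of exercise 3, for which the critical value has to be looked up for 6 degrees of freedom.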
Answers
Reading and exercises: there is nothing in Francis, therefore, students are required to do some independent library work and find some relevant literature.
Advice: The chi-squared test is a very popular technique used in Group Projects.
Mid-Term Break
Students are expected to make some progress on their Group Projects.
Introduction to Probability
Mathematical expression of chance.
Key Concepts:
Events,
experimental and a-priori probability,
mutually exclusive and complementary events,
certainty and impossibility,
Venn diagrams,
independent events,
addition and multiplication of probabilities.
Exercises
1. Records of an airline show that a third of its stewardesses are blonde, a half are graduates, and three quarters have more than a year's service behind them.
What is the probability that a randomly selected stewardess is blonde and a graduate with less than a year's service (assuming independence)?
2. Three balls are drawn from a bag containing 3 red, 4 white and 5 black balls. Calculate the probabilities that the three balls are
(a) all black
(b) 1 red, 1 white and 1 black.
3. The probability that student A solves a problem is two thirds and for student B it is four fifths. If both students try, what chance is there that the problem will be solved?
4. The probability that a customer buys spaghetti is 0.1. If the customer buys spaghetti, the probability that he/she will also buy spaghetti sauce is 0.4. What is the probability that the customer will buy both spaghetti and sauce?
5. Consider the following table of school-leavers
Destination | S = left school at 16 | H = left at a higher age | Total |
---|---|---|---|
E = going into full-time education | 14 | 18 | 32 |
J = going into employment | 96 | 44 | 140 |
O = other (unemployed etc.) | 15 | 13 | 28 |
Total | 125 | 75 | 200 |
Find the probability that a school-leaver selected at random:
(a) went into full-time education
Homework: Francis: exercises 1, 2, 6, 8 on page 298, and 5, 7 on page 309
recommended reading Chapters 29, 30, Sections 6-10 in 31
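Exercises 2 to 4 above illustrate counting, complementary events and the multiplication rule. A minimal sketch using exact fractions follows; it assumes the balls in exercise 2 are drawn without replacement and that the two students in exercise 3 work independently, neither of which the exercises state explicitly.

```python
from fractions import Fraction
from math import comb

# Exercise 2: 3 red, 4 white, 5 black balls; three drawn without replacement.
ways = comb(12, 3)                               # all possible selections of 3
p_all_black = Fraction(comb(5, 3), ways)
p_one_each = Fraction(3 * 4 * 5, ways)           # one red, one white, one black

# Exercise 3: solved unless both students fail (complementary event).
p_a, p_b = Fraction(2, 3), Fraction(4, 5)
p_solved = 1 - (1 - p_a) * (1 - p_b)

# Exercise 4: multiplication rule, P(A and B) = P(A) * P(B given A).
p_both = Fraction(1, 10) * Fraction(4, 10)

print(p_all_black, p_one_each, p_solved, p_both)
```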
Binomial Probability Distributions
Probability of values of a random discrete variable.
Minitest 2
Key Concepts:
Parameters of a distribution,
uniform and binomial probability distributions of discrete variables,
factorial numbers,
binomial coefficients,
the concepts of success (favoured event) and failure.
Exercises
1. A student is late for, on average, one lecture in four. If she has five lectures a week, what is the probability that she will
(a) be late for three of them?
(b) always come on time?
(c) be late for each lecture?
2. A manufacturer of Christmas crackers knows that three crackers in every twenty contain no paper hat. The crackers are sold in boxes of six. What is the probability that a randomly chosen box of crackers contains:
(a) no crackers without paper hats?
(b) exactly one cracker with no paper hat?
(c) fewer than three crackers without paper hats?
(d) three or more crackers without paper hats?
(e) six crackers without paper hats in them?
3. A company has found that on average 20% of its 50 accounts are outstanding at the end of the month.
What is the probability that at the end of a particular month:
(a) 10 accounts are outstanding, (b) 50 accounts are outstanding, (c) 20 are outstanding
4. If x is a binomial random variable, calculate the probability of x for each case:
(a) n = 4, x = 1, p = 0.3
(b) n = 3, x = 2, p = 0.8
(c) n = 2, x = 0, p = 0.25
(d) n = 5, x = 2, p = 0.33
5. Let x be a random variable with the following probability distribution:
x | 0 | 1 | 2 | 3 |
---|---|---|---|---|
P(x) |
Does x have a binomial distribution? Explain your answer.
6. Consider an experiment of tossing an unbiased coin five times. Assess the probability of each of the possible outcomes.
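All of these exercises use the same formula, P(x) = C(n, x) · p^x · (1 - p)^(n - x). The sketch below applies it to exercise 1 (n = 5 lectures, p = 0.25 of being late); the function name is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Exercise 1: probability of being late for 0, 1, ..., 5 of the 5 lectures.
late = [binomial_pmf(x, 5, 0.25) for x in range(6)]

for x, prob in enumerate(late):
    print(f"P(late for {x}) = {prob:.4f}")
```

The six probabilities sum to 1, which is a useful check on any hand calculation.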
Normal Probability Distributions
Key Concepts:
Parameters of a normal probability distribution of a continuous variable,
Gaussian curve,
standard normal distribution,
the effects of μ and σ,
area under the normal distribution,
z values,
upper and lower tail probabilities,
normal approximation to the binomial.
1. In the standard normal distribution work out what percentage of z will
(a) lie between 0 and 1.
(b) be greater than 1.
(c) be less than 1.
(d) lie between 0.5 and 1.
(e) be less than 2.
(f) lie between -1 and 1.
(g) be less than -1.
(h) be more than -0.5.
Find the following probabilities:
(a) P (z < -0.46)
(b) P (z > -0.46)
(c) P (z < 1.37)
(d) P (z > 1.855)
(e) P (z > 1.083)
(f) P (-1.2 < z < 2.15)
2. Find the z values for the probabilities where the:
(a) upper tail probability is 0.05.
(b) lower tail probability is 0.01.
(c) middle 95% lie.
(d) middle 99% lie.
3. A nationwide company finds that the daily sales average 1050 units per salesperson with a standard deviation of 380.
(a) What proportion of salespersons sell fewer than 300 units a day?
(b) How much must one sell to be among the top 10% of the salesforce?
4. Last year, 10% of students failed an exam with an average mark of 50 and a pass mark of 40.
What is the standard deviation?
5. A market research team finds that only 1% of its approaches to potential customers result in a sale. What is the probability that out of 150 approaches:
(a) 3 sales result
(b) no more than 2 sales result
(c) more than 2 sales result
(d) no sales result
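Table look-ups for exercises 1 to 3 can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later), as sketched below. Results may differ from printed tables in the last digit because tables round z to two decimal places.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal distribution: mean 0, standard deviation 1

# Exercise 1: areas under the standard normal curve.
p_between_0_and_1 = z.cdf(1) - z.cdf(0)
p_greater_than_1 = 1 - z.cdf(1)

# Exercise 2: z values for given tail probabilities.
z_upper_5pct = z.inv_cdf(0.95)     # upper tail probability 0.05
z_middle_95 = z.inv_cdf(0.975)     # middle 95% lies between -z and +z

# Exercise 3: daily sales with mean 1050 and standard deviation 380.
sales = NormalDist(mu=1050, sigma=380)
p_below_300 = sales.cdf(300)
top_10pct_cutoff = sales.inv_cdf(0.90)

print(f"P(0 < z < 1) = {p_between_0_and_1:.4f}, P(z > 1) = {p_greater_than_1:.4f}")
print(f"z(5% upper) = {z_upper_5pct:.3f}, z(middle 95%) = {z_middle_95:.3f}")
print(f"P(sales < 300) = {p_below_300:.4f}, top 10% from {top_10pct_cutoff:.0f} units")
```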
Homework: Francis: exercises 3, 7 on page 323, and 6, 8 on page 336
suggested reading - relevant parts of Chapters 32 to 34
Group Projects
By now, students should have collected all the data and started processing them. If that is not the case, it is a reason for concern and for speeding up the work!
Introduction to Statistical Inference
Making claims about the population on the basis of sample data. Testing hypotheses about the population mean μ.
Key Concepts:
Sampling distribution of the mean,
standard error, confidence intervals for the population mean,
significance levels, type I (α) and type II (β) errors,
null hypothesis (H_{0}),
alternative hypothesis (H_{1}),
one-tailed and two-tailed tests,
test statistic.
Exercises
1. A manufacturer knows that the variation in the tread life of the radial tyres they make is given by a standard deviation of 3000 miles. From a sample of 100 tyres tested, the sample mean is 32346 miles.
(a) Construct a 99% confidence interval estimate for the population mean tread life.
(b) Suppose that an estimate having the precision of ±100 miles is required. With what confidence can this precision be quoted?
(c) How many more tyres would have to be tested to have 99% confidence in an estimate of the same precision as in (b)?
2. The mean lifetime of 100 fluorescent light bulbs produced by a company is 1570 hours. The population standard deviation is known to be 120 hours. Test the hypothesis that the population mean lifetime of all the lights produced by the company is 1600 hours.
3. The data below refer to a random sample of 30 suburban homes in and around Prague and their valuations (in millions of crowns):
3.8 | 4.9 | 5.4 | 2.9 | 6.2 | 6 |
4.7 | 2.4 | 5.1 | 6.2 | 3.5 | 3.4 |
3 | 4.3 | 4.3 | 7.2 | 6.1 | 5 |
5 | 5.2 | 4.1 | 3.1 | 3.6 | 5.9 |
4.6 | 4.3 | 4.8 | 1.9 | 4.2 | 6.5 |
(a) Estimate the population mean and variance of house prices in the region.
(b) Find 95% and 99% confidence intervals for the true mean price. It may be assumed that the standard deviation is a million crowns.
(c) A recent report suggested that the mean price is actually 5.1 million crowns. Test the claim assuming the standard deviation is a million crowns.
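When the population standard deviation is known, both the confidence interval and the hypothesis test rest on the standard error σ/√n. The sketch below works through exercise 1(a) and exercise 2; the function names are illustrative, and the 5% significance level and two-tailed alternative used for exercise 2 are assumptions, since the exercise does not state them.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def confidence_interval(sample_mean, sigma, n, level):
    """Interval for the population mean when sigma is known."""
    z = Z.inv_cdf(0.5 + level / 2)          # e.g. 2.576 for level = 0.99
    half_width = z * sigma / sqrt(n)        # z times the standard error
    return sample_mean - half_width, sample_mean + half_width

def z_test_two_tailed(sample_mean, mu0, sigma, n):
    """Test statistic and p-value for H0: population mean equals mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

# Exercise 1(a): tyres, sigma = 3000, n = 100, sample mean = 32346.
low, high = confidence_interval(32346, 3000, 100, 0.99)

# Exercise 2: light bulbs, H0: mean = 1600, sigma = 120, n = 100.
z_stat, p_value = z_test_two_tailed(1570, 1600, 120, 100)
reject_at_5pct = p_value < 0.05             # assumed significance level

print(f"99% CI: {low:.0f} to {high:.0f} miles")
print(f"z = {z_stat:.2f}, p = {p_value:.4f}, reject H0 at 5%: {reject_at_5pct}")
```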
Homework: Francis: Student exercises 10 and 15 in Chapter 34 Section 24
suggested reading Chapter 34, Sections 15 to 23
Revision Session
Make sure you bring all your problems and queries for discussion. There will not be another chance before the final exam and the project presentations.
Final Examination
Students are advised to arrive at least 10 minutes before the start. Late arrivals will not be admitted. Use the toilets before the exams - nobody will be allowed to re-enter the examination room. All bags, mobile phones, coats, books and notes must be placed along the front wall. The only things the student may take to the desk are a calculator, a ruler, pens and pencils (of several different colors), spare batteries and drinking water. Other items only at the discretion of the examiner.
Mobile phones must be switched off - any disruption of the exam may lead to the student's disqualification. Questions are allowed only in the first ten minutes of the exams - so read the exam questions carefully before you start answering. The desks must be placed at least two meters (six feet) apart. No communication is allowed including swapping calculators. No dictionaries are allowed. Students may leave the room at any moment but cannot reenter.
Maximum time is 120 minutes. Failure to return the answer book in time may result in disqualification.
The minimum requirement for the exam is 20 points. A lower score will result in an incomplete grade.
Only two out of the three questions will be graded. Any attempt to answer additional questions is a waste of time! Complete formulae sheets are provided attached to the answer books along with all the necessary statistical tables. Illegible answers, nonsensical sentences and incomplete graphs will be ignored. Extra points can be gained for well organized answers, clear explanations and logical interpretation of results.
Group Project Presentations
(Read the guidelines carefully)
All students must be present throughout this session. Print your reports one day before in order to prevent possible delays. The presentations will be videotaped and so must be at a professional level and well rehearsed. There is a time limit of 15 minutes, after which the lecturer will terminate the presentation. Please speak slowly and clearly, mention only the key points, and do not show too many graphs and figures. Instead, explain in simple English your motivation, background knowledge, the actual process of collecting the answers and the possible practical use of the results. Say anything that is interesting to make sure your audience keeps listening. Students from other groups may gain up to three points each for asking relevant questions.