I. Descriptive Statistics and Probability
Why statistics?
- Present and describe information - Chs. 1, 2, 3
- Draw conclusions on a larger population based on sample - Chs. 4, 14, 5, 6, 7, 8, 9, 10
- Improve processes - Ch. 15
- Obtain forecasts for variables of interest - Chs. 11, 12, 13
A. Collecting and Presenting Data
1. Definitions
a. Population - totality of items under consideration
b. Sample - subset of population selected for analysis
c. Parameter - summary measure that describes a characteristic of a population
d. Statistic - summary measure that describes a characteristic of a sample from a population
e. Descriptive statistics - describes the characteristics of a set of data
f. Inferential statistics - estimate characteristics of a population based on sample results
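The parameter / statistic distinction can be illustrated with a minimal sketch. The population below is made-up data generated only for illustration, and the names `population`, `sample`, and `estimation_error` are assumptions, not part of the course material.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 made-up values (illustrative data only).
population = [rng.uniform(0, 500) for _ in range(1000)]

# Parameter: summary measure computed from every item in the population.
population_mean = mean(population)

# Statistic: the same summary measure computed from a sample of 50 items.
sample = rng.sample(population, 50)
sample_mean = mean(sample)

# Inferential statistics uses sample_mean as an estimate of population_mean.
estimation_error = sample_mean - population_mean
print(f"parameter={population_mean:.2f}  statistic={sample_mean:.2f}  "
      f"error={estimation_error:+.2f}")
```

In practice the parameter is usually unknown, which is why the statistic is used to estimate it.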
2. Data
a. Sources
(1) Published sources - government, industrial, or individual
(2) Experimentation
(3) Survey
(4) Observation
(5) Point of service
b. Types of data
(1) Time-series - hold unit constant, vary across time
(2) Cross-section - hold time constant, vary across units
(3) Categorical data - categories
(4) Numerical data - numeric results
(a) Discrete data - results from a counting process
(b) Continuous data - results from a measuring process
3. Samples
a. Concepts
(1) Frame - list of all items from which sample will be drawn
(2) Replacement
- Sampling with replacement - observation returned to frame
- Sampling without replacement - observation not returned to frame
(3) Types of samples
- Probability sample - sample chosen on basis of known probabilities
- Nonprobability sample - probability of sample being chosen unknown
(4) Randomness
- Random number table - Table E.1, pp. 832 - 833
Excel function: =RANDBETWEEN(#1,#2)
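As a rough Python analogue of the Excel approach, the sketch below draws a random integer between two bounds and then samples a frame with and without replacement. The frame of item IDs 1 to 20 is hypothetical, and Python's standard `random` module is used here as a stand-in for the random number table.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical frame: item IDs 1..20.
frame = list(range(1, 21))

# Analogue of =RANDBETWEEN(1, 20): both endpoints are included.
one_draw = rng.randint(1, 20)

# Sampling with replacement: a drawn item goes back into the frame,
# so the same ID can appear more than once.
with_replacement = rng.choices(frame, k=5)

# Sampling without replacement: a drawn item is not returned,
# so each ID appears at most once.
without_replacement = rng.sample(frame, k=5)

print(one_draw, with_replacement, without_replacement)
```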
b. Sampling methods
(1) Simple random sample - each item equally likely to be chosen
(2) Systematic sample - choose every kth item from a list
Easier to do if data already in the form of a list
Also easier if one item produced at a time
(3) Stratified sample - divide into categories, random sample from each category
Want sample to match characteristics of population
(4) Cluster sample
- Divide population into clusters
- Choose clusters at random
- Random sample from each cluster
Should be homogeneous across clusters, heterogeneous within clusters
Less costly if observations scattered geographically
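The four sampling methods above can be sketched as follows. The frame, its stratum and cluster labels, and all function names are hypothetical. The stratified version assumes proportional allocation (the same fraction from each stratum), which is one way to make the sample match the population but not the only one.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical frame of 40 items, each with a stratum label and a cluster label.
frame = [{"id": i, "stratum": "A" if i < 30 else "B", "cluster": i // 10}
         for i in range(40)]

def simple_random_sample(items, n):
    """Each item is equally likely to be chosen."""
    return rng.sample(items, n)

def systematic_sample(items, n):
    """Pick a random start among the first k items, then take every kth item."""
    k = len(items) // n
    start = rng.randrange(k)
    return items[start::k][:n]

def stratified_sample(items, fraction):
    """Random sample of the same fraction from each stratum (assumed allocation)."""
    strata = {}
    for item in items:
        strata.setdefault(item["stratum"], []).append(item)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, round(len(members) * fraction)))
    return chosen

def cluster_sample(items, n_clusters, n_per_cluster):
    """Choose clusters at random, then take a random sample from each one."""
    clusters = {}
    for item in items:
        clusters.setdefault(item["cluster"], []).append(item)
    chosen = []
    for key in rng.sample(sorted(clusters), n_clusters):
        chosen.extend(rng.sample(clusters[key], n_per_cluster))
    return chosen
```

For example, `stratified_sample(frame, 0.2)` takes 6 items from stratum A (30 items) and 2 from stratum B (10 items), preserving the 3-to-1 ratio of the frame.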
4. Sources of error
a. Coverage error - exclude part of population
Selection bias
Ex. - Literary Digest
How Polls are Conducted
b. Nonresponse error - some people don't respond
Ex. - Call screening
Upper and lower classes less likely to respond
c. Sampling error - chance differences that depend on which individuals happen to be chosen
d. Measurement error
(1) Question wording - ambiguous or leading
Ex. - Unemployment rate
Microsoft Rigged the Survey?
(2) Interviewer's effect on respondent - try to please interviewer
"Halo" effect
Ex. - Race
(3) Effort made by respondent - exaggeration, lack of effort
Ex. - TV ratings, consumer surveys
Key ethical issue is intent - okay if errors made unintentionally, unethical if deliberately done
5. Presenting data
a. Ordered array - raw data in rank order
Use Sort function: Data | Sort
b. Stem-and-leaf display
Stem-and-leaf option in PHStat
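Outside of Excel and PHStat, an ordered array and a stem-and-leaf display can be built in a few lines. This is a minimal sketch on made-up two-digit data, assuming the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

# Hypothetical raw data (two-digit values), for illustration only.
raw = [34, 21, 45, 29, 38, 21, 47, 33, 30, 26]

# Ordered array: the raw data in rank order.
ordered = sorted(raw)

# Stem-and-leaf display: stem = tens digit, leaf = units digit.
stems = defaultdict(list)
for value in ordered:
    stems[value // 10].append(value % 10)

lines = [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
         for stem, leaves in sorted(stems.items())]
print("\n".join(lines))
# 2 | 1 1 6 9
# 3 | 0 3 4 8
# 4 | 5 7
```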
c. Frequency distribution - table of class groupings or categories
(1) Need sufficient number of classes (5 - 15, 3 - 10)
(2) Class interval
Width = range / number of classes
Better to round
(3) Class boundaries
Avoid overlapping
- Use Histogram function of Excel: Tools | Data Analysis | Histogram
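The class-width rule can be worked through on a small example. The data are hypothetical, and rounding the width up to a multiple of 5 is just one convenient reading of "better to round". Half-open classes are used so that the boundaries do not overlap.

```python
import math

# Hypothetical data, for illustration only.
data = [12, 15, 21, 24, 27, 30, 33, 35, 36, 41, 44, 48, 52, 58]
num_classes = 5

# Width = range / number of classes: (58 - 12) / 5 = 9.2
raw_width = (max(data) - min(data)) / num_classes

# Round up to a convenient width (assumed here: next multiple of 5), giving 10.
width = math.ceil(raw_width / 5) * 5

# Start the first class at a multiple of the width at or below the minimum: 10.
lower = (min(data) // width) * width

# Classes are [lower, lower + width), [lower + width, lower + 2 * width), ...
# so every value falls in exactly one class. For other data sets, check that
# the maximum still falls inside the last class after rounding.
counts = [0] * num_classes
for x in data:
    counts[(x - lower) // width] += 1

for i, count in enumerate(counts):
    start = lower + i * width
    print(f"{start} but less than {start + width}: {count}")
```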
d. Relative frequency distribution
Percentage distribution - convert relative frequencies to percentages
e. Cumulative distribution
Cumulative relative frequency distribution
Cumulative percentage distribution
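Starting from class frequencies, the relative, percentage, and cumulative distributions follow by simple arithmetic. The frequencies below are made up for illustration.

```python
from itertools import accumulate

# Hypothetical class frequencies, for illustration only.
counts = [2, 5, 8, 4, 1]
total = sum(counts)

# Relative frequency: each class count divided by the total.
relative = [c / total for c in counts]

# Percentage distribution: relative frequencies expressed as percentages.
percentage = [100 * c / total for c in counts]

# Cumulative distribution: running total of the class counts.
cumulative = list(accumulate(counts))

# Cumulative percentage distribution.
cumulative_percentage = [100 * c / total for c in cumulative]

print(percentage)             # [10.0, 25.0, 40.0, 20.0, 5.0]
print(cumulative_percentage)  # [10.0, 35.0, 75.0, 95.0, 100.0]
```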
f. Summary table
Frequency distribution for categorical data
g. Contingency table (cross-classification table)
Two simultaneous categorical variables
Use PivotTable function of Excel
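A contingency table simply counts how often each pair of categories occurs together. The sketch below does this with the standard library as a rough stand-in for a PivotTable. The records (region, purchased) are hypothetical.

```python
from collections import Counter

# Hypothetical records: (region, purchased), two categorical variables.
records = [
    ("East", "Yes"), ("West", "No"), ("East", "No"), ("West", "Yes"),
    ("East", "Yes"), ("West", "No"), ("East", "Yes"), ("West", "No"),
]

# Count each (row category, column category) pair.
cell_counts = Counter(records)

rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Cross-classification table as a nested dict: table[row][column] -> count.
table = {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}

# Marginal totals for each row and each column.
row_totals = {r: sum(table[r].values()) for r in rows}
col_totals = {c: sum(table[r][c] for r in rows) for c in cols}

print(table)
```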
6. Graphical presentation
a. Types of graphs
(1) Histogram
Use Histogram function of Excel: Tools | Data Analysis | Histogram
(2) Percentage polygon
(3) Cumulative percentage polygon (ogive)
(4) Pareto diagram
Use Histogram function of Excel: Tools | Data Analysis | Histogram
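The numbers behind a Pareto diagram are a summary table sorted by descending frequency, with a cumulative percentage alongside. The following sketch computes them for made-up category counts. Drawing the bars and the cumulative line is left to whatever charting tool is in use.

```python
# Hypothetical summary table: category -> frequency (made-up counts).
summary = {"Scratch": 10, "Dent": 25, "Misalignment": 50, "Other": 15}
total = sum(summary.values())

# Pareto ordering: categories sorted from most to least frequent.
ordered = sorted(summary.items(), key=lambda kv: kv[1], reverse=True)

# Each entry: (category, percentage, cumulative percentage).
pareto = []
running = 0
for category, count in ordered:
    running += count
    pareto.append((category, 100 * count / total, 100 * running / total))

for category, pct, cum_pct in pareto:
    print(f"{category:<14}{pct:6.1f}%{cum_pct:7.1f}%")
```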
b. Principles of graphical excellence
(1) Well-designed presentation of data that provides substance, statistics, and design
(2) Communicates complex ideas with clarity, precision, and efficiency
(3) Gives the viewer the largest number of ideas in the shortest time with the least ink
(4) Almost always involves several dimensions
(5) Requires telling the truth about the data