Graduate Business Administration 502

INFORMATION AND ANALYSIS

Fall 2003
 
I. Descriptive Statistics and Probability

Why statistics?

  • Present and describe information - Chs. 1, 2, 3
  • Draw conclusions about a larger population based on a sample - Chs. 4, 14, 5, 6, 7, 8, 9, 10
  • Improve processes - Ch. 15
  • Obtain forecasts for variables of interest - Chs. 11, 12, 13

A. Collecting and Presenting Data

1. Definitions

a. Population - totality of items under consideration

b. Sample - subset of population selected for analysis

c. Parameter - summary measure that describes a characteristic of a population

d. Statistic - summary measure that describes a characteristic of a sample from a population

e. Descriptive statistics - describe the characteristics of a set of data

f. Inferential statistics - estimate characteristics of a population based on sample results

2. Data

a. Sources

(1) Published sources - government, industrial, or individual

(2) Experimentation

(3) Survey

(4) Observation

(5) Point of service

b. Types of data

(1) Time-series - hold unit constant, vary across time

(2) Cross-section - hold time constant, vary across units

(3) Categorical data - categories

(4) Numerical data - numeric results

(a) Discrete data - results from a counting process

(b) Continuous data - results from a measuring process

3. Samples

a. Concepts

(1) Frame - list of all items from which sample will be drawn

(2) Replacement

  • Sampling with replacement - observation returned to frame
  • Sampling without replacement - observation not returned to frame
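
The difference between the two replacement schemes can be sketched in Python. This is a minimal illustration, not part of the course materials; the frame of ten item numbers and the seed are hypothetical.

```python
import random

# Hypothetical frame: item numbers 1 through 10 (illustrative only).
frame = list(range(1, 11))
rng = random.Random(502)  # fixed seed so the run is repeatable

# With replacement: each drawn item goes back into the frame,
# so the same item can be drawn more than once.
with_replacement = rng.choices(frame, k=5)

# Without replacement: a drawn item is not returned,
# so no item can appear twice in the sample.
without_replacement = rng.sample(frame, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```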

(3) Types of samples

  • Probability sample - sample chosen on basis of known probabilities
  • Nonprobability sample - probability of sample being chosen unknown

(4) Randomness

  • Random number table - Table E.1, pp. 832-833

  • Spreadsheet

Excel function: =RANDBETWEEN(#1,#2)
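
For readers working outside a spreadsheet, a rough Python counterpart of RANDBETWEEN is sketched here. The helper name rand_between, the frame size of 500, and the seed are assumptions made for illustration. Like RANDBETWEEN, random.randint includes both endpoints.

```python
import random

def rand_between(low, high, rng=random):
    """Return a random integer from low to high, both ends included."""
    return rng.randint(low, high)

# Hypothetical use: pick 10 item numbers from a frame numbered 1 to 500.
rng = random.Random(2003)  # fixed seed so the run is repeatable
picks = [rand_between(1, 500, rng) for _ in range(10)]
print(picks)
```

Because each call is independent, the same number can come up twice, so this draws with replacement.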

  • Calculator

b. Sampling methods

(1) Simple random sample - each item equally likely to be chosen

(2) Systematic sample - choose every kth item from a list

Easier to do if data already in the form of a list

Also easier if one item produced at a time
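
A systematic sample can be sketched as follows. This is one common way to do it, assuming a random starting point within the first k items; the function name and the list of 100 invoice numbers are hypothetical.

```python
import random

def systematic_sample(items, n, rng=random):
    """Take every kth item from a list, starting at a random point in the first k."""
    k = len(items) // n          # sampling interval
    if k < 1:
        raise ValueError("sample size exceeds number of items")
    start = rng.randrange(k)     # random start among the first k positions
    return items[start::k][:n]   # every kth item, trimmed to n items

# Hypothetical list of 100 invoice numbers.
invoices = list(range(1001, 1101))
sample = systematic_sample(invoices, 10, random.Random(1))
print(sample)
```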

(3) Stratified sample - divide into categories, random sample from each category

Want sample to match characteristics of population
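
One way to make the sample match the population is proportional allocation: draw the same fraction from every category. The sketch below assumes that approach; the function name, the categories "A" and "B", and the 10% fraction are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, fraction, rng=random):
    """Split items into strata, then draw the same fraction at random from each."""
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        n = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: 60 items in category "A", 40 in category "B".
population = [(i, "A") for i in range(60)] + [(i, "B") for i in range(60, 100)]
sample = stratified_sample(population, lambda item: item[1], 0.10, random.Random(3))
print(sample)
```

With a 10% fraction the sample holds 6 items from "A" and 4 from "B", the same 60/40 split as the population.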

(4) Cluster sample

  • Divide population into clusters
  • Choose clusters at random
  • Random sample from each cluster

Clusters should be similar to one another (homogeneous across clusters), with each cluster as varied as the population (heterogeneous within clusters)

Less costly if observations scattered geographically
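
The three cluster-sampling steps can be sketched as below. The function name, the region names, and the customer IDs are hypothetical, chosen only to illustrate geographically scattered observations.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, rng=random):
    """Choose clusters at random, then take a random sample within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical population already divided into clusters by region.
clusters = {
    "North": list(range(100, 120)),
    "South": list(range(200, 220)),
    "East": list(range(300, 320)),
    "West": list(range(400, 420)),
}
result = cluster_sample(clusters, n_clusters=2, n_per_cluster=3, rng=random.Random(7))
print(result)
```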

4. Sources of error

a. Coverage error - exclude part of population

Selection bias

Ex. - Literary Digest

How Polls are Conducted

b. Nonresponse error - some people don’t respond

Ex. - Call screening

Upper and lower classes less likely to respond

c. Sampling error - sample differs from population because of which individuals are chosen by chance

d. Measurement error

(1) Question wording - ambiguous or leading

Ex. - Unemployment rate

Microsoft Rigged the Survey?

(2) Interviewer’s effect on respondent - try to please interviewer

"Halo" effect

Ex. - Race

(3) Effort made by respondent - exaggeration, lack of effort

Ex. - TV ratings, consumer surveys

Key ethical issue is intent - okay if errors made unintentionally, unethical if deliberately done

5. Presenting data

a. Ordered array - raw data in rank order

Use Sort function: Data | Sort

b. Stem-and-leaf display

Stem-and-leaf option in PHStat
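
For comparison, a stem-and-leaf display can be built by hand in a few lines of Python. This sketch assumes non-negative whole numbers, with the ones digit as the leaf and the remaining digits as the stem; the ten data values are hypothetical. It sorts the data first, so it also produces the ordered array.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by stem (tens and above); leaves are the ones digits."""
    rows = defaultdict(list)
    for v in sorted(values):     # sorting gives the ordered array
        rows[v // 10].append(v % 10)
    return {stem: rows[stem] for stem in sorted(rows)}

# Hypothetical data set of ten observations.
data = [23, 31, 25, 47, 38, 31, 29, 44, 36, 22]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
# prints:
# 2 | 2359
# 3 | 1168
# 4 | 47
```

Stems with no observations are simply omitted in this sketch.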

c. Frequency distribution - table of class groupings or categories

(1) Need sufficient number of classes (5 - 15, 3 - 10)

(2) Class interval

Width = range / number of classes

Better to round
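
The width calculation, including the rounding step, can be sketched as follows. Rounding up to a convenient multiple is an assumption about what "better to round" means here; the function name, the round_to parameter, and the data are hypothetical.

```python
import math

def class_width(values, n_classes, round_to=1):
    """Width = range / number of classes, rounded up to a multiple of round_to."""
    raw = (max(values) - min(values)) / n_classes
    return math.ceil(raw / round_to) * round_to

# Hypothetical data: smallest value 22, largest 47, so the range is 25.
data = [23, 31, 25, 47, 38, 31, 29, 44, 36, 22]
print(class_width(data, 4))              # raw width 6.25, rounded up to 7
print(class_width(data, 4, round_to=5))  # rounded up to a multiple of 5: 10
```

Rounding up rather than down keeps the classes wide enough to cover the whole range.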

(3) Class boundaries

Avoid overlapping

  • Use Histogram function of Excel: Tools | Data Analysis | Histogram
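
A frequency distribution with non-overlapping classes can also be tallied directly. The sketch below assumes each class includes its lower boundary and excludes its upper boundary, which is one way to avoid overlap; the function name, starting point, and data are hypothetical.

```python
def frequency_distribution(values, start, width, n_classes):
    """Count values in classes [lower, upper): lower boundary in, upper boundary out."""
    counts = [0] * n_classes
    for v in values:
        idx = int((v - start) // width)
        if not 0 <= idx < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[idx] += 1
    return [
        ((start + i * width, start + (i + 1) * width), count)
        for i, count in enumerate(counts)
    ]

# Hypothetical data, three classes of width 10 starting at 20.
data = [23, 31, 25, 47, 38, 31, 29, 44, 36, 22]
for (lower, upper), count in frequency_distribution(data, 20, 10, 3):
    print(f"{lower} but less than {upper}: {count}")
# prints:
# 20 but less than 30: 4
# 30 but less than 40: 4
# 40 but less than 50: 2
```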

d. Relative frequency distribution

Percentage distribution - convert relative frequencies to percentages
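
Relative frequencies and percentages follow directly from the class counts. A minimal sketch, using hypothetical counts of 4, 4, and 2:

```python
def relative_frequencies(counts):
    """Each class count divided by the total number of observations."""
    total = sum(counts)
    return [count / total for count in counts]

def percentages(counts):
    """Relative frequencies expressed as percentages."""
    total = sum(counts)
    return [100 * count / total for count in counts]

counts = [4, 4, 2]  # hypothetical class frequencies
print(relative_frequencies(counts))  # [0.4, 0.4, 0.2]
print(percentages(counts))           # [40.0, 40.0, 20.0]
```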

e. Cumulative distribution

Cumulative relative frequency distribution

Cumulative percentage distribution
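
A cumulative percentage distribution is a running total of the class counts, expressed as a share of all observations. A minimal sketch with hypothetical counts:

```python
from itertools import accumulate

def cumulative_percentages(counts):
    """Running total of class counts as a percentage of all observations."""
    total = sum(counts)
    return [100 * running / total for running in accumulate(counts)]

counts = [4, 4, 2]  # hypothetical class frequencies
print(cumulative_percentages(counts))  # [40.0, 80.0, 100.0]
```

The last entry is always 100, which is a quick check on the arithmetic.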

f. Summary table

Frequency distribution for categorical data
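
A summary table can be sketched by tallying each category and its percentage share. The function name and the survey responses are hypothetical.

```python
from collections import Counter

def summary_table(categories):
    """Return (category, count, percentage) rows, largest count first."""
    counts = Counter(categories)
    total = sum(counts.values())
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]

# Hypothetical survey responses on preferred payment method.
responses = ["card"] * 5 + ["cash"] * 3 + ["check"] * 2
for category, count, pct in summary_table(responses):
    print(f"{category:<6} {count:>3} {pct:>6.1f}%")
```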

g. Contingency table (cross-classification table)

Two simultaneous categorical variables

Use PivotTable function of Excel
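
The same cross-classification can be sketched without a spreadsheet by counting pairs of categories. The function name and the observations (gender by payment method) are hypothetical.

```python
from collections import Counter

def contingency_table(pairs):
    """Count observations for each combination of two categorical variables."""
    cell = Counter(pairs)
    rows = sorted({r for r, _ in cell})
    cols = sorted({c for _, c in cell})
    table = [[cell[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical observations: (gender, payment method).
observations = (
    [("F", "card")] * 3 + [("F", "cash")] * 2
    + [("M", "card")] * 1 + [("M", "cash")] * 4
)
rows, cols, table = contingency_table(observations)
print("    " + "  ".join(cols))
for name, counts in zip(rows, table):
    print(name, "  ", "     ".join(str(n) for n in counts))
```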

6. Graphical presentation

a. Types of graphs

(1) Histogram

Use Histogram function of Excel: Tools | Data Analysis | Histogram

(2) Percentage polygon

(3) Cumulative percentage polygon (ogive)

(4) Pareto diagram

Use Histogram function of Excel with the Pareto (sorted histogram) option checked: Tools | Data Analysis | Histogram
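
The numbers behind a Pareto diagram are the category counts sorted from largest to smallest, together with a cumulative percentage. A minimal sketch, with a hypothetical function name and hypothetical defect data:

```python
from collections import Counter
from itertools import accumulate

def pareto_table(categories):
    """Return (category, count, cumulative %) rows, sorted by descending count."""
    ordered = Counter(categories).most_common()
    total = sum(n for _, n in ordered)
    running = accumulate(n for _, n in ordered)
    return [(cat, n, 100 * cum / total) for (cat, n), cum in zip(ordered, running)]

# Hypothetical defect log from an inspection station.
defects = ["scratch"] * 5 + ["dent"] * 3 + ["stain"] * 2
for category, count, cum_pct in pareto_table(defects):
    print(f"{category:<8} {count:>3} {cum_pct:>6.1f}%")
# prints:
# scratch    5   50.0%
# dent       3   80.0%
# stain      2  100.0%
```

The bars of the diagram would show the counts and the line would show the cumulative percentages.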

b. Principles of graphical excellence

(1) Well-designed presentation of data that provides substance, statistics, and design

(2) Communicates complex ideas with clarity, precision, and efficiency

(3) Gives the viewer the largest number of ideas in the shortest time with the least ink

(4) Almost always involves several dimensions

(5) Requires telling the truth about the data