Data Assignments

STATISTICS

You should be familiar with the ideas of mean, variance, and standard deviation, the normal curve, and correlation. You may consult any book you wish, but for most of you the most convenient introduction is a web course designed by Professor David Lane and others. The assignments below refer to "chapters" of that course.

http://psych.rice.edu/online_stat/

I: Introduction to Statistics (Due September 21)

I (Introduction)

II (Graphing Distributions)

II: Descriptive Statistics (Due September 28)

III (Summarizing Distributions)

VI (Normal Distributions)

III: Correlations

IV (Describing Bivariate Data)

CRIME STATISTICS

Propose an hypothesis that can be tested using collective statistics, i.e., statistics on violent crimes or the incidence of violence aggregated across various demographic variables. You might, for example, want to test an hypothesis that cities with certain characteristics are more prone to violent crime. For this assignment you will have to consult various databases. Probably the best starting point is the data collected by the FBI every year and known as the Uniform Crime Reports. These data are available in book form (Fondren Library has the book in government documents, and the Government Printing Office store downtown often has it), on CD, or on-line (http://www.fbi.gov/ucr/ucr.htm#cius). These crime data may not be enough, and in that case you will need to consult other sources. For example, if your hypothesis were that cities with high poverty rates also have higher rates of violent crime, you would have to get data on poverty rates elsewhere. Again, such data can be found on-line and in a variety of reference books.

Your hypothesis can be as elaborate as you wish, and certainly there is a premium on those that are interesting and creative. But the main purpose of this assignment is to give you an opportunity to explore various sources of data and to have some experience manipulating the data you uncover. Therefore you should adjust your ambitions to your skills and time, keeping in mind that we usually learn the most when we push ourselves to develop new skills.

The paper need not be elaborate. You will need to state your hypothesis clearly and to provide some rationale for it. For example, why would you expect violence rates to be correlated with poverty? Generating a reasonable hypothesis may require you to explore a bit in the library or on-line for relevant theories or speculations, but again the focus of this assignment is on data and not theories, so you need not provide anything like an exhaustive review of past literature. You do need to be clear about your data sources and why you chose them.

You can present the data in a variety of ways. If you are comfortable with correlations, you may certainly present the data that way; for example, you could correlate poverty rates with violence rates across cities. Any number of statistical packages will do the dirty work for you, and many are available in data labs on campus. Excel (and most other spreadsheet programs) can also compute correlations if you have the relevant analysis add-in installed. Most versions of Excel (or Microsoft Office) include this add-in, the Analysis ToolPak, but it is typically not installed by default.

If you do not feel comfortable with correlations, you can present the data in tables, in a form usually called cross-tabs. In the example below I have divided the cities into the highest-poverty half and the lowest-poverty half, and also into highest and lowest halves for violence. Then it is a relatively simple matter to see what percentage of the total cities falls into each of the 4 cells. Alternatively one could simply put the number of cities into each cell, but it usually makes for a cleaner picture to convert to percentages. Cross-tab tables can be converted into correlations, but we can settle for visual inspection as a test of the hypothesis. In this case it is obvious that there is a relationship between poverty levels and violence rates across cities: 80 percent of the cities fall into the two cells where poverty and violence are both high or both low.

 

                       High Violence Rate Cities    Low Violence Rate Cities

High Poverty Cities              40%                          10%

Low Poverty Cities               10%                          40%
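As a rough illustration of the two approaches described above, the following Python sketch computes a correlation across cities and then builds the median-split cross-tab. The eight cities and their figures are invented purely for illustration and are not real data; the helper names pearson_r and median_split_crosstab are likewise hypothetical, and only the Python standard library is assumed.

```python
from math import sqrt
from statistics import mean, median

# Hypothetical data, invented for illustration only:
# (city label, poverty rate in percent, violent crimes per 100,000 residents)
cities = [
    ("A", 25.0, 900.0),
    ("B", 22.0, 750.0),
    ("C", 18.0, 800.0),
    ("D", 15.0, 400.0),
    ("E", 12.0, 500.0),
    ("F", 10.0, 300.0),
    ("G", 8.0, 350.0),
    ("H", 6.0, 200.0),
]

poverty = [row[1] for row in cities]
violence = [row[2] for row in cities]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def median_split_crosstab(xs, ys):
    """Split both variables at their medians and return the percentage
    of all cases in each of the 4 cells, keyed by (x half, y half).
    Values exactly at the median are counted in the low half."""
    mx, my = median(xs), median(ys)
    counts = {(a, b): 0 for a in ("high", "low") for b in ("high", "low")}
    for x, y in zip(xs, ys):
        key = ("high" if x > mx else "low", "high" if y > my else "low")
        counts[key] += 1
    n = len(xs)
    return {key: 100.0 * count / n for key, count in counts.items()}


r = pearson_r(poverty, violence)
table = median_split_crosstab(poverty, violence)

print(f"correlation across cities: r = {r:.2f}")
for (pov, viol), pct in table.items():
    print(f"{pov} poverty, {viol} violence: {pct:.1f}% of cities")
```

With these made-up numbers the correlation is about 0.94, and 75 percent of the cities land in the two matching cells (high poverty with high violence, low poverty with low violence). A real analysis would of course use many more cities and actual figures from the sources discussed above.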

Note that you are not limited to only one predictor variable. You could (but need not) have a more complex hypothesis, such as that poverty is related to violence but only for cities in the South. In such a case you would need to present correlations separately for Southern and Northern cities or present a more complex cross-tab table.

 

                       High Violence Rate Cities    Low Violence Rate Cities

Southern Cities

    High Poverty                 20%                           5%

    Low Poverty                   5%                          20%

Northern Cities

    High Poverty                 12%                          13%

    Low Poverty                  13%                          12%

 

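Again, in this case it is obvious that the data support the hypothesis: among Southern cities, most high-poverty cities are also high-violence cities and most low-poverty cities are low-violence cities, whereas among Northern cities the four cells are almost equal.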
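For those who want a number to go with the visual inspection, a 2 x 2 cross-tab can be converted into a correlation known as the phi coefficient. The sketch below applies the standard formula to the cell percentages from the tables in this handout; the function name phi_coefficient and the dictionary layout are just illustrative choices, not something the assignment requires.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2 x 2 table laid out as
           high violence   low violence
    high poverty    a            b
    low poverty     c            d
    The cells may be counts or percentages; phi is the same either way."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Cell values taken from the example tables above (percent of all cities).
tables = {
    "All cities (first table)": (40, 10, 10, 40),
    "Southern cities": (20, 5, 5, 20),
    "Northern cities": (12, 13, 13, 12),
}

phis = {name: phi_coefficient(*cells) for name, cells in tables.items()}

for name, value in phis.items():
    print(f"{name}: phi = {value:+.2f}")
```

For the example tables this gives a phi of 0.60 for the first table and for the Southern cities, and -0.04 for the Northern cities, which matches what visual inspection suggests: a clear relationship in the South and essentially none in the North.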

 
