July 13, 2011

Working with variables

Creating new variables

Variables are created with the generate command.

Syntax: generate [newvar] = [expression]

The = operator is required: the expression after it sets the value of the new variable in every observation.

e.g. generate true = 1

generate fullname = last + ", " + first
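To try both forms of generate end to end, here is a minimal sketch. The two names typed in with input are made up for illustration; only the variable names first, last, true and fullname come from the examples above.

```stata
* Hypothetical two-row data set, entered by hand for illustration
clear
input str10 first str10 last
"Ada"  "Lovelace"
"Alan" "Turing"
end

* Numeric variable with the same constant value in every observation
generate true = 1

* String variable built by concatenating two existing string variables
generate fullname = last + ", " + first

list
```

The + operator concatenates only when both sides are strings; mixing a string and a numeric variable in the same expression raises a type mismatch error.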

Renaming variables

Renaming gives an existing variable a new name; the data stored in it are unchanged.

Syntax: rename [old name] [new name]

Here the variable name changes from "old name" to "new name".

Replacing variable values

When you need to replace the existing values of a variable, use the replace command.

Syntax: replace [variable name] = [expression] [if condition] [in range] [, options]

e.g. replace true = 0 in 1/10

This sets true to 0 in observations 1 through 10.
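The in and if qualifiers select different things: in picks observations by position, if picks them by condition. A small sketch, assuming an empty data set of 20 observations created just for this illustration:

```stata
* Hypothetical data set: 20 observations, true starts at 1 everywhere
clear
set obs 20
generate true = 1

* Replace by position: observations 1 through 10
replace true = 0 in 1/10

* Replace by condition: every observation after the 15th becomes missing
replace true = . if _n > 15

* Check the result: expect ten 0s, five 1s and five missing values
tabulate true, missing
```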

Recoding variables

Suppose you have this data:

.tab age

Age Frequency
18 2
19 6
20 3
21 4

Now you want to recode the variable age into age groups, say 18 to 19 and 20 to 21.

This is achieved with the following command:

.recode age (18 19 = 1 "18 to 19") ///
(20 21 = 2 "20 to 21") ///
(else = .), generate(agegroups) label(agegroups)

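The generate() option stores the result in a new variable, agegroups, and leaves age unchanged. Tabulating the new variable shows the regrouped data.

.tab agegroups

Age group Frequency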
18 to 19 8
20 to 21 7
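If you want to reproduce these numbers yourself, the sketch below rebuilds the frequency table as one observation per person and then applies the recode. The helper variable freq and the use of expand are my own additions for the illustration. Run it from a do-file, because the /// line continuation is not accepted when typing commands interactively.

```stata
* Rebuild the example: one row per age, with its frequency
clear
input age freq
18 2
19 6
20 3
21 4
end

* Turn each row into freq identical observations (15 in total)
expand freq
drop freq

* Recode into two labelled groups, keeping the original age variable
recode age (18 19 = 1 "18 to 19") ///
           (20 21 = 2 "20 to 21") ///
           (else = .), generate(agegroups) label(agegroups)

tabulate agegroups
```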

Labeling variables

To attach text labels to the values of a variable, first define a value label, then assign it to the variable.

This is how it's done in Stata:

Syntax: label define [label name] [value] "[label]" [value] "[label]" ...

label values [variable] [label name]

e.g.

label define genderlabel 1 male 2 female

label values gendervariable genderlabel
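A short sketch of the two steps on a tiny made-up data set. The label variable line is an extra that is not covered above: it attaches a description to the variable itself rather than to its values, and the description text is my own.

```stata
* Hypothetical data: gender coded 1 or 2
clear
input gendervariable
1
2
1
end

* Step 1: define the mapping from codes to text
label define genderlabel 1 "male" 2 "female"

* Step 2: attach the mapping to the variable
label values gendervariable genderlabel

* Optional: describe the variable itself (illustrative text)
label variable gendervariable "Gender of respondent"

* Output now shows male/female instead of 1/2
tabulate gendervariable
```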

Deleting variables

If you have a temporary variable, or you are cleaning up data, it is sometimes necessary to delete variables.

Syntax: drop [variable names]

Note: drop _all deletes every variable in the data set, so use it with care. To delete observations rather than variables, use drop if [condition] or drop in [range].

e.g. drop age deletes the variable age, while drop if age==. deletes the observations where age is missing.
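Dropping a variable and dropping observations are separate operations and cannot be combined in one drop command. A minimal sketch on a made-up data set (the variables id and tmp are hypothetical):

```stata
* Hypothetical data: three people, one with a missing age
clear
input id age tmp
1 18 0
2 .  0
3 21 0
end

* Delete a variable (a column): tmp disappears, all rows stay
drop tmp

* Delete observations (rows): only rows with a missing age are removed
drop if missing(age)

* Two observations remain
count
```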

July 04, 2011

Three way crosstabs

We have seen descriptive statistics; in this post I am going to highlight how to do a cross-tabulation using more than two variables.

The tabstat command produces summary statistics for one or more variables, optionally broken down by a grouping variable.

You can specify which statistics to show, and with the help of the bysort prefix you can repeat a two-way tabulation within each level of a third variable.

Syntax:

tabstat variable[s], statistics(statistics) by(grouping variable)

Example 1:

tabstat age sat score heightin readnews, statistics(mean median sd var count range min max) by(gender)

Example 2:

bysort age: tab ed_level major

This example first sorts the records by age and then cross-tabulates ed_level and major within each age.

Example 3:

bysort studentstatus: tab gender major, sum(sat)

This adds a fourth variable: each cell of the gender by major table shows summary statistics of sat (mean, standard deviation and frequency).
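If you do not have the student data set at hand, the same three patterns can be tried on the auto data set that ships with Stata. This is only a sketch on substitute data: the variables price, mpg, weight, rep78 and foreign belong to auto, and heavy is a helper variable I create for the illustration.

```stata
* Substitute data set that ships with Stata
sysuse auto, clear

* Pattern 1: summary statistics by group
tabstat price mpg, statistics(mean median sd count min max) by(foreign)

* Helper variable (hypothetical): 1 if the car weighs more than 3000 lbs
generate byte heavy = weight > 3000

* Pattern 2: a two-way table repeated within each level of a third variable
bysort foreign: tabulate rep78 heavy

* Pattern 3: the same tables, with each cell summarizing a fourth variable
bysort foreign: tabulate rep78 heavy, summarize(mpg)
```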

Get data set here