June 21, 2011

General command syntax

Most Stata commands can be abbreviated. For example, instead of typing summarize, Stata will also accept su or summ (gen, by contrast, is short for generate). The help screen for each command shows how far it can be abbreviated by underlining the minimum required letters in the syntax section of the help.
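
As a small illustration of abbreviation, the three summarize lines below should produce identical output. This is only a sketch: it assumes the auto example data that ships with Stata (loaded with sysuse auto), which is not necessarily the data set used elsewhere in these posts.

```stata
* Assumption: the auto example data that ships with Stata
sysuse auto, clear

* The same command, spelled out and abbreviated
summarize price
summ price
su price
```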

Stata syntax mostly follows this basic structure:

Syntax:

[by varlist1:] command [varlist2] [=exp] [if] [in] [using filename] [, options]

where square brackets show optional qualifiers.

Example:

bysort gender: tabulate age if weight < 50, nolabel

A variable list (varlist) is a list of variable names with blanks in between. There are a number of shorthand conventions to reduce the amount of typing. For instance:
myvar            Just one variable
myvar var1 var2  Three variables
myvar*           All variables starting with myvar
*var             All variables ending with var
my*var           All variables starting with my and ending with var
my~var           A single variable starting with my and ending with var
my?var           All variables starting with my and ending with var, with exactly one character in between
myvar1-myvar6    myvar1, myvar2, ..., myvar6 (if they are stored in that order)
this-that        All variables from this through that, in the order shown in the Variables window
The * character matches zero or more characters, and all variables matching the pattern are returned. The ~ character also matches zero or more characters but, unlike *, only one variable is allowed to match; if more than one variable matches, an error message is returned. The ? character matches exactly one character, and all variables matching the pattern are returned. The - character returns all variables in the dataset, starting with the variable to the left of the - and ending with the variable to the right of the -.

Any command that takes a varlist understands the keyword _all to mean all variables. Some commands use all variables by default if none are specified (e.g., summarize shows summary statistics for all variables, and is equivalent to summarize _all).
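
The shorthand conventions above can be tried out as follows. This is a sketch under the assumption that the auto example data shipped with Stata is loaded; the variable names (make, mpg, price, rep78, gear_ratio) belong to that data set, not to the data used elsewhere in these posts.

```stata
* Assumption: the auto example data that ships with Stata
sysuse auto, clear

* Every variable whose name starts with m
describe m*

* Three-character names ending in pg
describe ?pg

* Must match exactly one variable, otherwise Stata reports an error
describe gear~o

* price through rep78, in dataset order
summarize price-rep78

* Same as summarize with no varlist
summarize _all
```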

June 20, 2011

Using by/if/in

So far we have applied the previous commands to the whole data set, in no particular order.

Suppose you want to describe, browse, or even summarize only part of the data set. Did I really intend to ask you this, or should I be answering it?

It's possible to analyze just a sample or portion of the data set by using the following qualifiers and prefixes:

  • IF
  • IN
  • BY

IF

Restricts a command to the observations for which the condition is true.

Syntax: command  [variable(s)] if (condition)

Examples:

  1) summarize age if (age>10)

  2) list make price if foreign==1

  3) list make price if price > 10000 & price < .

Explanation:

  1) Runs summarize on the variable age, using only the observations where age is greater than 10.

  2) Lists the make and price of foreign cars only. (To load this data, enter sysuse auto in the Command window.)

  3) Lists the make and price of cars whose price is greater than 10000 and not missing. The second condition matters because Stata treats a missing value as larger than any number, so price > 10000 on its own would also match missing prices.

Other possible operators include less than (<), less than or equal to (<=), greater than or equal to (>=), equal to (==), and not equal to (!=). To test for missing values, use the missing() function, for example if missing(price).

Now that I have mentioned “missing”, let me tell you something about “missing”. If a numeric variable has no value, Stata shows this with a dot (.). If the variable is a string, a missing value is an empty string (“”), which appears blank.
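
To see how missing values interact with if conditions, here is a minimal sketch. It assumes the auto example data (sysuse auto), in which the variable rep78 has a few missing values; the choice of rep78 and the cutoff 4 are only for illustration.

```stata
* Assumption: the auto example data that ships with Stata
sysuse auto, clear

* How many observations have no value for rep78?
count if missing(rep78)

* Missing counts as larger than any number, so this also counts
* the observations where rep78 is missing
count if rep78 > 4

* Two equivalent ways to exclude the missing values
count if rep78 > 4 & rep78 < .
count if rep78 > 4 & !missing(rep78)
```

The second count should be larger than the last two by exactly the number reported by the first.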

IN

Restricts a command to a range of observations (rows).

Syntax: command [variable(s)] in first/last

Example: summarize [variable] in 1/20

This will display a summary of the variable using only observations 1 through 20.
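
A few variations on in ranges, sketched under the assumption that the auto example data is loaded (the variables make and price come from that data set):

```stata
* Assumption: the auto example data that ships with Stata
sysuse auto, clear

* First 20 observations
summarize price in 1/20

* f and l stand for the first and the last observation
list make price in f/5

* Negative numbers count from the end: the last 5 observations
list make price in -5/l
```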

BY

Repeats a command separately for each group of observations; the data must be sorted by the grouping variable(s).

As a prefix:

When a command is prefixed with a bylist, it is performed repeatedly for each element of the variable or variables in that list, each of which must be categorical. For instance,
by foreign: summ price
will provide descriptive statistics for both foreign and domestic cars. If the data are not already sorted by the bylist variables, the prefix bysort should be used. The option ,total will add the overall summary.

Examples:

bysort rep78: summ price
bysort rep78 foreign: summ price

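As an option:

Some commands take the grouping variable in a by() option instead of a prefix. summarize does not accept by(), but tabstat does:

tabstat price, by(foreign)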
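
The prefix and option forms can be compared side by side. This sketch assumes the auto example data; tabstat is used here as one command that accepts a by() option, and the statistics chosen are only an illustration.

```stata
* Assumption: the auto example data that ships with Stata
sysuse auto, clear

* by requires the data to be sorted by the grouping variable
sort foreign
by foreign: summarize price

* bysort sorts and runs the command in one step
bysort rep78 foreign: summarize price

* tabstat takes the grouping variable as an option instead
tabstat price, by(foreign) statistics(mean sd n)
```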

June 15, 2011

I Want to Summarize a Data Set (or Variable)

As our tour of descriptive statistics comes to an end, I want to bid farewell with one more important command: summarize. It gives brief summary statistics for all the variables or for a specific variable.

From the menu bar, follow these steps

  1. Select Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Summary statistics.
  2. Enter or select [Variable] in the Variables field.
  3. Select Display additional statistics.
  4. Click Submit.

Syntax:

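.summarize or .summ

or

.summarize [variable] or .summ [variable]

On its own, summarize reports the number of observations, mean, standard deviation, minimum, and maximum. With the detail option (Display additional statistics in the menu steps above), the output also includes percentiles, variance, skewness, and kurtosis.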
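
The difference between the short and the detailed output can be checked directly. This is a sketch assuming the auto example data; the last lines show that summarize also leaves its results in r(), which can be reused in later commands.

```stata
* Assumption: the auto example data that ships with Stata
sysuse auto, clear

* Observations, mean, standard deviation, minimum, maximum
summarize price

* Adds percentiles, variance, skewness, and kurtosis
summarize price, detail

* summarize stores its results in r()
display "Mean price: " r(mean)
return list
```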

Other similar commands: browse, describe, codebook

Assumptions:

  • You have loaded a data set
  • All commands are typed in the Command window

June 13, 2011

My Codebook

Either type codebook in the Command window and press Enter, or navigate the menus to Data > Describe data > Describe data contents (codebook) and click OK. We get a large amount of output that is worth investigating. Look it over to see how much can be learned from this simple command. You can scroll back in the Results window to see earlier results, if need be. I will focus on a specific variable, here the variable id.

Syntax:

.codebook id (I assume you have loaded our data set)

Output

[Screenshot: codebook output]

The codebook command gives the variable name (id), variable label (ID), type (numeric, string, etc.), number of unique values (here 30), mean, standard deviation, and major percentiles.

Use codebook when you want to know more about a variable.
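
For readers without the Students data, the same command can be tried on other data. The sketch below assumes the auto example data; the variables make and price come from that data set.

```stata
* Assumption: the auto example data that ships with Stata
sysuse auto, clear

* One variable, then several
codebook price
codebook make price

* One line per variable for the whole data set
codebook, compact
```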

Other similar commands: describe, summarize, browse

Describing Data

We have seen how to browse data. Suppose we now want more detail about the data: variables, variable labels, data types, data formats, etc.

This is pretty easy in Stata. With the help of the .describe command, one can get a glimpse of the whole data set or of just a few variables.

Syntax:

.describe

This gives a description of all variables in the data set.

(Assuming the Students data set, in Excel format, is still loaded.)

Output

[Screenshot: describe output]

For specific variables only:

.describe gender

This describes only the gender variable.

.describe gender state

This describes the gender and state variables only.
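
The same pattern works on any data set. Here is a sketch assuming the auto example data, with make and price standing in for gender and state:

```stata
* Assumption: the auto example data that ships with Stata
sysuse auto, clear

* All variables, then a chosen few
describe
describe make price

* Only the general information: number of observations,
* number of variables, and sort order
describe, short
```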

Other similar commands: browse, summarize, codebook

June 06, 2011

Stata is a full-featured statistical programming language for data analysis. Stata is available in several packages. It is also packaged for business, academic, or governmental institutions.

Features and limits 


Stata package:                       Small    Intercooled (IC)          SE
                                              (the standard version)    (an extended version)
Number of observations               1,200    unlimited                 unlimited
Number of variables                  99       2,047                     32,767
Number of characters in a command    8,697    67,800                    1,081,527
Number of options for a command      70       70                        70
Length of a string variable          244      244                       244
Length of a variable name            32       32                        32

Comparison with other packages

R:

R is a free software package designed for use from the command line only. While being a language is one of R's greatest strengths, it can make R harder to learn for those without programming experience. However, once you have learned it, you are no longer subject to price increases. The developer community constantly provides add-ons and ensures that the software will continue to exist. R is extremely versatile in graphics, and generally good for people who really want to find out “what their data have to say”.

SAS:

SAS is the second most costly package. It can be used with both the command line and a graphical user interface (GUI). SAS is particularly strong on data management (especially with large files), and good for cutting-edge research. It covers many graphical and statistical tasks. Its main focus is now on business customers.

SPSS:

SPSS is the first choice for the occasional user. However, it is the most expensive of the four. SPSS is clearly designed for point-and-click usage on the GUI. A command structure exists, but it is not well defined and sometimes inconsistent. SPSS is good for basic data management and basic statistical analysis, but rather weak in graphics. In the future, SPSS might be the weakest of the four packages with regard to the scope of statistical procedures it offers, due to its main focus on business customers.

Stata:

Stata is designed for use from the command line, but it also offers a GUI that allows for working with menus. The simple and consistent command structure makes it rather easy to learn. It is the cheapest of the packages that entail costs, and it offers additional reductions for the educational sector. Stata is relatively weak on ANOVA, but extraordinary on regression analysis and complex survey designs. Stata is completely focused on scholars. In the future, Stata may have the strongest collection of advanced statistical procedures. You can order it from http://www.stata.com/order/

Graphical Interface

Having evolved from a command-driven interface, Stata also operates in a graphical, windowed environment.




Results window
All outputs appear in this window. Only graphics will appear in a separate window.
Command window
This is the command line where commands are typed for execution.
Variables window
All variables in the currently open dataset appear here. By clicking on a variable, its name can be transferred to the command window.
Review window
Previously used commands are listed here and can be transferred to the command window by clicking on them.
Major Buttons
The most important button functions are the following:
Open (use): Opens a new data file.
Save: Saves the current data file.
Print results: Prints the content of the results window.
New Viewer: Opens a new viewer window, e.g. to open log-files.
New Do-file Editor: Opens a new instance of the do-file editor (same as doedit).
Data Editor: Opens the data editor window (same as edit).
Data Browser: Opens the data browser (same as browse).
Break: Cancels currently running calculations.
Menu
Almost all commands can be called from the menu. However, we do not recommend learning Stata through the menus, since the command line gives the user much better control and allows for a much faster and more exact working process.

External links