september 28, 2011

Reading in Data

If Data you are interested in is in Stata Format, then all you need is knowing its file path. Assuming you are using PC, and data set is named “Coffee.dta”, and is located in “My documents”, then the Stata command for loading such will be

use "C:\Documents and Settings\Paul\My Documents\Coffee.dta", clear

Note: Don’t forget putting the double quotes (“”) enclosing the path of the data set.

Data Set inform of Excel(or text file)

If the data set you are interested in is .xls, Its still possible to work with it in Stata.

first, from your excel application, save the data as of type Text(Tab delimited). this will save it in a .txt file extension.

Now our coffee data set, coffee.xls becomes coffee.txt.

from Stata, issue the following command:

insheet using "C:\Documents and Settings\Paul\My Documents\coffee.txt"

The same command is issued if importing a .csv file.

insheet using "C:\Documents and Settings\Paul\My Documents\coffee.csv"

Note:

You can also save Excel data COULD be saved in the .csv format (“save as”, if not already in that format).

You can also enter the data manually (or copy and paste from Excel or almost any format) by typing

From Stata’s Command window, type edit.

This will trigger a Data Editor window. Paste your Data here or edit it manually.

If you choose this option, Stata will automatically name your variables by column, e.g. the variable first column is “var1”. To change this, you can rename your variables by typing
rename var1 price – to rename “var1” to “price”. Read more on variables renaming.



júlí 18, 2011

Combining data sets

In many empirical research projects, the raw data to be utilized are stored in a number of separate files; panel data,
time series data extracted from different databases, and the like. Stata only permits a single data set to be accessed at one time. How, then,
do you work with multiple data sets? Several commands are available, including append, merge, and joinby.

The append command combines two Stata-format data sets that possess variables in common, adding observations to the existing
variables. The same variables need not be present in both files, as long as a subset of the variables are common to the “master” and
“using” data sets. It is important to note that “PRICE" and “price” are different variables, and one will not be appended to the other.

Free hosting here

The merge command is very powerful. Like append, it works on a “master” data set—the current contents of memory—and one or more
using” data sets. One or more merge variables are specified, and both master and using data sets must be sorted on those variables.
The distinction between “master” and “using” is important. When the same variable is present in each of the files, Stata’s default behavior is
to hold the master data inviolate and discard the using dataset’s copy of that variable. This may be modified by the update option, which
specifies that non-missing values in the using dataset should replace missing values in the master, and update replace, which specifies
that non-missing values in the using dataset should take precedence.

Free hosting

A “one-to-onemerge specifies that each record in the using data set is to be combined with one record in the master data set. This would
be appropriate if you acquired additional variables for the same observations. A new variable, _merge, takes on integer values
indicating whether an observation appears in the master only, the using only, or appears in both. This may be used to determine whether
the merge has been successful, or to remove those observations which remain unmatched (e.g. merging a set of households from
different cities with a comprehensive list of ZIP codes; one would then discard all the unused ZIP code records). The _merge variable must
be dropped before another merge is performed on this data set. The unique option should be used if you believe that both data sets
should have unique values of the merge key.The unique option should be used if you believe that both data sets should have unique values of the merge key.

The merge command can also do a “match merge”, or “one-to-N” merge, in which each record in the using data set is matched with a
number of records in the master data set. If a number of the households lived in the same ZIP code, then the match would place variables from the ZIP code file on the household records, repeating where necessary. This is a very useful technique to combine aggregate data with disaggregate data without dealing with the details. Although “one-to-N” and “N-to-one” merges are commonplace and very useful, you probably never want to do a “N-to-N” merge, which will yield seemingly random results. To ensure that one data set has unique identifiers, specify the uniqmaster or uniqusing options, or use the isid command to ensure that a dataset has a unique
identifier.

Enjoy free hosting.