Stata Lab Q&A:

  • Can I increase the capacity of the Results window in Stata?

    Yes, but it uses additional memory. To permanently increase the capacity of the Results window from 32,000 characters to 2,000,000 characters, give the following command in a do-file or Stata session (set scrollbufsize takes effect the next time you launch Stata):

    set scrollbufsize 2000000


  • Why doesn't Stata work correctly?

The cause varies, but first make sure Stata is fully up to date. After installing Stata 12 from CD, give the commands:

update all (wait for it to finish)
update swap (renames the wstata.bin just downloaded -- Stata will restart)


  • How do I list a variable, x, in order of the frequency counts?

    Use the following Stata statements, interactively or (better) in a .do file:

* save the dataset before running contract, which replaces the data in memory
contract x
sort _freq
list x _freq

An alternative that keeps the full dataset in memory, but is harder to read (it assumes x is a non-negative integer below 1000), is:

bysort x: gen count=_N
gen count_x=1000*count+x
tab count_x
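
The contract approach shown earlier can also be wrapped in preserve and restore, so the original data do not have to be saved and reloaded by hand. The following is only a sketch: Stata's built-in auto dataset and its rep78 variable stand in for your own data and x, and gsort with a minus sign is used to list the most frequent values first.

```stata
* Illustration only: auto and rep78 are stand-ins for your data and x
sysuse auto, clear

* keep a copy of the data so it can be restored afterwards
preserve

* collapse to one row per value of rep78, with the count in _freq
contract rep78

* descending frequency; use "sort _freq" for ascending order
gsort -_freq
list rep78 _freq

* bring back the original dataset
restore
```

Because contract only runs between preserve and restore, the dataset in memory is unchanged once the do-file finishes.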

In SAS this is done with the order=freq option:

proc freq order=freq;
  tables x;
run;


  • Can I increase the number of commands stored in the Review window?

    In Stata, you can increase the number of commands retained to 500 using the command:

    set reventries 500, permanently


  • How can I take random samples from an existing dataset?

The Stata website gives a full description of how this is done:

http://www.stata.com/support/faqs/stat/sampling.html

Examples:

* draw and save a 10% sample
sample 10
save samp10pct.dta, replace

* draw and save a sample of size 1000
* (sample drops the unsampled observations, so reload the original dataset first)
sample 1000, count
save samp1000.dta, replace
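
If the same sample needs to be drawn again later, setting the random-number seed before sample makes the draw reproducible. A minimal sketch, assuming the built-in auto dataset as stand-in data and an arbitrary seed value:

```stata
* Illustration only: the dataset and the seed value are arbitrary choices
sysuse auto, clear

* fix the seed so the same observations are selected on every run
set seed 12345

* keep a random sample of exactly 20 observations
sample 20, count
count
```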


  • What to do if Stata does not start when you double-click a do-file
  • Open Windows Explorer.
  • Click Tools => Folder Options => File Types, scroll down to the "DO" file extension, and click Change.
  • If the details for the "DO" extension do not say
    "Opens with stata"
    then click Change and find stata.exe.
  • If it already says "Opens with stata", click Advanced.
  • Under Actions, there should be 3 actions:  edit and open.
  • Highlight "open", click "Edit", and make sure the following is in the "Application used to perform action" box (include the quotes):

"C:\Program Files\Stata12\stata.exe" do "%1"

and Stata should be in the Application box.



  • With a Mac, how can I store the last 50 commands
    in a do-file?

#review 50

Cut and paste commands from the log into the do-file
(you will have to remove line numbers)


  • How to merge variables from 2 or more datasets
     
    •  Example 1: Merge 2 datasets:
       data1 and data2, using variable id to link

clear
use data1
sort id
save data1, replace

clear
use data2
sort id
save data2, replace

merge id using data1
*keep observations only if on both datasets -- _merge code==3
keep if _merge==3
save mergedata, replace  

 

    •  Example 2: Merge 3 datasets:
       data1, data2 and data3, using variable id to link

clear
use data1
sort id
save data1, replace

clear
use data2
sort id
save data2, replace

clear
use data3
sort id
save data3, replace

merge id using data1 data2
*keep observations if on all 3 datasets
keep if _merge1==1 & _merge2==1
drop _merge1 _merge2 _merge
save mergedata, replace
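
In Stata 11 and later, merge also accepts the 1:1 syntax, which does not require the datasets to be sorted first. The sketch below is an illustration under assumptions: the two small datasets, the variables x and y, and the temporary file d1 are all made up so that the example is self-contained.

```stata
* Hypothetical toy data, built in memory so the sketch runs on its own
clear
input id x
1 10
2 20
3 30
end
tempfile d1
save `d1'

clear
input id y
2 200
3 300
4 400
end

* one-to-one merge on id; no prior sorting is needed with this syntax
merge 1:1 id using `d1'
tab _merge

* keep only ids found in both datasets, then drop the indicator
keep if _merge==3
drop _merge
list
```

Here ids 2 and 3 appear in both datasets, so only those two observations remain. For three datasets, the same merge can be repeated once per additional file.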


  • Analyze slopes from longitudinal data  

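I have a longitudinal data set and want to regress an outcome on time within each id and attach the resulting slope (beta coefficient) and its standard error to each id.

* variables to hold each id's slope and standard error
gen coef = .
gen seb = .
levelsof id, local(levels)
foreach l of local levels {
    * fit the regression for this id only
    regress yvar time if id==`l'
    mat b = e(b)
    mat V = e(V)
    * time is the first regressor, so its slope and variance are element [1,1]
    replace coef = b[1,1] if id==`l'
    replace seb = sqrt(V[1,1]) if id==`l'
}
list id coef seb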
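
The same per-id slopes can also be collected with statsby, which avoids the explicit loop. This is a sketch rather than a drop-in replacement: the toy data are made up, and statsby with the clear option replaces the data in memory with one row per id, so the results would need to be merged back if they are wanted alongside the original observations.

```stata
* Hypothetical toy panel: two ids, three time points each
clear
input id time yvar
1 1 2
1 2 4
1 3 7
2 1 1
2 2 3
2 3 4
end

* run the regression separately for each id and keep the slope and its se
statsby coef=_b[time] seb=_se[time], by(id) clear: regress yvar time

* the data now hold one row per id
list id coef seb
```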


  • How to format regression output for Word
     
    • outreg.ado

      Installation:
      ssc install outreg, replace all
      Use help ssc for details on the ssc command.
      Use help outreg for details on the outreg command.
       
    • (outreg2 is a related alternative and can be installed as above)
       
    • Description

      outreg creates an ASCII text file with columns separated with tab
      characters. The file can be converted automatically to a table in
      word processors and spreadsheets.

      For example, in Microsoft Word:
      Open or Insert the file created by outreg.
      Select the estimation output text that is in columns
      (not the notes at the bottom of the table or the title at
      the top, if any).

      Select Table / Convert / Text to Table.

      With some adjustment of the column widths, fonts, etc.
      the final table is ready in Word.
       
    • Example:

      Store the following commands in outreg_demo.do and run it:

      * Illustrate using built-in auto Stata dataset
      sysuse auto,clear

      * Fit regression equation
      regress mpg foreign weight headroom trunk length turn displacement

      * Allow commands to span multiple lines
      #delimit ;
      outreg using outreg.txt, replace pvalue noaster label
      title("Title 1", "Title 2")
      ctitle("Column title")
      addnote("note1", "note2", "note3")
      ;
      #delimit cr

      Note:
      The ASCII file with the table, outreg.txt, will be found in the
      same folder as outreg_demo.do


  • How to get higher resolution TIFF files with Stata/Macintosh
     
    • Add the height( ) option to the graph export command
      as in the following example:

      graph export eq3/figr3.tif, replace height(2000)

      This increases the number of pixels about 100-fold.
      The maximum allowed height is 16000.

       


  • What if the do-file is too large for the do-file editor?
     
    • Some do-files, especially those with names and variable labels from large public use datasets, will cause a Stata error message stating that the do-file is too large.

      1) Try increasing memory using the set memory command.

      2) Create/edit the do-file outside Stata using Notepad (Start->Programs->Accessories->Notepad) or an equivalent text editor. Save the file after finishing your edits, but leave Notepad open. Then double-click the do-file; this starts Stata, changes the working directory to the folder containing the do-file and data, and attempts to run the do-file. If there are errors to be fixed or edits to be made, you can make them in Notepad, save, return to Stata and type (if the do-file is named bigdo.do):
       

      • do bigdo.do
         
  • What if the number of variables exceeds 2047?
     
    • You may receive an error with one of the very large public use datasets because the number of variables exceeds 2047. You can either use Stata/SE (which we do not have) or use SAS to create a SAS dataset, and then use StatTransfer to create a Stata dataset with a smaller number of variables.

     

  • Some Stata Tips for Working with Another Data Set

    1. Are your data contained in an Excel spreadsheet? You could:

    • a. Open the Excel file, highlight (select) the observations and variables of interest, copy and paste into the upper left cell of the Stata data editor; the variable names and values will be copied.

    Age   sex   interview_dt   rate_1   rate_2   followup_dt   id
    32    M     1/1/2006       1        3        3/1/2006      1
    15    F     2/13/2007      2        4        5/13/2007     2
    12    M     4/15/2007      9        1        7/15/2007     3
    19    M     9/15/2006      4        9        12/15/2006    4
    8     F     3/17/2007      9        9        6/17/2007     5
    6     F     1/4/2008       2        4        4/4/2008      6

    You will notice that sex is a string variable, the dates are not in a format that would allow you to subtract them, and the rate variables use the value 9 to represent a missing value, whereas Stata requires “.” for a numeric missing value.

    2. Need to change the format of certain variables?

    a. Convert a string variable to a numeric variable: use the “encode” command or the “destring” command.

    b. Change the date format to a number of days so that it may be used in analysis.

    c. Change missing values coded as “9” to “.” (Stata's missing value).

    Create the following do-file to make these changes:

    codebook sex
    tab sex
    encode sex, gen(sexn)
    tab sexn
    codebook sexn
    gen interview=date(interview_dt,"MDY")
    codebook interview
    gen followup=date(followup_dt,"MDY")
    gen time = followup-interview
    stem time
    list interview_dt followup_dt time
    foreach var of varlist rate_1-rate_2{
        replace `var'=. if `var'==9
    }
    tab rate_1, missing
    tab rate_2, missing

    In the results window, you will see:

    . do "C:\practice.do"

    . codebook sex

    -----------------------------------------------------------------------------------------
    sex                                                                           (unlabeled)
    -----------------------------------------------------------------------------------------

                      type:  string (str1)

             unique values:  2                        missing "":  0/6

                tabulation:  Freq.  Value
                                 3  "F"
                                 3  "M"

    . tab sex

            sex |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              F |          3       50.00       50.00
              M |          3       50.00      100.00
    ------------+-----------------------------------
          Total |          6      100.00

    . encode sex, gen(sexn)

    . tab sexn

           sexn |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              F |          3       50.00       50.00
              M |          3       50.00      100.00
    ------------+-----------------------------------
          Total |          6      100.00

    . codebook sexn

    -----------------------------------------------------------------------------------------
    sexn                                                                          (unlabeled)
    -----------------------------------------------------------------------------------------

                      type:  numeric (long)
                     label:  sexn

                     range:  [1,2]                        units:  1
             unique values:  2                        missing .:  0/6

                tabulation:  Freq.   Numeric  Label
                                 3         1  F
                                 3         2  M

    . gen interview=date(interview_dt,"MDY")

    . codebook interview

    -----------------------------------------------------------------------------------------
    interview                                                                     (unlabeled)
    -----------------------------------------------------------------------------------------

                      type:  numeric (float)

                     range:  [16802,17535]                units:  1
             unique values:  6                        missing .:  0/6

                tabulation:  Freq.  Value
                                 1  16802
                                 1  17059
                                 1  17210
                                 1  17242
                                 1  17271
                                 1  17535

    . gen followup=date(followup_dt,"MDY")

    . gen time = followup-interview

    . stem time

    Stem-and-leaf plot for time

      5* | 9
      6* |
      7* |
      8* | 9
      9* | 1112

    . list interview_dt followup_dt time

         +-------------------------------+
         | intervi~t   followup~t   time |
         |-------------------------------|
      1. |  1/1/2006     3/1/2006     59 |
      2. | 2/13/2007    5/13/2007     89 |
      3. | 4/15/2007    7/15/2007     91 |
      4. | 9/15/2006   12/15/2006     91 |
      5. | 3/17/2007    6/17/2007     92 |
         |-------------------------------|
      6. |  1/4/2008     4/4/2008     91 |
         +-------------------------------+

    . foreach var of varlist rate_1-rate_2{
      2. replace `var'=. if `var'==9
      3. }
    (2 real changes made, 2 to missing)
    (2 real changes made, 2 to missing)

    . tab rate_1, missing

         rate_1 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |          1       16.67       16.67
              2 |          2       33.33       50.00
              4 |          1       16.67       66.67
              . |          2       33.33      100.00
    ------------+-----------------------------------
          Total |          6      100.00

    . tab rate_2, missing

         rate_2 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |          1       16.67       16.67
              3 |          1       16.67       33.33
              4 |          2       33.33       66.67
              . |          2       33.33      100.00
    ------------+-----------------------------------
          Total |          6      100.00

    end of do-file
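
    Two small additions can make the result easier to work with: a %td display format shows the numeric dates as readable dates, and mvdecode recodes 9 to missing without a loop. The following is a minimal sketch that types in three rows from the example data so that it runs on its own; it is an alternative to the foreach loop, not a required step.

```stata
* Three rows taken from the example data, entered by hand for illustration
clear
input str10 interview_dt rate_1 rate_2
"1/1/2006"  1 3
"4/15/2007" 9 1
"9/15/2006" 4 9
end

* convert the string date to a Stata daily date and display it as a date
gen interview = date(interview_dt, "MDY")
format interview %td

* recode the value 9 to missing (.) in both rate variables
mvdecode rate_1 rate_2, mv(9)
list
```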

    3. Are your data contained in an Excel spreadsheet or a different format such as a SAS data file or SPSS data file? You could:

    a. Open the StatTransfer program in the computer lab rooms. StatTransfer allows you to transfer an input file of a certain specification (e.g., Excel, SAS, SPSS) to a Stata 10 output file. Note: the second tab on the left of the StatTransfer window will allow you to select certain variables; the third tab on the left will allow you to select certain observations. By default, StatTransfer transfers all observations and all variables, and it will convert dates into Stata date format for you.

    . list

         +----------------------------------------------------------+
         | age   sex   intervi~t   rate_1   rate_2   followu~t   id |
         |----------------------------------------------------------|
      1. |  32     M   01 Jan 06        1        3   01 Mar 06    1 |
      2. |  15     F   13 Feb 07        2        4   13 May 07    2 |
      3. |  12     M   15 Apr 07        9        1   15 Jul 07    3 |
      4. |  19     M   15 Sep 06        4        9   15 Dec 06    4 |
      5. |   8     F   17 Mar 07        9        9   17 Jun 07    5 |
         |----------------------------------------------------------|
      6. |   6     F   04 Jan 08        2        4   04 Apr 08    6 |
         +----------------------------------------------------------+

    . codebook

    ------------------------------------------------------------------------------
    age                                                                (unlabeled)
    ------------------------------------------------------------------------------

                      type:  numeric (byte)

                     range:  [6,32]                       units:  1
             unique values:  6                        missing .:  0/6

                tabulation:  Freq.  Value
                                 1  6
                                 1  8
                                 1  12
                                 1  15
                                 1  19
                                 1  32

    ------------------------------------------------------------------------------
    sex                                                                (unlabeled)
    ------------------------------------------------------------------------------

                      type:  string (str1)

             unique values:  2                        missing "":  0/6

                tabulation:  Freq.  Value
                                 3  "F"
                                 3  "M"

    ------------------------------------------------------------------------------
    interview_dt                                                       (unlabeled)
    ------------------------------------------------------------------------------

                      type:  numeric daily date (long)

                     range:  [16802,17535]                units:  1
           or equivalently:  [01jan2006,04jan2008]        units:  days
             unique values:  6                        missing .:  0/6

                tabulation:  Freq.  Value
                                 1  16802  01jan2006
                                 1  17059  15sep2006
                                 1  17210  13feb2007
                                 1  17242  17mar2007
                                 1  17271  15apr2007
                                 1  17535  04jan2008

    ------------------------------------------------------------------------------
    rate_1                                                             (unlabeled)
    ------------------------------------------------------------------------------

                      type:  numeric (byte)

                     range:  [1,9]                        units:  1
             unique values:  4                        missing .:  0/6

                tabulation:  Freq.  Value
                                 1  1
                                 2  2
                                 1  4
                                 2  9

    ------------------------------------------------------------------------------
    rate_2                                                             (unlabeled)
    ------------------------------------------------------------------------------

                      type:  numeric (byte)

                     range:  [1,9]                        units:  1
             unique values:  4                        missing .:  0/6

                tabulation:  Freq.  Value
                                 1  1
                                 1  3
                                 2  4
                                 2  9

    ------------------------------------------------------------------------------
    followup_dt                                                        (unlabeled)
    ------------------------------------------------------------------------------

                      type:  numeric daily date (long)

                     range:  [16861,17626]                units:  1
           or equivalently:  [01mar2006,04apr2008]        units:  days
             unique values:  6                        missing .:  0/6

                tabulation:  Freq.  Value
                                 1  16861  01mar2006
                                 1  17150  15dec2006
                                 1  17299  13may2007
                                 1  17334  17jun2007
                                 1  17362  15jul2007
                                 1  17626  04apr2008

    .

    4. Have a large data set? Before you open it in Stata, type “set mem 35m” in the command line. (In Stata 12 and later, memory is managed automatically and this step is not needed.)

    5. Need to merge two data sets? (Two data sets with different variables on the same individuals.) Both data sets must have the same unique id for individuals; both data sets must be sorted by id.

    . use "C:\practice1.dta", clear

    . sort id

    . merge id using "C:\practice2.dta"

    . tab _merge

         _merge |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              3 |          6      100.00      100.00
    ------------+-----------------------------------
          Total |          6      100.00

    Stata creates a variable named _merge such that 1 indicates only in file 1, 2 indicates only in file 2, and 3 indicates in both files.

         +----------------------------------------------------------------------------------------+
         | id   age   sexn   interv~w   followup   rate_1   rate_2   outcome1   outcome2   _merge |
         |----------------------------------------------------------------------------------------|
      1. |  1    32      M      16802      16861        1        3          Y          N        3 |
      2. |  2    15      F      17210      17299        2        4          N          N        3 |
      3. |  3    12      M      17271      17362        .        1          Y          Y        3 |
      4. |  4    19      M      17059      17150        4        .          N          Y        3 |
      5. |  5     8      F      17242      17334        .        .          Y          N        3 |
         |----------------------------------------------------------------------------------------|
      6. |  6     6      F      17535      17626        2        4          N          Y        3 |
         +----------------------------------------------------------------------------------------+

    6. Need to append two data sets? (Two data sets with same variables on different individuals.)

    . use "C:\practice1.dta", clear

    . sort id

    . append using "C:\practice3.dta"
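
    When stacking files it is often useful to record which file each observation came from. One way to do this is the generate() option of append, sketched below with made-up data; the temporary file first and the variable name source are hypothetical.

```stata
* Hypothetical toy data: the first file holds two people
clear
input id age
1 32
2 15
end
tempfile first
save `first'

* the second dataset holds one more person with the same variables
clear
input id age
3 12
end

* stack the saved file under the data in memory;
* source is 0 for rows already in memory and 1 for appended rows
append using `first', generate(source)
tab source
list
```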

    7. Do you have multiple records for the same individual (same id)? The Stata reshape command converts data between a “long” format, with multiple records per person, and a “wide” format, with a single record per person.

    (long form)

         i      j             x_ij
        id   year    sex       inc
        --------------------------
         1     80      0      5000
         1     81      0      5500
         1     82      0      6000
         2     80      1      2000
         2     81      1      2200
         2     82      1      3300
         3     80      0      3000
         3     81      0      2000
         3     82      0      1000

    (wide form)

         i           ....... x_ij ........
        id    sex    inc80    inc81    inc82
        ------------------------------------
         1      0     5000     5500     6000
         2      1     2000     2200     3300
         3      0     3000     2000     1000

    This example is taken from the Stata help for the reshape command.

    Given these data, you could use reshape to convert from one form to the other:

    . reshape wide inc, i(id) j(year)        (goes from long to wide)

    . reshape long inc, i(id) j(year)        (goes from wide to long)
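
    To try the conversion yourself, the long-form table above can be typed in with input and reshaped in both directions. This is a self-contained sketch for practice; the values are the ones shown in the table.

```stata
* Data typed in from the long-form table
clear
input id year sex inc
1 80 0 5000
1 81 0 5500
1 82 0 6000
2 80 1 2000
2 81 1 2200
2 82 1 3300
3 80 0 3000
3 81 0 2000
3 82 0 1000
end

* long to wide: one row per id, with inc80 inc81 inc82
reshape wide inc, i(id) j(year)
list

* wide back to long: one row per id and year
reshape long inc, i(id) j(year)
list
```

    Note that sex is constant within id, which is why it can be carried along without being named in the reshape command.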

    8. Don’t forget to use the Stata help menu. It may look daunting, but if you scroll down, there are often examples at the end of the help file for a given command.

    9. Don’t forget to look back at your Biostat 621-623 lecture notes, problem sets, and Stata notes for tips.

    10. Biostat 624 requires a data analysis project of your choice so this course will be helpful to you if you are working with another data set.

    11. Are we missing a question that you may have? Please let us know.


