* MLR.DO    MLR
* Example:  Hospital Cost Data from Pagano and Gauvreau
* Stata dataset: hospital.dta
* Assumes files are in folder: [path]\regress
*
* To run this program, use the following Stata commands:
*    cd [path]\regress   ... change directory to folder regress
*    do MLR
*
* OUTLINE:
*   Part a. Scatterplot matrix
*   Part b. MLR: expense per admission on salary and los
*   Part c. Added Variable plots -- uses regress above
*   Part d. MLR: Plot residuals vs Xs and predicteds for model checking
*   Part e. Re-fit without the outlier "Alaska"
*   Part f. Determine and list DFBETA for the outlier
*   Part g. Fit linear splines for LOS and Salary -- in separate models,
*           w/o Alaska
*   Part h. Fit model with interaction for LOS*Salary
*           w/o Alaska

* Housekeeping

* Clear work space
clear

* Turn off -more- pause
set more off

* Set directory for do and log files or use "cd" to get to it
* cd path/regress

* Save log file on disk as plain text
* (rename it to .txt if you want Notepad to open it by default)
capture log close
log using cl4ex1.log, replace

* Make sub-folder to store graph images
shell md cl4ex1

* Extend linesize for log
set linesize 100

* Access Stata dataset
use hospital, clear

* Dataset contents
describe

* Get variables, codes, descriptive stats
codebook los salary expadm

* Get stats, excluding Alaska
summarize los salary expadm if state~="Alaska"

* List 5 records for checking
list in 1/5

* Part a. Scatterplot matrix

* (Increase textsize)
set textsize 150
graph matrix los salary expadm, half
set textsize 100

* Save the graph on disk in cl4ex1 folder as Windows metafile fig1.wmf
graph export "cl4ex1\fig1.wmf", replace

* Part b. MLR: expense per admission on salary and los
regress expadm los salary

* Part c. Added Variable plots -- uses regress above
set textsize 150
avplot los, ytitle("e(expadm | X)")
graph export "cl4ex1\fig2a.wmf", replace
avplot salary, ytitle("e(expadm | X)")
graph export "cl4ex1\fig2b.wmf", replace
set textsize 100

* Part d. MLR: Plot residuals vs Xs and predicteds for model checking

* Get fitted values and residuals from the Part b model
predict yhat
predict e, residuals

* Make the plots
twoway scatter e yhat, yline(0) ///
    title("Residuals -vs- Predicted Values") ///
    ytitle("Residual") xtitle("Y-hat")
graph export "cl4ex1\fig3a.wmf", replace

twoway scatter e los, yline(0) ///
    title("Residuals -vs- Length of Stay") ///
    ytitle("Residual") xtitle("LOS")
graph export "cl4ex1\fig3b.wmf", replace

* Saved as fig3c so it does not overwrite the LOS plot in fig3b
twoway scatter e salary, yline(0) ///
    title("Residuals -vs- Salary") ///
    ytitle("Residual") xtitle("Salary")
graph export "cl4ex1\fig3c.wmf", replace

* Part e. Re-fit without the outlier "Alaska"

* Since fit will not accept strings in if statements (UGH),
* must convert state (string variable) to stateno (numeric variable)
encode state, gen(stateno)

* Check the coding (the code below assumes Alaska is stateno 2)
list state stateno, nolabel

regress expadm los salary if stateno ~= 2

* Part f. Determine and list DFBETA for the outlier

* Fit model with all the data points
regress expadm los salary
dfbeta los salary

* Older Stata names the new variables DFlos and DFsalary; newer versions
* may name them _dfbeta_1, _dfbeta_2, so adjust DF* below if needed
list state DF* if stateno == 2

* Part g. Fit linear splines for LOS and Salary -- in separate models,
*         w/o Alaska

* Center los and salary at their means (excluding Alaska)
* (losc and salaryc are left missing for Alaska)
egen losc = mean(los) if stateno~=2
replace losc = los - losc
egen salaryc = mean(salary) if stateno~=2
replace salaryc = salary - salaryc

* Generate non-linear (spline) terms: knots at los = 7 and salary = 15000
gen l1 = (los-7) * (los>7)
gen sal1 = (salary - 15000) * (salary>15000)

* Fit models
regress expadm losc salaryc l1 if stateno ~= 2
regress expadm losc salaryc sal1 if stateno ~= 2

* Part h. Fit model with interaction for LOS*Salary
*         w/o Alaska

* Scale salary, divide by 1000
replace salaryc = salaryc/1000

* Generate interaction term
gen losXsal = losc * salaryc

* Fit model
regress expadm losc salaryc losXsal if stateno~=2

* Close log file -- Only once all errors have been fixed
*log close
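
* Optional check (a sketch added for illustration, not part of the original program).
* In the Part g spline model for LOS, the slope of expadm on LOS is the losc
* coefficient up to the knot at 7 days, and the sum of the losc and l1
* coefficients beyond it. Assuming the variables created above are still in
* memory, -lincom- reports that sum with its standard error and confidence interval.
quietly regress expadm losc salaryc l1 if stateno ~= 2
lincom losc + l1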