lianxh.cn


Stata 101 - Getting Started


连玉君 Yujun Lian (Sun Yat-sen University)

arlionn@163.com


👉 Latest version: Web | PDF

Online courses and materials

Stata's basic features:

stata.com/features 🍏 🍓 Be sure to read the [U] User's Guide in full


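The File menu

Setting and saving the interface style

The Data / Graphics / Statistics menus

Writing code in a do-file is the standard way to work in Stata

Do-files: creating and running them

Do-file Editor settings

Editing preferences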
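Here is a minimal sketch of the create-and-run cycle done from code instead of the menus. It assumes the working directory is writable, and the file name `demo_hello.do` is hypothetical. `do` echoes every command and its output. `run` executes the same file silently.

```stata
* Write a tiny do-file (hypothetical name demo_hello.do) into the working directory
tempname fh
file open `fh' using "demo_hello.do", write replace text
file write `fh' "sysuse auto, clear" _n
file write `fh' "summarize price" _n
file close `fh'

* Execute it: -do- shows commands and output, -run- suppresses them
do  "demo_hello.do"
run "demo_hello.do"

* In an interactive session, open the file in the Do-file Editor:
* doedit "demo_hello.do"
```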

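*--------------  
*-Basic regression analysis  see [A4_Regress.do] 
  global path "D:/paper01"   // project folder (no trailing slash; the globals below add one)
  global D    "$path/data"
  global Out  "$path/out"  
  cd "$D"                    // set the current working directory

  sysuse "nlsw88", clear
  global y "wage"            // global macro holding the dependent variable
  global x "hours tenure married collgrad age"  // help macro; holds the regressors
  
  *-Drop observations with missing values (keep a common estimation sample)
    qui reg $y $x i.race i.industry i.occupation
    keep if e(sample)
  
  *-Display and export summary statistics (a trailing triple slash continues a command on the next line)
  *  Note: logout, pwcorr_a and esttab are user-written commands, not official Stata
    logout, save("$Out/Table1_sum01") excel replace:   ///
            tabstat $y $x, column(stats) format(%6.2f) ///
            stats(mean sd min p50 max) 
  *-Display the correlation matrix
    logout, save("$Out/Table2_corr") excel replace: ///
            pwcorr_a $y $x

  *-Regressions: baseline specifications
    reg $y $x           // OLS regression, basic model
      est store m1      // store the estimation results
    reg $y $x i.race    // i.race adds race dummies; help fvvarlist
      est store m2
    reg $y $x i.race i.industry
      est store m3
    reg $y $x i.race i.occupation
      est store m4 
    *-Tabulate the regression results  see [A4_Regress, 4.7 Presenting regression results]
    local s "using $Out/Table3.csv"  // output file (CSV, opens in Excel)
    local m "m1 m2 m3 m4"
    esttab `m' `s', replace nogap compress ///
         ar2 scalar(N F rss)  b(%6.3f)           ///
         star(* 0.1 ** 0.05 *** 0.01)      ///
         noomit  nobase                    ///
         indicate("Industry FE=*.industry" "Occupation FE=*.occupation")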
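`esttab` belongs to the user-written estout package (`ssc install estout`). When that package is not installed, the official `estimates table` command gives a plainer side-by-side view of stored models. The minimal sketch below re-estimates two of the models above so that it runs on its own.

```stata
* Re-create two of the models above using official commands only
sysuse nlsw88, clear
global y "wage"
global x "hours tenure married collgrad age"
qui reg $y $x i.race i.industry i.occupation
keep if e(sample)                    // common estimation sample

qui reg $y $x
est store m1
qui reg $y $x i.race
est store m2

* Side-by-side table: coefficients, standard errors, N and adjusted R-squared
estimates table m1 m2, b(%6.3f) se(%6.3f) stats(N r2_a)
```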

Defining macros (local)
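The sketch below shows the difference between local and global macros on the bundled auto data. The macro names `xvars` and `yvar` are illustrative. A local is visible only inside the do-file or program that defines it, and you refer to it as `` `name' ``. A global lasts for the whole session, and you refer to it as `$name`.

```stata
local  xvars "mpg weight"       // local macro, referenced as `xvars'
global yvar  "price"            // global macro, referenced as $yvar

sysuse auto, clear
regress $yvar `xvars'

* Macros can be extended step by step and inspected
local xvars "`xvars' foreign"
display "Regressors: `xvars'"
macro list _xvars               // a leading underscore lists a local
```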

Stata coding habits: do-file templates, etc.


Command: . lianxh 码农 重现代码 可重现   (keywords: coder, reproducible code, reproducible)

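The Project feature

Official Stata material: -video- -PDF- -hands-on-

  • Purpose: gather all files that belong to one project in one place, so they are easier to manage

  • Use: managing several do-files and datasets, e.g., for a paper, a research grant, or course materials

  • Create a new Project:

    • [1] open the Do-file Editor,
    • [2] click File → New → Project..., then save
  • Open an existing Project:

    • [1] open the Do-file Editor,
    • [2] click File → Open → Project...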

Source: stata.com

Stata learning resources

Materials on the Stata website

Stata website: getting started

  1. Introducing Stata—sample session
  2. The Stata user interface
  3. Using the Viewer
  4. Getting help
  5. Opening and saving Stata datasets
  6. Using the Data Editor
  7. Using the Variables Manager
  8. Using the Do-file Editor—automating Stata
  9. Learning more about Stata
  10. Subject index

Forums: discussion and exchange

Stata Textbooks

Econometrics textbooks with Stata practice

  • Baum, C. An introduction to modern econometrics using Stata[M]. Stata Press, 2006.
  • Cameron, A., P. Trivedi. Microeconometrics using Stata[M]. Stata Press, 2009.
  • Acock, A. C. A gentle introduction to Stata (4th ed.)[M]. Stata Press, 2014.

Data management / graphics / programming

  • Kohler, U., F. Kreuter. Data analysis using Stata (3rd ed.)[M]. Stata Press, 2012.
  • Long, J. The workflow of data analysis using Stata[M]. Stata Press, 2009.
  • Mitchell, M. N. Data management using Stata[M]. Stata Press, 2010.
  • Mitchell, M. N. Interpreting and visualizing regression models using Stata[M]. 2022.
  • Weinberg, S. L., S. K. Abramowitz. Statistics using Stata[M]. Cambridge: Cambridge University Press, 2020. -Link-, -PDF1-

Discrete variables / Logit / count data

  • Long, J., J. Freese. Regression models for categorical dependent variables using Stata (3rd ed.)[M]. Stata Press, 2014. -Link-

Panel / multilevel data

  • Rabe-Hesketh, S., A. Skrondal. Multilevel and longitudinal modeling using Stata[M]. Stata Press, 2012.
  • Sul, D. Panel data econometrics: Common factor analysis for empirical researchers[M]. 2019. -Link-, -PDF1-

Finance / time series

  • Boffelli, S., G. Urga. Financial econometrics using Stata[M]. Stata Press, 2016.
  • Levendis, J. D. Time series econometrics: Learning through replication[M]. Springer, 2019. -Link-, -PDF1-, -PDF2-

Survival analysis / structural equation modeling

  • Acock, A. C. Discovering structural equation modeling using Stata[M]. 2013.
  • Cameron, A., P. Trivedi. Microeconometrics using Stata[M]. Stata Press, 2009.

SFA (stochastic frontier analysis)

  • Kumbhakar, S. C., H. J. Wang, A. P. Horncastle. A practitioner's guide to stochastic frontier analysis using Stata[M]. Cambridge University Press, 2015. -Link-

Stata data processing


👉 Stata command: . lianxh 数据处理   (keyword: data processing)

Importing data

 help use         // load Stata-format (.dta) data

 help sysuse      // example datasets shipped with Stata
 sysuse dir, all  // list them

 help webuse      // datasets used in the Stata manuals
 help dta_manuals // list them

 help import      // import Excel, CSV and text data
      import excel data.xls, firstrow clear
      import excel "D:/data.xlsx", cellrange(A1:A10) firstrow clear

 copy "https://www.stata.com/examples/auto.csv" "auto.csv" // download a CSV file
      import delimited auto, rowrange(3:6) clear

 help bcuse       // datasets from Wooldridge's textbooks (user-written: ssc install bcuse)
      bcuse wagepan, clear 
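Here is a minimal round-trip sketch that uses only official commands. It exports the bundled auto data to CSV and then reads the file back. The file name `auto_demo.csv` is hypothetical, and the file is written to the current working directory.

```stata
* Export the auto data to a CSV file (hypothetical name), then import it again
sysuse auto, clear
export delimited using "auto_demo.csv", nolabel replace   // write numeric codes, not value labels

import delimited using "auto_demo.csv", clear   // the header row becomes variable names
describe, short
```

With `nolabel`, the file stores numeric codes (e.g., 0/1 for `foreign`) instead of value labels, so those variables come back numeric.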

Basic syntax, operators, and functions


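sysuse "auto.dta", clear

gen wei2len = weight/length                 // arithmetic operator
gen lnPrice = ln(price)                     // function (help functions)
gen good    = (rep78>=4 & rep78!=.)         // logical expression: 1 if true, 0 if false
bysort foreign: egen sd_price = sd(price)   // statistic within groups (help egen)

regress  mpg  price  c.weight##c.weight  i.foreign  i.foreign#c.weight   // c. marks continuous variables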
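The `rep78!=.` condition matters because Stata treats a numeric missing value `.` as larger than any number. On its own, `rep78>=4` is therefore also true for missing values. The check below uses the auto data, where 5 cars have missing `rep78`. The names `n_naive`, `n_clean` and `good_rep` are illustrative.

```stata
sysuse auto, clear

* Missing values (.) sort above every number, so they satisfy rep78>=4
count if rep78 >= 4                      // includes the cars with missing rep78
local n_naive = r(N)
count if rep78 >= 4 & !missing(rep78)    // excludes them
local n_clean = r(N)
display "naive: `n_naive'   clean: `n_clean'"

* Inside -generate-, the logical expression yields 0/1, never missing
gen good_rep = (rep78 >= 4 & !missing(rep78))
```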

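Compact variable notation

Factor variables

Details: Stata: A complete guide to factor variables   Help: help fvvarlist, help varlist

| Symbol | Meaning | Example |
|--------|---------|---------|
| `i.` | mark a variable as categorical (indicator dummies) | `reg y x i.id i.year`: $y_{it}=a_i+x_{it}\beta+\lambda_t+u_{it}$ |
| `c.` | mark a variable as continuous | |
| `o.` | omit a category or variable | |
| `#` | interaction | `reg y D x i.D#c.x`: $y_i = a + D_i\beta_1 + x_i\beta_2 + D_i x_i\gamma + u_i$ |
| | | `reg y x c.x#i.year`: $y_{it}=a+x_{it}\beta_{\color{red}{t}}+u_{it}$ |
| `##` | both variables and their interaction | `reg y c.x##c.z`: $y_i=a+x_i\beta_1+z_i\beta_2+x_iz_i\beta_3+u_i$ |

Inside `#` and `##`, variables without a prefix are treated as categorical, so continuous variables need `c.`.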
clear
input group   x
        1     30
        1     50
        2     40
        2     60
        2     80
        3     70
end 
. list group i.group i.group#c.x, clean  

                   1.      2.      3.   1.group#   2.group#   3.group# 
       group   group   group   group        c.x        c.x        c.x  
  1.       1       1       0       0         30          0          0  
  2.       1       1       0       0         50          0          0  
  3.       2       0       1       0          0         40          0  
  4.       2       0       1       0          0         60          0  
  5.       2       0       1       0          0         80          0  
  6.       3       0       0       1          0          0         70
. reg x i.group, noheader 

--------------------------------------------------
     x | Coefficient  Std. err.      t    P>|t|   
-------+------------------------------------------
 group |
    2  |     20.000     16.667     1.20   0.316   
    3  |     30.000     22.361     1.34   0.272   
       |
 _cons |     40.000     12.910     3.10   0.053   
--------------------------------------------------


. regfit

x =  40.00 + 0.00*1b.group + 20.00*2.group + 30.00*3.group
    (12.91) (0.00)          (16.67)         (22.36)
     N = 6, R2 = 0.43, adj-R2 = 0.05
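The following check uses the auto data. It confirms that `c.mpg##c.weight` produces the same interaction coefficient as a product term built by hand. The variable `mpg_x_weight` and the scalar names are illustrative.

```stata
sysuse auto, clear

* Factor-variable notation: both main effects plus the continuous interaction
regress price c.mpg##c.weight
scalar b_fv = _b[c.mpg#c.weight]

* Traditional approach: generate the product term by hand
gen mpg_x_weight = mpg*weight
regress price mpg weight mpg_x_weight
scalar b_manual = _b[mpg_x_weight]

display "factor variable: " b_fv "   manual: " b_manual
```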

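Factor variables: base-level operators and their meanings

| Base operator | Meaning |
|---------------|---------|
| `ib#.x` | use category `#` as the base, where `#` is one of the values of `x` (e.g., `ib2.x`) |
| `ib(##).x` | use the #-th value of `x`, in sorted order, as the base |
| `ib(first).x` | use the smallest value of `x` as the base (Stata's default) |
| `ib(last).x` | use the largest value of `x` as the base |
| `ib(freq).x` | use the most frequent value of `x` as the base |
| `ibn.x` | no base level |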
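The sketch below applies the base-level operators to `nlsw88`, where `race` is coded 1 = white, 2 = black, 3 = other. A different base only re-expresses the same fit. With `ibn.` and `noconstant`, each coefficient is simply a group mean, and the last lines check this. The scalar names are illustrative.

```stata
sysuse nlsw88, clear        // race: 1 = white, 2 = black, 3 = other

regress wage i.race             // default ib(first): white is the base
regress wage ib2.race           // black (value 2) as the base
regress wage ib(last).race      // "other" (largest value) as the base
regress wage ib(freq).race      // most frequent category (white) as the base

* No base level and no constant: coefficients are the group means
regress wage ibn.race, noconstant
scalar b_black = _b[2.race]
summarize wage if race == 2, meanonly
scalar mean_black = r(mean)
```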

Example 1: Race and wages

Model: $Wage_i = \alpha + black_i\beta_1 + hours_i\beta_2 + black_i \times hours_i\,{\color{red}{\beta_3}} + u_i$

. sysuse "nlsw88.dta", clear

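*-Traditional approach: build the dummy and the product term by hand
. gen black=1
. replace black=0 if race!=2          // race: 1 = white, 2 = black, 3 = other
. gen black_x_hours = black*hours
. reg wage black hours black_x_hours

*-Factor-variable approach
. reg wage 2.race c.hours 2.race#c.hours
. reg wage 2.race##c.hours            // equivalent to the previous line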

------------------------------------------------------------------------------
        wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        race |
      Black  |     -1.619      1.266    -1.28   0.201       -4.101       0.864
       hours |      0.089      0.012     7.24   0.000        0.065       0.113
race#c.hours |
      Black  |      0.007      0.033     0.22   0.827       -0.057       0.071
       _cons |      4.810      0.474    10.14   0.000        3.880       5.741
------------------------------------------------------------------------------
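This quick check confirms that the traditional and factor-variable approaches estimate the same interaction coefficient $\beta_3$. The scalar names are illustrative.

```stata
sysuse nlsw88, clear

* Traditional approach: hand-made dummy and product term
gen black = (race == 2)
gen black_x_hours = black*hours
regress wage black hours black_x_hours
scalar g_manual = _b[black_x_hours]

* Factor-variable approach
regress wage 2.race##c.hours
scalar g_fv = _b[2.race#c.hours]

display "interaction: manual = " g_manual "   factor variable = " g_fv
```

The factor-variable form has a practical advantage. `margins` knows that `2.race#c.hours` depends on `hours`, whereas a hand-made `black_x_hours` is treated as an unrelated variable.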

Model: $Wage_i = \alpha + B_i\beta_1 + h_i\beta_2 + B_i \times h_i\,{\color{red}{\beta_3}} + {\color{blue}{h_i^2\beta_4 + B_i\times h_i^2\beta_5}} + u_i$, where $B_i$ is the black dummy and $h_i$ is hours worked.

. sysuse "nlsw88.dta", clear

. reg wage 2.race##c.hours##c.hours

-------------------------------------------------------------------
                wage |          Coeff       SE        t     P>|t|  
---------------------+---------------------------------------------
                race |
              Black  |         -3.461      2.274    -1.52   0.128  
               hours |          0.133      0.044     3.03   0.002  
                     |
        race#c.hours |
              Black  |          0.123      0.127     0.97   0.331  
                     |
     c.hours#c.hours |         -0.001      0.001    -1.05   0.296  
                     |
race#c.hours#c.hours |
              Black  |         -0.002      0.002    -0.95   0.344  
                     |
               _cons |          4.165      0.778     5.35   0.000  
-------------------------------------------------------------------
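Once the quadratic terms are included, the effect of hours depends on both race and the level of hours:

$$\frac{\partial Wage_i}{\partial h_i} = \beta_2 + \beta_3 B_i + 2h_i(\beta_4 + \beta_5 B_i)$$

The sketch below evaluates this expression at $h_i = 40$ with `lincom`. The value 40 is only an illustrative point (roughly a full-time week), and the scalar names are illustrative too.

```stata
sysuse nlsw88, clear
regress wage 2.race##c.hours##c.hours

* Non-black workers (B = 0): b2 + 2*40*b4
lincom _b[hours] + 80*_b[c.hours#c.hours]
scalar me_nonblack = r(estimate)

* Black workers (B = 1): b2 + b3 + 2*40*(b4 + b5)
lincom _b[hours] + _b[2.race#c.hours] + 80*_b[c.hours#c.hours] + 80*_b[2.race#c.hours#c.hours]
scalar me_black = r(estimate)
```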

Time-series operators

Help: help varlist, help tsvarlist. Declare the data with `tsset` (or `xtset` for panels) before using these operators.
🍏 `reg y L.y L(0/2).x D.L.z`  $\rightarrow$  $y_t = a + \rho y_{t-1} + \sum_{s=0}^{2} \theta_s x_{t-s} + \Delta z_{t-1} + u_t$

| Operator | Meaning | Description |
|----------|---------|-------------|
| `L.x` | $x_{t-1}$ | 1-period lag |
| `L2.x` | $x_{t-2}$ | 2-period lag |
| `F.x` | $x_{t+1}$ | 1-period lead |
| `F2.x` | $x_{t+2}$ | 2-period lead |
| `D.x` | $\Delta x_t = x_t - x_{t-1}$ | first difference |
| `D2.x` | $\Delta^2 x_t = (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})$ | difference of differences |
| `L(0/2).x` | $x_t, x_{t-1}, x_{t-2}$ | several variables at once |
| `L.D.x` | $\Delta x_{t-1}$ | operators can be nested |
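A toy series makes the operators concrete. With $x_t = t^2$, the first difference is $\Delta x_t = 2t-1$, and the second difference is constant at 2. The data are made up for illustration. Note that `tsset` comes before any operator is used.

```stata
clear
set obs 6
gen t = _n
gen x = t^2            // x_t = t^2, so D.x = 2t - 1 and D2.x = 2
tsset t                // declare t as the time variable

gen lag1  = L.x        // x_{t-1}
gen lead1 = F.x        // x_{t+1}, missing in the last period
gen dx    = D.x        // x_t - x_{t-1}
gen d2x   = D2.x       // (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})
gen ldx   = L.D.x      // first difference lagged once
list, clean
```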

. use "https://www.stata-press.com/data/r17/gxmpl1", clear
. format gnp cpi %5.1f

. list year  L(1/3).(gnp cpi)  D.cpi, clean

                   L.      L2.      L3.      L.     L2.     L3.    D. 
       year      gnp      gnp      gnp     cpi     cpi     cpi   cpi  
 ---------------------------------------------------------------------
  1.   1989        .        .        .       .       .       .     .  
  2.   1990   5837.9        .        .   124.0       .       .   6.7  
  3.   1991   6026.3   5837.9        .   130.7   124.0       .   5.5  
  4.   1992   6367.4   6026.3   5837.9   136.2   130.7   124.0   4.1  
  5.   1993   6689.3   6367.4   6026.3   140.3   136.2   130.7   4.2  
  6.   1994   7098.4   6689.3   6367.4   144.5   140.3   136.2   3.7  
  7.   1995   7433.4   7098.4   6689.3   148.2   144.5   140.3   4.2  
  8.   1996   7851.9   7433.4   7098.4   152.4   148.2   144.5   4.5 
. use "https://www.stata-press.com/data/r17/invest2", clear
. keep if time<=5&company<=2
. format invest market %4.1f
. list company time invest L(1/2).invest market D1.market

     +--------------------------------------------------------------+
     |                                L.      L2.                 D.|
     | company   time   invest   invest   invest   market    market |
     |--------------------------------------------------------------|
  1. |       1      1    317.6        .        .   3078.5         . |
  2. |       1      2    391.8    317.6        .   4661.7    1583.2 |
  3. |       1      3    410.6    391.8    317.6   5387.1     725.4 |
  4. |       1      4    257.7    410.6    391.8   2792.2   -2594.9 |
  5. |       1      5    330.8    257.7    410.6   4313.2    1521.0 |
     |--------------------------------------------------------------|
  6. |       2      1     40.3        .        .    417.5         . |
  7. |       2      2     72.8     40.3        .    837.8     420.3 |
  8. |       2      3     66.3     72.8     40.3    883.9      46.1 |
  9. |       2      4     51.6     66.3     72.8    437.9    -446.0 |
 10. |       2      5     52.4     51.6     66.3    679.7     241.8 |
     +--------------------------------------------------------------+
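In the `invest2` listing, `L.` starts over within each company. The minimal sketch below uses made-up numbers to contrast it with the explicit subscript `x[_n-1]`. That subscript simply takes the previous row and ignores both panel boundaries and gaps in time (company 2 below skips `t = 3`).

```stata
clear
input id t x
1 1 10
1 2 12
1 3 15
2 1 20
2 2 21
2 4 25
end
xtset id t                 // declare the panel (id) and time (t) variables

gen lag_ts    = L.x        // respects panel boundaries and time gaps
gen lag_naive = x[_n-1]    // just the previous row
list, sepby(id) noobs
```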

Reshaping data: long <--> wide

           long
        +------------+                  wide
        | i  j   var |                 +----------------+
        |------------|                 | i   var1  var2 |
        | 1  1   4.1 |     reshape     |----------------|
        | 1  2   4.5 |   <--------->   | 1    4.1   4.5 |
        | 2  1   3.3 |                 | 2    3.3   3.0 |
        | 2  2   3.0 |                 +----------------+
        +------------+

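       // long --> wide:

                                           j: existing variable
                                          /
                reshape wide var, i(i) j(j)

      //  wide --> long:

                reshape long var, i(i) j(j)
                                          \
                                           j: new variable to create
. webuse reshape1, clear              // wide data: id sex inc80-inc82 ue80-ue82
. reshape long inc ue, i(id) j(year)  // Note: [1] sex does not vary over time, so it needs no stub
                                      //       [2] j() names the new variable (here: year)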
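Here is a minimal round trip on the toy data from the diagram above (values typed in by hand). It shows which names go into `i()` and `j()` in each direction.

```stata
clear
input i j var
1 1 4.1
1 2 4.5
2 1 3.3
2 2 3.0
end

* long --> wide: j is an existing variable whose values become name suffixes
reshape wide var, i(i) j(j)
list, clean          // variables: i var1 var2

* wide --> long: j() names the new variable built from the suffixes
reshape long var, i(i) j(j)
list, clean
```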


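Stata data processing: blog posts

🍏 Stata command:

. lianxh 数据处理 面板   (keywords: data processing, panel)

Post: Stata data processing (1)

Post: Stata data processing (2)

Post: Stata data processing (3)

Stata programming basics



🍎 Yujun Lian's free video course on Bilibili: Writing and publishing Stata programs

Stata programming: blog posts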

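Stata graphics

Stata graphics: basic materials

Stata graphics: schemes and options

. lianxh 模板 选项

Stata graphics: visualizing coefficients

. lianxh 可视

Stata graphics: histograms / bar charts / density plots

. lianxh 直方 柱状 密度

Stata graphics: scatter plots

. lianxh 散点

Stata graphics: advanced

The hardest part of Stata graphics, and also the most flexible, is setting the options.

Break a graph down into its components and learn which option, and which help file, controls each one. With some practice you will be drawing graphs fluently. Options fall roughly into these groups:

  • Schemes: the overall style of the graph  help scheme
  • Axes, ticks, etc.  help axis_options
  • Titles: title, subtitle, note, etc.  help title_options
  • Added lines: horizontal or vertical reference lines  help addline_options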
 sysuse "sp500", clear
 replace volume = volume/1000
 keep in 1/57
*----------------------------------------Begin
#delimit ;
 twoway                        //help twoway
    (rspike hi low date, lw(*1.3))
    (line   close  date, 
       lpattern(solid) lwidth(*1.2) lcolor(blue))
   , 
   yscale(range(1100 1400))    //help axis_options
   ylabel(1100(100)1400, grid) //help axis_options
   ymtick(##5)                 //help axis_options
   xlabel(, angle(30))         //help axis_options
   ytitle("Stock price", place(top))  //help title_options
   xtitle("Trading day")              //help title_options
   legend(order(1 "High-Low" 2 "Close") 
          ring(0) position(2) row(2)) //help legend_options
   subtitle("S&P 500", margin(b+2.5)) //help title_options
   note("Source: Yahoo! Finance")      //help title_options
   scheme(s2mono);                    //help scheme 
#delimit cr

graph export "sp500_rspike_01.png", replace
*----------------------------------------Over
 sysuse "sp500", clear
 replace volume = volume/1000
 keep in 1/57
*----------------------------------------Begin
#delimit ;
 twoway 
    (rspike hi low date)
    (line   close  date)
    (bar    volume date, barw(.25) yaxis(2))
   , 
   yscale(axis(1) r(900 1400))
   yscale(axis(2) r(  9   45))
   ylabel(, axis(2) grid)
   ytitle("Price: high, low, close", place(top))
   ytitle("Volume (millions of shares)", axis(2) 
           bexpand just(left))
   xtitle(" ")
   legend(off)
   subtitle("S&P 500", margin(b+2.5))
   note("Source: Yahoo! Finance") ;
#delimit cr
*----------------------------------------Over

graph export "sp500_rspike_02.png", replace

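Replicating papers



More posts

. lianxh 复现 重现 可重复

The most important commands in Stata

help cmd

  • help xtreg
  • help winsor2   (user-written: ssc install winsor2)

search keywords

  • search dynamic panel data

lianxh keywords -click-

  • lianxh DID 倍分法   (difference-in-differences)
  • songbl 合成控制   (synthetic control)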
