Stata 101 - A Beginner's Guide

Yujun Lian (连玉君), Sun Yat-sen University

Stata basics: 🍏 🍓 Be sure to read the [U] User's Guide in full



The File menu


The Data / Graphics / Statistics menus

Writing code in a do-file is the standard way to work in Stata


Do-file Editor settings


*-Basic regression analysis  see [] 
  global path "D:/paper01"   // project root; no trailing slash
  global D    "$path/data"
  global Out  "$path/out"  
  cd "$D"                    // set the working directory

  sysuse "nlsw88", clear
  global y "wage"            // global macro holding the dependent variable
  global x "hours tenure married collgrad age"  // help macro; explanatory variables
    qui reg $y $x i.race i.industry i.occupation
    keep if e(sample)        // keep only the observations used in the fullest model
  *-Display and export summary statistics (a line ending in three slashes continues on the next line)
    logout, save("$Out/Table1_sum01") excel replace:   ///
            tabstat $y $x, column(stats) format(%6.2f) ///
            stats(mean sd min p50 max) 
    logout, save("$Out/Table2_corr") excel replace: ///
            pwcorr_a $y $x

    reg $y $x           // OLS regression, basic model
      est store m1      // store the regression results
    reg $y $x i.race    // i.race adds race dummies, help fvvarlist
      est store m2
    reg $y $x i.race i.industry
      est store m3
    reg $y $x i.race i.occupation
      est store m4 
    *-Tabulate the regression results  see [A4_Regress, 4.7 回归结果呈现 (presenting regression results)]
    local s "using $Out/Table3.csv"  // output file (CSV, opens in Excel)
    local m "m1 m2 m3 m4"
    esttab `m' `s', replace nogap compress ///
         ar2 scalar(N F rss)  b(%6.3f)           ///
         star(* 0.1 ** 0.05 *** 0.01)      ///
         noomit  nobase                    ///
         indicate("Industry effects=*.industry" "Occupation effects=*.occupation")

Defining macros (local)
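The block below is a minimal sketch of how local and global macros behave, using the nlsw88 data that ships with Stata. The macro names `controls`, `n`, `k`, and `depvar` are made up for illustration. Run the lines together in one do-file, because a local macro disappears once the do-file or program that defined it finishes.

```stata
* A local macro is referenced as `name' (left quote, name, right quote)
local controls "hours tenure age"      // store a variable list
local n : word count `controls'        // extended macro function: count the words
display "controls = `controls' (`n' variables)"

* With "=" the expression is evaluated once and the result is stored
local k = 2 + 3
display "k = `k'"

* A global macro is visible everywhere and is referenced with $
global depvar "wage"
sysuse nlsw88, clear
regress $depvar `controls'
```

Locals are usually the safer default: they cannot collide with macros defined in other do-files, whereas a global stays in memory until Stata is closed or the macro is dropped.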

Stata coding habits: do-file templates and more
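One possible do-file skeleton is sketched below. It is not the author's template: the `version 14` line, the log file name `demo_log.smcl`, and the section layout are assumptions to adapt to your own project.

```stata
* ---- header: project, author, date (fill in) ----
version 14                 // assumed minimum version; adjust to your installation
clear all
set more off
capture log close          // close any log left open by an earlier run

log using "demo_log.smcl", replace   // hypothetical log file in the working directory

* ---- 1. data ----
sysuse auto, clear

* ---- 2. analysis ----
summarize price mpg weight
regress price mpg weight

log close
```

Starting with `clear all` and `capture log close` lets the file be rerun from top to bottom without depending on whatever was left in memory.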

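The Project feature

Official Stata resources: video, PDF, hands-on demo

  • Purpose: collect all files that belong to one project in a single project file, so they are easy to manage

  • Typical use: managing several do-files and datasets, e.g. for a paper, a research project, or course materials

  • Create a new Project:

    • [1] Open the Do-file Editor,
    • [2] Click File → New → Project..., then save
  • Open an existing Project:

    • [1] Open the Do-file Editor,
    • [2] Click File → Open → Project...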


Stata learning resources

Materials on the Stata website

Stata website: Getting Started

  1. Introducing Stata—sample session
  2. The Stata user interface
  3. Using the Viewer
  4. Getting help
  5. Opening and saving Stata datasets
  6. Using the Data Editor
  7. Using the Variables Manager
  8. Using the Do-file Editor—automating Stata
  9. Learning more about Stata
  10. Subject index


Data management in Stata

 help use         // load a Stata-format dataset

 help sysuse      // datasets shipped with Stata
 sysuse dir, all  // list them

 help webuse      // datasets used in the manuals
 help dta_manuals // list them

 help import      // import Excel, csv, and txt data
      import excel data.xls, firstrow clear
      import excel "D:/data.xlsx", cellrange(A1:A10) firstrow clear

 copy "" "auto.csv" // CSV
      import delimited auto, rowrange(3:6) clear

 help bcuse       // datasets that accompany Wooldridge's textbooks
      bcuse wagepan, clear 
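To try the import commands without an external file, the sketch below first writes a small CSV from auto.dta and then reads it back. The file name `auto_demo.csv` is hypothetical and is created in the current working directory.

```stata
sysuse auto, clear
keep make price mpg
export delimited using "auto_demo.csv", replace   // hypothetical file name

* varnames(1): take variable names from the first row of the file
import delimited using "auto_demo.csv", varnames(1) clear
describe
list in 1/3
```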


sysuse "auto.dta", clear

gen wei2len = weight/length
gen lnPrice = ln(price)
gen bad = (rep78>=4 & rep78!=.)
bysort foreign: egen sd_price = sd(price)

regress  mpg  price  c.weight##c.weight  i.foreign  i.foreign#c.weight



Details: Stata:因子变量全攻略 (a complete guide to factor variables)   Help: help fvvarlist, help varlist

| Operator | Meaning | Example |
|:---|:---|:---|
| i. | marks a categorical variable | reg y x i.year : $\footnotesize y_{it}=a_i+x_{it}\beta + \lambda_t + u_{it}$ |
| c. | marks a continuous variable | |
| o. | omits a category or a variable | |
| # | interaction term | reg y D x i.D#c.x : $\footnotesize y_i = a + D_i\beta_1 + x_i\beta_2 + D_i x_i \gamma + u_i$ |
| | | reg y x c.x#i.year : $\footnotesize y_{it}=a+x_{it}\beta_{\color{red}{t}} + u_{it}$ |
| ## | both variables plus their interaction | reg y x##z : $\footnotesize y_i=a+x_i\beta_1+z_i\beta_2 + x_iz_i\beta_3 + u_i$ |

clear
input group   x
        1     30
        1     50
        2     40
        2     60
        2     80
        3     70
end
. list group ibn.group ibn.group#c.x, clean  

                   1.      2.      3. 
       group   group   group   group        c.x        c.x        c.x  
  1.       1       1       0       0         30          0          0  
  2.       1       1       0       0         50          0          0  
  3.       2       0       1       0          0         40          0  
  4.       2       0       1       0          0         60          0  
  5.       2       0       1       0          0         80          0  
  6.       3       0       0       1          0          0         70
. reg x i.group, noheader 

     x | Coefficient  Std. err.      t    P>|t|   
 group |
    2  |     20.000     16.667     1.20   0.316   
    3  |     30.000     22.361     1.34   0.272   
 _cons |     40.000     12.910     3.10   0.053   

. regfit

x =  40.00 + 0.00*1b.group + 20.00*2.group + 30.00*3.group
    (12.91)  (0.00)          (16.67)         (22.36)
     N = 6, R2 = 0.43, adj-R2 = 0.05


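| Base-level operator | Meaning |
|:---|:---|
| ib#.x | use the level whose value is # as the base group |
| ib(##).x | use the #th level, counted in ascending order of the values, as the base group |
| ib(first).x | use the level with the smallest value as the base group (the Stata default) |
| ib(last).x | use the level with the largest value as the base group |
| ib(freq).x | use the most frequent level as the base group |
| ibn.x | no base group |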
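The base-level operators can be tried on auto.dta, as in the sketch below. The choice of `price` and `rep78` is only for illustration. In this dataset the most frequent value of rep78 is 3, so `ib(freq).rep78` and `ib3.rep78` select the same base group.

```stata
sysuse auto, clear
tabulate rep78                        // frequencies of each level

regress price i.rep78                 // default: smallest value (1) is the base
regress price ib3.rep78               // value 3 is the base
regress price ib(last).rep78          // largest value (5) is the base
regress price ib(freq).rep78          // most frequent value is the base

* No base group: drop the constant so that every level gets its own coefficient
regress price ibn.rep78, noconstant
```

Changing the base group changes which contrasts the coefficients report, not the fitted values. In the last specification each coefficient is simply the mean of price within that rep78 level.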

Example 1: race and wages

Model: $\footnotesize Wage_i = \alpha + black_i\beta_1 + hours_i\beta_2 + black_i \times hours_i {\color{red}{\beta_3}} + u_i$

. sysuse "nlsw88.dta", clear

. gen black=1
. replace black=0 if race!=2
. gen black_x_hours = black*hours
. reg wage black hours black_x_hours

. reg wage 2.race c.hours 2.race#c.hours
. reg wage 2.race##c.hours // equivalent to the previous line

        wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        race |
      Black  |     -1.619      1.266    -1.28   0.201       -4.101       0.864
       hours |      0.089      0.012     7.24   0.000        0.065       0.113
race#c.hours |
      Black  |      0.007      0.033     0.22   0.827       -0.057       0.071
       _cons |      4.810      0.474    10.14   0.000        3.880       5.741
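In this model the slope of hours is $\beta_2$ for the reference group and $\beta_2 + \beta_3$ for Black workers. As a sketch of how to obtain the second slope with a standard error, `lincom` can combine the two coefficients after the factor-variable regression:

```stata
sysuse nlsw88, clear
regress wage 2.race##c.hours

* Slope of hours for the reference group (race other than Black)
display _b[hours]

* Slope of hours for Black workers: beta_2 + beta_3
lincom hours + 2.race#c.hours
```

With the rounded coefficients reported in the table, the combined slope is roughly 0.089 + 0.007.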

Model: $\footnotesize Wage_i = \alpha + B_i\beta_1 + h_i\beta_2 + B_i \times h_i {\color{red}{\beta_3}} + {\color{blue}{h_i^2\beta_4 + B_i\times h_i^2\beta_5}} + u_i$

. sysuse "nlsw88.dta", clear

. reg wage 2.race##c.hours##c.hours

                wage |          Coeff       SE        t     P>|t|  
                race |
              Black  |         -3.461      2.274    -1.52   0.128  
               hours |          0.133      0.044     3.03   0.002  
        race#c.hours |
              Black  |          0.123      0.127     0.97   0.331  
     c.hours#c.hours |         -0.001      0.001    -1.05   0.296  
race#c.hours#c.hours |
              Black  |         -0.002      0.002    -0.95   0.344  
               _cons |          4.165      0.778     5.35   0.000  


Help: help varlist, help tsvarlist
🍏 reg y L.y L(0/2).x D.L.z   ${\color{red}{\rightarrow}}$   $\small y_t = a + \rho y_{t-1} + \sum_{s=0}^{2} \theta_s x_{t-s} + \gamma \Delta z_{t-1} + u_t$

$$
\footnotesize
\begin{array}{lll}
\text{Operator} & \text{Meaning} & \\ \hline
\texttt{L.x} & x_{t-1} & \text{1-period lag} \\
\texttt{L2.x} & x_{t-2} & \text{2-period lag} \\
\ldots & & \\
\texttt{F.x} & x_{t+1} & \text{1-period lead} \\
\texttt{F2.x} & x_{t+2} & \text{2-period lead} \\
\ldots & & \\
\texttt{D.x} & \Delta x_t = x_{t}-x_{t-1} & \text{first difference} \\
\texttt{D2.x} & \Delta^2 x_t = (x_{t}-x_{t-1})-(x_{t-1}-x_{t-2}) & \text{difference of difference} \\
\ldots & & \\
\texttt{L(0/2).x} & x_{t}, x_{t-1}, x_{t-2} & \text{several variables} \\
\texttt{L.D.x} & \Delta x_{t-1} & \text{can be nested} \\
\end{array}
$$
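The operators can be checked on a small made-up series. The sketch below builds six yearly observations with $x_t = t^2$; the variable names `Lx`, `Fx`, `Dx`, and `D2x` are illustrative. The operators only work after the data have been declared as a time series with `tsset`.

```stata
clear
set obs 6
gen year = 2000 + _n        // 2001, ..., 2006
gen x    = _n^2             // 1, 4, 9, 16, 25, 36
tsset year                  // declare the time variable

gen Lx  = L.x               // x_{t-1}
gen Fx  = F.x               // x_{t+1}
gen Dx  = D.x               // x_t - x_{t-1}
gen D2x = D2.x              // (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})
list year x Lx Fx Dx D2x, clean
```

Because x is quadratic in t, the first differences are 3, 5, 7, 9, 11 and the second differences are all 2. The operators can also be used directly inside a command without generating new variables.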

. use "", clear
. format gnp cpi %5.1f

. list year  L(1/3).(gnp cpi)  D.cpi, clean

                   L.      L2.      L3.      L.     L2.     L3.    D. 
       year      gnp      gnp      gnp     cpi     cpi     cpi   cpi  
  1.   1989        .        .        .       .       .       .     .  
  2.   1990   5837.9        .        .   124.0       .       .   6.7  
  3.   1991   6026.3   5837.9        .   130.7   124.0       .   5.5  
  4.   1992   6367.4   6026.3   5837.9   136.2   130.7   124.0   4.1  
  5.   1993   6689.3   6367.4   6026.3   140.3   136.2   130.7   4.2  
  6.   1994   7098.4   6689.3   6367.4   144.5   140.3   136.2   3.7  
  7.   1995   7433.4   7098.4   6689.3   148.2   144.5   140.3   4.2  
  8.   1996   7851.9   7433.4   7098.4   152.4   148.2   144.5   4.5 
. use "", clear
. keep if time<=5&company<=2
. format invest market %4.1f
. list company time invest L(1/2).invest market D.market

     |                                L.      L2.                 D.|
     | company   time   invest   invest   invest   market    market |
  1. |       1      1    317.6        .        .   3078.5         . |
  2. |       1      2    391.8    317.6        .   4661.7    1583.2 |
  3. |       1      3    410.6    391.8    317.6   5387.1     725.4 |
  4. |       1      4    257.7    410.6    391.8   2792.2   -2594.9 |
  5. |       1      5    330.8    257.7    410.6   4313.2    1521.0 |
  6. |       2      1     40.3        .        .    417.5         . |
  7. |       2      2     72.8     40.3        .    837.8     420.3 |
  8. |       2      3     66.3     72.8     40.3    883.9      46.1 |
  9. |       2      4     51.6     66.3     72.8    437.9    -446.0 |
 10. |       2      5     52.4     51.6     66.3    679.7     241.8 |
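In panel data the operators never reach across units: the first period of each company has a missing lag. The sketch below contrasts `L.x` with the row-based subscript `x[_n-1]` on a tiny made-up panel. The variables `id`, `t`, `x`, `Lx`, and `naive` are illustrative.

```stata
clear
input id t x
1 1 10
1 2 20
2 1 30
2 2 40
end
xtset id t

gen Lx    = L.x        // lag within each id; missing in the first period of every id
gen naive = x[_n-1]    // previous row; ignores the panel boundary
list, sepby(id)
```

For the first observation of id 2, `Lx` is missing while `naive` picks up the value 20 from id 1, which is why the time-series operators are the safer choice in panels.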

Reshaping data between long and wide: long <--> wide

           long
        +------------+                  wide
        | i  j   var |                 +----------------+
        |------------|                 | i   var1  var2 |
        | 1  1   4.1 |     reshape     |----------------|
        | 1  2   4.5 |   <--------->   | 1    4.1   4.5 |
        | 2  1   3.3 |                 | 2    3.3   3.0 |
        | 2  2   3.0 |                 +----------------+

       // long --> wide:

                                            j: existing variable
                reshape wide var, i(i) j(j)

      //  wide --> long:

                reshape long var, i(i) j(j)
                                            j: new variable
. reshape long inc ue, i(id) j(year)  //Note: [1] sex is constant within id, so it needs no conversion
                                      //      [2] j() takes the name of the new variable
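A runnable round trip with the four observations from the diagram is sketched below, using the diagram's own names `i`, `j`, and `var`.

```stata
clear
input i j var
1 1 4.1
1 2 4.5
2 1 3.3
2 2 3.0
end

reshape wide var, i(i) j(j)   // j disappears; var1 and var2 are created
list

reshape long var, i(i) j(j)   // var1 and var2 are stacked back into var; j is recreated
list
```

In `reshape wide`, `j()` names a variable that already exists and whose values become suffixes. In `reshape long`, `j()` names the variable that will hold those suffixes.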


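Data management in Stata: blog posts

. lianxh 数据处理 面板    // search keywords: 数据处理 = data management, 面板 = panel

Blog posts: data management in Stata (1)

Blog posts: data management in Stata (2)

Blog posts: data management in Stata (3)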

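Stata programming basics

🍎 Yujun Lian's open video course on Bilibili: Stata程序的编写和发布 (writing and publishing Stata programs)

Stata programming: blog posts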

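Stata graphics

Stata graphics: basics

Stata graphics: schemes and options

Stata graphics: visualizing coefficients

Stata graphics: scatter plots

Stata graphics: advanced topics

The hardest, and also the most flexible, part of Stata graphics is setting the options.


  • Schemes: the overall look of a graph, help scheme
  • Axes, ticks, and labels: help axis_options
  • Titles: main title, subtitle, notes, and so on, help title_options
  • Added lines: horizontal or vertical reference lines, help addline_options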
 sysuse "sp500", clear
 replace volume = volume/1000
 keep in 1/57
#delimit ;
 twoway                        //help twoway
    (rspike hi low date, lw(*1.3))
    (line   close  date, 
       lpattern(solid) lwidth(*1.2) lcolor(blue))
   yscale(range(1100 1400))    //help axis_options
   ylabel(1100(100)1400, grid) //help axis_options
   ymtick(##5)                 //help axis_options
   xlabel(, angle(30))         //help axis_options
   ytitle("Price", place(top))  //help title_options
   xtitle("Trading day")        //help title_options
   legend(order(1 "High-Low" 2 "Close") 
          ring(0) position(2) row(2)) //help legend_options
   subtitle("S&P 500", margin(b+2.5)) //help title_options
   note("Source: Yahoo! Finance")      //help title_options
   scheme(s2mono);                    //help scheme 
#delimit cr

graph export "sp500_rspike_01.png", replace
 sysuse "sp500", clear
 replace volume = volume/1000
 keep in 1/57
#delimit ;
 twoway
    (rspike hi low date)
    (line   close  date)
    (bar    volume date, barw(.25) yaxis(2))
   yscale(axis(1) r(900 1400))
   yscale(axis(2) r(  9   45))
   ylabel(, axis(2) grid)
   ytitle("Price: high, low, close", place(top))
   ytitle("Volume (millions of shares)", axis(2) 
           bexpand just(left))
   xtitle(" ")
   subtitle("S&P 500", margin(b+2.5))
   note("Source: Yahoo! Finance") ;
#delimit cr

graph export "sp500_rspike_02.png", replace
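The two graphs above use scheme, axis, and title options but no added lines. The sketch below adds a horizontal reference line at the sample mean of the closing price. The graph name `g_addline`, the local `avg`, and the choice of scheme `s1mono` are assumptions for illustration. It uses `///` line continuation instead of `#delimit ;`.

```stata
sysuse sp500, clear
keep in 1/57
summarize close, meanonly
local avg = r(mean)          // average closing price in the sample

twoway (line close date),                          ///
    yline(`avg', lpattern(dash))                   /// help addline_options
    title("S&P 500 close")                         /// help title_options
    note("Dashed line: sample mean of close")      /// help title_options
    scheme(s1mono)                                 /// help scheme
    name(g_addline, replace)
```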



The most important Stata commands

help cmd

  • help xtreg
  • help winsor2

search keywords

  • search dynamic panel data

  • lianxh DID 倍分法    (search keywords: DID, difference-in-differences)
  • songbl 合成控制      (search keyword: synthetic control)
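Several commands used in these notes are user-written rather than built in, for example `esttab`, `logout`, `winsor2`, `lianxh`, and `bcuse`. The sketch below installs them only if they are missing. It assumes the packages are hosted on SSC under the names shown (`esttab` is part of the `estout` package) and that an internet connection is available.

```stata
* Package names are assumptions about where each command is distributed
foreach pkg in estout logout winsor2 lianxh bcuse {
    capture which `pkg'                 // is the command already installed?
    if _rc {
        ssc install `pkg', replace      // requires an internet connection
    }
}
which esttab                            // confirm that esttab is now available
```

If a command is not found on SSC, `search commandname` lists other places where it may be distributed.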

Global style




