<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Alex Zhao]]></title><description><![CDATA[Data science and statistics]]></description><link>https://alexzhao.net</link><generator>RSS for Node</generator><lastBuildDate>Mon, 13 Apr 2026 04:20:11 GMT</lastBuildDate><atom:link href="https://alexzhao.net/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Which universities gained R1 in the new 2025 Carnegie Classifications?]]></title><description><![CDATA[The American Council on Education recently released their new updated 2025 Carnegie Classification of Institutions of Higher Education (or Carnegie Classification for short). Of note, this update from the 2021 Classification release changed a previou...]]></description><link>https://alexzhao.net/which-universities-gained-r1-in-the-new-2025-carnegie-classifications</link><guid isPermaLink="true">https://alexzhao.net/which-universities-gained-r1-in-the-new-2025-carnegie-classifications</guid><category><![CDATA[academia]]></category><dc:creator><![CDATA[Alex Zhao]]></dc:creator><pubDate>Thu, 20 Feb 2025 06:54:30 GMT</pubDate><content:encoded><![CDATA[<p>The American Council on Education recently released its <a target="_blank" href="https://carnegieclassifications.acenet.edu/carnegie-classification/basic-classification/">updated 2025 Carnegie Classification of Institutions of Higher Education</a> (or Carnegie Classification for short). Of note, this update to the 2021 Classification release replaced the previous, more subjective 10-metric criteria for R1 status (the highest classification) with <a target="_blank" href="https://www.acenet.edu/News-Room/Pages/Carnegie-Classifications-to-Make-Major-Changes.aspx">three new, easier-to-understand cutoffs</a>:</p>
<ul>
<li><p><strong>R1 classification</strong>: institutions with at least $50 million in total research spending and 70 research doctorates</p>
</li>
<li><p><strong>R2 classification</strong>: institutions with at least $5 million in research spending and 20 research doctorates that don’t meet the R1 standard</p>
</li>
<li><p><strong>Research Colleges and Universities</strong>: institutions with at least $2.5 million in research spending that don’t meet the R1 or R2 standards</p>
</li>
</ul>
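<p>To make the three cutoffs concrete, here is a minimal Python sketch of the decision rule they imply. The function name <code>research_designation</code> and the constant names are made up for illustration, and the sketch assumes the cutoffs are applied directly to a single pair of spending and doctorate figures; the official methodology may compute those inputs differently than the three-year averages shown in the tables below.</p>

```python
# Cutoffs as stated in the list above (US dollars and research doctorates).
R1_MIN_SPENDING = 50_000_000
R1_MIN_DOCTORATES = 70
R2_MIN_SPENDING = 5_000_000
R2_MIN_DOCTORATES = 20
RCU_MIN_SPENDING = 2_500_000


def research_designation(spending: float, doctorates: float) -> str:
    """Return the research designation implied by the three cutoffs.

    Hypothetical helper: tiers are checked from highest to lowest, so a
    school is only labeled R2 or RCU after missing the tier above.
    """
    if spending >= R1_MIN_SPENDING and doctorates >= R1_MIN_DOCTORATES:
        return "R1"
    if spending >= R2_MIN_SPENDING and doctorates >= R2_MIN_DOCTORATES:
        return "R2"
    if spending >= RCU_MIN_SPENDING:
        return "RCU"
    return "None"


# Figures taken from the tables in this post.
print(research_designation(215_144_333, 71))  # University of Vermont
print(research_designation(160_917_333, 39))  # University of Alabama in Huntsville
print(research_designation(3_674_667, 7))     # Penn State Harrisburg
```

<p>Checking from the highest tier down mirrors the “don’t meet the R1 standard” wording in the list: a school only falls through to R2 or RCU after missing the tier above it.</p>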
<p>With these new standards, a total of 42 institutions have newly gained R1 status. Ten of those were reclassified from a previous unique designation (“Special Focus Four-Year: Research Institution”), and all ten are medical schools or health centers. That leaves 32 schools that moved up from R2. These are listed below, along with the one school that lost its R1 status:</p>
<h3 id="heading-schools-that-gained-r1-status-from-r2">Schools that gained R1 status from R2</h3>
<table><tbody><tr><td><p><strong>School Name</strong></p></td><td><p><strong>City</strong></p></td><td><p><strong>State</strong></p></td><td><p><strong>Average Research Spending (FY 21-23)</strong></p></td><td><p><strong>Average Research Doctorates (21-23)</strong></p></td></tr><tr><td><p>American University</p></td><td><p>Washington</p></td><td><p>DC</p></td><td><p>$69,856,333</p></td><td><p>61</p></td></tr><tr><td><p>Brigham Young University</p></td><td><p>Provo</p></td><td><p>UT</p></td><td><p>$75,758,000</p></td><td><p>102</p></td></tr><tr><td><p>East Carolina University</p></td><td><p>Greenville</p></td><td><p>NC</p></td><td><p>$58,145,667</p></td><td><p>81</p></td></tr><tr><td><p>Florida Atlantic University</p></td><td><p>Boca Raton</p></td><td><p>FL</p></td><td><p>$65,937,333</p></td><td><p>105</p></td></tr><tr><td><p>Howard University</p></td><td><p>Washington</p></td><td><p>DC</p></td><td><p>$68,254,000</p></td><td><p>97</p></td></tr><tr><td><p>Indiana University–Purdue University-Indianapolis</p></td><td><p>Indianapolis</p></td><td><p>IN</p></td><td><p>$72,380,000</p></td><td><p>104</p></td></tr><tr><td><p>Lehigh University</p></td><td><p>Bethlehem</p></td><td><p>PA</p></td><td><p>$49,783,000</p></td><td><p>100</p></td></tr><tr><td><p>Loyola University Chicago</p></td><td><p>Chicago</p></td><td><p>IL</p></td><td><p>$47,550,000</p></td><td><p>132</p></td></tr><tr><td><p>Michigan Technological University</p></td><td><p>Houghton</p></td><td><p>MI</p></td><td><p>$92,206,000</p></td><td><p>81</p></td></tr><tr><td><p>Missouri University of Science and Technology</p></td><td><p>Rolla</p></td><td><p>MO</p></td><td><p>$56,430,000</p></td><td><p>107</p></td></tr><tr><td><p>New Mexico State University-Main Campus</p></td><td><p>Las Cruces</p></td><td><p>NM</p></td><td><p>$119,463,000</p></td><td><p>102</p></td></tr><tr><td><p>Northern Arizona 
University</p></td><td><p>Flagstaff</p></td><td><p>AZ</p></td><td><p>$71,130,333</p></td><td><p>86</p></td></tr><tr><td><p>Nova Southeastern University</p></td><td><p>Fort Lauderdale</p></td><td><p>FL</p></td><td><p>$35,291,667</p></td><td><p>474</p></td></tr><tr><td><p>Saint Louis University</p></td><td><p>Saint Louis</p></td><td><p>MO</p></td><td><p>$82,555,333</p></td><td><p>233</p></td></tr><tr><td><p>San Diego State University</p></td><td><p>San Diego</p></td><td><p>CA</p></td><td><p>$131,818,667</p></td><td><p>97</p></td></tr><tr><td><p>Southern Illinois University-Carbondale</p></td><td><p>Carbondale</p></td><td><p>IL</p></td><td><p>$50,822,667</p></td><td><p>108</p></td></tr><tr><td><p>Southern Methodist University</p></td><td><p>Dallas</p></td><td><p>TX</p></td><td><p>$53,084,333</p></td><td><p>132</p></td></tr><tr><td><p>The Catholic University of America</p></td><td><p>Washington</p></td><td><p>DC</p></td><td><p>$41,390,667</p></td><td><p>88</p></td></tr><tr><td><p>University of California-Merced</p></td><td><p>Merced</p></td><td><p>CA</p></td><td><p>$52,130,000</p></td><td><p>89</p></td></tr><tr><td><p>University of Dayton</p></td><td><p>Dayton</p></td><td><p>OH</p></td><td><p>$210,481,667</p></td><td><p>62</p></td></tr><tr><td><p>University of Idaho</p></td><td><p>Moscow</p></td><td><p>ID</p></td><td><p>$119,133,000</p></td><td><p>80</p></td></tr><tr><td><p>University of Massachusetts-Boston</p></td><td><p>Boston</p></td><td><p>MA</p></td><td><p>$66,638,667</p></td><td><p>88</p></td></tr><tr><td><p>University of Massachusetts-Lowell</p></td><td><p>Lowell</p></td><td><p>MA</p></td><td><p>$107,729,333</p></td><td><p>109</p></td></tr><tr><td><p>University of Missouri-Kansas City</p></td><td><p>Kansas City</p></td><td><p>MO</p></td><td><p>$47,025,667</p></td><td><p>134</p></td></tr><tr><td><p>University of North Carolina at Charlotte</p></td><td><p>Charlotte</p></td><td><p>NC</p></td><td><p>$62,343,333</p></td><td><p>160</p></td></tr><tr><td><p>University 
of North Dakota</p></td><td><p>Grand Forks</p></td><td><p>ND</p></td><td><p>$157,806,000</p></td><td><p>90</p></td></tr><tr><td><p>University of Rhode Island</p></td><td><p>Kingston</p></td><td><p>RI</p></td><td><p>$134,137,000</p></td><td><p>92</p></td></tr><tr><td><p>University of Toledo</p></td><td><p>Toledo</p></td><td><p>OH</p></td><td><p>$61,826,333</p></td><td><p>116</p></td></tr><tr><td><p>University of Vermont</p></td><td><p>Burlington</p></td><td><p>VT</p></td><td><p>$215,144,333</p></td><td><p>71</p></td></tr><tr><td><p>University of Wyoming</p></td><td><p>Laramie</p></td><td><p>WY</p></td><td><p>$127,696,667</p></td><td><p>94</p></td></tr><tr><td><p>William &amp; Mary</p></td><td><p>Williamsburg</p></td><td><p>VA</p></td><td><p>$75,523,000</p></td><td><p>70</p></td></tr><tr><td><p>Worcester Polytechnic Institute</p></td><td><p>Worcester</p></td><td><p>MA</p></td><td><p>$57,319,667</p></td><td><p>73</p></td></tr></tbody></table>

<h3 id="heading-schools-reclassified-r1-from-special-focus-four-year-research-institution">Schools reclassified R1 from “Special Focus Four-Year: Research Institution”</h3>
<table><tbody><tr><td><p><strong>Institution Name</strong></p></td><td><p><strong>City</strong></p></td><td><p><strong>State</strong></p></td><td><p><strong>Average Research Spending (FY 21-23)</strong></p></td><td><p><strong>Average Research Doctorates (21-23)</strong></p></td></tr><tr><td><p>Baylor College of Medicine</p></td><td><p>Houston</p></td><td><p>TX</p></td><td><p>$797,095,333</p></td><td><p>100</p></td></tr><tr><td><p>Medical University of South Carolina</p></td><td><p>Charleston</p></td><td><p>SC</p></td><td><p>$298,713,333</p></td><td><p>236</p></td></tr><tr><td><p>The University of Tennessee Health Science Center</p></td><td><p>Memphis</p></td><td><p>TN</p></td><td><p>$118,398,333</p></td><td><p>185</p></td></tr><tr><td><p>The University of Texas Health Science Center at Houston</p></td><td><p>Houston</p></td><td><p>TX</p></td><td><p>$341,853,667</p></td><td><p>195</p></td></tr><tr><td><p>The University of Texas Health Science Center at San Antonio</p></td><td><p>San Antonio</p></td><td><p>TX</p></td><td><p>$246,440,667</p></td><td><p>110</p></td></tr><tr><td><p>University of California-San Francisco</p></td><td><p>San Francisco</p></td><td><p>CA</p></td><td><p>$1,854,175,000</p></td><td><p>149</p></td></tr><tr><td><p>University of Maryland - Baltimore</p></td><td><p>Baltimore</p></td><td><p>MD</p></td><td><p>$602,615,333</p></td><td><p>74</p></td></tr><tr><td><p>University of Nebraska Medical Center</p></td><td><p>Omaha</p></td><td><p>NE</p></td><td><p>$210,605,000</p></td><td><p>76</p></td></tr><tr><td><p>University of Texas Southwestern Medical Center</p></td><td><p>Dallas</p></td><td><p>TX</p></td><td><p>$717,175,667</p></td><td><p>84</p></td></tr><tr><td><p>Weill Medical College of Cornell University</p></td><td><p>New York</p></td><td><p>NY</p></td><td><p>$574,396,000</p></td><td><p>74</p></td></tr></tbody></table>

<h3 id="heading-schools-that-lost-r1-status">Schools that lost R1 status</h3>
<table><tbody><tr><td><p><strong>School Name</strong></p></td><td><p><strong>City</strong></p></td><td><p><strong>State</strong></p></td><td><p><strong>Average Research Spending (FY 21-23)</strong></p></td><td><p><strong>Average Research Doctorates (21-23)</strong></p></td><td><p><strong>2025 Research Activity Designation</strong></p></td></tr><tr><td><p>University of Alabama in Huntsville</p></td><td><p>Huntsville</p></td><td><p>AL</p></td><td><p>$160,917,333</p></td><td><p>39</p></td><td><p>Research 2: High Spending and Doctorate Production</p></td></tr></tbody></table>

<h3 id="heading-newly-classified-r2-schools">Newly classified R2 schools</h3>
<p>Schools newly classified as R2 include a mix of those previously classified as Doctoral/Professional Universities or Master’s Colleges and Universities, as well as Special Focus Institutions. The former can be considered to have “gained” R2 status (similar to R2s moving up to R1), while the latter are better thought of as simply reclassified as part of the simplification.</p>
<p>“Newly gained” R2 status:</p>
<table><tbody><tr><td><p><strong>School Name</strong></p></td><td><p><strong>City</strong></p></td><td><p><strong>State</strong></p></td><td><p><strong>2021 Carnegie Classification</strong></p></td><td><p><strong>Average Research Spending (FY 21-23)</strong></p></td><td><p><strong>Average Research Doctorates (21-23)</strong></p></td></tr><tr><td><p>Abilene Christian University</p></td><td><p>Abilene</p></td><td><p>TX</p></td><td><p>Doctoral/Professional Universities</p></td><td><p>$6,363,000</p></td><td><p>68</p></td></tr><tr><td><p>Appalachian State University</p></td><td><p>Boone</p></td><td><p>NC</p></td><td><p>Master's Colleges &amp; Universities: Larger Programs</p></td><td><p>$6,725,667</p></td><td><p>15</p></td></tr><tr><td><p>California State Polytechnic University-Pomona</p></td><td><p>Pomona</p></td><td><p>CA</p></td><td><p>Master's Colleges &amp; Universities: Larger Programs</p></td><td><p>$8,547,333</p></td><td><p>10</p></td></tr><tr><td><p>California State University-Los Angeles</p></td><td><p>Los Angeles</p></td><td><p>CA</p></td><td><p>Master's Colleges &amp; Universities: Larger Programs</p></td><td><p>$9,802,333</p></td><td><p>11</p></td></tr><tr><td><p>California State University-Sacramento</p></td><td><p>Sacramento</p></td><td><p>CA</p></td><td><p>Master's Colleges &amp; Universities: Larger Programs</p></td><td><p>$25,173,667</p></td><td><p>13</p></td></tr><tr><td><p>CUNY Hunter College</p></td><td><p>New York</p></td><td><p>NY</p></td><td><p>Master's Colleges &amp; Universities: Larger Programs</p></td><td><p>$35,445,333</p></td><td><p>19</p></td></tr><tr><td><p>Delaware State University</p></td><td><p>Dover</p></td><td><p>DE</p></td><td><p>Master's Colleges &amp; Universities: Medium Programs</p></td><td><p>$28,569,333</p></td><td><p>20</p></td></tr><tr><td><p>East Texas A &amp; M University</p></td><td><p>Commerce</p></td><td><p>TX</p></td><td><p>Doctoral/Professional 
Universities</p></td><td><p>$5,270,333</p></td><td><p>64</p></td></tr><tr><td><p>Embry-Riddle Aeronautical University-Daytona Beach</p></td><td><p>Daytona Beach</p></td><td><p>FL</p></td><td><p>Master's Colleges &amp; Universities: Larger Programs</p></td><td><p>$23,968,333</p></td><td><p>25</p></td></tr><tr><td><p>Hampton University</p></td><td><p>Hampton</p></td><td><p>VA</p></td><td><p>Doctoral/Professional Universities</p></td><td><p>$10,966,667</p></td><td><p>27</p></td></tr><tr><td><p>Hofstra University</p></td><td><p>Hempstead</p></td><td><p>NY</p></td><td><p>Doctoral/Professional Universities</p></td><td><p>$6,882,000</p></td><td><p>65</p></td></tr><tr><td><p>Kean University</p></td><td><p>Union</p></td><td><p>NJ</p></td><td><p>Doctoral/Professional Universities</p></td><td><p>$6,366,333</p></td><td><p>25</p></td></tr><tr><td><p>Lamar University</p></td><td><p>Beaumont</p></td><td><p>TX</p></td><td><p>Doctoral/Professional Universities</p></td><td><p>$6,402,000</p></td><td><p>67</p></td></tr><tr><td><p>Pepperdine University</p></td><td><p>Malibu</p></td><td><p>CA</p></td><td><p>Doctoral/Professional Universities</p></td><td><p>$5,687,333</p></td><td><p>71</p></td></tr><tr><td><p>Saint Joseph's University</p></td><td><p>Philadelphia</p></td><td><p>PA</p></td><td><p>Master's Colleges &amp; Universities: Larger Programs</p></td><td><p>$6,879,333</p></td><td><p>26</p></td></tr><tr><td><p>San Jose State University</p></td><td><p>San Jose</p></td><td><p>CA</p></td><td><p>Master's Colleges &amp; Universities: Larger Programs</p></td><td><p>$71,733,667</p></td><td><p>12</p></td></tr><tr><td><p>South Carolina State University</p></td><td><p>Orangeburg</p></td><td><p>SC</p></td><td><p>Master's Colleges &amp; Universities: Small Programs</p></td><td><p>$6,774,000</p></td><td><p>19</p></td></tr><tr><td><p>Southern Connecticut State University</p></td><td><p>New Haven</p></td><td><p>CT</p></td><td><p>Master's Colleges &amp; Universities: Larger 
Programs</p></td><td><p>$6,681,000</p></td><td><p>18</p></td></tr><tr><td><p>Texas Woman's University</p></td><td><p>Denton</p></td><td><p>TX</p></td><td><p>Doctoral/Professional Universities</p></td><td><p>$5,964,667</p></td><td><p>155</p></td></tr><tr><td><p>University of Michigan-Dearborn</p></td><td><p>Dearborn</p></td><td><p>MI</p></td><td><p>Master's Colleges &amp; Universities: Larger Programs</p></td><td><p>$8,830,333</p></td><td><p>19</p></td></tr><tr><td><p>University of Northern Colorado</p></td><td><p>Greeley</p></td><td><p>CO</p></td><td><p>Doctoral/Professional Universities</p></td><td><p>$5,177,000</p></td><td><p>94</p></td></tr><tr><td><p>University of Puerto Rico-Mayaguez</p></td><td><p>Mayaguez</p></td><td><p>PR</p></td><td><p>Master's Colleges &amp; Universities: Medium Programs</p></td><td><p>$16,877,000</p></td><td><p>17</p></td></tr><tr><td><p>University of San Francisco</p></td><td><p>San Francisco</p></td><td><p>CA</p></td><td><p>Doctoral/Professional Universities</p></td><td><p>$4,555,000</p></td><td><p>29</p></td></tr><tr><td><p>University of West Florida</p></td><td><p>Pensacola</p></td><td><p>FL</p></td><td><p>Master's Colleges &amp; Universities: Larger Programs</p></td><td><p>$38,720,333</p></td><td><p>24</p></td></tr><tr><td><p>Virginia State University</p></td><td><p>Petersburg</p></td><td><p>VA</p></td><td><p>Master's Colleges &amp; Universities: Medium Programs</p></td><td><p>$11,984,333</p></td><td><p>15</p></td></tr><tr><td><p>Yeshiva University</p></td><td><p>New York</p></td><td><p>NY</p></td><td><p>Doctoral/Professional Universities</p></td><td><p>$7,129,000</p></td><td><p>74</p></td></tr></tbody></table>

<p>Reclassified Special Focus Institutions:</p>
<table><tbody><tr><td><p><strong>Institution Name</strong></p></td><td><p><strong>City</strong></p></td><td><p><strong>State</strong></p></td><td><p><strong>Average Research Spending (FY 21-23)</strong></p></td><td><p><strong>Average Research Doctorates (21-23)</strong></p></td></tr><tr><td><p>Albert Einstein College of Medicine</p></td><td><p>Bronx</p></td><td><p>NY</p></td><td><p>$318,755,667</p></td><td><p>29</p></td></tr><tr><td><p>Eastern Virginia Medical School</p></td><td><p>Norfolk</p></td><td><p>VA</p></td><td><p>$21,857,000</p></td><td><p>33</p></td></tr><tr><td><p>Icahn School of Medicine at Mount Sinai</p></td><td><p>New York</p></td><td><p>NY</p></td><td><p>$895,568,000</p></td><td><p>44</p></td></tr><tr><td><p>Louisiana State University Health Sciences Center-New Orleans</p></td><td><p>New Orleans</p></td><td><p>LA</p></td><td><p>$55,690,667</p></td><td><p>19</p></td></tr><tr><td><p>Medical College of Wisconsin</p></td><td><p>Milwaukee</p></td><td><p>WI</p></td><td><p>$310,928,333</p></td><td><p>35</p></td></tr><tr><td><p>Oklahoma State University Center for Health Sciences</p></td><td><p>Tulsa</p></td><td><p>OK</p></td><td><p>$9,542,000</p></td><td><p>13</p></td></tr><tr><td><p>Oregon Health &amp; Science University</p></td><td><p>Portland</p></td><td><p>OR</p></td><td><p>$483,260,000</p></td><td><p>47</p></td></tr><tr><td><p>Rush University</p></td><td><p>Chicago</p></td><td><p>IL</p></td><td><p>$118,222,000</p></td><td><p>19</p></td></tr><tr><td><p>Texas Tech University Health Sciences Center</p></td><td><p>Lubbock</p></td><td><p>TX</p></td><td><p>$45,598,333</p></td><td><p>85</p></td></tr><tr><td><p>The Rockefeller University</p></td><td><p>New York</p></td><td><p>NY</p></td><td><p>$367,862,667</p></td><td><p>36</p></td></tr><tr><td><p>The University of Texas Medical Branch at Galveston</p></td><td><p>Galveston</p></td><td><p>TX</p></td><td><p>$207,823,000</p></td><td><p>48</p></td></tr><tr><td><p>University of Arkansas for Medical 
Sciences</p></td><td><p>Little Rock</p></td><td><p>AR</p></td><td><p>$203,203,667</p></td><td><p>25</p></td></tr><tr><td><p>University of Massachusetts Chan Medical School</p></td><td><p>Worcester</p></td><td><p>MA</p></td><td><p>$344,090,667</p></td><td><p>52</p></td></tr><tr><td><p>University of Oklahoma-Health Sciences Center</p></td><td><p>Oklahoma City</p></td><td><p>OK</p></td><td><p>$128,185,333</p></td><td><p>28</p></td></tr><tr><td><p>Upstate Medical University</p></td><td><p>Syracuse</p></td><td><p>NY</p></td><td><p>$49,558,333</p></td><td><p>18</p></td></tr></tbody></table>

<h3 id="heading-newly-recognized-research-colleges-and-universities">Newly recognized Research Colleges and Universities</h3>
<p>This category rolls up several of the previous lower classifications and establishes what seems to be a simpler, clearer way to recognize universities that conduct research, although not up to R2 standards. I'll only call out two groups here: schools that dropped down from the previous R2 classification, and schools that were not previously classified at all.</p>
<p>Schools dropping from R2 to RCU:</p>
<table><tbody><tr><td><p><strong>School Name</strong></p></td><td><p><strong>City</strong></p></td><td><p><strong>State</strong></p></td><td><p><strong>Average Research Spending (FY 21-23)</strong></p></td><td><p><strong>Average Research Doctorates (21-23)</strong></p></td></tr><tr><td><p>California State University-East Bay</p></td><td><p>Hayward</p></td><td><p>CA</p></td><td><p>$17,066,333</p></td><td><p>9</p></td></tr><tr><td><p>California State University-San Bernardino</p></td><td><p>San Bernardino</p></td><td><p>CA</p></td><td><p>$12,229,333</p></td><td><p>11</p></td></tr><tr><td><p>University of Maryland Eastern Shore</p></td><td><p>Princess Anne</p></td><td><p>MD</p></td><td><p>$9,546,667</p></td><td><p>18</p></td></tr></tbody></table>

<p>Schools not previously included in Carnegie Classifications altogether:</p>
<table><tbody><tr><td><p><strong>School Name</strong></p></td><td><p><strong>City</strong></p></td><td><p><strong>State</strong></p></td><td><p><strong>Average Research Spending (FY 21-23)</strong></p></td><td><p><strong>Average Research Doctorates (21-23)</strong></p></td></tr><tr><td><p>Pennsylvania State University-Penn State Erie-Behrend College</p></td><td><p>Erie</p></td><td><p>PA</p></td><td><p>$6,502,333</p></td><td><p>0</p></td></tr><tr><td><p>Pennsylvania State University-Penn State Harrisburg</p></td><td><p>Middletown</p></td><td><p>PA</p></td><td><p>$3,674,667</p></td><td><p>7</p></td></tr></tbody></table>]]></content:encoded></item><item><title><![CDATA[New research paper about high-dimensional inference out now]]></title><description><![CDATA[A paper that was part of my dissertation work during grad school is now finally published in the Annals of Statistics. Details can be found below:
Title: Testing high-dimensional regression coefficients in linear models
Abstract: This paper is concer...]]></description><link>https://alexzhao.net/new-research-paper-about-high-dimensional-inference-out-now</link><guid isPermaLink="true">https://alexzhao.net/new-research-paper-about-high-dimensional-inference-out-now</guid><category><![CDATA[statistics]]></category><category><![CDATA[research]]></category><category><![CDATA[paper]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[linearregression]]></category><dc:creator><![CDATA[Alex Zhao]]></dc:creator><pubDate>Mon, 17 Feb 2025 01:07:43 GMT</pubDate><content:encoded><![CDATA[<p>A paper that was part of my dissertation work during grad school is now finally published in the <em>Annals of Statistics</em>. Details can be found below:</p>
<p><strong>Title:</strong> Testing high-dimensional regression coefficients in linear models</p>
<p><strong>Abstract</strong>: This paper is concerned with statistical inference for regression coefficients in high-dimensional linear regression models. We propose a new method for testing the coefficient vector of the high-dimensional linear models, and establish the asymptotic normality of our proposed test statistic with the aid of the martingale central limit theorem. We derive the asymptotical relative efficiency (ARE) of the proposed test with respect to the test proposed in Zhong and Chen (<em>J. Amer. Statist. Assoc.</em> <strong>106</strong> (2011) 260–274), and show that the ARE is always greater or equal to one under the local alternative studied in this paper. Our numerical studies imply that the proposed test with critical values derived from its asymptotical normal distribution may retain Type I error rate very well. Our numerical comparison demonstrates the proposed test performs better than existing ones in terms of powers. We further illustrate our proposed method with a real data example.</p>
<p><strong>Canonical link:</strong> <a target="_blank" href="https://doi.org/10.1214/24-AOS2420">https://doi.org/10.1214/24-AOS2420</a></p>
<p>Because part of my time working on this was funded via an NIH grant, an open-access PDF is available for those who do not subscribe to the journal.</p>
<p><strong>Open access link:</strong> <a target="_blank" href="https://scholarsphere.psu.edu/resources/231ddcf5-36b8-40bf-b08a-d453b00aaf25">https://scholarsphere.psu.edu/resources/231ddcf5-36b8-40bf-b08a-d453b00aaf25</a></p>
]]></content:encoded></item><item><title><![CDATA[The most annoying default setting to turn off on a new Windows 11 install]]></title><description><![CDATA[This is more or less just a note for myself in the future (for Windows 11 anyways, who knows how Windows 12 will work), but I figure if someone on a search engine stumbles across this, it might be helpful to them as well.
tl;dr: Go to OneDrive's Sync...]]></description><link>https://alexzhao.net/the-most-annoying-default-setting-to-turn-off-on-a-new-windows-11-install</link><guid isPermaLink="true">https://alexzhao.net/the-most-annoying-default-setting-to-turn-off-on-a-new-windows-11-install</guid><category><![CDATA[Windows]]></category><category><![CDATA[windows 11]]></category><category><![CDATA[OneDrive]]></category><dc:creator><![CDATA[Alex Zhao]]></dc:creator><pubDate>Sat, 30 Dec 2023 07:36:16 GMT</pubDate><content:encoded><![CDATA[<p>This is more or less just a note for myself in the future (for Windows 11 anyways, who knows how Windows 12 will work), but I figure if someone on a search engine stumbles across this, it might be helpful to them as well.</p>
<p><em>tl;dr: Go to OneDrive's Sync and Backup section in Settings, unsync all the "important folders" from your computer to OneDrive, then check below for the Microsoft guide on which registry keys to change.</em></p>
<h3 id="heading-how-onedrive-worked-before-windows-11">How OneDrive worked before Windows 11</h3>
<p>Before Windows 11, OneDrive functioned much like the rest of the Microsoft (or any standard software company's) ecosystem did: it was there, it was a convenient default, it was somewhat hard to totally remove, and it was totally optional. Specifically, by default its usage was separated out from critical system folders, and was in general the same as any other cloud storage solution.</p>
<h3 id="heading-what-changed-in-windows-11">What changed in Windows 11</h3>
<p>If you're upgrading from a previous version of Windows to 11, not much has changed with regard to OneDrive. If, however, you're doing a fresh install or buying a new machine, one major, annoying change is that by default, the Documents, Desktop, and Pictures folders built into Windows are automatically synced to your OneDrive account.</p>
<p>I can see why Microsoft did this, business case (tighter integration with OneDrive = more usage = more $$$) aside. While some people only want to selectively back up certain important files or otherwise organize their cloud storage how they please, a lot of people tend to store most, if not all, of their important files in a cloud backup by default. Many people especially want to do this with their documents and photos, hence the natural inclusion of those two folders. As for the Desktop, we all know many, many people who store basically every file they regularly use or download on their Desktop, because it's immediate and easily accessible.</p>
<p>So if you use OneDrive already, or if you don't want to think about which files to back up, this kind of default generally works. There are a few scenarios where this absolutely sucks, however:</p>
<ol>
<li><p>You don't pay for OneDrive storage. In this case, you'd run up against your storage limits pretty quickly if you put even a relatively small number of pictures or documents in those folders.</p>
</li>
<li><p>You tend to use the Desktop to temporarily hold large files in transit, like when transferring from a phone or copying over a hard drive or thumb drive.</p>
</li>
<li><p>You have multiple machines and you don't want to share everything across all of them (especially the Desktop icons for any applications you have installed).</p>
</li>
</ol>
<p>The list could go on, but in general it makes more sense to leave this feature off than on, unless you pay for OneDrive and don't want to do even the bare minimum to manage your files. <strong>And if you're anywhere close to a power user, or have used Windows for more than a few years, this default is especially annoying.</strong></p>
<h3 id="heading-fixing-the-onedrive-syncing-issue">Fixing the OneDrive syncing issue</h3>
<p>This section is short because there are really only a few steps:</p>
<ol>
<li><p>Click on the OneDrive icon, go to Settings, then Sync and backup.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703921190296/98a21635-35ee-40b9-b1d7-505897fe30f4.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Click on Manage Backup next to "Back up important PC folders to OneDrive", and uncheck all the folders.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703921253294/d2b21a98-dac9-478f-acbe-caacc13f3d4c.png" alt class="image--center mx-auto" /></p>
<p>Finally, make sure that the Windows Registry keys are changed by following this guide: <a target="_blank" href="https://support.microsoft.com/en-us/topic/operation-to-change-a-personal-folder-location-fails-in-windows-ffb95139-6dbb-821d-27ec-62c9aaccd720">https://support.microsoft.com/en-us/topic/operation-to-change-a-personal-folder-location-fails-in-windows-ffb95139-6dbb-821d-27ec-62c9aaccd720</a></p>
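<p>If you want to check whether the change took effect, the following Python sketch reads the per-user folder locations and flags any that still point into OneDrive. It is only an illustration, not part of the Microsoft guide: it assumes the relevant values live under the <code>User Shell Folders</code> key for the current user, the helper names <code>onedrive_redirects</code> and <code>read_folder_paths</code> are hypothetical, and matching on the substring "onedrive" is a rough heuristic. It only reads the registry and never writes to it.</p>

```python
import sys

# Assumed registry location (under HKEY_CURRENT_USER) of the folder paths.
USER_SHELL_FOLDERS = r"Software\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders"
# Friendly folder name mapped to its registry value name.
FOLDER_VALUES = {"Desktop": "Desktop", "Documents": "Personal", "Pictures": "My Pictures"}


def onedrive_redirects(paths):
    """Return the folder names whose path still runs through OneDrive."""
    return sorted(name for name, path in paths.items() if "onedrive" in path.lower())


def read_folder_paths():
    """Read the current folder paths from the registry (Windows only)."""
    import winreg  # standard library, but only available on Windows

    paths = {}
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, USER_SHELL_FOLDERS) as key:
        for name, value_name in FOLDER_VALUES.items():
            try:
                paths[name], _ = winreg.QueryValueEx(key, value_name)
            except FileNotFoundError:
                continue  # value not present for this user; skip it
    return paths


if sys.platform == "win32":
    # An empty list means none of the three folders point into OneDrive.
    print(onedrive_redirects(read_folder_paths()))
```

<p>Keeping the path check separate from the registry read means the check can be tried with made-up paths on any operating system before running it on the real machine.</p>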
<p>And that's it. Now you should have actual local folders on your computer, separate from whatever you want to store in OneDrive.</p>
]]></content:encoded></item><item><title><![CDATA[Where Do Penn State Statistics PhD Students Go After They Graduate? (2022 Update)]]></title><description><![CDATA[A version of this article was originally written in 2019. Now that I’ve
graduated and as part of moving my website to Hashnode I decided to
revisit this analysis with data through Summer 2022.
When I was a statistics PhD student, I was often asked wh...]]></description><link>https://alexzhao.net/where-do-penn-state-statistics-phd-students-go-after-they-graduate-2022-update</link><guid isPermaLink="true">https://alexzhao.net/where-do-penn-state-statistics-phd-students-go-after-they-graduate-2022-update</guid><category><![CDATA[statistics]]></category><category><![CDATA[exploratory data analysis]]></category><category><![CDATA[Penn State]]></category><dc:creator><![CDATA[Alex Zhao]]></dc:creator><pubDate>Wed, 12 Oct 2022 03:26:16 GMT</pubDate><content:encoded><![CDATA[<p><em>A version of this article was originally written in 2019. Now that I’ve
graduated and as part of moving my website to Hashnode I decided to
revisit this analysis with data through Summer 2022.</em></p>
<p>When I was a statistics PhD student, I was often asked what I intended
to do after I graduate. After I’d give my answer, the next natural
question was “where do PhD students from your department usually go
after they graduate?” This question also came up every year when
prospective students visited our department.</p>
<p>In the past, my answer to that second question was that a third of our
graduates went to academia, a third went into industry, and roughly a third
went into what could loosely be defined as “other”: jobs that
are neither quite academia nor industry, such as those at a
national lab, a government agency, or a think tank like the RAND
Corporation. This was more of a guess than anything based on real data,
mostly because, while we do have a webpage listing where our alumni go after
they graduate (<a target="_blank" href="https://science.psu.edu/stat/alumni/members">https://science.psu.edu/stat/alumni/members</a>), that information is not in
a format that’s easily analyzable.</p>
<p>Nonetheless, I do actually want the answer to this question. Part of
this is so that I can actually give a good answer to prospective
students (or other people who are curious), and part of it is just to
satisfy my own curiosity. I recently spent some time organizing the
available data into a more manageable CSV file so that I can finally
answer the question of where Penn State statistics PhD graduates go.</p>
<h3 id="heading-the-data">The Data</h3>
<p>All of the data was taken from the alumni webpage linked above, for graduates
from 2010 through 2022. Previously, I had chosen 2010 as a cutoff because
earlier data seemed less complete, and I thought that data
from before 2010 wouldn’t be representative of recent trends in the
department. Of course, now we have more recent data, but since I already
collected the previous data, I’m keeping it.</p>
<h3 id="heading-limitations-of-the-data">Limitations of the Data</h3>
<p>There are a few limitations from this dataset that are worth noting:</p>
<ul>
<li>This data only covers people graduating through the 2022
calendar year. Some individuals might still defend their dissertation in
the Fall 2022 semester, but as of this publication date nobody has
announced or defended yet this semester, and anyone defending after
today will only be eligible to graduate in Spring 2023.</li>
<li>The information about first jobs is self-reported by the people
graduating. In the case of some individuals not publicly listed, I
have used personal recollection and some searches to find their
first post-graduation job.</li>
<li>Only people who graduated with a PhD are included in this dataset.
People sometimes withdrew or graduated with a Master’s for a variety
of reasons, but this analysis is focused on PhD graduates because
they make up the overwhelming majority of people who graduate from
the PhD program.</li>
<li>A couple of PhD graduates were excluded from the analysis because no
job information was provided and none could be found online.</li>
</ul>
<p>Overall, this meant that I had information on 157 graduates and the
first jobs and institutions they went to.</p>
<p>All of the data as well as the RMarkdown file (with the R code) can be
found here: <a target="_blank" href="https://github.com/yazhao/PSUPhdGrads">https://github.com/yazhao/PSUPhdGrads</a></p>
<h3 id="heading-definitions">Definitions</h3>
<p>For the purposes of this analysis, I looked only at the first job taken
by graduates right after they got their PhD. It’s entirely possible that
many of these people switched career tracks, but tracking down that full
information proved to be difficult. Additionally, I’m mostly interested
in where people are <em>first</em> placed, not necessarily where they
ultimately end up. I wanted to see what the immediate next step of a
Penn State statistics PhD would be. The three categories of jobs were:</p>
<p><strong>Academia</strong>: This included any job working at an academic institution
like a university. It also included any first job where the title would
imply a postdoc (e.g., “postdoctoral researcher”). This is a very broad
definition: it is not limited to people on a
tenure-track assistant professorship, but covers anyone who got a job in
an academic setting.</p>
<p><strong>Industry</strong>: Anyone who went to work at a private company was put into
this category.</p>
<p><strong>Other</strong>: This included jobs and institutions that weren’t
necessarily academia but also clearly were not industry. This
category included work at a government agency, a national lab,
a central bank, or a non-profit think tank.</p>
<p>In the cases where I didn’t have a first job title, I used the first job
location to define the job type.</p>
<h3 id="heading-summary">Summary</h3>
<pre><code>## 
## Academia Industry    Other 
##    <span class="hljs-number">54.14</span>    <span class="hljs-number">38.22</span>     <span class="hljs-number">7.64</span>
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665544799789/ZK3H41uUt.png" alt="Pie chart of the first jobs of statistics PhD graduates at Penn State from 2010 to 2022" /></p>
<p>Based on the past 13 years’ worth of data, it seems that the majority of our
PhD graduates go into academia. Over 50 percent of our graduates end up
in academia with their first job, a little under 40 percent go into
private industry, while only about 7.6 percent go into that Other
category. It seems that my initial impressions (and the answer I’ve been
giving people) were wrong. And while these numbers might be inflated due
to my broad definition of an academic job, it seems that our department
largely prepares people for academia.</p>
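<p>As a quick check, the percentages in the table above are consistent with 85, 60, and 12 graduates out of 157. Those counts are inferred from the rounded percentages and are not read from the CSV, and the function name <code>job_type_shares</code> is only illustrative. A minimal Python sketch of the calculation (the original analysis was done in R):</p>
<pre><code class="lang-python">from collections import Counter

def job_type_shares(job_types):
    """Return the percent of graduates in each job type, rounded to 2 decimals."""
    counts = Counter(job_types)
    total = sum(counts.values())
    return {kind: round(100 * n / total, 2) for kind, n in sorted(counts.items())}

# Counts inferred from the rounded percentages and the 157 graduates
# (an assumption, not figures read from the dataset).
first_jobs = ["Academia"] * 85 + ["Industry"] * 60 + ["Other"] * 12
shares = job_type_shares(first_jobs)
print(shares)
</code></pre>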
<h3 id="heading-trends">Trends</h3>
<p>I was curious to see if the trends behind these first jobs had changed
over the time of the dataset. Was this pattern of mostly academic jobs
relatively consistent, or have there been changes in the composition of
jobs over time?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665544999105/ZG0I8DEdY.png" alt="Trend of percent of statistics PhD graduates by job type over time, 2010 to 2022" /></p>
<p>It seems like there have definitely been changes in the category
compositions over time. Initially, basically all graduates went into
academia. However, around 2013, that trend started shifting, as more
students went into industry. That trend reversed in 2016, though that
change was mostly because of people going into the Other category. By
2018, it was a roughly even split between academia and industry
positions, and since 2021 the number of people going into industry jobs
has exceeded those going into academia. The number of people going into “other” jobs
has all but fallen off in that timeframe.</p>
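<p>A trend like this comes from computing the job-type shares separately for each graduation year. The sketch below shows one way to do that in Python (the original analysis was done in R). The function name <code>shares_by_year</code> and the rows in <code>toy_records</code> are made up for illustration and are not taken from the real dataset.</p>
<pre><code class="lang-python">from collections import Counter, defaultdict

def shares_by_year(records):
    """Map each graduation year to the percent of graduates per job type."""
    by_year = defaultdict(Counter)
    for year, job_type in records:
        by_year[year][job_type] += 1
    trend = {}
    for year in sorted(by_year):
        total = sum(by_year[year].values())
        trend[year] = {
            kind: round(100 * n / total, 1)
            for kind, n in sorted(by_year[year].items())
        }
    return trend

# Made-up rows for illustration only; they are not from the real dataset.
toy_records = [
    (2010, "Academia"), (2010, "Academia"),
    (2021, "Industry"), (2021, "Industry"), (2021, "Industry"), (2021, "Academia"),
]
trend = shares_by_year(toy_records)
print(trend)
</code></pre>
<p>With only a handful of graduates per year, a single person can move a yearly percentage a lot, so year-to-year swings should be read with that in mind.</p>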
<h3 id="heading-conclusion">Conclusion</h3>
<p>My intuition that the department was roughly evenly split among the
three types of jobs for graduates was obviously incorrect. Most of my
mistake was in overestimating the share of people going into research
positions in the Other category, possibly partly due to the people I
knew when I was starting in the program. Based on historical data, the
academic/industry/other split is approximately 50/40/10. In recent
years, however, that has been changing: “other” jobs have fallen off
and the split has moved closer to 50/50 academia/industry. Moreover,
it looks like after 2020 the trend has been towards a 60/40 split
between industry and academic jobs, respectively.</p>
<p>This updated analysis also further solidifies the trend that I initially
saw in 2018: while the share of people going into academia
recovered from the one-time 2018 dip, it’s now the case that industry
jobs are more popular than academic ones for students from Penn State’s
statistics department.</p>
<p>Of course, this kind of analysis has a lot of caveats. It’s possible
that people going into industry, or into jobs that they don’t
particularly like, might not be as interested in filling
out this information. It’s also possible that a lot of the people who first
get an academic job (notably, many of those who are lecturers or
postdocs) might end up switching tracks and going into industry or some
other type of job, while the reverse trend might be much less common.</p>
<p>Nonetheless, this information does seem to capture a trend: while in the
past most graduates first went into academic jobs, it seems like more
and more statistics PhD students at Penn State are willing to jump into
industry first. In any case, at least now I have a more accurate answer
to the question of “Where do Penn State statistics PhD students go after
they graduate?” the next time somebody asks.</p>
]]></content:encoded></item><item><title><![CDATA[The Unroll.me Data That Uber Bought Probably Isn't That Anonymous]]></title><description><![CDATA[As was to be expected, a recent New York Times article about Uber has gotten the internet into an uproar over how Uber conducts its business. What's strange, however, is that while much of the internet's displeasure has been focused on Uber (and read...]]></description><link>https://alexzhao.net/the-unrollme-data-that-uber-bought-probably-isnt-that-anonymous</link><guid isPermaLink="true">https://alexzhao.net/the-unrollme-data-that-uber-bought-probably-isnt-that-anonymous</guid><category><![CDATA[technology]]></category><category><![CDATA[Startups]]></category><category><![CDATA[data privacy]]></category><dc:creator><![CDATA[Alex Zhao]]></dc:creator><pubDate>Tue, 25 Apr 2017 20:19:14 GMT</pubDate><content:encoded><![CDATA[<p>As was to be expected, a recent <a target="_blank" href="https://www.nytimes.com/2017/04/23/technology/travis-kalanick-pushes-uber-and-himself-to-the-precipice.html">New York Times article about Uber</a> has gotten the internet into an uproar over how Uber conducts its business. What's strange, however, is that while much of the internet's displeasure has been focused on Uber (and reading the article points out several aggressive and probably borderline unethical actions by the company), the biggest casualty from this story might not be Uber at all. Instead, another company, Slice Technologies, owner of a popular inbox decluttering service <a target="_blank" href="https://unroll.me/">Unroll.me</a>, was revealed to have <a target="_blank" href="https://www.nytimes.com/2017/04/24/technology/personal-data-firm-slice-unroll-me-backlash-uber.html">sold receipt data</a> from Uber's competitor Lyft to Uber, in a supposedly anonymized fashion.</p>
<p>I personally don't think that Slice's particular defenses (we told you we do this and everyone else does it) are particularly compelling. The former depends on a Terms of Service no human being is actually reading, and the latter isn't quite true: the closest analogy I have seen would be Google mining its own data (in its apps and services) to monitor employees of competitors like Apple and Facebook. But that is an issue for another day. </p>
<p>My problem with this whole saga is the fact that <strong>the information Slice sold really isn't that anonymous</strong>.</p>
<h3 id="heading-location-data-might-be-enough">Location data might be enough</h3>
<p>Let's suppose for a second that Slice did what it thought was due diligence and removed the names from every receipt. From the articles that I've read, that is essentially the extent to which the data was anonymized and sold to Uber. </p>
<blockquote>
<p>Unroll.me, a free service to unsubscribe from email lists, can scour people’s inboxes for receipts from services like Lyft and then sell the information to companies like Uber. The data is anonymized, meaning individuals’ names are not attached to the information, and can be used as a proxy for the health of a rival.</p>
</blockquote>
<p>It's possible that Slice also protected some of the location data, but a) that's very much unclear from its own <a target="_blank" href="https://www.slice.com/privacy">privacy policy</a>, and b) if they sold receipt-level data, it's possible both that Uber wanted that location data to keep track of Lyft and that the data is there anyway. At the very least, nothing in Slice's privacy policy prohibits this, and it's very likely they make as much of the individual receipt-level information available as possible.</p>
<p>This is a problem, since a Lyft receipt looks like this:
<img src="https://images.ctfassets.net/ecaxsf5u3xse/4TPDlSlcd9vgtMBS1G1Rkl/09f190ce492ac2495dcf1dbfba428574/payment_frequency_example.png" alt="Lyft Receipt" /></p>
<p>If the reporting around these data sales is true, then we can assume that all of the information in the receipt above, sans the customer's name, was sold to Uber. That means the last 4 digits of the credit card, the pickup, and the dropoff would all be information that Uber could effectively parse. In the abstract, removing the names might be an effective method for eliminating personally identifiable information that would allow someone to trace individual trips back to an actual person. That is to say, it's difficult to take that kind of receipt data and expose individual identities from that database alone.</p>
<p><strong>The problem is that Uber has another pretty much identical database which can be used to reveal these receipt-level identities.</strong></p>
<h3 id="heading-how-to-take-anonymous-data-and-find-identities-if-youre-uber">How to take "anonymous" data and find identities if you're Uber</h3>
<ol>
<li>Take your own users' data, specifically any saved addresses (for example for Home or Work)</li>
<li>In addition to this, find for each user their most commonly used addresses/GPS coordinate region (say, within 50 feet) both for pickup and dropoff</li>
<li>Since the Slice data exists also at the user level, do the same thing for receipts you get for each distinct customer</li>
<li>If Slice gives you the last 4 digits of the credit card for each receipt, compare those with your internal data as well</li>
<li>Use the information from steps 1-4 (and information about when rides were requested as well) to match up users</li>
<li>Now you have information on which of your Uber riders use Lyft, and their names.</li>
</ol>
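<p>To make the steps above concrete, here is a toy Python sketch of the matching idea. Everything in it is an assumption for illustration: the record layout, the field names (<code>card_last4</code>, <code>pickup</code>, <code>dropoff</code>), the IDs, and the coordinates are made up, and rounding coordinates to a grid is a crude stand-in for a proper distance check. It only uses frequent locations and card digits, and leaves out saved addresses and ride times.</p>
<pre><code class="lang-python">from collections import Counter

def top_places(trips, precision=4):
    """Return the two most frequent trip endpoints, rounded to a coarse grid."""
    cells = Counter()
    for trip in trips:
        for lat, lon in (trip["pickup"], trip["dropoff"]):
            # 4 decimal places is roughly an 11 m grid in latitude (an assumption)
            cells[(round(lat, precision), round(lon, precision))] += 1
    return {cell for cell, _count in cells.most_common(2)}

def match_customers(own_users, purchased_customers):
    """Link purchased customer IDs to own users by card digits plus a shared place."""
    matches = {}
    for cust_id, cust in purchased_customers.items():
        cust_places = top_places(cust["trips"])
        candidates = [
            user_id
            for user_id, user in own_users.items()
            if user["card_last4"] == cust["card_last4"]
            and cust_places.intersection(top_places(user["trips"]))
        ]
        if len(candidates) == 1:  # keep only unambiguous matches
            matches[cust_id] = candidates[0]
    return matches

# Entirely made-up records; field names and IDs are hypothetical.
home, work = (40.79341, -77.86002), (40.79823, -77.85991)
elsewhere = (40.70000, -77.90000)
own_users = {
    "uber_1": {"card_last4": "4242", "trips": [{"pickup": home, "dropoff": work}]},
    "uber_2": {"card_last4": "4242", "trips": [{"pickup": elsewhere, "dropoff": elsewhere}]},
    "uber_3": {"card_last4": "1111", "trips": [{"pickup": home, "dropoff": work}]},
}
purchased = {
    "anon_17": {
        "card_last4": "4242",
        "trips": [{"pickup": (40.79338, -77.86004), "dropoff": (40.79821, -77.85988)}],
    },
}
matches = match_customers(own_users, purchased)
print(matches)
</code></pre>
<p>In this toy data, the "anonymous" customer shares card digits with two users and frequent places with two users, but only one user has both, which is the point: each field alone is ambiguous, while the combination is not.</p>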
<p>There are, of course, many caveats to this process. For starters, it's not very well defined, since the objective here was not to fully flesh out the exact process for matching this supposedly anonymous data with Uber's own data, but rather to point out the general direction by which it could be done. It's also possible that Slice Technologies is better about data privacy than the articles claim, since they might simply aggregate data up to a level that makes it harder to tell who individual users are. Moreover, it's possible that Slice didn't provide, or Uber didn't request, that level of granularity, though I am more skeptical about the latter case, and nothing within Slice's privacy policy actually rules that out (indeed, the policy reserves the right to sell email messages). And finally, even given all this information, you probably can't get a 100% match between Lyft and Uber users (though very good rates are not out of the question).</p>
<h4 id="heading-do-a-better-job-of-protecting-user-privacy">Do a better job of protecting user privacy</h4>
<p>Nonetheless, the biggest problem was the claim that removing people's names from this receipt-level data was sufficient to anonymize the information. Perhaps in the abstract this is true, but when everything else about the receipt information is available, it's not that hard to figure out a name. Indeed, Uber would be far from the only company that could do this: any e-commerce business that stores addresses or tracks locations could use this level of data to find the identities of Lyft users. Given all of the potential touchpoints available from these receipts (and other touchpoints Slice has, including but not limited to device IDs, geographical location, times of purchase, amount of purchase, and last 4 digits of the credit card), to call this data anonymous because names were removed is simply wrong.</p>
<p>Data and databases don't exist simply in the abstract. When considering how to securely release or anonymize your data, not accounting for the data that the party you're selling to might have is one surefire way to leak data you didn't want to reveal in the first place. At the very least, companies should be revealing as little data as possible to third parties if they truly care about protecting user privacy. Barring that, aggregating the data or adding random noise to the user-level data is imperative if you actually care about obscuring personally identifiable information.</p>
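<p>As a rough illustration of what aggregating and adding noise could look like, here is a small Python sketch that releases ride counts per coarse pickup area, drops areas with few rides, and adds Laplace noise to the rest. The function name <code>aggregate_rides</code>, the thresholds, and the receipts are all made up, and this is not a vetted differential privacy mechanism, just a sketch of the general idea.</p>
<pre><code class="lang-python">import random
from collections import Counter

def aggregate_rides(receipts, min_count=5, noise_scale=2.0, seed=None):
    """Release noisy ride counts per coarse pickup area, dropping small areas."""
    rng = random.Random(seed)
    counts = Counter(
        (round(r["pickup"][0], 2), round(r["pickup"][1], 2)) for r in receipts
    )
    released = {}
    for area, n in counts.items():
        if n in range(min_count):
            continue  # suppress areas with fewer than min_count rides
        # The difference of two exponential draws is Laplace noise with scale noise_scale
        noise = rng.expovariate(1 / noise_scale) - rng.expovariate(1 / noise_scale)
        released[area] = max(0, round(n + noise))
    return released

# Made-up receipts: six pickups in one area, two in another.
toy_receipts = [{"pickup": (40.7934, -77.8600)}] * 6 + [{"pickup": (40.7000, -77.9000)}] * 2
released = aggregate_rides(toy_receipts, seed=0)
print(released)
</code></pre>
<p>The buyer still learns how busy each area is, which is enough to gauge a rival's health, but no longer sees individual trips, card digits, or exact addresses.</p>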
<p>Also, if you haven't already, you should probably delete your Unroll.me account.</p>
]]></content:encoded></item></channel></rss>