Welcome to Spring Semester 2025.

Syllabus

What is This Course?

What it is:

Innovations in statistical learning have created many engineering breakthroughs. From real-time voice recognition to automatic categorization (and in some cases production) of news stories, machine learning is transforming the way we live our lives. These techniques are, at their heart, novel ways to work with data, and therefore they should have implications for social science. This course explores the intersection of statistical learning (or machine learning) and social science and aims to answer two primary questions about these new techniques:

  1. How does statistical learning work and what kinds of statistical guarantees can be made about the performance of statistical-learning algorithms?

  2. How can statistical learning be used to answer questions that interest social science researchers, such as testing theories or improving social policy?

In order to address these questions, we will cover so-called “standard” techniques such as supervised and unsupervised learning, statistical learning theory and nonparametric and Bayesian approaches. If it were up to me, this course would be titled “Statistical Learning for Social Scientists”—I believe this provides a more appropriate guide to the content of this course. And while this class will cover these novel statistical methodologies in some detail, it is not a substitute for the appropriate class in Computer Science or Statistics. Nor is this a class that teaches specific skills for the job market. Rather, this class will teach you to think about data analytics broadly. We will spend a great deal of time learning how to interpret the output of statistical learning algorithms and approaches, and will also spend a great deal of time on better understanding the basic ideas in statistical learning. This, of course, comes at some cost in terms of time spent on learning computational and/or programming skills.

Enrollment for credit in this course is simply not suitable for those unprepared in or uninterested in elementary statistical theory no matter the intensity of interest in machine learning or “Big Data”. Really.

You will be required to understand elementary mathematics in this course and should have at least some exposure to statistical theory. The class is front-loaded technically: early lectures are more mathematically oriented, while later lectures are more applied.

The topics covered in this course are listed later in this document. I will assign readings sparingly from Introduction to Statistical Learning, henceforth referred to as ISL. This text is available for free online and, for those who like physical books, can be purchased for about $25. Importantly, the lectures deviate a fair bit from the reading, and thus you will rely on your course notes much more than you might in other classes.

If—after you have read this document and preferably after attending the first lecture—you have any questions about whether this course is appropriate for you, please come talk to me.

What it is Not:

The focus of this course is conceptual. The goal is to create a working understanding of when and how tools from computer science and statistics can be profitably applied to problems in social science. Though students will be required to apply some of these techniques themselves, this course is not…

…a replacement for EC420, EC422, or a course in causal inference.

As social scientists, we are most often concerned with causal inference in order to analyze and write policies. Statistical learning and the other methods we will discuss in this course are generally not well-suited to these problems, and while I’ll give a short overview of standard methods, this is only to build intuitions. Ultimately, this course has a different focus and you should still pursue standard methodological insights from your home departments.

…a course on the computational aspects of the underlying methods.

There are many important innovations that have made machine learning techniques computationally feasible. We will not discuss these, as there are computer science courses better equipped to cover them. When appropriate, we will discuss whether something is computable, and we will even give rough approximations of the amount of time required (e.g. P vs NP). But we will not discuss how optimizers work or best practices in programming.

…a primer on the nitty-gritty of how to use these tools or a way to pad your resume.

The mechanics of implementation, whether it be programming languages or learning to use APIs, will not be covered in any satisfying level of depth. Students will be expected to learn most of the programming skills on their own. Specifically, while there will be some material to remind you of basic R commands, this is not a good course for people who are simply looking to learn the mechanics of programming. This course is designed to get you to use both traditional analytics and, eventually, machine learning tools. We will do some review of basic programming, and you will have the opportunity to explore topics that interest you through a final project, but ultimately this is a course that largely focuses on the theoretical and practical aspects of statistical learning as applied to social science and not a class on programming.

Perhaps most importantly, this course is an attempt to push undergraduate education toward the frontiers in social science. Accordingly, please allow some messiness. Some topics may be underdeveloped for a given person’s passions, but given the wide variety of technical skills and overall interests, this is a near certainty. Both the challenge and the opportunity of this area come from the fact that there is no fully developed, wholly unifying framework. Our collective struggle (me from teaching, you from learning) will ultimately bear fruit.

Course Times, Structure, and Office Hours

Lecture times and location

This course meets Tu/Th 1:00 - 2:20pm in NatSci 204. This class is in-person and will not have an online component.

Course Structure

Here’s how each week will work. Each week has two content entries, labeled as such on our course schedule: the first is geared more toward “principles” and the second toward “applications”. Before class on Tuesday, you’ll read the first entry, and on Tuesday you’ll come to lecture. Before Thursday, you’ll read the second entry, and on Thursday you’ll come to lecture ready to participate with a charged laptop. On Saturday night by 11:59pm, you’ll turn in your Weekly Writing assignment, which responds to the weekly writing prompt given during class on Tuesday or Thursday; it must be rendered to PDF by RMarkdown using the proper template (see assignments). By Monday at 11:59pm, you’ll turn in your Lab assignment, also using the RMarkdown template.

You’ll repeat this each week until we’re out of labs. If a holiday occurs on a due date, then the assignment or lab is due the next non-holiday day at 11:59pm or as otherwise noted. You’ll also work with your assigned group on the Group Project. We’ll assign groups after the drop deadline has passed.

Each week’s material is laid out on two pages, one for Tuesday and one for Thursday. This means we will conduct our class meetings by moving through the material and discussing it. Scrolling through together can feel a bit odd, but having one document means it is more searchable than, say, slides. Along the way, we’ll hit callout boxes instructing us to try some coding, time permitting.

This class is totally, unapologetically a work in progress. Material is a mish-mash of stuff from courses offered at Caltech, Stanford, Harvard, and Duke…so, yeah, it will be challenging. Hopefully, you’ll find it fun! Because this is an ever-evolving field, there may be more hiccups than you might otherwise expect:

  • Some of the lectures will be too long or too short.

  • Some of the lectures won’t make sense at first.

  • Some of the time I’ll forget what I intended to say and awkwardly stare at you for a few moments (sorry).

I promise to improve the course with feedback, so if you have comments please speak up.

Office hours - Prof. Bushong

  • Prof. Bushong office hours: Tuesday and Thursday 4:00-5:00 PM

  • Prof. Bushong office hours Zoom link: Zoom (Passcode: GODUCKS).

My office hours start immediately following class on Thursday, so I will stay in the classroom and answer questions, or we can walk-and-talk back to my office if you have a longer question. If you prefer to meet at my office located at Marshall-Adams Hall #25E, allow a little time for me to return from class. I’ll log onto Zoom when I arrive at the office for anyone attending remotely.

I also hold dedicated Sunday evening office hours via Slack from 6:30 PM until about 8:00 PM. In addition, I will check Slack throughout the week and use it as an always-on avenue of communication and help.

It would be remarkable if you didn’t need some assistance with the material, and I am here to help. One of the benefits of open office hours is to accommodate many students at once; if fellow students are in my office, please join in and feel very free to show up in groups. As a general rule, please first seek course-related help from the course website. However, if my scheduled office hours do not work for you please let me know. I may encourage you to make appointments with me. I ask that you schedule your studying so that you are prepared to ask questions during office hours – office hours are not a lecture and if you’re not prepared with questions we will end up awkwardly staring at each other for an hour until you leave.

Some gentle requests regarding office hours and contacting me. First, my office hours end promptly at the scheduled time, so don’t arrive 10 minutes before the scheduled end and expect a full session. Please arrive early if you have lengthy questions, or if you don’t want to risk not having time due to others’ questions. You are free to ask me some stuff by e-mail (e.g. a typo or something on a handout), but please know e-mail sucks for answering many types of questions. “How do I do this lab?” or “What’s wrong with my R setup?” are short questions with long answers. Come to office hours or ask on Slack.

Office Hours - Teaching Assistant

Slack

We will use Slack as a forum for asking questions about the course, including course policies and, primarily, help questions with R. Students are encouraged to help answer each other’s questions, and to use the forum as a first step for seeking help. The TA and I will monitor Slack and answer questions regularly. You can join our Slack channel with this link: Join EC242 Slack. Once you have joined, bookmark our Slack.

One of the biggest advantages of using Slack is that you can take screenshots and paste them directly into your post (using Screenshot on Mac or Snipping Tool on Windows; set it to capture to your “clipboard”).

My DMs are closed

Please ask your question in one of the coursewide channels so that our TA can answer when I am unavailable. You will receive a faster reply that way.

Joining Slack counts towards course participation. I highly recommend you join.

About Me

Me: My primary area of expertise is economics. In brief, I am a behavioral economist whose research examines how cognitive biases and erroneous social beliefs influence decision-making and our interactions with others. In my research, I utilize a mix of theoretical modeling and experimental studies. These approaches guide the work in the Spartan Psychology and Economics Advanced Research (SPEAR) lab, which I formed in 2023. My teaching emphasizes how psychological factors shape economic choices (EC404; EC895) and the importance of rigorous empirical methods (EC242).

While my research occasionally touches on the topics in this course, it mostly uses the techniques from this course as tools.

Course materials

The course website can be found at https://ec242.netlify.app (but you know that. You’re on it right now.)

The second required reading is the Introduction to Statistical Learning (2nd Ed), which is available free online (you can buy a paper copy if you want, and you hate trees).

All of the readings and software in this class are free. There are free online versions of all the texts, including Introduction to Statistical Learning (2nd Ed), and R / RStudio are free (don’t pay for RStudio). We will reference outside readings and there exist paper versions of some “books”, but you won’t need to buy anything.

R and RStudio/Posit

You will do all of your analysis with the open source (and free!) programming language R. You will use RStudio (which is undergoing a slow-mo rebrand to “Posit” while the functionality remains the same) as the main program to access R. Think of R as an engine and RStudio as a car: R handles all the calculations and produces the actual statistics and graphical output, while RStudio provides a nice interface for running R code.

R is free, but it can sometimes be a pain to install and configure. To make life easier, you can (and should!) use the free Posit.cloud (formerly RStudio.cloud) service, which lets you run a full instance of RStudio in your web browser. This means you won’t have to install anything on your computer to get started with R. We recommend this for those who may be switching between computers and are trying to get some work done. That said, while Posit.cloud is convenient, it can be slow and it is not designed to handle larger datasets or more complicated analysis and graphics. You also can’t use your own custom fonts with RStudio.cloud. And, generally speaking, you should have (from the prerequisite course) sufficient experience to make your R work. If not, over the course of the semester, you’ll probably want to get around to installing R, RStudio, and other R packages on your computer and wean yourself off of RStudio.cloud. If you plan on making a career out of data science, you should consider this a necessary step.

You can find instructions for installing R, RStudio/Posit, and all the tidyverse packages here. And you may find some other goodies.

Online help

Data science and statistical programming can be difficult. Computers are stupid and little errors in your code can cause hours of headache (even if you’ve been doing this stuff for years!).

Fortunately there are tons of online resources to help you with this beyond our course Slack. Two of the most important are StackOverflow (a Q&A site with hundreds of thousands of answers to all sorts of programming questions) and RStudio Community (a forum specifically designed for people using RStudio and the tidyverse (i.e. you)).

Searching for help with R on Google can sometimes be tricky because the program name is, um, a single letter. Google is generally smart enough to figure out what you mean when you search for “r scatterplot”, but if it does struggle, try searching for “rstats” instead (e.g. “rstats scatterplot”). Likewise, whenever using a specific package, try searching for that package name instead of the letter “r” (e.g. “ggplot scatterplot”). Good, concise searches are generally more effective.

Help with Using R: There are some excellent additional tutorials on R available through RStudio/Posit Cloud Primers.

Evaluations and Grades

Your grade in this course will be based on attendance/participation, labs, weekly writings, and a final project.

The general breakdown will be approximately 55% for labs, participation, and weekly writings, and 45% for projects (see below for specific details). The primary focus of the course is a final project; this requires two “mini-projects” to ensure you’re making satisfactory progress. Assignment of numeric grades will follow the standard, where ties (e.g., 91.5%) are rounded to favor the student. Evaluations (read: grades) are designed not to deter anyone from taking this course who might otherwise be interested, but will be taken seriously.

Weekly writings are intended to be an easy way to get some points. Labs will be short homework assignments that require you to do something practical using R. You must have access to computing resources and the ability to program basic statistical analyses. If you are unprepared to implement basic statistical coding, please take (or retake) PLS202. I highly encourage seeking coding advice from those who instruct computer science courses – it’s their job and they are better at it than I am. I’ll try to provide a good service, but I’m really not an expert in computer science.

More in-depth descriptions for all the assignments are on the assignments page. As the course progresses, the assignments themselves will be posted within that page.

Dropping your lowest scores

I will automatically drop your two lowest weekly writings and your one lowest lab assignment score. This allowance absorbs any personal issues, travel problems, computing issues, or non-excused medical issues that may preclude you from completing your assignment on time. If you request an extension or exemption, I will politely point you here.

Grade Rubric

Assignment                                        Points   Percent
Class Participation                               25       5%
Weekly Writings (14-2 x 8 ea), drop two lowest    96       17%
Labs (14-1 x 15 ea), drop one lowest              195      35%
Mini project 1                                    50       9%
Mini project 2                                    50       9%
Final project                                     135      25%
Total                                             551

Grade   Range      Grade   Range
4.0     92-100%    2.0     72-76%
3.5     87-91%     1.5     67-72%
3.0     82-87%     1.0     62-67%
2.5     77-81%     0.0     bad-66%
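
For anyone who wants to check the arithmetic behind the rubric, here is a minimal sketch of how the point total and the dropped scores fit together. It is an illustration only, not the grading code used in the course: the function names and the sample student are made up, and it is written in Python simply to keep it short (the same few lines translate directly to R).

```python
# Hypothetical sketch of the rubric arithmetic. Function names and the sample
# scores are illustrative assumptions, not the instructor's grading code.

def drop_lowest(scores, n_drop):
    """Return the scores with the n_drop lowest values removed."""
    return sorted(scores)[n_drop:]

def course_points(participation, writings, labs, mini1, mini2, final):
    """Sum earned points after dropping two writings and one lab."""
    kept_writings = drop_lowest(writings, 2)  # 14 writings, 12 count
    kept_labs = drop_lowest(labs, 1)          # 14 labs, 13 count
    return (participation + sum(kept_writings) + sum(kept_labs)
            + mini1 + mini2 + final)

# Points available, following the rubric rows: 25 + 96 + 195 + 50 + 50 + 135.
TOTAL_POINTS = 25 + 12 * 8 + 13 * 15 + 50 + 50 + 135

# A made-up student: full marks everywhere, but two writings and one lab missed.
earned = course_points(
    participation=25,
    writings=[8] * 12 + [0, 0],
    labs=[15] * 13 + [0],
    mini1=50, mini2=50, final=135,
)
percent = 100 * earned / TOTAL_POINTS
print(earned, TOTAL_POINTS, round(percent, 1))
```

In this sketch the three missed assignments cost nothing, because they are exactly the scores the drop policy removes; a fourth missed assignment would start to reduce the total.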

Grading, in general

Grading: come to class.

If you complete all assignments and attend all class dates, I suspect you will do very well. Given the way the syllabus is structured, I conjecture that the following is a loose guide to grades:

4.0 Turned in all assignments with good effort, worked hard on the projects and was proud of final product.

3.5 Turned in all assignments with good effort, worked a bit on the projects and was indifferent to final product.

3.0 Turned in all assignments with some effort, worked a bit on the projects and was shy about final product.

< 3.0 Very little effort, or did not turn in all assignments, worked very little on the projects and was embarrassed by final product.

…of course, failing to turn in assignments can lead to a grade dramatically lower than just a 3.0.

Grading Appeals

All grades are considered final. Any request for a re-grade beyond simple point-tallying mistakes will require that the entire assignment be re-graded. Any points previously awarded may be changed in either direction in the re-grade.

Class Participation

Participation can take many forms. Most preferred is active participation during class – asking clarifying questions or responding to lecture questions. In the latter half of the semester, we will work in groups in class on small coding exercises and will share results at the end. Sharing your group’s results and code will always count towards participation. Finally, I will often give extra credit points for answering specific questions in class, which I will clearly state as extra credit. Wrong answers get the same credit as right answers. We are here to learn. If you knew everything already, you wouldn’t be in the class.

We are a sizable class, so participation points will be awarded using EC slips, which you will be able to turn in to me (with your name on them) at the end of class. That way, I don’t slow down the class trying to write down everyone’s names.

Success in this Course

I promise, you are equipped to succeed in this course. It will provide two challenges: the conceptual challenge of understanding and linking statistical theory to social science problems; and the implementation challenge of learning to program in R.

Learning R can be difficult at first. Like learning a new language (Spanish, French, or Chinese), it takes dedication and perseverance. Hadley Wickham (the chief data scientist at RStudio and the author of some amazing R packages you’ll be using, like ggplot2) made this wise observation:

It’s easy when you start out programming to get really frustrated and think, “Oh it’s me, I’m really stupid,” or, “I’m not made out to program.” But, that is absolutely not the case. Everyone gets frustrated. I still get frustrated occasionally when writing R code. It’s just a natural part of programming. So, it happens to everyone and gets less and less over time. Don’t blame yourself. Just take a break, do something fun, and then come back and try again later.

Even experienced programmers (like me) find themselves bashing their heads against seemingly intractable errors. If you’re finding yourself bashing your head against a wall and not making progress, try the following. First, take a break. Sometimes you just need space to see an error. Next, talk to classmates. Finally, if you genuinely cannot see the solution, e-mail the TA. But, honestly, it’s probably just a typo.

Academic honesty

Violation of MSU’s Spartan Code of Honor will result in a grade of 0.0 in the course. Moreover, I am required by MSU policy to report suspected cases of academic dishonesty for possible disciplinary action.

Generative AI

Generative AI is both a computing resource and a potential avenue for cheating in violation of the Spartan Code of Honor (see Academic honesty, above). For this course, some use of generative AI is permitted or forbidden as follows:

  • For the purposes of learning R coding, data cleaning and processing, visualization, and other coding applications of R, you are fully permitted to use ChatGPT or other generative AI models provided you indicate such a use at the top of your problem set, and include a comment #Used Generative AI in your code. When “stuck” on coding, I have always encouraged students to use Stack Overflow to find similar problems and then translate the found solutions into the problem at hand. It is one of the most important skills in coding. Generative AI facilitates this process, and thus is a tool. Since R itself is a tool to help you learn and apply the material, I view using generative AI for help in specific coding tasks as a suitable use. You must, in all cases, understand what your code is doing, and be able to justify its use. If you cannot tell me, on review, what a line of code is used for, then you have not used generative AI properly.

Accommodations

If you need a special accommodation for a disability, religious observance, or have any other concerns about your ability to perform well in this course, please contact me immediately so that we can discuss the issue and make appropriate arrangements. MSU has a specific policy for religious observance available here.

Michigan State University is committed to providing equal opportunity for participation in all programs, services and activities. Requests for accommodations by persons with disabilities may be made by contacting the Resource Center for Persons with Disabilities at 517-884-RCPD or on the web at rcpd.msu.edu. Once your eligibility for an accommodation has been determined, you will be issued a verified individual services accommodation (“VISA”) form. Please present this form to me at the start of the term and/or two weeks prior to the accommodation date (test, project, etc). Requests received after this date will be honored whenever possible.

Resources

Mental health concerns or stressful events may lead to diminished academic performance or reduce a student’s ability to participate in daily activities. Services are available to assist you with addressing these and other concerns you may be experiencing. You can learn more about the broad range of confidential mental health services available on campus via the Counseling & Psychiatric Services (CAPS) website at www.caps.msu.edu.

Mandated Reporting

Writings, labs, projects, and other materials submitted for this class are generally considered confidential pursuant to the University’s student record policies. However, students should be aware that University employees, including instructors, may not be able to maintain confidentiality when it conflicts with their responsibility to report certain issues to protect the health and safety of MSU community members and others. As the instructor, I must report the following information to other University offices (including the Department of Police and Public Safety) if you share it with me:

  • Suspected child abuse/neglect, even if this maltreatment happened when you were a child;

  • Allegations of sexual assault, relationship violence, stalking, or sexual harassment; and

  • Credible threats of harm to oneself or to others.

These reports may trigger contact from a campus official who will want to talk with you about the incident that you have shared. In almost all cases, it will be your decision whether you wish to speak with that individual. If you would like to talk about these events in a more confidential setting, you are encouraged to make an appointment with the MSU Counseling and Psychiatric Services.

Acknowledgements

This course’s structure and content have been improved greatly by Prof. Kirkpatrick. All remaining errors are my own.

Miscellanea

All class material will be posted on https://ec242.netlify.app. D2L will be used sparingly for submission of weekly writings and assignments and distribution of grades.

Contacting Me

Email is a blessing and a curse. Instant communication is wonderful, but often email is the wrong medium to have a productive conversation about course material. Moreover, I get a lot of emails. This means that I am frequently triaging emails into two piles: “my house is burning down” and “everything else”. Your email is unlikely to make the former pile. So… asking questions about course material is always best done in-class or in office hours. Students always roll their eyes when professors say things like that, but it’s true that if you have a question, it’s very likely someone else has the same question.

That said, email is still useful. If you’re going to use it, you should at least use it effectively. There’s a running joke in academia that professors only read an email until they find a question. They then respond to that question and ignore the rest of the email. I won’t do this, but I do think it is helpful to assume that the person on the receiving end of an email will operate this way. By keeping this in mind, you will write a much more concise and easy-to-understand email.

Some general tips:

  • Always include [EC242] in your subject line (brackets included).
  • Use a short but informative subject line. For example: [EC242] Final Project Grading
  • Use your University-supplied email for University business. This helps me know who you are.
  • One topic, one email. If you have multiple things to discuss, and you anticipate follow-up replies, it is best to split them into separate emails so that the threads do not get cluttered.
  • Ask direct questions. If you’re asking multiple questions in one email, use a bulleted list.
  • Don’t ask questions that are answered by reading the syllabus! This drives me nuts.
  • I’ve also found that students are overly polite in emails. I suppose it may be intimidating to email a professor, and you should try to match the style that the professor prefers, but I view email for a course as a casual form of communication. Said another way: get to the point. Students often send an entire paragraph introducing themselves, but if you use your University email address, and add the course name in the subject, I will already know who you are. Here’s an example of a perfectly reasonable email:

Subject: [EC242] Lab, Question 2, Typo

Hi Prof. Bushong,

There seems to be a typo in the Lab on Question 2. The problem says to use a column of data that doesn’t seem to exist. Can you correct this or tell us which column we should use?

Thanks, Student McStudentFace

Letters of Recommendation / References

At this time, I am essentially not writing letters of recommendation. If you want to convince me to do so, you’ll likely need to work exceptionally hard. Consider this a warning.

Footnotes

  1. If you’ve got money to burn, you can buy me a burrito.↩︎

  2. This bothers me way more than it should.↩︎

  3. By the end of the course, you will realize that 1) I make many many many errors; 2) that I frequently cannot remember a command or the correct syntax; and 3) that none of this matters too much in the big picture because I know the broad approaches I’m trying to take and I know how to Google stuff. Learn from my idiocy.↩︎

  4. So just don’t cheat or plagiarize. This is an easy problem to avoid.↩︎