Reddit Bot

Mike Izbicki
Claremont McKenna College
mike@izbicki.me

Overview

Reddit is a popular social media service used by many students. Reddit also encourages its users to make bots, and it has an active "botting" community. In this assignment, students join this botting community by building bots that "talk" to each other. To make the assignment more fun, I try to time the assignment's due date with an election and have the bots talk about political candidates.

Students learn many technical skills from this assignment: how to use external Python libraries, how to navigate trees, how to deal with randomness, and basic devops.

But the real benefits of this assignment are non-technical. The assignment forces students to wrestle with important ethical/legal dilemmas about online botting and freedom of speech, and it gives them the technical knowledge necessary to meaningfully participate in these debates.

Because of these non-technical applications, this assignment is particularly well suited for getting humanities students excited about computer science.

Example

I used this assignment in a CS1 class in Fall 2020, and the students' bots sent messages supporting their favorite presidential candidate. Most students' bots wrote messages supporting either Biden or Trump.

All of these messages are publicly viewable on the /r/csci040temp subreddit. The screenshot below shows a typical conversation between two student bots:

Since these posts are publicly viewable, a large number of ethical and legal questions are raised by this assignment. The ethical/legal FAQ below addresses these concerns.

Materials

This project is relatively long for CS1, and I've scaffolded it into 2 labs and a homework.

Lab 1 introduces students to Python's PRAW library. Students learn how to download and install external libraries using pip, and how to read and write reddit messages programmatically. Students are also exposed to many examples of good reddit bots that are important parts of the reddit ecosystem. Materials can be found in the lab-PRAW folder.

Lab 2 teaches students how to programmatically generate text using a simple MadLibs-style formula. This is a relatively easy assignment compared to the previous lab and final homework, but students find it particularly fun because I have them play around with AI Dungeon to explore state-of-the-art text generation. Materials can be found in the lab-madlibs folder.
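
As a minimal sketch of the MadLibs-style idea, the snippet below fills bracketed placeholders in a template with randomly chosen words. The templates, word lists, and function name are made up for illustration and are not taken from the lab materials.

```python
import random

# Hypothetical word lists; students would write their own.
WORDS = {
    "ADJECTIVE": ["thoughtful", "energetic", "practical"],
    "NOUN": ["plan", "speech", "record"],
    "VERB": ["support", "trust", "admire"],
}

# Hypothetical templates; each [PLACEHOLDER] names a key in WORDS.
TEMPLATES = [
    "I [VERB] this candidate because of their [ADJECTIVE] [NOUN].",
    "What a [ADJECTIVE] [NOUN]! I [VERB] it.",
]


def generate_comment(rng=random):
    """Pick a template and replace each [PLACEHOLDER] with a random word."""
    text = rng.choice(TEMPLATES)
    for placeholder, choices in WORDS.items():
        token = "[" + placeholder + "]"
        # Replace one occurrence at a time so repeated placeholders can differ.
        while token in text:
            text = text.replace(token, rng.choice(choices), 1)
    return text


if __name__ == "__main__":
    print(generate_comment())
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which can help when debugging.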

The Homework combines the two labs into a full-fledged bot that posts random comments to reddit. The homework requires both writing code and running it for an extended period, which exposes students to basic devops ideas. I typically give students 2 weeks for the coding portion of the assignment, and 1 week for the devops portion. I don't require students to submit their code until the end of the devops portion, however, so students typically keep making corrections and improvements to their bot during this time as well. Materials can be found in the hw-redditbot folder.
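
One way to reduce the manual babysitting during the devops portion is a small restart loop. This is only a sketch under assumptions: `keep_running` and the `step` callback are hypothetical names, and the homework materials may handle crashes differently (or expect students to restart by hand).

```python
import time
import traceback


def keep_running(step, max_restarts=3, delay_seconds=60, sleep=time.sleep):
    """Run step() in a loop, restarting after a crash up to max_restarts times.

    step should return True to keep going or False to stop cleanly.
    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        try:
            if not step():
                return restarts
        except Exception:
            # Ctrl-C (KeyboardInterrupt) is not an Exception, so it still stops the bot.
            traceback.print_exc()  # keep the error visible for debugging
            if restarts >= max_restarts:
                raise  # too many crashes: give up and show the last error
            restarts += 1
            sleep(delay_seconds)  # wait before trying again
```

Here `step` would be one round of the bot's work, such as posting a single comment. Printing the traceback before restarting matters, because a loop that silently swallows errors hides the bugs students are supposed to find.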

Support files: There are also many supporting files needed for this assignment, all located in the support directory. There are three example solutions, each of which posts slightly different messages:

  1. bot-for.py posts messages about how great I, Mike Izbicki, would be as president
  2. bot-against.py posts messages about how bad I would be as president
  3. bot-plagiarism.py posts messages that are just copies of the most highly upvoted comments from other political subreddits
A final file called generate-posts.py copies posts from other political subreddits and posts them to the class subreddit.

The main purpose of all of these files is to generate a handful of starter posts and comments so that students have something that their bots can reply to. They can also serve as a reference for instructors for how to implement the assignment. Unfortunately, the code cannot be run as-is because credential information needs to be provided.

Ethical/legal FAQ

This assignment raises many ethical and legal questions for students, faculty, and administrators. This FAQ addresses these questions. I make this FAQ available for students, and we have extensive in-class discussions about these topics.

Q1: Bots are bad for democracy. Why would you want to teach students to make them?

Bad bots spreading fake news have gotten a lot of media attention, but many bots are actually good for society. The community /r/TheoryOfReddit maintains a wiki of reddit's most famous good bots. Other online social networks like Twitter also have many good bots.

Students who complete this assignment will have the skills needed to both distinguish between good and bad bots online, and to make their own good bots.

Q2: But is it legal to develop bots?

Yes. Reddit actively encourages its users to create bots, and bots form an important part of the reddit ecosystem. Reddit requires all bots to comply with the API terms and bottiquette, and students are required to review and comply with these documents. This is fairly easy to do. Basically, bots are not allowed to do things humans are not allowed to do (e.g. spam, harass, spread malware), and bots must comply with technical constraints like providing a valid HTTP user agent and obeying rate limitations.

Outside of reddit, the legal status of bots is a more interesting question. In 2019, the California Senate passed the Bolstering Online Transparency (BOT) law that requires most political and commercial bots to clearly label themselves as bots. This is an intuitively sensible regulation, but it has many subtle First Amendment implications, and so legal scholars are currently unsure about its constitutionality (see comments by Politico, the Electronic Frontier Foundation, and the Knight First Amendment Institute). Fully understanding these legal issues requires a technical understanding of how bots work, and that understanding can only be gained by actually building bots.

Q3: I've heard that most bots are run by foreign governments like Russia. Don't you need a lot of resources and expertise to create a bot?

No. Bots are easy to create, and anyone with a small amount of CS knowledge can create them. A major purpose of this assignment is to help students understand that anyone can write a bot, not just state actors.

Q4: How do you prevent unintended negative externalities of this assignment?

Since student bots are posting messages to the public internet, the main risk of this assignment is that someone who is not part of the class finds these messages and doesn't realize they are being sent by bots. I take two measures to prevent this from happening:

  1. I create a specialized subreddit dedicated for class activities, and student bots are required to post only in this subreddit. The subreddit is clearly marked as being only for class activities and that all posters are bots.

  2. All student bots must explicitly label themselves as a bot by including the word "bot" in their username.

These measures exceed the requirements imposed by law and reddit's terms of service.

Q5: Why give students a potentially controversial assignment that raises so many ethical and legal questions?

Ethical reflection is a core part of a good CS curriculum. For example, accreditation agencies like ABET require that we integrate ethics into computer science courses, and the ACM's Code of Ethics and Professional Conduct admonishes CS professionals to "reflect upon the wider impacts of their work". This assignment and accompanying in-class discussion teach students how to reflect on the ethical considerations of their work.

Bot technology (and other potentially dangerous CS technology) is going to be a central part of our future. We cannot shield students from this fact, so we must prepare them for it. Students will have to make personal decisions about how they will interact with this technology online, and our society will have to make collective decisions about how this technology will be regulated. It is best that students make these decisions from an informed perspective.

Collaboration with other Instructors

I'd love to help other instructors get set up running a similar assignment, or collaborate with other instructors to run this assignment jointly between our two schools. I think it would be particularly cool if we could have students' bots from different schools all "talking" with each other on the same subreddit, perhaps debating which school is best. Please email me at mike@izbicki.me if you're interested :)

Metadata

Summary Students build a reddit bot that posts messages to reddit and replies to other students' messages.
Audience The assignment is designed for humanities majors taking CS1. I've found that these students enjoy both working with real world social media websites and the ethical/legal issues raised by this assignment.
Topics This assignment integrates a large number of both technical and non-technical topics. Technical topics include:
  1. Using Python libraries. Students learn to install and use real-world Python libraries. This includes such "banal" tasks as having to read the documentation, and more advanced concepts like dealing with rate limiting. (The PRAW library we use handles rate limiting for us automatically, but students need to adapt their debugging techniques to accommodate this fact.)
  2. OOP. Reddit's PRAW library has classes for representing subreddits, posts, messages, and users. Students need to learn how to use instances of each of these classes and understand the different capabilities of each.
  3. Trees. Reddit comments are structured as trees and students must navigate these trees in order to create new messages. This is significantly easier than the typical BST/AVLTree/Heap assignments common in CS2 classes because students do not have to build the tree structure. The PRAW library builds the tree and provides many useful functions for working with the tree.
  4. Randomness. Students learn to generate random text and to have their bots randomly select which messages to reply to.
  5. Devops. Students must keep their bot running over a multi-day period in order to post a sufficient number of messages. To accomplish this, students must manually monitor their program and restart it when it crashes... and every student's code has bugs that cause crashes. This also helps students realize that code doesn't have to be "perfect" to be "useful".

    Students also learn how to manage their login credentials and prevent leaking the credentials. They are penalized if the credentials leak to anyone else, including to the instructor in the assignment submission. We have a long discussion in class about famous hacking incidents caused by employees uploading company credentials to github repos.
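
One common way to keep credentials out of submitted code is to read them from environment variables. The sketch below is an assumption about how that could look, not the method used in the assignment materials: the environment variable names and the `load_credentials` helper are hypothetical. The dictionary keys match the keyword arguments that PRAW's `Reddit` constructor accepts.

```python
import os

# Hypothetical environment variable names; any names would work.
REQUIRED = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "username": "REDDIT_USERNAME",
    "password": "REDDIT_PASSWORD",
}


def load_credentials(env=None):
    """Build a dict of login settings from environment variables.

    Raises RuntimeError naming any missing variables, without printing secrets.
    """
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    creds = {key: env[var] for key, var in REQUIRED.items()}
    # A descriptive user agent; the exact wording here is just an example.
    creds["user_agent"] = "class bot run by u/" + creds["username"]
    return creds

# With PRAW installed, the result could be passed along as praw.Reddit(**creds).
```

Because the secrets live in the shell environment rather than in the source file, the code can be submitted or pushed to a repository without leaking them.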

Non-technical topics include:
  1. Ethical/legal issues. Students learn how to comply with an independent website's terms of service, how to follow norms of the botting community, and the ongoing legal debates about online botting. They gain the technical skills needed to understand and contribute to these debates.
  2. Culture. Reddit has a large programming community, and many programming "subreddits" like /r/programming, /r/python, and /r/ProgrammerHumor. Students are naturally exposed to this culture through this assignment.
  3. Research Exposure. Students interact with state-of-the-art language models like GPT-3 and learn the high level risks and limitations of these models.
Difficulty This is a relatively long and difficult project for a CS1 course. I normally give students 3 weeks for this assignment: 2 weeks for coding, and 1 week for devops. The main challenges students struggle with are:
  1. navigating the tree structure of reddit comments
  2. understanding the root cause of an error message (is it a problem in their code, in their network connection, or is reddit down?)
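
To illustrate the tree navigation in the first challenge, here is a sketch that walks a comment tree and randomly picks a comment to reply to. It uses a hypothetical stand-in class, `FakeComment`, so that it runs without a network connection or credentials. PRAW's real comment objects also expose nested replies, and the library provides its own helpers for walking them, so treat this as a picture of the idea rather than the assignment's code.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FakeComment:
    """Hypothetical stand-in for a reddit comment: an author, a body, and replies."""
    author: str
    body: str
    replies: list = field(default_factory=list)


def flatten(comments):
    """Return every comment in the tree, each parent before its replies."""
    result = []
    for comment in comments:
        result.append(comment)
        result.extend(flatten(comment.replies))  # recurse into the subtree
    return result


def pick_reply_target(top_level_comments, bot_name, rng=random):
    """Randomly choose a comment written by someone other than the bot, or None."""
    candidates = [c for c in flatten(top_level_comments) if c.author != bot_name]
    return rng.choice(candidates) if candidates else None
```

Filtering out the bot's own comments before choosing keeps the bot from replying to itself.
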
Strengths
  1. Students love the real-world nature of this assignment.
  2. The assignment is 100% whitehat, but it still "feels" blackhat to students, which makes it more exciting.
  3. Students naturally want to collaborate with each other when their bots start "communicating" with each other online.
  4. There are lots of opportunities to tie in legal/ethical concerns about bot making and CS more broadly.
  5. These legal/ethical problems make humanities students excited about computer science.
  6. Students talk about the assignment with their friends outside of class, getting them interested in computer science.
  7. Due to the open-ended nature of the problem, there are lots of opportunities for students to extend the assignment. More advanced students won't get bored.
Weaknesses The main weakness of this assignment is that it's much more work for the instructor than a typical CS1 assignment. For example:
  1. The instructor needs to learn the PRAW library and reddit, something most instructors won't have previous experience with.
  2. Grading is relatively hard, and traditional automatic grading tools won't work. This extra work is mitigated by the `bot_counter.py` script, which can count the number of successfully sent messages, but the instructor still needs to at least manually skim students' code while grading.
  3. There are many factors outside the instructor's control, and these factors can cause unforeseen difficulties. For example:
    1. Reddit's website occasionally goes down. Students can become confused about why their "correct" code is not working, and lab times can be much less productive if reddit downtime happens to correspond with a lab.
    2. Reddit occasionally bans student accounts for violating their terms of service. This can be frustrating for students when they don't understand what they did wrong, but is easily remedied by creating a new account.
  4. The assignment needs updating every year to respond to changes in reddit, so the given sample code/solutions may not work.
Dependencies
  • Students need to have mastered basic python control structures (if statements, for/while loops, and functions).
  • Knowledge of markdown is helpful (but not required) since reddit posts support markdown formatting.
  • Basic web knowledge is helpful for debugging, and I do this assignment after a web scraping assignment so that students have this knowledge. For example, students occasionally encounter HTTP status code error messages and DNS name resolution errors. Reddit requires bots to have a custom HTTP user agent, so it's helpful to know what this actually is. No knowledge of HTML/CSS is needed.
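
As a rough illustration of why HTTP knowledge helps with debugging, the sketch below turns a status code into a guess about where the problem lies. The `likely_cause` helper and its wording are hypothetical, and it uses only Python's standard library. PRAW reports errors through its own exception classes, which this sketch does not cover.

```python
from http import HTTPStatus


def likely_cause(status_code):
    """Return a rough, human-readable guess at who is responsible for an HTTP error."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        phrase = "unknown status"  # not a code the standard library knows
    if status_code == 429:
        guess = "the bot is sending requests too quickly; wait and retry"
    elif status_code in (401, 403):
        guess = "check the credentials and the account's permissions"
    elif 400 <= status_code < 500:
        guess = "the request itself is probably wrong; check your code"
    elif 500 <= status_code < 600:
        guess = "the server is having trouble; your code may be fine"
    else:
        guess = "not an error status"
    return f"{status_code} {phrase}: {guess}"
```

For example, `likely_cause(503)` points at the server, which is the situation students hit when reddit itself is down.
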
Variants There are many variants to this problem described in the extra credit section of the grading rubric. Students often have a lot of fun on this assignment, and put in a lot of extra work to complete these extra credits. For students who do this, I'm happy to give them 150% on the assignment because they've demonstrated both a real enthusiasm for programming and the ability to work through novel complex problems that weren't covered in class. From my perspective, that's the best-case result of a CS1 course, and I'm happy to use lavish extra credit to incentivize this behavior.