Copyright 2019 by Lu Chen

Data Analysis of MOOC Learner Behavior


In Massive Open Online Courses (MOOCs), struggling learners often post questions in discussion forums to seek help. Overlooking these learners’ posts may harm the learning experience and outcome. The Stanford MOOC Posts dataset contains 29,604 anonymized learner forum posts from eleven Stanford online courses. Previous analyses of MOOC datasets have considered attrition rates, student demographics, and student participation in online forums. We were interested in the nature of the forum posts themselves. Since we know that the number of posts decays over the course duration, what inspires students to post? In particular, we were interested in studying the correlation between posting and confusion. Do students tend to post when they are more confused? Does the confusion level of student posts change over the course, suggesting that some of the content is more confusing than other portions? Does it depend on the course? Or does it depend on the student and their willingness to participate in the forum? These questions led to the guiding question:

"What are the confusion characteristics of MOOC blog posts for different users over the course duration?"


We answered our guiding question to give educators and researchers insight into student behavior on MOOC forums and to help them buttress sections of a course. First, the analysis gives a general indication of whether the confusion expressed and the content discussed in MOOC posts are a good source of insight into student understanding. Second, if student posting behavior at particular times indicates higher levels of confusion, an instructional designer may be able to provide additional support for the concepts covered at those points in the course.

To answer the guiding question, we analyzed the data to address the following questions:


1. What different types of student posters are there? Who is making how many posts? (across all courses)

2. Considering students with different numbers of posts,

How many posts are the different types of students posting through the course duration? (across all courses)

What are the confusion characteristics of the different types of students? (across all courses)

3. For two case study classes,

How does the confusion change over time? (individual course)

Is the content of the posts (word cloud) changing over time? (individual course)

We visualized our key findings for these questions and then conducted user testing from the perspective of the audience (educators and researchers).

Visualization of Key findings 

Data Overview

Stanford MOOC Posts Dataset

3 Branches, 11 Courses

29,604 Posts

Each post characterized with:

  • Sentiment

  • Confusion

  • Urgency

Scale: 1 (lowest) to 7 (highest)

Figure: Number of registered users per course versus the total number of posts in the course forum (thickness of the branch)

Students post more at the beginning of the course.

Overall, the confusion level of student posts increases slightly as the course progresses.
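A trend like this can be checked by averaging the confusion score of posts in each time bucket. The following is only a minimal sketch, not the analysis pipeline used for the figures: it assumes each post has already been reduced to a hypothetical (week, confusion) pair, with confusion on the 1 to 7 scale, and the sample rows are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_confusion_by_week(posts):
    """Average the confusion score of posts within each course week.

    `posts` is an iterable of (week, confusion) pairs. Both field names
    are assumptions for this sketch, not columns confirmed by the dataset.
    """
    by_week = defaultdict(list)
    for week, confusion in posts:
        by_week[week].append(confusion)
    # Sort by week so the result reads as a time series.
    return {week: mean(scores) for week, scores in sorted(by_week.items())}


# Made-up illustrative rows, not values from the Stanford dataset.
sample_posts = [(1, 3.0), (1, 4.0), (2, 4.0), (2, 5.0), (3, 5.0)]
weekly = mean_confusion_by_week(sample_posts)
for week, score in weekly.items():
    print(f"week {week}: mean confusion {score:.2f}")
```

Because the number of posts drops over the course, later weeks average over fewer posts, so a weekly mean like this is noisier toward the end.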

Post Characteristics 

0.1%: Contributors, with 15 posts or more

Figure: Number of posts vs. category of users (by post count)

53.3%: One-timers, with only 1 post

Figure: Posts over time with respect to different user categories

Figure: Different post types and confusion level by different student types
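The user categories above can be illustrated by counting posts per author and bucketing each author by that count. This is a sketch under stated assumptions rather than the method behind the figures: only the One-timer (1 post) and Contributor (15 or more posts) thresholds come from the text, the "Occasional" label for everything in between is a hypothetical placeholder, and the percentages are assumed to be shares of users.

```python
from collections import Counter


def categorize(post_count):
    """Map a user's post count to a category label."""
    if post_count == 1:
        return "One-timer"      # threshold stated in the text
    if post_count >= 15:
        return "Contributor"    # threshold stated in the text
    return "Occasional"         # hypothetical label for 2 to 14 posts


def category_shares(post_authors):
    """Return the share of users in each category.

    `post_authors` holds one author identifier per post; the identifier
    format is an assumption of this sketch.
    """
    posts_per_user = Counter(post_authors)
    users_per_category = Counter(categorize(n) for n in posts_per_user.values())
    total_users = len(posts_per_user)
    return {cat: count / total_users for cat, count in users_per_category.items()}


# Made-up authors: "a" and "c" post once, "b" three times, "d" fifteen times.
sample_authors = ["a"] + ["b"] * 3 + ["c"] + ["d"] * 15
shares = category_shares(sample_authors)
for category, share in shares.items():
    print(f"{category}: {share:.1%} of users")
```

Computing shares over users rather than over posts matters here: a small group of heavy posters can account for a large share of posts while remaining a tiny share of users.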

Case Studies: When and How Do Learners Get Confused?

How to Learn Math

Environmental Physiology