Dataset Catalog

| Title | Platform/Publisher | Description | URL | Year | Data Formats |
| --- | --- | --- | --- | --- | --- |
| iSnap - Introductory Programming | DataShop | "iSnap logs all student actions to a remote database, including any interactions with the user interface and coding area. It... | Link | 2017 | ProgSnap |
| Scratch Dataset | GitHub | "A dataset of 250K recent Scratch projects from 100K different authors scraped from the Scratch project repository. We processed the... | Link | | |
| Hour of Code 2013 | Code.org | | Link | 2014 | |
| ShortAnswersIDSV | Harvard Dataverse | This data set contains exam questions and answers from an introductory course to computer science | Link | | |
| Supplementary data for study: Challenges Faced by Teaching Assistants in Computer Science Education Across Europe | DataverseNO | This data includes the themes, sub-themes, codes and exemplary quotes from the analysis of reflection essays for the study "Challenges... | Link | 2020 | |
| CSEDM 2019 Data Challenge | DataShop | The dataset used in the challenge comes from a study of novice Python programmers working with the ITAP intelligent tutoring... | Link | 2016 | |
| CodeWorkout data Spring 2019 | DataShop | | Link | 2019 | ProgSnap 2 |
| Supplementary data for study: Understanding the Relation Between Study Behaviors and Educational Design (Study 1) | DataverseNO | It has been identified that the first-year experience is crucial to student motivation and throughput of study programs, therefore it... | Link | 2018 | |
| Code Hunt | GitHub | Code Hunt is a serious education game which has been played by over 140,000 students and enthusiasts over the past... | Link | | |
| Blackbox | ACM Digital Library | Blackbox is a perpetual data collection project that collects data from worldwide users of the BlueJ IDE -- a programming... | Link | | |
| 2019 CS1 Keystroke Data | Harvard Dataverse | Keystroke data collected from CS1 student participants during 2019 at Utah State University. See readme.txt for detailed information. This dataset... | Link | 2019 | ProgSnap 2 |
| 2021 CS1 Keystroke Data | Harvard Dataverse | Keystroke data collected from CS1 student participants during fall 2021 semester at Utah State University. See readme.txt for detailed information.... | Link | 2021 | ProgSnap 2 |
| CloudCoder | GitHub | CloudCoder is an open source web-based programming exercise system (inspired by CodingBat). It is designed to make it easy for... | Link | | |
| OLI Introductory Programming with Media | DataShop | | Link | 2010 | |
| CloudCoder | | | Link | | |
| Mob Programming | DataShop | | Link | | |
| RedBlackTreeTutor | DataShop | | Link | | |
| TMC | | | Link | | |
| KC Modeling for Programming | DataShop | Step-by-step analysis of students solving introductory programming questions in Python | Link | 2016 | Custom |
| Fall 2019 use of OpenDSA Formal Languages eTextbook | DataShop | Student utilization of the eTextbook, student perceptions, and performance on exams | Link | 2019 | -- |
| Runestone Interactive | DataShop | Analysis of student hint-seeking behaviour in relation to time spent | Link | 2019 | |
| OLI Principles of Computing | DataShop | Python | Link | 2021 | -- |
| Python Trace Table Tutor | DataShop | | Link | | |
| QuizJET | DataShop | | Link | | |
| ReadingCircle | DataShop | | Link | | |
| Utrecht Python Datasets | DataShop | | Link | | |
| INFSCI OOP Studies | DataShop | | Link | | |
| Australian Institute Python Datasets | DataShop | | Link | | |
| E-learning Design Course Instances | DataShop | | Link | 2022 | |
| Robomission | GitHub | Block-based environments are today commonly used for introductory programming activities like those that are part of the Hour of Code... | Link | | |
| CodeBench | CodeBench | CodeBench is a Programming Online Judge developed by the Institute of Computing (IComp) of the Federal University of Amazonas, Brazil.... | Link | 2023 | |
| How Creatively Are We Teaching and Assessing Creativity in Computing Education: A Systematic Literature Review | Zenodo | | Link | 2021 | |
| METRECC Africa 2020 data | Apollo - University of Cambridge Repository | This file includes the responses from the 58 study participants to the survey questions on demographics, years of teaching experience,... | Link | 2021 | |
| FalconCode | FalconCode | FalconCode, a collection of over 1.5 million Python programs from over two thousand undergraduate students, captures over five semesters' worth of code samples from our introduction to computing course, which is taken by every student regardless of their academic major. | Link | | |
| IDE Action Log Dataset from a CS1 MOOC | Zenodo | This is a dataset containing Integrated Development Environment (IDE) logs from an introductory programming MOOC. The dataset contains information... | Link | 2017 | |
| The Conventional versus a constructionist-Scratch programming instructions and students achievements in higher education CS1 classes. | Mendeley Data | | Link | | |
| Dataset: Recursive problem solving in the online learning environment CodingBat by computer science students | DZHW | | Link | 2017 | |
| Programming steps working group at ITiCSE'22 | GitHub | The data is from an online introductory programming course using the Dart language. The students have varied backgrounds and study from... | Link | | |
| Concept Map for Cybersecurity Courses | GitLab | | Link | | |
| Cybersecurity Literature Review | Zenodo | This paper discusses trends and implications for further research in cybersecurity education. | Link | 2019 | |
| Distributed System Syllabi | Zenodo | The authors attempt to map 51 offerings of distributed systems courses from different schools to two popular curriculum initiatives | Link | 2020 | |
| Supplementary materials for the paper "Hyperstyle: A Tool for Assessing the Code Quality of Solutions to Programming Assignments" | Zenodo | | Link | 2021 | |
| Group Work in Learning Programming | DZHW | The research project "Digital Programming in Teams" (DiP-iT) investigates how collaborative learning in computer science studies can be didactically developed... | Link | 2020 | |
| Discovering Misconceptions in formal methods using ITS | OSF | In this data repository we store the data for the paper Discovering and quantifying misconceptions in formal methods using intelligent... | Link | 2022 | |
| CS1QA | GitHub | Repository for CS1QA: A Dataset for assisting Code-based Question Answering in an Introductory Programming Course, published at NAACL 2022 The... | Link | | |
| Artifacts of FSE-2017 paper on an Intelligent Tutoring System for Programming | GitHub | In our ESEC/FSE-17 paper titled A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments, we apply four... | Link | 2015 | |
| Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant | Zenodo | The programming exercises were automatically generated by the Digital Teaching Assistant (DTA) system that automates a massive Python programming course... | Link | 2022 | |
| Dataset for the evaluation of student-level outcomes of a primary school Computer Science curricular reform | Zenodo | Student learning and perception data from three studies with respectively 1384, 2433 and 1644 grade 3-6 students (ages 7-11) and... | Link | 2022 | |
| Unravelling the numerical and spatial underpinnings of computational thinking: a pre-registered replication study | OSF | | Link | 2022 | |

This dataset catalog is a compilation of open-source datasets in computing education, curated by the "Where is the data? Finding and reusing datasets in computing education" CompEd '23 working group. The working group aims to make research data more accessible and to encourage open data practices in the computing education research (CER) community. For more information, please refer to the working group's paper: Kiesler, Natalie, John Impagliazzo, Katarzyna Biernacka, Amanpreet Kapoor, Zain Kazmi, Sujeeth Goud Ramagoni, Aamod Sane, Keith Tran, Shubbhi Taneja, and Zihan Wu. "Where's the Data? Exploring Datasets in Computing Education." In Proceedings of the ACM Conference on Global Computing Education Vol 2, pp. 209-210. 2023.