As the use of digital resources continues to expand in education, an unprecedented amount of new data is becoming available to educational researchers and practitioners. Among these new data sources, unstructured data such as text represents a significant share. This introductory course on text mining is designed to prepare education researchers and practitioners to use this data more efficiently, effectively, and ethically. This course will provide students with an overview of text mining as an analytic approach, examples of its use in educational contexts, and applied experience with widely adopted tools and techniques. As participants gain experience in the collection, analysis, and reporting of data throughout the course, they will be better prepared to help educational organizations understand and improve both online and blended learning environments.
Course Prerequisites/Co-requisites: This course is part of the Graduate Certificate in Learning Analytics (GCLA) program and is open to all Masters and Doctoral students. This course has no prerequisites, but ECI 586: Introduction to Learning Analytics and/or prior experience with R is highly recommended. For those new to R and RStudio, however, tutorials will be provided.
Number of Credits: 3
Meeting Time: This distance education course is predominantly asynchronous. Online tools are utilized throughout the course for communication and interaction. In addition, we will use Zoom for synchronous virtual office hours, web conferencing, or whole class discussions. For optional live class meet-ups, I will send out a poll the first week of class to find a time that works for the majority of students and record these meetings for students who are not able to attend.
Virtual Class Locations: All course materials and activities can be accessed online through NC State’s Moodle course management platform. Access http://wolfware.ncsu.edu/ and log in with your Unity ID and password. After logging in, locate and click on ECI 588 Text Mining in Education to access the course site.
Students must have Internet access and access to a Web browser (e.g., Safari, Firefox, Chrome) to participate in this course. The Moodle course site and Web-based software required for completing course projects may only be accessed online. It is strongly recommended that students have high-speed Internet access.
Name: Dr. Shaun Kellogg
Email: shaun.kellogg\@ncsu.edu
Office: Friday Institute for Educational Innovation (Room 223)
Phone: (919) 513-8563
Hours: Appointments by Calendly Monday-Friday 8:00-4:00
Social: LinkedIn | GitHub
There are several required textbooks for this course, all of which are freely available online or through the NCSU Library. Supplemental course readings and content (e.g. articles, videos) will also be provided at no cost through the Moodle course site. You will also be asked to locate articles of interest for our discussions and I highly recommend that you link Google Scholar to the NCSU Library: https://www.lib.ncsu.edu/articles/google-scholar.
Students should feel comfortable installing new software programs and navigating unfamiliar graphical user interfaces. It is also recommended that students in this class have some background knowledge of online learning environments (e.g. LMS, MOOCs, etc.).
Hands-on data analysis tutorials and experiences will make extensive use of R and RStudio. You will access Learning Analytics Case Study activities through Posit Cloud and will complete R tutorials to assist with these activities through DataCamp.
Posit Cloud (https://posit.co/products/cloud/cloud/) provides access to Posit’s powerful set of data science tools, including:
RStudio (https://posit.co/products/open-source/rstudio), an integrated development environment (IDE) for R and Python that includes a console and syntax-highlighting editor, as well as tools for plotting, history, debugging, and workspace management.
Quarto (https://quarto.org), an open-source scientific and technical publishing system used for creating reproducible, production quality articles, presentations, dashboards, websites, blogs, and books in HTML, PDF, MS Word, ePub, and more.
Register for a free Posit Cloud account at: https://login.posit.cloud/register.
Following registration, you can access our ECI 588: Text Mining in Education Posit workspace at: https://posit.cloud/spaces/597706/join?access_code=weMxPiHe3BsKRsQOtUOnlD9NwbNG3L1G7NsuBanV
Quarto Pub (https://quartopub.com) is a free and easy web publishing platform for Quarto docs. To publish the documents via Quarto Pub, however, you will first need to create an account.
DataCamp (https://www.datacamp.com) is an online learning platform with a large catalog of video tutorials, coding activities, assessments and certifications for R, Python, data science, statistics & more.
Register for a free account using your \@ncsu.edu email address at: https://www.datacamp.com/users/sign_up
After registering with your NC State email, click the following link to access our ECI 588: Text Mining in Education DataCamp group.
While we will be using RStudio through Posit Cloud for this course, I recommend eventually shifting to RStudio Desktop if you plan to use R and RStudio beyond this course.
R (https://www.r-project.org) is an open-source language and computing environment for data manipulation, analysis, and visualization. Installation files for Windows, Mac, and Linux can be found at the website for the Comprehensive R Archive Network: http://cran.r-project.org/.
RStudio Desktop (https://posit.co/products/open-source/rstudio) is a free desktop application of the RStudio IDE for R that runs on Windows, macOS, or Linux. Its advantages over Posit Cloud include offline access, full control over the local environment, customization options, data privacy and security, and the ability to leverage the full resources of your local machine, such as CPU, memory, and storage.
RPubs (https://rpubs.com) is a free and easy web publishing platform for R and an alternative to Quarto Pub. To publish documents via RPubs, however, you will first need to create an account here: https://rpubs.com/users/new.
Posit Recipes (https://posit.cloud/learn/recipes) provide a collection of R code snippets and instructions featuring up-to-date best practices for coding in R.
Posit Cheat Sheets (https://posit.co/resources/cheatsheets) also provide handy printable reference sheets to commonly used packages and their essential functions, including example code for testing them out.
LinkedIn Learning (https://www.linkedin.com/learning) offers tutorials and training courses on R, RStudio, and more. LinkedIn Learning is available at no charge to students.
ChatGPT (https://openai.com/chatgpt/) is an advanced AI language model developed by OpenAI and is highly recommended for this course. It’s an excellent tool for learning analytics and can assist by explaining concepts, helping with code, offering analysis guidance, and troubleshooting, though it’s important to verify its advice against reliable sources.
Use with or without registering for an account at: https://chatgpt.com
Note: Please review the acceptable use policy below for appropriate uses of generative AI tools like ChatGPT.
Git is a free and open source distributed version control system. Jenny Bryan’s very thorough installation and RStudio setup process for Mac and Windows can be found here: http://happygitwithr.com.
GitHub is a web-based hosting service for version control using Git. You can create an account here: https://github.com
Goals for the Text Mining in Education course are guided by the North Carolina State University motto: Think and Do. Specifically, goals for this course are twofold:
Disciplinary Knowledge. Students will deepen their understanding of text mining as an emerging approach within the field of Learning Analytics, including its application in a wide range of education settings.
Technical Skills. Scholars will develop proficiency with the processes, tools, and techniques necessary to efficiently, effectively, and ethically apply text mining to understand and improve learning and the contexts in which learning occurs.
The following learning objectives are aligned with the overarching learning objectives of the Graduate Certificate in Learning Analytics program and are embedded in each unit of the course. Students who complete this course will be able to:
Conceptual Foundations: Describe key text mining concepts, terminology, and legal and ethical issues, and explain how text mining has been applied to address important problems, questions, and issues in education;
Data Sources & Measures: Identify and appropriately use education data sources for text analysis (e.g. student essays, online discussion forums, etc.) and associated measures (e.g. term frequency, tf-idf, etc.);
Tool Proficiency: Efficiently and effectively apply up-to-date software and tools (e.g., R and Quarto) to implement text mining workflows for preparing, analyzing, and sharing data;
Processes & Techniques: Understand and apply text mining approaches and techniques (e.g. text preprocessing, sentiment analysis, topic modeling) in order to understand and improve learning and the contexts in which learning occurs; and,
Communication: Clearly communicate methods, analyses, findings, and recommendations that can provide actionable insight into learning contexts for a range of education stakeholders.
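To make the measures named in the objectives above a little more concrete, here is a minimal sketch of term frequency and tf-idf. The course itself works in R; Python is used here only to illustrate the arithmetic. The three "posts", the whitespace tokenizer, and all function names are hypothetical, and the natural-log form of inverse document frequency is one common convention rather than the only one.

```python
import math
from collections import Counter

# Hypothetical toy corpus: three short "forum posts" invented for illustration.
docs = {
    "post_1": "the quiz was hard but the feedback was helpful",
    "post_2": "the video lecture was helpful",
    "post_3": "group work in the forum was fun",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def term_frequency(tokens):
    """Share of a document's tokens accounted for by each term."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequency(tokenized_docs):
    """ln(number of documents / number of documents containing the term)."""
    n_docs = len(tokenized_docs)
    doc_counts = Counter(
        term for tokens in tokenized_docs.values() for term in set(tokens)
    )
    return {term: math.log(n_docs / df) for term, df in doc_counts.items()}


tokenized = {name: tokenize(text) for name, text in docs.items()}
idf = inverse_document_frequency(tokenized)

# tf-idf = term frequency within a document * inverse document frequency.
tf_idf = {
    name: {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
    for name, tokens in tokenized.items()
}

# Print the three highest-weighted terms in the first post.
for term, weight in sorted(tf_idf["post_1"].items(), key=lambda kv: -kv[1])[:3]:
    print(term, round(weight, 3))
```

Words that appear in every document (such as "the" here) receive a tf-idf of zero, which is why the measure is useful for surfacing terms that distinguish one document from the rest.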
This course is divided into four Units introducing common and progressively advanced text mining techniques, including their applications in educational contexts. Each unit lasts three weeks. The first week introduces terminology, core concepts, and applications of Text Mining through course Readings & Discussion. The second week of each unit focuses on developing the technical skills necessary for exploratory analysis through a guided Case Study that illustrates a text mining analytic workflow in R using data from an educational context. During the third week, students in the GCLA program will apply these skills by conducting a simple, self-directed Independent Analysis while students new to R will further develop their fluency with text mining techniques through DataCamp R Tutorials. In lieu of a final exam, students will develop a Data Product (e.g. report, presentation, data dashboard, etc.) that demonstrates their ability to independently analyze text as data.
| Schedule | Topics |
|---|---|
| WELCOME | OVERVIEW & INTRODUCTIONS |
| Week 1 | Introductions, syllabus review, and software setup |
| UNIT 1 | TEXT AS DATA |
| Week 2 | Readings & Discussion: Introduction to text mining data sources, basic concepts and applications in educational contexts. |
| Week 3 | Case Study: An introduction to “tidy” text, performing basic word counts, and examining words unique to specific document groups. |
| Week 4 | Independent Analysis (GCLA): A demonstration of your ability to tokenize text, perform basic analyses, and summarize findings. R Tutorials (R Beginners): DataCamp tutorials on text mining basics like preprocessing text and producing word counts. |
| UNIT 2 | DICTIONARY-BASED METHODS |
| Week 5 | Readings & Discussion: Introduction to dictionary-based methods and sentiment analysis, including applications in education contexts. |
| Week 6 | Case Study: Introduction to dictionary-based methods in R with an emphasis on sentiment analysis using three common lexicons. |
| Week 7 | Independent Analysis (GCLA): A demonstration of your ability to conduct a basic sentiment analysis and summarize findings. DataCamp Tutorials (R Beginners): Completion of DataCamp tutorials on sentiment analysis. |
| UNIT 3 | TOPIC MODELING |
| Week 8 | Readings & Discussion: Introduction to “bag-of-words” and topic modeling techniques, including applications in educational contexts. |
| Week 9 | Case Study: An introduction to Topic Modeling with R, an automated approach to classifying texts or documents. |
| Week 10 | Independent Analysis (GCLA): A demonstration of your ability to conduct basic topic modeling and summarize findings. DataCamp Tutorials (R Beginners): Completion of DataCamp tutorials on topic modeling. |
| UNIT 4 | TEXT AS NETWORKS |
| Week 11 | Readings & Discussion: Introduction to analyzing text from a network perspective, including applications in education research. |
| Week 12 | Case Study: An introduction to text networks and epistemic network analysis with R. |
| Week 13 | Independent Analysis (GCLA): A demonstration of your ability to analyze text as networks and summarize findings. DataCamp Tutorials (R Beginners): Completion of DataCamp tutorials on text networks. |
| FINAL EXAM | PROJECT PLANNING & DATA PRODUCT |
| Week 14 | Project Planning: Planning week to identify a text-based data set for analysis and conceptualize how you will analyze and share findings from your analysis. |
| Week 15 | Data Product: In lieu of a final exam, students will develop a data product (e.g. report, presentation, data dashboard, etc.) that demonstrates their ability to independently analyze text as data. |
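As a rough preview of the ideas behind Units 1 and 2 in the schedule above (one token per row, word counts, and dictionary-based sentiment), here is a small sketch. The case studies themselves are done in R; Python is used here only for illustration. The responses, stop-word list, and four-word lexicon are all hypothetical and far smaller than the established lexicons used in practice.

```python
from collections import Counter

# Hypothetical responses, stop words, and lexicon, invented for illustration only;
# a real analysis would use established stop-word lists and sentiment lexicons.
responses = [
    "I love the new science lessons",
    "The lessons are confusing and the pacing is bad",
    "Great videos and great examples",
]
STOP_WORDS = {"i", "the", "are", "and", "is", "new"}
LEXICON = {"love": 1, "great": 1, "confusing": -1, "bad": -1}


def tidy_tokens(texts):
    """Return one (document_id, word) pair per token: the one-token-per-row idea."""
    rows = []
    for doc_id, text in enumerate(texts, start=1):
        for word in text.lower().split():
            rows.append((doc_id, word))
    return rows


rows = tidy_tokens(responses)

# Word counts across all documents after dropping stop words.
word_counts = Counter(word for _, word in rows if word not in STOP_WORDS)

# Dictionary-based sentiment: sum lexicon scores per document; unknown words score 0.
sentiment = Counter()
for doc_id, word in rows:
    sentiment[doc_id] += LEXICON.get(word, 0)

print(word_counts.most_common(3))
print(dict(sentiment))
```

Because every word is scored independently, a sketch like this ignores negation and context ("not great" would still count as positive), which is one of the limitations of dictionary-based methods.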
Housekeeping (6 pts): Students will review the syllabus, respond to the welcome announcement sent at the beginning of the course, post a brief introduction of themselves, and respond to their peers. Students will also be required to access the required course texts and software. Students who have not completed ECI 586: Intro to Learning Analytics will also be required to complete some introductory R tutorials via DataCamp.
Reading & Discussion (24 pts): The first week of each unit introduces terminology, core concepts, and applications of an analytical approach through readings, course videos, and discussion. To help guide discussions, students are provided a set of essential questions to address and are also encouraged to explore their own areas of interest. The primary goal of course readings and discussion is to foster a deeper understanding of how Learning Analytics has been applied in educational contexts.
R Tutorials (24 pts): The second week of each unit consists of tutorials for working with R packages and functions used to import, wrangle, explore, and model data. The primary goal of these tutorials is to support familiarity and fluency with R syntax and key functions for data analysis.
Case Studies (24 pts): In the third week of each unit, students will complete an interactive “case study” demonstrating how key data-intensive research workflow processes (i.e., wrangling, visualizing, summarizing, modeling, and communicating data) featured in exemplary education research studies are implemented in R. Coding case studies also provide a holistic setting to explore important foundational LA topics integral to data analysis such as reproducible research, use of APIs, ethical considerations, diversity and inclusion, and creation of useful data products.
Final Project (22 pts): In lieu of a final exam, students will conduct an independent analysis using a data source of their choosing and create a “data product” (e.g. report, presentation, data dashboard, etc.) demonstrating the knowledge and skills gained throughout the semester.
Grading Scale: The grading scale is based on 100 points:
A+ (97-100), A (94-96), A- (90-93), B+ (87-89), B (84-86), B- (80-83)
C+ (77-79), C (74-76), C- (70-73), D+ (67-69), D (64-66), D- (60-63), F (59 or less)
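The point values and grading scale above can be read as a simple lookup. The sketch below, in Python purely for illustration, encodes them; the function name and the example scores are hypothetical, and treating a fractional total as falling into the lower band is an assumption, since the scale only lists whole points.

```python
# Points per graded component, as listed above.
COMPONENT_POINTS = {
    "Housekeeping": 6,
    "Reading & Discussion": 24,
    "R Tutorials": 24,
    "Case Studies": 24,
    "Final Project": 22,
}

# Lower bound of each letter grade, taken from the grading scale above.
GRADE_FLOORS = [
    (97, "A+"), (94, "A"), (90, "A-"),
    (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"),
    (67, "D+"), (64, "D"), (60, "D-"),
]


def letter_grade(points):
    """Map a point total to a letter grade.

    Assumption: a fractional total (e.g. 93.5) falls into the lower band,
    because the scale only specifies whole-point ranges.
    """
    for floor, letter in GRADE_FLOORS:
        if points >= floor:
            return letter
    return "F"


# Hypothetical scores for one student, invented for illustration.
earned = {
    "Housekeeping": 6,
    "Reading & Discussion": 21,
    "R Tutorials": 24,
    "Case Studies": 20,
    "Final Project": 19,
}

total = sum(earned.values())
print(total, letter_grade(total))
```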
Late work is accepted but may be penalized at 15% per week it is late. Assignments submitted by the due date, however, may be revised and resubmitted for a higher grade by the following week. Students experiencing unforeseen circumstances with a resulting excused absence (e.g., family medical emergency) are allowed to make up work without penalty.
Course Feedback Expectations: Please contact your instructor via email (shaun.kellogg\@ncsu.edu) with any questions about the course project or other assignments. Your instructor will strive to answer any emails within 24 hours (M-F) and 48 hours on the weekend, and grade submitted assignments within 5-7 days of the due date. In addition, students will be provided ongoing opportunities, and are strongly encouraged, to provide course feedback to help improve the design of current and future course implementations.
Learning new research methods and especially learning a programming language like R (or any new language for that matter) will inevitably be a bit frustrating at first. Even experienced R developers like Hadley Wickham get frustrated:
“It’s easy when you start out programming to get really frustrated and think, ‘Oh it’s me, I’m really stupid,’ or, ‘I’m not made out to program.’ But, that is absolutely not the case. Everyone gets frustrated. I still get frustrated occasionally when writing R code. It’s just a natural part of programming. So, it happens to everyone and gets less and less over time. Don’t blame yourself. Just take a break, do something fun, and then come back and try again later.”
When feeling stuck or like banging your head against your desk, there are several options for seeking out help within and beyond this course:
Course Forums & Email: Including this general software troubleshooting forum, we will have forums for each assignment. You’ll likely have similar questions as your peers, and you’ll likely be able to answer other people’s questions too, so I encourage you to use these forums. Unlike for most of my apps and social media accounts, I actually have notifications enabled for these forums. Also, do not hesitate to email me directly.
Learning Analytics Office Hours: Some issues are much easier to troubleshoot in real time. To schedule a one-on-one meeting with me in person or via Zoom, use the following Calendly link: https://calendly.com/sbkellogg/analytics
ChatGPT (https://openai.com/chatgpt/) is an advanced AI language model developed by OpenAI, and students are encouraged to explore it as a learning tool for this course. It’s an excellent tool for helping to explain concepts; interpreting, writing, and troubleshooting code; summarizing and drafting text; and offering guidance on data analysis. However, it’s important to recognize that it can frequently be wrong and produce very plausible but misleading or inaccurate results. See policies on using generative AI for this course in the next section.
NCSU Library Services: The Data & Visualization group is an incredible asset for NC State students. Through the library’s website, you can access and enroll in workshops, find resources, and chat or schedule a Zoom appointment to get R help.
Social Media: If you use Twitter, you can also post R-related questions and content with the #rstats hashtag. One of the things I most value about R in general is that the R community is exceptionally helpful.
The Interwebs: Aside from Google of course, StackOverflow and the RStudio Community will likely become tried and true tools in your text mining toolkit. In fact, I’d wager that the majority of Google searches will direct you to one of these two sites. Note that when searching Google, it sometimes helps to include “rstats” in your query.
ChatGPT and other generative artificial intelligence (GenAI) tools have been growing rapidly. ChatGPT, and most instances of GenAI, are large language models (LLMs) that have been trained on billions or trillions of pages of information (articles, books, or parts of the Internet) and are very good at guessing a likely logical answer to any question or text-based prompt. However, they are not truly “thinking” as they give each answer. Like a precocious toddler, GenAI can mimic words but has no clue what it is actually saying. Even worse, the system, like toddlers, may outright invent (“hallucinate”) facts, names, quotes, and book/journal/article titles and sound confident as it does.
In this course, you are welcome to collaborate with GenAI systems on projects or assignments, but remember that the primary goal is to enhance your understanding of course materials and concepts. It is crucial to consider the ethical implications of AI technologies and avoid overreliance on AI to complete tasks without actively engaging with the course content. Ensure that your use of AI aligns with NC State’s ethical guidelines, respects privacy, avoids bias, and promotes fairness and transparency (see NC State Policies section below).
Below are the acceptable uses of AI in this course:
Course Readings & Discussion: You are encouraged to leverage AI-based tools and platforms to enhance your understanding of our course readings. These tools can provide explanations and examples of key concepts introduced in our readings, summaries of key points, and even recommendations for further readings or resources.
Case Studies: You are welcome to utilize GenAI tools to assist in generating, interpreting, and troubleshooting code and its output. However, it is essential to ensure that you understand the underlying code and interpret the results critically.
Independent Analyses & Final Project: You may use GenAI in support of independent analyses and your final course project. GenAI can be useful for brainstorming ideas, offering a different perspective or constructive feedback, organizing content ideas into a cohesive structure, and proofreading or summarizing content. However, it is important to maintain your creative input and provide proper attribution if AI-generated content is used in your work.
While I encourage the integration of GenAI in your learning experience, there are certain uses that are not permitted in this course:
Course Readings & Discussion: Please do NOT use GenAI tools to craft written responses to discussion prompts. Using GenAI to generate a response to each unit’s discussion questions will not help you think through the materials. Although the text it generates may sound plausible, there’s an official philosophical term for this kind of writing: bulls**t, i.e. “speech or text produced without concern for its truth” (Hicks, Humphries, and Slater 2024).
Case Studies: You are welcome to utilize GenAI tools to assist in generating, interpreting, and troubleshooting code and its output. However, using GenAI without ensuring that you understand the underlying code and without interpreting the results critically is detrimental to learning, especially if you just copy and paste directly from what it spits out. In addition, please do NOT use GenAI to generate responses to open-ended questions and prompts.
Independent Analyses & Final Project: While you may use GenAI for independent analyses and your final course project, AI tools are not encrypted or private. Do not enter proprietary data or personal information about study participants. Also, GenAI tools should NOT be used to create entire pieces of written content. They can be used for tasks such as brainstorming, drafting headlines, or assisting with code, but fully AI-generated content is prohibited at this time.
Academic Integrity: Students are bound by the academic integrity policy as stated in the code of student conduct. Therefore, students are required to uphold the university pledge of honor and exercise honesty in completing any assignment. See the website for a full explanation: http://www.ncsu.edu/policies/student_services/student_discipline/POL11.35.1.php
N.C. State University Policies, Regulations, and Rules (PRR): Students are responsible for reviewing the PRRs which pertain to their course rights and responsibilities. These include:
http://policies.ncsu.edu/policy/pol-04-25-05 (Equal Opportunity and Non-Discrimination Policy Statement),
http://oied.ncsu.edu/oied/policies.php (Office for Institutional Equity and Diversity),
http://policies.ncsu.edu/policy/pol-1135-01 (Code of Student Conduct), and
http://policies.ncsu.edu/regulation/reg-02-50-03 (Grades and Grade Point Average).
University Non-Discrimination Policies: It is the policy of the State of North Carolina to provide equality of opportunity in education and employment for all students and employees. Accordingly, the university does not practice or condone unlawful discrimination in any form against students, employees or applicants on the grounds of race, color, religion, creed, sex, national origin, age, disability, or veteran status. In addition, North Carolina State University regards discrimination based on sexual orientation to be inconsistent with its goal of providing a welcoming environment in which all its students, faculty, and staff may learn and work up to their full potential.
Disabilities: Reasonable accommodations will be made for students with verifiable disabilities. In order to take advantage of available accommodations, students must register with the Disability Resource Office at Holmes Hall, Suite 304, Campus Box 7509, 919-515-7653. For more information on NC State’s policy on working with students with disabilities, please see the Academic Accommodations for Students with Disabilities Regulation (REG02.20.01).
Keep Learning: https://dasa.ncsu.edu/students/keep-learning/
Protect the Pack FAQs: https://www.ncsu.edu/coronavirus/frequently-asked-question
NC State Protect the Pack Resources for Students: Resources for Students | Protect the Pack
NC State Keep Learning, tips for students opting to take courses remotely: Keep Learning Tips for Remote Learning
Introduction to Zoom for students: https://youtu.be/5LbPzzPbYEw
Learning with Moodle, a student’s guide to using Moodle: https://moodle-projects.wolfware.ncsu.edu/course/view.php?id=226
NC State Libraries Technology Lending Program
Bail, C. (2018). Strengths and weaknesses of text as data. Retrieved from https://cbail.github.io/textasdata/strengths-weaknesses/rmarkdown/Strengths_and_Weaknesses.html
Gupta, V., & Lehal, G. S. (2009). A Survey of Text Mining Techniques and Applications. Journal of Emerging Technologies in Web Intelligence, 1(1), 60–76. <doi:10.4304/jetwi.1.1.60-76>
Hotho, A., Nürnberger, A., & Paaß, G. (2005). A brief survey of text mining. LDV Forum.
JISC. (2008). Text mining: Briefing paper. Retrieved from http://www.jisc.ac.uk/publications/briefingpapers/2008/bptextminingv2.aspx
Kwartler, T. (2016). R tutorial: What is text mining? Retrieved from https://youtu.be/mmz95b1k0J0
Kwartler, T. (2016). Cleaning and preprocessing text. Retrieved from https://youtu.be/3putwMZpt1E
Prato, S. (2013). What is text mining? Retrieved from http://infospace.ischool.syr.edu/2013/04/23/what-is-text-mining/
Galyardt, A., Aleahmad, T., Fienberg, S., Junker, B., & Hargadon, S. (2009). Analysis of a Web-based Network of Educators, 1–31. Retrieved from http://www.stat.cmu.edu/tr/tr878/tr878.pdf
Abdous, M., & He, W. (2011). Using text mining to uncover students’ technology-related problems in live video streaming. British Journal of Educational Technology, 42(1), 40–49. <doi:10.1111/j.1467-8535.2009.00980.x>
Leong, C. K., Lee, Y. H., & Mak, W. K. (2012). Mining sentiments in SMS texts for teaching evaluation. Expert Systems with Applications, 39(3), 2584–2589. <doi:10.1016/j.eswa.2011.08.113>
Cheon, J., Lee, S., Smith, W., Song, J., & Kim, Y. (2013). The Determination of children’s knowledge of global lunar patterns from online essays using text mining analysis. Research in Science Education, 43(2), 667–686. <doi:10.1007/s11165-012-9282-5>
Anderson, T., Upton, L., Dron, J., Malone, J., & Poelhuber, B. (2015). Social Interaction in self-paced distance education. Open Praxis, 7(1), 7–23.
Ai, H., Sionti, M., Wang, Y.-C., & Rose, C. P. (2010). Finding transactive contributions in whole group classroom discussions. In Proceedings of the 2010 International Conference of the Learning Sciences (Vol. 1, pp. 976–983).
Ezen-Can, A., Kellogg, S., Boyer, K. E., & Booth, S. (2015). Unsupervised modeling for understanding MOOC discussion forums: A learning analytics approach. In LAK15: 5th International Conference on Learning Analytics & Knowledge
Rosé, C. P. (2014). Exploration of student attitudes in MOOCs. Retrieved March 18, 2015, from https://youtu.be/LFT5IhiPZB8
Wen, M., Yang, D., & Rosé, C. (2014). Sentiment analysis in MOOC discussion forums: What does it tell us? Proceedings of Educational Data Mining, (Edm), 1–8. Retrieved from http://www.cs.cmu.edu/~mwen/papers/edm2014-camera-ready.pdf
Gtech.edu Edu Research Group. (2011). Educational text mining
Straumsheim, C. (2016). Detecting more than plagiarism. Inside Higher Ed. Retrieved from https://www.insidehighered.com/news/2016/01/21/turnitin-expanding-beyond-plagiarism-detection-launches-revision-assistant
Turnitin. (2014). Turnitin Revision Assistant results from the classroom: Pilot study review.
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(03), 267–297. http://doi.org/10.1093/pan/mps028
Bail, C. (n.d.). Dictionary-based text analysis in R. Retrieved from https://cbail.github.io/SICSS_Dictionary-Based_Text_Analysis.html#dictionary-based-quantitative-text-analysis
Bail, C. (2017). SICSS 2017 - dictionary-based text analysis. Retrieved from https://youtu.be/4xv1ccEUleA
Gupta, S. (2018). Reasons to replace dictionary based text mining with machine learning techniques. Retrieved from https://hackernoon.com/reasons-to-replace-dictionary-based-text-mining-with-machine-learning-techniques-27537835e1bf
MonkeyLearn. (n.d.). Sentiment analysis: Nearly Everything you need to know. Retrieved February 3, 2019, from https://monkeylearn.com/sentiment-analysis
Berkowitz, R. (2017). Introduction to sentiment analysis. Retrieved from https://youtu.be/65RP29Jll80
Raval, S. (2019). Sentiment analysis - Data Lit #1. Retrieved from https://youtu.be/3Pzni2yfGUQ?t=92
LIWC. (n.d.). LIWC: How it works. Retrieved February 3, 2019, from https://liwc.wpengine.com/how-it-works/
Tausczik, Y. R., & Pennebaker, J. W. (2009). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54. http://doi.org/10.1177/0261927x09351676
Johar, I. (2016). LIWC Text Analysis Tutorial. Retrieved from https://youtu.be/lzuMukv8Ql4
Brett, M. R. (2012). Topic modeling: A basic introduction. Journal of digital humanities, 2(1), 2-1. https://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/
Knispelis, A. (2016). LDA Topic Models [Video]. YouTube. https://www.youtube.com/watch?v=3mHy4OSyRf0
The “Secret” Recipe for Topic Modeling Themes
Matthew Jockers’ blog post highlights the importance of preprocessing text and provides some very practical guidelines for topic modeling.
Probabilistic Topic Models
Article by David Blei that explains some of the basic concepts of topic modeling, including some underlying math and some great visuals.
Finding structure in xkcd comics with Latent Dirichlet Allocation
Quick intro and fun example of applying LDA to a favorite comic of mine.
The LDA Buffet is Now Open
Short, whimsical blog post by Matthew Jockers explaining LDA for English majors.
Topic Modeling and Figurative Language
Lisa M. Rhody explores the productive failure of topic modeling.
Choose your own! Google Scholar or Duck Duck Go
“Twitter Archeology” of Learning Analytics and Knowledge Conferences
Paper exploring the conference tweets through multiple methods, including topic modeling and descriptive, network, and hashtag analysis.
Computer-Assisted Reading and Discovery for Student Generated Text in Massive Open Online Courses
This paper introduces the Structural Topic Model with applications to self-reported students’ motivations, identifying discussion themes, and patterns of feedback in course evaluations.
Unsupervised Modeling for Understanding MOOC Discussion Forums
Paper exploring three different approaches to text classification: manual coding, LDA, and the k-medoids clustering algorithm.
Using a Learner-Topic Model for Mining Learner Interests in Open Learning Environments
A study that applies topic modeling to automatically discover learner interests in open learning environments.
Coming soon!