IS450

Text Mining and Language Processing

1 Credit, Term 2

Description

Given the dominance of text on the Internet, mining high-quality information from text is increasingly critical. The actionable knowledge extracted from text data supports a broad spectrum of areas, including business intelligence, information acquisition, behavior analysis and decision making. In this course, we will cover important topics in text mining, including document representation, text categorization and clustering, sentiment analysis, probabilistic topic models and text visualization. Text mining techniques adopt models from research areas such as statistics, NLP and linguistics. We will also focus on basic natural language processing techniques, language parsing and analysis, and evaluation techniques.

Requisites

Prerequisites: IS200/IS111/SMT111/CS101/COR-IS1704

Co-requisites: None

Anti-requisites: None

Attributes

Department: SCIS

Course Level: Undergraduate

Tracks: IS/T4BS: Business Analytics Track

Areas: Advanced Business Technology Major; Analytics Major; Business Options; Data Science and Analytics Electives; Econ Major Rel/Econ Options; Grad Req - Dig Tech/Data Ana (Intake 2024 onwards); IS Depth Electives; IT Solution Development Electives; Social Sciences/PLE Major-related; Technology & Entrepreneurship

Learning Outcomes

1. To understand the vector representation of documents and apply cosine similarity to measure document similarity
2. To understand TF and IDF weighting and gain hands-on experience with vector space models
3. To understand how the naïve Bayes classifier works for text classification
4. To gain some basic knowledge about other classification algorithms, including linear classifiers and neural networks
5. To apply APIs for text classification and document clustering
6. To understand why topic modeling is useful and apply the Gensim API to derive topics from a corpus
7. To understand the basic approaches to some typical problems in sentiment analysis and apply a supervised approach for sentiment polarity classification
8. To gain some basic understanding of natural language processing
9. To understand Information Extraction (IE), its techniques and its applications
10. To understand Named Entity Recognition (NER) and gain knowledge about the techniques for NER
11. To understand the definitions of accuracy, precision, recall and F-measure
12. To be aware of evaluation methods for text clustering
13. To understand advanced text analytics tasks like text summarization and question answering and apply techniques for such tasks
14. To apply deep learning models and LLMs for text processing and mining tasks
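As an illustration of outcomes 1 and 2 above, the following is a minimal sketch of TF-IDF weighting and cosine similarity in plain Python. It is not taken from the course materials: the function names `tf_idf_vectors` and `cosine`, the whitespace tokenizer, the raw-count TF with `log(N/df)` IDF variant, and the three sample documents are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document (raw TF times log(N/df))."""
    # Assumption: lowercase whitespace tokenization is good enough for a demo.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample corpus, for illustration only.
docs = [
    "text mining extracts knowledge from text",
    "topic models summarise a text corpus",
    "stock prices fell sharply today",
]
vecs = tf_idf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # the two documents share the term "text"
sim_02 = cosine(vecs[0], vecs[2])  # the two documents share no terms
print(sim_01, sim_02)
```

In this sketch the first two documents score above zero because they share a term, while the first and third score exactly zero. Library implementations typically use different tokenization, smoothing and normalization, so their numbers would differ from those produced here.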

Graduate Learning Outcomes

Disciplinary Knowledge, Critical thinking & problem solving, Innovation and enterprising skills, Collaboration and leadership, Communication, Self-directed learning

Competencies

Data Analytics, Business Innovation, Pattern Recognition Systems, Research, Text Analytics and Processing