Data Mining CSC533 Study guides, Class notes & Summaries
Looking for the best study guides, study notes and summaries about Data Mining CSC533? On this page you'll find 8 study documents about Data Mining CSC533.
Hands-On Exercise 6-1: Outlier Detection with Titanic dataset
- Exam (elaborations) • 7 pages • 2024
In this hands-on exercise, you will learn:
• How to use quantiles to detect outliers in data (the Titanic training dataset)
Related DM Book Chapters/Sections: 
• Section 2.2.2 Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range
Related Hands-on Exercises: 
• Exercise 1-2 Apache Spark and Basic Statistics 
Finish the assignments shown below. Submit a Word document (...
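One common way to use quantiles for outlier detection is the interquartile range (IQR) rule: values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are flagged. The sketch below shows that rule in plain Python, assuming the conventional 1.5 multiplier (the exercise may specify a different threshold) and using made-up fare values rather than the actual Titanic data. In PySpark, `DataFrame.approxQuantile(col, [0.25, 0.75], relativeError)` is one way to obtain Q1 and Q3; its results can differ slightly from the interpolated quartiles computed here.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) outlier fences Q1 - k*IQR and Q3 + k*IQR."""
    # Quartiles with linear interpolation; "inclusive" treats the data as the whole population.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


# Made-up fare values for illustration only (not taken from the Titanic dataset).
fares = [7.25, 7.9, 8.05, 13.0, 26.0, 30.0, 53.1, 512.33]
print(iqr_bounds(fares))
print(find_outliers(fares))  # [512.33]
```

Raising `k` (for example to 3.0) flags only more extreme values, which is a common choice for heavily skewed columns such as fares.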
Hands-On Experiment 5-2: Clustering with Spark - Part II
- Exam (elaborations) • 5 pages • 2024
Hands-On Experiment 5-1: Clustering with Spark
- Exam (elaborations) • 4 pages • 2024
In this hands-on exercise, you will learn:
• How to use the k-means clustering algorithm in Apache Spark 
• How to handle data and features for clustering 
• Training and prediction for clustering 
• Evaluation for clustering 
Related DM Book Chapters/Sections: 
• Section 10.1 Cluster Analysis 
• Section 10.2 Partitioning Methods 
• Section 10.2.1 k-Means: A Centroid-Based Technique 
Submit a Word document (or PDF) with answers/expl...
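To make the k-means steps concrete, here is a minimal plain-Python sketch of the standard (Lloyd's) iteration: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, and repeat until the centroids stop changing. For simplicity it takes the first k points as the initial centroids; Spark ML's `KMeans` uses the k-means|| initialization by default, so its clusters can differ. The within-cluster sum of squared distances is included as a simple evaluation measure; note that Spark's `ClusteringEvaluator` reports the silhouette score by default. The points are made up.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, max_iter=20):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    # Simplification: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def wssse(points, centroids, labels):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum(math.dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
print(wssse(points, centroids, labels))
```

Training corresponds to `kmeans(...)` finding the centroids, and prediction corresponds to `nearest(...)` applied to new points with those fixed centroids.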
Hands-On Experiment 4-2: Classification with Titanic dataset
- Exam (elaborations) • 4 pages • 2024
2.2.1 (20pts) Assignment 1: Index the Gender values 
We have learned how to index values using StringIndexer in previous hands-on exercises.
• Write code to index the gender values
1. Import the class
2. Define an indexer
– Input column: Gender
– Output column: IndexedGender
3. Train (fit) the indexer and transform the data
• Take a screenshot of your code running and its output, using the show(5) function
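As a rough illustration of what the indexer computes, the plain-Python sketch below gives index 0.0 to the most frequent label, 1.0 to the next, and so on, breaking ties alphabetically. This mirrors the documented default ordering of Spark ML's `StringIndexer` (`stringOrderType="frequencyDesc"`); it is not Spark code, and the sample values are made up.

```python
from collections import Counter


def fit_string_indexer(values):
    """Map each distinct label to a float index, most frequent label first.

    Ties are broken alphabetically, mirroring the frequencyDesc ordering
    documented for Spark ML's StringIndexer.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform(values, mapping):
    """Replace every label with its index; unseen labels raise KeyError
    (comparable to Spark's default handleInvalid="error")."""
    return [mapping[v] for v in values]


# Made-up gender values for illustration.
genders = ["male", "female", "male", "male", "female"]
mapping = fit_string_indexer(genders)
print(mapping)                      # {'male': 0.0, 'female': 1.0}
print(transform(genders, mapping))  # [0.0, 1.0, 0.0, 0.0, 1.0]
```

In PySpark itself, the three steps correspond to `from pyspark.ml.feature import StringIndexer`, `indexer = StringIndexer(inputCol="Gender", outputCol="IndexedGender")`, and `indexer.fit(df).transform(df)`, where `df` is a hypothetical DataFrame holding the Titanic training data.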
3 Building a Model 
3.1 Training and T...
Hands-On Experiment 4-1: Classification with Spark
- Exam (elaborations) • 7 pages • 2024
In this hands-on exercise, you will learn:
• The decision tree classifier in Apache Spark
• How to handle data, features, and training & testing data 
• Training & Testing 
• Evaluation 
Related DM Book Chapters/Sections: 
• Section 8.1 Basic Concepts 
• Section 8.2 Decision Tree 
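The DataFrame-based Spark ML API is newer and much easier to use than the older RDD-based API, and it is now Spark's primary machine-learning API. However, some features are missing; for example, the DataFrame evaluator provides only a limited set of metrics. Th...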
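Decision tree induction (Section 8.2) chooses splits that make the child nodes purer, and Spark ML's `DecisionTreeClassifier` measures purity with Gini impurity by default (`impurity="gini"`). The plain-Python sketch below computes the Gini impurity of a node and the weighted impurity of a binary split; the survival labels are made up for illustration and are not Spark code.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum over classes of p_i squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Impurity of a binary split, weighting each child by its size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Made-up labels (1 = survived, 0 = did not survive), for illustration only.
parent = [1, 1, 1, 0, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0, 0, 0]
print(gini(parent))             # 0.46875
print(split_gini(left, right))  # 0.1875
```

The split lowers the impurity from 0.46875 to 0.1875; a tree learner evaluates many candidate splits and keeps the one with the largest reduction.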
Hands-On Experiment 3-2: Frequent Pattern Mining with Spark - Part II
- Exam (elaborations) • 4 pages • 2024
1.3 Create DataFrames 
You can create your DataFrames using 
Assignment 1 
1. Write Spark code to read the following data.
(a) Read only the following four tables, which will be used in this exercise:
i. orders 
ii. products 
iii. departments 
iv. order_products_train 
(b) Make sure that you read the header lines as well
i. Each CSV file of the dataset has a header line. 
ii. You can achieve this behavior by 
Assignment ...
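Frequent pattern mining needs one transaction (basket) per order, so the order-product rows have to be joined with the product names and grouped by order. The plain-Python sketch below does that on made-up rows; the column names `order_id`, `product_id`, and `product_name` are assumptions based on the public Instacart dataset, so check them against the headers of your files. Items are deduplicated because Spark ML's `FPGrowth` expects the items within each transaction to be unique.

```python
from collections import defaultdict


def build_baskets(order_products, products):
    """Return {order_id: sorted list of unique product names}, one basket per order."""
    names = {p["product_id"]: p["product_name"] for p in products}
    baskets = defaultdict(set)  # a set drops duplicate items within an order
    for row in order_products:
        baskets[row["order_id"]].add(names[row["product_id"]])
    return {order_id: sorted(items) for order_id, items in baskets.items()}


# Made-up rows; real rows would come from the products and order_products_train tables.
products = [
    {"product_id": "1", "product_name": "Banana"},
    {"product_id": "2", "product_name": "Milk"},
    {"product_id": "3", "product_name": "Bread"},
]
order_products = [
    {"order_id": "10", "product_id": "1"},
    {"order_id": "10", "product_id": "2"},
    {"order_id": "11", "product_id": "1"},
    {"order_id": "11", "product_id": "3"},
    {"order_id": "11", "product_id": "1"},
]
print(build_baskets(order_products, products))
# {'10': ['Banana', 'Milk'], '11': ['Banana', 'Bread']}
```

In PySpark, a similar result can be obtained with a join followed by `groupBy("order_id").agg(collect_set("product_name"))`, using `collect_set` from `pyspark.sql.functions`.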
Hands-On Experiment 3-1: Frequent Pattern Mining with Spark
- Exam (elaborations) • 6 pages • 2024
2.4 Let’s practice answering some exercise questions
Q1: List the 3 most frequent itemsets of size 1.
Q2: Given a minimum support of 30%, show the candidate itemsets of size 2 and their counts.
Q3: Which other product is most frequently purchased together with Colby?
Q4: What is the confidence of the rule American → Cheddar?
3 Submission: Find frequent patterns in a real-world grocery store dataset using FPGrowth
Please read the related news article “Kroger Knows Your Shopping Patterns B...
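The practice questions in 2.4 rely on two measures: the support of an itemset (the fraction of transactions that contain it) and the confidence of a rule A → B, which is support(A ∪ B) / support(A). The sketch below computes both by brute-force enumeration in plain Python (it is neither Apriori nor FPGrowth) on a few made-up cheese transactions that are not the exercise's data.

```python
from collections import Counter
from itertools import combinations


def itemset_counts(transactions, size):
    """Count every itemset of the given size by brute-force enumeration."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    return counts


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """confidence(A -> B) = support(A together with B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Made-up transactions, not the exercise's dataset.
transactions = [
    ["American", "Cheddar", "Colby"],
    ["American", "Cheddar"],
    ["Colby", "Swiss"],
    ["American", "Colby", "Swiss"],
]
pairs = itemset_counts(transactions, 2)
frequent_pairs = {s: c for s, c in pairs.items() if c / len(transactions) >= 0.3}
print(frequent_pairs)
print(confidence(transactions, ["American"], ["Cheddar"]))  # (2/4) / (3/4), about 0.667
```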
Hands-On Experiment 2-2: Data Warehousing with Hive
- Exam (elaborations) • 78 pages • 2024
Objectives 
In this hands-on exercise, you will:
1. Practice PySpark SQL for data analytics.
2. Use enhanced aggregation to emulate SQL concepts such as GROUPING SETS, ROLLUP, and CUBE in PySpark.
3. Analyze driver risk factors.
4. Analyze data using data warehousing/OLAP functions in Hive.
 
Q1. (35pts) Rewrite the grouping-set-query from the example using ROLLUP (call it the rollup-query). Run it, check the results, and explain the differences.
– Replace the GROUPING SETS ...
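To see how a ROLLUP query relates to a GROUPING SETS query, it helps to list the grouping sets each clause produces: ROLLUP over (a, b) is shorthand for the grouping sets (a, b), (a), and (), while CUBE over (a, b) produces all four subsets, including (b). The plain-Python sketch below expands both and sums a measure per grouping set, using None where SQL would output NULL for a rolled-up column. The rows and column names are made up, and the sketch assumes the grouping columns contain no real NULLs. PySpark DataFrames expose the same operations as `df.rollup(...)` and `df.cube(...)`.

```python
from collections import defaultdict
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(a, b) -> [(a, b), (a,), ()]: drop columns from the right."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(a, b) -> every subset of the grouping columns."""
    return [s for r in range(len(cols), -1, -1) for s in combinations(cols, r)]


def aggregate(rows, cols, grouping_sets, measure):
    """Sum `measure` per grouping set; columns outside a set become None (SQL NULL)."""
    totals = defaultdict(int)
    for gs in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gs else None for c in cols)
            totals[key] += row[measure]
    return dict(totals)


# Made-up rows and column names, for illustration only.
rows = [
    {"city": "Austin", "event": "speeding", "incidents": 3},
    {"city": "Austin", "event": "lane departure", "incidents": 1},
    {"city": "Dallas", "event": "speeding", "incidents": 2},
]
cols = ["city", "event"]
print(rollup_sets(cols))  # [('city', 'event'), ('city',), ()]
by_rollup = aggregate(rows, cols, rollup_sets(cols), "incidents")
by_cube = aggregate(rows, cols, cube_sets(cols), "incidents")
print(by_rollup[("Austin", None)], by_rollup[(None, None)])  # 4 6
print(by_cube[(None, "speeding")])  # 5: the per-event subtotal exists only in CUBE
```

Comparing the two results shows the difference directly: the ROLLUP output has per-city subtotals and a grand total, while the CUBE output additionally has per-event subtotals.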