Data Mining CSC533 Notes - Data Mining - CSC533

Hands-On Exercise 6-1: Outlier Detection with Titanic dataset
Exam (elaborations) • 7 pages • 2024

In this hands-on exercise, you will learn:
• How to use quantiles to detect the outliers in data (the Titanic Training dataset)
Related DM Book Chapters/Sections:
• Section 2.2.2 Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range
Related Hands-on Exercises:
• Exercise 1-2 Apache Spark and Basic Statistics
Finish the assignments shown below. Submit a Word document (...
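The quantile-based outlier detection named above can be sketched as follows. This is a minimal plain-Python illustration of the common 1.5 × IQR fence rule, not the Spark code the exercise asks for. The function name `iqr_outliers`, the multiplier `k`, and the sample values are assumptions made for illustration; the values are invented and are not taken from the Titanic dataset.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical fare-like values (illustrative only).
fares = [7.25, 8.05, 13.0, 26.0, 7.9, 10.5, 15.5, 512.0]
lower, upper, outliers = iqr_outliers(fares)
print(f"fences: [{lower:.2f}, {upper:.2f}] outliers: {outliers}")
```

Quartile definitions differ between tools, and Spark computes approximate quantiles on large data, so the fences from a Spark job may not match this sketch exactly.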

Hands-On Experiment 5-2: Clustering with Spark - Part II
Exam (elaborations) • 5 pages • 2024

Hands-On Experiment 5-1: Clustering with Spark
Exam (elaborations) • 4 pages • 2024

In this hands-on exercise, you will learn:
• How to use the k-means clustering algorithm in Apache Spark
• How to handle data and features for clustering
• Training and prediction for clustering
• Evaluation for clustering
Related DM Book Chapters/Sections:
• Section 10.1 Cluster Analysis
• Section 10.2 Partitioning Methods
• Section 10.2.1 k-Means: A Centroid-Based Technique
Submit a Word document (or PDF) with answers/expl...
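To make the train, predict, and evaluate steps listed above concrete, here is a small plain-Python sketch of k-means (Lloyd's algorithm). It does not use Spark. The initialization (the first `k` points), the sample points, and all function names are assumptions chosen for illustration, and the sum of squared errors at the end is only one common way to evaluate a clustering, not necessarily the metric the exercise uses.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))


def kmeans(points, k, iterations=10):
    # Naive initialization (an assumption): use the first k points as centroids.
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step
            clusters[nearest(p, centroids)].append(p)
        # Update step; an empty cluster keeps its previous centroid.
        centroids = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
    labels = [nearest(p, centroids) for p in points]  # "prediction"
    return centroids, labels


# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
# Evaluation: sum of squared distances from each point to its centroid.
sse = sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))
print(labels, round(sse, 4))
```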

Hands-On Experiment 4-2: Classification with Titanic dataset
Exam (elaborations) • 4 pages • 2024

2.2.1 (20pts) Assignment 1: Index the Gender values
We have learned how to index values using StringIndexer in previous hands-on exercises.
• Write code for indexing the gender values
  1. Import a class
  2. Define an indexer
     – Input column: Gender
     – Output column: IndexedGender
  3. Train and transform
• Take a screenshot of running your code and its output using the show(5) function
3 Building a Model
3.1 Training and T...
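As background for the indexing step in Assignment 1, the sketch below mimics in plain Python what a frequency-based string indexer does: the most frequent label receives index 0.0, the next 1.0, and so on. It is a conceptual illustration, not Spark's StringIndexer and not the assignment's solution. The helper `fit_indexer`, the alphabetical tie-break, and the sample values are assumptions.

```python
from collections import Counter


def fit_indexer(values):
    """Map each distinct label to an index, most frequent label first."""
    counts = Counter(values)
    # Sort by descending frequency; break ties alphabetically (an assumption).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


# Hypothetical Gender column values.
genders = ["male", "female", "male", "male", "female"]
mapping = fit_indexer(genders)               # "train"
indexed = [mapping[g] for g in genders]      # "transform"
print(mapping, indexed[:5])
```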

Hands-On Experiment 4-1: Classification with Spark
Exam (elaborations) • 7 pages • 2024

In this hands-on exercise, you will learn:
• Decision Tree classifier in Apache Spark
• How to handle data, features, and training & testing data
• Training & Testing
• Evaluation
Related DM Book Chapters/Sections:
• Section 8.1 Basic Concepts
• Section 8.2 Decision Tree
DataFrame-based Spark ML is new, much easier, and better. However, some features are missing. The evaluator for DataFrame provides limited metrics only. Th...

Hands-On Experiment 3-2: Frequent Pattern Mining with Spark - Part II
Exam (elaborations) • 4 pages • 2024

1.3 Create DataFrames
You can create your DataFrames using Assignment 1
1. Write Spark code to read the following data.
   (a) Only read the following four tables that will be used for this exercise:
       i. orders
       ii. products
       iii. departments
       iv. order_products_train
   (b) Make sure that you read the “headers” as well.
       i. Each CSV file of the dataset has a header line.
       ii. You can achieve this behavior by Assignment ...

Hands-On Experiment 3-1: Frequent Pattern Mining with Spark
Exam (elaborations) • 6 pages • 2024

2.4 Let’s try to practice answering some exercise questions
Q1: List the 3 most frequent itemsets of size 1.
Q2: Given support >= 30%, show the itemsets and the counts for candidate itemsets of size 2.
Q3: Colby is purchased most frequently with what other product?
Q4: What is the confidence for the rule: American → Cheddar
3 Submission: Find frequent patterns using FPGrowth from a real-world grocery store dataset
Please read the related news article “Kroger Knows Your Shopping Patterns B...
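The quantities the questions above ask for (support counts, frequent itemsets of size 2, and rule confidence) can be computed by hand on a toy example. The sketch below is plain Python rather than Spark's FPGrowth. The baskets are invented, so its numbers are not the answers to Q1 through Q4; the helper names are also assumptions. Confidence of A → B is taken as support(A ∪ B) / support(A).

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets (not the exercise's data).
transactions = [
    {"American", "Cheddar", "Colby"},
    {"American", "Cheddar"},
    {"Colby", "Swiss"},
    {"American", "Colby"},
]


def support_count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """confidence(A -> B) = support(A and B together) / support(A)."""
    ante = support_count(antecedent, transactions)
    both = support_count(antecedent | consequent, transactions)
    return both / ante if ante else 0.0


# Count every size-2 itemset, then keep those meeting a 30% minimum support.
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)
min_count = 0.30 * len(transactions)
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_count}

rule_conf = confidence({"American"}, {"Cheddar"}, transactions)
print(frequent_pairs, round(rule_conf, 3))
```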

Hands-On Experiment 2-2: Data Warehousing with Hive
Exam (elaborations) • 78 pages • 2024

Objectives
In this hands-on exercise, you will:
1. Practice PySpark SQL for data analytics.
2. Use enhanced aggregation to emulate SQL concepts like GROUPING SETS, ROLLUP, and CUBE in PySpark.
3. Analyze the driver risk factor.
4. Analyze data using Data Warehousing/OLAP functions in Hive.
Q1. (35pts) Modify/rewrite the grouping-set-query in the example with ROLLUP (let’s call it rollup-query). Run it, check the results, and explain the differences.
– Replace the GROUPING SETS ...
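For intuition about how GROUPING SETS, ROLLUP, and CUBE relate, the plain-Python sketch below expands ROLLUP and CUBE into explicit grouping sets and sums a measure over each. It does not use Hive or PySpark. The column names (`city`, `driver`, `miles`), the rows, and the helper names are all hypothetical. A `None` in a result key marks a dimension that was aggregated away, which assumes the data itself contains no `None` values.

```python
from collections import defaultdict
from itertools import combinations


def rollup_sets(dims):
    """ROLLUP(a, b) -> grouping sets (a, b), (a), ()."""
    return [tuple(dims[:i]) for i in range(len(dims), -1, -1)]


def cube_sets(dims):
    """CUBE(a, b) -> every subset of the dimensions."""
    return [c for n in range(len(dims), -1, -1) for c in combinations(dims, n)]


def aggregate(rows, dims, grouping_sets, measure):
    """Sum `measure` once per grouping set; None marks a rolled-up dimension."""
    totals = defaultdict(int)
    for cols in grouping_sets:
        for row in rows:
            key = tuple(row[d] if d in cols else None for d in dims)
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical rows (illustrative only).
rows = [
    {"city": "A", "driver": "d1", "miles": 10},
    {"city": "A", "driver": "d2", "miles": 20},
    {"city": "B", "driver": "d1", "miles": 5},
]
dims = ("city", "driver")
rollup = aggregate(rows, dims, rollup_sets(dims), "miles")
cube = aggregate(rows, dims, cube_sets(dims), "miles")
print(len(rollup), len(cube))
```

In this toy data the cube result contains the per-driver totals that the rollup result lacks, because ROLLUP only drops dimensions from right to left.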
