SIT 384 Data Analytics for Cyber Security Assignment 2 Trimester 1 2018
Objectives
and machine learning process.
data in big size.
corporate security analyst.
purposes.
This assignment consists of a report worth 20 marks. Delays caused by a student’s own computer downtime will not be accepted as a valid reason for late submission without penalty. Students must plan their work to allow for both scheduled and unscheduled downtime.
Submission instructions: You must submit an electronic copy of all your assignment files via CloudDeakin. You must include your report, source code, any necessary data files and, optionally, a presentation file. Assignments will not be accepted through any other means of submission. Students should note that email and paper-based submissions will ordinarily be rejected.
Special requirements to prove the originality of your work: On-campus students (B and G) are required to demonstrate the execution of their classification programs in R to their tutor in Week 10. Cloud students are required to attach a 3-5 minute video presentation demonstrating how their R code is executed to derive the claimed results. The video should be uploaded to cloud storage (you can find out how to upload a video at https://video.deakin.edu.au/). Failure to do so will result in delayed assessment of your submission.
Late submissions: Submissions received after the due date are penalized at a rate of 5% of the full mark per day, with no exceptions. Submissions received more than 5 days late are penalized 100% of the full mark. Submissions close at 5:00 pm Australian Eastern Time (UTC+10) on the due date, and each subsequent penalty day likewise ends at 5:00 pm. Students outside Victoria should note that the normal time zone in Victoria is UTC+10. No extensions will be granted.
It is the student’s responsibility to ensure that they understand the submission instructions. If you have ANY difficulties, ask the lecturer or tutor for assistance before the submission date.
Overview
The popularity of social media networks, such as Twitter, has led to an increasing number of spamming activities. Researchers have employed various machine learning methods to detect Twitter spam. In this assignment, you are required to classify spam tweets using the provided datasets. The features have already been extracted and structured in JSON format. They fall into two groups, user profile-based features and tweet content-based features, as summarized in Table 1.
The provided training and testing datasets are listed in Table 2 and Table 3, respectively. In the testing data, the ratio of spam to non-spam is 1:1 in Dataset 1 and 1:19 in Dataset 2. In most previous work, testing datasets are nearly evenly balanced. In the real world, however, only around 5% of tweets on Twitter are spam, so testing Dataset 2 simulates the real-world scenario. You are required to classify spam tweets, evaluate the classifiers’ performance, and compare the outcomes on Dataset 1 and Dataset 2 by conducting experiments.
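The class ratios above can be made concrete with a little arithmetic. The worked example below is illustrative only: the "always non-spam" classifier is hypothetical, and the counts are per block of 20 test tweets, not figures taken from the provided data.

```latex
% Spam share implied by each testing set's spam:non-spam ratio
\text{Dataset 1: } \frac{1}{1+1} = 50\%, \qquad
\text{Dataset 2: } \frac{1}{1+19} = \frac{1}{20} = 5\%

% Hypothetical classifier that labels every tweet non-spam, on 20 Dataset 2 tweets
% (TP = 0, FN = 1, TN = 19, FP = 0)
\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN} = \frac{0+19}{20} = 95\%, \qquad
\text{Recall} = \frac{TP}{TP+FN} = \frac{0}{0+1} = 0
```

On Dataset 2, a high accuracy can therefore coexist with detecting no spam at all, which is one reason to report several metrics rather than accuracy alone.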
Twitter Spam Detection Workflow
Problem Statement
This is an individual assessment task. Each student is required to submit a report of approximately 2,000-2,500 words, along with exhibits supporting their findings on the provided spam and non-spam messages. This report should consist of:
classification
To demonstrate your achievement of these goals, you must write a report of at least 2,000 words (2,500 words maximum). Your report should consist of the following chapters:
executive summaries from http://unilearning.uow.edu.au/report/4bi1.html.)
algorithms), the features used for classification, the performance evaluation metrics (at least 5 evaluation metrics), a brief summary of your findings, and the organization of the rest of your report. (You may find hints on the features used for classification in the Twitter Developer Documentation: https://dev.twitter.com/overview/api)
specifically, your argument should explain why machine learning algorithms should be used rather than human readers. (Please read through the hints on this web page before writing this chapter: http://www.uq.edu.au/student-services/learning/literature-review.))
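As a hedged illustration of how performance evaluation metrics can be computed in R, the sketch below derives five commonly used metrics from actual and predicted labels using base R only. The function name `eval_metrics`, the label values `"spam"`/`"non-spam"`, and this particular choice of metrics are assumptions for illustration, not requirements of the assignment or part of the provided datasets.

```r
# Hypothetical helper (not part of the assignment materials): compute five
# common evaluation metrics, treating "spam" as the positive class.
eval_metrics <- function(actual, predicted, positive = "spam") {
  tp <- sum(predicted == positive & actual == positive)  # spam caught
  fp <- sum(predicted == positive & actual != positive)  # non-spam flagged as spam
  fn <- sum(predicted != positive & actual == positive)  # spam missed
  tn <- sum(predicted != positive & actual != positive)  # non-spam passed through
  # Precision is undefined when nothing is predicted as spam
  precision <- if (tp + fp > 0) tp / (tp + fp) else NA_real_
  recall <- tp / (tp + fn)  # also called true positive rate or detection rate
  c(accuracy  = (tp + tn) / length(actual),
    precision = precision,
    recall    = recall,
    f_measure = 2 * precision * recall / (precision + recall),
    fpr       = fp / (fp + tn))  # false positive rate
}

# Usage on made-up labels mimicking the 1:19 ratio of Dataset 2:
# a classifier that labels every tweet "non-spam".
actual    <- rep(c("spam", "non-spam"), times = c(5, 95))
predicted <- rep("non-spam", 100)
m <- eval_metrics(actual, predicted)
print(m)
```

Here accuracy comes out at 0.95 while recall is 0 and precision (and hence the F-measure) is undefined, so in practice you would call the function with the labels produced by each of your own classifiers on Dataset 1 and Dataset 2.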
| Criterion | Proficient (above 80%) | Average (60-79%) | Satisfactory (50-59%) | Below Expectation (below 50%) | Score |
|---|---|---|---|---|---|
| Scientific Writing in Introduction and Conclusion | Use appropriate language and genre to extend the knowledge of a range of audiences. | Use discipline-specific language and genres to address gaps of a self-selected audience. Apply the knowledge developed innovatively to a different context. | Use some discipline-specific language and the prescribed genre to demonstrate understanding from a stated perspective and for a specified audience. Apply the knowledge developed to different contexts. | Fail to demonstrate understanding for the lecturer/teacher as audience. Fail to apply the knowledge developed to a similar context. | Out of 4 marks |
| Literature Review | Collect and record self-determined information from self-selected sources, choosing or devising an appropriate methodology with self-structured guidelines; organize information using student-determined structures and management of processes; generate questions/aims/hypotheses based on the literature. | Collect and record self-determined information/data from self-selected sources, choosing an appropriate methodology based on structured guidelines; organize information/data using student-determined structures, and manage the processes, within the parameters set by | Collect and record required information/data from self-selected sources using one of several prescribed methodologies; organize information/data using recommended structures; manage self-determined processes with multiple possible pathways; respond to questions/tasks generated from a closed inquiry. | Fail to collect required information or data from the prescribed source; fail to organize information/data using | Out of 4 marks |
| Technical Demonstration | Provide fully explained screenshots with R script. Explain each step of the classification procedure and the performance results in detail. The entire demo is clear, correct and covers all findings. | Provide fully explained screenshots with R script. Explain each step of the classification procedure and the performance results. The entire demo is clear, but there are some mistakes. | Provide screenshots with R script. Explain each step of the classification procedure and the performance results. However, many parts of the demo are not clear enough and/or contain major flaws or mistakes. | No screenshots or explanations provided. | Out of 4 marks |
| Performance Evaluation | Evaluate information/data and the inquiry process rigorously based on the latest literature. | Evaluate information/data and the inquiry process comprehensively within the scope of the given literature. Reflect insightfully to renew others’ processes. Construct and use one testing data set and two training data sets. 4 classifiers work correctly. 4 evaluation metrics are applied to analyse the performance of the classifiers. | Evaluate information/data and reflect on the inquiry process based on the given literature. Use only one testing data set. Fewer than 4 classifiers work correctly. Fewer than 4 evaluation metrics are applied to analyse the performance of the classifiers. | Fail to evaluate information/data and to reflect on the inquiry process. Use one or no testing data set. Fewer than 2 classifiers work correctly. Fewer than 2 evaluation metrics are applied to analyse the performance of the classifiers. | Out of 4 marks |
| Reference | More than 10 bibliographic items (all academic papers, with at least 1 item per classifier and at least 1 item per evaluation metric) are correctly presented, and inline citations are correctly used. | More than 10 bibliographic items (most of them academic papers, with at least 1 item per classifier) are presented, but there are a few errors. Inline citations are used, but with a few errors. | More than 10 bibliographic items (most of them academic papers) are presented. Inline citations are often used incorrectly. | Fewer than 10 bibliographic items are presented, or there are more than 3 errors in the bibliographic list and inline citations. | Out of 4 marks |