Skip to Main Content

Data Structures

Computer Science Department

Cyber Crime Investigation – 100 course points

This assignment will take MORE time to complete than the previous assignment. 

Start your assignment early! You need time to understand the assignment and to answer the many questions that will arise as you read the description and the code provided.

Refer to our Programming Assignments FAQ for instructions on how to install VSCode, how to use the command line and how to submit your assignments.

  • See this video on how to import the project into VSCode and how to submit into Autolab.

The assignment has two components:

  1. Coding (97 points) submitted through Autolab.
  2. Reflection (3 points) submitted through a form.
    • Submit the reflection AFTER you have completed the coding component.
    • Be sure to sign in with your RU credentials! (netid@scarletmail.rutgers.edu)
    • You cannot resubmit reflections but you can edit your responses before the deadline by clicking the Google Form link, signing in with their netid, and selecting “Edit your response”

Overview

Cyber Security is a vital branch of computer science which handles our systems, applications, and information. From the FBI website:

As people around the world become more reliant on information and communication technologies (ICTs), criminals are increasingly shifting online. In 2020 alone, the FBI estimated more than $4 billion was lost to cybercrime in the United States.

With the rapid increase in cyber crimes happening all across the globe, it is important for the relevant authorities to track these incidents and monitor hackers. We can use a Hash Table to efficiently manage large quantities of data with add/remove/search operations.
 
We will use a separate chaining Hash Table to implement our Hacker Database. This uses an array of Linked Lists, where each node contains a hacker. Then, we will use a hash function to decide in which list (array index) each hacker will be placed. This allows us to keep each list small on average, providing good runtime.

Implementation

Overview of files provided

  • The Hacker class allows you to create instances of Hackers; it stores information like the main name of the Hacker, a list of known aliases, and a list of Incidents this Hacker was involved in. Do not edit or submit to Autolab.
  • The CyberCrimeInvestigation class contains methods that manipulate a Hash table in order to create a Hacker Database and analyze its information. Edit the empty methods with your solution, but DO NOT edit the provided ones or the method signatures of any method. This is the file you submit.
  • The HNode class represents a linked list node; it contains a Hacker as its data and a HNode “next” which represents the next node in the list. Getters and setters are provided. Do not edit or submit to Autolab.
  • The Driver class visualizes the Hacker Database, and provides options for reading from a file, viewing Hacker information, deleting Hackers, merging Hackers, and viewing Hackers by Location/Num of Incidents.
  • StdIn and StdOut are libraries that handle input and output. Do not edit these classes.
  • Multiple input files are included; they store different hacker information and their incidents. You are welcome to create your own input files, as they will not be submitted to Autolab

CyberCrimeInvestigation.java

  • DO NOT add new import statements.
  • DO NOT change any of the method’s signatures.

Methods to be implemented by you:

initializeTable() (provided)

This method first initializes the hackerDirectory Hash Table to length 10. It then reads lines from an input file and calls the following two methods to create a Hacker object and insert it into the hackerDirectory Hash Table.

You are given input files containing different Hacker information as well as their incident information. Each hacker has 7 lines, formatted as follows.

For IPs and URLs, we used a one-way hash called MD5 in order to de-identify URLs and IPs that could be malicious. We aren’t working with real IPs and URLs to avoid links to malicious sites.

  • Hacker Name
  • IP Address Hash
  • Location
  • OS
  • Web Server
  • Date
  • URL Hash

Do not update this method, it has been implemented for you. IMPLEMENT the first two methods (createSingleHacker and addHacker).

1. readSingleHacker()

This method reads a the information for single incident for a single hacker and creates the corresponding objects.

Remember that the order you will read items in is as follows, and each is contained on one line:

  • Hacker Name (string that may contain spaces, use StdIn.readLine() to read)
  • IP Address Hash (a string)
  • Location (a string that may contain spaces)
  • OS (a string that may contain spaces)
  • Web Server (a string that may contain spaces)
  • Date (a string YYYY-MM-DD)
  • URL Hash (a string)

You can assume StdIn.setFile() has already been called, and simply call StdIn.readLine() once for each above bullet.

Create an Incident object, using the OS, Web Server, Date, Location, IP Address Hash, and URL Hash. Then, create a Hacker object using the name, and Incident. Remember that hackers have an ArrayList of incidents. 

Return that Hacker object.

Test this using initializeTable AFTER you have implemented the addHacker method. Use the debugger in VS Code to verify that items are being read and created correctly on each iteration of this method.

EARLY SUBMISSION CONSISTS OF THIS METHOD AND addHacker. Make sure you submit BOTH methods for early submission; we will not make exceptions if you only implement this method.

2. addHacker(Hacker toAdd)

This method inserts a Hacker into the hash table stored at hackerDirectory.

You are given a Hacker object to insert, via the method parameters. This is the candidate that we will attempt to insert – no duplicates are allowed.

Calculate which Hash Table index this hacker belongs to with Math.abs(hacker.hashCode()) % tableLength (where tableLength is the length of the hackerDirectory array). 

Then, check the Linked List found at that Hash Table index:

  • If the list is empty, add that Hacker to the list. YOU MUST increment numHackers by 1 when a unique hacker is added, even when the list is empty, and you MUST rehash if the load factor is met even when inserting into an empty list.
  • If the hacker’s name already exists in that list, then add all of the toAdd Hacker’s incidents to the existing Hacker object you found.
    • You can use hacker.getIncidents().addAll( toAdd.getIncidents() ) to add the new Hacker’s incidents to the existing Hacker (replace “hacker” with the existing Hacker object you found).
  • If the hacker’s name is not found in the list at that index, add the hacker to the END of the list and increment numHackers by 1.

AFTER YOU INSERT, check if the number of hackers in the table (numHackers) is >= the length of hackerDirectory divided by 2. If it is, then call the resize() method (seen below). DO NOT do this if you did not insert a new HNode.

Early submission. Implement the first two methods for extra credit: createSingleHacker() and addHacker(). We will not make exceptions if you only implement one method. You do NOT need to implement resize() for early submission; we’ll only test early submission with tables that do not resize. 

This is the expected output in the driver when attempting to run hackerTest.in (which does NOT rehash).

IMPORTANT: If you see 20 indices OR Hacker7356 OR Hacker4774, you have an outdated hackerTest file and must redownload the input file from Autolab. 

You can view incidents by clicking the “View Incidents” button.


Here’s a text based representation:

Index 0
{Suspect: Hacker7658, Num Incidents: 1}
        {OS: Linux, Web Server: Apache, URL Hash: 1a320cc7e82d4e9dd0b251cda1e5149b, IP Hash: 8692d611f3c49198155ffaa6f2f4e8fe, Location: Indonesia}
Index 1
Index 2
Index 3
{Suspect: Hacker6313, Num Incidents: 2}
        {OS: Linux, Web Server: Apache, URL Hash: a6e869c4ae9a48eb1bc2b54fa03db660, IP Hash: 126e7450a80a4689d5977980cce05883, Location: France}
        {OS: Linux, Web Server: Apache, URL Hash: 775c0a468b31b4321922c66177f4f8b0, IP Hash: 126e7450a80a4689d5977980cce05883, Location: France}
Index 4
Index 5
{Suspect: Hacker8850, Num Incidents: 2}
        {OS: Linux, Web Server: Apache, URL Hash: b2aec8b43de6bfb640e8bc361f02e926, IP Hash: 157ef7f43422d9fae16234410796f823, Location: United States}
        {OS: Linux, Web Server: Apache, URL Hash: b61bd74247657f91c4540359d635f403, IP Hash: 157ef7f43422d9fae16234410796f823, Location: United States}
Index 6
{Suspect: Hacker6879, Num Incidents: 2}
        {OS: Linux, Web Server: Apache, URL Hash: 22d9f5148e5ff424df1897c502ae756f, IP Hash: dea0daf8749639f8618c64c7b092325c, Location: European Uni}
        {OS: Linux, Web Server: Apache, URL Hash: ed55ac2ae85cf97caac5c6ce30255149, IP Hash: ecc205bf2f3cc3b8ce8e48503bcf5c5d, Location: Italy}
Index 7
Index 8
Index 9

3. resize()

This method resizes the existing hackerDirectory Hash Table, to double the current length.
Create a temp HNode[] variable, and set it to the existing hackerDirectory array – this will be the old array that we will reference. Since each Hacker gets reinserted in addHacker(), numHackers will be reset to 0.
Then, set the hackerDirectory instance variable to a brand new HNode[] object, with its size as double the old hackerDirectory’s length.
Finally, loop through your old hackerDirectory (stored in your temp variable), and call your addHacker() method on each hacker to reinsert them into your larger table.

RESTART the driver if you already read hackerTest.in or another file.

The input files that rehash are large — it’s best to use the debugger to verify that items are being inserted correctly across multiple rehashes. If nothing is listed under an index, it’s empty.
This is the first 20 indices for hacker2.in – since this is large, we’re providing a text representation. Use “View Incidents” to see incidents and number of incidents.
Index 0
Index 1
Index 2
Index 3
Index 4
Index 5
Index 6
Index 7
Index 8
Index 9
Index 10
{Suspect: Hacker2626, Num Incidents: 6}
        {OS: Linux, Web Server: Apache, URL Hash: 172882f07ecdcd8d42de3a2f07fa9607, IP Hash: e817cc1d765b4920424d3a3b001452b5, Location: European Uni}
        {OS: Linux, Web Server: Apache, URL Hash: 459eaf92f4f8536a685be1f3736a51f6, IP Hash: 38c69655d13e9a5c8912cf40951f2154, Location: United States}
        {OS: Linux, Web Server: Apache, URL Hash: 617ab4c6bfb906fbacfbd7d40133a8e0, IP Hash: 72d70361f5c1fec38965d7931b4c3fd0, Location: United States}
        {OS: Linux, Web Server: Apache, URL Hash: 3415706925c4137b1e57b3dd6cd81c48, IP Hash: 4e06aab35ba391ed833d3e103592357f, Location: United States}
        {OS: Linux, Web Server: Apache, URL Hash: 1246b51b663cb1d86ebe39cb445fba9c, IP Hash: e7c4494a0b2ea0c8ce8148a297f6090a, Location: United States}
        {OS: Linux, Web Server: Apache, URL Hash: 840c3b2f6d4d918ee61f5998fbec54e2, IP Hash: 4e321ab435dc09055fc55a77831e4f97, Location: United States}
Index 11
{Suspect: Hacker4308, Num Incidents: 1}
        {OS: Linux, Web Server: Apache, URL Hash: 9d21a764728221f50ae565b452768934, IP Hash: f2d4ec8598283416a9b3b1249ad81999, Location: United States}
Index 12
Index 13
Index 14
Index 15
Index 16
{Suspect: Hacker6789, Num Incidents: 2}
        {OS: Linux, Web Server: Apache, URL Hash: 22d9f5148e5ff424df1897c502ae756f, IP Hash: dea0daf8749639f8618c64c7b092325c, Location: European Uni}
        {OS: Linux, Web Server: Apache, URL Hash: ed55ac2ae85cf97caac5c6ce30255149, IP Hash: ecc205bf2f3cc3b8ce8e48503bcf5c5d, Location: Italy}
Index 17
{Suspect: Hacker9758, Num Incidents: 1}
        {OS: Linux, Web Server: Apache, URL Hash: 5cd45772fa424c4a634300336e165674, IP Hash: 4d3f24f4d89daa06ae478633d5dec18e, Location: Germany}
{Suspect: Hacker8812, Num Incidents: 1}
        {OS: FreeB, Web Server: Apache, URL Hash: f4471bb06fbd315019b5e007625df615, IP Hash: 88f2b0fca35de2076db48efd29856d41, Location: Sweden}
{Suspect: Hacker6139, Num Incidents: 1}
        {OS: Linux, Web Server: Apache, URL Hash: e129227e161872f12db500f89f7934f9, IP Hash: 96f5553c5e438a1978e3e761b54cbb9a, Location: United States}
Index 18
{Suspect: Hacker9658, Num Incidents: 1}
        {OS: Linux, Web Server: Apache, URL Hash: c90fc10b2ec03b6c8eea44d380a20ba4, IP Hash: 0a0f4995422e59de2096fbff37327c5d, Location: United States}
Index 19

4. search(String toSearch)

This method searches for a given Hacker name in the hackerDirectory, and returns that Hacker object if it is found. We will test this method separately, but this is a helper for other methods. 

If it is not found, return null.

You can determine which index the Hacker should be at with:

Math.abs(toSearch.hashCode()) % hackerDirectory.length

To test this method, you can:

– Use a JUnit test where you run initializeTable on a HackerDirectory object, call search afterward and then compare the result. 
– OR use the debugger when you call search in other methods. 

5. remove(String toRemove)

This method searches for a given Hacker name and removes that Hacker object (and corresponding HNode) from the hackerDirectory if it is found.

You should attempt to locate the specified hacker the same as in search(). This time, if you find them perform Singly Linked-List deletion on that HNode.

If you find and succesfully remove the Hacker, decrement numHackers by one. Return the Hacker object for the hacker removed, or if you don’t find the hacker to remove then return null.

To test this method, hit “Delete File” to delete the record we have of a hacker. 

You can check the output manually to see if your deletion succeeded – use your knowledge of singly linked list deletion to do so, and make sure you correctly handle cases of deleting the front, middle, and end. 

Alternatively, you can use a JUnit test to check the list. 

6. mergeHackers(String hacker1, String hacker2)

This method takes two Hacker names as parameters, and attempts to merge them into one Hacker. We would do this if we find out that both hackers are actually the same.

You can use your search() method to find both Hacker objects in hackerDirectory. If either is not found, return false.

If both hackers are found, attempt to merge the hacker with LESS incidents into the one with more. If they have the same amount of incidents, merge hacker2 into hacker1.

To merge two hackers, add all incidents from one hacker (Hacker A) into the hacker you wish to merge into (Hacker B). Then, add Hacker A’s name as an alias for Hacker B. Finally, remove Hacker A, using your remove method. Return true if successful.

 When merging Hacker4308 -> Hacker2626 in hacker2.in,  Hacker2626 stays, gains Hacker4308 as an alias and Hacker4308 is deleted. Here are the outputs and incidents.

RESTART the driver if you have merged or removed before.

 

- Date: 2012-06-25
  Location: European Uni
  IP Hash: e817cc1d765b4920424d3a3b001452b5
  OS: Linux
  Web Server: Apache
  URL Hash: 172882f07ecdcd8d42de3a2f07fa9607

- Date: 2012-06-26
  Location: United States
  IP Hash: 38c69655d13e9a5c8912cf40951f2154
  OS: Linux
  Web Server: Apache
  URL Hash: 459eaf92f4f8536a685be1f3736a51f6

- Date: 2012-06-26
  Location: United States
  IP Hash: 72d70361f5c1fec38965d7931b4c3fd0
  OS: Linux
  Web Server: Apache
  URL Hash: 617ab4c6bfb906fbacfbd7d40133a8e0

- Date: 2012-06-26
  Location: United States
  IP Hash: 4e06aab35ba391ed833d3e103592357f
  OS: Linux
  Web Server: Apache
  URL Hash: 3415706925c4137b1e57b3dd6cd81c48

- Date: 2012-06-26
  Location: United States
  IP Hash: e7c4494a0b2ea0c8ce8148a297f6090a
  OS: Linux
  Web Server: Apache
  URL Hash: 1246b51b663cb1d86ebe39cb445fba9c

- Date: 2012-06-26
  Location: United States
  IP Hash: 4e321ab435dc09055fc55a77831e4f97
  OS: Linux
  Web Server: Apache
  URL Hash: 840c3b2f6d4d918ee61f5998fbec54e2

- Date: 2012-06-26
  Location: United States
  IP Hash: f2d4ec8598283416a9b3b1249ad81999
  OS: Linux
  Web Server: Apache
  URL Hash: 9d21a764728221f50ae565b452768934
 

7. getNMostWanted(int n)

In this method, you’ll find the most wanted hackers in the hackerDirectory, and return them in an ArrayList. The number of Hackers in your list is determined by the parameter “n”. (i.e. return the Top N hackers, such as Top 10, Top 50, Top 100, etc…). Hacker ranking is determined by number of incidents, and ties are broken by a Hacker’s name (this is done for you in the Hacker compareTo() method).

To do this, we can use a Priority Queue, specifically a Max Heap. We provide a MaxPQ.java class for this purpose, which you can use.

Create a new MaxPQ object, which holds Hacker objects (MaxPQ uses generics, so just like with ArrayLists you’ll need to specify a type). USE the MaxPQ class from lecture that’s provided to you, DO NOT use other implementations or implement a priority queue from scratch.

Insert every Hacker in the hackerDirectory HashTable into your MaxPQ object.

Create an ArrayList to hold the top N hackers.

Then, call delMax() N times, and add each deleted hacker to your ArrayList. Since a MaxPQ orders largest to smallest, this will delete and return the top N hackers. You don’t need to handle comparisons – the compareTo method in the Hacker class compares by incidents. 

Finally, return your Most Wanted ArrayList. 

Here are the top ten most wanted hackers for hacker2.in. RESTART the driver if you have merged or removed.

8. getHackersByLocation(String location)

In this method, you’ll find all Hackers with a given location, and return them in an ArrayList. Create an ArrayList of Hackers, to store any matching Hackers. Iterate through all Hackers in the hackerDirectory, if their location matches the given location, then add them to your list. Finally, return your list of hackers for the given location. These are the hackers by location for the US in hacker2.in. Use the United States option in the location dropdown in the driver to sort by location – it’ll open up a popup.  RESTART the driver if you have merged or removed items.
Name: Hacker2626
Aliases: []
Num Incidents: 6
Name: Hacker4308
Aliases: []
Num Incidents: 1
Name: Hacker6139
Aliases: []
Num Incidents: 1
Name: Hacker9658
Aliases: []
Num Incidents: 1
Name: Hacker5793
Aliases: []
Num Incidents: 1
Name: Hacker9057
Aliases: []
Num Incidents: 3
Name: Hacker9051
Aliases: []
Num Incidents: 1
Name: Hacker3700
Aliases: []
Num Incidents: 1
Name: Hacker9399
Aliases: []
Num Incidents: 2
Name: Hacker4636
Aliases: []
Num Incidents: 2
Name: Hacker3996
Aliases: []
Num Incidents: 4
Name: Hacker2155
Aliases: []
Num Incidents: 1
Name: Hacker4685
Aliases: []
Num Incidents: 1
Name: Hacker6662
Aliases: []
Num Incidents: 1
Name: Hacker4923
Aliases: []
Num Incidents: 1
Name: Hacker7943
Aliases: []
Num Incidents: 1
Name: Hacker2241
Aliases: []
Num Incidents: 3
Name: Hacker3385
Aliases: []
Num Incidents: 1
Name: Hacker6548
Aliases: []
Num Incidents: 1
Name: Hacker6746
Aliases: []
Num Incidents: 3
Name: Hacker8526
Aliases: []
Num Incidents: 1
Name: Hacker8574
Aliases: []
Num Incidents: 1
 

Implementation Notes

  • YOU MAY only update the methods with the WRITE YOUR CODE HERE line. 
  • COMMENT all print statements you have written from CyberCrimeInvestigation.java
  • DO NOT add any instance variables to the CyberCrimeInvestigation class.
  • DO NOT add any public methods to the CyberCrimeInvestigation class.
  • DO NOT add/rename the project or package statements.
  • DO NOT change the class CyberCrimeInvestigation name.
  • YOU MAY add private methods to the CyberCrimeInvestigation class.
  • YOU MAY use any of the libraries provided in the zip file.
  • DO NOT use System.exit()

VSCode Extensions

You can install VSCode extension packs for Java. Take a look at this tutorial. We suggest:

Importing VSCode Project

  1. Download CyberCrimeInvestigation.zip from Autolab Attachments.
  2. Unzip the file by double clicking.
  3. Open VSCode
    • Import the folder to a workspace through File > Open

Executing and Debugging

  • You can run your program through VSCode or you can use the Terminal to compile and execute. We suggest running through VSCode because it will give you the option to debug.
  • How to debug your code
  • If you choose the Terminal:
    • first navigate to CyberCrimeInvestigation directory/folder
      • to compile:  javac -d bin src/investigation/*.java
      • to execute: java -cp bin investigation.Driver

Before submission

COMMENT all printing statements you have written from CyberCrimeInvestigation.java

Collaboration policy. Read our collaboration policy here.

Submitting the assignment. Submit CyberCrimeInvestigation.java separately via the web submission system called Autolab. To do this, click the Assignments link from the course website; click the Submit link for that assignment.

Early submission. Implement the first two methods for extra credit: createSingleHacker() and addHacker(). We will not make exceptions if you only implement one method.

Getting help

If anything is unclear, don’t hesitate to drop by office hours or post a question on Piazza.

  • Find instructors office hours here  
  • Find tutors office hours on Canvas -> Tutoring
  • Find head TAs office hours here 
  • In addition to office hours we have the CSL in Hill 252, a community space staffed with community managers who are undergraduate students further along the CS major to answer questions.
 

By Colin Sullivan

Assignment Concept by Navya Sharma and Pooja Kedia