Internet and Distributed Programming (IS52025A)

 Sebastian Danicic 


Contents

(Week 0) Introduction

Watch Introductory Course Video

Course Structure

The course consists of 10 lectures and 10 labs. The first seven weeks will concentrate on the course material. The last three weeks will concentrate on completion of the coursework.

date lab lecture
13 Jan - Lecture 1
20 Jan Lab 1 Lecture 2
27 Jan Lab 2 Lecture 3
3 Feb Lab 3 Lecture 4
10 Feb Lab 4 Lecture 5
17 Feb Reading Week
24 Feb Lab 5 Lecture 6
3 March Lab 6 Lecture 7
10 March Lab 7 Lecture 8
17 March Lab 8 Lecture 9
24 March Lab 9 Lecture 10
28 April Lab 10  

Assessment

The course is assessed by coursework (20%) and exam (80%).

Weekly Lab Exercises

There are 7 weekly lab exercises worth 7 marks each. Each coursework element must be demonstrated to a tutor. The deadline for demonstration of each lab exercise is the week after it is set.

Courseworks

There are two large courseworks. The first is worth 51 marks. It must be completed, uploaded and demonstrated to a tutor by April 28 2014. Coursework two is not assessed.

A zip file containing all the source code and a README file (instructions on how to run the program) should be uploaded to

IS52025A-assignments-2013-14/cwk12014/

Coursework

This course has two assignments. One is to implement an Internet chat room and the other is to implement a web crawler which creates a graph of a particular domain (not assessed).

Coursework one

You must implement an Internet chat room with the following components:

  1. A chat client.

  2. The Server. (10 extra marks if all chats are stored on a database)

  3. An Admin Client (10 marks extra. This should at least show all sent messages as they are happening in real time.)

This chat client has two windows, one for typing in messages and commands and another for receiving messages from users. There are the following commands:

You should make sure your client and server programs do not crash.

Some useful code done in class:

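class firstPart
{

	// A parsed chat command: a numeric type, a username and the message text.
	// Declared static so that the static method parseMessage can create one.
	static class Message
	{
		int messageType;
		String username;
		String message;

		Message(int i, String m, String u )
		{
			messageType=i;
			username=u;
			message=m;
		}

	}

	// Returns the first word of s, i.e. everything before the first space.
	static String fp (String s)
	{
		String result="";

		for (int i=0;i<s.length();i++)
		{
			if (s.charAt(i)==' ') return result;
			else result=result+s.charAt(i);

		}
		return result;
	}

	// Assumption: secondPart was not shown in class. This version returns
	// everything after the first space of s (empty if there is no space).
	static String secondPart (String s)
	{
		int i=s.indexOf(' ');
		if (i==-1) return "";
		return s.substring(i+1);
	}

	static Message parseMessage(String s)
	{
	    if (fp(s).equals("!join")) return new Message(0,"",secondPart(s));
	    else if (s.equals("!block")) return new Message(4,"","");
	    // etc. etc.: the remaining commands are handled in the same way
	    else return null; // placeholder until the other cases are written
	}

      public static void main(String [] args)
      {

      	System.out.println(fp(args[0]));
        if (fp(args[0]).equals("!join"))  System.out.println("WHOOOPEEEE!");
      }

}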

Coursework two

Coursework Upload Instructions

Deadline Midnight 28 April 2014.

Coursework One:

A zip file containing all the source code and a README file (instructions on how to run the program) should be uploaded to

IS52025A-assignments-2013-14/cwk12014/
on igor.

Coursework Two:

A zip file containing all the source code and a README file (instructions on how to run the program) should be uploaded to

IS52025A-assignments-2013-14/cwk22014/
on igor.

Lecture 1: Simple Clients and Servers

Watch Clients and Servers video

A Very Simple Echo Server

import java.io.*;
import java.net.*;

class evenSimplerEchoServer
{
 public static void main(String[] argv) throws Exception
  {ServerSocket s = new ServerSocket(5000);
   Socket t = s.accept();//wait for client to connect
   InputStream b = t.getInputStream();
   OutputStream p =t.getOutputStream();
   int c;
   while((c=b.read())!=-1) {
                            p.write(c);
                            p.flush();	 
                            System.out.print((char) c);
			   }
  }
}

A Very Simple Client

import java.io.*;
import java.net.*;

class evenSimplerEchoClient
{
 public static void main(String[] argv) throws Exception
  {Socket s = new Socket("localhost",5000);
   OutputStream p =s.getOutputStream();
   InputStream i = s.getInputStream();
   InputStreamReader b = new InputStreamReader(System.in); 
    int c;
   while((c=b.read())!=-1) {
                            p.write(c);
                            p.flush();
                            System.out.print((char)i.read());
		           }
  }
}

Lab (1) Exercises for 20 Jan 2014

  1. Compile and run the server and then the client on your local machine.
  2. Copy the server to igor. Compile the server on igor and point your client at the server on igor. You will have to choose a different port number.
  3. Rewrite your client and server so that the ports etc. can be taken from the command line (using args[0] etc.)
  4. Rewrite the server so it sends back upper case values of the characters it receives.
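
For exercises 3 and 4, the following is a minimal sketch rather than a model answer. The class name EchoArgs and its methods are hypothetical, and the defaults (localhost, port 5000) are simply the values hard-wired in the listings above. It shows one way to read the host and port from args, and the upper-casing step applied to a single character read from the socket.

```java
// Hypothetical helper, not part of the course code.
class EchoArgs
{
    // First argument is the host; fall back to the lecture's default.
    static String host(String[] args)
    {
        return args.length > 0 ? args[0] : "localhost";
    }

    // Second argument is the port; fall back to the lecture's default.
    static int port(String[] args)
    {
        return args.length > 1 ? Integer.parseInt(args[1]) : 5000;
    }

    // The transformation asked for in exercise 4, applied to one value
    // returned by InputStream.read().
    static int upper(int c)
    {
        return Character.toUpperCase(c);
    }

    public static void main(String[] args)
    {
        System.out.println("host=" + host(args) + " port=" + port(args));
        System.out.println((char) upper('a'));
    }
}
```

In the client, host(args) and port(args) would replace the literals passed to new Socket(...); in the server, port(args) alone would need to read args[0] instead, since the server takes no host.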

Lecture 2: Concurrency and Threads

Watch Video about Threads in Java

Compile and run the program below:

class threads
{


	static class t1 extends Thread
	{
	   
	   public void run()
	   {    
	   	int i=0;
		while(true) 
		{
		   System.out.println("hello"+i++);
	   
	        }
	   }
	   
	}


      
	static class t2 extends Thread
	{
	   
	   public void run()
	   {    
	   	int i=0;
		while(true) 
		{
		   System.out.println("goodbye"+i++);
	   
	        }
	   }
	
	
	}



       public static void main( String [] args)
       {
                new t1().start();
		new t2().start();
       
       }

}

Run it a few times. What happens? Is it the same each time?
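
Because the loops above never stop, it can be hard to compare one run with the next. The following bounded variant is a sketch only: the class name boundedThreads and the shared log list are assumptions, and it uses lambdas, so it needs Java 8 or later. Each thread records its lines in a shared list, and main waits for both threads with join before printing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// A bounded variant of the threads program: each thread adds n lines
// to a shared list so that the interleaving can be inspected afterwards.
class boundedThreads
{
    static List<String> run(int n)
    {
        // synchronizedList makes each individual add() safe to call from two threads
        final List<String> log = Collections.synchronizedList(new ArrayList<String>());
        Thread a = new Thread(() -> { for (int i = 0; i < n; i++) log.add("hello" + i); });
        Thread b = new Thread(() -> { for (int i = 0; i < n; i++) log.add("goodbye" + i); });
        a.start();
        b.start();
        try { a.join(); b.join(); }   // wait for both threads to finish
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return log;
    }

    public static void main(String[] args)
    {
        for (String line : run(5)) System.out.println(line);
    }
}
```

The list always ends up with 2n entries, but the order in which hello and goodbye lines appear is decided by the scheduler and may differ between runs.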

A Graphical Client and a Multithreaded Echo Server

Watch Video about the multi-threaded echoserver

A Graphical Client

import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import java.io.*;
import java.net.*;


public class evenSimplerGuiClient implements ActionListener {
	private JTextField user = new JTextField("user",20);
	private JTextArea server = new JTextArea("server",5,20);
	private JScrollPane sp =new JScrollPane(server); 
	private  Socket s;
	private OutputStreamWriter p;
	private InputStream i; 
	private JFrame window = new JFrame("client");
        
	
	class serverReader extends Thread
        {
	 
	        public void run()
		{ 
		  String s="";
		  int c;
		  try
		  {
		       while ((c=i.read())!=-1)
		       {
		  	s=s+ ((char)c);
			server.setText(s);
		       }
                  }
		  catch(Exception e){};	
		}
	       }
	 
	public evenSimplerGuiClient() throws Exception
	{
	  try
	  {
	     s = new Socket("localhost",5000);
	     p =new  OutputStreamWriter(s.getOutputStream());
	     i =  s.getInputStream();
	     new serverReader().start();
	  }
	    
	  catch (Exception e){System.out.println("error");};
	  
	  window.setSize(300,300);
	  window.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
	  window.setLayout(new FlowLayout());
	  window.add(sp);
	  window.add(user);
	
	  user.addActionListener(this);
	  
          window.setVisible(true);
	}	 
	
	public void actionPerformed(ActionEvent a) 
	{
		
		String s= user.getText();
		try
		{
		 p.write(s+'\n',0,s.length()+1);
		 p.flush();user.setText("");
		}
	        catch (Exception e){};  
	}
	
	public static void main(String[] args) throws Exception
        {
		
		new evenSimplerGuiClient();
	}
}




A Simple Multithreaded Echo Server

import java.io.*;
import java.net.*;

class simpleMultiThreadedEchoServer
{
 public static void main(String[] argv) throws Exception
  {
   ServerSocket s = new ServerSocket(5000);
   Transaction k;
   while (true) 
   {
      k = new Transaction(s.accept());
      k.start();
   }
  }
}   
 
class Transaction extends Thread
{
 InputStream b;
 OutputStream p;
  public Transaction(Socket s) throws Exception
  {
    b=s.getInputStream();
    p =s.getOutputStream();
  }
 
 public void run() 
  {
   int c;
   try
   {
     while((c=b.read())!=-1)          
     {
       p.write((char)c);
       p.flush();
       System.out.print((char)c);
     }
   }
   
   catch (Exception e)
   {
   }   
  }
}
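
To try the echo logic without opening two terminals, a server and a client can be run inside one program. This is a sketch under stated assumptions: the class name loopbackEcho is hypothetical, port 0 is used so that the operating system picks any free port (avoiding a clash with a server already on 5000), and only one connection is accepted.

```java
import java.io.*;
import java.net.*;

// Sketch only: a one-connection echo server and a client in the same JVM.
class loopbackEcho
{
    static String echo(String text)
    {
        InetAddress local = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, local))
        {
            // Server side: echo every byte back until the client closes.
            Thread t = new Thread(() -> {
                try (Socket s = server.accept())
                {
                    InputStream in = s.getInputStream();
                    OutputStream out = s.getOutputStream();
                    int c;
                    while ((c = in.read()) != -1) { out.write(c); out.flush(); }
                }
                catch (IOException e) { System.out.println("server error: " + e); }
            });
            t.start();

            // Client side: send one character at a time and read its echo.
            StringBuilder reply = new StringBuilder();
            try (Socket s = new Socket(local, server.getLocalPort()))
            {
                OutputStream out = s.getOutputStream();
                InputStream in = s.getInputStream();
                for (char ch : text.toCharArray())
                {
                    out.write(ch);
                    out.flush();
                    reply.append((char) in.read());
                }
            } // closing the client socket makes the server's read() return -1
            t.join();
            return reply.toString();
        }
        catch (IOException | InterruptedException e)
        {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(echo("hello"));
    }
}
```

The sketch writes one byte per character, as the lecture client does, so it is only meant for plain ASCII text.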

SSH Tunnels

Read http://en.wikipedia.org/wiki/Tunneling_protocol

See My blog

Suppose you have a server called fred.bloggs.ac.uk at your institution and you have written some nice server which is listening on port 9999 and suddenly in the middle of your course port 9999 gets blocked. What do you do?

Luckily you can SSH onto fred.bloggs.ac.uk

(1) Run your server as normal on fred.bloggs.ac.uk listening on port 9999.

(2) then type: ssh -L4950:fred.bloggs.ac.uk:9999 fred.bloggs.ac.uk

(You will need to give your username too if your username on your local machine is different from your username on fred.bloggs.ac.uk, like this:

ssh -L4950:fred.bloggs.ac.uk:9999 mas01sd@fred.bloggs.ac.uk)

(3) finally point your clients at your local machine port 4950 and off you go!

Using Putty (courtesy of Eamonn Martin)

1. Enter the normal hostname for igor as usual.

2. Before connecting, open Connection-SSH-Tunnels.

3. Enter the local port in "Source port" (4950).

4. Enter the remote host:port in "Destination" (igor.gold.ac.uk:9999).

5. Click Add; the tunnel then shows in the list box above.

6. Click Open. (You get a spurious connection to igor which you just minimize and ignore: that's the tunnel.)

Watch Bypassing Firewalls Using SSH Tunnelling

SSH Tunnels and Web Browsers

If you want to make it appear that you are browsing from somewhere you are not (igor, for example), do

ssh -p 22 -D 7335 -f -C -q -N igor

then install FoxyProxy on your browser and configure FoxyProxy to use localhost:7335 for all your URLs.

This is useful if, say, you are abroad and you want to use things only available in the UK.

Watch SSH tunnelling and Foxyproxy

Lab (2) Exercises 27th Jan 2014:

  1. Compile and run the server and then the graphical client on your local machine.
  2. Copy the server to igor. Compile the server on igor and point your graphical client at it.
  3. You might need to set up an SSH tunnel.
  4. Check you can run more than one client.

  5. Rewrite your graphical client so that the ports etc. can be taken from the command line (using args[0] etc.)

Lecture 3: The need for Synchronisation with Shared Data and Threads

Read about Thread Interference.

Watch Video about The need for synchronization with shared data and threads in Java
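
As a small illustration of why shared data needs synchronisation, the sketch below has two threads incrementing one counter. It is not taken from the lecture: the names counterDemo, Counter and countWithTwoThreads are hypothetical, and lambdas are used, so Java 8 or later is assumed.

```java
// Sketch, not from the lecture: two threads increment a shared counter.
class counterDemo
{
    static class Counter
    {
        private int value = 0;
        // synchronized: only one thread at a time may run these methods on a given Counter
        synchronized void increment() { value++; }
        synchronized int get() { return value; }
    }

    static int countWithTwoThreads(int perThread)
    {
        final Counter c = new Counter();
        Runnable work = () -> { for (int i = 0; i < perThread; i++) c.increment(); };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try { a.join(); b.join(); }   // wait until both threads have finished
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return c.get();
    }

    public static void main(String[] args)
    {
        System.out.println(countWithTwoThreads(100000));
    }
}
```

With increment declared synchronized the result is always twice perThread. If the keyword is removed from increment, the two threads can interleave inside value++ and some increments may be lost, which is worth trying a few times.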

Synchronisation Can Cause Deadlock

Watch Video about Deadlock in Java.

Try:

class Friend {
        
	String name;
        
	public Friend(String name) {
            this.name = name;
        }
       
       
        public /*synchronized*/   void bow(Friend f) {
            System.out.println(name);
            f.bowBack();
        }
        public  /*synchronized */void bowBack() {
             System.out.println(name);
        }
    }

class bla extends Thread
{
      
      Friend  one,two;
      
      bla(Friend f, Friend g)
      {
        one=f; two=g;
      
      }
      public void run() { one.bow(two);}

}


public class Deadlock {
   

    public static void main(String[] args) {
        Friend fred = new Friend("Fred");
        Friend jack = new Friend("Jack");
        new bla(fred,jack).start();
	new bla(jack,fred).start();
            }
}

Now uncomment the synchronized keywords and describe what happens. Why?

Multi-threaded Broadcaster Servers

Watch Video about the multi-threaded Broadcaster server.

Broadcaster Server

import java.util.*;
import java.io.*;
import java.net.*;


class SynchList
{
	ArrayList <OutputStream> it;
	SynchList()
	{
		it=new ArrayList <OutputStream> ();
	}

	synchronized OutputStream get(int i)
	{
	  return it.get(i);
	}

	synchronized void add(OutputStream o)
	{
		it.add(o);
	}

	synchronized int size()
	{
		return it.size();
	}
}

class broadcasterWithList
{
static SynchList Outputs= new SynchList();
static int i=0;
 public static void main(String[] argv) throws Exception
  {ServerSocket s = new ServerSocket(5000);
   Transaction k;
   while (true)  {k = new Transaction(i,s.accept(),Outputs);k.start();i++;
           System.out.println("client joined");}//wait for client to connect
   }
}   
 
class Transaction extends Thread
{
 SynchList outputs;
 int n;
 Socket t;
 InputStream b;
 OutputStream p;
 public Transaction(int i,Socket s, SynchList v) throws Exception
  {
    outputs=v;
    n=i;t=s; b = t.getInputStream();
    p =t.getOutputStream();
    outputs.add(p);
   }
 
 public void run() 
  {
   int c;
   try{
   while((c=b.read())!=-1) 
      {
      	for (int j=0;j<outputs.size();j++)
       	{
	 //if (j!=n) 
	          {
			(outputs.get(j)).write(c);
       			(outputs.get(j)).flush();
		  }
       }
       System.out.print((char)c);
       System.out.print("size of ArrayList :"+outputs.size()); 
       
       }
        System.out.print("left loop");
      }
     
   catch (Exception e)
   { System.out.print(e);}   
  }
 
}

Lab (3) Exercises for 3 Feb 2014

  1. Write a program similar to test1 above. This program must contain two threads a and b. Thread a must call method f. Thread b must call method g. Methods f and g must both contain infinite loops one printing out hello, the other goodbye. What is the difference when f and g are synchronized and not synchronized? What is this an example of? (answer - starvation)

  2. Compile and run the broadcaster and make it crash by killing a client.

Lecture 4

Removing Clients from the List

If you kill one of the clients communicating with the Multithreaded Broadcaster, there is a disaster. Explain why this happens. We solve this by removing the corresponding OutputStream from the list when a client dies.

Watch Broadcaster server which removes dead clients from its list.
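
The idea of removing a dead client's stream can be sketched without sockets. Everything named here (pruningBroadcast, broadcast, DeadStream) is hypothetical and this is not the server from the video: DeadStream merely stands in for the stream of a killed client by failing on every write, and the broadcast loop drops any stream whose write throws.

```java
import java.io.*;
import java.util.*;

// Sketch with hypothetical names: broadcast one byte to every stream and
// drop any stream whose write fails, instead of letting the exception escape.
class pruningBroadcast
{
    static void broadcast(List<OutputStream> outputs, int c)
    {
        synchronized (outputs)   // keep other threads out while we iterate and remove
        {
            Iterator<OutputStream> it = outputs.iterator();
            while (it.hasNext())
            {
                OutputStream o = it.next();
                try { o.write(c); o.flush(); }
                catch (IOException e) { it.remove(); }   // this client has gone
            }
        }
    }

    // Stands in for the socket stream of a client that has been killed.
    static class DeadStream extends OutputStream
    {
        public void write(int b) throws IOException { throw new IOException("client gone"); }
    }

    public static void main(String[] args)
    {
        ByteArrayOutputStream alive = new ByteArrayOutputStream();
        List<OutputStream> outputs = new ArrayList<OutputStream>();
        outputs.add(new DeadStream());
        outputs.add(alive);
        broadcast(outputs, 'x');
        System.out.println(outputs.size() + " client left, received: " + alive);
    }
}
```

Catching the exception inside the loop means that one failed write does not stop the remaining clients from receiving the byte.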

A String Broadcaster

Watch Video about the String Broadcaster server.

We can use Scanners and PrintStreams in our broadcaster server so that it can read and write strings of input to and from the clients. This makes things a bit easier. In the following program, the server tells each client where the message has come from.

import java.util.*;
import java.io.*;
import java.net.*;


class SynchList
{
	ArrayList <PrintStream> it;
	
	
	SynchList()
	{
		it=new ArrayList <PrintStream> ();
	}

	synchronized PrintStream get(int i)
	{
	  return it.get(i);
	}

	synchronized void add(PrintStream o)
	{
		it.add(o);
	}

	synchronized int size()
	{
		return it.size();
	}

	synchronized void remove(PrintStream o)
	{
		 it.remove(o);
	}
}

class StringBroadcaster
{
static SynchList Outputs= new SynchList();
static int i=0;
 public static void main(String[] argv) throws Exception
  {ServerSocket s = new ServerSocket(5000);
   Transaction k;
   while (true)  {k = new Transaction(Outputs.size(),s.accept(),Outputs);k.start();
           System.out.println("client joined");}//wait for client to connect
   }
}   
 
class Transaction extends Thread
{
 SynchList outputs;
 int n;
 Socket t;
 InputStream b;
 OutputStream p;
 PrintStream pp;
 public Transaction(int i,Socket s, SynchList v) throws Exception
  {
    outputs=v;
    n=i;t=s; b = t.getInputStream();
    p =t.getOutputStream();
    pp =new PrintStream(p);
    outputs.add(pp);
   }
 
 public void run() 
  {
   Scanner s= new Scanner(b);
   
   int c;
   try{
   while(s.hasNext()) 
      {
      	String it=s.next();
	for (int j=0;j<outputs.size();j++)
       	{
	 //if (j!=n) 
	          {
			(outputs.get(j)).println(n+":"+it);
       			(outputs.get(j)).flush();
		  }
       }
       System.out.println(it);
      // System.out.print("size of ArrayList :"+outputs.size()); 
       
       }
        System.out.print("client " + n + " left loop");
	outputs.remove(pp);
      }
     
   catch (Exception e)
   { outputs.remove(pp);System.out.print(e);}   
  }
 
}

Lab (4) Exercises for 10 Feb 2014

  1. Rewrite the StringBroadcaster so the first message of each client is the name of the user. This gets transmitted each time instead of the number. ( See Video about the multi-threaded Broadcaster server )

  2. (optional) Rewrite the server so the port is given on the command line.
  3. (optional) Run your broadcaster on igor and see if your friends can connect to your server. You may need to use an SSH tunnel.

  4. (optional) One person in the class sets up the broadcaster server. See if everyone in the class can connect and chat together!

Lecture 5: Object Serialization: Sending and Receiving Objects

See Notes on Object Serialization.

A Client that sends Objects and Receives Objects

import java.io.*;
import java.net.*;
import java.util.*;
public class Student implements Serializable
{

String name;
int mark;

 public Student (String n, int a)
 {
	mark=a;name=n;
 } 

 public String toString()
 {

	return name+" "+mark;
 }

}


class objectClient1
{
 public static void main(String[] argv) throws Exception
  {Socket s = new Socket("localhost",5000);
   ObjectOutputStream p =new ObjectOutputStream(s.getOutputStream());
   ObjectInputStream q =new ObjectInputStream(s.getInputStream());
   Scanner b = new Scanner(System.in); 
   int c;
   while(b.hasNext()) {
   			    String name=b.nextLine();	
   			    int mark=Integer.parseInt(b.nextLine());
			    p.writeObject(new Student(name,mark));
			    p.flush();
			    System.out.println(q.readObject());
		      
		      }
                                   
  
  }
}

An Object Echo Server

import java.io.*;
import java.net.*;

class objectEchoServer
{
 public static void main(String[] argv) throws Exception
  {ServerSocket s = new ServerSocket(5000);
   Socket t = s.accept();//wait for client to connect
   System.out.println("server connected");
   ObjectInputStream b = new ObjectInputStream(t.getInputStream());
   ObjectOutputStream q = new ObjectOutputStream(t.getOutputStream());
   Object c;
   while((c=b.readObject())!=null) { 
                       		q.writeObject(c);
                       		q.flush(); // push the buffered object out to the client
			   
			   }
                            
 			  
  }
 
}
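
The same writeObject and readObject calls can be tried without a socket by writing to a byte array and reading it back. The sketch below makes that assumption explicit: roundTrip and copy are hypothetical names, and Pupil is a stand-in for the Student class above so that the example is self-contained.

```java
import java.io.*;

// Sketch: writes an object to a byte array and reads it back, using the
// same ObjectOutputStream/ObjectInputStream calls as the client and server.
class roundTrip
{
    // Hypothetical stand-in for the Student class.
    static class Pupil implements Serializable
    {
        String name;
        int mark;
        Pupil(String n, int m) { name = n; mark = m; }
        public String toString() { return name + " " + mark; }
    }

    static Object copy(Object o)
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(o);
            out.flush();   // push buffered data into the byte array
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return in.readObject();
        }
        catch (IOException | ClassNotFoundException e)
        {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(copy(new Pupil("fred", 70)));
    }
}
```

The object returned by copy is a new object with the same field values, which is also what the receiving end of a socket gets.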

Lab (5) Exercises for 24 Feb 2014

  1. Write a Graphical Version of the objectClient1 above. answer (almost)
  2. Write a multi-threaded Object Echo Server. answer
  3. Write a client which terminates if it receives a Student Object whose name is `end' and whose mark is 0.
  4. Write an ObjectBroadcasterServer!
  5. Think about the Objects your Chatroom client (assignment) and server must communicate. For example, perhaps the Server should send the username with the message to the client. Define an Object like Student for this. The client can then choose not to display messages from blocked users.

Lecture 6: Communicating with Databases in Java

Watch Video about installing mysql in Ubuntu

import java.sql.*;


public class seb6 {
	
	
		
	public static void main(String[] args) throws Exception 
	{
	  Class.forName("com.mysql.jdbc.Driver");
	  Connection connect=
	  DriverManager.getConnection("jdbc:mysql://localhost/silly","mas01sd","seb");
	  Statement st = connect.createStatement();
         // st.executeUpdate("INSERT INTO one VALUES('" + args[0] +"','" + args[1] + "');");
          ResultSet resultSet = st.executeQuery("SELECT *  from authors");
	  while (resultSet.next()) 
	  {
              for (int i=1;i<4;i++)System.out.print(resultSet.getString(i) + " ");
              System.out.println();				
          } 
         }

}

Watch Video about accessing Databases in Java.

SSH Tunnelling to your Database Server

See http://www.whoopis.com/howtos/mysql_ssh_howto.html.

Watch Video about Tunnelling through the firewall to access your database server.

Same thing for postgres

import java.sql.*;


public class seb7 {
	
	
		
	public static void main(String[] args) throws Exception 
	{
	  Class.forName("org.postgresql.Driver");
	  Connection connect=
	  DriverManager.getConnection("jdbc:postgresql://127.0.0.1:5001/mas01sd","mas01sd","");
	  Statement st = connect.createStatement();
         // st.executeUpdate("INSERT INTO one VALUES('" + args[0] +"','" + args[1] + "');");
          ResultSet resultSet = st.executeQuery("SELECT *  from weather");
	  while (resultSet.next()) 
	  {
              for (int i=1;i<4;i++)System.out.print(resultSet.getString(i) + " ");
              System.out.println();				
          } 
         }

}

Watch Video about accessing Postgres Database Server in Java.

Avoiding Hard Wiring the Password

Watch Avoiding Hard Wiring your password.

import java.sql.*;
import java.io.Console;

public class seb8 {
	
	
		
	public static void main(String[] args) throws Exception 
	{
	  Class.forName("org.postgresql.Driver");
	   
	   Console cons;
        char[] passwd;
        String pass="";
     if ((cons = System.console()) != null &&
     (passwd = cons.readPassword("%s", "Password:")) != null) {
     for (int i=0;i<passwd.length;i++) pass+=passwd[i];
     
 }
	  Connection connect=
	  DriverManager.getConnection("jdbc:postgresql://127.0.0.1:5001/mas01sd","mas01sd",pass);
	  Statement st = connect.createStatement();
         // st.executeUpdate("INSERT INTO one VALUES('" + args[0] +"','" + args[1] + "');");
          ResultSet resultSet = st.executeQuery("SELECT *  from weather");
	  while (resultSet.next()) 
	  {
              for (int i=1;i<4;i++)System.out.print(resultSet.getString(i) + " ");
              System.out.println();				
          } 
         }

}

Lab (6) Exercises for 3 March 2014

  1. Set up a simple MySQL or PostgreSQL database with a single table to hold data about students.
  2. Change the program above to work with your database and table.
  3. Download JDBC4 Postgresql Driver, Version 9.1-901 or mysql-connector-java-5.0.8-bin.jar.
  4. To compile use, for example:
    javac -cp .:mysql-connector-java-5.0.8-bin.jar seb5.java
    

    and to run:

    For example:

    java -cp .:mysql-connector-java-5.0.8-bin.jar seb5
    
  5. Change your Multi-threaded Object Server from last week to receive Student Objects and store them in your database. Non-Student objects should be discarded. (Use if (x instanceof Student) ... for this.)
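
The instanceof test in exercise 5 can be sketched on its own, without the server or the database. The names keepOnly, pupils and Pupil are hypothetical; Pupil stands in for Student, and the list stands in for the objects read from the socket.

```java
import java.util.*;

// Sketch with hypothetical names: keeps only the objects of one class,
// in the way exercise 5 keeps Student objects and discards the rest.
class keepOnly
{
    static class Pupil   // stand-in for Student
    {
        String name;
        Pupil(String n) { name = n; }
    }

    static List<Pupil> pupils(List<Object> received)
    {
        List<Pupil> kept = new ArrayList<Pupil>();
        for (Object x : received)
            if (x instanceof Pupil) kept.add((Pupil) x);   // anything else is discarded
        return kept;
    }

    public static void main(String[] args)
    {
        List<Object> received = new ArrayList<Object>();
        received.add(new Pupil("fred"));
        received.add("not a pupil");
        received.add(Integer.valueOf(3));
        System.out.println(pupils(received).size());
    }
}
```

The cast to Pupil is safe only because it sits behind the instanceof test; in the server, the database insert would go where kept.add is.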

Homework

Get all of the above to work from home using SSH tunneling.

Lecture 7: Web Crawling (Spiders)

Take a copy of jsoup-1.6.1.jar. We are going to use jsoup to parse HTML documents in order to spider through websites.

Try this!

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
public class SebLinks 
{
        public static void main(String[] args) throws IOException 
        {
           String url = args[0];
           Document doc = Jsoup.connect(url).get();
           Elements links = doc.select("a[href]");
           for (Element link : links) System.out.println(link.attr("abs:href"));
        }
}

Now Try this!

import java.util.ArrayList;
import java.util.HashSet;
import java.util.HashSet.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
public class NewSpider1 
{


static HashSet<String>  links (String url)
{
	HashSet<String> a= new HashSet<String>();
	try{org.jsoup.Connection z=Jsoup.connect(url);
           Document doc = z.get();
	   Elements links = doc.select("a[href]");
           for (Element link : links) a.add(link.attr("abs:href"));
	   
	   }
	   catch (Exception e)
	   {
	   
	   }
	 return a;  
}





static  void Spider (String url, int n)
{
	  HashSet<String> alreadyVisited = new HashSet <String> ();
          HashSet<String> toVisit = new HashSet <String> ();
	  toVisit.addAll(links(url));
	  alreadyVisited.add(url);  
	  int i=0;
	  while (i<n && !toVisit.isEmpty())
	  {
	  	
		String z= toVisit.iterator().next();
	        boolean already=alreadyVisited.contains(z);
		if (already) toVisit.remove(z);
		else
		{
		  System.out.println(z);
		  HashSet <String> k= links(z);
		  toVisit.addAll(k);
		  alreadyVisited.add(z);
		  i++;
		}  
	  }
}

        public static void main(String[] args) throws IOException 
	{
           String url = args[0];
	   Spider(url,Integer.parseInt(args[1]));           
		   
        }
}
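
The visit loop can be studied without a network by replacing jsoup with an in-memory table of links. This is a sketch under assumptions, not the program above: memorySpider, crawl and linksOf are hypothetical names, the link table is invented for the example, and toVisit is a queue (ArrayDeque) rather than a HashSet, so pages are visited in breadth-first order.

```java
import java.util.*;

// Sketch: a visit loop like Spider's, run over a hypothetical in-memory
// link table instead of jsoup, so that it can be tried without a network.
class memorySpider
{
    static List<String> crawl(Map<String, Set<String>> web, String start, int n)
    {
        Set<String> alreadyVisited = new HashSet<String>();
        Deque<String> toVisit = new ArrayDeque<String>();
        List<String> order = new ArrayList<String>();   // pages in the order they were visited
        alreadyVisited.add(start);
        toVisit.addAll(linksOf(web, start));
        while (order.size() < n && !toVisit.isEmpty())
        {
            String z = toVisit.remove();        // take and remove in one step
            if (alreadyVisited.add(z))          // add returns false if z was seen before
            {
                order.add(z);
                toVisit.addAll(linksOf(web, z));
            }
        }
        return order;
    }

    // Links found on a page; a page missing from the table has no links.
    static Set<String> linksOf(Map<String, Set<String>> web, String page)
    {
        Set<String> out = web.get(page);
        return out == null ? new HashSet<String>() : out;
    }

    public static void main(String[] args)
    {
        Map<String, Set<String>> web = new HashMap<String, Set<String>>();
        web.put("a", new TreeSet<String>(Arrays.asList("b", "c")));
        web.put("b", new TreeSet<String>(Arrays.asList("a", "c")));
        web.put("c", new TreeSet<String>(Arrays.asList("d")));
        System.out.println(crawl(web, "a", 10));   // prints [b, c, d]
    }
}
```

As in Spider, the start page is marked as visited but not reported, and n limits how many pages are visited.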

Lab (7) Exercises for 10 March 2014

  1. Compile and run the above program: To compile do
    javac -cp jsoup-1.6.1.jar SebLinks.java
    

    To run do

    java -cp .:jsoup-1.6.1.jar SebLinks http://localhost
    
    (or something else apart from http://localhost.)

  2. Rewrite the program above so it only crawls a domain given as the second command line argument i.e. it only follows links whose name starts with the second command line argument. Solution:
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.HashSet.*;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import java.io.IOException;
    public class NewSpider2 
    {
    
    static HashSet<String>  links (String url)
    {
    	HashSet<String> a= new HashSet<String>();
    	try{org.jsoup.Connection z=Jsoup.connect(url);
               Document doc = z.get();
    	   Elements links = doc.select("a[href]");
               for (Element link : links) a.add(link.attr("abs:href"));
    	   
    	   }
    	   catch (Exception e)
    	   {
    	   
    	   }
    	 return a;  
    }
    
    
    static  void Spider (String url, int n, String contains)
    {
    	  HashSet<String> alreadyVisited = new HashSet <String> ();
              HashSet<String> toVisit = new HashSet <String> ();
    	  toVisit.addAll(links(url));
    	  alreadyVisited.add(url);  
    	  int i=0;
    	  while (i<n && !toVisit.isEmpty())
    	  {
    	  	
    		String z= toVisit.iterator().next();
    	        boolean already=alreadyVisited.contains(z);
    		if (already) toVisit.remove(z);
    		else 
    		{
    		  if (z.contains(contains))
    		  {
    		    System.out.println(z);
    		    HashSet <String> k= links(z);
    		    toVisit.addAll(k);
    		  }
    		  alreadyVisited.add(z);
    		  i++;
    		}  
    	  }
    }
    
            public static void main(String[] args) throws IOException 
    	{
               String url = args[0];
    	   Spider(url,100,args[1]);           
    		   
            }
    }
    

  3. Experiment with this:
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.HashSet.*;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import java.io.IOException;
    public class Brokens 
    {
    
    	static boolean broken(String url)
    	{
    	        try {Jsoup.connect(url).get(); return false;}
    	        catch (java.net.MalformedURLException e) {return true;}	
    		
    		catch (IOException e){if (e.toString().contains("java.io.IOException: 404"))
    		                        
    					 return true;
    				         return false; //this means it exists but isn't html
    				     
    				     }
    		catch(Exception e) {
    						
    					return false; 
    		//for any other error assume not broken - this is a guess
    		                   }
    	
    	}		
    
    static HashSet<String>  links (String url)
    {
    	HashSet<String> a= new HashSet<String>();
    	try{org.jsoup.Connection z=Jsoup.connect(url);
               Document doc = z.get();
    	   Elements links = doc.select("a[href]");
               for (Element link : links) a.add(link.attr("abs:href"));
    	   
    	   }
    	   catch (Exception e)
    	   {
    	   
    	   }
    	 return a;  
    }
    
    
    
    
    static  void Spider (String url, int n, String contains)
    {
    	  HashSet<String> alreadyVisited = new HashSet <String> ();
              HashSet<String> toVisit = new HashSet <String> ();
    	  toVisit.addAll(links(url));
    	  alreadyVisited.add(url);  
    	  int i=0;
    	  while (i<n && !toVisit.isEmpty())
    	  {
    	  	
    		String z= toVisit.iterator().next();
    	        boolean already=alreadyVisited.contains(z);
    		if (already) toVisit.remove(z);
    		else 
    		{
    		  if (z.contains(contains))
    		  {
    		    System.out.println(z);
    		    HashSet <String> k= links(z);
    		    toVisit.addAll(k);
    		  }
    		  alreadyVisited.add(z);
    		  i++;
    		}  
    	  }
    	  
    	  for (String k:alreadyVisited)
    	  if (broken(k)) System.out.println("Broken: " +k);
    }
    
            public static void main(String[] args) throws IOException 
    	{
               String url = args[0];
    	   Spider(url,100,args[1]);           
    		   
            }
    }
    

Week 9: Help with Coursework

2012 Exam paper (Do this during the holiday!)

2012 Exam Paper

28 April 2014: Revision and Coursework Deadline

28 April 2014: Revision

May 2014: Exams





s.danicic@gold.ac.uk
Sebastian Danicic BSc MSc PhD (Reader in Computer Science)
Dept of Computing, Goldsmiths, University of London, London SE14 6NW
Last updated 2014-03-18