Our conclusions and plans for future work are listed below:
Web applications present a very high risk and are an attractive target for attackers, for the following reasons. First, the quality of the code is often rather poor, and many vulnerabilities in commonly used code are published. Second, attacks can often be performed using PHP and shell scripts, which are much easier to develop and use than buffer-overflow exploits. Third, tools such as search engines give attackers a very easy way to locate vulnerable web applications. We also believe that web servers are relatively high-value targets, since they are more likely to have higher-bandwidth connections than the average desktop computer. They typically need access to the organisation's databases as well, and so may provide a stepping stone for an attacker who wishes to recover such data.
Although significant effort is being made to improve code quality in many web applications, the volume of existing code and the amount of new code being written keep the number of reported vulnerabilities quite high. (For example, the number one cross-platform vulnerability listed in the SANS Top 20 Survey is web applications.) Since the other factors (public availability of vulnerabilities, easy exploitation, and the ease of locating web applications via search engines) are not likely to change significantly, we expect these trends to continue.
To gather more information, we will streamline the deployment process: we plan to develop a live CD or an easy-to-install VMware image of our honeypots. Further, we will increase the level of detail of the emulation performed by our honeypots, so that they more accurately mimic a genuinely vulnerable web application. These improvements will enable us to observe a wider range of the attack patterns and threats launched against today's web applications. Finally, it would be very interesting to monitor bogus web spiders. This could be done by setting up a new honeypot that marks its web pages as not to be indexed and logs any access to them.
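The spider-monitoring idea could be prototyped roughly as follows. This is only a minimal sketch and not part of our honeypots: it assumes that pages are marked as not to be indexed by a robots.txt file that disallows every path, together with a robots meta tag on each page. The names `SpiderTrapHandler`, `is_violation` and `run`, the port number and the log file name are all hypothetical.

```python
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: a robots.txt that asks every crawler to stay away from all paths.
ROBOTS_TXT = b"User-agent: *\nDisallow: /\n"

# Assumption: each page additionally carries a noindex/nofollow meta tag.
TRAP_PAGE = (
    b'<html><head><meta name="robots" content="noindex, nofollow"></head>'
    b"<body>Nothing to see here.</body></html>"
)


def is_violation(path: str) -> bool:
    """Return True if a request for `path` ignores the exclusion rule.

    Fetching robots.txt itself is legitimate; every other path is disallowed.
    """
    return path.split("?", 1)[0] != "/robots.txt"


class SpiderTrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if is_violation(self.path):
            # Record who fetched a page that was marked as not to be indexed.
            logging.warning(
                "disallowed access: ip=%s path=%s user-agent=%s",
                self.client_address[0],
                self.path,
                self.headers.get("User-Agent", "-"),
            )
            body, content_type = TRAP_PAGE, "text/html"
        else:
            body, content_type = ROBOTS_TXT, "text/plain"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress the default per-request stderr output; only violations are logged.
        pass


def run(port: int = 8080, log_file: str = "spider_trap.log") -> None:
    """Start the trap; blocks until interrupted."""
    logging.basicConfig(
        filename=log_file, level=logging.INFO, format="%(asctime)s %(message)s"
    )
    HTTPServer(("", port), SpiderTrapHandler).serve_forever()
```

Calling `run()` starts the listener. A well-behaved spider would request only `/robots.txt` and then leave, so every entry in the log file would correspond to a client that ignored the exclusion rule or never read it.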