Server development

System organisation & network architecture

«The Web is continuously growing. Currently, there are more than 20 billion pages (some sources suggest 100 billion), compared to less than 1 billion documents in 1998. Traditionally, Web-scale search engines employ large and highly replicated systems, operating on computer clusters in one or few data centers. Coping with the increasing number of user requests and indexable pages requires adding more resources. However, data centers cannot grow indefinitely. Scalability problems in information retrieval have to be addressed in the near future, and new distributed applications are likely to drive the way in which people use the Web.» http://lsdsir09.isti.cnr.it/

The system is a peer-to-peer network built with two open-source languages, Lua and Ruby.

It is a network of peers grouped by shared interests. This requires creating and managing profiles for both individual users and communities, defining a metric of interest proximity, and managing membership, trust and privacy policies. The system handles two types of information, each of which must be treated differently, so the overall network architecture is composed of two layers.
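The metric of interest proximity mentioned above is not defined on this page. As a purely illustrative sketch, assume a profile is a mapping from topic to weight and that proximity is measured with cosine similarity. Both the profile format and the choice of cosine similarity are assumptions made for this example, not the system's actual design.

```python
import math

def cosine_proximity(profile_a, profile_b):
    """Cosine similarity between two sparse interest profiles.

    Each profile is a dict mapping a topic name to a non-negative weight
    (a hypothetical representation chosen for this sketch).
    Returns a value in [0, 1]; 0.0 if either profile is empty.
    """
    shared_topics = set(profile_a) & set(profile_b)
    dot = sum(profile_a[t] * profile_b[t] for t in shared_topics)
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical user and community profiles.
user = {"p2p": 3.0, "lua": 1.0}
community_close = {"p2p": 2.0, "lua": 1.0, "ruby": 0.5}
community_far = {"cooking": 2.0}

print(cosine_proximity(user, community_close))
print(cosine_proximity(user, community_far))
```

With a metric of this kind, a user profile and a community profile can be compared in the same way, which is convenient when both kinds of profile have to be managed.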

The indexing layer is responsible for the global information stored in the long-lasting index: statistics about visited sites and the identifiers and signatures of existing groups. It also handles group-related operations (join and leave) and derives global visit scores for the sites.

It is the backbone of the network, namely a set of "institutional", reliable and trusted peers provided by ISPs, which are not expected to churn or to misuse privacy-related data.

This layer is built on the SPLAY system and uses structured indices (i.e. distributed hash tables, or DHTs) to provide simple and efficient storage and retrieval.
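To illustrate how a structured index decides which backbone peer stores a given entry, here is a minimal single-process sketch of a hash ring in Python. It is not SPLAY code and does not use any SPLAY API. The class names, peer names and key format are hypothetical, and a real DHT would add routing between peers, replication and handling of peers joining or leaving.

```python
import bisect
import hashlib

def ring_position(value):
    """Map a string to a position on the ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")

class HashRing:
    """Assigns each key to the first peer at or after the key's position."""

    def __init__(self, peers):
        self._ring = sorted((ring_position(p), p) for p in peers)
        self._positions = [pos for pos, _ in self._ring]

    def responsible_peer(self, key):
        if not self._ring:
            raise ValueError("the ring has no peers")
        idx = bisect.bisect_left(self._positions, ring_position(key))
        # Wrap around to the first peer when the key falls past the last one.
        return self._ring[idx % len(self._ring)][1]

class ToyDHT:
    """In-memory stand-in for the index: one dict per (hypothetical) peer."""

    def __init__(self, peers):
        self.ring = HashRing(peers)
        self.stores = {p: {} for p in peers}

    def put(self, key, value):
        self.stores[self.ring.responsible_peer(key)][key] = value

    def get(self, key):
        return self.stores[self.ring.responsible_peer(key)].get(key)

# Hypothetical backbone peers and a hypothetical group record.
dht = ToyDHT(["isp-a", "isp-b", "isp-c"])
dht.put("group:hiking", {"signature": ["trail", "alps"], "members": 12})
print(dht.ring.responsible_peer("group:hiking"), dht.get("group:hiking"))
```

Because every peer computes the same hash, any peer can work out where a group signature or a site statistic lives without consulting a central directory.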

The interest-proximity layer deals with the second type of information: the local data associated with each user. Because of privacy concerns and the high variability of this information, it is not maintained at a global level. Instead, it is stored locally on each peer, and only the data that users agree to share is used to compute similarities and to contribute to the computation of group signatures.
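The following sketch shows one way the "share only what the user allows" rule could feed a group signature. It assumes that a local profile is a topic-to-weight mapping, that the user's privacy choice is a set of allowed topics, and that a group signature is simply the top-weighted topics across the shared views. None of these formats is specified on this page, and all names are hypothetical.

```python
from collections import Counter

def shareable_view(local_profile, allowed_topics):
    """Return only the part of a local profile the user agreed to share."""
    return {t: w for t, w in local_profile.items() if t in allowed_topics}

def group_signature(shared_views, top_k=3):
    """Aggregate shared views and keep the top_k heaviest topics.

    Ties are broken alphabetically so the result is deterministic.
    """
    totals = Counter()
    for view in shared_views:
        totals.update(view)  # adds the weights topic by topic
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, _ in ranked[:top_k]]

# Hypothetical local data: "health" stays on the peer and is never shared.
alice_local = {"p2p": 5, "health": 4, "lua": 1}
bob_local = {"p2p": 2, "ruby": 3}

alice_view = shareable_view(alice_local, {"p2p", "lua"})
bob_view = shareable_view(bob_local, {"p2p", "ruby"})

print(group_signature([alice_view, bob_view], top_k=2))
```

The full profile never leaves the peer in this sketch; only the filtered view takes part in similarity and signature computations.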

This layer consists of the users of the system. These peers usually have a high churn rate, since they constantly and unpredictably connect to and disconnect from the network. Due to their nature, they form a self-emerging unstructured network. Such networks are better suited to less reliable peers, since they are easier to maintain, and they have proven more suitable for the creation of spontaneous communities.