Society increasingly depends on networks in general, and the Internet in particular, for just about every aspect of daily life. Consumers use the Internet to access information, obtain products and services, manage finances, and communicate with one another. Businesses use the Internet to conduct business with consumers and with other businesses. Nations rely on the Internet to conduct the affairs of government, deliver services to their citizens, and, to some extent, manage homeland security and conduct military operations.
As the Internet extends its global reach, services traditionally implemented on separate networks are increasingly subsumed by it, whether as overlays, through gateway access, or as outright replacements for the legacy networks. These include the PSTN (public switched telephone network, wired and wireless), SCADA (supervisory control and data acquisition) networks for managing the power grid and other critical infrastructure, sensor networks, mobile ad hoc networks, and military networks.
With this increasing dependence on the Internet and the integration of services into it, the disruption of networked services carries increasingly severe consequences. The lives and quality of life of individuals, the economic viability of businesses and organizations, and the security of nations are directly linked to the resilience, survivability, and dependability of the Global Internet.
Ironically, this increased dependence and the growing sophistication of services make the Internet more vulnerable to problems. Mobile wireless Internet access is particularly susceptible to the challenges of dynamicity, weakly connected channels, and unpredictable delay. The Internet is also an increasingly attractive target for recreational crackers, industrial espionage, terrorism, and information warfare.
It is also generally recognized that the Internet has evolved over many years without the resilience, manageability, and security needed for the future. Enhancements to the existing Internet infrastructure are hampered by the need for backward compatibility, which has resulted in important yet isolated tweaks to particular parts of the infrastructure, such as optical ring restoration mechanisms. There has been very little research on a systematic approach to Internet resilience.
We propose a fundamentally new architectural approach to Internet resilience that is multilevel and systematic. At the same time, we aim to maximize interoperability with legacy network components.
Resilience is defined as the ability of the network to provide and maintain an acceptable level of service in the face of various faults and challenges to normal operation. This service includes the ability of users and applications to access information when needed (e.g., Web browsing and sensor monitoring), the maintenance of end-to-end communication associations (e.g., teleconferences and video conferences), and the operation of distributed processing and networked storage. The challenges that may impact normal operation include:
· unintentional misconfiguration or operational mistakes
· large-scale natural disasters (e.g., hurricanes, earthquakes, ice storms, tsunamis, floods)
· malicious attacks from intelligent adversaries against the network hardware, software, or protocol infrastructure, including DDoS (distributed denial of service) attacks
· environmental challenges of mobility, weak channels, and unpredictably long delay
· unusual but legitimate traffic loads such as flash crowds
Our definition of resilience is therefore a superset of commonly used definitions for survivability, dependability, and fault tolerance.
Resilience can generally be achieved via a six-step strategy, which is neatly described with the help of a castle analogy:
· Defence, whereby the Internet is made robust to challenges and attacks (analogy: strong castle wall);
· Detection of an adverse event or challenge that has impaired normal operation of the Internet
and degraded services (analogy: guards on the castle wall);
· Remediation in which action is autonomously taken to continue operations as much as
possible and to mitigate the damage (analogy: boiling oil and fortification of internal walls when
the castle wall is breached by a trebuchet);
· Recovery to normal operation once the adverse event has ended or the attacker has been repelled (analogy: cleaning up the oil and repairing the hole in the castle wall);
· Diagnosis of the root cause of the challenge that impaired normal operation, which can be used to improve the system design and to effect recovery to a better state (analogy: determining the way in which enemy soldiers entered the inner walls of the castle); and
· Refinement of future behaviour based on reflection on the previous cycle (analogy: construction of a thicker wall that will defend against current and predicted trebuchet technology).
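To make the strategy concrete, the six steps can be sketched as a simple control loop over a network whose service level is monitored against an acceptability threshold. This is a toy, purely illustrative sketch: the class and function names, the scalar service level, and the thresholds are our own assumptions, not part of any ResumeNet interface.

```python
# Illustrative sketch of the six-step resilience strategy as a control loop.
# All names and numeric values here are hypothetical, for illustration only.

class ToyNetwork:
    """A toy network whose only state is its current service level (0.0-1.0)."""
    def __init__(self):
        self.service_level = 1.0  # Defence: start in a hardened, fully working state

    def hit_by_challenge(self, severity):
        """An adverse event degrades service by its (hardening-adjusted) severity."""
        self.service_level = max(0.0, self.service_level - severity)

ACCEPTABLE = 0.7  # minimum acceptable service level (cf. the resilience definition)

def detect(net):
    """Detection: has service degraded below the acceptable level?"""
    return net.service_level < ACCEPTABLE

def remediate(net):
    """Remediation: autonomously restore service to an acceptable level."""
    net.service_level = max(net.service_level, ACCEPTABLE)

def recover(net):
    """Recovery: return to full normal operation after the event ends."""
    net.service_level = 1.0

def diagnose(severity):
    """Diagnosis: identify the root cause; here we just record the severity."""
    return {"root_cause": "toy challenge", "severity": severity}

def refine(lessons, hardening):
    """Refinement: use lessons learned to harden against future challenges."""
    return hardening + 0.1 * lessons["severity"]

def resilience_cycle(net, severity, hardening=0.0):
    """One pass through Defence, Detection, Remediation, Recovery,
    Diagnosis, and Refinement; returns the updated hardening level."""
    net.hit_by_challenge(max(0.0, severity - hardening))
    if detect(net):
        remediate(net)
        recover(net)
        lessons = diagnose(severity)
        hardening = refine(lessons, hardening)
    return hardening

net = ToyNetwork()
hardening = resilience_cycle(net, severity=0.5)
```

After one cycle the toy network is back at full service and carries a nonzero hardening level, so an identical future challenge would cause less degradation, which is exactly the role the refinement step plays in the strategy above.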
This high-level model can then be applied to particular contexts of network design, such as routing and end-to-end protocols, yielding concrete mechanisms that address specific challenges, with both mechanisms and challenges drawn from the categories above.
In ResumeNet, besides detailing and quantifying the aforementioned framework, we investigated particular mechanisms that can be viewed as its building blocks (monitoring, learning processes, decision engines). It is, in fact, the synthesis of these blocks that provides resilience at the various network layers.
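One plausible way to picture the synthesis of these building blocks is as a pipeline in which monitoring produces observations, a learning process condenses them into an estimate, and a decision engine maps the estimate to an action. The following sketch is purely illustrative; the function names, the link-delay metric, and the 100 ms threshold are assumptions made here and do not come from the project itself.

```python
# Hypothetical composition of the three building blocks named above:
# monitoring -> learning process -> decision engine.
from statistics import mean

def monitor(samples):
    """Monitoring: collect raw link-delay observations (in milliseconds)."""
    return list(samples)

def learn(observations, window=3):
    """Learning process: smooth the most recent `window` observations."""
    return mean(observations[-window:])

def decide(estimate, threshold_ms=100.0):
    """Decision engine: choose a remediation action from the learned estimate."""
    return "reroute" if estimate > threshold_ms else "no_action"

delays = monitor([40, 55, 180, 220, 210])
action = decide(learn(delays))  # a persistently high smoothed delay triggers rerouting
```

The point of the sketch is the separation of concerns: each block can be replaced independently (a different smoothing method, a different decision policy) without changing the others, which is what makes their synthesis usable across network layers.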
Last, but not least, the project selects particular network-level and service-provisioning scenarios for in-depth mechanism-level analysis and experimental evaluation. These scenarios, implemented on top of existing test beds that will be enhanced as needed, form a well-balanced mix of networking scenarios with both short-term and longer-term potential for commercial exploitation.