A Survey on Research on the Application-Layer Traffic Optimization
(ALTO) Problem
Bell Labs, Alcatel-Lucent  rimac@bell-labs.com
Bell Labs, Alcatel-Lucent  volkerh@bell-labs.com
Bell Labs, Alcatel-Lucent  marco.tomsu@alcatel-lucent.com
Bell Labs, Alcatel-Lucent  vkg@bell-labs.com
Telecom Italia  enrico.marocco@telecomitalia.it
A significant part of the Internet traffic today is generated by
peer-to-peer (P2P) applications used traditionally for
file-sharing, and more recently for real-time communications and
live media streaming. Such applications discover a route to each
other through an overlay network with little knowledge of the
underlying network topology. As a result, they may choose peers
based on information deduced from empirical measurements, which
can lead to suboptimal choices. This document, a product of the
P2P Research Group, presents a survey of existing literature on
discovering and using network topology information for
application-layer traffic optimization.
A significant part of today's Internet traffic is generated by
peer-to-peer (P2P) applications, used originally for file
sharing, and more recently for real-time multimedia
communications and live media streaming. P2P applications are
posing serious challenges to the Internet infrastructure; by
some estimates, P2P systems are so popular that they account for
anywhere between 40% and 85% of all Internet traffic.
P2P systems ensure that popular content is replicated at
multiple instances in the overlay. But perhaps ironically, a
peer searching for that content may ignore the topology of the
underlying network and instead select among available instances
based on information it deduces from empirical measurements,
which in some situations may lead to suboptimal choices. For
example, a shorter round-trip time estimate is not indicative of
the bandwidth and reliability of the underlying links, which
have more of an influence than delay on P2P applications
transferring large files.
Most distributed hash tables (DHTs) -- the data structures that
impose a specific ordering on P2P overlays -- use greedy
forwarding algorithms to reach their destination, making locally
optimal decisions that may not turn out to be globally optimal.
This naturally leads to the Application-Layer Traffic
Optimization (ALTO) problem: how to best provide the topology of
the underlying network while at the same time allowing the
requesting node to use such information to effectively reach the
node on which the content resides. Thus, it would appear that
P2P networks, with their application-layer routing strategies
based on overlay topologies, are in direct competition with
Internet routing and topology.
One way to solve the ALTO problem is to build distributed
application-level services for location and path selection, in
order to enable peers to estimate their position in the network
and to efficiently select their neighbors. Similar solutions
have been embedded into P2P applications such as Azureus. A
slightly different approach is to have the Internet service
provider (ISP) take a proactive role in the routing of P2P
application traffic; several means by which this can be achieved
have been proposed.
There is an intrinsic struggle between the layers -- the P2P
overlay and the network underlay -- when both perform the same
service (routing); however, there are strategies to mitigate
this dichotomy.
This document, initially intended as a complement to RFC 5693 and discussed during the
creation of the IETF ALTO Working Group, has been completed and
refined in the IRTF P2P Research Group.
Gummadi et al. compare popular DHT algorithms and, besides
analyzing their resilience, provide an accurate evaluation of
how well the logical overlay topology maps onto the physical
network layer. Relying only on measurements independently
performed by overlay nodes, without the support of additional
location information provided by external entities, they
demonstrate that the most efficient algorithms in terms of
resilience and proximity performance are those based on the
simplest geometric concept (i.e., the ring geometry, rather than
hypercubes, tree structures, and butterfly networks).
Regardless of the geometrical properties of the distributed data
structures involved, interactions between application-layer
overlays and the underlying networks are a rich area of
investigation. The available literature in this field can be
divided into two categories: using application-level techniques
to estimate topology, and relying on some form of layer
cooperation.
Estimating network topology information at the application
layer has been an area of active research. Early systems used
triangulation techniques to bound the distance between two
hosts using a common landmark host. In such a technique, given
a cost function C and a set of vertices V with their
corresponding edges, the triangle inequality holds if for any
triple {a, b, c} in V, C(a, b) is always less than or equal to
C(a, c) + C(c, b). The cost function C could be expressed in
terms of desirable metrics such as bandwidth or latency.
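As an illustration of how a common landmark bounds an unmeasured cost, the triangulation bound can be sketched as follows; the function name and the example latencies are ours, not drawn from any cited system:

```python
def triangulation_bounds(cost_a_landmark, cost_landmark_b):
    """Bound the unknown cost C(a, b) via a common landmark l.

    If the cost function obeys the triangle inequality (and is
    symmetric), then:
        |C(a, l) - C(l, b)| <= C(a, b) <= C(a, l) + C(l, b)
    """
    lower = abs(cost_a_landmark - cost_landmark_b)
    upper = cost_a_landmark + cost_landmark_b
    return lower, upper


# Example: latencies (ms) measured from hosts a and b to a shared landmark.
lo, hi = triangulation_bounds(20.0, 35.0)
print(lo, hi)  # a-b latency is bounded by [15.0, 55.0]
```

The quality of the bound depends on how close the landmark lies to the path between the two hosts; a distant landmark yields a very loose interval.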
We note that the techniques presented in this section are only
representative of the sizable research in this area. Rather
than trying to enumerate an exhaustive list, we have chosen
certain techniques because they represent an advance in the
area that led to derivative works.
Francis et al. proposed IDMaps, a system where one or more
special hosts called tracers are deployed near an autonomous
system. The distance between hosts A and B is estimated as the
cumulative distance between A and its nearest tracer Ta, plus
the distance between B and its nearest tracer Tb, plus the
shortest distance from Ta to Tb. To aid in scalability beyond
that provided by the client-server design of IDMaps, Ng et
al. proposed a P2P-based global network
positioning (GNP) architecture. GNP was a network
coordinate system based on absolute coordinates computed from
modeling the Internet as a geometric space. It proposed a
two-part architecture: in the first part, a small, finite set
of distributed hosts called landmarks compute their own
coordinates in a fixed geometric space. In the second part, a
host wishing to participate computes its own coordinates
relative to those of the landmark hosts. Thus, armed with the
computed coordinates, hosts can then determine interhost
distance as soon as they discover each other.
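The two estimation styles can be sketched in a few lines; the function names and the hypothetical distances below are ours, for illustration only:

```python
import math


def idmaps_estimate(d_a_ta, d_ta_tb, d_tb_b):
    """IDMaps-style estimate: distance from A to its nearest tracer Ta,
    plus the inter-tracer distance Ta-Tb, plus Tb to B."""
    return d_a_ta + d_ta_tb + d_tb_b


def gnp_distance(coord_a, coord_b):
    """GNP-style estimate: Euclidean distance between host coordinates
    computed in a shared geometric space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(coord_a, coord_b)))


print(idmaps_estimate(5.0, 30.0, 7.0))       # 42.0 (ms, say)
print(gnp_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```

Note the architectural difference: IDMaps needs a fresh tracer lookup per host pair, while GNP hosts can estimate any pairwise distance locally once coordinates are known.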
Both IDMaps and GNP require fixed network infrastructure
support in the form of tracers or landmark hosts; this often
introduces a single point of failure and inhibits
scalability. To combat this, new techniques were developed
that embedded the network topology in a low-dimensional
coordinate space to enable network distance estimation through
vector analysis. Costa et al. introduced Practical Internet Coordinates
(PIC). While PIC used the notion of landmark hosts, it
did not require explicit network support to designate specific
landmark hosts. Any node whose coordinates have been computed
can act as a landmark host. When a node joins the system, it
probes the network distance to some landmark hosts, obtains the
coordinates of each landmark host, and computes its own
coordinates relative to those of the landmarks, subject to the
constraint of minimizing the error between the predicted and
the measured distances.
Like PIC, Vivaldi proposed a fully distributed network
coordinate system without any distinguished hosts. Whenever a
node A communicates with another node B, it measures the
round-trip time (RTT) to that node and learns that node's
current coordinates. A subsequently adjusts its coordinates
such that it is closer to, or further from, B by computing new
coordinates that minimize the squared error. A Vivaldi node is
thus constantly adjusting its position based on a simulation of
interconnected mass springs. Vivaldi is now used in the popular
P2P application Azureus, and studies indicate that it scales
well to very large networks.
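A single, simplified Vivaldi adjustment step can be sketched as follows; this omits the adaptive timestep and the height vector of the full algorithm, and the constant `delta` is illustrative:

```python
import math
import random


def vivaldi_update(xi, xj, rtt, delta=0.25):
    """One simplified Vivaldi spring step: move node i's coordinates xi
    along the axis to node j so that the predicted distance moves
    toward the measured rtt."""
    diff = [a - b for a, b in zip(xi, xj)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist == 0:
        # Coincident coordinates: pick a random direction to separate.
        diff = [random.random() + 1e-6 for _ in xi]
        dist = math.sqrt(sum(d * d for d in diff))
    error = rtt - dist                 # > 0: nodes too close in coord space
    unit = [d / dist for d in diff]    # direction away from node j
    return [a + delta * error * u for a, u in zip(xi, unit)]


xi = vivaldi_update([0.0, 0.0], [3.0, 4.0], rtt=9.0)
print(xi)  # moves away from xj: approximately [-0.6, -0.8]
```

Repeated over many samples, each node's squared prediction error shrinks, which is exactly the mass-spring relaxation described above.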
Network coordinate systems require the embedding of the
Internet topology into a coordinate system. This is not always
possible without errors, which impacts the accuracy of
distance estimations. In particular, it has proved to be
difficult to embed the triangular inequalities found in
Internet path distances. Thus, Meridian abandons the generality
of network coordinate systems and provides a specific distance
evaluation service. In Meridian, each node keeps track of a
small, fixed number of neighbors and organizes them in
concentric rings, ordered by distance from the node. Meridian
locates the closest node by performing a multi-hop search
where each hop exponentially reduces the distance to the
target. Although less general than virtual coordinates,
Meridian incurs significantly less error for closest node
discovery.
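The concentric-ring bookkeeping can be sketched as follows, assuming exponentially growing ring boundaries; the ring base `alpha` and growth factor `s` are illustrative parameters, not Meridian's published constants:

```python
import math


def ring_index(distance_ms, alpha=1.0, s=2.0):
    """Meridian-style ring assignment: ring i covers latencies in
    [alpha * s**(i-1), alpha * s**i), so ring radii grow
    geometrically and a search can shrink the remaining distance
    by a constant factor per hop."""
    if distance_ms < alpha:
        return 0
    return int(math.log(distance_ms / alpha, s)) + 1


def organize(neighbors):
    """Group (node, latency) pairs into concentric rings."""
    rings = {}
    for node, dist in neighbors:
        rings.setdefault(ring_index(dist), []).append(node)
    return rings


print(organize([("n1", 3.0), ("n2", 0.5), ("n3", 3.5)]))
```

Because each ring holds only a fixed number of members, state per node stays small regardless of system size.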
The Ono project takes a different approach and uses network
measurements from content-distribution networks (CDNs) such as
Akamai to find nearby peers. Used as a plugin to the Azureus
BitTorrent client, Ono provides a 31% average download-rate
improvement.
Comparison of application-level topology estimation techniques,
as reported in the literature. Results are given as the
90th-percentile relative error, with the number of (D)imensions
and (L)andmarks used.

  Comparison                     Relative error (90th percentile)
  -----------------------------  --------------------------------
  GNP vs. IDMaps(a) (7D, 15L)    GNP: 0.50, IDMaps: 0.97
  PIC(b) vs. GNP (8D, 16L)       PIC: 0.38, GNP: 0.37
  Vivaldi vs. GNP (2D, 32L)      Vivaldi: 0.65, GNP: 0.65
  Meridian vs. GNP (8D, 15L)     Meridian: 0.78, GNP: 1.18

  (a) Does not use dimensions or landmarks.  (b) Using results
  from the hybrid strategy for PIC.
The table above summarizes the application-level topology
estimation techniques. The salient performance metric
is the relative error. While all approaches define this metric
a bit differently, it can be generalized as how close a
predicted distance comes to the corresponding measured
distance. A value of zero implies perfect prediction and a
value of 1 implies that the predicted distance is in error by
a factor of two. PIC, Vivaldi, and Meridian compare their
results with that of GNP, while GNP itself compares its
results with a precursor technique, IDMaps. Because each of
the techniques uses a different Internet topology and a
varying number of landmarks and dimensions to interpret the
data set, it is impossible to normalize the relative error
across all techniques uniformly. Thus we present the relative
error data in pairs, as reported in the literature describing
the specific technique. Readers are urged to compare the
relative error performance in each column on its own and not
draw any conclusions by comparing the data across columns.
Most of the work on estimating topology information focuses on
predicting network distance in terms of latency and does not
provide estimates for other metrics such as throughput or
packet loss rate. However, for many P2P applications latency
is not the most important performance metric and these
applications could benefit from a richer information plane.
Sophisticated methods of active network probing and passive
traffic monitoring are generally very powerful and can
generate network statistics indirectly related to performance
measures of interest, such as delay and loss rate on
link-level granularity. Extraction of these hidden attributes
can be achieved by applying statistical inference techniques
developed in the field of inferential network monitoring or
network tomography subsequent to sampling of the network
state. Thus, network tomography enables the extraction of a
richer set of topology information, but at the same time it
inherently increases the complexity of a potential information
plane and introduces estimation errors. For both active and
passive methods, statistical models for the measurement process
need to be developed, and the spatial and temporal dependence
of the measurements should be assessed. Moreover, the
measurement methodology and the statistical inference strategy
must be considered jointly. For a deeper discussion of network
tomography and recent developments in the field, we refer the
reader to the literature.
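The core inference step can be illustrated with a toy deterministic example: if enough linearly independent paths have been measured, per-link delays follow by solving the linear system defined by a routing matrix. Real tomography works with noisy, statistical measurements and over-determined systems; this sketch (solver and topology are ours) ignores both:

```python
def solve_linear(A, y):
    """Gaussian elimination with partial pivoting (square systems)."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


# Routing matrix over links (l1, l2, l3); each row marks the links
# a measured end-to-end path traverses.
A = [[1, 1, 0],   # path 1: l1 + l2
     [1, 0, 1],   # path 2: l1 + l3
     [0, 1, 1]]   # path 3: l2 + l3
y = [7.0, 8.0, 9.0]        # measured end-to-end path delays (ms)
print(solve_linear(A, y))  # inferred per-link delays: [3.0, 4.0, 5.0]
```

In practice the statistical-inference machinery referred to above replaces this exact solve with estimators that tolerate measurement noise and correlated samples.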
One system providing such a service is iPlane, which aims at
creating an annotated atlas of the Internet that contains
information about latency, bandwidth, capacity, and loss rate.
To determine features of the Internet topology, iPlane bridges
and builds upon different ideas, such as active probing based
on packet-dispersion techniques to infer the available
bandwidth along path segments. These ideas are drawn from
different fields, including network measurement, as described
by Dovrolis et al., and network tomography.
Instead of estimating topology information on the application
level through distributed measurements, this information could
be provided by the entities running the physical networks --
usually ISPs or network operators. In fact, they have full
knowledge of the topology of the networks they administer and,
in order to avoid congestion on critical links, are interested
in helping applications to optimize the traffic they generate.
The remainder of this section briefly describes three recently
proposed solutions that follow such an approach to address the
ALTO problem.
The architecture proposed by Xie et al. has been adopted by the
DCIA P4P working group, an open group established by ISPs, P2P
software distributors, and technology researchers
with the dual goal of defining mechanisms to accelerate
content distribution and optimize utilization of network
resources.
The main role in the P4P architecture is played by servers
called ``iTrackers'', deployed by network providers and
accessed by P2P applications (or, in general, by elements of
the P2P system) in order to make optimal decisions when
selecting a peer to connect to. An iTracker may offer three
interfaces:
Info: Allows P2P elements (e.g., peers or trackers) to
get opaque information associated with an IP address.
Such information is kept opaque to hide the actual
network topology, but can be used to compute the network
distance between IP addresses.
Policy: Allows P2P elements to obtain policies and
guidelines of the network, which specify how a network
provider would like its networks to be utilized at a
high level, regardless of P2P applications.
Capability: Allows P2P elements to request network
providers' capabilities.
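The Info interface and the opaque-distance lookup it enables might be sketched as follows. This is a hypothetical model, loosely following the published P4P design (opaque partition identifiers with provider-assigned costs between them); it does not reproduce the actual P4P data model or wire protocol:

```python
class ITracker:
    """Hypothetical sketch of a provider-run iTracker."""

    def __init__(self, pid_of, pdistance):
        self._pid_of = pid_of        # IP address -> opaque partition id
        self._pdistance = pdistance  # (pid, pid) -> provider-assigned cost

    def info(self, ip):
        """Info interface: return the opaque token for an address,
        hiding the actual network topology."""
        return self._pid_of[ip]

    def distance(self, ip_a, ip_b):
        """Network distance derived purely from opaque tokens."""
        return self._pdistance[(self.info(ip_a), self.info(ip_b))]


tracker = ITracker(
    pid_of={"10.0.0.5": "pid1", "10.1.0.9": "pid2"},
    pdistance={("pid1", "pid1"): 5, ("pid1", "pid2"): 40,
               ("pid2", "pid1"): 40, ("pid2", "pid2"): 5},
)
print(tracker.distance("10.0.0.5", "10.1.0.9"))  # 40
```

A P2P tracker could use such costs to prefer candidate peers with low provider-assigned distance without ever learning the provider's topology.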
The P4P architecture is under evaluation with simulations,
experiments on the PlanetLab distributed testbed, and field
tests with real users. Initial simulation and PlanetLab
experiment results indicate that improvements in BitTorrent
download completion time and link utilization in the range of
50-70% are possible. Results observed on Comcast's network
during a
field test trial conducted with a modified version of the
software used by the Pando content delivery network
(documented in RFC 5632) show
average improvements in download rate in different scenarios
varying between 57% and 85%, and a 34% to 80% drop in the
cross-domain traffic generated by the application.
In the general solution proposed by Aggarwal et al., network
providers host servers, called "oracles", that help P2P users
choose optimal neighbors.
The mechanism is fairly simple: a P2P user sends the list of
potential peers to the oracle hosted by its ISP, which ranks
such a list based on its local policies. For instance, the
ISP can prefer peers within its network, to prevent traffic
from leaving its network; further, it can pick higher
bandwidth links, or peers that are geographically closer.
Once the application has obtained the ordered list, it is still
free to choose which peers to connect to, but it now has enough
information to make an optimal choice.
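The ranking step might be sketched as follows; the specific policy shown (intra-AS peers first, then decreasing access bandwidth) is only one example of the local policies the text mentions, and all names here are ours:

```python
def oracle_rank(candidates, my_asn, access_bw):
    """Illustrative ISP-oracle ranking: keep traffic inside the ISP's
    own network first, then prefer higher access-link bandwidth.
    Actual ranking policies are ISP-specific."""
    return sorted(
        candidates,
        key=lambda p: (p["asn"] != my_asn,          # False (0) sorts first
                       -access_bw.get(p["ip"], 0)),  # then highest bandwidth
    )


peers = [{"ip": "a", "asn": 2}, {"ip": "b", "asn": 1}, {"ip": "c", "asn": 1}]
ranked = oracle_rank(peers, my_asn=1, access_bw={"a": 100, "b": 10, "c": 50})
print([p["ip"] for p in ranked])  # ['c', 'b', 'a']
```

Because only the ordering is returned, the application retains the final choice of which peers to contact, as described above.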
Such a solution has been evaluated with simulations and
experiments run on the PlanetLab testbed and the results
show both improvements in content download time and a
reduction of overall P2P traffic, even when only a subset of
the applications actually query the oracle to make their
decisions.
The solution proposed by Saucez et al. is essentially a
modified version of the oracle-based approach described above,
intended to provide a network-layer service for finding the
best source and destination addresses when establishing a
connection between two endpoints in multi-homed environments
(which are common in IPv6 networking). Peer selection
optimization in P2P systems -- the ALTO problem in today's
Internet -- can be addressed by the IDIPS solution as a
specific sub-case where the options for the destination
address consist of all the peers sharing a desired resource,
while the choice of the source address is fixed. An
evaluation performed on IDIPS shows that costs for both
providing and accessing the service are negligible.
The application-level techniques described above provide tools
for peer-to-peer applications to estimate parameters of the
underlying network topology. Although these techniques can
improve application performance, there are limits to what can
be achieved by operating only at the application level.
Topology estimation techniques use abstractions of the network
topology which often hide features that would be of interest to
the application. Network coordinate systems, for example, are
unable to detect overlay paths shorter than the direct path in
the Internet topology. However, these paths frequently exist in
the Internet . Similarly,
application-level techniques may not accurately estimate
topologies with multipath routing.
When using network coordinates to estimate topology
information, the underlying assumption is that distance in
terms of latency determines performance. However, for file
sharing and content
distribution applications there is more to performance than just
the network latency between nodes. The utility of a long-lived
data transfer is determined by the throughput of the underlying
TCP protocol, which depends on the round-trip time as well as
the loss rate experienced on the corresponding path . Hence, these applications benefit from a
richer set of topology information that goes beyond latency
including loss rate, capacity, available bandwidth.
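For intuition, the widely used simplified steady-state relation T ~ MSS / (RTT * sqrt(2p/3)) makes the joint dependence on RTT and loss rate p concrete; the full model in the cited paper also accounts for timeouts and receiver-window limits, which this sketch omits:

```python
import math


def tcp_throughput(mss_bytes, rtt_s, loss_rate):
    """Simplified steady-state TCP throughput in bytes/s:
    T ~ MSS / (RTT * sqrt(2p/3)). Valid only for small loss rates."""
    return mss_bytes / (rtt_s * math.sqrt(2 * loss_rate / 3))


# Same loss rate, different RTTs: throughput scales as 1/RTT,
# so latency still matters -- but loss rate matters just as much.
t_near = tcp_throughput(1460, 0.02, 0.01)
t_far = tcp_throughput(1460, 0.20, 0.01)
print(t_far / t_near)  # ~0.1
```

A peer-selection scheme that sees only latency would treat two paths with equal RTT as equivalent, even when their loss rates (and hence achievable throughput) differ by an order of magnitude.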
Some of the topology estimation techniques used by P2P
applications need time to converge to a result. For example,
current BitTorrent clients implement local, passive traffic
measurements and a tit-for-tat bandwidth reciprocity mechanism
to optimize peer selection at a local level. Peers eventually
settle on a set of neighbors that maximizes their download
rate, but because peers cannot reason about the value of
neighbors without actively exchanging data with them, and the
number of concurrent data transfers is limited (typically to
5-7), convergence is delayed and can easily be suboptimal.
Skype's P2P VoIP application chooses a relay node in cases
where two peers are behind NATs and cannot connect directly.
Ren et al. measured that the relay selection mechanism of Skype
(1) is not able to discover the best possible relay nodes in
terms of minimum RTT, (2) requires a long setup and
stabilization time, which degrades the end-user experience, and
(3) creates a non-negligible amount of overhead traffic due to
probing a large number of nodes. They further showed that the
quality of the relay paths could be improved when the
underlying network AS topology is considered.
Some features of the network topology are hard to infer through
application-level techniques, and some may not be possible to
infer at all. Examples of such features are service provider
policies and preferences, such as the state and cost associated
with interdomain peering and transit links. Another example is
the traffic engineering policy of a service provider, which may
counteract the routing objective of the overlay network,
leading to poor overall performance.
Finally, application-level techniques often require
applications to perform measurements on the topology. These
measurements create traffic overhead, in particular when they
are performed individually by every application interested in
estimating the topology.
Beyond the significant amount of research work on the topic, we
believe that there are sizable open issues to address in an
infrastructure-based approach to traffic optimization. The
following is not an exhaustive list, but a representative
sample of the pertinent issues.
Despite the many solutions that have been proposed for
providing applications with topology information in a fully
distributed manner, there is currently an ongoing debate in
the research community whether such solutions should focus on
estimating nodes' coordinates or path latencies. Such a debate
has recently been fed by studies showing that the triangle
inequality, on which coordinate systems are based, often proves
false in the Internet.
Proposed systems following both approaches -- in particular,
Vivaldi and PIC following the former, Meridian and iPlane the
latter -- have been simulated, implemented and studied in
real-world trials, each showing different strengths and
weaknesses. Concentrated work will be needed to determine which
of the two approaches will be more conducive to solving the
ALTO problem.
Another open issue, common to most distributed environments
consisting of a large number of peers, is resistance against
malicious nodes. Security mechanisms to identify misbehavior
are based on triangle inequality checks, which however tend to
fail and thus return false positives in the presence of
measurement inaccuracies induced, for example, by the traffic
fluctuations that occur quite often in large networks. Beyond
the issue of using triangle inequality checks, authoritatively
authenticating the identity of an oracle and protecting an
oracle from attacks are also important. Existing techniques --
such as public-key infrastructure or identity-based encryption
for authenticating identities, and secure multi-party
computation techniques to protect an oracle from collusion
attacks -- need to be explored for judicious use in ALTO-type
solutions.
Similarly, even in controlled architectures deployed by network
operators, where system elements may be authenticated, it is
still possible that the information
returned to applications is deliberately altered, for example,
assigning higher priority to cheap (monetary-wise) links
instead of neutrally applying proximity criteria. What are
the effects of such deliberate alterations if multiple peers
collude to determine a different route to the target, one that
is not provided by an oracle? Similarly, what are the
consequences if an oracle targets a particular node in another
AS by redirecting an inordinate number of querying peers to it
causing, essentially, a DDoS attack on the node? Furthermore,
does an oracle broadcast or multicast a response to a query?
If so, techniques to protect the confidentiality of the
multi-cast stream will need to be investigated to thwart
``free riding'' peers.
Many systems already use RTT to account for delay when
establishing connections with peers (e.g., CAN, Bamboo). An
operator can provide not only the delay metric but other
metrics that the peer cannot figure out on its own. These
metrics may include the characteristics of the access links to
other peers, bandwidth available to peers (based on operator's
engineering of its network), network policies, and preferences
such as state and cost associated with intradomain peering
links, and so on. Exactly what kinds of metrics an operator can
provide to stabilize network throughput will also need to be
investigated.
It is conceivable that P2P users may not be comfortable with
operator intervention to provide topology information. To
eliminate this intervention, alternative schemes to estimate
topological distance can be used. For instance, Ono uses
client redirections generated by Akamai CDN servers as an
approximation for estimating distance to peers; Vivaldi, GNP
and PIC use synthetic coordinate systems. A neutral third party
can make available a hybrid layer-cooperation service --
without the active participation of the ISP -- that uses the
alternative techniques discussed above to create a topological
map. This map can subsequently be used by a subset of users who
may not trust the ISP.
The literature presented above shows that a certain level of
locality-awareness in the peer selection
process of P2P algorithms is usually beneficial to the
application performance. However, an excessive localization
of the traffic might cause partitioning in the overlay
interconnecting peers, which will negatively affect the
performance experienced by the peers themselves.
Finding the right balance between localization and randomness
in peer selection is an open issue. At the time of writing, it
seems that different applications have different levels of
tolerance and should be addressed separately. Le Blond et
al. have studied the specific case of BitTorrent, proposing a
simple mechanism that prevents partitioning in the overlay
while reaching a high level of cross-domain traffic reduction
without adversely impacting peers.
This document is a survey of existing literature on topology
estimation. As such, it does not introduce any new security
considerations beyond what is already discussed in each of the
papers surveyed.
This document is a derivative work of a position paper submitted
at the IETF RAI area/MIT workshop held on May 28th, 2008 on the
topic of Peer-to-Peer Infrastructure (P2Pi). The article "A
Survey of Research on the Application-Layer Traffic
Optimization Problem and the Need for Layer Cooperation", which
appeared in IEEE Communications Magazine, Vol. 47, No. 8, was
also partially derived from the same paper. The authors
profusely thank Arnaud Legout and the many people who have
participated in discussions and provided insightful feedback at
every stage of this work.
The State of the Internet, Part 3
Is P2P dying or just hiding?
Controlling P2P traffic
Peer to peer network traffic may account for up to 85% of Internet's bandwidth usage
The true picture of peer-to-peer filesharing
P2P fuels global bandwidth binge
The impact of DHT routing geometry on resilience and proximity
Application-Layer Traffic Optimization (ALTO) Problem Statement
IDMaps: A global Internet host distance estimation service
Predicting internet network distance with coordinates-based approaches
Vivaldi: A Decentralized Network Coordinate System
PIC: Practical Internet coordinates for distance estimation
Meridian: A lightweight network location service without virtual coordinates
iPlane: an information plane for distributed services
Azureus BitTorrent Client
P4P: Explicit Communications for Cooperative Control Between P2P and Network Providers
Can ISPs and P2P systems co-operate for improved performance?
Preemptive Strategies to Improve Routing Performance of
Native and Overlay Layers
Network Coordinates in the Wild
Northwestern University Ono Project
Drafting behind Akamai (travelocity-based detouring)
Towards Network Triangle Inequality Violation Aware Distributed Systems
What do packet dispersion techniques measure?
Internet Tomography
DCIA P4P Working group
Comcast's ISP Experiences in a Proactive Network Provider
Participation for P2P (P4P) Technical Trial
Implementation and Preliminary Evaluation of an ISP-Driven Informed Path Selection
Modeling TCP throughput: A simple model and its empirical validation
ASAP: An AS-aware peer-relay protocol for high quality VoIP
Pushing BitTorrent Locality to the Limit