Internet Engineering Task Force A. Charny Internet-Draft Cisco Systems Intended status: Informational F. Huang Expires: April 29, 2010 Huawei Technologies G. Karagiannis U. Twente M. Menth University of Wuerzburg T. Taylor, Ed. Huawei Technologies October 26, 2009 PCN Boundary Node Behaviour for the Controlled Load (CL) Mode of Operation draft-ietf-pcn-cl-edge-behaviour-01 Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on April 29, 2010. Copyright Notice Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Charny, et al. Expires April 29, 2010 [Page 1] Internet-Draft PCN CL Boundary Node Behaviour October 2009 Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Abstract Precongestion notification (PCN) is a means for protecting quality of service for inelastic traffic admitted to a Diffserv domain. The overall PCN architecture is described in RFC 5559. This memo is one of a series describing possible boundary node behaviours for a PCN domain. The behaviour described here is that for three-state measurement-based load control, known informally as Controlled Load (CL). Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 2. Assumed Core Network Behaviour for CL . . . . . . . . . . . . 4 3. Node Behaviours . . . . . . . . . . . . . . . . . . . . . . . 5 3.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . 5 3.2. Behaviour of the PCN-Egress-Node . . . . . . . . . . . . . 6 3.2.1. Flow Admission . . . . . . . . . . . . . . . . . . . . 6 3.2.2. Flow Termination . . . . . . . . . . . . . . . . . . . 7 3.2.3. Reporting the PCN Data . . . . . . . . . . . . . . . . 8 3.3. Behaviour at the Decision Point . . . . . . . . . . . . . 8 3.3.1. Flow Admission . . . . . . . . . . . . . . . . . . . . 8 3.3.2. Flow Termination . . . . . . . . . . . . . . . . . . . 8 3.4. Behaviour of the Ingress Node . . . . . . . . . . . . . . 9 4. Identifying Ingress-Egress-Aggregates and Their Edge Points . 9 5. Specification of Diffserv Per-Domain Behaviour . . . . . . . . 10 5.1. Applicability . . . . . . . . . . . . . . . . . . . . . . 10 5.2. Technical Specification . . . . . . . . . . . . . . . . . 10 5.3. Attributes . . . . . . . . . . . . . . . . . . . . . . . . 10 5.4. Parameters . . . . . . . . . . . . . . . . . . . . . . . . 10 5.5. Assumptions . . . . . . . . . . . . . . . . . . . . . . . 10 5.6. Example Uses . . . . . . . . . . . . . . . . . . . . . . . 11 5.7. Environmental Concerns . . . . . . . . . . . . . . . . . . 11 5.8. Security Considerations . . . . . . . . . . . . . . . . . 11 6. Security Considerations . . . . . . . . . . . . . . . . . . . 11 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 9.1. Normative References . . . . . . . . . . . . . . . . . . . 11 9.2. Informative References . . . . . . . . . . . . . . . . . . 12 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13 Charny, et al. Expires April 29, 2010 [Page 2] Internet-Draft PCN CL Boundary Node Behaviour October 2009 1. Introduction The objective of Pre-Congestion Notification (PCN) is to protect the quality of service (QoS) of inelastic flows within a Diffserv domain, in a simple, scalable, and robust fashion. Two mechanisms are used: admission control, to decide whether to admit or block a new flow request, and (in abnormal circumstances) flow termination to decide whether to terminate some of the existing flows. To achieve this, the overall rate of PCN-traffic is metered on every link in the domain, and PCN-packets are appropriately marked when certain configured rates are exceeded. These configured rates are below the rate of the link thus providing notification to boundary nodes about overloads before any congestion occurs (hence the "pre" part of pre- congestion notification). The level of marking allows boundary nodes to make decisions about whether to admit or terminate. For more details see [RFC5559]. Boundary node behaviours specify a detailed set of algorithms and edge node behaviours used to implement the PCN mechanisms. Since the algorithms depend on specific metering and marking behaviour at the interior nodes, it is also necessary to specify the assumptions made about interior node behaviour. Finally, because PCN uses DSCP values to carry its markings, a specification of boundary node behaviour must include the per domain behaviour (PDB) template specified in [RFC3086], filled out with the appropriate content. The present document accomplishes these tasks for the controlled load (CL) mode of operation. Some aspects of this specification are necessary for interoperability, while others are simply suggestions. This document attempts to make the distinction as it proceeds. 1.1. Terminology RFC 2119 requirements language does not seem appropriate for an Informational document. This document uses three levels of requirement: o "must" applies to requirements that affect the integrity of operation of the complete system; o "recommended" applies to procedures that appear to give superior results at time of writing, but may be replaced by other procedures directed to the same objective without affecting the integrity of operation of the complete system; o "suggested" applies to procedures that are not seen as superior at time of writing, but appear to be valid approaches for meeting a Charny, et al. Expires April 29, 2010 [Page 3] Internet-Draft PCN CL Boundary Node Behaviour October 2009 particular objective. In addition to the terms defined in [RFC5559], this document uses the following terms: decision point The node that makes the decision about which flows to admit and to terminate. In a given network deployment, this may be the ingress node or a centralized control node. Of course, regardless of the location of the decision point, the ingress node is the point where the decisions are enforced. PCN-admission-state The state ("admit" or "block") derived by the PCN-egress-node for a given ingress-egress-aggregate based on PCN packet marking statistics. The decision point decides to admit or block new flows offered to the aggregate based on the current value of the PCN-admission-state. For further details see Section 3.2.1 and Section 3.3.1. Congestion level estimate (CLE) A value derived from the measurement of PCN packets received at a PCN-egress-node for a given ingress-egress-aggregate, representing the ratio of marked to total PCN traffic (measured in octets) over a short period. This specification suggests that the CLE be calculated as an exponentially weighted moving average of the ratios observed in successive fixed-length measurement intervals, but the exact algorithms used are not critical to interoperability. For further details see Section 3.2.1. Admission decision threshold A fractional value to which the CLE is compared to determine the PCN-admission-state. If the CLE is below the admission decision threshold the PCN-admission-state is set to "admit". If the CLE is above the admission decision threshold the PCN-admission-state is set to "block". For further details see Section 3.2.1. 2. Assumed Core Network Behaviour for CL This section describes the assumed behaviour for nodes of the PCN- domain when acting in their role as PCN-interior-nodes. The CL mode of operation assumes that: o encoding of PCN status within individual packets is based on [ID.PCN-baseline], extended to provide a third PCN encoding state. Possible extensions for this purpose are documented in [ID.PCN3state] or alternatively [ID.PCN3in1]; Charny, et al. Expires April 29, 2010 [Page 4] Internet-Draft PCN CL Boundary Node Behaviour October 2009 o the domain satisfies the conditions specified in the applicable encoding extension document; o each link has been configured with a PCN-threshold-rate having a value equal to the PCN-admissible-rate for the link; o each link has been configured with a PCN-excess-rate having a value equal to the PCN-supportable-rate for the link; o PCN-interior-nodes perform threshold-marking and excess-traffic- marking of packets according to the rules specified in [ID.PCN-marking], and any additional rules specified in the applicable encoding extension document; According to [ID.PCN-baseline], the encoding extension documents should specify the allowable transitions between marking states. However, to be absolutely clear, these allowable transitions are specified here. At any interior node, the only permitted transitions are these: o a PCN packet which is not-marked (NM) MAY be threshold-marked (ThM) or excess-traffic-marked (ETM); o a PCN packet which is threshold-marked (ThM) MAY be excess- traffic-marked (ETM). An interior node MUST NOT re-mark a packet from PCN to non-PCN, or vice versa. 3. Node Behaviours 3.1. Overview The Controlled Load (CL) mode of operation supports flow admission based on the ratio of threshold-marked to total PCN-traffic observed by the PCN-egress-node (the congestion level estimate, see Section 1.1) for each ingress-egress-aggregate. The PCN-egress-node reports the latest value of the PCN-admission-state to the decision point at regular intervals. The decision point decides to admit or block new PCN flows offered to a given ingress-egress-aggregate based on the PCN-admission-state. Flow termination is triggered when the PCN-egress-node observes excess-traffic-marked packets within a given ingress-egress- aggregate. When this happens, the PCN-egress-node reports not only the PCN-admission-state, but also an estimate of the current edge-to- edge supportable rate of PCN traffic. The decision point interprets Charny, et al. Expires April 29, 2010 [Page 5] Internet-Draft PCN CL Boundary Node Behaviour October 2009 the presence of this rate in the report as an indication that flow termination is required. The amount of traffic to be terminated is calculated as the difference between the rate of admitted PCN traffic measured at the PCN-ingress-node and the estimated supportable rate. The decision point selects previously-admitted flows within the affected ingress-egress-aggregate for termination until no more excess-marked packets are observed at the PCN-egress-node. When Equal Cost Multipath (ECMP) routing is operating in the network, it is possible that some flows within a given ingress-egress- aggregate pass through the bottleneck that is resulting in excess- traffic-marking, while others do not. To ensure that the right set of flows is terminated, the PCN-egress-node supplies a list of excess-traffic-marked flows along with its estimate of the edge-to- edge supportable PCN traffic rate. The decision point should look first to this list when deciding which flows to terminate. 3.2. Behaviour of the PCN-Egress-Node The PCN-egress-node must meter received PCN traffic in order to derive periodically the following rates for each ingress-egress- aggregate passing through it: o NM-rate: octets per second of PCN traffic in PCN-unmarked packets; o ThM-rate: octets per second of PCN traffic in PCN-threshold-marked packets; o ETM-rate: octets per second of PCN traffic in PCN-excess-marked packets. This specification recommends that the interval between calculation of these quantities be in the range of 100 to 500ms to provide a reasonable tradeoff between signalling demands on the network and the time taken to react to impending congestion. This specification suggests that PCN-traffic be metered continuously, that the counts of the number of octets of PCN traffic needed to calculate the above rates be accumulated continuously throughout the interval, and that the intervals themselves be of equal length, to minimize the statistical variance introduced by the measurement process itself. 3.2.1. Flow Admission Each time the egress node has calculated the rates listed above, the egress node must calculate a ratio R of marked to total traffic. If all of the rates are zero for the interval, the ratio R must be set Charny, et al. Expires April 29, 2010 [Page 6] Internet-Draft PCN CL Boundary Node Behaviour October 2009 to zero. Otherwise, the egress node must calculate the ratio as: R = (ThM-rate + ETM-rate) / (NM-rate + ThM-rate + ETM-rate). The egress node must then use this ratio to update a congestion level estimate (CLE, see Section 1.1). Exponential smoothing is suggested for this purpose, so that updated CLE = w*R + (1-w)*previous CLE. The value of w depends on the length of the measurement interval: for the equivalent system memory, a shorter interval calls for a smaller smoothing constant. Simulation results ([I-D.babiarz-pcn-explicit-marking], [I-D.zhang-pcn-performance-evaluation]) show that the effectiveness of PCN is not sensitive to the specific value of w used. The egress node now compares the updated CLE against a decision threshold. If the CLE is less than the threshold, the PCN-admission- state for the ingress-egress-aggregate is determined to be "admit", otherwise it is determined to be "block". Simulation results ([I-D.zhang-pcn-performance-evaluation] and [Menth08f]) show that the process is also not sensitive to the value of the decision threshold. 3.2.2. Flow Termination If the PCN-egress-node observes any excess-traffic-marked packets for a given ingress-egress-aggregate, it must immediately set aside the measurements it has accumulated and begin a new measurement interval. In addition, it must send a report to the decision point indicating that the PCN-admission-state is "block" If the measurement data being set aside represent more than half of the normal calculation interval, it is suggested that the PCN- egress-node perform the end-of-interval calculations described in the previous section and report the PCN-admission-state value thus obtained before starting the new interval. Restarting the measurement interval ensures that the estimate of edge-to-edge supportable rate described below is not biased by lower PCN traffic rates prevailing before the onset of excess- traffic-marking. For subsequent intervals, as long as the calculated rate of excess- marked traffic is greater than zero, the PCN-egress-node must Charny, et al. Expires April 29, 2010 [Page 7] Internet-Draft PCN CL Boundary Node Behaviour October 2009 calculate and report to the decision point an estimate of the edge- to-edge supportable traffic rate for PCN traffic as well as the latest PCN-admission-state. It is recommended that this estimate be calculated as the sum: NM-rate + ThM-rate. In networks with multipath routing (e.g., ECMP in IP networks), the PCN-egress-node records flow identifiers of the individual flows for which excess-marked packets have been observed. These will be used by the decision point when it selects flows for termination. 3.2.3. Reporting the PCN Data The PCN-egress-node must report the latest value of the PCN- admission-state to the decision point each time it calculates it. If flow termination is required (because PCN-excess-marked packets have been observed), the egress node must also report the estimate of the supportable edge-to-edge rate of PCN traffic calculated in the previous section. If so configured, the PCN-egress-node must also report the set of flow identifiers of flows for which excess-marked packets were observed during the calculation interval. 3.3. Behaviour at the Decision Point 3.3.1. Flow Admission When the decision point (e.g., the PCN-ingress-node) receives a report indicating that the PCN-admission-state for a given ingress- egress-aggregate is "admit", it admits new flows to that aggregate. When the decision point receives a report indicating that the PCN- admission-state for a given ingress-egress-aggregate is "block", it ceases to admit new flows to that aggregate. These actions may be modified by policy in specific cases. 3.3.2. Flow Termination When the report from the egress node includes an estimate of the edge-to-edge supportable PCN traffic rate for the given ingress- egress-aggregate, the decision point must fetch the rate at which PCN-traffic has been admitted to the aggregate from the PCN-ingress- node. If the rate of admitted traffic is greater than the estimate of the edge-to-edge supportable PCN traffic rate for the given ingress-egress-aggregate, the decision point must select flows to terminate using its knowledge of the bandwidth required by individual flows gained, e.g., from resource signalling, until it determines Charny, et al. Expires April 29, 2010 [Page 8] Internet-Draft PCN CL Boundary Node Behaviour October 2009 that the PCN traffic admission rate will no longer be greater than the estimated edge-to-edge supportable PCN traffic rate provided by the egress node. Flow termination may be spread out over multiple rounds to avoid over-termination. If this is done, it is recommended that enough time elapse between successive rounds of termination to allow the effects of previous rounds to be reflected in the measurements upon which the termination decisions are based (See [I-D.satoh-pcn-performance-termination] and sections 4.2 and 4.3 of [Menth08-sub-9].) If the egress node has supplied a list of flow identifiers (Section 3.2.2), the decision point first looks to terminate flows from that list. Flow selection may be guided by policy in specific cases. 3.4. Behaviour of the Ingress Node In a specific deployment, the PCN-ingress-node may be the decision point. If so, it carries out the procedures described in the previous section. Aside from those procedures, the PCN-ingress-node has the responsibility to provide the rate of admitted PCN traffic (octets per second) on a specific ingress-egress-aggregate when the decision point must determine how much flow to terminate in that aggregate. The rate that the PCN-ingress-node supplies may be based on a quick sample taken at the time the information is required. It is recommended that such a sample be based on observation of at least 30 PCN packets to achieve reasonable statistical reliability. 4. Identifying Ingress-Egress-Aggregates and Their Edge Points The operation of PCN depends on the ability of the ingress and egress nodes to identify the aggregate to which each flow belongs. The egress node also needs to associate an aggregate with the address of the ingress node for receiving reports, if the ingress node is the decision point. The means by which this is done depends on the packet routing technology in use in the network. In general, classification of individual packets at the ingress node (for enforcement and metering of admission rates) and at the egress node must use the content of the outer packet header. The process may well require configuration of routing information in the ingress and egress nodes. Some cases will be particularly challenging, as when a packet is carried by an Charny, et al. Expires April 29, 2010 [Page 9] Internet-Draft PCN CL Boundary Node Behaviour October 2009 MPLS tunnel through the ingress node to some node short of the egress node, and then turns into an ordinary IP packet. 5. Specification of Diffserv Per-Domain Behaviour This section provides the specification required by [RFC3086] for a per-domain behaviour. 5.1. Applicability This section draws heavily upon points made in the PCN architecture document, [RFC5559]. The PCN CL boundary node behaviour specified in this document is applicable to inelastic traffic (particularly video and voice) where quality of service for admitted flows is protected primarily by admission control at the ingress to the domain. In exceptional circumstances (e.g. due to network failures) already-admitted flows may be terminated to protect the quality of service of the remainder. The CL boundary node behaviour is less likely to terminate too many flows under such circumstances than the SM boundary node behaviour ([I-D.SM-edge-behaviour]). 5.2. Technical Specification The technical specification of the PCN CL per domain behaviour is provided by the contents of [RFC5559], [ID.PCN-baseline], [ID.PCN-marking], the specification of the encoding extension (e.g. [ID.PCN3state], [ID.PCN3in1]), and the present document. 5.3. Attributes TBD -- basically low loss, low jitter. Low delay would be nice but has to be quantified 5.4. Parameters TBD. Don't think RFC 3068 is looking for the list of configurable parameters given in the architecture document. 5.5. Assumptions Assumed that a specific portion of link capacity has been reserved for PCN traffic. Assumed that recovery from overloads by flow termination should happen within 1-3 seconds. Charny, et al. Expires April 29, 2010 [Page 10] Internet-Draft PCN CL Boundary Node Behaviour October 2009 5.6. Example Uses The PCN CL behaviour may be used to carry real-time traffic, particularly voice and video. 5.7. Environmental Concerns TBD 5.8. Security Considerations Please see the security considerations in Section 6 as well as those in [RFC2474] and [RFC2475]. 6. Security Considerations [RFC5559] provides a general description of the security considerations for PCN. This memo introduces no new considerations. 7. IANA Considerations This memo includes no request to IANA. 8. Acknowledgements The content of this memo bears a family resemblance to [ID.briscoe-CL]. The authors of that document were Bob Briscoe, Philip Eardley, and Dave Songhurst of BT, Anna Charny and Francois Le Faucheur of Cisco, Jozef Babiarz, Kwok Ho Chan, and Stephen Dudley of Nortel, Giorgios Karagiannis of U. Twente and Ericsson, and Attila Bader and Lars Westberg of Ericsson. Ruediger Geib, Philip Eardley, and Bob Briscoe have helped to shape the present document with their comments. 9. References 9.1. Normative References [ID.PCN-baseline] Moncaster, T., Briscoe, B., and M. Menth, "Baseline Encoding and Transport of Pre-Congestion Information (Work in progress)", September 2009. Charny, et al. Expires April 29, 2010 [Page 11] Internet-Draft PCN CL Boundary Node Behaviour October 2009 [ID.PCN-marking] Eardley, P., "Metering and marking behaviour of PCN-nodes (Work in progress)", August 2009. [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998. [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, December 1998. [RFC5559] Eardley, P., "Pre-Congestion Notification (PCN) Architecture", RFC 5559, June 2009. 9.2. Informative References [I-D.SM-edge-behaviour] Charny, A., Zhang, J., Karagiannis, G., Menth, M., and T. Taylor, "PCN Boundary Node Behaviour for the Single Marking (SM) Mode of Operation (Work in progress)", October 2009. [I-D.babiarz-pcn-explicit-marking] Liu, X. and J. Babiarz, "Simulations Results for 3sM (expired Internet Draft)", July 2007. [I-D.satoh-pcn-performance-termination] Satoh, D., Ueno, H., and M. Menth, "Performance Evaluation of Termination in CL-Algorithm (Work in progress)", July 2009. [I-D.zhang-pcn-performance-evaluation] Zhang, X., "Performance Evaluation of CL-PHB Admission and Termination Algorithms (expired Internet Draft)", July 2007. [ID.PCN3in1] Briscoe, B., "PCN 3-State Encoding Extension in a single DSCP (Work in progress)", July 2009. [ID.PCN3state] Moncaster, T., Briscoe, B., and M. Menth, "A PCN encoding using 2 DSCPs to provide 3 or more states (Work in progress)", April 2009. [ID.briscoe-CL] Charny, et al. Expires April 29, 2010 [Page 12] Internet-Draft PCN CL Boundary Node Behaviour October 2009 Briscoe, B., "An edge-to-edge Deployment Model for Pre- Congestion Notification: Admission Control over a DiffServ Region (expired Internet Draft)", 2006. [Menth08-sub-9] Menth, M. and F. Lehrieder, "PCN-Based Measured Rate Termination", July 2009, . [Menth08f] Menth, M. and F. Lehrieder, "Performance Evaluation of PCN-Based Admission Control", in Proceedings of the 16th International Workshop on Quality of Service (IWQoS)", June 2008, . [RFC3086] Nichols, K. and B. Carpenter, "Definition of Differentiated Services Per Domain Behaviors and Rules for their Specification", RFC 3086, April 2001. Authors' Addresses Anna Charny Cisco Systems 300 Apollo Drive Chelmsford, MA 01824 USA Email: acharny@cisco.com Fortune Huang Huawei Technologies Section F, Huawei Industrial Base, Bantian Longgang, Shenzhen 518129 P.R. China Phone: +86 15013838060 Email: fqhuang@huawei.com Charny, et al. Expires April 29, 2010 [Page 13] Internet-Draft PCN CL Boundary Node Behaviour October 2009 Georgios Karagiannis U. Twente Phone: Email: karagian@cs.utwente.nl Michael Menth University of Wuerzburg Am Hubland Wuerzburg D-97074 Germany Phone: +49-931-888-6644 Email: menth@informatik.uni-wuerzburg.de Tom Taylor (editor) Huawei Technologies 1852 Lorraine Ave Ottawa, Ontario K1H 6Z8 Canada Phone: +1 613 680 2675 Email: tom111.taylor@bell.net Charny, et al. Expires April 29, 2010 [Page 14]