Package com.sapportals.wcm.util.uri
Contains interfaces and classes to handle uniform resource identifiers (uri).
Interface Summary
| IHierarchicalUri | An RFC 2396 URI interface. |
| IRidIterator | This interface defines an iterator for an IRidList. Copyright (c) SAP AG 2001-2002 |
| IRidList | This interface defines a list of RID instances. |
| IRidSet | This interface defines a set of RID instances. |
| IUri | An RFC 2396 URI interface. |
| IUriIterator | This interface defines an iterator for an IUriList. |
| IUriList | This interface defines a list of URI instances. |
| IUriReference | An RFC 2396 URI reference interface. |
Package com.sapportals.wcm.util.uri Description
Contains interfaces and classes to handle uniform resource identifiers (uri).
Package Specification
The uri package handles the two worlds of RFC 2396 uris and WCM uris
and how they interact.
RFC 2396 Uris
Internet Uris are defined in
RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax.
This RFC defines the following terms which are explained here for quick reference:
- uri:
is a string of characters used as identifier for a resource.
RFC 2396 defines how such a string can be composed, broken into components
and combined again. The most important concept is that each uri has a scheme.
The scheme of a uri defines how the rest of the uri is to be interpreted.
Example: mailto:info@sapportals.com
- hierarchical uri:
is a special kind of uri which has hierarchical
components following the scheme. Basically the uri resembles a file system
path, separating individual components with a slash character. The hierarchical
component is often named path. It often consists of a network component,
e.g. host name and port, and an absolute path component, starting with '/'.
"http" is probably the best known hierarchical uri scheme.
Thus, an http uri is a hierarchical uri.
Example: http://sapportals.com/news
- uri reference:
is a reference to a resource or even to a certain part of a resource. A reference
is given inside one document (e.g. an HTML page) to identify (point to) another
resource. A reference uri can have a postfix, starting with the '#' character,
which identifies a certain part of a resource. This postfix is known as a
fragment identifier. The fragment identifier can be missing in a uri reference.
Only uri references have fragment identifiers. The uri always identifies
the complete resource.
Example: ../index/c.html#17
- absolute uri reference:
This is the simplest form of a uri reference. It consists of a uri with
an optional fragment identifier. An absolute uri reference stands for itself.
There is no context needed to interpret it.
Example: http://sapportals.com/news#top
- relative uri reference:
are incomplete uris with an optional fragment identifier. Often the scheme
and the network path are missing. Relative uri references need a context uri
to be converted to an absolute uri reference. Such a context is called a
base uri and the operation to convert a relative uri reference to
an absolute one is called resolving.
Example: news#top
- resolving uris:
Every uri reference can, together with a base uri, be resolved to an absolute
uri reference. When the reference is absolute, the resolving is the identity
operation. When the reference is relative, RFC 2396 explains in detail how
the operation is to be performed. Most often, the path component is relative
and resolving it is similar to converting a relative file name to an absolute
one.
Example: ../news#top with base: http://sapportals.com/products/ resolves
to http://sapportals.com/news#top
- uri encoding:
is a misleading term. What most people mean is that certain characters are
represented by their octet value(s), escaped as '%' plus two hex digits, inside
a uri. For example the space character is represented by '%20' inside uris.
Not only special characters such as '?' and '#' need escaping, but also
all characters which are not representable in US-ASCII.
Coming back to the misleading part: "uri encoding" gives rise to the
assumption that there are also unencoded (decoded) uris. This is wrong.
When you convert a '%20' back to a space, you no longer have a uri.
This stems from the fact that decoding the escaped characters is not always
reversible. Example: 'news%23top' is a valid uri reference without a fragment
identifier. When you decode the octets, you get the string 'news#top' which
also is a uri reference, but for a completely different resource! Even worse,
when you have 'news%23top#1' it decodes to the string 'news#top#1'. Now,
which uri reference was it: 'news#top%231', 'news%23top#1' or 'news%23top%231'?
Some words on encoded chars in URIs: the historical understanding was that
the character encoding to use is ISO-8859-1. So the German umlaut 'Ä' would
be encoded as '%c4'. But this is not enough to put the Euro sign '€' or
Chinese characters into URIs. The general consensus nowadays is to use
UTF-8 character encoding, which means that 'Ä' is encoded as '%c3%84'.
The URICodec in this package is prepared to handle UTF-8 encoding and
falls back to ISO-8859-1 when the octets are not valid UTF-8.
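The terms above can be tried out with the JDK alone. The following is a minimal sketch that assumes java.net.URI as a stand-in for the classes of this package; the class name Rfc2396TermsSketch and the helpers resolve and escapeAll are hypothetical and not part of the WCM API. It shows resolving a relative reference, the raw versus decoded view of an escaped '#', and the effect of the character encoding on the escaped octets.

```java
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; it uses only JDK classes, not this package.
class Rfc2396TermsSketch {

    // Resolve a (possibly relative) uri reference against a base uri.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    // Escape EVERY byte of the text as %xx in the given character encoding.
    // A real encoder leaves unreserved characters untouched; this sketch
    // escapes all bytes so the effect of the charset is easy to see.
    static String escapeAll(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%%%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Relative reference: resolved like a relative file name.
        System.out.println(resolve("http://sapportals.com/products/", "../news#top"));
        // Absolute reference: resolving is the identity operation.
        System.out.println(resolve("http://sapportals.com/products/",
                                   "http://sapportals.com/news#top"));

        // An escaped '#' belongs to the path; decoding it loses that information.
        URI escaped = URI.create("news%23top");
        System.out.println(escaped.getRawPath());   // news%23top
        System.out.println(escaped.getPath());      // news#top
        System.out.println(escaped.getFragment());  // null

        // The same character yields different octets per character encoding.
        System.out.println(escapeAll("\u00c4", StandardCharsets.ISO_8859_1)); // %c4
        System.out.println(escapeAll("\u00c4", StandardCharsets.UTF_8));      // %c3%84
    }
}
```

Note that the decoded path 'news#top' printed above is a plain string, not a uri reference; parsing it again would yield the path 'news' with the fragment 'top'.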
WCM URIs (Resource IDs)
WCM URIs (a better name would be Resource IDs, or RIDs) are used to identify
WCM resources on a single server. In contrast to internet uris they
- do not have a scheme,
- do not have a network path component (host name),
- have Unicode as character set and no encoded octets in the path.
What they have in common with internet uris:
- they are hierarchical,
- they identify resources,
- can have a query part separated from the path by a question mark.
Examples: '/documents/report.doc' or '/tmp/Währung-in-€.txt'.
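To make the difference between a Unicode path and an escaped internet uri path concrete, here is a small sketch. It assumes java.net.URI from the JDK in place of the classes of this package, and the names RidPathSketch, toEscapedPath and toUnicodePath are hypothetical. The JDK escapes non-ASCII characters as UTF-8 octets, which matches the consensus described above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration class; it uses only JDK classes, not this package.
class RidPathSketch {

    // Unicode path (as in a resource id) -> escaped path for an internet uri.
    static String toEscapedPath(String unicodePath) {
        try {
            // The multi-argument constructor quotes illegal characters;
            // toASCIIString() additionally escapes non-ASCII characters as UTF-8.
            return new URI(null, null, unicodePath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Escaped internet uri path -> Unicode path without escaped octets.
    static String toUnicodePath(String escapedPath) {
        return URI.create(escapedPath).getPath();
    }

    public static void main(String[] args) {
        // "\u00e4" is 'ä' and "\u20ac" is the Euro sign.
        String rid = "/tmp/W\u00e4hrung-in-\u20ac.txt";
        String escaped = toEscapedPath(rid);
        System.out.println(escaped); // /tmp/W%C3%A4hrung-in-%E2%82%AC.txt
        System.out.println(toUnicodePath(escaped).equals(rid)); // true
    }
}
```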
Query Part in WCM URIs
The string representation of a query part in WCM URIs is the same as
in RFC 2396 URIs. The reason for this is that a string representation
always needs escaped characters (to represent '=' in values for example),
so the RFC 2396 method of encoding a query is applied to WCM URIs as well.
The WCM URI class has access methods to get/set a query and handle the
correct en-/decoding of query parameters internally. See
URI for details.
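Why a query string needs escaped characters can be sketched as follows. This example does not use the URI class of this package; it assumes the JDK's URLEncoder and URLDecoder (Java 10 or later for the Charset overloads), which apply HTML form encoding and therefore write a space as '+' rather than '%20'. The class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; it uses only JDK classes, not this package.
class QueryStringSketch {

    // Build "name=value&name=value", escaping '=' and '&' inside names and values.
    static String toQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Split on the unescaped separators first, then decode each part.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("filter", "a=b&c");
        String query = toQuery(params);
        System.out.println(query);             // filter=a%3Db%26c
        System.out.println(parseQuery(query)); // {filter=a=b&c}
    }
}
```

Splitting before decoding is the important design choice here: decoding first would turn the escaped '=' and '&' back into separators and change the meaning of the query.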
Mapping of WCM to/from Internet Uris
This package provides mapping functionality to convert WCM uris to Internet uris
and vice versa.
When an HTTP server/servlet wants to expose WCM URIs (resource ids) as HTTP
uris, it just needs to know where in the HTTP uri hierarchy the WCM resources
should be made available.
Example: A servlet resides at 'http://sapportals.com/apps/service' and wants
to map request uris to WCM resources. The request for '/apps/service/info/index.html'
should be mapped to the WCM resource '/info/index.html'.
The servlet would do the following when servicing a request:
// Url of the servlet itself, computed on every request
HttpUrl m_servletUrl = new HttpUrl(
    m_request.getScheme(), m_request.getServerName(), m_request.getServerPort(),
    m_request.getContextPath() + m_request.getServletPath(), null);
// Request uri and optional query string as a uri reference
IUriReference ref = new UriReference(m_request.getRequestURI(),
    m_request.getQueryString(), null);
// Map the reference to the WCM URI (resource id)
URI uri = m_servletUrl.mapToWcmPath(ref);
if (uri == null) {
    // request uri outside servlet uri? Should not happen
}
First, the http url of the servlet itself is determined. Note that your servlet
can be accessible under many different host names and ports, with or
without https. So it is wise to calculate the servlet url on every request.
Next the request uri and possible query string are placed in an IUriReference
object. As the last step, the uri reference is mapped, using the servlet's
uri as base, to the WCM URI (resource id).
The reverse mapping would also use the servlet url. Given a WCM URI, the
servlet would either generate an http url or an absolute uri reference to
hand out to the client (for example as href in a HTML document):
URI wcmpath = resource.getURI();
IUri url = m_servletUrl.mapToAbsoluteUri(wcmpath);
// or
IUriReference ref = m_servletUrl.toAbsolutePath(wcmpath);
This code would convert the resource id '/info/index.html', given the servlet
url 'http://sapportals.com/apps/service' to:
- http://sapportals.com/apps/service/info/index.html (IUri)
- /apps/service/info/index.html (IUriReference)
See IHierarchicalUri for further information.
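The idea behind both mapping directions can be sketched with plain strings. This is not the implementation used by HttpUrl; it is a simplified illustration that assumes java.net.URI for the servlet url, ignores query strings and character escaping, and uses hypothetical method names (requestPathToRid, ridToUrl, ridToAbsolutePath).

```java
import java.net.URI;

// Hypothetical illustration class; it uses only JDK classes, not this package.
class ServletMappingSketch {

    // Request path -> resource id path, or null when the request path
    // lies outside the servlet's place in the uri hierarchy.
    static String requestPathToRid(URI servletUrl, String requestPath) {
        String prefix = servletUrl.getPath();
        if (requestPath.equals(prefix)) {
            return "/"; // the servlet url itself maps to the root
        }
        if (requestPath.startsWith(prefix + "/")) {
            return requestPath.substring(prefix.length());
        }
        return null;
    }

    // Resource id path -> absolute http url below the servlet url.
    static URI ridToUrl(URI servletUrl, String ridPath) {
        return URI.create(servletUrl.toString() + ridPath);
    }

    // Resource id path -> absolute path reference (no scheme, no host).
    static String ridToAbsolutePath(URI servletUrl, String ridPath) {
        return servletUrl.getPath() + ridPath;
    }

    public static void main(String[] args) {
        URI servlet = URI.create("http://sapportals.com/apps/service");
        System.out.println(requestPathToRid(servlet, "/apps/service/info/index.html"));
        System.out.println(requestPathToRid(servlet, "/other/index.html")); // null
        System.out.println(ridToUrl(servlet, "/info/index.html"));
        System.out.println(ridToAbsolutePath(servlet, "/info/index.html"));
    }
}
```

A real mapping additionally has to escape the Unicode characters of the resource id when producing an internet uri, and to decode escaped octets in the opposite direction.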
Copyright © 2004 by
SAP AG. All Rights Reserved.