...making Linux just a little more fun!

September 2007 (#142):


NewsBytes

By Howard Dyckoff and Samuel Kotel Bisbee-vonKaufmann



Please submit your News Bytes items in plain text; other formats may be rejected. A one- or two-paragraph summary plus a URL has a much higher chance of being published than an entire press release. Submit items to bytes@linuxgazette.net.


News in General

Citrix to Acquire XenSource for $500 Million

Citrix Systems agreed to acquire XenSource, an open source leader in virtual infrastructure solutions. Originally created at the University of Cambridge, the Xen virtualization "engine" is now developed collaboratively by an active open source community of senior engineers at many of the industry's most innovative infrastructure companies, including leading hardware vendors like Intel, IBM, HP and AMD. This open collaborative approach significantly accelerates the innovation of the Xen engine, leading to continual state-of-the-art improvements in performance, scalability and cross-platform support. The next-generation Xen architecture is widely acknowledged for its industry-leading performance, efficiency, security and native support for the latest hardware-assisted virtualization features. The acquisition announcement follows a substantial new release of XenEnterprise, the company's flagship commercial product line powered by the Xen engine.

With this deal, it appears that the Xen hypervisor and virtualization technology will tilt even more toward Windows use and more interplay with Microsoft's Viridian technology. It may also allow Citrix to offer more than just plain Windows sessions with its terminal servers. Other virtualization technologies, like OpenVZ, may get more attention in the wake of this acquisition.

LinuxWorld Awards Go to Ubuntu, EnterpriseDB, and Unicon

LinuxWorld.com and IDG World Expo, producers of major trade shows and events, announced the winners of the LinuxWorld.com "Product Excellence Awards" at the recent San Francisco LinuxWorld Conference.

The awards distinguish product and service innovations by LinuxWorld exhibitors in different areas, including an overall "Best of Show" award. "Best of Show" went to Unicon Systems for its System-on-Display product, the world's smallest Linux computer with a touch screen. The winners were recognized during a ceremony at the annual LinuxWorld and Next Generation Data Center Conference & Expo in San Francisco. System-on-Display (SoD) is an ultra-slim Chip-on-Film (CoF) platform based on an ARM9 CPU, running full Linux 2.6.19 and attached to the back of a desirably sized touch screen. The reference design based on SoD offers multiple connectivity options, including WiFi, high-speed USB, GSM, and Bluetooth, plus security features.

Ubuntu was awarded Best Desktop Solution. Other notable winners included EnterpriseDB Advanced Server 8.2 for Best Database Solution. A complete list of winners is available here: http://www.linuxworldexpo.com/live/12/media//news/CC915442

IDG World Expo, the hosts of LinuxWorld and the Next Generation Data Center Conference (NGDC), announced that the two conferences attracted more than 11,000 participants and 200 exhibitors. The events showcased open source and data center technologies with more than 100 educational sessions featuring desktop Linux, mobile Linux, and virtualization in the data center.

Ubuntu also won Enterprise Open Source Magazine's Readers' Choice Award for the "Best Linux Distribution," winning a vote by members of the open source community. The award was announced at the 2007 Enterprise Open Source Conference in New York.

IBM Consolidates ~4,000 Servers with Linux

In a major transformation of its worldwide data centers, IBM launched an effort to consolidate almost 4,000 computer servers onto about 30 refrigerator-sized "System z" mainframes running Linux, a ratio of 133:1. This mainframe and virtual server environment will consume about 80 percent less energy and save more than $250 million over five years in energy, software and system support costs. The initiative is part of "Project Big Green", a broad commitment that IBM announced in May to sharply reduce data center energy consumption and make infrastructure more flexible to evolving business needs. IBM has over 8,000,000 square feet of data center space (equivalent to 139 football fields) and operates the world's largest data center operations, with major locations in New York, Connecticut, Colorado, the United Kingdom, Japan, and Australia. IBM's new global infrastructure will run on Linux and support over 350,000 users.

"As one of the world's largest technology providers, IBM consistently assesses how our systems can be maximized to support our employees and clients," said Mark Hennessy, Vice President and Chief Information Officer of Enterprise On Demand Transformation. "A global account consolidation truly demonstrates that IBM is committed to driving stronger energy and technology optimization, and cost savings."

The 3,900 servers - almost 25% of the 16,000 servers IBM runs worldwide - will be recycled by IBM Global Asset Recovery Services, which will process and properly dispose of the reclaimed systems. Newer units will be refurbished and resold through IBM's sales force and partner network, while older systems will be harvested for parts or sold for scrap. Prior to disposition, the machines will be scrubbed of all sensitive data, and any unusable e-waste will be disposed of following environmentally compliant processes.

The IBM mainframe's ability to run the Linux operating system is key to the consolidation project, and leverages the ability of a single mainframe to behave as hundreds of individual servers. Each of these virtual servers acts like a physical machine, and the links between them, provided by HiperSockets technology on System z servers, provide faster communication than Ethernet links. The migration plan uses only a portion of each mainframe, leaving room for future growth.

More information on IBM's data center consolidation is available at: http://www-03.ibm.com/systems/optimizeit/cost_efficiency/energy_efficiency/services.html
Project Big Green information is at: http://www.ibm.com/press/greendatacenter

IBM Expands Support for the Solaris OS on x86 Systems

Turning the data center world upside-down, IBM will now distribute the Solaris Operating System (OS) and Solaris Subscriptions for select x86-based IBM clients. The agreement is an extension of IBM's existing support for the Solaris OS on select IBM BladeCenter servers.

One reason that IBM is now supporting Solaris is that the OS is supported on more than 820 x86 platforms and runs more than 3,000 unique x86 applications, including IBM WebSphere, Lotus, DB2, Rational, and Tivoli. IBM and Sun's support of interoperability via open standards also helps customers by connecting new platforms easily while preserving their initial investments.

As part of the deal, Sun and IBM will invest in testing and system qualification so joint customers will realize Solaris' leading performance and reliability on BladeCenter and System x servers. IBM servers that will support the Solaris OS include: IBM BladeCenter HS21 and LS41, and IBM System x3650, System x3755, and System x3850 servers.

In the teleconference announcing the deal, Sun CEO Jonathan Schwartz called it "...a tectonic shift in the market landscape."

"IBM is the first major x86 vendor to have such an agreement with Sun, and the first big vendor apart from Sun to offer Solaris on blade servers. Today we expand that agreement to help clients migrate to Solaris on IBM x86-based System x servers," said Bill Zeitler, Senior Vice President and Group Executive for the IBM Systems and Technology Group.

The Solaris OS was recently open-sourced and offers Solaris ZFS, Predictive Self-Healing, and Solaris Dynamic Tracing (DTrace) to improve uptime and cut operational costs.

IBM's "Implementing Sun Solaris on IBM BladeCenter Servers" Redbook on-line: www.redbooks.ibm.com/abstracts/redp4269.html

IBM and Novell Join Forces on Open Source Collaboration

At the opening of LinuxWorld in San Francisco, IBM and Novell announced that they will join forces in the growing open source application server market. Under the agreement Novell will deliver and support WebSphere Application Server Community Edition (WAS CE) as part of SUSE Linux Enterprise Server. The two companies will also work on an open collaboration client for Linux desktop users.

The agreement comes on the heels of the one millionth distribution of WAS CE, which is based on Apache Geronimo and free to download and use. IBM and Novell will offer support and migration tools to help customers using JBoss to quickly and easily move to WAS CE. Combining WAS CE with SUSE Linux Enterprise Server from Novell makes a strong combination for the small and medium-size business market, where Novell excels, and a compelling offering for enterprise customers with distributed computing environments. IBM and Novell will team up on joint marketing and sales campaigns targeting customers around the world.

The partnership will provide customers with an enterprise-ready open source alternative to JBoss, while developers will have the opportunity to build on a tested platform in WAS CE, with the full support of IBM, Novell, and the open source community.

"Customers today are looking for an integrated solution to solve their application development needs," said Roger Levy, Senior Vice President and General Manager of Open Platform Solutions for Novell. "Novell and IBM have partnered to bring customers a best-in-class experience by delivering two powerful software platforms through one channel. Now, when customers need support for their open source operating system or their open source application server, they can get it with one phone call to Novell."

In addition to the agreement with Novell, IBM is introducing WAS CE 2.0. WAS CE 2.0 will have full Java EE 5 standard support. A research report from Evans Data found that WAS CE is growing quickly and has gained market share nearly three times as fast as JBoss in 2006.

WAS CE 2.0 will be available later this year.

WebSphere: www.ibm.com/software/webservers/appserv/community

Conferences and Events

Calls for Papers

SCALE 6x
Main conference: http://www.socallinuxexpo.org/scale6x/documents/scale6x-cfp.pdf
Women in Open Source mini-conference: http://www.socallinuxexpo.org/scale6x/documents/scale6x-wios-cfp.pdf
Open Source in Education mini-conference: http://www.socallinuxexpo.org/scale6x/documents/scale6x-education-cfp.pdf

September

LinuxWorld Conference & Expo
September 3 - 7, 2007; Beijing; http://www.linuxworldchina.com

Linux Kernel '07 Developers Summit
September 4 - 6; Cambridge, U.K.; http://www.usenix.org/events/kernel07/

1st International LDAPv3 Conference
September 6 - 7, 2007; Cologne, Germany; http://www.guug.de/veranstaltungen/ldapcon2007/

Rich Web Experience Conference
September 6 - 8; Fairmont Hotel, San Jose, CA

BEAWorld 2007
September 10 - 12; Moscone Convention Center, San Francisco, CA; http://www.bea.com/beaworld/us/index.jsp?PC=DEVELOPER

VMWorld 2007
September 11 - 13; Moscone Convention Center, San Francisco, CA; www.vmware.com/vmworld/

Mozilla 24
September 15 - 16; 24hr community web event; http://www.mozilla24.com

IT SECURITY WORLD 2007
September 17 - 19; Fairmont Hotel, San Francisco, CA; http://www.misti.com/default.asp?Page=65&Return=70&ProductID=7154

RailsConf Europe 2007
September 17 - 19; Berlin, Germany

Gartner Open Source and Web Innovation Summits
September 17 - 21; Las Vegas, NV; https://www.gartner.com/EvReg/evRegister?EvCd=OS3

Intel Developer Forum 2007
September 18 - 20; Moscone Center West, San Francisco, CA; http://developer.intel.com/IDF

Software Development Best Practices 2007
and Embedded Systems Conference
September 18 - 21; Boston, MA; http://www.sdexpo.com/2007/sdbp

RFID World: Boston
September 19 - 20; Boston, MA; http://www.shorecliffcommunications.com/boston

SecureWorld Expo 2007 San Francisco
September 19 - 20; South San Francisco Conference Center; https://secureworldexpo.com/rsvp/index.php
Discount code - SNFEBG7

AJAXWorld Conference West 07
September 24 - 26; Santa Clara, CA; http://www.ajaxworld.com
Discount registration code: AJAX5000

Digital ID World 07
September 24 - 26; Hilton Hotel, San Francisco, CA; https://orders.cxo.com/conferences/enroll.html?conferenceID=8
$1,000 off with priority code "radeb"

Semantic Web Strategies Conference 2007
September 30 - October 1; San Jose Marriott, San Jose, CA; http://www.semanticwebstrategies.com

October

BEAWorld 2007 Barcelona
October 2 - 4; Palau de Congressos de Catalunya; http://www.bea.com/beaworld/es/index.jsp?PC=1AUGAATDM

Zend/PHP Conference & Expo 2007
October 8 - 11; San Francisco, California; http://www.zend.com/store/zend_php_conference?emlstr=en-early-bird-0708

Designing and Building Business Ontologies
October 9 - 12; San Francisco, California; http://www.wilshireconferences.com/seminars/Ontologies/

Ethernet Expo 2007
October 15 - 17; Hilton New York, New York; http://www.lightreading.com/live/event_information.asp?survey_id=306

ISPCON FALL 2007
October 16 - 18; San Jose, CA; http://www.ispcon.com

Interop New York
October 22 - 26; http://www.interop.com

LinuxWorld Conference & Expo
October 24 - 25; London, UK; http://www.linuxworldexpo.co.uk

November

CSI 2007
November 3 - 9; Hyatt Regency Crystal City, Washington, D.C.; http://www.csiannual.com

Interop Berlin
November 6 - 8; Berlin, Germany; http://www.interop.eu

Supercomputing 2007 (SC07)
November 10 - 16; Reno, NV; http://sc07.supercomputing.org

Oracle OpenWorld San Francisco
November 11 - 15; San Francisco, CA; http://www.oracle.com/openworld

Gartner - 26th Annual Data Center Conference
November 27 - 30; Las Vegas, NV

Distro News

OpenSUSE Beta and New Build Service

On the second anniversary of the openSUSE project in August, the community program marked two new milestones: the availability of the first Beta of openSUSE 10.3 and an expansion of the openSUSE Build Service.

openSUSE 10.3 offers a state-of-the-art operating system based on Linux kernel 2.6.22, with a large variety of the latest open source applications for desktops, servers, and application development. The first Beta of openSUSE 10.3 is now available at http://www.opensuse.org/download.

The new openSUSE Build Service provides an infrastructure for software developers to easily create and compile packages for multiple Linux distros. In the first version of the build service, users of many distros - openSUSE, SUSE Linux Enterprise, Fedora, Debian, Ubuntu, or Mandriva - can search and browse for new software for their distribution. Users of the upcoming openSUSE 10.3 can install their software with one click, directly from the Web interface. In the past four months more than 13 million packages have been downloaded from the openSUSE Build Service as developers build packages for various distributions using the tool.

The openSUSE Build Service is an open source build system that helps developers provide high quality packages for multiple distributions from the same source code. With the system imaging tool, KIWI, open source developers can quickly build a Linux distribution that meets their needs, rigorously test it to ensure product quality, and easily package it for quick installation.

The openSUSE Build Service is completely open source, giving developers and users free and full access to build their choice of Linux packages. Those packages may be based on openSUSE, SUSE Linux Enterprise, Fedora, Debian, Ubuntu, or other projects.

openSUSE Build Service: http://www.opensuse.org/Build_Service

First Debian 4.0 "Etch" Update Released

The Debian project has announced the availability of the first point revision of Debian GNU/Linux 4.0, code name "Etch". This update adds security updates to the stable release, together with a few corrections to serious problems and miscellaneous bug fixes.

This release for Etch includes an updated installer with the following changes: the installer kernels have been updated to ABI 2.6.18-5, the mirror list has been updated, support has been added for certain USB CD drives that were not being detected, the incorrect setup of GKSu when the user chooses to install with the root account disabled has been fixed, and the vdrift package has been removed.

MEPIS Begins a Return to Debian With Version 7 pre-Beta

SimplyMEPIS 6.9.51 pre-Beta is a preview of upcoming SimplyMEPIS 7. It is available from the MEPIS subscriber site and the MEPIS public mirrors.

MEPIS has discontinued using Ubuntu binary packages. Instead, it combines MEPIS-packaged binaries, built from Debian and Ubuntu source code, with a Debian Stable OS core and extra packages from the Debian package pools.

Warren Woodford of MEPIS explains the change, "By using the latest Debian and Ubuntu source code for building user applications, we can provide the best latest versions of the applications users want the most. And by building on top of a Debian Stable core, we can provide a release that has the stability and long life that users want."

Warren continues, "Most Linux users are tired of having to reinstall every 6 months in order to have up-to-date applications. We expect that with this approach MEPIS can offer a superior user experience that will be incrementally upgradeable for 2 years without reinstallation of the OS."

The pre-Beta includes a 2.6.22 kernel, Debian Etch core, KDE 3.5.7, Firefox 2.0.0.5, Thunderbird 2.0.0.4, and OpenOffice 2.2.1. This is an early release with many rough edges. In particular, the 'splashy' boot splash does not run reliably, some extra kernel drivers are not yet compiled, some GUI components are not themed for MEPIS, and the pre-Beta has had very limited testing.

32- and 64-bit ISO images are available in the "testing" subdirectory at the MEPIS Subscriber's Site and at the MEPIS public mirrors.

Damn Small Linux 4.0 RC1 Announced

August saw the first release candidate of Damn Small Linux 4.0 and an update of the current stable branch, the 3.4 series, to version 3.4.1. Both are available here: ftp://ftp.oss.cc.gatech.edu/pub/linux/distributions/damnsmall

Software and Product News

Black Duck Tracks Open Source License Changes

At SF LinuxWorld, Black Duck Software announced protexIP 4.4, which includes features specific to recent GPL license changes. The newest version of Black Duck increases assurance that software is in compliance with licensing requirements via a significantly enhanced KnowledgeBase of open source and vendor-added software components. This includes detailed licensing information for more than 140,000 components, a doubling from the previous version.

Last month, the Free Software Foundation released GPL version 3. By mid-August, more than 382 open source projects had published code under the new license. The protexIP application assists developers and legal counsel in managing the use of code from open source projects, both those that have decided to explicitly switch to GPLv3 and those that have decided not to.

The solution identifies components within open source projects that have decided not to switch to GPLv3, such as the Linux kernel, or have simply not made a decision. protexIP users set policies through the product to dictate whether developers can use code governed by the various licenses, and the solution ensures throughout the development process that licenses governing code are not in conflict with each other or with company policy.

The first release of protexIP 4.4 will be available in September 2007. Pricing is based on the size of the code base managed by protexIP and the number of users accessing the solution. The new version and KnowledgeBase are delivered to existing protexIP customers automatically via a Web update when available.

Magical Realism: SciVee Opens Alpha Website for Videos

Hoping to improve communications in the sciences, a partnership of the Public Library of Science (PLoS), the National Science Foundation (NSF), and the San Diego Supercomputer Center (SDSC) has initiated a service much like YouTube, but for scientists.

According to the SciVee Web page, "SciVee, created for scientists, by scientists, moves science beyond the printed word and lecture theater taking advantage of the Internet as a communication medium where scientists young and old have a place and a voice."

Scientists can upload videos and synch them to online papers and presentations. Other scientists can freely view uploaded presentations and engage in virtual discussions with the author and other viewers. The interaction can then be made available to the Internet as a podcast.

SciVee applies the Creative Commons license to all of the video content.

Alpha video content: http://www.scivee.tv/video
SciVee: http://www.scivee.tv

Lenovo to Sell PCs with SUSE Linux Installed

Lenovo and Novell announced in August an agreement to preload Linux on Lenovo ThinkPad notebook PCs and to provide support for the operating system. The companies will offer SUSE Linux Enterprise Desktop 10 to commercial customers on Lenovo notebooks including those in the popular ThinkPad T Series, a class of notebooks aimed at typical business users. The ThinkPad notebooks preloaded with Linux will also be available for purchase by individual customers.

Lenovo will provide direct support for both the hardware and operating system. Novell will provide maintenance updates for the operating system directly to ThinkPad notebook customers. For several years, Lenovo has Linux-certified its ThinkPad notebook PC line. Lenovo offers Help Center support for SUSE Linux Enterprise Desktop 10 on the ThinkPad T60p.

Lenovo Linux notebooks will be available beginning in the fourth quarter of 2007.

Oracle Offers More Linux Enhancements

Touting its commitment to enhance Linux for all users and emphasizing its goal to not become another distro, Oracle announced at LinuxWorld new projects and code contributions to augment the enterprise-class capabilities of Linux. Oracle positioned itself as a support provider with a history of contributions helping to ensure enterprises' success with Linux. Oracle offered enhancements including: development of a new file system designed for superior scaling at the petabyte level; porting the popular Yet another Setup Tool (YaST) to Oracle Enterprise Linux and the fully compatible Red Hat Enterprise Linux; open-sourcing tools to streamline testing; collaborating on an interface for comprehensive data integrity; and developing a new asynchronous I/O interface to reduce complexity. These contributions will be available under appropriate open source licenses.

Oracle's Chris Mason has developed the Btrfs file system to address the expanding scalability requirements of large storage subsystems. Btrfs will allow enhanced scalability and simplified management for large storage configurations, while also adding flexible snapshotting, fast incremental backups and other features missing from Linux today.

Mason told the Linux Gazette that the work on Btrfs aimed "...to better handle failures of all kinds and to have enough check-summing to find and repair data and meta-data. The goal is to replace ext3 as the main de-facto Linux file system in the future." An alpha release of the Btrfs file system is available under the GPL license at: http://oss.oracle.com/projects/btrfs/.

Oracle is working to replace the existing asynchronous I/O interface in the kernel with a more generic subsystem. The new kernel-based implementation should provide a single access point, allowing most system calls to become asynchronous and thereby reducing complexity at both the kernel and application level. The new subsystem is expected to be faster for Oracle, and is intended to make it easier for other applications to benefit from asynchronous programming under any workload.

Oracle has ported the popular system management tool YaST to Oracle Enterprise Linux and the compatible Red Hat Enterprise Linux (RHEL). Now available under GPL, this code can be freely accessed by anyone. Originally developed and made available under GPL by Novell, YaST has been used with openSUSE and Novell's SUSE Linux Enterprise Server (SLES) to enable easy install and configuration of system software, hardware and networks.

Now available under the GPL and Artistic licenses, the Oracle Linux Test Kit (derived from the Oracle Validated Configurations program) verifies Linux kernel functionality and stability essential for the Oracle Database. The test kit automates steps to define, execute and analyze tests, and includes DBT2 and DBT3 workloads as well as specialized workload simulators. The Oracle Linux Test Kit can be used for running tests on Oracle Enterprise Linux, RHEL, and SLES distributions in a variety of topologies. Server and storage vendors can use it to test and verify specific hardware and software combinations for Oracle deployments on Linux.

Monica Kumar, senior director of Linux and open source at Oracle, told the Linux Gazette that an increasing share of its customers are choosing to run Oracle on Linux. She mentioned a June Gartner Group study, by analyst Graham, that showed Oracle holding 42% of the overall DB market, but 67% on Linux. These numbers were for revenue from commercial DB products, so they understate the presence of MySQL and other non-commercial database products.

Oracle DB revenue on Linux grew at an 87% annual rate, based on sales of $1.9 billion for Oracle on Linux in 2006.

Kumar told the Linux Gazette, "...We have two buckets of customers... the Fortune 500 and prior Oracle customers and we also have a large number of customers who are using open source and Oracle Cluster FS and are getting support from us for both Linux and OCFS. They get support for both at lower price and its better support."

On the issue of forking Linux, Kumar said, "Never... It's not in Oracle's interest to fork... there is no value proposition for us [or] for our customers. Linux is [a] commodity so our team focuses all of our energies on support. We are clear where we can add value and we are clear about not fragmenting Linux."

AMD to Release Barcelona in September

AMD is inviting partners, key customers, and analysts to a special event in mid-September at San Francisco's Presidio. Part of the event takes place at George Lucas's Digital Arts Center.

The long-awaited quad-core Opteron is supposed to pack a processor wallop and be socket-compatible with the current dual-core Opterons. In August, AMD announced Novell certification for the new chip with SUSE Linux and is expecting to have other OS vendors ready to support it. AMD has given samples to OS vendors so they can optimize for new features on the chip, including nested page tables for enhanced virtualization.

Meanwhile, Intel is showing its partners early samples of its new 45-nanometer chips, which are faster and consume 30% less power than the current 65-nanometer family of chips.

CommuniGate Offers Free Usage Licenses

CommuniGate Systems develops Internet communication products based on open standards, packaged into suites aimed at larger companies. In an effort to widen its user base, the company announced on August 7th that it will offer the public free access to its hosted communications platform for 5 years. Users can create accounts at www.TalkToIP.com, CommuniGate's new portal, which offers e-mail, calendar, and secure IM integration with the company's Flash-based Pronto! interface.

Furthermore, the CommuniGate Pro Community Edition is offered as a complimentary download for comparison, allowing up to five users free, full-service accounts. Those five users will be able to manage all of their Internet-based communication technologies - e-mail, VoIP, IM, voice mail, a conferencing server, etc. - with this cross-platform suite.

Full Press Release: http://www.communigate.com/news/c-news_article_08072007.html
CommuniGate Systems: http://www.communigate.com

White Paper on Cost and Data Retention Aware Backup Solutions

On August 13th Asigra announced the availability of a white paper on backup life cycle management, entitled "Managing the Life Cycles of Backups", which it commissioned from Data Mobility Group. An example application would be a business's tax information, which is very important in the year it must be filed but loses importance each subsequent year. However, for accounting's sake, and to remain compliant with the data retention laws that many countries have introduced, that data must be kept in a secure location. According to the white paper, to remain cost efficient the data should be kept in a central location, allowing all of a business's sites to manage it.

It is worth noting that the white paper does not appear to be available to the public, forcing Asigra's customers to rely on the press release for a digest of Data Mobility Group's findings. This is especially questionable given that Asigra often claims its own product was given a favorable review. Asigra did not respond to the Linux Gazette's inquiries in time for this month's issue.

Asigra: http://www.asigra.com
Data Mobility Group: http://www.datamobilitygroup.com

Solutions4ebiz Launches New Web Store for Linux Routers

On August 17th Solutions4ebiz, the Midwest distributor and online retailer of ImageStream Internet Solutions (ImageStream), launched its new online retail site: www.imagestreamsolutions.com. It offers Linux-based routers and network cards that are meant to be simple, pre-configured solutions at the home office and enterprise levels. Also featured are bundle packages with various combinations of routing devices, network cards, device configurations, and service contracts.

Solutions4ebiz: http://www.solutions4ebiz.com
ImageStream Internet Solutions: http://www.imagestream.com

Centric CRM and LoopFuse Integrate Software Products

On August 21st Centric CRM, a developer of customer relationship management (CRM) software, and LoopFuse, a developer of user and web site usage tracking software, announced the integration of their respective products. The two companies each provide a different aspect of marketing, a sector of business that has benefited greatly from the Internet's tracking abilities. Software in this category often provides the ability to track sales leads and current clients (preventing overlap in the sales department) and to calculate return on investment (ROI) for marketing campaigns, all of which LoopFuse and Centric CRM will make available. Their integration will allow client history and usage data processing to interact within one application.

For example, a company can track what products a potential client views, the links they click on, how often they visit, what they purchased, and their exact browsing history on the company's web site. This allows the company to calculate how likely it is that the user will buy a new product, become a client, etc. This requires a closer relationship between the marketing and sales departments, as the data from one passes to the other: the marketing department issues a new advertising campaign and tracks who clicks on those advertisements. The sales department then tracks how many of those leads turn into sales, producing a click-to-purchase ratio. These and other numbers allow marketing to evaluate the success of their investment in the advertising campaign.

[ Centric CRM correctly claims that they provide an open source solution when referencing their product Centric Team Elements. However, this is a very minor product in their catalog. Their primary product is Centric CRM, which is licensed with a proprietary license that is far from open source. Their usage of the "OSI Certified" logo appears to be legitimate, as it is only used in text referencing Centric Team Elements. --S. Bisbee ]

Full press release: http://www.linuxfriends.net/142/misc/lg_bytes/centricpressrelease.txt

Centric CRM: http://www.centriccrm.com
LoopFuse: http://www.loopfuse.com

ITerating Provides New Software Tracking Service

ITerating, a "Wiki-based Software Guide" announced on August 27th that it will be providing up-to-date information on more than 17,000 software products via a free Semantic Web service. This should allow other software information repositories, IT professionals, and other end users to pull ITerating's repository data. For example, users can select specific products and receive a feed of information on the latest updates, drivers, etc. The information is kept up-to-date by its users, utilizing wiki techniques.

"Our goal is to offer the world's first comprehensive software guide that is always up-to-date," said Nicolas Vandenberghe, CEO of ITerating, in the company's press release. "By combining a Wiki format with Semantic Web services, we are able to ensure that the information on the ITerating site is both comprehensive and up-to-date. Now everyone has the opportunity to use this powerful but simple tool to organize, share and combine information about software on the web."

ITerating already offers user reviews and ratings, blogs, and the various filtering methods you would expect for browsing software.

Full press release: http://www.iterating.com/aboutus?page=Press
ITerating: http://www.ITerating.com


W3C's Semantic Web specifications: http://www.w3.org/2001/sw/

Talkback: Discuss this article with The Answer Gang



Howard Dyckoff is a long term IT professional with primary experience at Fortune 100 and 200 firms. Before his IT career, he worked for Aviation Week and Space Technology magazine and before that used to edit SkyCom, a newsletter for astronomers and rocketeers. He hails from the Republic of Brooklyn [and Polytechnic Institute] and now, after several trips to Himalayan mountain tops, resides in the SF Bay Area with a large book collection and several pet rocks.

Howard maintains the Technology-Events blog at blogspot.com from which he contributes the Events listing for Linux Gazette. Visit the blog to preview some of the next month's NewsBytes Events.




Sam was born ('87) and raised in the Boston, MA area. His interest in all things electronic was established early by his electrician father and database designer mother. Teaching himself HTML and basic web design at the age of 10, Sam has spiraled deeper into the confusion that is computer science and the FOSS community. His first Linux install was Red Hat, which he installed on a 233MHz Pentium i686 when he was about 13. He found his way into the computer club in high school at Northfield Mount Hermon, a New England boarding school, which was lovingly named GEECS for Electronics, Engineering, Computers, and Science. This venue allowed him to share in and teach the Linux experience to fellow students and teachers alike. Late in high school Sam was abducted into the Open and Free Technology Community, had his first article published, and became more involved in the FOSS community as a whole. After a year at Boston University he decided the experience was not for him, striking out on his own as a software developer and contractor.


Copyright © 2007, Howard Dyckoff and Samuel Kotel Bisbee-vonKaufmann. Released under the Open Publication License unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 142 of Linux Gazette, September 2007

Preventing Domain Expiration

By Rick Moen

If you study accounting and finance, one of the tidbits taught is that financial fraud (via embezzlement and otherwise) is far more pervasive than anyone's willing to admit. It's a loss syndrome businesses (e.g., banks) see no advantage in mentioning: Having been taken for a ride is pretty embarrassing, after all.

Internet users have an equally pervasive — and oddly similar — problem: accidental Internet domain expiration. Your Linux user group or other nonprofit group (or, hey, even your company) is relying on some vaguely defined chain of command to make sure the domain keeps getting renewed, making the assumption everything's fine as long as no disaster has yet happened (which tactic is called "management by exception" in business school — usually just before they cue the ominous music). Somebody drops the ball, the domain everyone's relying on expires when nobody's looking, and when the dust settles you find that a domain squatter's grabbed it. Yes, there are companies that make domain snatching their core business. They do well at it, too. Too well for my taste.

Victims of such raids sometimes attempt to recover using legal threats, usually trademark-based, and the ICANN Uniform Domain-Name Dispute-Resolution Policy (UDRP) to wrestle back their domains, but it's more common to pay the squatter's ransom: That might range from hundreds to tens of thousands of US dollars, depending on the domain and what the market will bear.

Equally common, though, especially for less wealthy victims, is to just quietly concede, watch the squatter deploy a so-called "search engine" site where your Net presence had been, and move your presence to some entirely new domain name you use as a replacement. Every year, I see this happen to individuals and groups: Suddenly, the established domain is a squatter site, and everyone has a new e-mail address, for reasons nobody wants to discuss.

But there's a better way. It doesn't have to happen. It can be prevented.

First up, out on the Net, I found a nice little Bourne shell script by Ryan Matteson (matty91 at gmail dot com) called "domain-check" (http://prefetch.net/code/domain-check), which queries the public WHOIS data in real time to check pending domain expiration dates, and integrates nicely with cron and optionally SMTP e-mail notification, in order to give responsible parties advance notice of the need to take action. (In this article, as elsewhere, all-caps "WHOIS" refers to the TCP port 43 protocol defined in RFC 3912 for remote information lookup about domain names, etc. Not all TLDs offer that service, as detailed below.)
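
If you're curious, the raw protocol is simple enough to poke at by hand. The following one-liner is purely an illustrative sketch, assuming whois.internic.net as a long-standing WHOIS host for .com/.net and any garden-variety netcat:

$ printf 'linuxmafia.com\r\n' | nc whois.internic.net 43    # RFC 3912: send query + CRLF, read the reply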

Ryan's script's only dependencies are awk, whois, and date, whose executable paths must all be correctly set in the script body (and require fixing on typical Linux systems). Plus, you probably need to add a line defining shell environment variable MAIL to point to a proper system outbound mailer, if you wish to do e-mail advisories. (On my system, /usr/bin/mail fits nicely.)
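
Purely by way of illustration — the variable names below are my guess at the script's conventions, so check your own copy — the adjusted definitions end up looking something like this on a typical Linux box:

AWK="/usr/bin/awk"
WHOIS="/usr/bin/whois"
DATE="/bin/date"
MAIL="/usr/bin/mail"     # added, so the -a/-e e-mail advisories have a mailer to use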

Once you have that set, it's fairly self-explanatory:

$ domain-check -h
Usage: /usr/local/bin/domain-check [ -e email ] [ -x expir_days ] [ -q ]
[ -a ] [ -h ]
          {[ -d domain_namee ]} || { -f domainfile}

  -a               : Send a warning message through email 
  -d domain        : Domain to analyze (interactive mode)
  -e email address : Email address to send expiration notices
  -f domain file   : File with a list of domains
  -h               : Print this screen
  -s whois server  : Whois sever to query for information
  -q               : Don't print anything on the console
  -x days          : Domain expiration interval (eg. if domain_date < days)

Example output:

$ domain-check -d linuxmafia.com

Domain                           Registrar         Status   Expires Days Left
-------------------------------- ----------------- -------- ------- ---------
linuxmafia.com                   TUCOWS INC.       Valid   17-jul-2010   1057 

Ryan's implementation of domain-check has two problems: One is that he has inadvertently made its licence technically proprietary (as of v. 1.4), by failing to include rights to modify or redistribute, in his otherwise generous licence statement. Ryan's aware of this oversight, but hasn't yet fixed it at press time.

The other: It can parse the expiration date fields from only a few top-level domains (TLDs), missing some really important ones such as .ORG. In particular, if you run it with e-mailed output (where it really shines, generally, e.g., running as a weekly cronjob to check a list of domains), it says nothing at all about domains within the many TLDs it simply can't handle.

Mind you, as editor Ben Okopnik and I were testing Ryan's script, we realised that adding to it support for additional TLDs could prove non-trivial, and we respect Ryan's accomplishment, as far as it's gone: A brief survey of the 250 country-code TLDs ("ccTLDs", such as .uk and .us) and 21 generic TLDs ("gTLDs", such as .com, .net, .org, .info, etc.) showed dozens of variations in the way expiration dates and registrar names are presented, each variation needing its own parsing code.
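
To give a sense of what "its own parsing code" means in practice, here's a hedged fragment in the same spirit (not the actual code from either script): the $domain variable is a placeholder, the field labels are merely examples of the variation we saw, and expiry dates that themselves contain colons would need smarter splitting.

expiry=$(whois "$domain" 2>/dev/null | awk -F: '
    /Expiration Date|Expiry date|Renewal date/ { sub(/^[ \t]+/, "", $2); print $2; exit }')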

Ryan might appreciate some help with that task: Experienced shell coders might want to send Ryan patches, especially to fill out its currently rather thin TLD coverage. However, we right away spotted the licensing issue, on top of that — and so, for ourselves, decided to switch tactics.

Introducing Ben's domain-check

Ben Okopnik fired up his mighty Perl kung fu, and crafted a second implementation, likewise called "domain-check", which now is available with GPL licensing terms at my Web site. It works a treat. Here's how it goes from the command line — obviously fashioned after Ryan's good example:

$ domain-check -d=linuxmafia.com
Processing linuxmafia.com... 


Host                    Registrar                           Exp.date/Days left
==============================================================================
linuxmafia.com          TUCOWS, INC.                        17-jul-2010 / 1057

And, of course, it supports the same e-mailed reporting mode that in Ryan's script is so nicely cron-friendly — with the bonus improvement of relying on Perl and a WHOIS client solely, and finding them via PATH without any need to tweak the script.

The Two WHOIS Clients

At present, Ben's domain-check will use, if present, the fairly sophisticated, configurable, and cache-enabled WHOIS client "jwhois" by default, on a hunch that "jwhois" is generally a small bit smarter, and that its caching on disk of recently received WHOIS data is usually an advantage, relative to the regular "whois" (/usr/bin/whois) implementation — with automatic fallback to the latter client. However, the WHOIS client comparison is, upon further examination, a mixed bag. For one thing, "jwhois's" results caching (defaulting to a seven-day retention period) can become a problem: Suppose it's Wednesday today, you last checked your friend's domain example.com on Sunday, and that domain's due to expire this coming Saturday. You run domain-check (and it finds "jwhois"); domain-check reports that your friend's weekend expiration is still looming.

But maybe he/she has (unbeknownst to you) already paid for that renewal, and it took effect yesterday. domain-check won't pick this datum up (while using "jwhois" with 7-day retention), and so issues a false alarm, because it's still using the cache-hit of Sunday's data, now three days old (but already obsolete).

You can ameliorate this situation by, say, reducing the cache period (near the bottom of /etc/jwhois.conf) to 2 hours instead of the default 168 hours = 1 week — but the point is that "jwhois's" default reliance on old data can be misleading.
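
To give the idea — option names and the cache location vary a bit between jwhois versions, so treat this purely as a sketch and check the comments in your own /etc/jwhois.conf — the relevant lines look roughly like:

cachefile = "/var/cache/jwhois/jwhois.db";
cacheexpire = "2";       # hours to keep cached replies; the shipped default is 168 (one week)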

Nor is it always or unambiguously the case that "jwhois" is "a bit smarter". This is where things get interesting (part one, of two). The worldwide Internet domain system's "whois" data, showing contact information for each domain's owners & operators, which registrar it's enrolled through, when it will expire, and at what IPs its DNS nameservers can be found, is (like DNS itself) yet another distributed information system, with "whois" information for each TLD (among those that offer it) publicly accessible (if at all) via the WHOIS protocol, or Web-based lookup methods, or both, querying one or more database servers holding that data.

Which TLDs offer meaningful information lookup via WHOIS, and at what WHOIS server hostnames in each case? If you're reasonably lucky (regarding the six or seven TLDs you typically care about, no matter where in the world you are), the WHOIS client software you use (which on Linux will be either /usr/bin/whois or /usr/bin/jwhois) already has this knowledge built in. However, the various TLD operators, including the administrators of the 250 country-code TLDs, have an unsettling tendency to move things around, change where their WHOIS data is, terminate WHOIS service, start WHOIS service — without (much) notice. They're supposed to inform IANA of all such changes, whereupon IANA would update its TLD information pages (1, 2), but you will be "shocked, shocked!" to hear that compliance is spotty. In parallel to this official process the two client programs' authors attempt to track TLD changes, themselves. Sometimes, one of the two Linux WHOIS clients will reflect (in its auto-selection of the correct WHOIS server for a given TLD, or its claim that none exists) better information than IANA has. Sometimes, IANA has better data (and, if the system really worked, it would have the latest and best — but doesn't). More often than not, the best data are on relevant Wikipedia pages (1, 2, 3, 4). Some of the linked subpages are really entertaining: If your sense of humour is as warped as mine, check out the reasons why ".vrsn-end-of-zone-marker-dummy-record.root" is a valid TLD, and note the reasons why, in 2007, .arpa is still a robust TLD with six active subdomains — and, by the way, a useful WHOIS server.

The biggest reason Ben and I have so far favoured the jwhois client is that its internal knowledge about which WHOIS server to use for particular TLDs and subdomains is highly configurable via configuration file /etc/jwhois.conf (but beware of the mixed blessing of results caching). Whereas, the other WHOIS client is not. However, in the middle of wrestling with both clients, seeking to give domain-check the broadest possible TLD coverage, Ben found it prudent to hack domain-check's parsing code that handles its (optional) file listing which domains to check, to support an optional two-column format: domain, and what WHOIS server hostname to use for that domain. To help users, I've constructed a prototype domains file, showing a test-case host within each supported TLD (often, the "NIC" = network information centre that runs Internet infrastructure for that country or other TLD authority), plus the currently correct WHOIS host for that TLD. I am also maintaining a separate file of more-verbose notes/information, but the latter is intended solely for interested humans, and isn't readable by domain-check.
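
To give the flavour, entries in such a domains file look something like the following — the second column is optional, and the domains and WHOIS hosts here are merely illustrative, so take real values from the prototype file mentioned above:

linuxmafia.com
luv.asn.au             whois.ausregistry.net.au
some-lug.org.uk        whois.nic.uk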

Now, I figure most people who deal in domains are following this account without major problems, but a minority of readers may be thinking "What's this about determining expiration data via WHOIS?", and a smaller minority are still stuck on "What's this about domains expiring?" I should explain:

It's a Wacky World, out There

(This is part two of "Where things get interesting".)

One of the reasons I really enjoy travelling to remote and diverse parts of the world, on occasions when I have time and money for it, is that you encounter people living their lives using, quite naturally, radically different basic assumptions, sometimes assumptions differing in subtle but important ways. In return, you're rewarded with the cheerful fact that you and your people will tend to strike other nations as slightly odd and nutty, too — and may even agree. (An American comedian and entertainer named Garrison Keillor and his radio programme "A Prairie Home Companion" finally made me realise, similarly, that my own crowd of Scandinavian-Americans are extremely quirky people — manias for strong coffee and white fish, going nuts on Midsummer Day, mocking self-deprecation, and all.)

Getting back to the subject, exploring WHOIS data can earn you that same shock of unexpected strangeness, right at home. One of my first test cases for the unfolding development of domain-check was .au, i.e., our esteemed friends in Australia. Hmm, I thought, why not check Linux Users of Victoria?

$ whois luv.asn.au | more
Domain Name:             luv.asn.au
Last Modified:           Never Updated
Registrar ID:            R00016-AR
Registrar Name:          Connect West
Status:                  OK

Registrant:              Linux Users of Victoria Inc.
Registrant ID:           None given

Eligibility Type:        Other

Registrant ROID:         C0793852-AR
Registrant Contact Name: THE MANAGER
Registrant Email:        Visit whois.ausregistry.com.au for Web based WhoIs

Tech ID:                 C0793854-AR
Tech Name:               Stuart  Young
Tech Email:              Visit whois.ausregistry.com.au for Web based WhoIs

Name Server:             black-bean.cyber.com.au
Name Server IP:          203.7.155.4
Name Server:             ns2.its.monash.edu.au
Name Server IP:          130.194.7.99
Name Server:             core1.amc.com.au
Name Server IP:          203.15.175.32
Name Server:             lists.luv.asn.au
Name Server IP:          203.123.80.10

Hullo? Where's the expiration data? Turns out, none of .au offers that information via WHOIS. Nor does the public whois information browsable at the indicated Web host. What?

Well, upon inquiry, I was enlightened: It's deemed a privacy issue, and Australians using .au domains presumably suffer fewer domain snatches, and similar abuses. The same appears to be true in .de (Germany) and some others. Presumably, domain owners (as opposed to the general public) can look up their own domains' expiration data via their logged-in individual domain records, in addition, of course (in theory), to getting notification of upcoming expirations. On the downside, TLDs that conceal that data from the public prevent public-spirited neighbours from helping domain owners notice upcoming problems, keep people from planning for legitimate opportunities to re-register domains their owners no longer want, etc.

(By the way, if you are really serious about protecting your privacy as a domain holder, .au doesn't really qualify. .to (Kingdom of Tonga) is among the few that do.)

However, it gets stranger: There are particular country-code domains (I won't name names) where expiration data is available, and open to the public, but appears not to matter. That is, you'll find what appears to be a good test case, notice that its expiration date of record is three years ago, and then notice that the domain still works, anyway.

Your mileage may differ, but, for me, that was a culture shock: In my sadly clock-driven, Westernised existence, Internet domain expiration is a real calamity: Your domain's DNS stops working within a day (if not instantly), and you may or may not even be allowed to buy it back (re-register or renew it) at all. If you are, it may involve a ghastly "Sorry, you were asleep at the wheel" surcharge.

Some TLDs, evidently, just aren't like that, so domain-check may address a non-problem for domains in your national TLD. It's up to you to check, I suppose.

My prototype setup of domain-check runs via a weekly cronjob that runs every Sunday night, and e-mails me a notice about which domains, among the roughly 150 I have domain-check monitor, are now within 90 days from expiration, plus a separate one about what domains, if any, have already expired.
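
For concreteness, that amounts to nothing fancier than a crontab entry along these lines — the path, the domains file, the address, and the exact flags (borrowed from the usage text shown earlier) are illustrative, not gospel:

30 23 * * 0    /usr/local/bin/domain-check -f /etc/domain-check/domains -x 90 -a -e you@example.com

You might ask, armed with that weekly briefing, what do I do? That brings us to: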

The Difficult Part

This might be you: You own a domain you care considerably about, but every year, like clockwork, you put off renewal until the last minute, to "get the most for your money". You pay for only one extra year at a time, not three or four, for the same reason. Maybe you dislike your current registrar (in TLDs like com/org/net where a choice is offered), but never move your domain because that would require sending more money, and you have a vague notion that, some year, you'll move registrars right around the time of your next renewal. Maybe you literally wait until the final day, and panic and shout on the telephone to your registrar if there's a hitch, until your renewal goes through. You're now reading this article and realise I'm about to tell you you're being foolish, but nothing's going to change your mind, let alone your habits.

Why is that foolish behaviour? Because every bit of that attitude greatly increases the risk of accidental expiration. You should, actually, consider moving to a better registrar at any time, and not wait, because competing registrars almost all credit you for the full time remaining at your prior registrar, upon your domain's arrival. That is, if you have 100 days remaining at Registrar A when you initiate a transfer to Registrar B (and pay the latter for a year of service), you start with 365 + 100 days on your domain's expiration clock. So, you lose nothing at all. The bank interest you save by buying only one year in advance instead of 3-4 is absolutely negligible compared to the painful cost of recovering from accidental expiration (where that is possible), not to mention the transaction cost of swooping in and continually renewing annually to "save money" (let alone the cost of doing that mere days or hours before expiration, as many do).

I might be able to convince you, the reader, that the above syndrome is unwise, but I won't convince your friends or the organisations you care about — whose domains you might want to watch over. Which brings us back to the question: Armed with the knowledge that someone's domain expiration is imminent, what do you do about it?

Several non-technical problems become evident, as one attempts to look after friends' domains — and I really should have realised that the technical challenges of writing and debugging domain-check would be the iceberg's tip, but didn't:

Imagine a Linux user group, or a science fiction fan association that puts on an annual convention, or some other similar group that relies on an Internet domain. You're trying to get their attention to an upcoming expiration. Domain matters are probably delegated to someone technical who's believed to be handling them. The people who run the group generally are most often other people, who may not understand domain matters at all, and may assume, if you ask them about it, that you must be referring to the Web site, will forward your mail to the HTML guy / webmaster / hosting company / listadmin, and will never realise their category error.

The domain guy may be gone from e-mail for a month. He/she might have believed the responsibility was taken over by somebody else. The contact e-mail addresses shown in WHOIS for the domain may be wrong or outdated, or just unmonitored. Your warning e-mails might be mistaken as spam or a sales solicitation (strangers showing concern seems outlandish to many), and blackholed or ignored. Or everyone in the group might be assuming someone else is taking care of it. Or maybe their mail server is misconfigured or otherwise mishandling or misrouting some or all incoming mail about the domain.

Ultimately, this isn't your problem — sufficiently hapless organisations' negligence will cause them loss despite even heroic efforts to help, and that can't be helped — but it's nice to know the most common failure modes.

If you see a domain's days remaining rapidly approaching zero, and nothing's happening, one of four explanations logically applies: (1) renewal is already in hand (e.g., the domain is on auto-renew, or has just been paid for and the WHOIS data merely lags); (2) the owners have deliberately decided to let the domain lapse; (3) the domain lives in one of those TLDs where the published expiration date doesn't seem to matter; or (4) nobody responsible has noticed, and the domain is about to be lost.

As the concerned outsider, your main worry is the last scenario, which is the classic domain-loss one — which is relevant to the current question, of what you do with your knowledge of the impending expiration. The naive answer is: "Look in WHOIS, find the listed domain contacts, send them 'Dude, you need to renew your domain' e-mail, check that domain off your list, and pat yourself on the back for being a good neighbour."

That's naive because, odds are, that's exactly what the registrar did, and it didn't work. Thus, you may want to be a little creative about whom to contact, and how. E.g., look on the Web site for maintained information about who currently runs the group. Bear in mind that he/she/they may not, initially, know what you're talking about (e.g., fob you off on the webmaster). Ask politely that someone in charge send you wording like "Thanks, we know about it", mention that you'll cease pestering them when that happens, and keep following up at intervals.

Be really, really diplomatic. Your queries may, themselves, cause a political kerfuffle within domain-owning groups, and cause considerable unintended irritation. People will get bothered, often despite being the wrong people to bother (e.g., the webmaster), and may get cranky. A harassed domain-admin may write back and say "It's on auto-renew, jerk." Don't be offended. Stress that you didn't know, and merely want to help them avert mishaps. From time to time, you just might get lucky and hear "Thank you."

Anyway...

I should stress that my cronjob was the result of only a few minutes' work, shortly before penning the initial draft of this article. It wouldn't be difficult to write a less-feeble shell script to do slightly more useful notifications, e.g., tailored e-mail warning texts at the 90, 60, and 30-day marks, with each being sent to groups of people appropriate to each domain, rather than all notifications being sent just to one person for all domains monitored.
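
For the curious, here is a minimal sketch of what such a wrapper might look like. It is only an illustration: the days_left helper is a stand-in for however your copy of domain-check (or your whois parsing of choice) reports the number of days remaining, and the domain names and recipient addresses are placeholders.

#!/bin/sh
# Sketch of a tiered-warning wrapper, meant to be run daily from cron.

days_left() {
    # Stand-in: replace with however your domain-check reports days remaining.
    domain-check -d "$1"
}

notify() {   # notify <domain> <days> <recipient> [recipient ...]
    domain=$1; days=$2; shift 2
    printf 'Heads up: %s expires in %s days.  Please renew it.\n' "$domain" "$days" |
        mail -s "Domain expiration warning: $domain" "$@"
}

# One line per domain: the domain, then the people to pester about it.
while read -r domain recipients; do
    [ -z "$domain" ] && continue
    days=$(days_left "$domain")
    for mark in 90 60 30; do
        [ "$days" -eq "$mark" ] && notify "$domain" "$days" $recipients
    done
done <<'EOF'
example.org   president@example.org sysadmin@example.org
example.net   webmaster@example.net
EOF

Run once a day, that fires exactly once at each of the three marks for each domain, and bothers only the people listed for it.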

However, as is so often the case with system administration, perfectionism is not your friend: waiting to do this job right had already caused me to put off doing even this much for some months, while I pondered the problem; in the meantime, a volunteer group I know (but will not name here) was obliged to spend about US $500 to ransom its domain back. Ouch and damn.

Moral: Do the 80% solution, the one that avoids disaster, today. Don't be proud.

And don't be a single point of failure, either. I'm encouraging all my 'Nix-using (including, without prejudice, MacOS X) friends to run this thing, and help out with redundant, overlapping checks, too.

How about you? The domain you save from disaster probably won't be your own, but it could easily be one you care about dearly, or that a friend cherishes.

Alternatively, you could think of this as your best shot at ruining a domain squatter's day. Either way, it's awfully good news for decent folk.

Talkback: Discuss this article with The Answer Gang


Bio picture Rick has run freely-redistributable Unixen since 1992, having been roped in by first 386BSD, then Linux. Having found that either one sucked less, he blew away his last non-Unix box (OS/2 Warp) in 1996. He specialises in clue acquisition and delivery (documentation & training), system administration, security, WAN/LAN design and administration, and support. He helped plan the LINC Expo (which evolved into the first LinuxWorld Conference and Expo, in San Jose), Windows Refund Day, and several other rabble-rousing Linux community events in the San Francisco Bay Area. He's written and edited for IDG/LinuxWorld, SSC, and the USENIX Association; and spoken at LinuxWorld Conference and Expo and numerous user groups.

His first computer was his dad's slide rule, followed by visitor access to a card-walloping IBM mainframe at Stanford (1969). A glutton for punishment, he then moved on (during high school, 1970s) to early HP timeshared systems, People's Computer Company's PDP8s, and various of those they'll-never-fly-Orville microcomputers at the storied Homebrew Computer Club -- then more Big Blue computing horrors at college alleviated by bits of primeval BSD during UC Berkeley summer sessions, and so on. He's thus better qualified than most, to know just how much better off we are now.

When not playing Silicon Valley dot-com roulette, he enjoys long-distance bicycling, helping run science fiction conventions, and concentrating on becoming an uncarved block.

Copyright © 2007, Rick Moen. Released under the Open Publication License unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 142 of Linux Gazette, September 2007

Writing PostgreSQL Functions in C, Part Two

By Ron Peterson

Introduction

In my last article, I introduced the basic framework for creating your own PostgreSQL function in C. In this article, I'd like to expand on that introduction: we'll define a composite type with CREATE TYPE, write a C function that accepts and returns values of that type, call routines from an external C library (the GNU Scientific Library), and emit diagnostic messages with ereport.

I'm also going to eschew the use of the PostgreSQL extension building infrastructure I used last time, in order to illustrate the details of how PostgreSQL shared object files are built in Linux.

The same prerequisites as in my previous article still apply. All of the code presented here can be downloaded as a single tarball if you would prefer to avoid typing practice (and the consequent frustration of debugging typos, rather than code.)


The End

Before we begin, let's look at what we want to accomplish. Let's say we'd like to create a set of PostgreSQL functions that implement the features of Mark Galassi's excellent GNU Scientific Library. Let's pick one of the library's functions, gsl_complex_add, and see what we need to do to create a corresponding PostgreSQL function. When we're finished, we'll be able to write SQL statements like this:

> select gsl_complex_add( ROW( 3.2e4, -3.2 ), ROW( 4.1, 4.245e-3 ) );

   gsl_complex_add   
---------------------
 (32004.1,-3.195755)

I think it's appropriate to represent complex numbers in PostgreSQL as tuples, where the real and imaginary components get passed around together as a pair. Think of a tuple as a structure in C. The tuple concept jibes with the way we're taught to think about these things in other domains. We'll be using PostgreSQL's CREATE TYPE statement to define the composite type we use as follows:

DROP FUNCTION gsl_complex_add ( __complex, __complex );
DROP TYPE __complex;

CREATE TYPE __complex AS ( r float, i float );

CREATE OR REPLACE FUNCTION
  gsl_complex_add( __complex, __complex )
RETURNS
  __complex
AS
  'example.so', 'c_complex_add'
LANGUAGE
  C
STRICT
IMMUTABLE;

The Stuff in the Middle

OK, so now that we know what we would like to do, let's look at how we get there. I'll dump all of the code on you at one time, and follow up by trying to explain how it works. I won't spend too much time repeating what I say in the code comments though, because that would be redundant, just like this sentence.

// example.c:

// PostgreSQL includes
#include "postgres.h"
#include "fmgr.h"
// Tuple building functions and macros
#include "access/heapam.h"
#include "funcapi.h"

#include <string.h>

// GNU Scientific Library headers
#include <gsl/gsl_complex.h>
#include <gsl/gsl_complex_math.h>

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

// forward declaration to keep compiler happy
Datum c_complex_add( PG_FUNCTION_ARGS );

PG_FUNCTION_INFO_V1( c_complex_add );
Datum
c_complex_add( PG_FUNCTION_ARGS )
{
   // input variables
   HeapTupleHeader   lt, rt;

   bool           isNull;
   int            tuplen;
   bool           *nulls;

   // things we need to deal with constructing our composite type
   TupleDesc         tupdesc;
   Datum             values[2];
   HeapTuple         tuple;

   // See PostgreSQL Manual section 33.9.2 for base types in C language
   // functions, which tells us that our sql 'float' (aka 'double
   // precision') is a 'float8 *' in PostgreSQL C code.
   float8                *tmp;

   // defined by GSL library
   gsl_complex           l, r, ret;

   // Get arguments.  If we declare our function as STRICT, then
   // this check is superfluous.
   if( PG_ARGISNULL(0) ||
       PG_ARGISNULL(1) )
   {
      PG_RETURN_NULL();
   }

   // Get components of first complex number
   //// get the tuple
   lt = PG_GETARG_HEAPTUPLEHEADER(0);
   ////// get the first element of the tuple
   tmp = (float8*)GetAttributeByNum( lt, 1, &isNull );
   if( isNull ) { PG_RETURN_NULL(); }
   GSL_SET_REAL( &l, *tmp );
   ////// get the second element of the tuple
   tmp = (float8*)GetAttributeByNum( lt, 2, &isNull );
   if( isNull ) { PG_RETURN_NULL(); }
   GSL_SET_IMAG( &l, *tmp );

   // Get components of second complex number
   rt = PG_GETARG_HEAPTUPLEHEADER(1);
   tmp = (float8*)GetAttributeByNum( rt, 1, &isNull );
   if( isNull ) { PG_RETURN_NULL(); }
   GSL_SET_REAL( &r, *tmp );
   tmp = (float8*)GetAttributeByNum( rt, 2, &isNull );
   if( isNull ) { PG_RETURN_NULL(); }
   GSL_SET_IMAG( &r, *tmp );

   // Example of how to print informational debugging statements from
   // your PostgreSQL module.  Remember to set minimum log error
   // levels appropriately in postgresql.conf, or you might not
   // see any output.
   ereport( INFO,
            ( errcode( ERRCODE_SUCCESSFUL_COMPLETION ),
              errmsg( "tmp: %e\n", *tmp )));

   // call our GSL library function
   ret = gsl_complex_add( l, r );

   // Now we need to convert this value into a PostgreSQL composite
   // type.

   if( get_call_result_type( fcinfo, NULL, &tupdesc ) != TYPEFUNC_COMPOSITE )
      ereport( ERROR,
              ( errcode( ERRCODE_FEATURE_NOT_SUPPORTED ),
                errmsg( "function returning record called in context "
                      "that cannot accept type record" )));

   // Use BlessTupleDesc if working with Datums.  Use
   // TupleDescGetAttInMetadata if working with C strings (official
   // 8.2 docs section 33.9.9 shows usage)
   BlessTupleDesc( tupdesc );

   // WARNING: Architecture specific code!
   // GSL uses double representation of complex numbers, which
   // on x86 is eight bytes.  
   // Float8GetDatum defined in postgres.h.
   values[0] = Float8GetDatum( GSL_REAL( ret ) );
   values[1] = Float8GetDatum( GSL_IMAG( ret ) );

   tuplen = tupdesc->natts;
   // palloc0 zeroes the allocation, so both attributes start out marked
   // not-null; plain palloc would leave these flags uninitialised.
   nulls = palloc0( tuplen * sizeof( bool ) );

   // build tuple from datum array
   tuple = heap_form_tuple( tupdesc, values, nulls );

   pfree( nulls );

   // A float8 datum palloc's space, so if we free them too soon,
   // their values will be corrupted (so don't pfree here, let
   // PostgreSQL take care of it.)
   // pfree(values);
   
   PG_RETURN_DATUM( HeapTupleGetDatum( tuple ) );
}

Wow, those comments are so illustrative, I think the article is almost finished! Alright, I'll try to explicate a few of the finer points. After all, that's what I don't get paid for.

There's nothing much new going on here relative to my last article until we see the declaration of our HeapTupleHeader variables lt and rt (for "left tuple" and "right tuple"). We're not taking simple data types as arguments here; we're taking tuple arguments of the composite type we defined with our CREATE TYPE statement. Each of our tuples has two double precision components, representing a complex number's real and imaginary parts.

First, we read our tuple arguments into lt and rt, using the PG_GETARG_HEAPTUPLEHEADER macro. Then we pick the component values out of each tuple using the GetAttributeByNum function. Refer to the Base Types in C-Language Functions section of the manual (33.9.2) for information about how to represent PostgreSQL data types in your C code. In our case, that table tells us that our double precision (aka "float") values in SQL are represented in PostgreSQL C code as "float8 *".

It so happens that the GSL complex number functions expect "double" values as input, which on the x86 Linux platform I'm running are conveniently eight bytes and map directly to the float8 values used by PostgreSQL. Pay close attention here, because if your data types don't map properly, you'll get a headache.
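
If you'd rather have the compiler confirm that size assumption than take the platform's word for it, a one-line compile-time check can be dropped into example.c after the includes. The typedef name below is arbitrary; the trick is just that a negative array size refuses to compile:

// Fails to compile if double and PostgreSQL's float8 ever differ in size.
typedef char double_matches_float8_check[ sizeof( double ) == sizeof( float8 ) ? 1 : -1 ];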

We then use the GSL library's GSL_SET_REAL and GSL_SET_IMAG macros to construct complex number representations that we can pass to the gsl_complex_add function. We convert the data that GSL understands back into a form that PostgreSQL understands by using the Float8GetDatum function. You can see the set of other typical C type to Datum conversion functions in postgres.h.
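
For reference, here is a small sampler of those conversions. This fragment would live inside a function body in a module like ours (it is not specific to this example), and postgres.h has the complete list:

// Typical C-type-to-Datum conversions, and their inverses, from postgres.h.
Datum   db = BoolGetDatum( true );
Datum   di = Int32GetDatum( 42 );
Datum   df = Float8GetDatum( 2.718281828 );

bool    b  = DatumGetBool( db );
int32   i  = DatumGetInt32( di );
float8  f  = DatumGetFloat8( df );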

To create the tuple we'd like to return, we first construct an array of datum values in our "values" variable. The heap_form_tuple function converts this array into a PostgreSQL tuple, which the HeapTupleGetDatum function then converts into a datum we can return with PG_RETURN_DATUM.

If we were working with C strings, we would probably do things a bit differently. I'm not going to illustrate how that works, because The Fine Manual already includes a nice example. Note that the example in the manual is also illustrating how to return a set of tuples, which we are not concerning ourselves with here.
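
Still, for the sake of comparison, the C-string route looks roughly like the following. This is only a sketch modelled on the manual's example, not drop-in code for this module; it assumes tupdesc has already been obtained with get_call_result_type as above, and the declarations come from funcapi.h, which we already include.

// Build the same (r, i) tuple from C strings instead of Datums.
AttInMetadata  *attinmeta = TupleDescGetAttInMetadata( tupdesc );
char           *cvalues[2];

cvalues[0] = "32004.1";      // real part, as text
cvalues[1] = "-3.195755";    // imaginary part, as text

tuple = BuildTupleFromCStrings( attinmeta, cvalues );
PG_RETURN_DATUM( HeapTupleGetDatum( tuple ) );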

Note the ereport( INFO ... ) function in the middle of our code. I find this function very handy for printing debugging information to the SQL console while I'm developing new code. You can see how this works if you leave this uncommented when you compile and install this code.


Shake and Bake

It's time to turn this code into something we can use. Instead of using the PGXS infrastructure as I did in my last article, we'll get under the hood. It's not only educational to see how to build a shared module, but creating your own Makefile also gives you a little more latitude to tweak your build options just the way you like. It might also make it easier for you to handle building projects with lots of dependencies.

Here's a simple Makefile to illustrate how we build our shared object file. In real life, I'd probably use some automatic variables and such, but I don't want to obfuscate the basic build process with Makefile arcana. The pg_config command is your friend, and will help you ascertain where the include files and such are installed on your system. Building the shared object file is a simple matter of first building a position independent (the -fpic flag) object file, and then linking against all required libraries using the -shared flag to build the shared object file. This is all detailed in section 33.9.6 of the manual, which also includes instructions for other architectures besides Linux.

INCLUDEDIRS := -I.
INCLUDEDIRS += -I$(shell pg_config --includedir-server)
INCLUDEDIRS += -I$(shell pg_config --includedir)
# If you are using shared libraries, make sure this location can be
# found at runtime (see /etc/ld.so.conf and ldconfig command).
LIBDIR = -L$(shell pg_config --libdir)
# This is where the shared object should be installed
LIBINSTALL = $(shell pg_config --pkglibdir)

example.so: example.c Makefile
			gcc -fpic -o example.o -c example.c $(INCLUDEDIRS)
			gcc -shared -o example.so example.o $(LIBDIR) -lpq -lgsl -lgslcblas -lm
			cp example.so $(LIBINSTALL)

The Makefile copies the shared object file into the PostgreSQL library directory, so that we can execute the SQL I showed you at the beginning of this article to create our __complex composite type and our gsl_complex_add function. Just fire up psql as a user with permissions to do such things, and then type '\i example.sql' to do so. And that brings us to...


The Beginning

Well, we started at the end, so I guess that means we're finished. As you can see, once you have grasped the basic framework, you have the whole world of C library functions available for you to use directly within PostgreSQL. This gives you all of the attendant advantages of working within a transactional database system. I hope you find this prospect interesting enough to port some intriguing libraries into PostgreSQL, because Lord knows I certainly don't have time to do it all myself. :)

Happy hacking. And a special thanks to the PostgreSQL coding gurus who made this fantastic database in the first place.


Resources

Talkback: Discuss this article with The Answer Gang


Bio picture

Ron Peterson is a Network & Systems Manager at Mount Holyoke College in the happy hills of western Massachusetts. He enjoys lecturing his three small children about the maleficent influence of proprietary media codecs while they watch Homestar Runner cartoons together.


Copyright © 2007, Ron Peterson. Released under the Open Publication License unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 142 of Linux Gazette, September 2007

SMTP Authentication with Postfix

By René Pfeiffer

Sending and receiving email is still one of the most important aspects of the Internet; anyone who has ever worked in first-level support knows this. Sending email is not a trivial task anymore, because a lot of Internet service providers fight against unsolicited email, known as spam. For end users, this means that you have to configure a fixed mail server that accepts email from you and relays it to other servers. This usually works fine as long as you aren't mobile. Laptop users with ever-changing IP addresses sometimes need to change their relay mail server depending on their location. Accepting email from dynamically assigned addresses is best done with SMTP Authentication. I will show you how this works.

Prerequisites

Who is Who

The configuration can be done with almost all GNU/Linux distributions. Nevertheless I will focus on using Debian Etch together with Postfix. We will also use encryption, so you need to have OpenSSL and a suitable certificate at hand. The article "Migrating a Mail Server to Postfix/Cyrus/OpenLDAP" in issue #124 shows you how to prepare Postfix for encryption. Your Postfix installation will therefore need Transport Layer Security (TLS) support. On Debian you can enable TLS by installing the postfix-tls package.

We will be speaking of two distinct components when dealing with email: the Mail User Agent (MUA), the program you read and compose mail with, and the Mail Transfer Agent (MTA), the server software that transports mail between machines.

I will focus on using mutt as the MUA. The convenient thing about mutt is that it simply submits email to an MTA on the local machine for delivery, so the SMTP side of the story is handled by that local MTA.

Authentication Software

We need a source for the authentication information. The easiest way is to use the Simple Authentication and Security Layer (SASL) framework, which allows us to use a variety of sources through a single mechanism. The packages sasl2-bin and libsasl2-modules are needed for our purposes. sasl2-bin contains the utilities to maintain and query the user database with the passwords and is only needed on the MTA that should use SMTP Authentication. The libsasl2-modules are needed on both sides. Some MUAs already provide code for SASL authentication.
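
On a Debian Etch system, fetching the two packages is a one-liner; run it on the server (the client machine only needs libsasl2-modules):

antigone:~# apt-get install sasl2-bin libsasl2-modules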

Configuration

Now let's try to get everything to work together seamlessly.

Postfix as Inbound Mail Relay

Postfix will use the SASL authentication daemon saslauthd to decide whether a login is correct. The query is done through saslauthd's UNIX socket, usually found at /var/run/saslauthd/mux. This is a problem, since Postfix runs in its own chroot environment, /var/spool/postfix/, and can't see the socket. You now have two options: give up Postfix's chroot, or move saslauthd's socket to another place. Fortunately, the latter can be done easily by editing /etc/default/saslauthd:

# This needs to be uncommented before saslauthd will be run
# automatically
START=yes

# You must specify the authentication mechanisms you wish to use.
# This defaults to "pam" for PAM support, but may also include
# "shadow" or "sasldb", like this:
# MECHANISMS="pam shadow"

MECHANISMS="sasldb"
PARAMS="-m /var/spool/postfix/var/run/saslauthd"

We changed MECHANISMS to the SASL database (we don't want to use system accounts for SMTP AUTH), and we moved the socket into Postfix's chroot by using the -m option. We still have to make sure that the path to the socket exists:

antigone:~# mkdir -p /var/spool/postfix/var/run/saslauthd

Now we can turn our attention to the Postfix configuration. It needs to be told that it should use SASL for authentication and what options it should accept. First, we create a directory and a file with the options:

antigone:~# mkdir /etc/postfix/sasl/
antigone:~# cat > /etc/postfix/sasl/smtpd.conf
pwcheck_method: saslauthd
auxprop_plugin: sasldb
saslauthd_path: /var/run/saslauthd/mux
mech_list: PLAIN LOGIN CRAM-MD5 DIGEST-MD5
^D
antigone:~# 

smtpd.conf tells Postfix which password-checking method to use (saslauthd, backed here by the sasldb user database), where to find saslauthd's socket, and which authentication mechanisms to offer. PLAIN and LOGIN are simple cleartext authentication methods; leave them out if your MTA doesn't support encryption, since the passwords would otherwise travel over the wire in the clear. LOGIN is deprecated, so you won't need it anyway; I just included it as an example. CRAM-MD5 and DIGEST-MD5 are based on challenge-response and digest mechanisms, respectively. Most modern MUAs know them, so it's good to allow them in this configuration.

The last thing you need to do is to add the authentication directives to the Postfix main config file /etc/postfix/main.cf:

smtpd_sasl_auth_enable      = yes
smtpd_sasl_security_options = noanonymous,noplaintext
smtpd_sasl_application_name = smtpd
smtpd_sasl_local_domain     = $mydomain
broken_sasl_auth_clients    = no

The first line enables SASL AUTH. The security options define what to accept; it is very important to include noanonymous, or else anonymous logins would be accepted and your server would become an open relay, which is not what you and I want. Be careful to double-check that noanonymous is present! The application name tells Postfix the name to use when initialising the SASL server; it corresponds to our file smtpd.conf, which contains the options we wish to use. The SASL local domain defines the authentication realm. Every user has a login, a realm, and a password, and usually the realm corresponds to the domain your server is part of. The last option deals with the special needs of some MUAs: set it to yes if Microsoft Outlook Express 4 or Microsoft Exchange Server 5.0 clients use your Postfix as their authenticating mail relay. Otherwise, it is safe to use no.

We still need to tell Postfix that authenticated clients are ok. You can configure this with the smtpd_recipient_restrictions directive.

smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unlisted_recipient,
    check_recipient_access hash:/etc/postfix/rcpt_blacklist,
    check_helo_access hash:/etc/postfix/helo,
    reject_non_fqdn_sender,
    reject_unknown_sender_domain,
    permit_sasl_authenticated,
    reject_unauth_destination,
    reject_rbl_client realtimeblacklist.example.net,
    check_policy_service inet:127.0.0.1:60000,
    permit

We added permit_sasl_authenticated right before the blacklist and greylist checks. Make sure you accept the authenticated connection as soon as possible, but don't skip important checks in case the MUA gets something wrong. The files rcpt_blacklist and helo are custom hash maps listing blacklisted recipient addresses and faked names seen in the HELO/EHLO dialog; you can leave them out if you don't keep such files yourself. The same is true for the real-time blacklist: you don't have to use one.

We're almost done. We only need the account with username and password. You can add users by using the saslpasswd2 command.

antigone:~# saslpasswd2 -u your.realm username

The tool will prompt twice for the password. Now you are all set. Reload or restart saslauthd and Postfix. Make sure the UNIX socket in Postfix's chroot environment was created. Check with telnet for the SMTP AUTH banner.
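
Before the telnet test, the restart and a couple of quick sanity checks on Debian might look roughly like this. The ls confirms that the socket directory showed up inside the chroot, and the sasldblistusers2 line (a sasl2-bin utility) confirms that the account created above exists in the SASL database; exact output varies.

antigone:~# /etc/init.d/saslauthd restart
antigone:~# /etc/init.d/postfix restart
antigone:~# ls /var/spool/postfix/var/run/saslauthd/
antigone:~# sasldblistusers2
username@your.realm: userPassword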

lynx@agamemnon:~ $ telnet antigone.luchs.at 25
Trying 127.0.0.1...
Connected to antigone.luchs.at.
Escape character is '^]'.
220 antigone.luchs.at ESMTP ready
EHLO client.example.net
250-antigone.luchs.at
250-PIPELINING
250-SIZE 10380902
250-ETRN
250-STARTTLS
250-AUTH DIGEST-MD5 CRAM-MD5
250 8BITMIME
QUIT
221 Bye
Connection closed by foreign host.
lynx@agamemnon:~ $ 

If everything works you should see the string 250-AUTH DIGEST-MD5 CRAM-MD5 after the HELO/EHLO command.

Postfix as Outbound Mail Relay with Authentication

Since I use mutt, the component that deals with SMTP is my local Postfix. It doesn't know about SMTP AUTH yet, but we only need two additional options in main.cf:

smtp_sasl_auth_enable   = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd

The first directive enables SMTP AUTH in Postfix's SMTP client component. The second dictates which username and password to use when talking to each server. A sample sasl_passwd map looks like this:

smtp.example.net        username:seckrit
192.168.1.1             username2:geheim

Don't forget to create the hash of the map by using postmap /etc/postfix/sasl_passwd. Now point your relayhost variable to one of the servers listed in sasl_passwd and reload the Postfix configuration. Mail relay should now be using SMTP AUTH. If the login fails, check for the presence of the libsasl2-modules package. Without it Postfix will try to use authentication, but will fail because no suitable authentication methods can be found.
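
Put together, the client-side steps might look like this on the laptop (agamemnon from the transcript above); smtp.example.net is the placeholder relay from the sample map, and postconf -e merely writes the setting into main.cf for you:

agamemnon:~# postmap /etc/postfix/sasl_passwd
agamemnon:~# postconf -e 'relayhost = smtp.example.net'
agamemnon:~# /etc/init.d/postfix reload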

One Word About Encryption

Although I didn't show how to configure encryption in this example, I strongly suggest using TLS with every MTA you run. The setup isn't too hard and having encrypted SMTP AUTH sessions is the best way to protect the passwords.
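
For a rough idea of the shape, the server side of a STARTTLS setup boils down to a few main.cf directives like the ones below. The certificate paths are placeholders, and the issue #124 article covers the full setup, including creating the certificate in the first place.

# Offer STARTTLS to connecting clients (example paths only).
smtpd_use_tls       = yes
smtpd_tls_cert_file = /etc/postfix/ssl/mailserver-cert.pem
smtpd_tls_key_file  = /etc/postfix/ssl/mailserver-key.pem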

Useful Resources

This is one of many articles written about this topic. You can find more details here:

Talkback: Discuss this article with The Answer Gang


Bio picture

René was born in the year of Atari's founding and the release of the game Pong. From his early youth, he took things apart to see how they work, and couldn't even pass construction sites without looking for electrical wires that might seem interesting. His interest in computing began when his grandfather bought him a 4-bit microcontroller with 256 bytes of RAM and a 4096-byte operating system, forcing him to learn assembler before any other language.

After finishing school he went to university to study physics. He then collected experience with a C64, a C128, two Amigas, DEC's Ultrix, OpenVMS, and finally GNU/Linux on a PC in 1997. He has been using Linux since that day and still likes to take things apart and put them together again. Freedom of tinkering brought him close to the Free Software movement, where he puts some effort into the right to understand how things work. He is also involved with civil liberty groups focusing on digital rights.

Since 1999 he has been offering his skills as a freelancer. His main activities include system/network administration, scripting, and consulting. In 2001 he started to give lectures on computer security at the Technikum Wien. Apart from staring into computer monitors, inspecting hardware, and talking to network equipment, he is fond of scuba diving, writing, and photographing with his digital camera. He would like to have a go at storytelling and roleplaying again as soon as he finds some more spare time on his backup devices.


Copyright © 2007, René Pfeiffer. Released under the Open Publication License unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 142 of Linux Gazette, September 2007

HelpDex

By Shane Collinge

These images are scaled down to minimize horizontal scrolling.


Click here to see the full-sized image

Click here to see the full-sized image

All HelpDex cartoons are at Shane's web site, www.shanecollinge.com.

Talkback: Discuss this article with The Answer Gang


Bio picture Part computer programmer, part cartoonist, part Mars Bar. At night, he runs around in his brightly-coloured underwear fighting criminals. During the day... well, he just runs around in his brightly-coloured underwear. He eats when he's hungry and sleeps when he's sleepy.

Copyright © 2007, Shane Collinge. Released under the Open Publication License unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 142 of Linux Gazette, September 2007

Tux