Many reasons to archive Web sites

13 April 2007
Amsterdam, Netherlands

Internet content changes at an incredible rate. With a good content management system, it becomes easy to adapt Web pages and electronic forms, which introduces new challenges for those ensuring quality and control over their Web content.

Web sites now carry every type of information: not only advertising, promotional material and product information, but also sensitive information such as reference guides, contracts and insurance policies.

This sensitive information introduces new requirements for Web site managers and compliance officers. For example, how can a bank ensure transparency and secure retrieval of online information after a visitor has viewed it? And how does an insurance provider prove who has subscribed to a policy or agreed to a clause as it was shown on its Web site?

Providing archives of Web sites can be a considerable challenge for many companies. In addition to meeting compliance-related requirements, archived Web sites are increasingly important from a historical point of view. Since 2003, UNESCO has explicitly included Web pages in the digital heritage. In daily business practice, old Web sites offer important insight into the evolution and development of a company. This type of information can be extremely useful for internal quality audits and for improving the performance of the organization.

Electronic archiving is not new. Over the past decades, information such as financial data, scans of printed material, medical data and personal data has been stored in various databases and systems. Storing Web sites, however, must also preserve the original context: the archived version must look exactly as the site looked at a particular moment in time.

Financial businesses recognize the need

Tridion held an interactive session with customers from the finance sector – including ABN AMRO, De Nederlandsche Bank, Delta Lloyd and Interpolis – and all of the participants recognized the need to archive Web sites. Most companies, however, use only traditional archiving methods or manage archiving with ad hoc solutions such as printing Web pages or storing them on a CD-ROM.

Sectors outside finance also feel the need to archive. Some of Tridion’s governmental customers have already faced this reality. A recent conflict involved a Belgian customer of a Dutch welfare agency. In this case, the customer referred to an old Web site, which stated that all address changes are processed automatically. However, the Web site did not mention that this process applied only to residents of the Netherlands. Another institute regularly reviews previous versions of its Web sites to see what it has already communicated to the public. As a further example, a large industrial customer that does business in the US now requires archiving solutions to meet the strict requirements of Sarbanes-Oxley.

Many businesses feel the need to archive, yet suppliers have been surprisingly quiet. Archive vendors focus exclusively on traditional archiving methods. Tridion’s Archive Manager is the first product specifically developed to store Web sites. Archive Manager stores Web sites exactly as they appeared to visitors at a specific moment in time. This sounds easier than it actually is.

Archiving challenges


One of the first challenges in archiving Web sites is that Web sites are dynamic. They run on specific software, and this influences the way a Web site is displayed. Both browser settings and layout modify the way in which a Web page is displayed. The way in which related content is displayed in the byline, the banner or the links to other parts of the site influences the perception of content. In addition, many Web sites are personalized. As a result, the way in which content is displayed is determined by the visitor’s profile. Finally, the content of a Web site is continuously refreshed, making it extremely difficult to keep track of what someone has read at any given point in time.

A content management system (CMS) helps companies to create and manage content in all online channels. Tridion’s Archive Manager uses the same technology for archiving and retrieving online communication. Regardless of how big or small the change, the publication process stores all versions, including all links, images and information about the different profiles. During this storage process, unmodified content is not stored again. For example, related PDFs that are exactly the same are not stored again. This way, Archive Manager saves on storage capacity and maximizes accessibility.
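
The idea of storing every published version while keeping unmodified content only once can be illustrated with a small Python sketch. This is not Tridion's implementation, which is not described here; the class SnapshotArchive and its methods publish and as_of are hypothetical names chosen for illustration. The sketch assumes that versions are published in chronological order and that content is identified by a SHA-256 hash.

```python
import hashlib
from bisect import bisect_right
from datetime import datetime


class SnapshotArchive:
    """Toy archive (hypothetical): one snapshot per publish, identical content kept once."""

    def __init__(self):
        self.blobs = {}      # content hash -> bytes, each distinct content stored once
        self.snapshots = []  # (timestamp, {path: content hash}), in publish order

    def publish(self, timestamp, files):
        """Record the site as published at `timestamp`; `files` maps path -> bytes."""
        manifest = {}
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            # Unmodified content hashes to an existing key and is not stored again.
            self.blobs.setdefault(digest, content)
            manifest[path] = digest
        self.snapshots.append((timestamp, manifest))

    def as_of(self, moment):
        """Return the files as they were published at or before `moment`."""
        times = [t for t, _ in self.snapshots]
        index = bisect_right(times, moment) - 1
        if index < 0:
            return {}  # nothing had been published yet
        _, manifest = self.snapshots[index]
        return {path: self.blobs[d] for path, d in manifest.items()}


archive = SnapshotArchive()
archive.publish(datetime(2007, 4, 1), {"/policy.html": b"<p>v1</p>", "/terms.pdf": b"PDF"})
archive.publish(datetime(2007, 4, 10), {"/policy.html": b"<p>v2</p>", "/terms.pdf": b"PDF"})
old_site = archive.as_of(datetime(2007, 4, 5))  # the site as published on 1 April
```

In this sketch the unchanged PDF is stored a single time even though it appears in both snapshots, while both versions of the policy page remain retrievable by date.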

The need for a human touch

Archive Manager automates the entire process. Since this process is fully integrated with Tridion R5, no additional actions are necessary. Any modifications to content are stored based on date and time and can easily be retrieved. Despite these advantages, archiving Web sites is not always sufficient to meet the increasingly strict demands of compliance. Company processes are just as essential as the communication of information. Organizations must be responsible for the authorization process and need to determine the parts of their Web sites that need to be archived.