Technical notes regarding the Recreated Earthwaves WWWBoard archive.
March 13, 2024.

The archive of the original WWWBoard version of the Earthwaves discussion forum is now back online. This is the culmination of nearly two months of work.

There are some important things to note about this archive. As you may have noticed, it is described as 'recreated'. These are not the original files, but rather recreated files using the originals as primary source material. This was necessary for many technical reasons. I still retain an archive image of the message files and data structure as they originally existed.

Although the new files are not the originals, as the current owner and curator of the archive I made every effort to maintain the integrity of the original content as the original authors expressed it. Actual message content has not been altered. No messages were removed or edited (except for munging email addresses). If anything, previously lost or unavailable messages have been restored. Below I will detail any changes I made, and why.

-------------------------------------------------

But first, some history. I took ownership of earthwaves.org from the previous owner, Canie, at the beginning of 2013. At that time the forum was running WWWBoard, a Perl script dating from 1995. I was already considering replacing the software with something newer and more up-to-date. However, I let things stay the way they were for a year. For good or bad, January 5, 2014 saw the last post on the old forum.

I had intended for the old forum to remain online, accessible as an archive for the users. The forum had started in late 1999, and it held 14 years of discussions that I never wanted to see lost. However, bots had other ideas. Although I had disabled the ability to post or reply by simply deleting the scripts that ran the board, bots started hitting the files trying to post anyway.
Less than a year later the bots were hitting the old forum so hard that I received a complaint from my webhost about the excessive traffic. I had no choice but to remove the old forum altogether. My intention was to go through the files, remove any last vestiges of anything the bots could grab on to, and put the archive back online. Then life happened. Let's just say my world took some unexpected turns and I had more pressing matters to worry about. Regrettably, it is now nearly 10 years later. I've never forgotten about the old archive but never had the necessary time to do anything about it. Perhaps no one missed it anyway? But certain recent discussions on other forums motivated me to finally tackle the problem. And tackle it I have.

Because WWWBoard is such old software, every message was stored as a small .html file unto itself, and every one of those files still carried remnants of the ability to post or reply to the board. These needed to be removed, since they were what the bots were attacking. The same held true for the main page and the archive pages for older messages. However, there are over 40,000 messages, and thus over 40,000 html files that needed to be fixed. Certainly not something to be done by hand. Fortunately my programming skills have advanced quite a bit over the years, and I 'simply' wrote a program to do all the hard work for me.

It was easy enough to remove all the posting/replying code from the files. But during this process I discovered that there were a lot of errors and missing posts. During the 14 years that the WWWBoard was running, occasional glitches and webhost hiccups occurred. Some of these mistakes the previous board op could fix; others were just too daunting. Thus, a lot was missing or not indexed properly. I started fixing some of these errors manually, but the list of errors just kept growing as I discovered more in the process of implementing fixes. The problem was more complex than originally realized. I finally decided a new approach was needed.
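The cleanup program itself is not shown in these notes, so what follows is only a minimal Python sketch of that first pass: stripping the post/reply machinery from every message file in a folder. It assumes the machinery lives in an HTML `<form>` whose `action` attribute points at the WWWBoard script (matching on the substring `wwwboard` is an assumption about the markup), and the names `FORM_RE`, `strip_post_forms` and `clean_directory` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the post/reply machinery is a <form> whose action attribute
# points at the WWWBoard Perl script (e.g. /cgi-bin/wwwboard.pl).
FORM_RE = re.compile(
    r"<form\b[^>]*action\s*=\s*[\"']?[^\"'>\s]*wwwboard[^>]*>.*?</form\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_post_forms(html: str) -> str:
    """Return the page with every WWWBoard-bound form removed."""
    return FORM_RE.sub("", html)


def clean_directory(folder: str) -> int:
    """Rewrite each .html file in folder without its forms; return count changed."""
    changed = 0
    for path in sorted(Path(folder).glob("*.html")):
        # Work on raw bytes decoded as latin-1: every byte maps to one
        # character, so untouched parts of the file are written back as-is.
        original = path.read_bytes().decode("latin-1")
        cleaned = strip_post_forms(original)
        if cleaned != original:
            path.write_bytes(cleaned.encode("latin-1"))
            changed += 1
    return changed
```

A regular expression is adequate for a one-off pass over uniform, machine-generated pages like these; for hand-edited or irregular markup, a real HTML parser would be the safer choice.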
In the end, I decided to write a program that would read in all the original files, store the essential data in memory, and then use that in-memory database as the source for writing new index files and individual html files. The program would thus fix all the indexing and threading issues automagically. All I had to do was write the code. Har har har. Not so easy. For most of the files things were straightforward, but as errors in the original files were encountered I had to write exceptions into the code to handle them. I also added code to seek out errors and alert me to them so I could determine an appropriate fix. This paid off, as I was able to recover a couple thousand posts that simply were not available previously.

Unfortunately, omissions still exist. There are some messages that the previous board owner felt needed to be removed, and these are simply gone. In order to maintain consistency in the threading structure, I have created 'placeholder' messages. These messages are placed in the correct location in the threads, and in all but one case I was able to infer the original author and time of the post. In all cases, however, the original contents are still lost. But the result is an archive that is more complete and self-consistent.

-------------------------------------------------

In the interest of transparency, below you will find the list of notes I made regarding the changes I had to make to each file. There are a total of 40,615 messages. There are many gaps in the numbering sequence, because it was sometimes desirable to reset the message sequence to a new, round number to better keep track of issues with the board.

Note that, since these are recreated files, the stylistic format may not exactly match the original files, especially from the very beginning of the board in 1999, when the style (colors, logos, etc.) was still in flux. These recreated messages DO NOT reflect that original style, but the style as it existed at the end of the board's run.
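As a rough illustration of the rebuild-from-memory approach, here is a minimal Python sketch that links replies to their parents, creates a placeholder wherever a referenced parent is missing, and emits an indented thread index. It assumes each reply records its parent's number, author and date (assumed to be parsed from the "In Reply to" line on a follow-up page); the `Message` fields and the functions `add_placeholders`, `link_threads` and `render_index` are hypothetical, not the actual program.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    num: int
    subject: str
    author: str
    date: str                       # ISO-8601 text, so it sorts chronologically
    parent: Optional[int] = None    # number of the message this replies to
    # Hypothetical: what the reply page says about its parent, assumed to be
    # parsed from an "In Reply to: ... posted by ... on ..." line.
    parent_author: Optional[str] = None
    parent_date: Optional[str] = None
    placeholder: bool = False
    children: list[Message] = field(default_factory=list)


def add_placeholders(msgs: dict[int, Message]) -> list[int]:
    """Create a stand-in for every parent that is referenced but missing.

    Returns the numbers that had to be filled, which doubles as an error report.
    """
    created = []
    for m in list(msgs.values()):
        if m.parent is not None and m.parent not in msgs:
            msgs[m.parent] = Message(
                num=m.parent,
                subject="[message removed]",
                author=m.parent_author or "unknown",
                # Fall back to the reply's own time if the parent's is unknown.
                date=m.parent_date or m.date,
                placeholder=True,
            )
            created.append(m.parent)
    return created


def link_threads(msgs: dict[int, Message]) -> list[Message]:
    """Attach each message to its parent; return thread roots in date order.

    Call add_placeholders first so every referenced parent exists.
    """
    for m in msgs.values():
        m.children.clear()
    roots = []
    for m in sorted(msgs.values(), key=lambda x: (x.date, x.num)):
        if m.parent is None:
            roots.append(m)
        else:
            msgs[m.parent].children.append(m)
    return roots


def render_index(roots: list[Message]) -> list[str]:
    """Flatten threads into indented lines, one per message."""
    lines = []

    def walk(m: Message, depth: int) -> None:
        flag = " (placeholder)" if m.placeholder else ""
        lines.append("  " * depth + f"{m.num}: {m.subject} - {m.author}{flag}")
        for child in m.children:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines
```

Sorting by date before linking means replies within each thread come out in posting order, and the list returned by `add_placeholders` serves as the kind of "alert me to errors" report described above.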
Main page and archives:
Originally, after a period of time, the board would be archived and the main page reset. This was done to limit the size of the main board webpage, which would get pretty large; the computers and browsers of the time were not as powerful as they are today. For older messages there was an archive page with links to the previous main pages for blocks of messages. Archiving was done fairly irregularly, mostly when the main page got too large, but also when necessary due to system glitches or issues with posters.

In this recreation, the main page now contains the ENTIRE list of posts and replies in one large file. Modern browsers can easily handle this load, although the sheer length may make scanning the archive unwieldy. There is still a link to an archive page, and from there are links to individual archive sets broken down by year.

Note about message numbers 900025 - 901512:
About 1,500 of the very first messages ever posted were previously unavailable. Fortunately, I found the missing message files safely tucked away in their own folder. Due to later system glitches, some of the same message numbers were inadvertently reused in the main folder. In an effort to preserve and return these previously unavailable messages, they were renumbered into the 900000 block so they would not conflict with the later messages reusing the same message numbers. Although these message numbers are higher than the last message (101593), they are in fact chronologically the first posts made. The recreated archive preserves chronological order regardless of message number.

Regarding the Predictions list:
This was originally accessed using its own Perl script, and prediction posts were also copied into their own subfolder. Instead, my program simply generates an html file with chronological and alphabetical lists of prediction posts.
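The renumbering and ordering rules above can be sketched in a few lines of Python. This sketch assumes a fixed offset (original number plus 900000, which fits the 900025 - 901512 range but is not stated outright), represents each message as a dict with hypothetical `date` and `subject` keys, and takes the test for what counts as a prediction post as a caller-supplied function, since the original criterion isn't described. All function names are illustrative.

```python
from collections import defaultdict
from datetime import datetime

# Assumption: a recovered early message keeps its original number plus a
# fixed offset (25 -> 900025). The notes only say "the 900000 block".
EARLY_OFFSET = 900_000


def renumber_early(early: dict) -> dict:
    """Move recovered early messages into the 900000 block."""
    return {EARLY_OFFSET + num: msg for num, msg in early.items()}


def chronological(msgs: dict) -> list:
    """(number, message) pairs ordered by posting time, not by number."""
    return sorted(msgs.items(), key=lambda kv: (kv[1]["date"], kv[0]))


def by_year(msgs: dict) -> dict:
    """Group message numbers into per-year archive sets, in date order."""
    years = defaultdict(list)
    for num, msg in chronological(msgs):
        years[msg["date"].year].append(num)
    return dict(years)


def prediction_lists(msgs: dict, is_prediction) -> tuple:
    """Chronological and alphabetical (by subject) lists of prediction posts.

    is_prediction is a hypothetical caller-supplied test for a prediction post.
    """
    preds = [(n, m) for n, m in chronological(msgs) if is_prediction(m)]
    alpha = sorted(preds, key=lambda kv: (kv[1]["subject"].lower(), kv[0]))
    return [n for n, _ in preds], [n for n, _ in alpha]


if __name__ == "__main__":
    # Illustrative sample data only: message 25 exists in both folders.
    main_folder = {
        25: {"date": datetime(2003, 2, 1), "subject": "Reused number"},
        101593: {"date": datetime(2014, 1, 5), "subject": "Prediction: A"},
    }
    early_folder = {25: {"date": datetime(1999, 11, 2), "subject": "Prediction: B"}}
    msgs = {**main_folder, **renumber_early(early_folder)}
    print([n for n, _ in chronological(msgs)])  # [900025, 25, 101593]
    print(by_year(msgs))  # {1999: [900025], 2003: [25], 2014: [101593]}
```

Because ordering always keys on the posting date, the renumbered early posts sort first even though their numbers are the highest in the archive.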
Changes made to individual messages:
2007 - found (file was missing from main message directory, but found elsewhere on the site and restored)
2557 - BLANK - placeholder created, missing body
2582 - BLANK - placeholder created, missing body
2763 - found
3164 - found
3479 - found
3768 - BLANK - placeholder created, missing body
3986 - BLANK - placeholder created, missing body
4092 - BLANK - placeholder created, missing body
4380 - found
4809 - malformed title, had '