Old forum archive issue
#1
I have been contacted by my web host, who is complaining about the number of files in one directory. Specifically, it's all the messages from the old forum software. The code stored each message as its own HTML file and put them all in one directory, and there are over 38,000 messages!

Anyway, they are threatening to delete the files in 48 hours if I don't fix the problem, saying it's degrading the performance of their servers for other websites.

I have responded, explaining the nature of the files, their purpose, that the forum has migrated to new software and therefore there won't be any additional files, that there should be little activity on those files, and that breaking them up into multiple directories will cause problems that will take more than 48 hours to resolve. I have asked for an exception or 'stay of execution'.

I have promised to resolve the issue regardless of their response, but if they are not understanding, the old forum may disappear for a while.

Nothing will be lost as I have multiple backup copies of the old forum.

But I'm hoping they let the files stay for now. If they delete them, I will have to upload them all again once I reorganize things. If the files stay, it's a simple matter of moving them on the server once I have a solution.

Fixing the problem is not as simple as it might seem at first. The old software did things in a very simple, brute-force way. Breaking up the files into multiple smaller directories will require me to recreate the index pages for each archive. Doing so will require me to write software to parse the HTML of every message in order to rebuild the thread structure. It will also break any links within messages that refer to other messages that now lie in other directories, so those would have to be fixed as well.
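
One way the split and the link repair could be approached is sketched below in C++. It rests on assumptions the post does not state: that message files are named by a numeric ID (something like 12345.html) and that each subdirectory holds a fixed number of messages, one level below the archive root. The names shardDirFor, relativeLinkTo and perDir are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// ASSUMPTION: message N lives in a subdirectory named after N / perDir,
// zero-padded to three digits, e.g. message 12345 with perDir = 1000 -> "012".
std::string shardDirFor(unsigned long id, unsigned long perDir) {
    assert(perDir > 0);  // a zero bucket size would divide by zero
    char buf[32];
    std::snprintf(buf, sizeof buf, "%03lu", id / perDir);
    return buf;
}

// Relative link from any message file to message `target`, assuming every
// message sits exactly one level below the archive root.
std::string relativeLinkTo(unsigned long target, unsigned long perDir) {
    return "../" + shardDirFor(target, perDir) + "/" +
           std::to_string(target) + ".html";
}
```

With perDir set to 1000, roughly 38,000 messages would end up in about 39 subdirectories, and a link to message 999 from any other message would become ../000/999.html. Going through the root with ../ gives the same link text regardless of which subdirectory the linking message is in.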

In hindsight, this just goes to show that migrating to this new software was a good move. The old software was just never meant to handle that level of activity.

Brian





Signing of Skywise Sed quis custodiet ipsos Custodes?
#2
I've taken the old forum completely offline for now.

After examining the website logs, it looks like someone or something has been methodically pounding the old website for at least the past six months, which is as far back as I bothered to look.

Search engines crawl the website to index the pages, but my understanding is they may only do this once a day at most.

But there are hundreds of messages that have been accessed thousands of times per month. I can think of no legitimate reason for this level of access to those message archives.

So the old forum is going to remain offline until I finish reorganizing the files and can further examine my access logs to figure out who is responsible for this activity.

Brian





#3
I found the exact nature of the problem.

Comment spammers.

I found this by examining the logs for a typical message that was accessed repeatedly. They were all attempts to post messages.
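
A check along these lines can be sketched in C++. This is only an illustration, assuming the access log puts the request in double quotes as in the common and combined log formats (for example "POST /some/path HTTP/1.1"). The function names parseRequest and countPostsByPath are hypothetical, and the actual log format on the host is not given in this thread.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Extracts the method and path from a log line whose request field is the
// first double-quoted field (ASSUMPTION: common/combined log format).
// Returns false if no well-formed quoted request is found.
bool parseRequest(const std::string& line, std::string& method, std::string& path) {
    const std::size_t open = line.find('"');
    if (open == std::string::npos) return false;
    const std::size_t close = line.find('"', open + 1);
    if (close == std::string::npos) return false;
    std::istringstream request(line.substr(open + 1, close - open - 1));
    return static_cast<bool>(request >> method >> path);
}

// Counts POST requests per requested path across the whole log.
std::map<std::string, int> countPostsByPath(std::istream& log) {
    std::map<std::string, int> counts;
    std::string line, method, path;
    while (std::getline(log, line)) {
        if (parseRequest(line, method, path) && method == "POST") ++counts[path];
    }
    return counts;
}
```

Feeding the log through countPostsByPath via a std::ifstream and sorting the result by count would show which message pages attract the most posting attempts. Ordinary GET traffic from readers and search engines is ignored.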

I checked the IP addresses of a dozen or so attempted posters and they all showed up in various blacklists.

Although posting on the old forum was disabled by removing the posting code, the message boxes still exist within the message pages themselves. Therefore the spambots thought they could still post a message.

So I have my work cut out for me. Every single message file has to be modified to remove the code that shows the posting forms before I can expose them to the world again.
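
A minimal sketch of that cleanup in C++, assuming the posting box in each message file is wrapped in an HTML form element (the thread does not show the actual markup, so this is a guess). The function name stripForms is hypothetical. It is a plain substring search rather than a real HTML parser, so it would also match any other tag whose name starts with "form".

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Lowercase copy used only for searching, so tag case does not matter.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Removes every <form ...> ... </form> span from the page text.
// ASSUMPTION: the posting box is a form element. An unclosed <form is left alone.
std::string stripForms(const std::string& html) {
    const std::string lower = toLower(html);
    assert(lower.size() == html.size());  // offsets must line up with the original
    const std::string closeTag = "</form>";
    std::string out;
    std::size_t pos = 0;
    while (true) {
        const std::size_t open = lower.find("<form", pos);
        if (open == std::string::npos) break;
        const std::size_t close = lower.find(closeTag, open);
        if (close == std::string::npos) break;
        out.append(html, pos, open - pos);  // keep the text before the form
        pos = close + closeTag.size();      // skip past the closing tag
    }
    out.append(html, pos, std::string::npos);  // keep whatever follows
    return out;
}
```

Each message file would be read into a string, passed through stripForms, and written back out. Running it against the backup copies first and comparing a few pages by eye would be a sensible precaution before touching all 38,000 files.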

Brian





#4
Thanks Brian,

I think it's up to you whether you do this work or not (of course). It is probably OK that links between old posts remain broken. You could add a note to the page that people have to manually find the linked post (if the post number comes up?). But I suspect the archives are not much used.

Chris




#5
(11-16-2014, 12:14 PM)Island Chris Wrote: Thanks Brian,

I think it's up to you whether you do this work or not (of course). It is probably OK that links between old posts remain broken. You could add a note to the page that people have to manually find the linked post (if the post number comes up?). But I suspect the archives are not much used.

Chris

It's hard to say how much they're used. But part of my reason for taking over was to preserve their existence.

But maybe it doesn't matter to anyone?

I'll do the work, but it's not a rush thing. I see it as a programming exercise to help me build my skills at C++, which I've only been using for a year.

Brian




