2004 and my Bail Bond Web Scraper

A few days ago YouTube suggested a video to me called "Build a Web Scraper (super simple!)". This reminded me that I still have one more story from my past to write, one where I think I was a pioneer.

But let's put it into context. The huge Online Service Provider I was working for in 2002 decided to shut down our office in Munich and let me go with a 6-month separation period. So I decided to pay a 3-month visit to my sister in the United States of America. She had been a US citizen for many years and was living in the south of the US.

During my time there I met many interesting people, including my later employer in the US. He owned a Bail Bonding Business and needed a reliable IT guy, since his IT guy had died in a car accident a few months earlier. He had hired some IT guys since then, but they never worked out.

Back in 1988, when I was still a Computer Science student in Regensburg, Germany, I had passed up an opportunity to study and work in Boston, MA. So here was the chance to fulfill my dream to live and work in the US, and naturally I accepted his offer. And so in mid-2003 I was living in the south of the US and working in IT at a Bail Bonding Company.

At this Bail Bonding Company I was the jack of all "IT" trades. I did everything that was computer related: trained the agents, made backups, made sure the internet was working, made sure we had backup power supplies and the bail bond management system was running, bought new computers and replaced outdated ones, replaced printer cartridges, and the list of responsibilities goes on. It even included taking pictures of released inmates and doing their paperwork. But after 4 to 5 months I had a grip on things. I had organized and optimized everything, brought it up to shape and standards, and introduced maintenance and to-do schedules, so the job started to get boring.

But luckily for me the county jail started to publish the inmates on their web page. Before that, an agent had to call the jail for a fax or even drive there to get a copy of the list of new inmates. So naturally, after I learned about the new web page, I told my boss. He agreed with me and also saw the potential for our company. So I got the go-ahead and could start training the agents to use the county jail web page and the online inmate list.

But that was harder than I thought. First of all, we had guys who had little or no PC experience. They knew how to start the bail bond management program, enter data, take photos and print. But using a web browser was new for most of them. You have to keep in mind that our agents were ex-cops, ex-military, ex-martial-arts coaches and that sort, so training them in computer stuff was hard. Even writing a step-by-step training manual was not very successful.

Hence I decided to write a program. Luckily the former IT guy of the Bail Bond Company had bought a VB6 license, so I could start writing a VB program which embedded the Microsoft Web component. That way I could simplify all the web browser stuff, and the agents could be trained much more easily. I even added a button to the Bail Bond Management System, so they could start it from there and not only from the desktop.

The first version of my program was a success, and it sped up the process of identifying clients of ours who had been arrested. But soon I found a flaw. Since we no longer had a fax or a hardcopy of the inmate list, we had to print out the information. And printing the web information was hard and complicated for our agents. So the agents didn't print the information very often, and therefore information got lost, because once an inmate was released, the information was no longer available on the web page. In reaction to that I started to save the information in the background, so it could be retrieved later if needed. But the retrieval was still a manual process and could only be done by me. Another problem was that the information was only available on the PC the agent had been using. Since we had around 10 PCs in use, it sometimes was a guessing game where to find the information.

So I decided to improve the system. I began researching how to extract data from a web page. Back then in 2004 web scraping was not unknown, but you could not find much information about it. At least I couldn't find it. And YouTube was not a thing yet, either ;-) But after some research I found a Java component which could parse HTML code and extract information from it. That was the starting point for the Jail Data Collector. This was basically a Java program which accessed the inmate list on the jail web page, accessed the individual inmate information, downloaded all referenced inmate pictures and stored everything in a folder on our server.
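
To give an idea of what the list-parsing step of such a collector can look like, here is a minimal sketch in Java. It is not the original Jail Data Collector code: the row layout of the inmate list, the `detail?id=...` links and the class name `InmateListParser` are all made up for illustration, and it uses a plain regular expression from the standard library instead of the HTML parser component mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the real page layout of the county jail site is not known.
class InmateListParser {

    // Assumed row layout:
    // <tr><td><a href="detail?id=1">NAME</a></td><td>BOOKING DATE</td></tr>
    private static final Pattern ROW = Pattern.compile(
        "<tr>\\s*<td>\\s*<a href=\"([^\"]+)\">([^<]+)</a>\\s*</td>\\s*<td>([^<]+)</td>\\s*</tr>",
        Pattern.CASE_INSENSITIVE);

    // Returns one String[] {detailUrl, name, bookingDate} per inmate row found.
    static List<String[]> parse(String html) {
        List<String[]> inmates = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            inmates.add(new String[] { m.group(1), m.group(2).trim(), m.group(3).trim() });
        }
        return inmates;
    }

    public static void main(String[] args) {
        // Made-up sample page; a real collector would download this first.
        String html = "<table>"
            + "<tr><td><a href=\"detail?id=1\">DOE, JOHN</a></td><td>2004-03-01</td></tr>"
            + "<tr><td><a href=\"detail?id=2\">ROE, JANE</a></td><td>2004-03-02</td></tr>"
            + "</table>";
        for (String[] inmate : parse(html)) {
            System.out.println(inmate[1] + " | " + inmate[2] + " | " + inmate[0]);
        }
    }
}
```

A regular expression like this only holds up as long as the pages are generated from a fixed template. The detail link in each row is what a collector would follow next to fetch the individual inmate page and its pictures.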

Since the jail inmate list and the detail pages were simple generated HTML documents, it was easy to extract the information and store it. After I did that, I adapted the VB6 Jail Inmate Browser: I removed the Web component and replaced it with an image viewer for the pictures and a text box for the text files with the inmate information. And, most importantly, I added a working print feature. So it didn't matter anymore whether the information was still present on the county jail web page, because we had it stored locally on our server.

After a while the server got cluttered with inmate information text files and pictures. So I started the transformation to a DB-based system. I created a MySQL DB replication cluster on Linux. We had two locations, which needed to be independent of each other because of power and/or internet outages, but were regularly synchronized. First I rewrote the storage module of my Jail Data Collector to write to the DB instead of files. After that worked flawlessly, I migrated all the old inmate text files to the new DB. Then the Jail Inmate Browser was adapted to use the information from the DB. The pictures were still coming from a central folder on the server, one at each location, synchronized by the Jail Data Collector.

The synchronization was a simple mechanism, driven only by existence, date and time. So we sometimes lost pictures when the download of a picture was interrupted and the image was broken. Later I added an MD5 check and a sanity check. But for the most part it worked perfectly.
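
A check of that kind can be sketched in a few lines of Java. This is only an illustration under assumptions, not the original code: it assumes the pictures were JPEG files, the class name `PictureCheck` is made up, and the "sanity check" here simply looks for the JPEG start and end markers, which catches a download that was cut off in the middle.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, assuming the inmate pictures are JPEG files.
class PictureCheck {

    // Hex-encoded MD5 digest of the given bytes.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b & 0xFF));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm name, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // A JPEG starts with FF D8 and normally ends with FF D9.
    // An interrupted download usually lacks the end marker.
    static boolean looksLikeCompleteJpeg(byte[] data) {
        int n = data.length;
        return n >= 4
            && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xD8
            && (data[n - 2] & 0xFF) == 0xFF && (data[n - 1] & 0xFF) == 0xD9;
    }

    // Accept a synchronized copy only if it looks complete
    // and its digest matches the digest of the source file.
    static boolean isGoodCopy(byte[] copy, String expectedMd5) {
        return looksLikeCompleteJpeg(copy) && md5Hex(copy).equals(expectedMd5);
    }
}
```

With something like `isGoodCopy` in the synchronization loop, a broken picture would be downloaded again instead of being treated as up to date just because a file with the right name and date exists. Some JPEG files carry extra bytes after the end marker, so the marker test is a heuristic and not a full validation.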

It took me till spring 2005 until every component of the Jail Information System was done and the system was finished and in use by our company and, most importantly, by our agents. We saved a lot of our bond money, because we didn't have to pay the bail to the court. And it gave us a speed advantage, because if a former client was arrested again, we knew it before our competitors did, so we could do business with his family much earlier and could also collect outstanding payments. Bail bond hopping was a thing back then.

OK, so the system was done and I could switch to maintenance mode again, until I left in late 2006 and went back to Germany. The Jail Information System ran, as I learned later, till 2015. Of course some adjustments were necessary. In the first 3 years after I left I made them myself; later, IT consultants and programmers did their work on it. Things that needed adjusting were, for example, a workaround after the introduction of basic authentication and, later, a personalized login. Occasionally an OS update came along, like the Windows Server or Fedora Linux updates, some hardware faults with hard disks happened, and once in a while a PC broke down and needed replacement. I did the maintenance remotely from Germany till 2009. After that it was done by IT consultants with occasional support from me. Once in a while I gladly accepted a US$500 check.

Eventually the Jail Information System stopped working in 2015, because the County Jail switched to a new authentication (captcha) system and stopped publishing the inmate pictures, too. That meant rewriting the whole parser and data extraction module. I didn't want to do it, because I hadn't programmed Java since 2007, and all the IT consultants wanted too much money, at least that is what the Bail Bonding Company owner said. So it was not updated and was shut down, even though I had designed it to be modular, so that an experienced Java programmer could easily have done the job: simply write a new HTML retrieval and data extraction component, store the data in the existing DB, and voilà, it would have worked again. OK, one thing would have been a bit of a challenge: overcoming the reCAPTCHA protection. But an experienced Java programmer should have found a way around that.

Anyway, that was my last story where I think I was a pioneer, doing stuff not many people were doing at that point in time. At least I like to think so. But as usual I leave it up to you to decide whether I was a pioneer in 2004, writing a web scraper to extract data into a database and thereby giving a company a business advantage.

After that, I have the feeling, I was never really able to do new stuff anymore. It had already been done, you could buy it, or there was open source software available. But nevertheless I still think it was an interesting work life in IT.

As always apply this rule: "Questions, feel free to ask. If you have ideas or find errors, mistakes, problems or other things which bother or enjoy you, use your common sense and be a self-reliant human being."

Have a good one. Alex