Website Extraction For Dummies

  • Author Freddy A Johnson
  • Published May 29, 2011
  • Word count 475

As of 2011, there are over 5 million terabytes of data on the internet. That amounts to over 5 million home computers filled to capacity, assuming about one terabyte of storage each. And this number doubles every 5 years.

All this information is accessible to all of us, and most of it is free. Unfortunately, it is presented in a way that makes it easy for an average user to browse and look around, but not for a business to store, analyze, and process.

This is where web page scraping comes in handy. I searched for weeks, if not months, for a solution to this problem. I found a few companies offering web scraping services, but at ridiculously high rates. I also found professionals dedicated to web scraping on some freelancer sites. Their prices were better, but still a little high for something that a computer program could do. I'm more of a do-it-yourself kind of person anyway. So how about some DIY web scraping tools?

Although there are several out there, Helium Scraper is perhaps the easiest yet most powerful one I have found. It's relatively new, so you might not have heard of it. When I first tried it, I was actually quite disappointed by how elementary and plain the main screen looked. But after following the basic tutorial that comes with it and playing with it a little, I managed to set it up to extract data that I could not extract with any other web scraper I had tried before.

This is how it works, in a nutshell:

First, you create some items called kinds. These are how you tell Helium Scraper what is what in a web page. Basically, you highlight a few elements in a page and say "these are phone numbers" or "these are links" or "these are whatever". Then Helium Scraper finds a pattern and recognizes what you meant by "phone numbers", "links" or "whatever".
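To make the idea of a kind more concrete, here is a minimal Python sketch of pattern inference from a few highlighted examples. It is not how Helium Scraper works internally, which this article does not describe. The pattern used here (same tag plus the CSS classes shared by all examples), the sample page, and every name in the code are assumptions made purely for illustration.

```python
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Collects tag, class set and direct text for each closed element.

    Simplification: assumes well-formed HTML without void tags like <br>.
    """

    def __init__(self):
        super().__init__()
        self.elements = []  # elements that have been closed
        self._stack = []    # elements that are still open

    def handle_starttag(self, tag, attrs):
        classes = frozenset((dict(attrs).get("class") or "").split())
        self._stack.append({"tag": tag, "classes": classes, "text": ""})

    def handle_data(self, data):
        if self._stack:
            self._stack[-1]["text"] += data

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1]["tag"] == tag:
            self.elements.append(self._stack.pop())


def parse(html):
    collector = ElementCollector()
    collector.feed(html)
    return collector.elements


def infer_kind(examples):
    """Hypothetical 'kind': the tag and the classes all examples share."""
    tags = {e["tag"] for e in examples}
    if len(tags) != 1:
        raise ValueError("examples must all use the same tag")
    shared = frozenset.intersection(*(e["classes"] for e in examples))
    return {"tag": tags.pop(), "classes": shared}


def select(elements, kind):
    """Return the text of every element that matches the kind."""
    return [e["text"].strip() for e in elements
            if e["tag"] == kind["tag"] and kind["classes"] <= e["classes"]]


# Made-up sample page.
PAGE = """
<ul>
  <li><span class="name">Ann</span> <span class="phone main">555-0100</span></li>
  <li><span class="name">Bob</span> <span class="phone">555-0101</span></li>
  <li><span class="name">Cy</span> <span class="phone mobile">555-0102</span></li>
</ul>
"""

elements = parse(PAGE)
# "Highlight" two phone numbers, as a user would do by clicking on them.
highlighted = [e for e in elements
               if e["text"].strip() in ("555-0100", "555-0101")]
phone_kind = infer_kind(highlighted)
phone_numbers = select(elements, phone_kind)
print(phone_numbers)
```

Two highlighted examples are enough here for the sketch to also pick up the third phone number, which was never highlighted. A real tool would presumably need a far more robust pattern than a shared class name.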

Next, you create the actions you want Helium Scraper to perform with the kinds you just created. Here you can automate just about any action you would normally perform in a browser, such as clicking or navigating through links, plus, of course, extracting data. Actions are organized as an intuitive tree. For instance, you would add an "Extract" and a "Navigate" action inside a "Repeat" action to have Helium Scraper repeatedly extract information from a search results page and then navigate to the next page.
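The "Repeat" example can be sketched in a few lines of Python. Again, this is only an illustration of the tree idea under my own assumptions, not Helium Scraper's actual action model. The SITE dictionary stands in for a paginated search results site, and the Browser, Extract, Navigate and Repeat classes are hypothetical names.

```python
from dataclasses import dataclass, field

# Made-up stand-in for a website: each page lists its results and the
# key of the next page (None on the last page).
SITE = {
    "page1": {"results": ["alpha", "beta"], "next": "page2"},
    "page2": {"results": ["gamma"], "next": "page3"},
    "page3": {"results": ["delta", "epsilon"], "next": None},
}


@dataclass
class Browser:
    current: str
    rows: list = field(default_factory=list)


class Extract:
    """Copies the results of the current page into the output rows."""

    def run(self, browser):
        browser.rows.extend(SITE[browser.current]["results"])
        return True


class Navigate:
    """Moves to the next page; reports False when there is none."""

    def run(self, browser):
        next_page = SITE[browser.current]["next"]
        if next_page is None:
            return False
        browser.current = next_page
        return True


class Repeat:
    """Runs its child actions in order until one of them reports False."""

    def __init__(self, *children):
        self.children = children

    def run(self, browser):
        while all(child.run(browser) for child in self.children):
            pass
        return True


browser = Browser(current="page1")
Repeat(Extract(), Navigate()).run(browser)
print(browser.rows)
```

Because "Extract" runs before "Navigate" inside the loop, the last page is still extracted before the missing next link ends the repetition.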

Even though Helium Scraper doesn't require any programming skills, you can benefit greatly from some JavaScript knowledge. I'm not a computer programmer myself, but with a little googling, I've managed to set it up to perform more complicated tasks, such as automatically filling in and submitting forms, simulating user selections in combo boxes, and processing the results before they are extracted to the database.
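As a rough idea of what "processing the results before they are extracted to the database" can mean, here is a small Python sketch that cleans up scraped phone numbers and stores only the valid ones. In Helium Scraper this kind of step would be written in JavaScript. The Python version below, the 10-digit format, the sample values, and the in-memory SQLite table are all assumptions chosen for illustration.

```python
import re
import sqlite3


def normalize_phone(raw):
    """Keep digits only, then format 10-digit numbers as NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return None  # not a recognizable number, so skip it
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


# Made-up scraped values in inconsistent formats.
scraped = ["(555) 010-0100", "555.010.0101", "call us!", " 555 010 0102 "]

cleaned = [p for p in map(normalize_phone, scraped) if p is not None]

# An in-memory database stands in for wherever the results are stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phones (number TEXT NOT NULL)")
conn.executemany("INSERT INTO phones (number) VALUES (?)",
                 [(p,) for p in cleaned])
conn.commit()

stored = [row[0] for row in
          conn.execute("SELECT number FROM phones ORDER BY rowid")]
print(stored)
```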

Freddy A Johnson has been in the SEO business for more than a decade. To try Helium Scraper, go to http://www.heliumscraper.com

Article source: https://articlebiz.com