Thursday, March 27, 2014

Data Harvesting with Ruby: Lesson 1: Downloading Web Pages

In order to harvest or scrape data from the World Wide Web, you first have to learn how to download files from the web. The sample code below, a simplified example, illustrates a program that retrieves the default page of bookmarktutoring.com.

require 'net/http'                                     # standard library for HTTP requests
url = 'http://www.bookmarktutoring.com/default.html'   # address of the page to download
uri = URI.parse(url)                                   # split the URL into its parts
response = Net::HTTP.get_response(uri)                 # send a GET request to the server
puts response.body                                     # print the page's HTML


In Ruby, downloading a file from the Internet requires the net/http library. This library ships with Ruby's standard library, so no separate gem installation is needed. It provides methods for reading from, and writing to, files stored on remote servers.
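In practice you may want to guard the download against network failures. The helper below is a sketch only; the method name fetch_page and the rescue clauses are additions for illustration and are not part of the lesson's original code.

```ruby
require 'net/http'

# Hypothetical helper: wraps the lesson's download steps and rescues
# the most common connection errors instead of crashing.
def fetch_page(url)
  uri = URI.parse(url)
  Net::HTTP.get_response(uri)
rescue SocketError, Errno::ECONNREFUSED => e
  warn "Could not reach #{uri.host}: #{e.message}"
  nil
end

# Usage (requires network access):
# response = fetch_page('http://www.bookmarktutoring.com/default.html')
# puts response.body if response
```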

The code retrieves the main page of BookMarkTutoring.com and displays it on the screen. Note that to reach the main page of bookmarktutoring.com, the site's default or index page is requested.

On the first line of the code, the net/http library is loaded into the program. Next, the address of the page to read is stored in a variable called url. Note that the web address must be enclosed in quote marks, because it is a string.

The parse method of the URI module is then invoked. This call splits the URL into its component segments (scheme, host, path, and so on) so that the server where the web page resides can locate the file.
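To see what URI.parse actually produces, the snippet below prints the individual pieces of the lesson's URL; no network access is needed for this step.

```ruby
require 'uri'

uri = URI.parse('http://www.bookmarktutoring.com/default.html')
puts uri.scheme  # "http"
puts uri.host    # "www.bookmarktutoring.com"
puts uri.path    # "/default.html"
puts uri.port    # 80 (the default port for http)
```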

The method get_response is then called. This method sends the request and retrieves the server's response, which is stored in a variable called response.
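The object that get_response returns is a subclass of Net::HTTPResponse. In the snippet below a response object is built by hand, without any network traffic, purely to show the fields you would inspect on a real one; a live request would produce the same kind of object.

```ruby
require 'net/http'

# Hand-built stand-in for a successful server response.
response = Net::HTTPOK.new('1.1', '200', 'OK')

puts response.code                     # "200" (the status code, as a string)
puts response.message                  # "OK"
puts response.is_a?(Net::HTTPSuccess)  # true
```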

Finally, the body method is invoked on the response object. The body contains the data of the returned page, its HTML, as an ordinary Ruby string.
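Because the body is just a string, any string or file method applies to it. The HTML below is a stand-in for what a live request would return; with a real download you would use response.body in its place.

```ruby
# Stand-in for response.body from a live request.
html = '<html><head><title>Demo</title></head><body>Hello</body></html>'

File.write('default.html', html)  # save the page to disk
puts html.length                  # size of the page in characters
puts html.include?('<title>')     # true
```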

Lesson 2 Ruby Data Harvesting, Downloading Web Pages, Readlines, Arrays
