#22 new
sgray

Detail pages return duplicate data.

Reported by sgray | June 8th, 2009 @ 02:20 PM

I'm copying a post I made on the Google Groups blog that describes the issue.

You are having the same problem I reported in "Detail page problems".
There appears to be a bug in detail page processing. I was able to
make my examples work by changing detail_page_filter.rb. The change I
made is a kludge at best, but it does highlight the issue.

Here's what I changed in detail_page_filter.rb.

  #if @detail_extractor.nil?
  #  @detail_extractor = Extractor.new @parent_pattern.extractor.mode, @parent_pattern.referenced_extractor
  #  root_results = @detail_extractor.result
  #else
  #  root_results = @detail_extractor.evaluate_extractor
  #end

  @detail_extractor = Extractor.new @parent_pattern.extractor.mode, @parent_pattern.referenced_extractor
  root_results = @detail_extractor.result

This creates a new @detail_extractor each time, so evaluate_extractor is
never called on a reused extractor. That method has problems: when
called, it returns the previous results. I'm not sure why, but
evaluate_extractor in extractor.rb has a couple of issues when
processing detail pages that I don't know how to resolve.

  catch :quit_next_page_loop do
    loop do
      url = get_current_doc_url #TODO need absolute address here
      @processed_pages << url
      @root_patterns.each do |root_pattern|
        @root_results.push(*root_pattern.evaluate(get_hpricot_doc, nil))
      end

In the line "url = get_current_doc_url", the call always returns nil. The line
"@root_results.push(*root_pattern.evaluate(get_hpricot_doc, nil))" is not
executed. It falls out of the loop without doing anything and
returns the previous results.
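
To make the symptom easier to see in isolation, here is a minimal sketch of the behaviour described above. It is not scRUBYt!'s code: SketchExtractor, its pages hash, and the nil check on the URL are all assumptions made for illustration. I have not traced why the real loop exits, so the nil check only stands in for whatever causes it. The sketch shows a reused extractor handing back stale results when no current URL is available, and a fresh instance (the kludge above) avoiding that.

```ruby
# Hypothetical stand-in for an extractor. NOT the real scRUBYt! Extractor class.
class SketchExtractor
  def initialize(pages)
    @pages = pages            # assumed fixture: url => results for that page
    @root_results = []
    @processed_pages = []
    @current_url = nil
  end

  # First evaluation: a current URL is known, so the loop body runs.
  def result(url)
    @current_url = url
    evaluate
  end

  # Re-evaluation on the same instance. If the current URL is nil, the loop
  # body is skipped and the accumulated (previous) results are returned.
  # The nil check is an assumption standing in for the real exit condition.
  def evaluate
    loop do
      url = @current_url
      break if url.nil?
      @processed_pages << url
      @root_results.push(*@pages.fetch(url))
      @current_url = nil      # no "next page" in this sketch
    end
    @root_results
  end
end

pages = { "detail/1" => ["first"], "detail/2" => ["second"] }

reused     = SketchExtractor.new(pages)
first_run  = reused.result("detail/1").dup   # ["first"]
second_run = reused.evaluate.dup             # ["first"] again: stale results

# The kludge: a new extractor per detail page starts with empty results.
fresh_run  = SketchExtractor.new(pages).result("detail/2")  # ["second"]
```

Creating a new instance each time hides the problem rather than fixing it. A real fix would presumably need get_current_doc_url to return the detail page's URL on re-evaluation.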

Scott

