Just_Drawing_2035 User

Apr 2021

This is a WIP userscript for scraping book image links from archive.org.
You can try it on this book: https://archive.org/details/eastofsunwestofm00asbj/page/100/mode/2up

The script works fine as long as you stay on the tab in which the website is open. If you switch to a different tab, though, the script stops. I am guessing this problem is caused by MutationObserver. What am I doing wrong?

I tried the same script as a browser extension, and the same thing happens. As soon as you switch to a different tab, it stops working.

I am using Firefox 87.0 (64-bit).

// ==UserScript==
// @name        archive.org ripper
// @namespace   Violentmonkey Scripts
// @include     https://archive.org/*
// @include     https://www.archive.org/*
// @grant       GM_download
// @run-at      document-idle
// @version     1.0
// @author      -
// @description Rips books from archive.org
// ==/UserScript==

console.log("archive.org...");

const targetNode = document.getElementById('IABookReaderWrapper');
const config = { attributes: true, childList: true, subtree: true };
var img_links = [];

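const callback = function(mutationsList, observer) {
    for (const mutation of mutationsList) {
        const target = mutation.target;
        // Only react to child changes inside a two-page-mode page container.
        if (mutation.type === 'childList' && target.classList.contains('BRpagecontainer') && target.classList.contains('BRmode2up')) {
            const images = target.getElementsByClassName('BRpageimage');
            if (images.length === 0) continue; // image not inserted yet
            img_links.push(images[0].src);
            // Once both pages of the spread are stored, flip to the next spread.
            if (img_links.length % 2 === 0) {
                const btn_next = document.getElementsByClassName('book_flip_next')[0];
                if (btn_next) btn_next.click();
                console.log(img_links.length);
                console.log(img_links);
            }
        }
    }
};

// The wrapper only exists on book pages, so skip every other archive.org page.
if (targetNode) {
    const observer = new MutationObserver(callback);
    observer.observe(targetNode, config);
}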
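
Because the observer watches attributes and the whole subtree, the same page container may be reported more than once, in which case the same src would be pushed repeatedly and the "every second link" check would drift. A minimal sketch of a duplicate-free collector follows; createLinkCollector is a hypothetical helper name, not part of BookReader or Violentmonkey, and whether duplicates actually occur on archive.org is an assumption.

```javascript
// Sketch: collect image URLs without duplicates, preserving first-seen order.
// `createLinkCollector` is a hypothetical helper, not an existing API.
function createLinkCollector() {
  const seen = new Set(); // a Set keeps insertion order and ignores repeats

  return {
    // Returns true only when `src` is non-empty and was not stored before.
    add(src) {
      if (!src || seen.has(src)) return false;
      seen.add(src);
      return true;
    },
    // Snapshot of all links collected so far.
    links() {
      return Array.from(seen);
    },
    get size() {
      return seen.size;
    },
  };
}
```

In the callback, img_links.push(images[0].src) could then become collector.add(images[0].src), with the flip triggered only when add returns true and collector.size is even.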

Just_Drawing_2035 User

Apr 2021

Re: @Martii:

Have you tried the script in your browser? Are you facing this problem too? I am on Firefox 87.0 (64-bit).

Just_Drawing_2035 User

Apr 2021

Re: @Martii:

I tried MutationObserver, and it worked well. Thanks. However, I now have a new problem.
Here is the new code:

// ==UserScript==
// @name        archive.org ripper
// @namespace   Violentmonkey Scripts
// @include     https://archive.org/*
// @include     https://www.archive.org/*
// @grant       GM_download
// @run-at      document-idle
// @version     1.0
// @author      -
// @description Rips books from archive.org
// ==/UserScript==

console.log("archive.org ripper...");

const targetNode = document.getElementById('IABookReaderWrapper');
const config = { attributes: true, childList: true, subtree: true };
var img_links = [];

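const callback = function(mutationsList, observer) {
    for (const mutation of mutationsList) {
        const target = mutation.target;
        // Only react to child changes inside a two-page-mode page container.
        if (mutation.type === 'childList' && target.classList.contains('BRpagecontainer') && target.classList.contains('BRmode2up')) {
            const images = target.getElementsByClassName('BRpageimage');
            if (images.length === 0) continue; // image not inserted yet
            img_links.push(images[0].src);
            const current_url = window.location.href; // declared so it is not an implicit global
            // Once both pages of the spread are stored, flip to the next spread.
            if (img_links.length % 2 === 0) {
                const btn_next = document.getElementsByClassName('book_flip_next')[0];
                if (btn_next) btn_next.click();
                console.log(img_links.length);
                console.log(img_links);
            }
        }
    }
};

// The wrapper only exists on book pages, so skip every other archive.org page.
if (targetNode) {
    const observer = new MutationObserver(callback);
    observer.observe(targetNode, config);
}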

Now all of this works quite well. I open https://archive.org/details/eastofsunwestofm00asbj/page/n19/mode/2up and the code stores the URLs of the current images in the array img_links, then btn_next is clicked, and the process repeats.

The problem is that when I switch to browser tab B while the script is running in browser tab A, the script stops working. Why is that, and how can it be solved?
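
One plausible explanation, not verified against the BookReader source: browsers throttle timers and pause requestAnimationFrame callbacks in background tabs, so the page-flip animation started by btn_next.click() may not finish while tab A is hidden. If no new page is rendered, no mutation is reported and the callback is never called again. MutationObserver itself is probably not the culprit. The sketch below does not make the script run in the background; it only defers the flip until the tab is visible again, using the Page Visibility API (document.visibilityState and the visibilitychange event). createFlipController and flipNext are hypothetical names introduced for illustration.

```javascript
// Sketch: hold back page flips while the tab is hidden, resume when it is shown.
// `createFlipController` and `flipNext` are hypothetical, not BookReader APIs.
function createFlipController(doc, flipNext) {
  let pending = false; // true when a flip was requested while the tab was hidden

  function requestFlip() {
    if (doc.visibilityState === 'visible') {
      flipNext();
    } else {
      pending = true; // defer until the tab becomes visible again
    }
  }

  // Fires whenever the tab is hidden or shown.
  doc.addEventListener('visibilitychange', () => {
    if (doc.visibilityState === 'visible' && pending) {
      pending = false;
      flipNext();
    }
  });

  return { requestFlip, hasPending: () => pending };
}

// In the userscript (browser only), assuming the button lookup from the script above:
// const controller = createFlipController(document, () => {
//   const btn = document.getElementsByClassName('book_flip_next')[0];
//   if (btn) btn.click();
// });
```

Inside the observer callback, btn_next.click() would then be replaced by controller.requestFlip(), so scraping continues from where it stopped once tab A is focused again.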

Just_Drawing_2035 User

Apr 2021

// ==UserScript==
// @name        archive.org ripper
// @namespace   Violentmonkey Scripts
// @include     https://archive.org/*
// @include     https://www.archive.org/*
// @grant       GM_download
// @run-at      document-idle
// @version     1.0
// @author      -
// @description archive.org ripper
// ==/UserScript==


console.log("archive.org ripper...");

document.onreadystatechange = function () {
  if (document.readyState === 'complete') {
    var img = document.getElementsByClassName("BRpageimage");
    if(img) {
      console.log("Found");
      console.log(img);
      console.log(img.length);
      console.log(img.type);
      console.log(img[0].src);
    } else {
      console.log("Not Found");
    }
  }
}

Sometimes this script works as expected. You can try it on https://archive.org/details/eastofsunwestofm00asbj/page/n19/mode/2up

But sometimes the console throws these errors:

archive.org ripper...
Found
HTMLCollection { length: 0 }
0
Uncaught TypeError: img[0] is undefined
    onreadystatechange moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/ archive.org ripper.user.js#3:24
    VMin0bjzz1xm9 moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/ archive.org ripper.user.js#3:16
    VMin0bjzz1xm9 moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/ archive.org ripper.user.js#3:88
    a moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
    v moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
    set moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
    <anonymous> moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/ archive.org ripper.user.js#3:1
    c moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
    ScriptData moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
    onHandle moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
    c moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1

When expanded, the HTMLCollection in the console shows two image tags, but the line console.log(img.length) prints "0". Also, img[0].src throws an error. Why is that, and how can this be resolved?
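
A likely explanation: getElementsByClassName returns a live HTMLCollection, and the browser console shows an object's current contents when it is expanded, not its contents at the moment it was logged. The page images are apparently inserted by the reader's own scripts after readyState becomes 'complete', so the collection was still empty when img.length and img[0].src were evaluated. The check if(img) does not help, because an empty collection is still a truthy object. A minimal sketch of waiting for the images instead is shown below; waitForPageImages is a hypothetical helper name, and the observer constructor is passed in as a parameter only so the function can be exercised outside a browser.

```javascript
// Sketch: call `onFound` once at least one .BRpageimage element exists.
// `waitForPageImages` is a hypothetical helper, not an existing API.
function waitForPageImages(doc, onFound, ObserverCtor) {
  // Copy the live collection into a plain array so it no longer changes.
  const find = () => Array.from(doc.getElementsByClassName('BRpageimage'));

  const existing = find();
  if (existing.length > 0) {
    onFound(existing);
    return null; // images already present, nothing to observe
  }

  const observer = new ObserverCtor(() => {
    const images = find();
    if (images.length > 0) {
      observer.disconnect(); // stop watching once the images have appeared
      onFound(images);
    }
  });
  observer.observe(doc.documentElement, { childList: true, subtree: true });
  return observer;
}

// In the userscript (browser only):
// waitForPageImages(document, (images) => console.log(images[0].src), MutationObserver);
```

Logging images.length or images[0].src inside onFound then reflects elements that really exist at that moment.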
