This is a WIP userscript for scraping book image links from archive.org.
You can try it by scraping the image links of this book: https://archive.org/details/eastofsunwestofm00asbj/page/100/mode/2up
The script works fine as long as you stay on the tab in which the website is open. If you switch to a different tab, though, the script stops. I am guessing this problem is caused by MutationObserver. What am I doing wrong?
I tried the same script as a browser extension, and the same thing happens: as soon as you switch to a different tab, it stops working.
I am using Firefox 87.0 (64-bit).
// ==UserScript==
// @name        archive.org ripper
// @namespace   Violentmonkey Scripts
// @include     https://archive.org/*
// @include     https://www.archive.org/*
// @grant       GM_download
// @run-at      document-idle
// @version     1.0
// @author      -
// @description Rips books from archive.org
// ==/UserScript==

console.log("archive.org...");

const targetNode = document.getElementById('IABookReaderWrapper');
const config = { attributes: true, childList: true, subtree: true };
var img_links = [];

const callback = function(mutationsList, observer) {
    for (const mutation of mutationsList) {
        if (mutation.type === 'childList' &&
            mutation.target.classList[0] === 'BRpagecontainer' &&
            mutation.target.classList[1] === 'BRmode2up') {
            var images = mutation.target.getElementsByClassName('BRpageimage');
            img_links.push(images[0].src);
            if (img_links.length % 2 == 0) {
                var btn_next = document.getElementsByClassName("book_flip_next")[0];
                btn_next.click();
                console.log(img_links.length);
                console.log(img_links);
            }
        }
    }
};

const observer = new MutationObserver(callback);
observer.observe(targetNode, config);
Re: @Martii:
Have you tried the script in your browser? Are you facing this problem too? I am on Firefox 87.0 (64-bit).
Re: @Martii:
I tried MutationObserver, and it worked well. Thanks. Although I now have a new problem.
Here is the new code:
// ==UserScript==
// @name        archive.org ripper
// @namespace   Violentmonkey Scripts
// @include     https://archive.org/*
// @include     https://www.archive.org/*
// @grant       GM_download
// @run-at      document-idle
// @version     1.0
// @author      -
// @description Rips books from archive.org
// ==/UserScript==

console.log("archive.org ripper...");

const targetNode = document.getElementById('IABookReaderWrapper');
const config = { attributes: true, childList: true, subtree: true };
var img_links = [];

const callback = function(mutationsList, observer) {
    for (const mutation of mutationsList) {
        if (mutation.type === 'childList' &&
            mutation.target.classList[0] === 'BRpagecontainer' &&
            mutation.target.classList[1] === 'BRmode2up') {
            var images = mutation.target.getElementsByClassName('BRpageimage');
            img_links.push(images[0].src);
            // Declared with const so it does not leak as an implicit global.
            const current_url = window.location.href;
            if (img_links.length % 2 == 0) {
                var btn_next = document.getElementsByClassName("book_flip_next")[0];
                btn_next.click();
                console.log(img_links.length);
                console.log(img_links);
            }
        }
    }
};

const observer = new MutationObserver(callback);
observer.observe(targetNode, config);
Now all this works quite well. I open https://archive.org/details/eastofsunwestofm00asbj/page/n19/mode/2up and the code stores the URLs of the current images in the array img_links, then btn_next is clicked, and the process goes on. The problem is that when I switch to browser tab B while this script is running in browser tab A, the script stops working. Why is that, and how can this be solved?
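One way to check whether background-tab behaviour is involved is to log visibility changes and compare the timestamps with the moment the page flips stop. Browsers throttle timers and pause requestAnimationFrame in hidden tabs; whether the book reader's page flip depends on either is an assumption here, not something confirmed in this thread. The helper name watchVisibility is hypothetical; it only relies on the standard visibilitychange event and document.visibilityState.

```javascript
// Hypothetical diagnostic helper (not part of the original script).
// Logs the visibility state now and again every time it changes.
function watchVisibility(doc, log) {
    const report = () => log(`visibilityState: ${doc.visibilityState}`);
    doc.addEventListener('visibilitychange', report);
    report(); // log the initial state once
    // Return a function that removes the listener again.
    return () => doc.removeEventListener('visibilitychange', report);
}

// In the userscript this would be called as:
// const stopWatching = watchVisibility(document, console.log);
```

If the flips stop right after a "hidden" entry appears in the log, the observer itself is probably not at fault, since it only reacts to DOM changes that the page has to produce first.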
// ==UserScript==
// @name        archive.org ripper
// @namespace   Violentmonkey Scripts
// @include     https://archive.org/*
// @include     https://www.archive.org/*
// @grant       GM_download
// @run-at      document-idle
// @version     1.0
// @author      -
// @description archive.org ripper
// ==/UserScript==

console.log("archive.org ripper...");

document.onreadystatechange = function () {
    if (document.readyState === 'complete') {
        var img = document.getElementsByClassName("BRpageimage");
        if (img) {
            console.log("Found");
            console.log(img);
            console.log(img.length);
            console.log(img.type);
            console.log(img[0].src);
        } else {
            console.log("Not Found");
        }
    }
}
Sometimes this script works as expected. You can try it on https://archive.org/details/eastofsunwestofm00asbj/page/n19/mode/2up
But sometimes the console throws these errors:
archive.org ripper...
Found
HTMLCollection { length: 0 }
0
Uncaught TypeError: img[0] is undefined
    onreadystatechange moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/ archive.org ripper.user.js#3:24
    VMin0bjzz1xm9 moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/ archive.org ripper.user.js#3:16
    VMin0bjzz1xm9 moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/ archive.org ripper.user.js#3:88
    a moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
    v moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
    set moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
    <anonymous> moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/ archive.org ripper.user.js#3:1
    c moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
    ScriptData moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
    onHandle moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
    c moz-extension://68e50867-b917-4486-9109-bb3547a1b15f/sandbox/injected-web.js:1
When expanded, that HTMLCollection entry shows two image tags, but the line console.log(img.length) prints "0". Also, img[0].src throws an error. Why is that, and how can this be resolved?
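A likely explanation, offered as a sketch rather than a confirmed diagnosis: getElementsByClassName returns a live HTMLCollection, which is truthy even when it is empty, so the if (img) check always passes. The console keeps a reference to that live collection, so expanding it later can show images that were added after the log call, while img.length was still 0 at the time it was printed. The hypothetical helper collectSources below takes a snapshot of whatever is in the collection at the moment of the call and never indexes into an empty one.

```javascript
// Hypothetical helper (not part of the original script).
// Accepts any array-like collection, for example the result of
// document.getElementsByClassName("BRpageimage").
function collectSources(collection) {
    // Array.from copies the current contents, so later DOM changes
    // do not affect the returned array.
    return Array.from(collection)
        .map((el) => el.src)
        .filter((src) => typeof src === 'string' && src.length > 0);
}

// In the userscript this would be used as:
// const sources = collectSources(document.getElementsByClassName("BRpageimage"));
// if (sources.length === 0) { console.log("Not Found"); }
```

An empty result then means the images have not been inserted yet, which is the case a MutationObserver on the reader container is meant to handle.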