How do we deal with pages whose content is generated by JavaScript, like this webpage?
The text "777" that we see in the browser does not exist (explicitly) in the page source:
from IPython.display import Image
Image('java_pic.png',width=500)
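To see why a plain download of the page is not enough, here is a minimal sketch using only the standard library. The `PAGE_SOURCE` string is a made-up stand-in for a page of this kind (it is not the actual source of the demo page), and `VisibleTextParser` and `static_text` are hypothetical helper names. The number is computed by a script, so neither the raw source nor the text found by a parser that does not run JavaScript contains "777".

```python
from html.parser import HTMLParser

# Hypothetical page source (an assumption for illustration): the number is
# computed by a script, so the literal digits never appear in the HTML.
PAGE_SOURCE = """
<html>
  <body>
    <p class="foo"></p>
    <script>
      document.querySelector('.foo').textContent = 7 * 111;
    </script>
  </body>
</html>
"""


class VisibleTextParser(HTMLParser):
    """Collect text outside <script> tags, as a static scraper would see it."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Keep only non-empty text that is not script code.
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())


def static_text(source):
    """Return the text a parser finds without executing any JavaScript."""
    parser = VisibleTextParser()
    parser.feed(source)
    parser.close()
    return " ".join(parser.chunks)


print("777" in PAGE_SOURCE)            # is the literal in the raw source?
print(repr(static_text(PAGE_SOURCE)))  # text found without running the script
```

Only a program that actually runs the script, such as a real browser, ends up with the number in the page.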
Selenium, used through its Python bindings (Selenium with Python), automates browsing.
We start with the imports used in the examples below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
With the following code, we can launch Firefox and go to Google.com:
driver = webdriver.Firefox()
driver.get('http://google.com')
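I bet you got an error:
[Errno 2] No such file or directory: 'geckodriver': 'geckodriver'
This means Selenium could not find the geckodriver executable, which it needs in order to control Firefox. Search for this error on Google to find out how to install geckodriver and put it on your PATH.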
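A quick way to check whether the fix worked is to ask Python where it finds the executable. This is a small sketch using only the standard library; `find_driver` is a hypothetical helper name, and it assumes the driver is meant to be found through the PATH environment variable.

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of the driver executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print("geckodriver is not on PATH; download it and add its folder to PATH")
else:
    print("geckodriver found at", path)
```

If the check reports a path, `webdriver.Firefox()` should no longer fail with the "No such file or directory" error.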
With the following code, we can launch Firefox, go to the Google search page, and do a search:
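from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('http://google.com')
# find_element_by_name was removed in Selenium 4.3;
# find_element(By.NAME, ...) works in both old and new versions
searchbox = driver.find_element(By.NAME, 'q')
searchbox.send_keys('goose lamp')   # type the query
searchbox.send_keys(Keys.RETURN)    # press Enter to submit
time.sleep(5)                       # leave the results on screen for a moment
driver.quit()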
OK, now let's use this approach to scrape the site and get our "777".
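from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
# driver = webdriver.PhantomJS()  # headless alternative; only in older Selenium versions
driver.get('http://blue.math.buffalo.edu/448/javascript_demo.html')
# look up the element by its CSS class (find_element_by_class_name was removed in Selenium 4.3)
elt = driver.find_element(By.CLASS_NAME, 'foo')
# save the rendered text to a file; "with" closes the file for us
with open('foo.txt', 'w') as f:
    f.write(elt.text)
driver.quit()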
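On slower pages, the script that fills in the element may not have finished by the time we read its text. Selenium has its own explicit-wait helper (`WebDriverWait`) for this. The sketch below only illustrates the underlying idea in plain Python: `wait_until` and `fake_element_text` are hypothetical names, and the fake function stands in for reading the text of a real element.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or the timeout expires.

    Returns the truthy value, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Stand-in for an element whose text is filled in by JavaScript after a delay.
# With a real driver, the condition would read the element's text instead.
ready_at = time.monotonic() + 0.2


def fake_element_text():
    return "777" if time.monotonic() >= ready_at else ""


text = wait_until(fake_element_text, timeout=5.0, poll=0.05)
print(text)
```

Unlike a fixed `time.sleep(5)`, this returns as soon as the text is available and fails loudly if it never shows up.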