Scraping dynamic page content

Saturday, 06 January 2018 23:57 Stefano Tommesani

One of the most common roadblocks when scraping web sites is getting the full content of the page, including the elements generated by JavaScript (which are probably the ones you are looking for). When using CefSharp to scrape a site, reading the content of the page with:

string pageSource = await _browser.GetSourceAsync(); 

will not return the JS-generated parts of the page, but the following code fragment will:

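// Ask the page itself for its current DOM, which includes JS-generated elements
var jsResponse = await _browser.EvaluateScriptAsync(@"document.getElementsByTagName('html')[0].innerHTML");
if (jsResponse.Success)
{
    // Result holds the value returned by the script, here the HTML as a string
    string pageSource = jsResponse.Result.ToString();
}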
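
The script can only return elements that already exist when it runs, so it helps to wait until the page has finished loading before evaluating it. The following is a minimal sketch of that idea, not part of the original fragment: the class RenderedPageReader and the methods LoadPageAsync and GetRenderedSourceAsync are hypothetical names, and the code assumes the browser instance is already initialized and that a .NET version with TaskCreationOptions.RunContinuationsAsynchronously is in use.

```csharp
using System;
using System.Threading.Tasks;
using CefSharp;

// Hypothetical helper class, named here for illustration only.
public static class RenderedPageReader
{
    // Starts loading the url and completes when the browser reports
    // that it is no longer loading.
    public static Task LoadPageAsync(IWebBrowser browser, string url)
    {
        var tcs = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        EventHandler<LoadingStateChangedEventArgs> handler = null;
        handler = (sender, args) =>
        {
            if (args.IsLoading)
            {
                return; // still loading, keep waiting
            }
            browser.LoadingStateChanged -= handler;
            tcs.TrySetResult(true);
        };

        // Subscribe before navigating so the event cannot be missed.
        browser.LoadingStateChanged += handler;
        browser.Load(url);
        return tcs.Task;
    }

    // Returns the current DOM as HTML, or null if the script failed.
    public static async Task<string> GetRenderedSourceAsync(IWebBrowser browser, string url)
    {
        await LoadPageAsync(browser, url);

        JavascriptResponse response = await browser.EvaluateScriptAsync(
            @"document.getElementsByTagName('html')[0].innerHTML");

        return response.Success ? response.Result?.ToString() : null;
    }
}
```

A call would look like string html = await RenderedPageReader.GetRenderedSourceAsync(_browser, url); where url is whatever page is being scraped. Pages that fetch data after the load has finished may still need an extra delay or a polling loop before the elements of interest appear.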
© 2019 - Stefano Tommesani


Last Updated on Sunday, 07 January 2018 00:38