JavaScript-rich sites, such as those built with React, AngularJS, Ember, or Vue, can be hard to crawl because their internal links usually do not appear in the HTML source. For this kind of site, a regular HTML crawler is not enough. Take the EmberConf site, for example: view its HTML source and you won’t find a single <a> tag, yet in the browser the page shows a few internal links.
To crawl this kind of site, we need to render each page in a web browser so that the DOM is fully formed, and only then extract the links that lead to the internal pages. That’s exactly what the new Dynamic Crawler does: it renders each web page internally using a headless Chrome browser.
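To make the difference concrete, here is a minimal Python sketch of link discovery run against two pieces of markup. Everything in it is an assumption for illustration: `LinkCollector`, `internal_links`, the `example.com` URLs, and both HTML snippets are hypothetical, and the "rendered" markup is a hand-written stand-in for what a browser might produce after running the page's JavaScript. It is not the Dynamic Crawler's actual implementation, and it does not launch a browser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, base_url):
    """Return absolute, de-duplicated links that stay on base_url's host."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    base_host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc == base_host and absolute not in links:
            links.append(absolute)
    return links


BASE_URL = "https://example.com/"  # hypothetical site

# What a static crawler sees: an empty mount point and a script tag.
STATIC_SOURCE = """
<html><body>
  <div id="app"></div>
  <script src="/assets/app.js"></script>
</body></html>
"""

# Hand-written stand-in for the DOM after JavaScript has run in a browser.
RENDERED_DOM = """
<html><body>
  <div id="app">
    <nav>
      <a href="/schedule">Schedule</a>
      <a href="/speakers">Speakers</a>
      <a href="https://twitter.com/example">Twitter</a>
    </nav>
  </div>
</body></html>
"""

static_links = internal_links(STATIC_SOURCE, BASE_URL)
rendered_links = internal_links(RENDERED_DOM, BASE_URL)
```

With these inputs, the static source yields no links at all, while the rendered markup yields the two internal pages and skips the external one. A real dynamic crawl would obtain the second string from a headless browser rather than from a constant.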
As a pro user, you can now run validation reports for JavaScript-rich sites and get them properly crawled. All you need to do is enable the new advanced option Dynamic Crawler, and we’ll do the rest for you.
We recommend using this new crawler only on sites that can’t be crawled with the standard Static Crawler, as it’s slower and consumes 1 credit per crawled web page. For example, if your JavaScript-rich site implements server-side rendering, or you can use an XML sitemap or manually provide a list of initial URLs, you might not need to use the Dynamic Crawler.
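If your site publishes an XML sitemap, the list of initial URLs can be read straight from it. The following Python sketch shows one way to do that, assuming a sitemap that follows the standard sitemaps.org schema. The `sitemap_urls` helper, the sample XML, and the `example.com` URLs are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; a real one would be downloaded from the site.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/schedule</loc></url>
  <url><loc> https://example.com/speakers </loc></url>
</urlset>
"""


def sitemap_urls(xml_text):
    """Return the <loc> values of a sitemap, trimmed of surrounding whitespace."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


seed_urls = sitemap_urls(SITEMAP_XML)
```

The resulting `seed_urls` list could then be supplied as the initial URLs for a crawl, so the pages are found even though their links never appear in the HTML source.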