Tuesday, 2 April, 2019 UTC


Summary

HelpDev Modern JavaScript Frameworks and SEO: Tips and Tricks
Historically, indexing content loaded by JavaScript has been a big problem for search engines, and some still struggle with it. No wonder many SEO managers long feared JavaScript like the devil fears holy water. But there are solutions to the problem.
With the increasing prevalence of modern JavaScript frameworks such as Angular, React or Vue.js, it has become almost impossible to avoid the subject of JavaScript in SEO. At the same time, developers should be aware of how search engines process pages built with JavaScript frameworks. While many search engines are still struggling in 2019 to process JavaScript-based pages, industry leader Google copes with them quite well. Nevertheless, there are still many stumbling blocks that can prevent a site from being found. In this post, let’s look at the obstacles JavaScript brings with it from an SEO perspective and at the best practices that help Google and Co. crawl and index such pages.
How does a search engine work?
In order for a document – usually a web page – to appear in search results, every search engine must first find the document and then understand its content. The first step is therefore to get an overview of as many documents as possible. A crawler such as the Googlebot basically does nothing but follow URLs to constantly discover new pages.
When the crawler lands on a URL, it first downloads the HTML document and scans the source code for basic information on the one hand and for links to other URLs on the other. Meta information such as the robots meta tag or the canonical tag tells the crawler how to process the page, and it can then follow the links it has found.
The crawler passes the HTML file and all resources to the indexer, which in Google’s case is called Caffeine. The indexer renders the document and can then index the contents of the web page. Finally, the ranking algorithm makes sure that the most relevant documents are found for the corresponding search queries.
But if JavaScript is used to render the DOM (Document Object Model) on the client side, i.e. in the browser, the crawler will find an HTML document when visiting a URL that looks much like the following example:
<!doctype html>
<html>
<head></head>
<body>
  <app-root></app-root>
  <script src="runtime.js"></script>
  <script src="polyfills.js"></script>
  <script src="main.js"></script>
</body>
</html>
While a server-side rendered document already contains structure and content in its HTML source code, the pre-DOM HTML of a React, Angular or Vue.js application arrives largely empty. If the entire content of a page is loaded this way, a search engine crawler that only reads the HTML code gets virtually no information: it finds no links to follow, no basic information about the page content and no meta tags.
The crawler therefore has to send the HTML as well as the CSS and JavaScript resources to the indexer, which first has to execute the JavaScript and render the page before content and meta information can be processed. Only then can further links be extracted from the rendered DOM and handed back to the crawler, which can then reach additional URLs.
Can search engines render JavaScript?
So far, Google is unfortunately the only search engine known to actually use a rendering engine to execute JavaScript and to document the process. The other major search engines, including Bing, are not there yet, as an experiment by Polish SEO specialist Bartosz Góralewicz has shown. Although Bing claims to render JavaScript as well, this only seems to happen for very large and prominent pages.
In fact, we know that within Caffeine Google uses a Web Rendering Service (WRS) based on headless Chrome. Unfortunately, we also know that it is currently still based on Chrome version 41 and therefore behaves like a three-year-old browser. Fortunately, the responsible team around John Mueller and Martin Splitt has already stressed that they are working hard to move to a newer version as soon as possible and want to keep up with Chrome updates in the future.
As long as that is not the case, you can check on caniuse.com or the Chrome Platform Status which features Chrome 41 supports and which it does not.
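One common way to deal with the old rendering engine – shown here as a minimal sketch assuming a Babel 7 build setup – is to transpile the application bundle for exactly this browser target:
// babel.config.js – sketch: transpile for Chrome 41, the version
// Google's Web Rendering Service is currently based on.
module.exports = {
  presets: [
    ['@babel/preset-env', {
      targets: { chrome: '41' }
    }]
  ]
};
Features that cannot be transpiled away still need polyfills, which is why the example application above loads a separate polyfills.js bundle.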
In any case, the prerequisite for successful rendering is that no JavaScript or CSS resources are blocked via the robots.txt file. If Google cannot access these resources, correct rendering becomes difficult. In addition, all relevant content must be loaded before the load event fires; content that is only loaded by a user event after the load event is ignored during indexing.
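A minimal sketch of this distinction (the endpoints and element IDs are hypothetical; XMLHttpRequest is used because the fetch API is not available in Chrome 41):
// Helper: load a resource and write it into the page.
function loadInto(url, selector) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.onload = function () {
    document.querySelector(selector).innerHTML = xhr.responseText;
  };
  xhr.send();
}
// Requested as part of the normal page load: this content can be
// rendered and indexed.
loadInto('/api/article/42', '#content');
// Only requested after a user interaction, i.e. after the load event:
// this content is ignored during indexing.
document.querySelector('#load-more').addEventListener('click', function () {
  loadInto('/api/article/42/comments', '#comments');
});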
It is also interesting that Google crawls and indexes in two waves. Because JavaScript rendering is extremely resource-intensive, the URL is indexed first, using only the information found directly in the pre-DOM HTML source. In a second wave, the page is rendered and the full contents of the post-DOM HTML are indexed. To test how Google renders a JS-based website, you can use the “Fetch as Google” feature in the (old) Search Console: it shows both how Google actually renders the page and the source code it sees.
Two other tools for testing Google’s rendering are the Mobile-Friendly Test and the Rich Results Test. Both show the rendered HTML, and the Mobile-Friendly Test additionally provides a screenshot of the rendered page.
Beyond problems with rendering individual pages, however, the use of JavaScript frameworks introduces quite different sources of error as well, and the SEO basics are often forgotten.
URLs are the API for crawlers
First of all, each page needs a URL, because URLs are the entities that search engines list in their results. JavaScript allows content to be changed dynamically without changing the URL, but every single page must have a unique, distinguishable and permanent URL so that it can be indexed at all. So whenever new content is loaded, a new, server-side supported URL must be called. It is important to use normal URLs and no hashes (#) or hashbangs (#!), even if the framework provides them by default (see the sketch after the following examples):
example.com/#about
example.com/#!about
example.com/about
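A minimal sketch of clean, hash-free internal navigation with the History API (the data-internal attribute, renderRoute and the markup are hypothetical and only illustrate the idea):
// Intercept internal links and update the address bar to a real URL
// such as /about instead of /#about.
document.addEventListener('click', function (event) {
  var link = event.target.closest('a[data-internal]');
  if (!link) return;
  event.preventDefault();
  history.pushState({}, '', link.getAttribute('href'));
  renderRoute(location.pathname);
});
// Handle back/forward navigation as well.
window.addEventListener('popstate', function () {
  renderRoute(location.pathname);
});
// Hypothetical helper: render the view that belongs to the given path.
function renderRoute(path) {
  document.querySelector('#content').textContent = 'View for ' + path;
}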
It is also important to avoid pushState bugs for internal links, so that the server-side supported URL is really the one being called. Otherwise, content may suddenly be available on multiple URLs and become duplicate content.
Another source of error: if JavaScript takes over the navigation, a URL may only work from within the application. This happens when content is reloaded and the URL is manipulated in the browser but not supported on the server. If a user then lands directly on that URL or reloads the page, there is no content at that URL.
You also have to know that the Googlebot always works statelessly: cookies, Local Storage, Session Storage, IndexedDB, service workers and so on are not supported. The crawler visits each URL as a completely new user. It is therefore important to ensure that all routes, i.e. all URLs, are always directly accessible, for example with a server-side catch-all route as sketched below.
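A minimal sketch of such server-side support with an Express server (the dist folder and the port are assumptions of this example):
const express = require('express');
const path = require('path');
const app = express();
// Static assets (JS bundles, CSS, images) are served as usual.
app.use(express.static(path.join(__dirname, 'dist')));
// Catch-all route: every URL of the application is answered directly,
// so deep links and page reloads work for crawlers and users alike.
app.get('*', function (req, res) {
  res.sendFile(path.join(__dirname, 'dist', 'index.html'));
});
app.listen(3000);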
Server Status Codes
In terms of URLs, the general SEO best practice is to use server status codes correctly. If the URL of a piece of content changes but the content remains available on the site, users and search engines should be redirected to the new URL with a server-side redirect (HTTP status code 301) or an equivalent client-side JavaScript redirect. The redirect not only leads to the desired content, it also passes on the authority of the old URL (keyword: backlinks) to the new one.
If content is no longer available, a 404 status should be returned correctly; search engines then remove these URLs from the search results. This also includes avoiding soft-404 errors: a 404 page is called a 404 page because it returns that status code, not because an “Oops! Sorry, this page does not exist” message is displayed while the server sends a 200 (OK) code.
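A short sketch of both cases in an Express server (the routes are hypothetical):
const express = require('express');
const app = express();
// The content has moved permanently: redirect with a 301.
app.get('/old-url', function (req, res) {
  res.redirect(301, '/new-url');
});
// Everything that does not exist gets a real 404 status, not an
// error page delivered with 200 (a soft 404).
app.use(function (req, res) {
  res.status(404).send('<h1>404 – This page does not exist</h1>');
});
app.listen(3000);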
Use markup
HTML is a markup language, and the available markup should be used accordingly. Even though there are different ways to call URLs with JavaScript, a link to another URL should use a proper anchor tag with an href attribute:
<a onclick="window.location.href='https://www.example.com/subpage';">My subpage</a>
<span onclick="goTo('/subpage');">My subpage</span>
<a href="/subpage">My subpage</a>
The markup also includes the meta tags in the HTML head, which contain important information for search engines: the title tag, the meta description, canonical links or hreflang annotations. Social media crawlers also get their information here, for example from OpenGraph markup. Ideally, you generate these meta tags without JavaScript. There are now common modules for the various frameworks that are relatively easy to integrate.
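As an illustration, a sketch with react-helmet, one such module for React (the component and its contents are placeholders); combined with server-side rendering, these tags end up directly in the delivered HTML head:
import React from 'react';
import { Helmet } from 'react-helmet';
export default function ProductPage() {
  return (
    <div>
      <Helmet>
        <title>Product name | Example Shop</title>
        <meta name="description" content="Short, unique description of this page." />
        <link rel="canonical" href="https://www.example.com/products/product-name" />
      </Helmet>
      {/* actual page content */}
    </div>
  );
}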
Bigger problems are often found in the HTML head. It happens again and again that developers ship a general noindex tag in the markup on all pages by default, only to remove it again with JavaScript (example 2):
<meta name="robots" content="noindex">
<script>
  var robots = document.querySelector('meta[name="robots"]');
  document.head.removeChild(robots);
</script>
But in the first step – remember – the Googlebot comes to the site, downloads the unfinished HTML and finds the noindex there. The page is therefore not passed on to Caffeine, where it would be rendered, and Google never sees that the noindex would no longer be in the head of the finished DOM. As a result, the page is not indexed at all.
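The safer pattern is to decide on the server which pages may be indexed and to ship the correct robots meta tag in the initial HTML right away. A minimal sketch with Express (isIndexable and the rule it implements are hypothetical):
const express = require('express');
const app = express();
// Hypothetical rule: everything except account pages may be indexed.
function isIndexable(path) {
  return !path.startsWith('/account');
}
app.get('*', function (req, res) {
  const robots = isIndexable(req.path) ? 'index, follow' : 'noindex';
  res.send(`<!doctype html>
<html>
  <head><meta name="robots" content="${robots}"></head>
  <body>
    <app-root></app-root>
    <script src="main.js"></script>
  </body>
</html>`);
});
app.listen(3000);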
Pre-render JavaScript on the server side
Rendering web pages is extremely resource-intensive – even for an industry giant like Google. That is why, as mentioned above, the rendering process does not happen immediately after the crawler discovers a URL, but only when the corresponding resources are free. It can take up to a week until a page is rendered. This makes the actually quite simple process of crawling and indexing extremely complicated and inefficient.
Other search engine crawlers, as well as the crawlers of Facebook, Twitter, LinkedIn and others that visit pages for example to generate a preview box, render no JavaScript at all. To make sure that a page is also understood by crawlers other than the Googlebot, and also to relieve Google, it is recommended to pre-render pages on the server side. This ensures that Google really finds all the important content and indexes it more quickly.
Quite apart from the problems of the bots, client-side rendering can also be a disadvantage for users. At least the initial page load, i.e. the loading of the first page, usually takes much longer because the rendering has to be done entirely by the client, and the loading time depends on the quality and computing power of the respective device. That is why the magic word for making JavaScript frameworks SEO-ready is server-side rendering. Instead of having the HTML calculated on the client side, the page is pre-rendered on the server and delivered ready-made. In addition to fee-based pre-rendering services, there are now various open source solutions that use PhantomJS, headless Chrome or another headless browser to render the pages on the server. Both browsers and crawlers are served the server-side pre-rendered HTML directly: all JavaScript necessary to render the page has already run on the server, and on the client side only JavaScript resulting from user interaction is executed.
Netflix, for example, had great success with this technique: it switched its React application completely to server-side rendering and now runs only vanilla JavaScript on the client side. This switch to server-side rendering helped Netflix improve load times by 50 percent.
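A minimal sketch of server-side rendering with React and Express (App is a hypothetical application component with a matching client bundle at /client.js):
const express = require('express');
const React = require('react');
const { renderToString } = require('react-dom/server');
const App = require('./App'); // hypothetical application component
const app = express();
app.get('*', function (req, res) {
  // The component tree is rendered to HTML on the server ...
  const markup = renderToString(React.createElement(App, { url: req.url }));
  // ... and delivered fully pre-rendered; the client bundle only takes
  // over the interaction afterwards.
  res.send(`<!doctype html>
<html>
  <head><title>Example</title></head>
  <body>
    <div id="root">${markup}</div>
    <script src="/client.js"></script>
  </body>
</html>`);
});
app.listen(3000);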
Dynamic rendering
The model that Google calls dynamic rendering distinguishes between browsers and crawlers. While a normal browser receives the JavaScript version of the page and has to render it on the client side, crawlers get a server-side pre-rendered version.
This requires a middleware that detects whether a request comes from a normal browser or from a bot. Usually the user agent is simply read out and, if necessary, the IP address is verified as belonging to the respective bot. John Mueller, Senior Webmaster Trends Analyst at Google, mentioned in his Google I/O '18 talk that this variant is not considered cloaking. It should go without saying that the pre-rendered and the client-side version must not differ in content.
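A minimal sketch of such a middleware in Express (the user-agent list is shortened and prerender is a hypothetical helper, for example backed by a headless browser or a pre-render cache):
const express = require('express');
const app = express();
const BOT_AGENTS = /googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|linkedinbot/i;
// Hypothetical helper: returns the pre-rendered HTML for a URL.
async function prerender(url) {
  return '<!doctype html><html><!-- pre-rendered content for ' + url + ' --></html>';
}
app.use(async function (req, res, next) {
  const userAgent = req.headers['user-agent'] || '';
  if (BOT_AGENTS.test(userAgent)) {
    // Crawlers get the server-side pre-rendered version ...
    return res.send(await prerender(req.originalUrl));
  }
  // ... normal browsers get the regular client-side rendered application.
  next();
});
app.use(express.static('dist'));
app.listen(3000);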
Hybrid “Isomorphic” rendering
Google itself repeatedly recommends a hybrid rendering solution, in which both normal users and search engines initially receive a pre-rendered version of the page. Only when the user starts to interact with the page does JavaScript begin to change the source code via the DOM. As much as possible, the JavaScript is already rendered on the server; for all further actions, JavaScript is then executed on the client side. And because crawlers are stateless, they always get a pre-rendered page for every single URL.
Another advantage of this solution is that even users who have disabled JavaScript get a working page. In practice, however, setting up such a solution is often quite complicated, even though there are now very good modules for the most common frameworks.
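On the client side, the pre-rendered HTML is then not rebuilt but only “hydrated”. A sketch with React (App again being the hypothetical application component from the server-side example above):
import React from 'react';
import ReactDOM from 'react-dom';
import App from './App'; // hypothetical application component
// hydrate() attaches event handlers to the existing server-rendered DOM
// instead of rendering it again from scratch.
ReactDOM.hydrate(React.createElement(App), document.getElementById('root'));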
Test whether you can be found
To test how Google processes your own JavaScript-based website, there are several tools provided by Google itself. The Mobile-Friendly Test and the Rich Results Test mentioned above can be repurposed to see whether and how the indexer can render the page. Whether the practical “Fetch as Google” feature will be integrated into the new Search Console remains to be seen.
Of course, you can also simply download version 41 of the Chrome browser and see how the page is rendered there. The console of the local DevTools shows which features this old version does not yet support.
Some SEO crawlers can now render JavaScript as well, for example the popular SEO crawler Screaming Frog. In the paid version of the tool, JavaScript rendering can be activated in the spider configuration. Under “Rendered Page” you can then view a screenshot of the page, and the pre-DOM and post-DOM HTML can be compared directly. Because JavaScript rendering consumes a lot of resources, it is difficult to completely crawl very large sites with the desktop application.
To check whether the contents of your own page have been indexed correctly, a simple Google search also helps: with the site: operator (site:example.com) and a text excerpt from the page in question, you can quickly determine whether Google has found the content.

Conclusion

The topic of SEO has reached the development teams of the large frameworks: they are addressing the problems around crawling and indexing and developing appropriate solutions. Google, too, is tackling the subject of JavaScript head-on and publishes documentation and assistance. As long as developers address the topic and pay attention to it in their applications, a JavaScript-based website and good search engine findability are no longer mutually exclusive.
JavaScript brings new challenges for search engine optimizers. In the future, SEOs will have to delve deeper into technical details and engage more with JavaScript in order to identify potential obstacles and remove them together with developers.