How to Use a Headless Browser
June 21, 2021
What is a Headless Browser?
A headless browser is a web browser that runs without a graphical user interface (GUI). Instead of clicking around in a window, you control it with commands or code.
Why Run a Headless Browser?
Why would you want to use a headless browser and browse the web without the graphical user interface (GUI)? Using a headless browser allows you to programmatically access website content and feed that content into other programs. One great example of this is Google. Google’s bots crawl websites using a headless browser and feed the content they find into the programs that evaluate and rank websites in search results.
Headless browsers can also help you test websites more easily, including for SEO purposes. For SEO, testing with a headless browser allows you to see a page similar to how Googlebot sees it. By testing in this way, you can confirm that there aren’t any issues that would prevent Google from being able to access the content on that page. Beyond SEO, headless browser testing can also be used to mimic user behavior at a larger scale to confirm everything works as intended within the website.
Headless browsers can also help you extract information from websites that don’t offer an easier method for getting that information. For example, you might want to extract pricing information from your competitor’s website. You could use a headless browser to fetch the content from the competitor’s website and then write a separate program to grab the price information out of that content.
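As a rough illustration of that second step, here is a minimal Python sketch that pulls dollar amounts out of HTML that has already been fetched. The sample HTML, the class names, and the price format are all hypothetical; a real competitor page will be structured differently and will likely need a more careful parser.

```python
import re

# Hypothetical HTML standing in for content fetched by a headless browser.
SAMPLE_HTML = """
<div class="product"><span class="name">Widget</span><span class="price">$19.99</span></div>
<div class="product"><span class="name">Gadget</span><span class="price">$1,249.00</span></div>
"""

# Matches dollar amounts such as $19.99 or $1,249.00 (an assumed price format).
PRICE_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")


def extract_prices(html: str) -> list[str]:
    """Return every dollar amount found in the HTML, in page order."""
    return PRICE_PATTERN.findall(html)
```

Calling extract_prices(SAMPLE_HTML) returns the two prices from the sample as text, which another program could then compare or store.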
Headless Browser Options: Puppeteer, Selenium, Command Line
There are a number of headless browser options, but the one I’ll focus on is the headless version of Google Chrome, which runs without the graphical user interface. This is what Googlebot uses to crawl your website, so if you want to test how Googlebot will see your website, this is what you should use.
There are a few different ways you can run headless Chrome. One option is to run headless Chrome directly from the command line interface (see next section for details). This can be a bit more straightforward as a place to begin. However, the options for what to run from the command line can be limited.
Another option is Puppeteer, a Node.js library for controlling headless Chrome with code. A third option is Selenium, which is a popular solution for automated browser testing. By default, Selenium is not headless but can be configured to operate that way. While there are many similarities between Puppeteer and Selenium, one big difference is that Puppeteer is built primarily for Chrome while Selenium supports multiple browsers, including Firefox, Edge, and Safari.
How to Run Headless Chrome from the Command Line
The best way to understand a headless browser is to run a headless browser yourself. I want to make this really simple. So, I am going to assume you aren’t a developer and that you aren’t familiar with Node.js. While more advanced functionality is available by running Puppeteer or Selenium, the following instructions will walk you through how you can run a headless version of Google Chrome directly from the command line on Windows (sorry Mac users – I’m not a Mac user myself, but most of this should work similarly within Terminal).
There are a lot of steps but (I promise!) they are more tedious than difficult, and you will like the results. Let’s take this step by step…
Step #1: Set Up Google Chrome Canary
Install Google Chrome Canary
To begin, you want to download Chrome Canary. You can do this even if you have regular Chrome installed on your computer.
Locate Canary File Location
Once installed, open the Windows search box and search for Canary. Click “Open file location.” You may have to expand the options to see “Open file location”.
After locating the file, you will be looking at the shortcut for Canary. Right-click the Canary shortcut icon in the folder and select Properties. On the Properties screen, find “Target”. Copy this and save it to reference in Step 2. For example, my target is:
C:\Users\matth\AppData\Local\Google\Chrome SxS\Application\chrome
Step #2: Launch and Fetch Code with a Headless Browser
Open Command Prompt
Press the Windows Key + R. This will open up the “Run” screen. In the run screen, type “cmd” (without quotes), then click “OK”.
Fetch the HTML
Let’s grab the HTML from a website. In the command line, we will enter in the command to fetch the HTML from a website. Here is the format to use, where TARGET is the target of Canary we found in Step 1 (wrapped in quotes) and URL is the page we want to fetch.
"TARGET" --headless --disable-gpu --enable-logging --dump-dom URL
For example, if I wanted to fetch Elementive’s home page, I’d run:
"C:\Users\matth\AppData\Local\Google\Chrome SxS\Application\chrome" --headless --disable-gpu --enable-logging --dump-dom https://www.elementive.com/
View the HTML
You’ll need to wait a few seconds for the program to run and fetch the HTML. When it does, the HTML will appear in the command prompt.
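If you would rather check the output with code than read it by eye, the same command can be run from a script. The Python sketch below is only one way to do it: it runs the command from this step, captures what Chrome prints, and reports which phrases are missing from the rendered HTML. The phrases you check for are up to you, the UTF-8 decoding is an assumption, and the run parameter is only there so the function can be tried without Chrome installed.

```python
import subprocess

# Assumption: swap in the Target you copied in Step 1.
CHROME_PATH = r"C:\Users\matth\AppData\Local\Google\Chrome SxS\Application\chrome"


def fetch_html(chrome_path: str, url: str, run=subprocess.run) -> str:
    """Run headless Chrome with --dump-dom and return the HTML it prints."""
    command = [chrome_path, "--headless", "--disable-gpu", "--enable-logging", "--dump-dom", url]
    result = run(command, capture_output=True, check=True)
    # Assumed encoding: the output is decoded as UTF-8, replacing any bad bytes.
    return result.stdout.decode("utf-8", errors="replace")


def find_missing_phrases(html: str, phrases: list[str]) -> list[str]:
    """Return the phrases that do not appear in the HTML (case-insensitive)."""
    lowered = html.lower()
    return [phrase for phrase in phrases if phrase.lower() not in lowered]
```

For example, find_missing_phrases(fetch_html(CHROME_PATH, "https://www.elementive.com/"), ["Contact"]) would return an empty list if the word appears in the rendered page.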
Save the HTML
Viewing the HTML in the command prompt is helpful, but often we’ll want to save that output to manipulate or review elsewhere. To save the headless browser’s HTML output as a TXT file, you can modify the command we used above as follows. First, place a “>” character after the command. Then, put the full location of a text file to save the HTML to in quotes. In this example, I’ll save the HTML output to a text file called elementive-home-page.txt located in my Documents directory:
"C:\Users\matth\AppData\Local\Google\Chrome SxS\Application\chrome" --headless --disable-gpu --enable-logging --dump-dom https://www.elementive.com/ > "C:\Users\matth\Documents\elementive-home-page.txt"
Step #3: Take Screenshot with Headless Chrome
Open Command Prompt
Looking at the HTML is helpful for understanding what bots can and can’t see. However, we also can see the output visually. Once again, open up the command prompt by pressing the Windows Key + R. This will open up the “Run” screen. In the run screen, type “cmd” (without quotes), then click “OK”.
Set Screen Size
We want to view the website at a particular browser size. I recommend grabbing screenshots for the common browser sizes used by your visitors. To do this, we will adjust the command we used above slightly. We want to add the window size to our command, written as “--window-size=width,height”, where we replace width and height with the sizes we want to view. In this example, we’ll look at a 411×2000 screen by adding “--window-size=411,2000” to our command (see below).
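To show how the size slots into the command, here is a small Python sketch that builds one screenshot command per screen size. Treat it as an illustration only: the 411×2000 size comes from this example, the desktop size is a hypothetical value, and the --screenshot flag it uses is covered in the next steps.

```python
# Assumption: swap in the Target you copied in Step 1.
CHROME_PATH = r"C:\Users\matth\AppData\Local\Google\Chrome SxS\Application\chrome"

# Sizes to capture. The desktop size is hypothetical; use your own visitors' sizes.
SCREEN_SIZES = {"mobile": (411, 2000), "desktop": (1366, 2000)}


def window_size_flag(width: int, height: int) -> str:
    """Format the size the way Chrome expects: width, a comma, then height."""
    return f"--window-size={width},{height}"


def build_screenshot_command(chrome_path: str, url: str, width: int, height: int, output_png: str) -> list[str]:
    """Build the screenshot command as a list with one item per argument."""
    return [
        chrome_path,
        "--headless",
        "--disable-gpu",
        "--enable-logging",
        window_size_flag(width, height),
        f"--screenshot={output_png}",
        url,
    ]
```

Each list can be handed to Python's subprocess.run to take the screenshot, looping over SCREEN_SIZES to cover every size.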
Set Folder to Save Screenshots
Pick a folder or create a new folder you want to save the screenshots to on your computer. We will tell the command to save the screenshot here. Once created, you will want to save the full folder location. For example, I’ll create a screenshots folder for Elementive’s website located at C:\Users\matth\Documents\Screenshots\Elementive.
Take Screenshot with Headless Browser
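We’ll take that directory, add a file name ending in .png, and add it to our command after the word “screenshot”. For example:
--screenshot="C:\Users\matth\Documents\Screenshots\Elementive\elementive-home-page-screenshot.png"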
Bringing that and the screen size together, we end up with this code to enter into our command prompt:
"C:\Users\matth\AppData\Local\Google\Chrome SxS\Application\chrome" --headless --disable-gpu --enable-logging --window-size=411,2000 --screenshot="C:\Users\matth\Documents\Screenshots\Elementive\elementive-home-page-screenshot.png" https://www.elementive.com/
Step #4: Schedule Automatic Script to Save Code & Screenshot
By this point, you have grabbed the code and screenshot. You could skip this next step entirely and run your headless browser on a one-off basis to see the code or take a screenshot. However, it can be nice to run this automatically. For example, maybe we want to check our website’s rendered home page every day and save a screenshot. We can do this with a batch file that we schedule to run automatically.
Set Up Destination Folders
Before we create the batch file, let’s first create the folders to save our screenshots and source code. Somewhere on your computer, create two new folders: one for the source code and one for the screenshots. I’m going to call mine Screenshots and SrcCode. Note the path to each new folder you’ve created. In my case that is C:\Users\matth\Documents\FetchPages\Screenshots and C:\Users\matth\Documents\FetchPages\SrcCode.
Create a Batch File
This gets a bit more advanced, so instead of explaining why or how this works, I’ll only explain what we are doing. If you are familiar with programming, this will make sense. If you aren’t, you can copy/paste and still set up the schedule.
To begin, open Notepad. In Notepad, go to File and select Save As. Select a folder where you’d like to save this (just remember where you saved it) and name this file “fetch-page.bat”. You can call the file whatever you want–the important part is adding the “.bat” extension. Because you are changing the extension, you might get a warning from Windows asking if you want to change the file. Click yes.
Edit the New Batch File
Next, in your new Notepad file, copy and paste the following code. The words in capital letters are placeholders for reference here. We’ll modify those placeholders in the next step.
CALL :fetch_screenshot_and_code WIDTH, HEIGHT, FILE-NAME, URL
EXIT /B %ERRORLEVEL%
:fetch_screenshot_and_code
"TARGET" --headless --disable-gpu --enable-logging --window-size=%~1,%2 --screenshot="SCREENSHOT-DESTINATION\%DATE:/=_%-%3.png" %4
"TARGET" --headless --disable-gpu --enable-logging --window-size=%~1,%2 --dump-dom %4 > "SRC-DESTINATION\%DATE:/=_%-%3.txt"
EXIT /B 0
Finalize Batch Code
Now, let’s replace the placeholder words in capital letters. Let’s start at the bottom of the script.
Change both instances of TARGET to the location of your Google Chrome Canary that we installed way back in Step 1.
Change SCREENSHOT-DESTINATION to the screenshot folder you created above.
Change SRC-DESTINATION to the source code folder you created above.
Next, let’s address how we change WIDTH, HEIGHT, and URL. The WIDTH and HEIGHT should be changed to the width and height of the screenshot you want to take. The URL should be changed to the URL that you want to take the screenshot of and fetch the source code for.
Finally, FILE-NAME should be changed to the name you want to give the saved files. I’d suggest using the name of the page or the URL and a note about what device you are looking at with these screen sizes, such as home-page-mobile. You do not need to add an extension to the file name, as the script adds it automatically. The script also adds the date to the file name automatically.
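For the curious, the date is added by the %DATE:/=_% part of the script, which takes the current date and swaps every slash for an underscore, since slashes are not allowed in Windows file names. The Python sketch below mirrors that naming logic so you can see what the finished file name looks like. The date text used in the example is an assumption: the exact format of %DATE% depends on your regional settings.

```python
def dated_file_name(date_text: str, file_name: str, extension: str) -> str:
    """Mimic the script's naming: date with slashes replaced, a dash, then the name."""
    safe_date = date_text.replace("/", "_")  # same idea as %DATE:/=_%
    return f"{safe_date}-{file_name}.{extension}"


# Assumed US-style %DATE% value; yours may look different.
EXAMPLE_NAME = dated_file_name("Mon 06/21/2021", "home-page-mobile", "png")
```

With that assumed date, EXAMPLE_NAME comes out as Mon 06_21_2021-home-page-mobile.png.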
With everything dropped in, my code looks like this:
Add Additional Pages
The good news is we can fetch multiple pages with this same batch file. We can also fetch different screen sizes of the same page. All we have to do is add additional CALL lines to the top of the code. In this example, I’ve added three extra call statements to look at the home page on desktop as well as mobile and to look at our about page on mobile and desktop. Be sure to give each file you want saved a unique file name.
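The same idea of one line per page and screen size can be sketched in Python, for readers who prefer a script over a batch file. This is an illustration, not a replacement for the batch file above. It only plans the commands and output files, and it swaps the %DATE% naming for an ISO date such as 2021-06-21. The desktop width and the about page URL are hypothetical examples.

```python
from datetime import date
from pathlib import PureWindowsPath

# Assumption: swap in your own Target and folders.
CHROME_PATH = r"C:\Users\matth\AppData\Local\Google\Chrome SxS\Application\chrome"
SCREENSHOT_DIR = PureWindowsPath(r"C:\Users\matth\Documents\FetchPages\Screenshots")
SRC_DIR = PureWindowsPath(r"C:\Users\matth\Documents\FetchPages\SrcCode")

# One tuple per CALL line: (width, height, file name, URL).
# The desktop width and the about page URL are hypothetical.
JOBS = [
    (411, 2000, "home-page-mobile", "https://www.elementive.com/"),
    (1366, 2000, "home-page-desktop", "https://www.elementive.com/"),
    (411, 2000, "about-page-mobile", "https://www.elementive.com/about/"),
    (1366, 2000, "about-page-desktop", "https://www.elementive.com/about/"),
]


def plan_job(width: int, height: int, file_name: str, url: str, today: date) -> dict:
    """Build the screenshot command, the HTML command, and the HTML output path."""
    stamp = today.isoformat()  # ISO date instead of the batch file's %DATE% text
    base = [CHROME_PATH, "--headless", "--disable-gpu", "--enable-logging",
            f"--window-size={width},{height}"]
    screenshot_file = SCREENSHOT_DIR / f"{stamp}-{file_name}.png"
    return {
        "screenshot_command": base + [f"--screenshot={screenshot_file}", url],
        "dom_command": base + ["--dump-dom", url],
        "dom_output": SRC_DIR / f"{stamp}-{file_name}.txt",
    }


def plan_all(today: date) -> list[dict]:
    """Plan every job, refusing to continue if two jobs share a file name."""
    names = [job[2] for job in JOBS]
    if len(set(names)) != len(names):
        raise ValueError("Each job needs a unique file name")
    return [plan_job(*job, today) for job in JOBS]
```

Running plan_all(date.today()) gives one plan per entry in JOBS. Each command list could then be passed to subprocess.run, with the output of the dom_command written to the matching dom_output file.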
You can download my version of this file here. Note, you will need to adjust file names and the reference to Chrome before running this on your computer. As well, for security reasons, this will download as a TXT file. You will need to re-save as a BAT file.
Run the Batch File
Now, we have the batch file saved and we can run it. To run it, find the batch file on your computer. Right click on the icon and click “Run as administrator”. You’ll be asked if you want to allow this app to make changes to your device. All this app will do is save code and take screenshots so you are good to click “yes” and run the app. Once it runs, return to the source code and screenshot folders you created, and you will see that new files have been added.
Schedule the Batch File (Optional)
Finally, and optionally, you can schedule the batch file to run on a regular schedule. Instead of repeating the steps here, I will refer you over to this great resource from Help Desk Geek on how to schedule a batch file.
Whew, that was easy, right? While it is a lot of steps, hopefully by now you have an idea of how to use a headless browser and have a better understanding of how headless browsers work. I recommend you run your website’s key pages through a headless browser (command line or otherwise) after any major changes. By doing so, you can confirm that Google’s bots can properly see your website’s code. If you have questions or need help, please contact me.