I am currently doing some Web scraping and wanted to run my Node.js scraper code on AWS Lambda. Parsing a page with cheerio is normally sufficient, but it does not work for single-page applications: JavaScript has to execute to construct the page, so you need a headless browser. I decided to use the Horseman npm package for this purpose, which requires that you supply the PhantomJS binary file in some way. The PhantomJS README file suggests using the phantomjs-prebuilt package, an approach that worked locally but failed when I deployed my Lambda. The binary has to be compatible with the AWS Lambda servers, and the one I deployed was not. This post details how I resolved the issue.
Getting the PhantomJS binary
I first downloaded and unpacked a prebuilt phantomjs package from this Bitbucket repo. You need a suitable 64-bit Linux x86 package; the latest stable release at the time of writing was phantomjs-2.1.1-linux-x86_64.tar.bz2. The only file you need from the unpacked package is bin/phantomjs. I copied it to a bin directory in the root of my Lambda directory, but you can put it anywhere you like in your Lambda directory.
Including PhantomJS in the Lambda package
I use Serverless to deploy Lambdas, and Webpack to build them. I needed to get the binary file included in the zipped Lambda file and I needed it to have execute permissions. I installed three npm packages to do this, all as dev dependencies: on-build-webpack, copy-webpack-plugin, and chmod. I then altered my webpack build file so the plugin section looked like this:
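// necessary imports at the top of this build file
const CopyWebpackPlugin = require('copy-webpack-plugin');
const WebpackOnBuildPlugin = require('on-build-webpack');
const chmod = require('chmod');

// ...then, inside the exported webpack config object:
plugins: [
  // copy the binary into the root of the build output
  new CopyWebpackPlugin([
    { from: './bin/phantomjs' }
  ]),
  // once the build has finished, make the copied binary executable
  new WebpackOnBuildPlugin(() => {
    chmod('.webpack/phantomjs', 777);
  })
]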
Note: .webpack is the intermediary directory that webpack uses when building the Lambda.
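As an alternative to the chmod package, the same permission step could be done with Node's built-in fs module. This is a sketch rather than the setup used in this post: the helper name makeExecutable is hypothetical, and the 0o777 default simply mirrors the mode used above.

```javascript
const fs = require('fs');

// Hypothetical helper: set permission bits on a file using only Node's fs module.
// Note that fs.chmodSync expects a real octal number (0o777), not the decimal 777.
function makeExecutable(filePath, mode = 0o777) {
  fs.chmodSync(filePath, mode);
}
```

Inside the on-build callback it would be called as makeExecutable('.webpack/phantomjs').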
The copy-webpack-plugin is configured here to copy the phantomjs file into the root directory of the Lambda. In the Lambda handler that uses it, the node-horseman package needs to be told where that file is:
const PHANTOMJS_BIN_PATH = path.resolve(
const horseman = new Horseman({ phantomPath: PHANTOMJS_BIN_PATH });
The environment variable used to build this path is one of the automatically configured Lambda environment variables.
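A minimal sketch of how that path might be resolved, assuming the variable in question is LAMBDA_TASK_ROOT (which Lambda sets to the directory containing the deployed function code). The helper name resolvePhantomPath and the fallback to the current working directory for local runs are illustrative assumptions, not part of the original handler.

```javascript
const path = require('path');

// Hypothetical helper: build the absolute path to the phantomjs binary.
// Assumption: the binary sits in the root of the deployed package, and
// LAMBDA_TASK_ROOT points at that root when running inside Lambda.
function resolvePhantomPath(env = process.env) {
  // Fall back to the working directory when running outside Lambda.
  const taskRoot = env.LAMBDA_TASK_ROOT || process.cwd();
  return path.resolve(taskRoot, 'phantomjs');
}

// In the handler, the result would then be handed to Horseman:
//   const horseman = new Horseman({ phantomPath: resolvePhantomPath() });
```

Passing the environment in as a parameter keeps the helper easy to exercise locally without setting real Lambda variables.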
You can check that the created Lambda zip file contains the PhantomJS binary file and that it has the correct permissions; they should be -rwxrwxrwx.
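One way to run a similar check from Node, before the package is zipped, is to read the permission bits of the copied file in the build directory. This is only a sketch: the helper name permissionString is hypothetical, and it inspects a file on disk rather than the zip archive itself.

```javascript
const fs = require('fs');

// Hypothetical helper: format a file's permission bits in ls style.
// The leading '-' assumes a regular file (not a directory or symlink).
function permissionString(filePath) {
  const mode = fs.statSync(filePath).mode & 0o777;
  const flags = 'rwxrwxrwx';
  let out = '-';
  for (let i = 0; i < 9; i++) {
    // Walk the nine permission bits from the owner-read bit downwards.
    out += (mode & (1 << (8 - i))) ? flags[i] : '-';
  }
  return out;
}
```

After a build, permissionString('.webpack/phantomjs') would be expected to return the string given above if the chmod step ran.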
Because of the presence of the phantomjs binary file, your zipped Lambda file will be quite large (~20 MB). If you are on a slow connection, you will want to increase the AWS client timeout. With Serverless, this can be done by running the following deploy command:
AWS_CLIENT_TIMEOUT=900000 sls deploy
You should now be able to run Horseman in AWS Lambda.