How to enable CPU profiling for a Gatsby site in IntelliJ IDEA or WebStorm
10/14/2022
I'm currently maintaining a large Gatsby-based site where static HTML pages are generated out of a massive set of JSON files.
Build times are anywhere between slow and ridiculously slow, and a lot of heavy work is performed by two Gatsby plugins, gatsby-source-filesystem and gatsby-transformer-json. Both plugins are in action during Gatsby's "Source and transform nodes" build step that is notorious for hiding away any progress information even when a build is run in verbose mode.
When a new piece of JSON is added into the mix and "Source and transform nodes" is starting to take way over the usual bad-but-bearable 20-minute range, you start to wonder what the hell is going on under the hood.
One thing you can do is plug into Gatsby's Node APIs from your site's gatsby-node.js and extend build output with custom logging:
const fs = require('fs');
let logStream;
// Open a timestamped log file before the build starts.
exports.onPreInit = () => {
  const logDate = new Date();
  logStream = fs.createWriteStream(`gatsby_create_node_log_${logDate.toISOString().replaceAll(":", "-")}.txt`, {flags: 'a'});
};
// Append one line to the log for every node that Gatsby creates.
exports.onCreateNode = ({node}) => {
  const nodeInfo = `${new Date().toLocaleTimeString()} A property of the node being created that you want to log: ${node.thisProperty ? node.thisProperty : "property not defined"}.`;
  logStream.write(nodeInfo + "\n");
};
This can give you a general idea of what kinds of nodes are taking long to create. In order to dive deeper and try to understand what's causing the slowness, it's best to turn to profiling and take a CPU snapshot of Gatsby's Node.js process.
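If the raw log gets too long to read, one option is to aggregate by node type instead of writing a line per node. The sketch below is a variation on the logging idea, not something from my actual setup: `createNodeTally` is a hypothetical helper, and it attributes the time elapsed since the previous `onCreateNode` call to the current node's `internal.type`, which is only a rough proxy for how long that node took to create.

```javascript
// Hypothetical helper: tallies node counts and elapsed time per node type.
// `now` is injectable so the logic can be tried without running a real build.
function createNodeTally(now = Date.now) {
  const totals = new Map(); // type -> { count, ms }
  let last = now();

  return {
    // Meant to be called from exports.onCreateNode with the node being created.
    record(node) {
      const current = now();
      const type = (node.internal && node.internal.type) || 'unknown';
      const entry = totals.get(type) || { count: 0, ms: 0 };
      entry.count += 1;
      entry.ms += current - last; // time since the previous node: a rough proxy
      totals.set(type, entry);
      last = current;
    },
    // Returns [type, { count, ms }] pairs, slowest type first.
    report() {
      return [...totals.entries()].sort((a, b) => b[1].ms - a[1].ms);
    },
  };
}

// Possible wiring in gatsby-node.js (an assumption, adjust to your setup):
// const tally = createNodeTally();
// exports.onCreateNode = ({ node }) => tally.record(node);
// exports.onPostBuild = () => console.log(tally.report());
```

The numbers this produces are only as good as the assumption that nodes are created one after another, so treat them as a hint about where to look rather than as a measurement.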
Now, as an incorrigible user of all things JetBrains, if there's a profiler that comes with my IDE and it supports my application's tech stack, that's what I prefer to use. Luckily, IntelliJ IDEA, WebStorm and most other JetBrains IDEs do support Node.js profiling.
However, there's a caveat. A few caveats, in fact.
1. You need a separate run configuration
Normally, JetBrains IDEs auto-create run configurations to build and launch Gatsby sites in development mode. These run configurations are npm-based, and they do not allow profiling. To enable profiling, you need to create a separate run configuration based on the Node.js template:
IntelliJ IDEA will automatically pick up a Node interpreter, but you need to fill out the following fields in the Configuration tab:
- Working directory: this should be the root of your Gatsby installation, right where Gatsby's package.json and gatsby-node.js files are. If your IntelliJ IDEA project contains more than just a Gatsby site, chances are you need to override whatever IntelliJ IDEA auto-inserts in this field.
- JavaScript file: this should point to Gatsby's CLI distribution inside your Gatsby site's node_modules directory, relative to the working directory. Most probably, that would be node_modules/gatsby/cli.js.
- Application parameters: a Gatsby CLI command that you want to launch when you profile, such as develop or build.
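If you're not sure whether a directory is the right working directory, a quick check like the one below can help. It's a minimal sketch: `findMissingGatsbyFiles` is a hypothetical helper name, and the three paths it looks for are simply the ones mentioned in the list above.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: returns the files that the run configuration relies on
// but that are missing from the given working directory.
function findMissingGatsbyFiles(workingDir) {
  const expected = [
    'package.json',
    'gatsby-node.js',
    path.join('node_modules', 'gatsby', 'cli.js'),
  ];
  return expected.filter((file) => !fs.existsSync(path.join(workingDir, file)));
}
```

Calling `findMissingGatsbyFiles(process.cwd())` from the candidate directory should return an empty array if the directory is usable as is. Anything it returns is a path that the run configuration won't be able to resolve.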
2. You need to explicitly enable CPU profiling
Now that the basic configuration fields are filled out, you still can't profile. What you need to do is open the V8 Profiling tab in run configuration settings and select two options:
- Record CPU profiling info to actually enable profiling.
- One log file for all isolates to merge profiling data from multiple threads into one CPU snapshot instead of viewing them one-by-one.
This makes the new run configuration for Node profiling complete. Time to save it, then start it, and run your Gatsby process for as long as you need to capture a CPU profile.
Oh wait, what is this?
3. You need to FIX CODE IN NODE_MODULES
When you start the profiling run configuration, Gatsby's Node process immediately crashes with an error like this:
Initiated Worker with invalid execArgv flags: --prof, --nologfile_per_isolate, --logfile=v8-14-10-2022_13-52-55-.log
What? Why? WTF?
When IntelliJ IDEA launches the run configuration, it passes a few extra arguments to the Node process. These arguments enable generating the actual Node profiler output (--prof), choose to merge profiler logging output into a single log file (--nologfile_per_isolate), and define that log file (--logfile=v8-14-10-2022_13-52-55-.log).
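To see what this amounts to outside the IDE, here's a sketch that assembles a roughly equivalent command line. The three flags are the ones from the error message; the log file name, the default Gatsby command, and the `buildProfilingArgs` helper itself are placeholders of mine, and the exact command that IntelliJ IDEA runs may well differ.

```javascript
// Builds a Node argument list that mirrors the flags from the error message.
// The log file name and the default Gatsby command are placeholders.
function buildProfilingArgs(logFile = 'v8-profile.log', gatsbyCommand = 'build') {
  return [
    '--prof',                     // enable V8 profiler output
    '--nologfile_per_isolate',    // one log file instead of one per isolate
    `--logfile=${logFile}`,       // where the profiler log is written
    'node_modules/gatsby/cli.js', // Gatsby CLI, relative to the site root
    gatsbyCommand,
  ];
}

console.log(['node', ...buildProfilingArgs()].join(' '));
// prints: node --prof --nologfile_per_isolate --logfile=v8-profile.log node_modules/gatsby/cli.js build
```

Running the printed command from the Gatsby root should, in theory, reproduce the same crash without the IDE, which is a handy way to confirm that the IDE itself isn't the culprit.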
These are probably not the most widely used Node arguments out there, but crashing because of them, seriously?
Let's google up the error.
The very first search result is a GitHub issue in the Parcel bundler project where Evan Wallace tries to run a Parcel build on a large code base, uses the --max-old-space-size Node argument to increase memory available to Node, then receives an error that is similar to what Node throws when trying to profile.
Evan then reports that he was able to work around that problem by editing a line of Parcel code:
Let's see if this workaround can be applied to Gatsby. First of all, does Gatsby happen to use Parcel?
Yes it does. Parcel is declared in Gatsby's package.json, and in fact, if we scroll down the output of the stack trace that Node displays along with the error, Parcel is right there.
Let's look into Gatsby's node_modules and see if we can find that Worker.js file that Evan updated to work around their Node argument problem. There it is: node_modules/@parcel/workers/lib/Worker.js. And in the fork() function at line 60, there's the offending line:
let filteredArgs = process.execArgv.filter(v => !/^--(debug|inspect|max-old-space-size=)/.test(v));
While max-old-space-size= is no longer causing issues because Evan has submitted a PR to include it in the regular expression, the arguments that IntelliJ IDEA uses for profiling are not included and still cause the Node process to crash.
Let's modify the line to include the arguments that we need:
let filteredArgs = process.execArgv.filter(v => !/^--(debug|inspect|max-old-space-size=|prof|logfile=|nologfile_per_isolate)/.test(v));
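To double-check what the edit changes, here's a small standalone sketch that runs both versions of the regular expression against a sample argument list. The sample list is made up for illustration: it has the three flags from the error message plus a memory flag, and `filterArgs` is just a wrapper around the same filter call.

```javascript
// The filter before and after the edit, applied to a sample execArgv.
const originalPattern = /^--(debug|inspect|max-old-space-size=)/;
const patchedPattern = /^--(debug|inspect|max-old-space-size=|prof|logfile=|nologfile_per_isolate)/;

// Sample flags: the three from the error message plus one memory flag.
const sampleExecArgv = [
  '--prof',
  '--nologfile_per_isolate',
  '--logfile=v8-14-10-2022_13-52-55-.log',
  '--max-old-space-size=8192',
];

// Same logic as the line in Worker.js: drop every flag the pattern matches.
const filterArgs = (pattern, args) => args.filter((v) => !pattern.test(v));

// With the original pattern, the three profiling flags survive the filter.
console.log(filterArgs(originalPattern, sampleExecArgv));
// With the patched pattern, nothing survives.
console.log(filterArgs(patchedPattern, sampleExecArgv));
```

Whatever survives the filter is what the worker gets started with, so the flags that remain after the original pattern are exactly the ones the error message complains about.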
Yes, that's editing library code, you should never do that, and even if you do, don't count on your changes to last. Disclaimers aside, that's the quickest way to try to unblock the profiling experience.
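Since the change gets wiped whenever node_modules is reinstalled, one way to make it less painful is to script it. What follows is a sketch and not something I'd call a proper solution: the helper names are hypothetical, and it assumes that Worker.js still contains the original regular expression exactly as quoted above, which may not hold for other Parcel versions.

```javascript
const fs = require('fs');

// The path and both patterns are the ones quoted in this post.
const WORKER_PATH = 'node_modules/@parcel/workers/lib/Worker.js';
const ORIGINAL = '/^--(debug|inspect|max-old-space-size=)/';
const PATCHED = '/^--(debug|inspect|max-old-space-size=|prof|logfile=|nologfile_per_isolate)/';

// Pure string transform. Returns the source unchanged if the original
// pattern is not found (already patched, or a different Parcel version).
function patchWorkerSource(source) {
  return source.includes(ORIGINAL) ? source.replace(ORIGINAL, PATCHED) : source;
}

// Hypothetical entry point: rewrites the file only if something changed.
// Returns true when the file was modified.
function patchWorkerFile(filePath = WORKER_PATH) {
  const source = fs.readFileSync(filePath, 'utf8');
  const patched = patchWorkerSource(source);
  if (patched !== source) fs.writeFileSync(filePath, patched);
  return patched !== source;
}
```

Calling `patchWorkerFile()` from the Gatsby root after every install would re-apply the edit, and running it twice is harmless because the second pass finds nothing to replace.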
Now, launch the profiling run configuration once again, and Gatsby runs!
As soon as you stop the run configuration (and you may have to do it twice in order for profiling to complete), IntelliJ IDEA will process the CPU snapshot and display it in a way that gives you a decent chance of figuring out what's slow: