Retrieve your BigQuery query history with the Node.js SDK

Thomasdevshare · Published in Towards Dev · 3 min read · Dec 7, 2021

Is your Google Cloud bill rocketing at a pace you cannot imagine? You might be trying to retrieve the BigQuery history logs to understand which queries account for most of the billing, but found no instant solution on Stack Overflow or in the official documentation.

You are not alone. The first time I encountered this issue, I googled every which way and found no actual working solution. I had to dive into the official documentation and read the BigQuery API to see whether there was a way to quickly query all the logs with the information I needed:

  • The billed bytes for each query
  • The SQL text of each job
  • The timestamp and jobId of each job
  • The tables the queries ran on

At the end of the day, I found that there is no way to retrieve this information quickly other than using the BigQuery client SDK.
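For reference, these fields live in the raw job resource returned by the Jobs API roughly like this (a trimmed sketch; the field names follow the BigQuery REST resource, but the values are invented):

```javascript
// A trimmed sketch of one raw job entry as returned by the Jobs API.
// Field names follow the BigQuery REST job resource; values are made up.
const sampleJob = {
  jobReference: { projectId: 'my-project', jobId: 'job_abc123' },
  statistics: {
    creationTime: '1638835200000',       // ms since epoch, as a string
    query: {
      totalBytesBilled: '10485760',      // 10 MiB, as a string
      referencedTables: [
        { projectId: 'my-project', datasetId: 'analytics', tableId: 'events' }
      ]
    }
  }
};

console.log(sampleJob.jobReference.jobId);                // job_abc123
console.log(sampleJob.statistics.query.totalBytesBilled); // 10485760
```

Note that the numeric fields arrive as strings, which is why the snippets below call parseInt on them.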

BigQuery client SDK

I’m going to use the BigQuery Node.js client to query the data, since it is the most common option. Reading all the data from the BigQuery API requires you to perform paginated API requests, so it is not as simple as a single call.

Get the logs into a file

You cannot rely on a single run to get all the information you need. To save time, fetch the number of logs you want and store them in a JSON file. That is how I did it.

As you can see in this snippet, I fetch all the logs I need (100k in my case; you can modify this) and write them to a file called jobs.json:


const {BigQuery} = require('@google-cloud/bigquery');
const fs = require('fs');

const bigQueryClient = new BigQuery({
  keyFilename: '[PATH_TO_YOUR_SERVICE_ACCOUNT]',
  projectId: '[YOUR_PROJECT_ID]'
});

(async function () {
  await exportJobs();
})();

async function exportJobs() {
  let totalJobs = [];
  let pageToken = null;
  while (totalJobs.length < 100000) {
    const options = {
      maxResults: 1000,
      allUsers: true
    };
    if (pageToken) {
      options.pageToken = pageToken;
    }
    // getJobs() resolves to [jobs, nextQuery, apiResponse]; the raw
    // apiResponse carries the job resources and the next page token.
    const [, , {nextPageToken, jobs}] = await bigQueryClient.getJobs(options);
    pageToken = nextPageToken;
    totalJobs = totalJobs.concat(jobs);
    console.log(totalJobs.length);
    if (!pageToken) {
      break; // no more pages to fetch
    }
  }
  fs.writeFileSync('./jobs.json', JSON.stringify(totalJobs));
}
/**
 * Reads a file and parses it as JSON.
 * @param path
 * @returns {any}
 */
function readFileData(path) {
  const content = fs.readFileSync(path);
  return JSON.parse(content);
}

See the total billed bytes across the 100k jobs:

function sumAllBilledBytes() {
  const jobs = readFileData('./jobs.json');
  console.log(
    jobs
      .map(item => (item.statistics.query && item.statistics.query.totalBytesBilled) || 0)
      .map(item => parseInt(item, 10))
      .reduce((partialSum, a) => partialSum + a, 0)
  );
}
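If you want a rough dollar figure from that total, you can convert the billed bytes using on-demand query pricing. The rate below is an assumption (roughly $5 per TiB at the time of writing); check the current rate for your region before trusting the number:

```javascript
// Rough cost estimate from total billed bytes, assuming on-demand query
// pricing of 5 USD per TiB. PRICE_PER_TIB_USD is an assumption; adjust
// it for your region and the current pricing.
const PRICE_PER_TIB_USD = 5;
const BYTES_PER_TIB = 1024 ** 4;

function estimateCostUSD(totalBytesBilled) {
  return (totalBytesBilled / BYTES_PER_TIB) * PRICE_PER_TIB_USD;
}

console.log(estimateCostUSD(5 * BYTES_PER_TIB)); // 25
```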

Get the SQL queries for the jobs

The jobs API will not return the SQL text of each job; you will need to make another round of requests to retrieve this information.

async function getJobQueryText(jobId) {
  // job.get() resolves to [Job, apiResponse]; the raw apiResponse
  // holds the job configuration, including the SQL text.
  const [, jobMetadata] = await bigQueryClient.job(jobId).get();
  return jobMetadata.configuration.query.query;
}

async function getQuerySqlForJobs() {
  // Splits an array into chunks of the given size.
  const chunk = (input, size) => {
    return input.reduce((arr, item, idx) => {
      return idx % size === 0
        ? [...arr, [item]]
        : [...arr.slice(0, -1), [...arr.slice(-1)[0], item]];
    }, []);
  };
  const jobs = readFileData('./jobs.json');
  const processedJobs = jobs
    // Skip entries that are not query jobs or reference no tables.
    .filter(job => job.statistics.query && job.statistics.query.referencedTables)
    .map(job => {
      return {
        jobId: job.jobReference.jobId,
        billedBytes: parseInt(job.statistics.query.totalBytesBilled, 10),
        table: job.statistics.query.referencedTables[0].tableId,
        creationTime: parseInt(job.statistics.creationTime, 10)
      };
    });
  const jobChunks = chunk(processedJobs, 100);
  let finalProcessedJobs = [];
  for (const jobChunk of jobChunks) {
    const promises = jobChunk.map(jobItem => {
      return (async () => {
        const querySQL = await getJobQueryText(jobItem.jobId);
        return {
          ...jobItem,
          querySQL
        };
      })();
    });
    const updatedJobs = await Promise.all(promises);
    finalProcessedJobs = finalProcessedJobs.concat(updatedJobs);
    fs.writeFileSync('./processedJobs.json', JSON.stringify(finalProcessedJobs));
    console.log(finalProcessedJobs.length);
  }
}

This snippet reads all the jobs from jobs.json and makes requests to the BigQuery API 100 at a time. It will take a while to finish, depending on the number of logs in jobs.json.
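Once processedJobs.json exists, one useful follow-up is grouping the billed bytes by table to see which tables drive the bill. This is a sketch that assumes entries shaped like the processed jobs above; the sample data at the bottom is made up:

```javascript
// Sum billed bytes per referenced table and rank the worst offenders.
// Expects entries shaped like those written to processedJobs.json.
function topTablesByBilledBytes(jobs) {
  const totals = {};
  for (const job of jobs) {
    totals[job.table] = (totals[job.table] || 0) + job.billedBytes;
  }
  return Object.entries(totals)
    .map(([table, billedBytes]) => ({ table, billedBytes }))
    .sort((a, b) => b.billedBytes - a.billedBytes);
}

// Usage with made-up entries:
const ranked = topTablesByBilledBytes([
  { table: 'events', billedBytes: 300 },
  { table: 'users', billedBytes: 100 },
  { table: 'events', billedBytes: 200 }
]);
console.log(ranked[0]); // { table: 'events', billedBytes: 500 }
```

In the real script you would replace the inline array with readFileData('./processedJobs.json').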

Once done, you will have all the information you need. Feel free to perform any further analysis on top of this snippet. If you like this, give it a clap. I hope we will no longer sponsor the crazy rocket-ship trips of some cloud-computing billionaire.

Happy debugging! You can see my blog here.


I’m a developer working with React, Node.js, GCP and Firebase. I spend my free time learning data science, and I love digging into data with EDA and building ML models.