Web Scraping and Gmail Notification Alert

Notifications on phone. Photographed by @jonasleupe.

In this blog post, I'll show and explain how I wrote a Node application that sends me an email whenever an item is in stock on Best Buy's online store. I was heavily inspired by this Reddit post from /u/GennaroIsGod.

Moving forward, I'll assume you already have Node.js installed and have some understanding of async/await.

Installing libraries

Create a new directory called scrape-and-notify (or whatever you want, you're the boss in your world!). Then install these dependencies using your preferred package manager. I'll be using npm. After running npm init in your directory, install the packages with this command:

npm i axios cheerio dotenv esm googleapis nodemailer

Double-check that the dependencies are listed in your package.json.

Here's a brief explanation as to why I'm using each dependency:

  • axios - HTTP client that works with node.js.
  • cheerio - parses HTML and gives us an API for traversing/manipulating parsed HTML.
  • dotenv - load your environment variables from an .env file.
  • esm - ECMAScript module loader. Lets us use "import" in node.js (not necessary if you want to use "require" to import your dependencies).
  • googleapis - client library for using Google APIs. We're going to be using this for our OAuth2 authorization for sending emails via Gmail.
  • nodemailer - send emails via node.js.

Starting your index.js and setting up your scripts

If you don't already have an index.js file, create one now.

Let's start by creating a function called startTracking and calling it.

// index.js 

const startTracking = () => {
  console.log("Hello everyone!")
}

startTracking();

Inside your package.json create a develop script with the command "node -r esm index.js"

// package.json

{
  // other stuff here
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1",
    "develop": "node -r esm index.js"
  },
  // more stuff here
}

In your terminal, run npm run develop. You should see "Hello everyone!" in your console. When I'm in development mode, I usually run the script with nodemon instead of node. nodemon simply restarts your node application whenever files in the directory change. If you'd like to use nodemon, install it globally with npm i -g nodemon and then change the develop script to nodemon -r esm index.js.

The meat and potatoes of index.js

Next, create a const variable for the product URL you're going to be tracking. In my case, I'll be tracking the Nvidia RTX 3080. Add the variable in your startTracking function.

// index.js

const startTracking = () => {
  const bestBuyProductUrl = `https://www.bestbuy.com/site/nvidia-geforce-rtx-3080-10gb-gddr6x-pci-express-4-0-graphics-card-titanium-and-black/6429440.p?skuId=6429440`
}


// ... rest of code 

Now we're going to start the fun part! Import axios at the top of your index.js. If you're not using esm, you can use require. Since I'm using esm, all my imports in the rest of this post will be done the ES6 way.

Create a new async function called fetchUrl that takes a url argument. Inside fetchUrl, add a try-catch block, and inside the try block, send a GET request to the url like so: const response = await axios.get(url). Then console.log the response. If the status code is 200, the statusText is 'OK', and the data property is populated, then you've successfully sent a GET request.

// index.js

import axios from "axios"

const fetchUrl = async (url) => {
  try {
    const response = await axios.get(url)

    console.log(response)
  } catch (error) {
    console.error(error)
  }
}

const startTracking = () => {
  const bestBuyProductUrl = `https://www.bestbuy.com/site/nvidia-geforce-rtx-3080-10gb-gddr6x-pci-express-4-0-graphics-card-titanium-and-black/6429440.p?skuId=6429440`

  fetchUrl(bestBuyProductUrl)
}

startTracking();

console.log(response) should show:

{
  status: 200,
  statusText: 'OK',
  headers: {
    // ...
  },
  data: "...", // the page's HTML, returned as a string
  // ...
}

We're going to switch gears a little bit now. Let's open the console on the product page we're scraping. There are a couple of useful data points we want to grab from the DOM: the product's title, price (not necessary for our purposes, but I decided to grab the price anyway), SKU, and isInStock (the current availability of the item).

On the Nvidia 3080's page:

  • title - h1 in the div with class sku-title.
  • price - first span child in the div with class priceView-customer-price.
  • SKU - span with class product-data-value in the div with class sku.
  • isInStock (button with item availability) - button with class add-to-cart-button.
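
One way to keep these selectors in one place is a plain object, sketched below. The selectors come from the list above; the names selectors and readFields are just illustrative, and $ is assumed to be a loaded cheerio instance (cheerio.load(html)).

```javascript
// Selectors taken from the list above; the object name and keys are illustrative.
const selectors = {
  title: "div.sku-title>h1",
  price: "div.priceView-customer-price>span:first-child",
  sku: "div.sku>span.product-data-value",
  addToCartButton: "button.add-to-cart-button",
}

// Hypothetical helper: reads the trimmed text of the first match for each selector.
// `$` is expected to be a loaded cheerio instance, e.g. cheerio.load(html).
const readFields = ($) =>
  Object.fromEntries(
    Object.entries(selectors).map(([key, selector]) => [
      key,
      $(selector).first().text().trim(),
    ])
  )
```

If Best Buy changes its markup, only the selectors object would need updating.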

Now that we know which elements to grab, let's code it.

Start by importing cheerio. Then in our fetchUrl function, we can load our data response from axios into cheerio. Once our data response is loaded into cheerio, we have access to the HTML elements. We can use cheerio's APIs to grab each data point and store the data points inside an object.

import axios from "axios"
import cheerio from "cheerio"

const fetchUrl = async (url) => {
  try {
    const response = await axios.get(url)
    const { status } = response
    
    // start with the page URL so the notification email can link back to it
    let item = { url }
    if (status === 200) {
      const { data } = response
      const $ = cheerio.load(data)

      $("div.sku-title>h1").each((_idx, el) => {
        const itemTitle = $(el).text()
        item = { ...item, title: itemTitle }
      })
      
      $("div.priceView-customer-price>span:first-child").each((_idx, el) => {
        const itemPrice = $(el).text()
        // strip the dollar sign and any thousands separators before converting
        const noDollarSignPrice = itemPrice.replace(/[$,]/g, ``)
        const numberPrice = Number(noDollarSignPrice)
        item = {
          ...item,
          price: numberPrice,
        }
      })
      
      // sku for each items and generate api add to cart with sku
      $("div.sku>span.product-data-value").each((_idx, el) => {
        const sku = $(el).text()
        const skuTrimmed = sku.trim()
        const apiAddToCartUrl = `https://api.bestbuy.com/click/-/${skuTrimmed}/cart/`
        item = { ...item, sku: skuTrimmed, addToCartUrl: apiAddToCartUrl }
      })
      
      // if item is sold out or not via button
      $("button.add-to-cart-button").each((_idx, el) => {
        const itemAvail = $(el).text()
        const lowercaseItemAvail = itemAvail.trim().toLowerCase()
        const isInStock = lowercaseItemAvail !== `sold out`
        item = { ...item, availability: lowercaseItemAvail, isInStock }
      })
      
      return item
    }
    
    console.log(`no response while getting ${url}`)
    return item
  } catch (error) {
    console.error(error)
    // return an empty item so callers can still read item.isInStock safely
    return {}
  }
}

As we can see our item object has the properties title, price, sku, addToCartUrl, availability, and isInStock. Since I'm only grabbing data on one page, I'm using an object as my data structure. If you want to implement grabbing multiple pages and checking if the item is in stock then an array of objects would work.
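
If you did want to track several pages, a minimal sketch could look like the following. The names checkAllItems and onlyInStock are hypothetical, and fetchItem stands in for the fetchUrl function above (it is passed in as an argument so the sketch stays self-contained).

```javascript
// Hypothetical helper: checks several product pages and returns an array of item objects.
// `fetchItem` stands in for the fetchUrl function from this post.
const checkAllItems = async (urls, fetchItem) => {
  // allSettled lets one failing page be skipped without losing the others
  const results = await Promise.allSettled(urls.map((url) => fetchItem(url)))

  return results
    .filter((result) => result.status === "fulfilled" && result.value)
    .map((result) => result.value)
}

// Keeps only the items that are currently in stock.
const onlyInStock = (items) => items.filter((item) => item.isInStock)
```

Calling checkAllItems(urls, fetchUrl) would then give you the array of objects mentioned above.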

To explain what I did while grabbing some of the data points from the markup:

  • itemPrice - selecting the price includes the "$". Since I wanted the price to be a number, I chose to omit the $ by using the replace method and then converting the result to a number.
  • sku - selecting the sku gives us extra whitespace, so I trimmed off the whitespace and stored it in skuTrimmed.
  • apiAddToCartUrl - I'm not entirely sure how to find this API link, but I just used GennaroIsGod's post as a reference. To get the link I used a template literal and then interpolated the skuTrimmed.
  • isInStock - if the button does not say "sold out" then it is in stock.
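
The same transformations can be pulled out into small standalone functions, which makes them easy to try in isolation. This is only a sketch: the function names are mine, stripping thousands separators is an addition, and treating a missing button as "not in stock" is an assumption.

```javascript
// Converts a price string such as "$699.99" to a number.
// Also strips thousands separators, which is an addition to the approach above.
const parsePrice = (priceText) => Number(priceText.replace(/[$,]/g, "").trim())

// Builds the add-to-cart link from a SKU, using the URL pattern from this post.
const buildAddToCartUrl = (sku) =>
  `https://api.bestbuy.com/click/-/${sku.trim()}/cart/`

// Treats any button text other than "sold out" as in stock.
// Empty text (button not found) counts as not in stock, which is an assumption.
const isInStockFromButton = (buttonText) => {
  const text = buttonText.trim().toLowerCase()
  return text !== "" && text !== "sold out"
}
```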

Environment variables

Now let's begin to set up our environment variables. Create a .env file in the root directory. Then add the variables GMAIL_SEND_TO_USERNAME, GMAIL_SEND_FROM_USERNAME, OAUTH_CLIENT_ID, OAUTH_CLIENT_SECRET, OAUTH_REDIRECT_URI, and OAUTH_REFRESH_TOKEN.

Here's a quick breakdown of each variable:

  • GMAIL_SEND_TO_USERNAME - the email you want to receive notifications
  • GMAIL_SEND_FROM_USERNAME - the email you want to send notifications from (this can be the same address as the one receiving them).
  • OAUTH_CLIENT_ID - our OAuth client ID
  • OAUTH_CLIENT_SECRET - OAuth secret
  • OAUTH_REDIRECT_URI - OAuth redirect URI
  • OAUTH_REFRESH_TOKEN - OAuth refresh token
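
A missing variable tends to show up later as a confusing authentication error, so a small check at startup can help. Here's a minimal sketch; the variable names come from the list above, while REQUIRED_ENV_VARS and findMissingEnvVars are hypothetical names of my own.

```javascript
// Variable names come from the list above.
const REQUIRED_ENV_VARS = [
  "GMAIL_SEND_TO_USERNAME",
  "GMAIL_SEND_FROM_USERNAME",
  "OAUTH_CLIENT_ID",
  "OAUTH_CLIENT_SECRET",
  "OAUTH_REDIRECT_URI",
  "OAUTH_REFRESH_TOKEN",
]

// Hypothetical helper: returns the names of required variables that are missing or blank.
const findMissingEnvVars = (env = process.env) =>
  REQUIRED_ENV_VARS.filter((name) => !env[name] || env[name].trim() === "")
```

After dotenv has loaded the .env file, you could call findMissingEnvVars() and log the result before starting the tracker.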

Google Cloud Platform and OAuth 2.0

Now we're going to set up our emailing service with Gmail and Google Cloud Platform with OAuth 2.0. I'll be explaining briefly how to set up a new project but I followed this video on how to set up OAuth with Google Cloud Platform. So if you choose to follow the video, skip to the "Sending our notification email" section.

You can create a new email account to send yourself an email (such as instockalert@gmail.com) or just use an existing email that you already have. I chose to create one, but this step is optional.

First, go to https://console.cloud.google.com/. We will need to create a new project. Click on the dropdown menu left of the "Search products and resources" search bar in the navbar at the top of the page.

For the project name, I chose "in-stock-notifications". You can name your project whatever you want. After your project is created, click on the hamburger menu icon on the top left in the navbar and then go to "OAuth consent screen" > External > Create. While filling out the "App information", do not upload an app logo. If you do, it'll take a couple of days for Google to verify your application. Fill out "Developer contact information". For the "Scopes" and "Optional info" we can leave the default options.

Now that we have the consent screen set up, we can set up our credentials. Go to APIs & Services > Credentials. On the Credentials page, create a new credential by clicking on "CREATE CREDENTIALS" > "OAuth client ID". Select "Web application". In the authorized redirect URI field, add https://developers.google.com/oauthplayground without a trailing forward slash. You will run into problems if you include it!

Sweet, we now have a client ID and client secret. Now we need to authorize our API. Go to https://developers.google.com/oauthplayground. Click on the settings cog icon in the upper right corner and select "Use your own OAuth credentials". Add your client ID and client secret here. Then, in "Step 1", add https://mail.google.com as the API scope to authorize. On the following screen, choose the account you're using to send the email. Google will say that your app isn't verified, but that's okay, just click on "Continue" > Allow. You'll be redirected to the OAuth playground site.

In "Step 2", click "Exchange authorization code for tokens" and you'll get a refresh token. Add the client ID, client secret, redirect URI, and refresh token to your .env file.

Your .env file should look something like this:

GMAIL_SEND_TO_USERNAME=kendyhnguyen1991@gmail.com
GMAIL_SEND_FROM_USERNAME=example@gmail.com
OAUTH_CLIENT_ID=xxxxxxxxxxxx.apps.googleusercontent.com
OAUTH_CLIENT_SECRET=xxxxxxxxxxxxx
OAUTH_REDIRECT_URI=https://developers.google.com/oauthplayground
OAUTH_REFRESH_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Sending our notification email

Now let's make use of the fetchUrl function we created. I created a variable called itemToCheck that awaits fetchUrl(bestBuyProductUrl). If our item is in stock, we'll send a notification; otherwise, we'll console.log that our item is not in stock.

// index.js

// ...

const startTracking = async () => {
  const bestBuyProductUrl = `https://www.bestbuy.com/site/nvidia-geforce-rtx-3080-10gb-gddr6x-pci-express-4-0-graphics-card-titanium-and-black/6429440.p?skuId=6429440`

  const itemToCheck = await fetchUrl(bestBuyProductUrl)

  if (itemToCheck.isInStock) {
    sendNotification(itemToCheck)
  } else {
    console.log(`${itemToCheck.title} is not in stock`)
  }
}

// ...

Now we can start creating our sendNotification function.

Go ahead and place your sendNotification function above the startTracking function.

// index.js

const sendNotification = async (item) => {
  let isEmailSent = false;
  const { addToCartUrl, title, url, isInStock } = item;
  
  // going to add more here
}

// startTracking function
const startTracking = async () => {
  // ...
}

startTracking()

I want to keep track of the status of the notification email (if it's been sent or not). That's why I have the isEmailSent variable. I also went ahead and destructured the properties (addToCartUrl, title, url, isInStock) I'm going to use from the item argument. Don't forget to import google from googleapis and require("dotenv") for connecting our OAuth 2.0 client and accessing our environment variables respectively.

Now we can set up our OAuth 2.0 client:

// index.js
// ...
import { google } from "googleapis"
require("dotenv").config()

const sendNotification = async (item) => {
  let isEmailSent = false;
  const { addToCartUrl, title, url, isInStock } = item

  
  const oauth2Client = new google.auth.OAuth2 (
    process.env.OAUTH_CLIENT_ID,
    process.env.OAUTH_CLIENT_SECRET,
    process.env.OAUTH_REDIRECT_URI
  )

  oauth2Client.setCredentials({
    refresh_token: process.env.OAUTH_REFRESH_TOKEN,
  })
}

Our OAuth client is almost done getting set up. All we need to do is get our access token and then set up our mailing options for nodemailer. Our access token data is in a promise, so we will wrap it in a try-catch block. I'll also create a transporter with nodemailer with our mail options, too. Finally, I wrote up some HTML to format our email body into a table with the addToCartUrl, url, and title of our tracked item.

// index.js
// ...
import nodemailer from "nodemailer"

const sendNotification = async (item) => {
  // ...
  
  let isEmailSent = false
  const { addToCartUrl, title, url, isInStock } = item
  
  // ...
  
  try {
    const accessToken = await oauth2Client.getAccessToken()
    const mailOptions = {
      host: "smtp.gmail.com",
      port: 465,
      secure: true,
      auth: {
        type: "OAUTH2",
        //set these in your .env file
        clientId: process.env.OAUTH_CLIENT_ID,
        clientSecret: process.env.OAUTH_CLIENT_SECRET,
      },
    }

    let transporter = nodemailer.createTransport(mailOptions)
    let textToSend = `In stock notification for ${title} at Best Buy`
    let htmlText = `<tr>
    <td width="33%" align="left" bgcolor="#EEEEEE" style="font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; color: #252525; padding:10px; padding-right:0;"><a href="${addToCartUrl}">${addToCartUrl}</a></td>
    <td width="33%" align="left" bgcolor="#EEEEEE" style="font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; color: #252525; padding:10px; padding-right:0;">${url}</td>
    <td width="33%" align="left" bgcolor="#EEEEEE" style="font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; color: #252525; padding:10px; padding-right:0;"><p>${title}</p></td>
    </tr>`

    const html = ` 
    <table width="100%" border="0" cellpadding="0" cellspacing="0" bgcolor="#FFFFFF">
    <colgroup span="3">
    <tr width="94%" border="0" cellpadding="0" cellspacing="0">
    <th width="33%" align="left" bgcolor="#252525" style="font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; color: #EEEEEE; padding:10px; padding-right:0;">Add To Cart Link</th>
    <th width="33%" align="left" bgcolor="#252525" style="font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; color: #EEEEEE; padding:10px; padding-right:0;">Link</th>
    <th width="33%" align="left" bgcolor="#252525" style="font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; color: #EEEEEE; padding:10px; padding-right:0;">Title</th>
    </tr>
    </colgroup>
    ${htmlText}
    </table>
    `

    if (isInStock) {
      try {
        const date = new Date()
        let info = await transporter.sendMail({
          from: `"instocknotificationbot" <${process.env.GMAIL_SEND_FROM_USERNAME}>`,
          to: process.env.GMAIL_SEND_TO_USERNAME,
          subject: `IN STOCK NOTIFICATION ${title}`,
          text: textToSend,
          html,
          auth: {
            user: process.env.GMAIL_SEND_FROM_USERNAME,
            refreshToken: process.env.OAUTH_REFRESH_TOKEN,
            // getAccessToken() resolves to an object; the token string is on .token
            accessToken: accessToken.token,
          },
        })
  
        if (info.rejected.length > 0) {
          return {
            isEmailSent,
            message: `something went wrong with sending the email`,
          }
        }
  
        isEmailSent = true
        return {
          isEmailSent,
          message: `"${title}" IS IN STOCK - ${date}`,
        }
      } catch (error) {
        console.log(
          `there was an error sending an email. check the emails (receiving and sending accounts) and refresh token`
        )
        console.error(error)
        return {
          isEmailSent,
          message: `${error.message}`,
        }
      }
    }
  } catch (error) {
    console.log(`error connecting to oAuth2Client`)
    console.error(error)
    return error
  }
}

So in the if(isInStock) block, I used another try-catch block for our transporter sending an email. In the transporter.sendMail options, I added the email address I'm sending the notification from, the email address receiving the notification, the plaintext version of the message as a Unicode string, HTML body of the email, and authorization options. date is used to log the current date and time. info is an object with a lot of properties, but I used the rejected array to catch any errors sending the notification email. If there were any elements in the rejected array, then the email did not send.
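
To make the rejected check a little easier to see on its own, here's a small sketch that turns the info object into the { isEmailSent, message } shape used above. The name summarizeSendResult is hypothetical, and I'm assuming rejected holds the recipient addresses the server refused.

```javascript
// Hypothetical helper: turns nodemailer's sendMail result into the
// { isEmailSent, message } shape used in this post.
// Only the `rejected` array is inspected, as in the code above.
const summarizeSendResult = (info, title, date = new Date()) => {
  const rejected = Array.isArray(info.rejected) ? info.rejected : []

  if (rejected.length > 0) {
    return {
      isEmailSent: false,
      message: `email was rejected for: ${rejected.join(", ")}`,
    }
  }

  return {
    isEmailSent: true,
    message: `"${title}" IS IN STOCK - ${date}`,
  }
}
```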

I refactored the if(isInStock) block into a function called sendEmail to clean things up a bit. I didn't like the nested try-catch block.

Here's what it looked like after:

// index.js

// ...

const sendEmail = async (
  isInStock,
  transporter,
  title,
  textToSend,
  html,
  accessToken
) => {
  let isEmailSent = false

  if (isInStock) {
    try {
      const date = new Date()
      let info = await transporter.sendMail({
        from: `"instocknotificationbot" <${process.env.GMAIL_SEND_FROM_USERNAME}>`,
        to: process.env.GMAIL_SEND_TO_USERNAME,
        subject: `IN STOCK NOTIFICATION ${title}`,
        text: textToSend,
        html,
        auth: {
          user: process.env.GMAIL_SEND_FROM_USERNAME,
          refreshToken: process.env.OAUTH_REFRESH_TOKEN,
          accessToken,
        },
      })

      if (info.rejected.length > 0) {
        return {
          isEmailSent,
          message: `something went wrong with sending the email`,
        }
      }

      isEmailSent = true
      return {
        isEmailSent,
        message: `"${title}" IS IN STOCK - ${date}`,
      }
    } catch (error) {
      console.log(
        `there was an error sending an email. check the emails (receiving and sending accounts) and refresh token`
      )
      console.error(error)
      return {
        isEmailSent,
        message: `${error.message}`,
      }
    }
  }
}

const sendNotification = async (item) => {
  const { addToCartUrl, title, url, isInStock } = item

  const oauth2Client = new google.auth.OAuth2(
    process.env.OAUTH_CLIENT_ID,
    process.env.OAUTH_CLIENT_SECRET,
    process.env.OAUTH_REDIRECT_URI
  )

  oauth2Client.setCredentials({
    refresh_token: process.env.OAUTH_REFRESH_TOKEN,
  })

  try {
    const accessToken = await oauth2Client.getAccessToken()
    const mailOptions = {
      host: "smtp.gmail.com",
      port: 465,
      secure: true,
      auth: {
        type: "OAUTH2",
        //set these in your .env file
        clientId: process.env.OAUTH_CLIENT_ID,
        clientSecret: process.env.OAUTH_CLIENT_SECRET,
      },
    }

    let transporter = nodemailer.createTransport(mailOptions)
    let textToSend = `In stock notification for ${title} at Best Buy`
    let htmlText = `<tr>
    <td width="33%" align="left" bgcolor="#EEEEEE" style="font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; color: #252525; padding:10px; padding-right:0;"><a href="${addToCartUrl}">${addToCartUrl}</a></td>
    <td width="33%" align="left" bgcolor="#EEEEEE" style="font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; color: #252525; padding:10px; padding-right:0;">${url}</td>
    <td width="33%" align="left" bgcolor="#EEEEEE" style="font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; color: #252525; padding:10px; padding-right:0;"><p>${title}</p></td>
    </tr>`

    const html = ` 
    <table width="100%" border="0" cellpadding="0" cellspacing="0" bgcolor="#FFFFFF">
    <colgroup span="3">
  <tr width="94%" border="0" cellpadding="0" cellspacing="0">
  <th width="33%" align="left" bgcolor="#252525" style="font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; color: #EEEEEE; padding:10px; padding-right:0;">Add To Cart Link</th>
  <th width="33%" align="left" bgcolor="#252525" style="font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; color: #EEEEEE; padding:10px; padding-right:0;">Link</th>
  <th width="33%" align="left" bgcolor="#252525" style="font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; color: #EEEEEE; padding:10px; padding-right:0;">Title</th>
  </tr>
  </colgroup>
  ${htmlText}
  </table>
  `

    const mailMessage = await sendEmail(
      isInStock,
      transporter,
      title,
      textToSend,
      html,
      accessToken.token
    )
    return mailMessage
  } catch (error) {
    console.log(`error connecting to oAuth2Client`)
    console.error(error)
    return error
  }
}

// ...
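
The email markup repeats the same inline styles in every cell, so it could also be generated by a few small helpers. This is just a sketch of that idea, not what the repo does: baseStyle, escapeHtml, cell, link, and buildEmailHtml are all hypothetical names, and escaping the values is an extra precaution I've added.

```javascript
// Shared inline style for every cell, based on the markup above.
const baseStyle =
  "font-family: Verdana, Geneva, Helvetica, Arial, sans-serif; font-size: 12px; padding:10px; padding-right:0;"

// Hypothetical helper: escapes characters that have special meaning in HTML.
const escapeHtml = (value) =>
  String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")

// Hypothetical helper: renders one table cell (th or td) with the shared style.
const cell = (tag, bgcolor, color, content) =>
  `<${tag} width="33%" align="left" bgcolor="${bgcolor}" style="${baseStyle} color: ${color};">${content}</${tag}>`

// Hypothetical helper: renders a URL as a clickable link.
const link = (href) => `<a href="${escapeHtml(href)}">${escapeHtml(href)}</a>`

// Builds the whole email body from the item's fields.
const buildEmailHtml = ({ addToCartUrl, url, title }) =>
  [
    `<table width="100%" border="0" cellpadding="0" cellspacing="0" bgcolor="#FFFFFF">`,
    `<tr>`,
    cell("th", "#252525", "#EEEEEE", "Add To Cart Link"),
    cell("th", "#252525", "#EEEEEE", "Link"),
    cell("th", "#252525", "#EEEEEE", "Title"),
    `</tr>`,
    `<tr>`,
    cell("td", "#EEEEEE", "#252525", link(addToCartUrl)),
    cell("td", "#EEEEEE", "#252525", link(url)),
    cell("td", "#EEEEEE", "#252525", escapeHtml(title)),
    `</tr>`,
    `</table>`,
  ].join("\n")
```

The returned string could be passed as the html option of transporter.sendMail in place of the hand-written template.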

Now to finish up our startTracking function. First, create a new async function called getInStockMessage. We'll move our if (itemToCheck.isInStock) logic into getInStockMessage, then call getInStockMessage in startTracking.

Here's the code now:

// index.js

// ...

const getInStockMessage = async (itemToCheck) => {
  if (itemToCheck.isInStock) {
    try {
      const isInStockMessage = await sendNotification(itemToCheck)
      if (isInStockMessage.isEmailSent) {
        console.log(isInStockMessage.message)
      } else {
        console.log(`something went wrong while sending the email`)
      }
    } catch (error) {
      console.error(error)
    }
  } else {
    const date = new Date()
    console.log(`${itemToCheck.title} is not in stock - ${date}`)
  }
}

const startTracking = async () => {
  const bestBuyProductUrl = `https://www.bestbuy.com/site/nvidia-geforce-rtx-3080-10gb-gddr6x-pci-express-4-0-graphics-card-titanium-and-black/6429440.p?skuId=6429440`

  try {
    const itemToCheck = await fetchUrl(bestBuyProductUrl)
    await getInStockMessage(itemToCheck)
  } catch (error) {
    console.log(`error while tracking`)
    console.error(error)
  }
}

startTracking()

Lastly, this notification bot isn't much use if it only runs once, so let's run it every 10 seconds using setInterval. I'm sure you could use a cron job, but this seems much simpler.

setInterval(() => startTracking(), 10000)
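
One caveat with a fixed interval: if a check ever takes longer than the interval (a slow page load, for example), runs can overlap. Here's a minimal sketch of a guard against that. The name skipIfRunning is hypothetical and this isn't part of the original bot.

```javascript
// Hypothetical helper: wraps an async task so a new run is skipped
// while the previous one is still in flight.
// The wrapped function resolves to true if the task ran and false if it was skipped.
const skipIfRunning = (task) => {
  let running = false

  return async (...args) => {
    if (running) return false

    running = true
    try {
      await task(...args)
      return true
    } finally {
      // always release the guard, even if the task throws
      running = false
    }
  }
}

// Usage (left commented out so the sketch doesn't start a timer):
// setInterval(skipIfRunning(startTracking), 10000)
```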

You can adjust the interval to your liking. Now we're officially done! You can check out the repo on my GitHub.

© 2021 Kendy Nguyen. All rights reserved.