Go Tutorial | Web Scraping with Golang
If you would like a more thorough walkthrough, you can also watch the video version of this tutorial.
In this tutorial, we will learn how to build a web scraper with Go and Colly. Sometimes the data you need simply isn’t available through an API. In those cases, you can write a small web scraper to collect it for you.
We’re going to be working with Go and the Colly package. The Colly package will allow us to crawl, scrape and traverse the DOM.
Prerequisites
To follow along, you will need to have Go installed.
Setting up a project directory
Let’s get started. First, change into the directory where you keep your projects. In my case this is the Youtube folder, but it may be different for you. There, we will create a project folder called demo.
cd Youtube
mkdir demo
cd demo
In the demo folder, initialize a Go module, which creates our go.mod file, using:
go mod init demo
After this, create a new main.go file using:
touch main.go
Installing Colly
We’re going to be using the colly package to build our web scraper, so let’s install that now by running:
go get github.com/gocolly/colly
You will notice that running the above command created a go.sum file. This file holds the versions and checksums of our direct and indirect dependencies. Go uses it to verify each downloaded dependency against its recorded checksum, confirming that none of them have been modified.
In the main.go file we created earlier, let’s set up a basic package main and func main():
package main

func main() {}
Analyzing the target page structure
We want to fetch the Previous close value of the stock we are interested in.
If you inspect the element in your browser’s developer tools, you will see that the value sits inside a td tag, which is nested in a tr row within the table’s tbody element (tbody > tr > td).
Now that we know where the value lives in the page, we can write the script.
Conclusion
Today we looked at how to scrape data from a website using Go and Colly.
Follow me for updates like this.
Connect with me on:-
Twitter 👦🏻:- https://twitter.com/kmmtmm92
Youtube 📹:- https://www.youtube.com/channel/UCV-_hzlbVSlobkekurpLOZw/about
Github 💭:- https://github.com/Kavit900
Instagram 📸:- https://www.instagram.com/code_with_kavit/