
Web Scraping in Golang (Go): Complete Guide in 2025

Master web scraping in Go in 2025 with fast, concurrent scrapers. Learn tools, proxies, legal tips, and full code to handle any data challenge.


Live Proxies Editorial Team

Content Manager

How To

29 May 2025

In 2025, web scraping remains one of the most powerful ways to extract, analyze, and leverage web data at scale. As more businesses realize the competitive advantage of data-driven decision-making, Golang has emerged as an exceptional choice for web scraping operations. Go’s efficient performance, strong concurrency model, and minimal memory footprint make it a solid option for high-volume, resource-intensive scraping projects.

This guide will walk you through everything you need to know about building efficient web scrapers with Go, from basic concepts to advanced techniques that can handle even the most challenging scraping scenarios.

What is Web Scraping?

Web scraping is the automated process of extracting structured data from websites. Rather than manually copying information or relying on limited APIs, scraping enables automated collection of publicly visible data points from web pages at scale.

In 2025, web scraping has become integral to numerous business operations:

  • With 94% of online shoppers comparing prices, many e-commerce businesses use scraping for competitive price tracking and market positioning, though this must be done within legal and platform-specific boundaries.
  • Market research firms extract consumer sentiment from millions of product reviews.
  • Real estate analysts track property listings across multiple platforms to identify market trends.
  • Financial analysts scrape economic indicators for algorithmic trading.

The surge in unstructured web data has made scraping an essential skill for developers and data scientists alike, with Golang emerging as a preferred language for performance-critical implementations.

Why Choose Golang for Web Scraping?

When comparing scraping tools, Go stands out for several compelling reasons that make it ideal for modern web data extraction:

Performance Advantages

Go's compiled nature delivers exceptional performance compared to interpreted languages. In benchmark tests, Go scrapers consistently outperform Python equivalents:

  • 2-4x faster scraping throughput.
  • 60-70% reduced memory footprint.
  • Significantly better handling of concurrent connections.

Concurrency Model

Go's goroutines and channels provide a natural way to implement concurrent scraping:

// Example of concurrent scraping with goroutines
func scrapeUrls(urls []string) []PageData {
    var wg sync.WaitGroup
    results := make([]PageData, len(urls))
    
    for i, url := range urls {
        wg.Add(1)
        go func(i int, url string) {
            defer wg.Done()
            results[i] = scrapeSinglePage(url)
        }(i, url)
    }
    
    wg.Wait()
    return results
}

This elegant concurrency model allows you to scrape hundreds of pages simultaneously without the complexity of thread management found in other languages.
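
In production you will usually want to cap how many fetches run at once so you don't exhaust file descriptors or hammer the target site. One common pattern is a buffered-channel semaphore; the sketch below is illustrative and assumes the same hypothetical scrapeSinglePage and PageData from the snippet above:

// scrapeUrlsBounded is an illustrative variant of the function above that
// caps concurrency with a buffered-channel semaphore.
func scrapeUrlsBounded(urls []string, maxWorkers int) []PageData {
    var wg sync.WaitGroup
    results := make([]PageData, len(urls))
    sem := make(chan struct{}, maxWorkers) // buffered channel used as a semaphore

    for i, url := range urls {
        sem <- struct{}{} // blocks once maxWorkers fetches are in flight
        wg.Add(1)
        go func(i int, url string) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot when done
            results[i] = scrapeSinglePage(url)
        }(i, url)
    }

    wg.Wait()
    return results
}

With maxWorkers set to, say, 10, at most ten pages are fetched concurrently while the remaining URLs queue up.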

Strong Standard Library

Go's standard library includes powerful networking capabilities through the net/http package, which handles everything from basic GET requests to complex connection pooling:

resp, err := http.Get("https://example.com")
if err != nil {
    // Handle error
}
defer resp.Body.Close()

Combined with third-party packages like goquery (similar to jQuery) for HTML parsing, Go provides everything needed to build robust scrapers without excessive dependencies.
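
As a quick, self-contained taste of that jQuery-like API (no network involved), here is a small sketch that parses a made-up HTML fragment and reads both element text and an href attribute; it assumes goquery is installed, which is covered below:

package main

import (
    "fmt"
    "log"
    "strings"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    // A made-up HTML fragment, just to illustrate the selection API.
    html := `<ul><li><a href="/page/1">First</a></li><li><a href="/page/2">Second</a></li></ul>`

    doc, err := goquery.NewDocumentFromReader(strings.NewReader(html))
    if err != nil {
        log.Fatalf("Failed to parse HTML: %v", err)
    }

    // Select every anchor and read both its text and its href attribute.
    doc.Find("li a").Each(func(i int, s *goquery.Selection) {
        href, _ := s.Attr("href")
        fmt.Printf("%d: %s -> %s\n", i, s.Text(), href)
    })
}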

Setting Up Your Go Environment

Before diving into scraping, let's get your Go development environment ready.

Installing Go

Installing Go is straightforward across all major operating systems:

For Windows:

  1. Download the installer from golang.org/dl.
  2. Run the MSI file and follow the installation prompts.
  3. Verify installation by opening a command prompt and typing go version.

For macOS:

# Using Homebrew
brew install go

# Verify installation
go version

For Linux:

# Ubuntu/Debian
sudo apt update
sudo apt install golang

# Verify installation
go version

Setting Up a Go Workspace

Go projects work best with a properly structured workspace:

# Create Go workspace directories
mkdir -p $HOME/go/{bin,src,pkg}

# Add to your profile (.bashrc, .zshrc, etc.)
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

For your scraping project, create a new directory (with Go modules, the project can live anywhere; the GOPATH layout above is optional but keeps things tidy):

mkdir -p $GOPATH/src/github.com/yourusername/goscraper
cd $GOPATH/src/github.com/yourusername/goscraper

Initialize your module:

go mod init github.com/yourusername/goscraper

Building a Basic Web Scraper in Go

Let's start with a simple scraper that extracts quotes from a website.

Fetching Web Pages

The foundation of any web scraper is the ability to fetch web pages. Go's net/http package makes this straightforward:

package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "time"
)

func main() {
    // Create a custom HTTP client with timeout
    client := &http.Client{
        Timeout: 30 * time.Second,
    }
    
    // Send GET request
    resp, err := client.Get("http://quotes.toscrape.com/")
    if err != nil {
        log.Fatalf("Failed to fetch page: %v", err)
    }
    defer resp.Body.Close()
    
    // Check status code
    if resp.StatusCode != 200 {
        log.Fatalf("Non-200 status code: %d", resp.StatusCode)
    }
    
    // Read the response body
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatalf("Failed to read response body: %v", err)
    }
    
    fmt.Printf("Page fetched successfully! Length: %d bytes\n", len(body))
}


This code establishes a solid foundation with proper error handling and timeout settings, which are essential for reliable scraping.
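
Real-world fetches fail intermittently with timeouts, 429s, or transient 5xx responses, so it often pays to wrap the request in a retry loop. Below is a minimal sketch of such a helper; fetchWithRetry is an illustrative name of my own choosing, not a standard-library function, and it only needs the fmt, net/http, and time imports already used above:

// fetchWithRetry retries transient failures with exponential backoff.
// This is an illustrative helper, not a standard-library function.
func fetchWithRetry(client *http.Client, url string, maxAttempts int) (*http.Response, error) {
    var lastErr error
    for attempt := 1; attempt <= maxAttempts; attempt++ {
        resp, err := client.Get(url)
        if err == nil && resp.StatusCode < 500 && resp.StatusCode != http.StatusTooManyRequests {
            return resp, nil // success, or a non-retryable client error
        }
        if err != nil {
            lastErr = err
        } else {
            lastErr = fmt.Errorf("retryable status %d", resp.StatusCode)
            resp.Body.Close() // discard the failed response before retrying
        }
        // Exponential backoff: 1s, 2s, 4s, ...
        time.Sleep(time.Duration(1<<(attempt-1)) * time.Second)
    }
    return nil, fmt.Errorf("all %d attempts failed, last error: %v", maxAttempts, lastErr)
}

You would call it as resp, err := fetchWithRetry(client, "http://quotes.toscrape.com/", 3) in place of client.Get, and still close resp.Body on success.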

Parsing HTML with goquery

Now that we can fetch web pages, we need to parse and extract specific data. The goquery package provides jQuery-like syntax for HTML traversal:

package main

import (
    "fmt"
    "log"
    "net/http"
    "strings"
    "time"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    // Initialize HTTP client
    client := &http.Client{
        Timeout: 30 * time.Second,
    }
    
    // Fetch the page
    resp, err := client.Get("http://quotes.toscrape.com/")
    if err != nil {
        log.Fatalf("Failed to fetch page: %v", err)
    }
    defer resp.Body.Close()
    
    // Check status code
    if resp.StatusCode != 200 {
        log.Fatalf("Non-200 status code: %d", resp.StatusCode)
    }
    
    // Create a goquery document from the HTTP response
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatalf("Failed to parse HTML: %v", err)
    }
    
    // Extract quotes using CSS selectors
    doc.Find(".quote").Each(func(i int, s *goquery.Selection) {
        // Extract text, author and tags
        text := s.Find(".text").Text()
        author := s.Find(".author").Text()
        
        var tags []string
        s.Find(".tag").Each(func(i int, t *goquery.Selection) {
            tags = append(tags, t.Text())
        })
        
        // Print extracted data
        fmt.Printf("Quote: %s\n", text)
        fmt.Printf("Author: %s\n", author)
        fmt.Printf("Tags: %v\n", tags)
        fmt.Println(strings.Repeat("-", 20))
    })
}


First, install the required dependency:

go get github.com/PuerkitoBio/goquery

This simple example demonstrates how to:

  1. Fetch a web page.
  2. Parse the HTML using goquery.
  3. Extract specific elements using CSS selectors.
  4. Process and display the extracted data.
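
The example above stops at page one, but most listing sites paginate. At the time of writing, quotes.toscrape.com exposes a "next" link (li.next > a), so you can keep following it until it disappears. Here is a hedged sketch that drops into the program above and reuses its net/http client and goquery import:

// followPages walks the "next" link until no more pages exist. The selector
// and base URL assume the quotes.toscrape.com markup; adjust them for other sites.
func followPages(client *http.Client, startURL string, handle func(*goquery.Document)) error {
    pageURL := startURL
    for pageURL != "" {
        resp, err := client.Get(pageURL)
        if err != nil {
            return err
        }
        doc, err := goquery.NewDocumentFromReader(resp.Body)
        resp.Body.Close()
        if err != nil {
            return err
        }

        handle(doc) // process the current page, e.g. the .quote extraction above

        // Look for the relative href of the "next" button, if any.
        next, ok := doc.Find("li.next > a").Attr("href")
        if !ok {
            return nil // no more pages
        }
        pageURL = "http://quotes.toscrape.com" + next
    }
    return nil
}

You would pass the .quote extraction from the previous example as the handle callback.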

Advanced Scraping Techniques

Basic scraping works well for simple sites, but modern web applications often require more sophisticated approaches.

Handling JavaScript-Rendered Pages

Many websites now render content dynamically with JavaScript, making traditional HTTP requests insufficient. For these cases, Go offers headless browser automation:


package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/chromedp/chromedp"
)

func main() {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	var quotes, authors []string

	err := chromedp.Run(ctx,
		chromedp.Navigate("https://quotes.toscrape.com/js/"),
		chromedp.WaitVisible(".quote", chromedp.ByQuery),

		// Get all quote texts
		chromedp.Evaluate(`
			Array.from(document.querySelectorAll(".quote .text")).map(q => q.textContent)
		`, &quotes),

		// Get all authors
		chromedp.Evaluate(`
			Array.from(document.querySelectorAll(".quote .author")).map(a => a.textContent)
		`, &authors),
	)

	if err != nil {
		log.Fatalf("Failed to scrape: %v", err)
	}

	for i := range quotes {
		fmt.Printf("Quote: %s\n", quotes[i])
		fmt.Printf("Author: %s\n", authors[i])
		fmt.Println("--------------------")
	}
}

 


First, install the required dependency:

go get github.com/chromedp/chromedp

This approach uses Chrome DevTools Protocol to control a headless Chrome browser, allowing you to scrape even the most complex JavaScript-heavy websites.
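
chromedp also lets you control how the browser itself is launched. The snippet below is an illustrative sketch: it builds an allocator with an explicit headless flag, a custom user agent, and a proxy (the proxy address and user-agent string are placeholders), and the resulting ctx would replace the one created with chromedp.NewContext in the example above.

// Illustrative launch options; the proxy address and user agent are placeholders.
opts := append(chromedp.DefaultExecAllocatorOptions[:],
    chromedp.Flag("headless", true),
    chromedp.UserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"),
    chromedp.ProxyServer("http://127.0.0.1:8080"), // placeholder proxy
)

allocCtx, cancelAlloc := chromedp.NewExecAllocator(context.Background(), opts...)
defer cancelAlloc()

ctx, cancel := chromedp.NewContext(allocCtx)
defer cancel()

// From here, pass ctx to chromedp.Run exactly as in the example above.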

Install Chrome or Chromium (if not already installed)

Since chromedp requires a Chrome/Chromium browser installed on your system, make sure it’s available. You can install it on Linux using the following commands:

# Install Google Chrome (stable)
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb

# Or install Chromium (open-source version)
sudo apt update
sudo apt install -y chromium-browser

Managing Sessions and Cookies

Many websites require authentication or session management. Here's how to handle cookies and maintain sessions in Go:

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

func main() {
	// Create a cookie jar
	jar, err := cookiejar.New(nil)
	if err != nil {
		log.Fatalf("Failed to create cookie jar: %v", err)
	}

	// Create HTTP client with cookie jar
	client := &http.Client{
		Jar: jar,
	}

	// Example form data
	formData := url.Values{
		"username": {"testuser"},
		"password": {"testpass"},
	}

	// Simulate login POST request (httpbin returns what you send)
	resp, err := client.PostForm("https://httpbin.org/post", formData)
	if err != nil {
		log.Fatalf("Login POST failed: %v", err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("Login POST Response:\n%s\n\n", string(body))

	// Set a cookie via httpbin
	_, err = client.Get("https://httpbin.org/cookies/set?sessionid=abc123")
	if err != nil {
		log.Fatalf("Failed to set cookie: %v", err)
	}

	// Now fetch cookies
	resp, err = client.Get("https://httpbin.org/cookies")
	if err != nil {
		log.Fatalf("Failed to get cookies: %v", err)
	}
	defer resp.Body.Close()

	body, _ = io.ReadAll(resp.Body)
	fmt.Printf("Cookies Response:\n%s\n", string(body))
}


What this does:

  1. Simulates a login POST to https://httpbin.org/post.
  2. Sets a cookie via https://httpbin.org/cookies/set.
  3. Fetches the cookies via https://httpbin.org/cookies.
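
You can also inspect the jar directly instead of round-tripping through httpbin. A small sketch that reuses the jar from the program above (cookiejar.Jar exposes a Cookies method, which takes a parsed *url.URL):

// Inspect what the jar has stored for a site; Jar.Cookies takes a parsed *url.URL.
u, err := url.Parse("https://httpbin.org")
if err != nil {
    log.Fatalf("Failed to parse URL: %v", err)
}
for _, c := range jar.Cookies(u) {
    fmt.Printf("Stored cookie: %s=%s\n", c.Name, c.Value)
}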

Avoiding Anti-Scraping Mechanisms

As web scraping becomes more prevalent, websites increasingly implement countermeasures. Here's how to build resilient scrapers.

Using Rotating Proxies

IP rotation is essential for high-volume scraping to avoid rate limiting and IP bans:

package main

import (
    "fmt"
    "log"
    "math/rand"
    "net/http"
    "net/url"
    "time"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    // List of proxy servers
    proxyURLs := []string{
        "http://proxy1.example.com:8080",
        "http://proxy2.example.com:8080",
        "http://proxy3.example.com:8080",
    }
    
    // URL to scrape
    targetURL := "https://quotes.toscrape.com/"
    
    // Choose a random proxy
    rand.Seed(time.Now().UnixNano())
    proxyURL, err := url.Parse(proxyURLs[rand.Intn(len(proxyURLs))])
    if err != nil {
        log.Fatalf("Failed to parse proxy URL: %v", err)
    }
    
    // Create a transport with the proxy
    transport := &http.Transport{
        Proxy: http.ProxyURL(proxyURL),
    }
    
    // Create client with the transport
    client := &http.Client{
        Transport: transport,
        Timeout:   30 * time.Second,
    }
    
    // Make the request through the proxy
    resp, err := client.Get(targetURL)
    if err != nil {
        log.Fatalf("Request failed: %v", err)
    }
    defer resp.Body.Close()
    
    // Parse and process as before
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatalf("Failed to parse HTML: %v", err)
    }
    
    // Extract data using goquery
    doc.Find(".quote").Each(func(i int, s *goquery.Selection) {
        text := s.Find(".text").Text()
        author := s.Find(".author").Text()
        fmt.Printf("Quote: %s\nAuthor: %s\n\n", text, author)
    })
}

For large-scale operations, consider using dedicated proxy services with API integration to access thousands of residential and datacenter IPs.

Pro tip: Instead of free proxies (which are often unstable), consider premium residential proxy services like Live Proxies. They offer rotating residential IPs specifically allocated for web scraping use cases, minimizing the risk of IP bans and improving scraper success rates for large-scale operations.
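
Because http.Transport.Proxy accepts a function, you can also rotate proxies per request rather than per client. Here is a hedged sketch of that pattern; the proxy URLs you pass in are placeholders, and it needs sync/atomic in addition to the imports used above:

// rotatingTransport returns a transport whose Proxy callback round-robins
// through a pool, so every request can leave through a different proxy.
func rotatingTransport(proxies []string) (*http.Transport, error) {
    parsed := make([]*url.URL, 0, len(proxies))
    for _, p := range proxies {
        u, err := url.Parse(p)
        if err != nil {
            return nil, err
        }
        parsed = append(parsed, u)
    }

    var counter uint64
    return &http.Transport{
        Proxy: func(_ *http.Request) (*url.URL, error) {
            // Atomic counter keeps the rotation safe under concurrent requests.
            i := atomic.AddUint64(&counter, 1)
            return parsed[i%uint64(len(parsed))], nil
        },
    }, nil
}

Plug the returned transport into an http.Client and every outgoing request will round-robin through the pool.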

Randomizing User Agents and Delays

Mimicking human browsing patterns is essential for avoiding detection:

package main

import (
    "fmt"
    "log"
    "math/rand"
    "net/http"
    "time"

    "github.com/PuerkitoBio/goquery"
)

// List of common user agents
var userAgents = []string{
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Mozilla/5.0 (Linux; Android 13; SM-S901B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Mobile Safari/537.36",
    "Mozilla/5.0 (iPad; CPU OS 17_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/123.0.6312.87 Mobile/15E148 Safari/604.1",
}

func main() {
    // List of URLs to scrape
    urls := []string{
        "https://quotes.toscrape.com/page/1/",
        "https://quotes.toscrape.com/page/2/",
        "https://quotes.toscrape.com/page/3/",
    }
    
    // Initialize random seed
    rand.Seed(time.Now().UnixNano())
    
    // Create HTTP client
    client := &http.Client{
        Timeout: 30 * time.Second,
    }
    
    // Process each URL
    for _, url := range urls {
        // Random delay between requests (2-5 seconds)
        delay := 2 + rand.Intn(3)
        time.Sleep(time.Duration(delay) * time.Second)
        
        // Select random user agent
        userAgent := userAgents[rand.Intn(len(userAgents))]
        
        // Create request with custom headers
        req, err := http.NewRequest("GET", url, nil)
        if err != nil {
            log.Printf("Failed to create request for %s: %v", url, err)
            continue
        }
        
        // Set headers to mimic a real browser
        req.Header.Set("User-Agent", userAgent)
        req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8")
        req.Header.Set("Accept-Language", "en-US,en;q=0.5")
        req.Header.Set("Connection", "keep-alive")
        req.Header.Set("Upgrade-Insecure-Requests", "1")
        req.Header.Set("Sec-Fetch-Dest", "document")
        req.Header.Set("Sec-Fetch-Mode", "navigate")
        req.Header.Set("Sec-Fetch-Site", "none")
        req.Header.Set("Sec-Fetch-User", "?1")
        
        // Send request
        resp, err := client.Do(req)
        if err != nil {
            log.Printf("Request failed for %s: %v", url, err)
            continue
        }
        
        // Process response
        if resp.StatusCode == http.StatusOK {
            doc, err := goquery.NewDocumentFromReader(resp.Body)
            if err != nil {
                log.Printf("Failed to parse HTML for %s: %v", url, err)
                resp.Body.Close()
                continue
            }
            
            // Extract quotes
            doc.Find(".quote").Each(func(i int, s *goquery.Selection) {
                text := s.Find(".text").Text()
                author := s.Find(".author").Text()
                fmt.Printf("Quote: %s\nAuthor: %s\n\n", text, author)
            })
        }
        
        resp.Body.Close()
        fmt.Printf("Processed %s with user agent: %s\n", url, userAgent)
    }
}


This approach:

  1. Randomizes delays between requests.
  2. Rotates through realistic user agent strings.
  3. Sets appropriate headers to mimic real browsers.
  4. Adds natural variability to scraping patterns.

Storing and Processing Scraped Data

Once you've extracted data, you need efficient ways to store and process it.

Writing Data to CSV Files

CSV remains one of the most versatile formats for storing structured data:

package main

import (
    "encoding/csv"
    "log"
    "net/http"
    "os"
    "strings"

    "github.com/PuerkitoBio/goquery"
)

type Quote struct {
    Text   string
    Author string
    Tags   string
}

func main() {
    // Fetch the page
    resp, err := http.Get("http://quotes.toscrape.com/")
    if err != nil {
        log.Fatalf("Failed to fetch page: %v", err)
    }
    defer resp.Body.Close()
    
    // Parse HTML
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatalf("Failed to parse HTML: %v", err)
    }
    
    // Prepare CSV file
    file, err := os.Create("quotes.csv")
    if err != nil {
        log.Fatalf("Failed to create CSV file: %v", err)
    }
    defer file.Close()
    
    // Initialize CSV writer
    writer := csv.NewWriter(file)
    defer writer.Flush()
    
    // Write header
    header := []string{"Text", "Author", "Tags"}
    if err := writer.Write(header); err != nil {
        log.Fatalf("Failed to write CSV header: %v", err)
    }
    
    // Extract and write quotes
    doc.Find(".quote").Each(func(i int, s *goquery.Selection) {
        text := strings.TrimSpace(s.Find(".text").Text())
        author := strings.TrimSpace(s.Find(".author").Text())
        
        var tags []string
        s.Find(".tag").Each(func(i int, t *goquery.Selection) {
            tags = append(tags, t.Text())
        })
        
        // Join tags into a comma-separated string
        tagsStr := strings.Join(tags, ",")
        
        // Write row to CSV
        record := []string{text, author, tagsStr}
        if err := writer.Write(record); err != nil {
            log.Printf("Failed to write record: %v", err)
        }
    })
    
    log.Println("Scraping completed. Data saved to quotes.csv")
}

Using Databases for Data Storage

For larger datasets or more complex data relationships, databases provide better organization and querying capabilities:

package main

import (
	"database/sql"
	"log"
	"net/http"
	"strings"

	"github.com/PuerkitoBio/goquery"
	_ "github.com/lib/pq"
)

func main() {
	// Open PostgreSQL database
	db, err := sql.Open("postgres", "host=your_host port=5432 user=postgres password=yourpassword dbname=quotesdb sslmode=disable")
	if err != nil {
		log.Fatalf("Failed to open database: %v", err)
	}
	defer db.Close()

	// Create tables if they don't exist
	createTables(db)

	// Fetch the page
	resp, err := http.Get("http://quotes.toscrape.com/")
	if err != nil {
		log.Fatalf("Failed to fetch page: %v", err)
	}
	defer resp.Body.Close()

	// Parse HTML
	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		log.Fatalf("Failed to parse HTML: %v", err)
	}

	// Extract and store quotes
	doc.Find(".quote").Each(func(i int, s *goquery.Selection) {
		text := strings.TrimSpace(s.Find(".text").Text())
		author := strings.TrimSpace(s.Find(".author").Text())

		// Insert quote and get its ID
		quoteID, err := insertQuote(db, text, author)
		if err != nil {
			log.Printf("Failed to insert quote: %v", err)
			return
		}

		// Extract and store tags
		s.Find(".tag").Each(func(i int, t *goquery.Selection) {
			tag := strings.TrimSpace(t.Text())

			// Get or create tag ID
			tagID, err := getOrCreateTag(db, tag)
			if err != nil {
				log.Printf("Failed to process tag: %v", err)
				return
			}

			// Create relationship between quote and tag
			if err := linkQuoteToTag(db, quoteID, tagID); err != nil {
				log.Printf("Failed to link quote to tag: %v", err)
			}
		})
	})

	log.Println("Scraping completed. Data saved to database.")
}

func createTables(db *sql.DB) {
	// Create quotes table
	_, err := db.Exec(`
		CREATE TABLE IF NOT EXISTS quotes (
			id SERIAL PRIMARY KEY,
			text TEXT NOT NULL,
			author TEXT NOT NULL
		)
	`)
	if err != nil {
		log.Fatalf("Failed to create quotes table: %v", err)
	}

	// Create tags table
	_, err = db.Exec(`
		CREATE TABLE IF NOT EXISTS tags (
			id SERIAL PRIMARY KEY,
			name TEXT NOT NULL UNIQUE
		)
	`)
	if err != nil {
		log.Fatalf("Failed to create tags table: %v", err)
	}

	// Create quote_tags relation table
	_, err = db.Exec(`
		CREATE TABLE IF NOT EXISTS quote_tags (
			quote_id INTEGER,
			tag_id INTEGER,
			PRIMARY KEY (quote_id, tag_id),
			FOREIGN KEY (quote_id) REFERENCES quotes (id),
			FOREIGN KEY (tag_id) REFERENCES tags (id)
		)
	`)
	if err != nil {
		log.Fatalf("Failed to create quote_tags table: %v", err)
	}
}

func insertQuote(db *sql.DB, text, author string) (int64, error) {
	var id int64
	err := db.QueryRow("INSERT INTO quotes (text, author) VALUES ($1, $2) RETURNING id", text, author).Scan(&id)
	if err != nil {
		return 0, err
	}
	return id, nil
}

func getOrCreateTag(db *sql.DB, tagName string) (int64, error) {
	// Try to get existing tag
	var tagID int64
	err := db.QueryRow("SELECT id FROM tags WHERE name = $1", tagName).Scan(&tagID)
	if err == nil {
		return tagID, nil
	}

	// Create new tag if not found
	err = db.QueryRow("INSERT INTO tags (name) VALUES ($1) RETURNING id", tagName).Scan(&tagID)
	if err != nil {
		return 0, err
	}
	return tagID, nil
}

func linkQuoteToTag(db *sql.DB, quoteID, tagID int64) error {
	_, err := db.Exec("INSERT INTO quote_tags (quote_id, tag_id) VALUES ($1, $2) ON CONFLICT DO NOTHING", quoteID, tagID)
	return err
}

First, install the required dependency:

go get github.com/lib/pq

This more complex example demonstrates:

  1. Creating a relational database structure.
  2. Storing structured data with relationships.
  3. Handling duplicate entries elegantly.
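
For larger scrapes, issuing every INSERT as its own statement becomes a bottleneck; a common optimization is to batch a page's worth of rows in a single transaction. A minimal sketch against the same quotes table (Quote here refers to the simple struct from the CSV example, with Text and Author fields):

// insertQuotesTx writes a batch of quotes inside a single transaction.
// Quote is assumed to be the simple struct from the CSV example (Text, Author).
func insertQuotesTx(db *sql.DB, quotes []Quote) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op once Commit has succeeded

    stmt, err := tx.Prepare("INSERT INTO quotes (text, author) VALUES ($1, $2)")
    if err != nil {
        return err
    }
    defer stmt.Close()

    for _, q := range quotes {
        if _, err := stmt.Exec(q.Text, q.Author); err != nil {
            return err
        }
    }
    return tx.Commit()
}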

Real-World Example: Scraping E-Commerce Product Prices with Go

Let's build a practical e-commerce scraper that extracts product information from Amazon. This example demonstrates how to combine the techniques covered above into a complete, working solution.

Targeting an E-Commerce Website

E-commerce price monitoring is a common use case for web scraping. According to recent studies, retailers that use competitor price monitoring can increase profit margins by up to 25%.

For our example, we'll target Amazon product listings to extract:

  • Product title
  • Current price
  • Customer ratings
  • Product details


Implementing the Scraper

Here's a complete implementation of an Amazon product scraper in Go:

package main

import (
	"fmt"
	"log"
	"math/rand"
	"net/http"
	"net/url"
	"strconv"
	"strings"
	"time"

	"github.com/PuerkitoBio/goquery"
)

// Product represents Amazon product data
type Product struct {
	Title   string
	Price   string
	Rating  string
	Reviews int
}

// Fetches product information from Amazon
func scrapeProduct(pageURL string, userAgents []string) (*Product, error) {
	// Select a random user agent
	userAgent := userAgents[rand.Intn(len(userAgents))]

	// Change this to your proxy
	proxyURL, err := url.Parse("http://127.0.0.1:8080")
	if err != nil {
		return nil, fmt.Errorf("error parsing proxy URL: %v", err)
	}

	transport := &http.Transport{
		Proxy: http.ProxyURL(proxyURL),
	}

	// Create an HTTP client with timeout and proxy support
	client := &http.Client{
		Timeout:   30 * time.Second,
		Transport: transport,
	}

	// Create a new request
	req, err := http.NewRequest("GET", pageURL, nil)
	if err != nil {
		return nil, fmt.Errorf("error creating request: %v", err)
	}
	
	// Add headers to mimic a browser
	req.Header.Set("User-Agent", userAgent)
	req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml")
	req.Header.Set("Accept-Language", "en-US,en;q=0.9")
	req.Header.Set("DNT", "1")
	req.Header.Set("Connection", "keep-alive")
	req.Header.Set("Upgrade-Insecure-Requests", "1")
	
	// Send the request
	resp, err := client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("error fetching URL: %v", err)
	}
	defer resp.Body.Close()
	
	// Check if the request was successful
	if resp.StatusCode != 200 {
		return nil, fmt.Errorf("failed to fetch URL. Status code: %d", resp.StatusCode)
	}
	
	// Parse the HTML document
	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("error parsing HTML: %v", err)
	}
	
	// Extract product data
	product := &Product{}
	
	// Extract product title
	product.Title = strings.TrimSpace(doc.Find("#productTitle").Text())
	
	// Extract price (handling different price selectors)
	priceSelectors := []string{
		"#priceblock_ourprice", 
		".a-price .a-offscreen", 
		"#price_inside_buybox",
	}
	
	for _, selector := range priceSelectors {
		if price := doc.Find(selector).First().Text(); price != "" {
			product.Price = strings.TrimSpace(price)
			break
		}
	}
	
	// Extract rating
	ratingText := doc.Find("#acrPopover").AttrOr("title", "")
	if ratingStr := strings.TrimSpace(ratingText); ratingStr != "" {
		// Extract rating value (e.g., "4.5 out of 5 stars" -> "4.5")
		parts := strings.Split(ratingStr, " ")
		if len(parts) > 0 {
			product.Rating = parts[0]
		}
	}
	
	// Extract review count
	reviewText := doc.Find("#acrCustomerReviewText").Text()
	if reviewStr := strings.TrimSpace(reviewText); reviewStr != "" {
		// Extract number (e.g., "1,234 ratings" -> 1234)
		numStr := strings.Split(reviewStr, " ")[0]
		numStr = strings.ReplaceAll(numStr, ",", "")
		if count, err := strconv.Atoi(numStr); err == nil {
			product.Reviews = count
		}
	}
	
	return product, nil
}

func main() {
	// Initialize random seed
	rand.Seed(time.Now().UnixNano())
	
	// List of common user agents for rotation
	userAgents := []string{
		"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
		"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36",
		"Mozilla/5.0 (Linux; Android 8.1.0; ONEPLUS A5000) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Mobile Safari/537.36",
		"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
		"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/122.0",
	}
	
	// List of Amazon product URLs to scrape
	productURLs := []string{
		"https://www.amazon.com/-/dp/B09FTNMT84",
		"https://www.amazon.com/-/dp/B09NCMHTSB",
		"https://www.amazon.com/-/dp/B08PPDJWC8",
	}
	
	// Scrape each product with a delay between requests
	for i, url := range productURLs {
		// Add delay between requests (2-5 seconds)
		if i > 0 {
			delay := time.Duration(rand.Intn(3000)+2000) * time.Millisecond
			fmt.Printf("Waiting %v before next request...\n", delay)
			time.Sleep(delay)
		}
		
		fmt.Printf("Scraping product at %s\n", url)
		
		product, err := scrapeProduct(url, userAgents)
		if err != nil {
			log.Printf("Error scraping %s: %v\n", url, err)
			continue
		}
		
		// Display product info
		fmt.Println("Product Information:")
		fmt.Printf("  Title: %s\n", product.Title)
		fmt.Printf("  Price: %s\n", product.Price)
		fmt.Printf("  Rating: %s\n", product.Rating)
		fmt.Printf("  Reviews: %d\n", product.Reviews)
		fmt.Println(strings.Repeat("-", 50))
	}
	
	fmt.Println("Scraping completed!")
}


This implementation demonstrates several key concepts:

  1. Robust header management: We use a realistic set of headers and rotate user agents to avoid detection.
  2. Error handling: Each step includes proper error handling and reporting.
  3. Selector flexibility: We handle multiple possible selectors for elements like price, making the scraper more resilient to site changes.
  4. Request spacing: We add random delays between requests to avoid overwhelming the server.
  5. Clean data extraction: We carefully parse text content to extract meaningful data.

Analyzing the Data

Once you've scraped the product data, you can analyze it for insights. Here's a simple example of how to calculate average prices and compare them across products:

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParsePrice converts a price string like "$19.99" to a float64
func ParsePrice(priceStr string) (float64, error) {
	// Remove currency symbol and any commas
	clean := strings.TrimSpace(priceStr)
	clean = strings.ReplaceAll(clean, "$", "")
	clean = strings.ReplaceAll(clean, ",", "")
	
	// Convert to float64
	price, err := strconv.ParseFloat(clean, 64)
	if err != nil {
		return 0, fmt.Errorf("error parsing price '%s': %v", priceStr, err)
	}
	
	return price, nil
}

// AnalyzeProducts calculates statistics from a list of products
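// Product here is the struct produced by the scraper above (Title, Price, Rating, Reviews).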
func AnalyzeProducts(products []*Product) {
	if len(products) == 0 {
		fmt.Println("No products to analyze")
		return
	}
	
	var totalPrice float64
	var totalRating float64
	var totalReviews int
	var validPrices int
	var validRatings int
	
	// Calculate totals
	for _, product := range products {
		// Process price
		if price, err := ParsePrice(product.Price); err == nil {
			totalPrice += price
			validPrices++
		}
		
		// Process rating
		if rating, err := strconv.ParseFloat(product.Rating, 64); err == nil {
			totalRating += rating
			validRatings++
		}
		
		totalReviews += product.Reviews
	}
	
	// Calculate averages, guarding against division by zero
	avgPrice, avgRating := 0.0, 0.0
	if validPrices > 0 {
		avgPrice = totalPrice / float64(validPrices)
	}
	if validRatings > 0 {
		avgRating = totalRating / float64(validRatings)
	}
	avgReviews := float64(totalReviews) / float64(len(products))
	
	// Display results
	fmt.Println("Product Analysis:")
	fmt.Printf("  Number of products: %d\n", len(products))
	fmt.Printf("  Average price: $%.2f\n", avgPrice)
	fmt.Printf("  Average rating: %.1f/5.0\n", avgRating)
	fmt.Printf("  Average reviews: %.0f\n", avgReviews)
	
	// Find the highest and lowest priced items
	var highestProduct, lowestProduct *Product
	var highestPrice, lowestPrice float64
	
	for _, product := range products {
		price, err := ParsePrice(product.Price)
		if err != nil {
			continue
		}
		
		if highestProduct == nil || price > highestPrice {
			highestPrice = price
			highestProduct = product
		}
		
		if lowestProduct == nil || price < lowestPrice {
			lowestPrice = price
			lowestProduct = product
		}
	}
	
	if highestProduct != nil {
		fmt.Printf("\nHighest priced product:\n")
		fmt.Printf("  %s\n", highestProduct.Title)
		fmt.Printf("  Price: %s\n", highestProduct.Price)
	}
	
	if lowestProduct != nil {
		fmt.Printf("\nLowest priced product:\n")
		fmt.Printf("  %s\n", lowestProduct.Title)
		fmt.Printf("  Price: %s\n", lowestProduct.Price)
	}
}

To use this analysis function, you'd collect products into a slice and then pass them to the analyzer:

// Update the main function:
func main() {
	rand.Seed(time.Now().UnixNano())

	userAgents := []string{
		"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/123.0.0.0 Safari/537.36",
		"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15",
	}

	productURLs := []string{
		"https://www.amazon.com/-/dp/B09FTNMT84",
		"https://www.amazon.com/-/dp/B09NCMHTSB",
		"https://www.amazon.com/-/dp/B08PPDJWC8",
	}

	var products []*Product

	for i, url := range productURLs {
		if i > 0 {
			delay := time.Duration(rand.Intn(3000)+2000) * time.Millisecond
			fmt.Printf("Waiting %v...\n", delay)
			time.Sleep(delay)
		}

		fmt.Printf("Scraping %s...\n", url)
		product, err := scrapeProduct(url, userAgents)
		if err != nil {
			log.Printf("Error scraping %s: %v\n", url, err)
			continue
		}

		fmt.Printf("  Title: %s\n  Price: %s\n  Rating: %s\n  Reviews: %d\n",
			product.Title, product.Price, product.Rating, product.Reviews)
		fmt.Println(strings.Repeat("-", 50))

		products = append(products, product)
	}

	AnalyzeProducts(products)
}


This analysis:

  • Calculates average price, rating, and review count.
  • Identifies the highest and lowest priced products.
  • Provides insights that could inform purchasing or pricing decisions.

For businesses, this type of price monitoring can provide competitive advantages by:

  1. Identifying underpriced or overpriced products.
  2. Detecting price trends over time.
  3. Creating price alerts when competitors change pricing (see the sketch after this list).
  4. Informing dynamic pricing strategies.
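
As a small illustration of the price-alert idea above, the sketch below compares a freshly parsed price (for example, the output of ParsePrice) against a stored baseline; the baseline map and threshold are placeholders you would persist and tune yourself:

// checkPriceAlert compares a freshly scraped price against a stored baseline
// and flags moves beyond a percentage threshold. The baseline map is a
// placeholder for whatever store (file, database) you persist prices in.
func checkPriceAlert(baseline map[string]float64, title string, current, thresholdPct float64) {
    prev, ok := baseline[title]
    if !ok {
        baseline[title] = current // first observation becomes the baseline
        return
    }
    changePct := (current - prev) / prev * 100
    if changePct >= thresholdPct || changePct <= -thresholdPct {
        fmt.Printf("ALERT: %q moved %.1f%% (%.2f -> %.2f)\n", title, changePct, prev, current)
    }
    baseline[title] = current
}

Calling checkPriceAlert(baseline, product.Title, price, 5.0) after each scrape would flag any move of 5% or more.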

Legal and Ethical Considerations

When scraping e-commerce websites, it's crucial to understand and respect legal and ethical boundaries.

Understanding Legal Implications

Web scraping exists in a complex legal landscape. Several key considerations include:

  1. Terms of Service: Most major e-commerce sites, like Amazon, prohibit scraping in their Terms of Service. While violating ToS may not always result in legal action, it can lead to IP bans, account suspension, or lawsuits depending on the use case.
  2. Rate Limiting: Sending excessive automated requests may strain servers and, in extreme cases, could be interpreted as abusive behavior or a denial-of-service attack.
  3. Copyright Issues: Many on-site assets, such as images, product descriptions, and structured data, may be protected under copyright or database rights, especially in the EU.
  4. Legal Precedent: In the U.S., the hiQ Labs v. LinkedIn case set an important precedent by ruling that scraping publicly accessible data from LinkedIn profiles did not violate the Computer Fraud and Abuse Act (CFAA). However, this ruling is jurisdiction-specific and applies only to public data; scraping private, protected, or login-gated content remains unlawful in many regions. Laws and interpretations also vary globally, so it's vital to consult legal counsel based on your specific location and scraping targets.

Ethical Scraping Practices

To conduct scraping ethically and reduce legal risk:

  1. Respect robots.txt: It’s important to check and review a site’s robots.txt file before scraping. While this file is typically a guideline rather than a legal requirement in most countries, it outlines which parts of a site the website owner prefers not to be accessed by automated tools. In some regions, particularly in Europe or when scraping personal data, ignoring these directives might have legal consequences. When uncertain, it’s a good practice to honor robots.txt rules or consult legal guidance.
  2. Implement Rate Limiting: Use reasonable delays between requests to minimize server impact (see the rate-limiter sketch after this list).
  3. Identify Your Scraper: Consider including contact information in your user agent string.
  4. Use Public APIs When Available: Many retailers offer official APIs that provide the same data without the legal risks of scraping.
  5. Store Only What You Need: Minimize the amount of data you store, especially personal information.
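
For rate limiting that is stricter than random sleeps, a token-bucket limiter keeps the request rate predictable even across goroutines. Below is a minimal sketch using golang.org/x/time/rate (install with go get golang.org/x/time/rate); the two-second interval is an arbitrary example value:

package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Allow at most one request every two seconds, with no burst.
	limiter := rate.NewLimiter(rate.Every(2*time.Second), 1)

	urls := []string{
		"http://quotes.toscrape.com/page/1/",
		"http://quotes.toscrape.com/page/2/",
	}

	for _, u := range urls {
		// Wait blocks until the limiter allows the next request.
		if err := limiter.Wait(context.Background()); err != nil {
			log.Fatalf("Limiter error: %v", err)
		}

		resp, err := http.Get(u)
		if err != nil {
			log.Printf("Failed to fetch %s: %v", u, err)
			continue
		}
		resp.Body.Close()
		fmt.Println("Fetched", u)
	}
}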

Conclusion

Go is an outstanding choice for building fast, scalable web scrapers, particularly for high-volume, data-intensive tasks like e-commerce monitoring. In this guide, we’ve covered the essentials of creating a Go web scraper, including:

  • Crafting realistic HTTP requests with custom headers.
  • Implementing user-agent rotation and request pacing to stay under detection thresholds.
  • Parsing complex HTML pages using Go libraries like goquery.
  • Handling errors gracefully and ensuring scraper resilience.
  • Extracting and processing valuable product data effectively.

The benefits of using Go for web scraping lie in its impressive concurrency model, efficient memory usage, and ease of deployment, making it ideal for both small projects and enterprise-scale scraping systems.

As always, it’s essential to approach scraping responsibly. Stick to publicly available data, respect website terms of service, and stay aware of relevant legal frameworks like GDPR and CCPA.

To maximize reliability and avoid IP blocks during large scraping operations, we recommend integrating premium rotating residential proxy services like Live Proxies. Their private, dedicated IP pools help ensure your scraping tasks remain secure, scalable, and compliant.

Important: Always ensure that the use of proxies or any scraping tools complies with the target website’s Terms of Service (ToS) and adheres to local data privacy regulations. Unauthorized scraping may violate legal agreements or result in access restrictions.

With these tools and best practices in hand, you’re well-equipped to build efficient, ethical, and high-performing web scrapers in Go.

FAQ

Is Go better than Python for web scraping in 2025?

For large-scale, high-concurrency scraping, generally yes: Go scrapers typically run faster, use less memory, and handle many simultaneous connections more smoothly than comparable Python scrapers. Go's compiled nature, lightweight goroutines, and efficient memory management make it well suited to this kind of workload.

What are the best libraries for web scraping in Go?

  • Goquery: for HTML parsing (jQuery-like syntax).
  • Colly: a high-level, elegant scraping framework (see the minimal sketch after this list).
  • chromedp: for headless browser scraping and handling JavaScript-rendered content.
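
Since Colly is mentioned but not demonstrated elsewhere in this guide, here is a minimal hedged sketch (install with go get github.com/gocolly/colly/v2) that mirrors the goquery quotes example:

package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Runs for every element matching the selector on each visited page.
	c.OnHTML(".quote", func(e *colly.HTMLElement) {
		fmt.Printf("Quote: %s\nAuthor: %s\n\n", e.ChildText(".text"), e.ChildText(".author"))
	})

	c.OnError(func(r *colly.Response, err error) {
		log.Printf("Request to %s failed: %v", r.Request.URL, err)
	})

	if err := c.Visit("http://quotes.toscrape.com/"); err != nil {
		log.Fatal(err)
	}
}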

Can Go handle JavaScript-rendered websites?

Yes, the article provides a detailed implementation using chromedp, which controls a headless Chrome instance via the DevTools Protocol. This allows you to scrape dynamic, JavaScript-driven pages.

How do I rotate proxies when scraping with Go?

Yes. The proxy rotation example above shows the pattern: randomly select from a list of proxy URLs and attach the chosen proxy to the HTTP transport (or supply a Proxy function to rotate per request). For production use, premium proxy services like Live Proxies are recommended to minimize bans and improve success rates.

How do I avoid getting blocked while scraping with Go?

  1. Randomizing user agents.
  2. Adding random delays between requests.
  3. Rotating proxies.
  4. Mimicking human browsing patterns.
  5. Using session management with cookies.
  6. Integrating premium rotating residential proxies for large-scale operations.

Is web scraping legal in 2025?

Yes, web scraping is generally legal in 2025 when it involves publicly available data. However, it’s essential to review and respect a website’s terms of service. Scraping data behind logins or accessing personal information may violate privacy laws such as the GDPR.

What websites can I scrape legally using Go?

You can safely scrape:

  • Public, non-authenticated pages without access restrictions.
  • Data explicitly permitted via robots.txt or APIs.

Note: Avoid scraping login-protected, copyrighted, or personal data without permission.

How can I store scraped data from Go scrapers?

  • CSV files: simple, portable, widely supported.
  • Databases: for structured, scalable storage with relational integrity.

Example code for both is included in the article.

Why does my Go scraper return empty data?

Common causes include:

  1. Incorrect or outdated CSS selectors.
  2. JavaScript-rendered content (which requires a headless browser such as chromedp).
  3. Anti-scraping measures blocking your requests.

Is using proxies for scraping ethical and allowed?

Ethical proxy use involves:

  • Respecting the target site's terms of service.
  • Avoiding scraping personal or private data.
  • Using reputable providers, like Live Proxies, whose IPs are allocated for compliant scraping.

Always disclose your intent when appropriate, and scrape responsibly.