Thread Contributor: AncestorTutorial How to build a simple scraper in vb.net
Hey, here's a way to make a simple website scraper in VB.NET. The objective is to get the number of subscribers from a YouTube channel page.

[Image: KrF0M5K.png]

After creating a new console application, we first want to import System.Net and System.Text.RegularExpressions, as we will be using a WebClient to download the page and a regex to pick out the text we want.


Code:
Imports System.Net
Imports System.Text.RegularExpressions


Next, create a new sub called scrape. It uses the WebClient to download the page source as a string.

Code:
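Sub scrape()
    ' WebClient downloads the raw HTML of the page as one string
    Using wc As New WebClient
        Dim source As String = wc.DownloadString("https://www.youtube.com/channel/UCttMm3hHlDSqXGYmp7tqrEA")
        ' Key: subscribers"> followed by anything up to the first </span>
        Dim subregex As New Regex("subscribers"">.*?</span>")
        For Each submatch As Match In subregex.Matches(source)
            ' Keep only the text between the first ">" and the "<" that follows it
            Dim subscribers As String = submatch.Value.Split(">"c)(1).Split("<"c)(0)
            Console.WriteLine("Subscribers: " & subscribers)
        Next
    End Using
End Sub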


Once we have the source, the regex looks for the key, which in the code is typed as subscribers"">.*?</span>. The .*? in the middle stands in for the subscriber count: it matches any characters, as few as possible, up to the first </span>. It's written as a wildcard because that number will change depending on which channel is scraped. You may also notice two quotation marks after the word subscribers. This is because the key contains a literal quotation mark, and inside a VB.NET string literal you have to type it twice so the compiler treats it as part of the text instead of the end of the string. The regex itself only ever sees one quotation mark.
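To see what the key actually matches without hitting the network, here is a minimal sketch that runs it against a hard-coded string. The sample markup, the FirstMatch helper and the module name are made up for illustration; real YouTube markup may look different and can change at any time.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexKeyDemo
    ' Hypothetical helper: returns the first piece of html that matches
    ' pattern, or an empty string when nothing matches.
    Function FirstMatch(html As String, pattern As String) As String
        Return Regex.Match(html, pattern).Value
    End Function

    Sub Main()
        ' Made-up sample markup, not copied from a real page.
        Dim sample As String = "<span class=""subscribers"">1,234</span><span>other</span>"

        ' In the literals below, "" is one " character in the actual pattern.
        Dim lazyKey As String = "subscribers"">.*?</span>"
        Dim greedyKey As String = "subscribers"">.*</span>"

        Console.WriteLine(lazyKey)                       ' subscribers">.*?</span>
        Console.WriteLine(FirstMatch(sample, lazyKey))   ' subscribers">1,234</span>
        Console.WriteLine(FirstMatch(sample, greedyKey)) ' subscribers">1,234</span><span>other</span>
    End Sub
End Module
```

The last two lines show why the ? matters: without it, .* is greedy and runs on to the last </span> in the text instead of stopping at the first one.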

[Image: krDyLnq.png]

Now that we can download the source and find the key, we need to clean it up so that only the text we want is left; in this case the subscriber count. As the code above shows, we do this with .Split, which cuts a string into pieces at a given character. Splitting the match at ">" and taking piece (1) gives everything after the first ">"; splitting that at "<" and taking piece (0) gives everything before the first "<". The numbers are zero-based indexes into the pieces, so (0) is the first piece and (1) is the second. What remains is the text between ">" and "<".
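The clean-up step can be tried on its own with a fixed string. This is only a sketch: the sample text and the two function names are hypothetical, and the second function is a different technique (a regex capture group) shown for comparison, not something the original code uses.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitDemo
    ' Same approach as in the tutorial: two splits, one index each.
    Function ExtractBySplit(matched As String) As String
        ' Pieces after splitting on ">":  subscribers"  |  1,234</span  |  (empty)
        Dim afterGt As String = matched.Split(">"c)(1)
        ' Pieces after splitting on "<":  1,234  |  /span
        Return afterGt.Split("<"c)(0)
    End Function

    ' Alternative: let the regex capture the part between the tags.
    Function ExtractByGroup(matched As String) As String
        Dim m As Match = Regex.Match(matched, "subscribers"">(.*?)</span>")
        ' Groups(1) is whatever the parentheses matched; empty if no match.
        Return If(m.Success, m.Groups(1).Value, "")
    End Function

    Sub Main()
        ' Made-up match text, shaped like what the key returns.
        Dim matched As String = "subscribers"">1,234</span>"
        Console.WriteLine(ExtractBySplit(matched))   ' 1,234
        Console.WriteLine(ExtractByGroup(matched))   ' 1,234
    End Sub
End Module
```

The capture group does the same job in one step and doesn't throw an index error if the text turns out to have no ">" in it, which can be handy if the page layout changes.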

If you want to tinker with it, here is the full code, which you can copy and paste into a new console application; it should work as is.

FULL SOURCE CODE SCRAPER
Hidden Content Wrote:You must reply in order to see the hidden content!

If you have any questions or comments, or get stuck, feel free to drop a comment below.