Hey everyone,
So I was thinking of things to code, and recently I've been interested in web scraping. There's a lot you can do with a web scraper, but I wanted to build something that matched my career.
That's why I made a CVE database web scraper, so I could expedite my software reviews.
Just like most projects, I started with a requirements.txt file in a directory. The only third-party library I used for this is
requests
plus a few other standard libraries that didn't need to go in the requirements.txt.
I decided early on that I wanted a CLI tool, so to start I added
import argparse
to my import list. Then I made a parser object using
parser = argparse.ArgumentParser()
Now I needed to add the arguments. I knew I was going to need the vendor name, product name, and the version number. I added those with
parser.add_argument('-v', '--vendor', help='Insert the vendor of the product')
parser.add_argument('-p', '--product', help='The product to look up')
parser.add_argument('-vs', '--version', help='Version of the product')
and initialized it with
args = parser.parse_args()
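Putting the argparse pieces together, a minimal sketch looks like this (the sample vendor, product, and version values passed in are made-up, just for illustration):

```python
import argparse

# Build the parser with the three arguments described above.
parser = argparse.ArgumentParser(description='Look up CVEs for a product')
parser.add_argument('-v', '--vendor', help='Insert the vendor of the product')
parser.add_argument('-p', '--product', help='The product to look up')
parser.add_argument('-vs', '--version', help='Version of the product')

# parse_args() must be *called*; a sample argv is passed here so the
# sketch can run without a real command line.
args = parser.parse_args(['-v', 'adobe', '-p', 'acrobat', '-vs', '23.001'])
```

Note that argparse stores each value under the long option name, so the values come back as `args.vendor`, `args.product`, and `args.version`.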
Alright, now I had the CLI portion mostly done. Next, it was time to add a Bearer authorization header to the request for the URL we were going to use later
def requestsBearer(url, token):
    headers = {'Authorization': f'Bearer {token}'}
    response = requests.get(url, headers=headers)
    return response
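The helper just attaches the token as a standard Bearer header. The header shape it sends can be sketched without a live request (the helper name and token below are mine, not from the script):

```python
def bearer_headers(token):
    # Same header shape the requestsBearer helper sends with the GET.
    return {'Authorization': f'Bearer {token}'}

headers = bearer_headers('example-token-123')
```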
Now, the arguments passed on the command line are stored under the long name of each argument. That meant I could pull the values the user entered and assign them to variables
vendor = args.vendor
product = args.product
version = args.version
Now for the URL. I was going to leverage the CVEDetails API for this script, so I created an API token on cvedetails.com, took the URL, and broke out the parts that would come from the arguments
url = f'https://www.cvedetails.com/api/v1/vulnerability/list-by-vpv?vendorName={vendor}&productType=application&productName={product}&versionString={version}&outputFormat=json&pageNumber=1&resultsPerPage=20'
api = ApiKey  # your cvedetails.com API token, defined elsewhere
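As an aside, building the query string with `urllib.parse.urlencode` instead of an f-string makes sure special characters in vendor or product names get escaped properly. A sketch, using made-up sample values in place of the argparse variables:

```python
from urllib.parse import urlencode

# Hypothetical sample values; in the script these come from argparse.
vendor, product, version = 'adobe', 'acrobat', '23.001'

params = {
    'vendorName': vendor,
    'productType': 'application',
    'productName': product,
    'versionString': version,
    'outputFormat': 'json',
    'pageNumber': 1,
    'resultsPerPage': 20,
}
url = ('https://www.cvedetails.com/api/v1/vulnerability/list-by-vpv?'
       + urlencode(params))
```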
Then it was time to get a response from the URL and save it to a variable. I did that with
response = requestsBearer(url, api)
r = response.json()
I had the response as JSON now, so I could do key-value matching on the fields I wanted. I initialized the finding count for each criticality to 0, made a for loop that prints each CVE, and then used the CVSS score to set the finding levels. That came out like this
numberOfVulnerabilities = 0
lowFindings = 0
mediumFindings = 0
highFindings = 0
if r['results'] is not None:
    for result in r['results']:
        progressBar()  # progress indicator, defined elsewhere in the script
        print(result['cveNumber'] + ': ' + result['summary'])
        print('____________________________________________________________')
        numberOfVulnerabilities += 1
        if float(result['maxCvssBaseScore']) < 5.0:
            lowFindings += 1
        elif 7.5 > float(result['maxCvssBaseScore']) >= 5.0:
            mediumFindings += 1
        elif float(result['maxCvssBaseScore']) >= 7.5:
            highFindings += 1
With that, I just needed one last statement to print out the vulnerability counts and I was set.
print(f'Number of vulnerabilities: {numberOfVulnerabilities}')
print('=================================================================')
print(f'lows: {lowFindings}, medium: {mediumFindings}, high: {highFindings}')
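Those score thresholds can also be pulled into a small helper, which makes the bucketing easy to test on its own (the function name is mine, not from the script):

```python
def bucket(score):
    # Same thresholds as the loop above:
    # < 5.0 is low, 5.0-7.4 is medium, >= 7.5 is high.
    score = float(score)
    if score < 5.0:
        return 'low'
    elif score < 7.5:
        return 'medium'
    return 'high'
```

For example, `bucket('4.9')` is low, `bucket('5.0')` is medium, and `bucket('7.5')` is high, matching the edge cases in the loop.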
Fantastic. To run the script, I just saved it, opened PowerShell, changed to the directory with the script, and typed **python3 webScrapper.py -v adobe -p acrobat -vs {versionNumber}**. It worked, and I was able to pull vulnerability information for specific products and version numbers.