Click the "select the element in the page to inspect it" option, the icon in the top left-hand corner of the developer tools panel. This lets you inspect the particular element you selected on the webpage, and you can now see the element's tag, id, class, and the other attributes required to fetch its content.
Next we need cURL, which stands for client URL. The tool fetches the contents of the provided URL and has several parameters or arguments that can modify its output. We can use the command
curl -o meaning.txt https://www.dictionary.com/browse/computer#
Dictionary.com serves every word under the /browse route, so the page for any word you search lives at /browse/word#. The -o option makes curl dump the output into meaning.txt, or any file you specify; if you open the file, its contents are the same HTML you see on the web. So we are going to store the page for the searched word in meaning.txt; you can customize the file name and the command however you like.
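As a quick sanity check (a sketch that assumes the page is reachable and the URL format has not changed), you can fetch a word and confirm that the HTML actually landed in the file; the -s flag is optional and only silences curl's progress meter:
curl -s -o meaning.txt "https://www.dictionary.com/browse/computer#"
wc -c meaning.txt   # prints the number of bytes saved, which should be well above zero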
To pull only the definition out of that HTML, we will use grep and sed. The <span class="one-click-content css-nnyc96 e1q3nk1v1"> element contains the actual meaning. We just need the basic meaning; we do not need examples and long, lengthy definitions in our terminal, so we will filter out the span tag with the class one-click-content css-nnyc96 e1q3nk1v1. To do that we can use the grep command, which prints the text or lines matching a specified expression. Since we need the span element with that particular class name, a regular expression will find it most effectively:
grep -oP '(?<=<span class="one-click-content css-nnyc96 e1q3nk1v1">).*?(?=</span>)' meaning.txt >temp.txt
The -o argument makes grep print only the matching part of each line, and -P tells it that the pattern is a Perl-compatible regular expression. The lookbehind and lookahead match everything between the opening and closing tags, and we store the result in temp.txt.
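To see the lookbehind/lookahead at work in isolation, here is a minimal sketch on a hand-made line (the sample text is invented, and the class name is simply the one the page used at the time of writing, so it may change):
echo '<span class="one-click-content css-nnyc96 e1q3nk1v1">a programmable electronic device</span>' | grep -oP '(?<=<span class="one-click-content css-nnyc96 e1q3nk1v1">).*?(?=</span>)'
# prints: a programmable electronic device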
The text we extracted into temp.txt still contains nested HTML tags. To remove them we will introduce another text-filtering tool called sed, the stream editor, which lets us transform a stream of text and print or store the outcome. The following command removes the HTML tags from the scraped text:
sed -i 's/<[^>]*>//g' temp.txt
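To see what the substitution itself does, here is a minimal sketch on a single invented line (no -i, so nothing is written to a file):
echo '<b>computer</b>: a <i>programmable</i> machine' | sed 's/<[^>]*>//g'
# prints: computer: a programmable machine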
In the real command, the -i flag makes sed edit temp.txt in place rather than printing to standard output. The regular expression <[^>]*> matches an opening angle bracket, everything up to the next closing bracket, and the closing bracket itself, so entire tags, attributes included, are deleted and only plain text remains. The file may still contain blank lines and stray symbols, though, so we will use grep once more to keep only the clean definitions:
and filter the fine meaning in our file.grep -v '^\s*$\|^\s*\#' temp.txt >meaning.txt
The -v flag inverts the match, so the command drops blank lines and lines that begin with a # from temp.txt, and everything that survives is stored in the meaning.txt file. If you have followed along this far, the final step will be super easy: we assemble everything into a shell script.
#!/bin/bash
read -p "Enter the word to find meaning : " word
output="meaning.txt"
url="https://www.dictionary.com/browse/$word#"
curl -o "$output" "$url"
clear
grep -oP '(?<=<span class="one-click-content css-nnyc96 e1q3nk1v1">).*?(?=</span>)' $output >temp.txt
sed -i 's/<[^>]*>//g' temp.txt    # -i edits temp.txt in place, so no redirect is needed here
grep -v '^\s*$\|^\s*\#' temp.txt >"$output"
echo "$word"
while read -r meaning
do
    echo "$meaning"
done < "$output"
In the script we first read the word from the user, then pass it to cURL along with the output-file and URL variables we created; using variables makes the script easier to manage and improves its readability. Appending "&> /dev/null" to the curl command discards curl's network-progress output, so the only thing left on screen is the content of meaning.txt. Adding it is optional and can behave differently across operating systems, which is why the script instead uses the clear command to wipe the curl output from the terminal.
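If you prefer that route, the download line in the script could be written as follows (just a sketch of the same curl call with its output silenced):
curl -o "$output" "$url" &> /dev/null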
Try it with words such as mathematics, code, and python; it works only for words that exist on the dictionary.com website. We have successfully made a scraper that scrapes the meaning of the input word from dictionary.com.
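Assuming you saved the script as dict.sh (the file name here is only an example), you can make it executable and run it like this:
chmod +x dict.sh
./dict.sh
Then type a word such as computer at the prompt, and its definitions are printed line by line from meaning.txt.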