Download files and directories from the web using curl and wget

Rakesh Jain
5 min read · Feb 9, 2021

This is one of those tasks almost everyone has struggled with at some point, without ever finding a simple and exact answer.

FYI both curl and wget support HTTP, HTTPS, and FTP protocols.

Let’s get our hands dirty.

Downloading Files and Directories from the web with WGET

Our first target is to download the directories, and the files underneath them, from the location below:

$ wget -np -P . -r -R "index.html*" --cut-dirs=4 http://mirror.myfahim.com/centos/7.9.2009/updates/x86_64/

Let us understand this command in detail.

--no-parent or -np
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.

Note that, for HTTP (and HTTPS), the trailing slash is very important to --no-parent. HTTP has no concept of a "directory". Wget relies on you to indicate what's a directory and what isn't. In 'http://foo/bar/', Wget will consider 'bar' to be a directory, while in 'http://foo/bar' (no trailing slash), 'bar' will be considered a filename (so --no-parent would be meaningless, as its parent is '/').
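For example, the two invocations below are interpreted differently (a sketch using the same mirror):

$ wget -r -np http://mirror.myfahim.com/centos/7.9.2009/updates/x86_64/
$ wget -r -np http://mirror.myfahim.com/centos/7.9.2009/updates/x86_64

With the trailing slash, 'x86_64' is treated as a directory and recursion stays below it; without it, 'x86_64' is treated as a filename whose parent is 'updates/', so -np no longer fences off the retrieval where you expect.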

-P prefix or --directory-prefix=prefix
Set directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree. The default is ‘.’ (the current directory).
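For example, to save the whole retrieval tree under /tmp/centos-updates instead of the current directory (the path here is just an illustration):

$ wget -np -r -R "index.html*" -P /tmp/centos-updates http://mirror.myfahim.com/centos/7.9.2009/updates/x86_64/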

-r or --recursive
Turn on recursive retrieving. The default maximum depth is 5.
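If five levels is deeper than you need, you can cap the depth with -l (--level). For example, to recurse at most two levels below the starting URL:

$ wget -r -l 2 -np http://mirror.myfahim.com/centos/7.9.2009/updates/x86_64/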

-R rejlist or --reject rejlist
Specify comma-separated lists of file name suffixes or patterns to accept or reject. Note that if any of the wildcard characters, '*', '?', '[' or ']', appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix. In this case, you have to enclose the pattern in quotes to prevent your shell from expanding it, like in -A "*.mp3" or -A '*.mp3'.

That is what we have done here to reject all "index.html" files: -R "index.html*"
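The accept list works the same way. For example, to download only the RPM packages from the mirror (a sketch):

$ wget -r -np -A "*.rpm" http://mirror.myfahim.com/centos/7.9.2009/updates/x86_64/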

--cut-dirs=n
Ignore n remote directory components when creating the local directory structure. Use this if you do not want the first few directories of the remote path recreated locally.
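In our URL the remote path has four components (centos/7.9.2009/updates/x86_64), which is why the command above uses --cut-dirs=4. Note that wget still creates a top-level directory named after the host; if you want the files directly under the prefix directory, combine it with -nH (--no-host-directories), for example:

$ wget -np -r -R "index.html*" -nH --cut-dirs=4 http://mirror.myfahim.com/centos/7.9.2009/updates/x86_64/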

Once run, you will have the remote directory structure mirrored locally.

Now let's download only the files from:

http://mirror.myfahim.com/centos/7.9.2009/updates/x86_64/repodata/

Here is the command.

$ wget -nd -np -P . -r -R "index.html*" http://mirror.myfahim.com/centos/7.9.2009/updates/x86_64/repodata/

Here we have used one additional parameter: -nd.

-nd or --no-directories

Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions ‘.n’).

This command will download all the files into the current directory.

wget is the go-to solution for us: it supports only a small number of protocols, is very effective for downloading files, and can also download directory structures recursively.
For more information on the various other arguments, please refer to the wget manual.

Downloading Files from the web with CURL

curl does not provide recursive downloads, so we can only use it to download individual files.

Download a single file with curl

$ curl -O http://mirror.myfahim.com/centos/7.9.2009/updates/x86_64/repodata/repomd.xml

Download multiple files with curl

To download multiple files at the same time, use -O followed by the URL of each file that you wish to download.

Use the following syntax for this purpose:

curl -O [url1] -O [url2]

You can also quote the URL and use curl's brace globbing to avoid copying the common part multiple times (the quotes keep your shell from expanding the braces itself):

curl -O "http://domain/path/to/{file1,file2}"

$ curl -O "http://mirror.myfahim.com/centos/7.9.2009/updates/x86_64/repodata/{repomd.xml,repomd.xml.asc}"

If you run curl with no options other than the URL, the content of the URL (whether it's a webpage or a binary file, such as an image or a zip file) is printed to the screen. So it is better to either save it under a name of your choice with the -o option or keep the original remote name with the -O option.
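For example, to save the same file under a name of your choice (repomd-local.xml is an arbitrary name picked for illustration):

$ curl -o repomd-local.xml http://mirror.myfahim.com/centos/7.9.2009/updates/x86_64/repodata/repomd.xml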

Downloading and Uploading files over FTP using CURL and WGET

curl and wget can be used to download files using the FTP protocol:

wget --user=ftpuser --password='ftpuserpassword' ftp://mywebsite.com/testdoc.pdf
curl -u ftpuser:ftpuserpassword 'ftp://mywebsite.com/testdoc.pdf' -o testdoc.pdf
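For uploads, curl can push a local file to the server with -T (--upload-file); wget has no FTP upload support. A sketch against the same example server:

curl -T testdoc.pdf -u ftpuser:ftpuserpassword ftp://mywebsite.com/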

CURL vs WGET: a few differences

The main differences are:

  • wget can download files recursively, whereas curl cannot.
  • wget is a standalone CLI utility with no library associated with it, whereas curl is a front end to the feature-rich libcurl library.
  • curl supports FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, LDAP, LDAPS, FILE, POP3, IMAP, SMTP, RTMP and RTSP. wget supports HTTP, HTTPS and FTP.
  • curl offers upload and sending capabilities; wget only offers plain HTTP POST support (see the example below).
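For example, a simple form POST looks like this in each tool (httpbin.org is used here purely as a public test endpoint):

wget --post-data "name=value" -O - https://httpbin.org/post
curl -d "name=value" https://httpbin.org/post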

That’s all!

Thanks for reading.

Hope you like the article. Please let me know your feedback in the response section.
