Overview
Wget is a network utility for retrieving files from the Web using HTTP and
FTP, the two most widely used Internet protocols. It works
non-interactively, so it can keep running in the background after you have
logged off. The program supports recursive retrieval of HTML pages as
well as FTP sites. You can use Wget to make
mirrors of archives and home pages, or to traverse the Web like a WWW robot.
Examples
The examples are divided into three sections
for clarity. The first section is a tutorial for beginners. The
second section explains some of the more complex program features. The
third section contains advice for mirror administrators, as well as even
more complex features (that some would call perverted).
wget http://foo.bar.com/
But what will happen if the connection is slow and the
file is lengthy? The connection will probably fail, possibly more than
once, before the whole file is retrieved. In this case, Wget will keep
trying to get the file until it either retrieves all of it or exceeds the
default number of retries (20). It is easy to change the number of tries
to 45, to ensure that the whole file will arrive safely:
wget --tries=45 http://foo.bar.com/jpg/flyweb.jpg
wget -t 45 -o log http://foo.bar.com/jpg/flyweb.jpg &
The ampersand at the end of the line makes sure that
Wget works in the background. To remove the limit on the number of
retries, use '-t inf'.
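The retry and logging options above can be wrapped in a small shell function. This is only a minimal sketch, assuming a POSIX shell; the function name fetch_logged and its defaults (45 tries, a log file named log) are illustrative choices taken from the examples above, not part of Wget itself.

```shell
#!/bin/sh
# fetch_logged URL [TRIES] [LOGFILE]
# Hypothetical helper: runs wget with a retry limit and writes its
# progress messages to a log file instead of the terminal.
fetch_logged() {
    url=$1
    tries=${2:-45}       # number of attempts; "inf" retries without limit
    logfile=${3:-log}    # assumed default log name, as in the example above
    wget -t "$tries" -o "$logfile" "$url"
}

# Usage (not run here): append & to leave the download in the background.
#   fetch_logged http://foo.bar.com/jpg/flyweb.jpg inf flyweb.log &
```

Keeping the ampersand at the call site rather than inside the function lets the caller decide whether to wait for the download or not.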
wget ftp://foo.download.com/welcome.msg
ftp://foo.download.com/welcome.msg
=> 'welcome.msg'
Connecting to foo.download.com:21... connected!
Logging in as anonymous ... Logged in!
==> TYPE I ... done. ==> CWD not needed.
==> PORT ... done. ==> RETR welcome.msg ... done.
wget -q --tries=45 -r \
http://download-east.oracle.com/otndoc/oracle9i/901_doc
wget -i file
If you specify '-' as the file name, the URLs will be
read from standard input.
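Reading URLs from standard input makes it possible to generate the list on the fly and pipe it straight into Wget. The sketch below assumes a POSIX shell and a made-up naming scheme (numbered .jpg files under one base URL); the function name list_urls is hypothetical.

```shell
#!/bin/sh
# list_urls BASE FIRST LAST
# Hypothetical helper: prints BASE/FIRST.jpg ... BASE/LAST.jpg,
# one URL per line, in the form that 'wget -i' expects.
list_urls() {
    base=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        printf '%s/%s.jpg\n' "$base" "$i"
        i=$((i + 1))
    done
}

# Usage (not run here): feed the generated list to wget on standard input.
#   list_urls http://foo.bar.com/jpg 1 3 | wget -i -
```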
wget -r -t1 http://foo.bar.com/ -o gnulog
wget -r -l1 http://www.yahoo.com/
wget -S http://www.lycos.com/
wget -r -l1 --no-parent -A.gif http://host/dir/
It is a bit of a kludge, but it works perfectly.
'-r -l1' means to retrieve recursively,
with a maximum depth of 1. '--no-parent'
means that references to the parent directory are ignored, and
'-A.gif' means to download only the GIF
files. '-A "*.gif"' would have worked
too.
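The same combination of options works for other file types, so it can be parameterized. The following is a sketch only, assuming a POSIX shell; fetch_by_suffix is a hypothetical name, and the accept list is passed to '-A' unchanged.

```shell
#!/bin/sh
# fetch_by_suffix ACCEPT_LIST URL
# Hypothetical helper: retrieves, one level deep and without ascending
# to the parent directory, only the files matching ACCEPT_LIST.
# ACCEPT_LIST is a suffix or pattern such as .gif or "*.gif".
fetch_by_suffix() {
    accept=$1
    url=$2
    wget -r -l1 --no-parent -A "$accept" "$url"
}

# Usage (not run here):
#   fetch_by_suffix .gif http://host/dir/
```

Quoting "$accept" matters when a pattern such as "*.gif" is used, because otherwise the shell may expand it against local file names before Wget sees it.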
wget -nc -r http://foo.bar.com/
wget ftp://name:password@foo.bar.com/myfile
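If you wish Wget to keep a mirror of a page (or FTP
subdirectories), use '--mirror', a shorthand that turns on recursive
retrieval and time-stamping ('-r -N'; current Wget releases also set
infinite recursion depth and keep FTP directory listings).
You can put Wget in the crontab file, asking it to recheck a site each
Sunday: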
0 0 * * 0 wget --mirror ftp://x.y.z/pub -o /var/weeklog
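For unattended runs it can help to wrap the mirror command so that failures are reported. This is a sketch under stated assumptions: a POSIX shell, the hypothetical function name weekly_mirror, and '-a' (append to the log) used in place of '-o' so that earlier weeks are not overwritten. It relies on Wget returning a non-zero exit status when a retrieval fails.

```shell
#!/bin/sh
# weekly_mirror URL LOGFILE
# Hypothetical wrapper: mirrors URL, appends wget's messages to LOGFILE,
# and reports success or failure based on wget's exit status.
weekly_mirror() {
    url=$1
    logfile=$2
    if wget --mirror "$url" -a "$logfile"; then
        echo "mirror of $url finished"
    else
        status=$?    # exit status of the failed wget call
        echo "mirror of $url failed (wget exit status $status)" >&2
        return "$status"
    fi
}

# Usage (not run here):
#   weekly_mirror ftp://x.y.z/pub /var/weeklog
```

A crontab entry could then call a script containing this function instead of invoking wget directly.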
wget --mirror -A.html http://www.w3.org/
The Wget sources and full documentation are available
at the following links:
http://www.gnu.org/software/wget/wget.html
http://www.lns.cornell.edu/public/COMP/info/wget/wget_toc.html
http://www.interlog.com/~tcharron/wgetwin.html