TL;DR: to mirror a site that requires authentication, you need two steps: log in and save the cookies, then mirror using those cookies.
In the process of testing my app, I needed a mirror of the website I'm using. Since the website has authentication (via a form), the mirroring process has 2 parts.
The first part is to log in and get the cookies:
wget \
--keep-session-cookies \
--save-cookies cookies.txt \
--post-data "login=[LOGIN]&password=[PASSWORD]&module=admission&controller=login&action=logindo&auth_act=1" \
https://europa.eu/epso/application/base/index.cfm
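Now the cookies are saved in the cookies.txt file. Note that you need to specify --keep-session-cookies. Without it, wget does not write session cookies (the ones with no expiry date) to the file, and since login cookies are usually session cookies, the file will be empty.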
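A failed login may still return a normal-looking page, so it's worth checking that cookies.txt actually contains something before starting the mirror. Here is a minimal sketch, assuming the Netscape-style cookie file that wget writes (comment lines start with #). The function name count_cookies is my own, and the sample file with its SESSIONID cookie is invented just for the demo.

```shell
#!/bin/sh
# Count the cookie entries in a Netscape-format cookie file.
# Lines starting with '#' are comments; blank lines are skipped.
count_cookies() {
    # grep -c prints the number of selected lines; with -v these are the
    # lines matching neither pattern. '|| true' hides grep's exit status
    # of 1 when the count is zero.
    grep -c -v -e '^#' -e '^[[:space:]]*$' "$1" || true
}

# Demo with a throwaway sample file (made-up cookie, not real output).
sample=$(mktemp)
printf '# HTTP cookie file.\n\neuropa.eu\tFALSE\t/\tFALSE\t0\tSESSIONID\tabc123\n' > "$sample"
count_cookies "$sample"
rm -f "$sample"
```

In practice you would run something like `[ "$(count_cookies cookies.txt)" -gt 0 ] || echo "login probably failed" >&2` between the two wget steps.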
The second part is to actually perform the mirror:
wget --load-cookies cookies.txt \
-r \
-l 2 \
-k \
-nc \
-R css,js,gif -R "*lang=*" -R "*srln=DE" -R "*srln=FR" \
-I /epso/application/account,/epso/application/cv_new \
-Deuropa.eu \
https://europa.eu/epso/application/cv_new/index.cfm
I'll explain each flag:
-r: recursive (d'oh!)
-l 2: don't go overboard with the recursion (follow links at most 2 levels deep)
-k: convert links so that the mirror works when browsed locally
-nc: don't re-download things that are already there
-R...: exclude links and files (e.g. don't download css files)
-I...: restrict downloading to these paths only
-D...: download only from this domain
The result is a mirror containing all the items relevant to me.
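The two steps can also be wrapped in one small script. This is only a sketch under a few assumptions: EPSO_LOGIN and EPSO_PASSWORD are environment variable names I made up, the first form field is assumed to be called login, and urlencode handles ASCII input only. The encoding step is there because --post-data expects percent-encoded values, so a password containing & or = would otherwise break the form body. The mirror step uses the long forms of the flags explained above.

```shell
#!/bin/sh
# Percent-encode a string for use in --post-data (ASCII input assumed).
# The body runs in a subshell, so its variables do not leak.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # everything after the first character
        c=${s%"$rest"}     # the first character itself
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# Build the form body; field names are copied from the login command.
build_post_data() {
    printf 'login=%s&password=%s&module=admission&controller=login&action=logindo&auth_act=1\n' \
        "$(urlencode "$1")" "$(urlencode "$2")"
}

mirror_site() {
    : "${EPSO_LOGIN:?set EPSO_LOGIN first}" "${EPSO_PASSWORD:?set EPSO_PASSWORD first}"

    # Step 1: log in and store the session cookies.
    wget --keep-session-cookies \
         --save-cookies cookies.txt \
         --post-data "$(build_post_data "$EPSO_LOGIN" "$EPSO_PASSWORD")" \
         https://europa.eu/epso/application/base/index.cfm || return 1

    # Step 2: mirror with those cookies.
    wget --load-cookies cookies.txt \
         --recursive --level=2 \
         --convert-links --no-clobber \
         --reject css,js,gif --reject "*lang=*" \
         --reject "*srln=DE" --reject "*srln=FR" \
         --include-directories=/epso/application/account,/epso/application/cv_new \
         --domains=europa.eu \
         https://europa.eu/epso/application/cv_new/index.cfm
}

# Demo of the encoding only; nothing here touches the network.
build_post_data 'jane' 'p&ss=w rd'
```

To actually run it, export the two variables and call mirror_site. In the demo line, the & in the password comes out as %26, so it can no longer be mistaken for a field separator.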
Note: Please read the wget manual for limitations on form access.
HTH,