Web Authors: Do this BEFORE moving pages off GALLUX

Hi,

Just another friendly tip from the cyber-cave:

Some of you may wish to do mass edits on your web pages. In particular, you'll want to change your old links to point to the new address that you'll have. It's easier to do this BEFORE you move your pages, rather than after.

There are two types of "links" (more properly known as "URL's"): absolute and relative. An absolute URL is one which includes the name of the computer, the username, and any folder or directory information, as well as the actual filename. A relative URL is one that (usually) includes only the filename. The advantage of using relative URL's in your HTML files is that the files can be moved from one computer or user to another, with relative (pun intended) ease.

Here's an example:

Absolute URL: <A HREF="http://www.gallaudet.edu/~kjcole/Tutorial/perl.html">
Relative URL: <A HREF="perl.html">

If you have used absolute URL's in your HTML documents, then it's a good idea to change them to relative URL's, BEFORE you move them off of GALLUX. GALLUX has tools that are unavailable to the average Microsoft user. One of these tools is called perl. Perl will allow you to change all the files in a single directory, with one command.

You can change some of the text now, before ITS assigns space for you on the new web server. Then after you have the new information, you can finish the job. It's tricky, and it must be done on a real operating system, (i.e. NOT Microsoft Windows), before you move the pages.

CAUTION! The following assumes you've created your web pages correctly in the first place. Among other things, it assumes that all URL's in <A HREF> tags are surrounded by quotes.
<A HREF="http://www.gallaudet.edu/~username/something.html"> is correct.
<A HREF=http://www.gallaudet.edu/~username/something.html> (without the quotes) is incorrect.
DO NOT try the following if you haven't followed the standard HTML format!
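
One way to check that assumption before running anything is to search for HREF or SRC attributes whose value does not start with a double quote. This is only a sketch: the scratch directory and the two sample file names are made up for the demonstration, and in real use you would run just the grep line from inside ~/web. Note that it will also flag values wrapped in single quotes.

```shell
# Build two tiny sample pages in a scratch directory (hypothetical file names).
demo=$(mktemp -d)
printf '%s\n' '<A HREF="http://www.gallaudet.edu/~username/good.html">ok</A>' > "$demo/quoted.html"
printf '%s\n' '<A HREF=http://www.gallaudet.edu/~username/bad.html>oops</A>' > "$demo/unquoted.html"

# List the files containing an HREF= or SRC= that is NOT followed by a double quote.
# -i ignores case, -l prints only file names, -E enables the (a|b) syntax.
unquoted=$(cd "$demo" && grep -ilE '(href|src)=[^"]' *.html)
echo "Files needing repair: $unquoted"
```

Any file that shows up in the list should be fixed by hand before you go on.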

Log in to your GALLUX account (via telnet, Reflection, QVT-term, TeraTerm, or something similar) and type:

bash
cd ~/web
perl -p -i.bak -e "s|href=\"http://www.gallaudet.edu/~username/|href=\"|gi;" *.html
perl -p -i -e "s|href=\"http://www.gallaudet.edu/~username|href=\"index.html|gi;" *.html
perl -p -i -e "s|src=\"http://www.gallaudet.edu/~username/|src=\"|gi;" *.html
perl -p -i -e "s|http://www.gallaudet.edu/~username|new URL|gi;" *.html
exit

where username is the GALLUX username that you use now, and new URL is whatever ITS assigns to you. Only the first perl command asks for .bak backups: if the later commands also used -i.bak, each one would overwrite the backups with copies that had already been edited, and your original pages would be lost.

Notes: These commands use the vertical bar character | and the backslash character \ above. (They're often located on the same key on your keyboard.) Also, note the use of the tilde ~, often located on a key above the <TAB> key. Each of the four command lines between cd ~/web and exit begins with the word perl and ends with *.html. If your screen or printout does not show this, it is because the text is wrapping onto the next line. When you type these commands, you should press <ENTER> only after the *.html.

Explanation

The first perl command deletes "http://www.gallaudet.edu/~username/" from each of your <A HREF=...> tags. The second searches for <A HREF=...> tags that reference your Gallaudet account but do not include a filename, and changes them to "index.html". The third line changes <IMG SRC=...> tags. Finally, the fourth line changes (we hope) any non-tag references to your pages into the new absolute URL. That would include places in your documents where you tell the person reading them how to reach your pages.

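Note: The commands above only affect files within your current subdirectory (aka folder). If you have several subdirectories (folders within folders), then you will need to repeat the above commands in each one, after changing your working directory to that folder with the cd command. Keep in mind that a relative URL is resolved starting from the folder of the page that contains it, so inside a subfolder you will probably want to add the folder name to the text being removed (for example ~username/eng101/ rather than ~username/). The general pattern looks like this, e.g.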

bash
cd ~/web
perl ...
perl ...
perl ...
...
cd ~/web/eng101
perl ...
perl ...
perl ...
...
cd ~/web/eng250
perl ...
perl ...
perl ...
...
exit

where eng101 and eng250 are examples of subdirectory/folder names that someone in the English Department might use for collections of web pages related to a particular class. The advantage of typing bash at the start and exit at the end is that bash allows the computer to remember the last several commands that you've typed. So, instead of retyping the perl commands again and again, and introducing new typos in your commands, you can use the arrow keys on your keyboard to repeat commands and edit them. (Using the up arrow once recalls the last command. Using it twice recalls the second to last command, etc. Using the right and left arrow keys allows you to alter the command, by inserting or deleting characters where the cursor is located.) Just don't forget to exit when you're finished.

After you've completed the above, you can move all the *.html files to the new computer and test out the results. If satisfied, you can wipe out the backup copies by typing:

rm *.bak

If you're NOT satisfied with the results, you can rename the files back to their original names and try again, with the command:

mv filename.html.bak filename.html

(Unfortunately, I don't remember the way to do a mass rename, so you'll have to repeat the above mv command for each file that you want to recover.)
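
A shell loop is one way to do the mass rename. Here is a minimal sketch, tried out on a scratch folder with invented file contents; in real use you would point the loop at the folder that holds your pages.

```shell
# Scratch folder with one edited page and its backup (made-up contents).
demo=$(mktemp -d)
echo 'edited'   > "$demo/page.html"
echo 'original' > "$demo/page.html.bak"

for backup in "$demo"/*.html.bak; do
  [ -e "$backup" ] || continue      # nothing matched: skip quietly
  mv "$backup" "${backup%.bak}"     # ${backup%.bak} is the same name minus .bak
done

cat "$demo/page.html"
```

After the loop, every page.html.bak has been moved back on top of page.html, so the edited versions are gone and the backups no longer exist under the .bak name.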

A more detailed explanation, for the curious

perl says to call the perl program. -p tells perl to read each file one line at a time, apply the program you are including to every line, and print the result. -i tells perl to work "in-place", writing that result back over the original file. Adding .bak to the -i instructs perl to create a backup file named filename.html.bak before messing up your originals. The courageous and/or foolish (and I count myself among their number) can leave off the .bak and save some disk space. -e means "execute the following string (enclosed in quotes) as a set of instructions to perl".

The instruction string "s|old text|new text|options;" substitutes new text for old text. The vertical bar (|) is known in the biz as a delimiter and tells the substitute command where the start and end of the old text and the new text are. (If the string you wish to substitute actually contains a vertical bar, you can use another punctuation character as a delimiter. Just remember to use it three times: at the beginning of the old text, between the old and new text, and at the end of the new text. For example, you could use the # character as a delimiter, if it doesn't appear anywhere in your old or new text. A letter won't do, because perl would read it as part of a word beginning with s. Like so:
"s#old text#new text#options;".)

The g option tells perl to perform the substitution globally, meaning on every match in a line rather than only the first one. The i option says to ignore case, meaning that it will replace <A HREF="http://www.gallaudet.edu..."> or <a href="http://www.gallaudet.edu..."> or <A hReF="http://www.gallaudet.edu..."> or <A HREF="HTTP://WWW.Gallaudet.edu..."> or ... you get the picture.

Several times you see \" above. The \ is an escape character that tells the shell (bash) to pass the next character along to perl as part of the command. If you simply used the quote character alone, without the \, the shell would assume that it indicated the end of the command. (Notice that the entire substitute command is surrounded by quotes. These quotes determine the start and end of the command.)
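
You can ask the shell to show you exactly what perl will receive. In this sketch the variable names are made up, and "old/" is a stand-in for the real address; the second spelling, with single quotes around the whole program, is an alternative that needs no backslashes.

```shell
# Double quotes outside, so each inner quote needs a backslash.
escaped="s|href=\"old/|href=\"|gi;"
# Single quotes outside, so the inner double quotes can stay as they are.
single='s|href="old/|href="|gi;'

printf '%s\n' "$escaped"
[ "$escaped" = "$single" ] && echo "Both spellings give perl the same program."
```

The backslashes never reach perl: the shell removes them and hands over plain quote characters.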

The *.html at the end of each line tells perl to perform the operation on all files ending with .html in the current directory.

Perl can do considerably more complex substitutions, as well as a lot of other neat stuff, but that's beyond the scope of this document...

