Technical and Sociological Aspects of Developing Campus-Wide Webs: UIUC College of Engineering

Kaitlin Duck Sherwood


ABSTRACT: The University of Illinois College of Engineering Web is a comprehensive information resource. In addition to serving as a general reference, the College Web has been used extensively by the freshman orientation class. In this presentation, we discuss the College Web: its anthropological and technical history as well as novel features.

Interestingly, this project was not a result of high-level academic vision, but was in many ways a bottom-up, volunteer, emergent system. This could have meant a withering of enthusiasm, but instead translated into a revolutionary zeal.

Thanks to the microcomputer revolution, the information that the College maintains and distributes lives on disk drives throughout the university. The major challenge turned out to be not creating information, but locating the data and extracting it from its owners. The project's success is thus attributable more to personal networks and contacts than to technological brute force.

Frequently the data was available in raw ASCII and was translated using scripts. We will show that it is possible and desirable to do what appears to be extremely complicated data extraction with relatively simple scripts.

The information that is available is extensive and comprehensive.

Introduction

The College of Engineering Web at the University of Illinois is quite comprehensive and useful. It has seven of eleven departments fully on-line (with two more in progress), students' course reviews, Engineering Council and student societies, awards, the Engineering Placement Office, the International Minor program, a campus map, building directories, research labs, research papers, the Engineering Student Handbook, the engineering library, and the college student newsletter on-line. In addition, a great deal of information that started its life in the College Web has migrated into the University Web, including the courses catalog, timetable, and resume book.

It was and is a very large and complex project, and its evolution can perhaps be instructive not only to those working on campus Webs, but also to those working on any large project. I will talk about both technical issues and sociological issues that I encountered constructing this large Web.

Technical Issues

I address technical issues not to show how daunting they are, but to show how manageable they are. Hyperlinks can be inserted automatically, very simple tools can aid HTML generation substantially, and maintenance and problems with platform dependencies can be minimized with a bit of foresight.

Document Conversion

I have wondered from time to time why the Web didn't burst onto the scene five, ten, fifteen, twenty, or even twenty-five years ago. Although one could argue that workstations would not have been powerful enough to comfortably run a Mosaic-like browser before around 1988, any of the text-based browsers could have been developed much earlier than that. We owe a debt to Tim Berners-Lee, Marc Andreessen, and their Valued Associates for seeing what was possible.

Having said that, I am glad that WWW was NOT developed fifteen years ago. All printed matter starts out its life on a disk drive now, but that was not true in 1979. People would have gotten pained looks on their faces if asked to put their information on the Web. They would have explained that they couldn't put their info on the Web because they didn't have the budget to hire someone to type it into the computer for them.

Couple that with the negative aesthetic value of ASCII viewed on a dumb terminal, and I suspect the WWW would have died stillborn for lack of content and interest. This could have killed all interest in it for possibly thirty years. "Yeah, it's a neat idea, but it has been tried. It didn't work."

Instead, we are in a situation where all the documents exist on a disk somewhere, just in the wrong format. The problem now is one of data translation, not one of data entry. While computers are lousy at reading documents, they are really good at repetitive translation!

For example, the UIUC course descriptions and timetable were already on disks. The timetable had been converted to a form that could be used by ph, and was publicly accessible:

% ph type=fall name=fr134
----------------------------------------
            name: fr134 accelerated intermediate french, ii.
            text: fall94
                : prerequisite: fr 133 or 106, or fr 103 with department approval,
                : or three semesters of college french, or a placement score
                : showing high school achievement equivalent to fr 103.
                : 4 hours.
                : 03810 lect-disc  c    10       mtu thf   325  greg hall
                : 03811 lect-disc  d    11       mtu thf   g30  for lang
                : 03812 lect-disc  e     1       mtu thf   g30  for lang

----------------------------------------
The courses catalog was straight text, grouped by department, and had entries like the following:

213. Aerodynamics, II. Equations of motion for a viscous, heat-conducting fluid; exact solutions of the Navier-Stokes' equations; boundary layer theory; inviscid approximations, vorticity, and circulation; potential flow; solutions of potential flow equations, sources, sinks, and Prandtl-Meyer flow; thin airfoil and slender body theory; and method of characteristics. Prerequisite: Aeronautical and Astronautical Engineering 212. 4 hours.

As long as you aren't impeded by the idea that "regularity" of databases implies one-word fields separated by tabs, you can see that you could HTMLify the course description pretty easily.

You can also make a table of department names and recognize that a department name followed by [1-4][0-9][0-9] is a course. Then you have all the information you need to make a very nicely formatted and linked course description entry. Furthermore, if the timetable has regular URLs, you know what the URL for that course's timetable entry should be, so you can create links to the timetable as well.
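As a rough illustration of the two steps just described (splitting a catalog entry on its punctuation, then treating a department name followed by a three-digit number as a course), here is a minimal sketch. It is written in Python rather than the perl I actually used, and the department table, the URL layout in `course_url`, and all function names are hypothetical, invented for this example.

```python
import html
import re

# Hypothetical table mapping catalog department names to short codes.
DEPARTMENTS = {
    "Aeronautical and Astronautical Engineering": "aae",
    "French": "fr",
}

# "<number>. <title>. <text>", as in the catalog entry shown above.
ENTRY = re.compile(
    r"^(?P<number>[1-4][0-9][0-9])\.\s+(?P<title>.+?)\.\s+(?P<body>.+)$", re.S
)

# A department name followed by [1-4][0-9][0-9] is a course reference.
COURSE_REF = re.compile(
    "(" + "|".join(re.escape(name) for name in DEPARTMENTS) + r") ([1-4][0-9][0-9])\b"
)


def course_url(dept_code, number):
    """Hypothetical regular URL layout; not the real UIUC scheme."""
    return f"/courses/{dept_code}/{number}.html"


def link_courses(text):
    """Escape text for HTML, then wrap each course reference in a link."""
    def replace(match):
        code = DEPARTMENTS[match.group(1)]
        return f'<a href="{course_url(code, match.group(2))}">{match.group(0)}</a>'
    return COURSE_REF.sub(replace, html.escape(text, quote=False))


def entry_to_html(dept_code, entry):
    """Turn one catalog entry into a small HTML fragment."""
    match = ENTRY.match(entry.strip())
    if match is None:
        raise ValueError("entry does not look like '<number>. <title>. <text>'")
    number = match.group("number")
    title = html.escape(match.group("title"), quote=False)
    body = link_courses(match.group("body"))
    return (
        f'<h3><a name="{dept_code}{number}">{number}. {title}</a></h3>\n'
        f"<p>{body}</p>"
    )


if __name__ == "__main__":
    sample = (
        "213. Aerodynamics, II. Thin airfoil and slender body theory. "
        "Prerequisite: Aeronautical and Astronautical Engineering 212. 4 hours."
    )
    print(entry_to_html("aae", sample))
```

The title is taken to end at the first period followed by whitespace, so a title containing an abbreviation would need special handling. That is the sort of irregularity a kludgy script accumulates exceptions for.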

Tools

It took me less than forty hours to toss together Big Dumb Kludgy-Hack perl scripts to do this translation for the Engineering College course descriptions and timetable entries. I ended up passing them over to my Valued Associate Mike Grady at the Computing and Communications Services Office (CCSO), who extended them to work for the whole University.

In addition to writing scripts to convert the timetable and courses catalog, I used some of the standard converters to translate the Engineering Student Handbook and the Engineering International Minors Program from MS Word to HTML. After painfully putting in all the links to classes by hand in the Student Handbook, I wrote a script (based on my timetable experience) to anchor classes automatically. I have discovered that the more information there is on the Web, the longer it takes to write / convert documents because there are more links to make!

If you have an ASCII file, but think you don't have the time to HTMLify it by hand or to write a script to beautify it, try the following script:

Then if you put <pre></pre> around anything formatted (like tables), and toss in a header and title, presto! You'll have nice HTML. It won't have links, but the text will be justified regardless of the user's screen size.
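What follows is not my original script, only a minimal sketch of the same idea, assuming the job is simply to escape the text and mark each blank-line-separated block as a paragraph. It is in Python, and `ascii_to_html` and its `title` parameter are names made up for this example.

```python
import html
import re


def ascii_to_html(text, title="Untitled"):
    """Wrap plain ASCII text in minimal HTML, one <p> per block of text."""
    # One or more blank lines (possibly holding stray spaces) end a paragraph.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = [
        "<p>\n" + html.escape(block, quote=False) + "\n</p>"
        for block in blocks
        if block.strip()
    ]
    safe_title = html.escape(title, quote=False)
    return (
        "<html>\n<head><title>" + safe_title + "</title></head>\n"
        "<body>\n<h1>" + safe_title + "</h1>\n"
        + "\n".join(paragraphs)
        + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(ascii_to_html("Hello, Web.\n\nSecond paragraph.\n", title="Demo"))
```

Tables and other column-aligned text still need <pre></pre> added by hand afterwards, since the sketch has no way of knowing which blocks were formatted deliberately.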

Platform Dependence

MacMosaic, WinMosaic, and XMosaic have different behavior characteristics. Macs and PCs are quite slow compared to UNIX machines. This means that graphics can be really annoying unless you turn image loading off, and we can't guarantee that our users will be sophisticated enough to do so. Also, since Macs and PCs don't display ALT tags, users will completely lose any information that is embedded in a graphic. Hence the documents in the COE web are deliberately not snazzy.

The Macs and PCs have different aspect ratios than UNIX machines, and don't fit as much information on one screenful. Thus the "index" pages are kept as terse as possible.

The Macs cannot handle extremely large files. For example, my Valued Associate Mike Grady discovered that the Music and Math departments' timetable and courses catalog entries caused Macs to hang. So when he modified my scripts to work for the whole campus, he had to change them from producing one file per department to producing one file per course.
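Going from one file per department to one file per course amounts to cutting the department text at each entry boundary. Here is a minimal sketch, assuming each entry begins on its own line with a three-digit course number and a period, as in the catalog excerpt earlier. The file-naming scheme and the function name are hypothetical, and this is Python rather than the scripts actually used.

```python
import re

# An entry starts at the beginning of a line with, e.g., "213. ".
ENTRY_START = re.compile(r"^([1-4][0-9][0-9])\.\s", re.M)


def split_by_course(dept_code, catalog_text):
    """Return {filename: entry_text}, one small file's worth per course."""
    starts = list(ENTRY_START.finditer(catalog_text))
    files = {}
    for i, match in enumerate(starts):
        # Each entry runs up to the start of the next one (or end of text).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(catalog_text)
        filename = f"{dept_code}{match.group(1)}.html"
        files[filename] = catalog_text[match.start():end].strip()
    return files
```

A wrapped line that happens to begin with a course number and a period would be mistaken for a new entry, so real input would need a sanity check on top of this.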

Standardization

It is extremely desirable to standardize the URLs. For example, because one person converted the course descriptions, they can be accessed in a very regular manner. On the other hand, I goofed by not conferring about faculty home pages with my colleagues in other departments early on. As a result, we have no regularity in home pages, which means that I can't write a script to automatically link faculty names to their home pages as easily as I wrote the "link to course description" script.
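To make the contrast concrete, here is a minimal sketch of why regular URLs matter to a linking script. Everything in it is hypothetical (the URL layout, the `FACULTY_PAGES` table and its placeholder entries), and it is in Python for illustration only. With a regular scheme the link is computed; without one, somebody has to maintain a lookup table by hand.

```python
def course_description_url(dept_code, number):
    """With a regular scheme, the URL follows from the course alone."""
    return f"/courses/{dept_code.lower()}/{int(number):03d}.html"


# Without a regular scheme, every page must be listed by hand.
# These entries are made-up placeholders, not real home pages.
FACULTY_PAGES = {
    "A. Example": "/dept-one/people/example.html",
    "B. Sample": "/~sample/home.html",
}


def faculty_url(name):
    """Return the recorded home page, or None if nobody has listed one."""
    return FACULTY_PAGES.get(name)
```

The first function never needs updating when a course is added. The table behind the second goes stale every time a faculty member creates or moves a page.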

Hurdles

Data entry and data conversion have not been particularly thorny issues. On the other hand, getting the data has been difficult.

Bottom-Up

Sometimes my would-be sources were hesitant because they hadn't heard anything from their bosses about it. The Boss is supposed to provide leadership and direction and vision, and "if this Mosaic thing is so great, why isn't my boss telling me to find you instead of you finding me?" When you get high enough in the organizational tree, The Boss is around the same age as my mother. Mom tells me that when she was in college, she had to hand her punchcard deck in at the computer center and return a day later for her printout. By contrast, my Valued Associate Brad Whitmore doesn't remember typewriters. Who do you think is going to bump into WWW first, someone who is doing their thesis on computer control of robots, or someone who makes hardcopy of their email to read it?

The best tactic that I've found when people are waiting for The Word is to say, "You've heard of the Information Superhighway? This is it." (Some might quibble that WWW is not exactly equivalent to NIIS, but it is close enough, and the words make people happy.)

In many cases, my personal contacts lubricated the process enormously. I grew up in Champaign, IL. My parents were on the faculty, and my high school was riddled with the sons and daughters of faculty. I got my undergraduate degree at UIUC before going off to the West Coast for ten years. I frequently know The Boss personally. The Boss, knowing me, can feel comfortable that I am not going to take his or her ASCII and sell it to a direct-mail company, the Irish Republican Army, or Purdue.

Legal Concerns

Some information sources get concerned about legal issues. For example, "This document is really a legal document between the University and the student; if we put it on-line and there is a mistake in it, couldn't we get sued?"

Telling these people that you'll plaster ***UNOFFICIAL!!*** all over it helps, but frequently they need to go away and think about it and/or talk to The Boss.

Forgery Concerns

Sometimes people are concerned about forgeries. "What's to prevent someone else from generating a copy of this document that has incorrect or even malicious information in it?"

I tell these people that no, there is nothing they can do to prevent fraudulent information from being distributed. However, it is harder to get anyone to pay attention to the fraudulent information if the official information comes out first and/or if an officially owned document links to the real information. (I also point out that these documents could be forged on paper as well.) This seems to calm them down.

What's It Going To Cost?

Most people are convinced that anything even remotely associated with the word "multimedia" must be ridiculously expensive. When I tell them that the software is free, that the Computing and Communications Services Office can provide the server and disk space, and that I am not going to take any money from them, they look hard for the catch.

When confronted by this suspicion, I explain how easy it is to write HTML, and give examples of how long it took to convert specific large documents. I also allow that at some point, their organization might want to run their own server and that it might cost some thousand dollars. Knowing that there might be a cost someday tends to get them to stop hunting for the "catch".

Don't Get It

"Now, explain to me why you want to put this information on-line?" I had trouble understanding how anyone could fail to recognize the coming of a new era. I see the Web as the biggest discontinuity in the ease and cost of information distribution and acquisition since Gutenberg. The founding of the Web is the most exciting thing that has happened in my lifespan! And yet a fair number of people are just not interested.

What I was failing to pay attention to is the fact that I have spent the past ten years with a fast computer on my desk, logged in essentially continuously, and with a high-bandwidth connection to the Internet. When I need a dictionary, I am usually at a computer, and it is faster for me to look up the spelling on Mosaic than it is to hunt down a dictionary and look it up.

But if you are a freshman at your fraternity house, your patterns are different. There might be a dictionary on your shelf, while to access Mosaic you might need to walk to campus, climb two flights of stairs, and log in. Even if you have a computer on your desk, it might be a Mac SE that can't handle having Word and MacMosaic both open. If you examine the entire process from the desire to know the spelling to the time the spelling is known, in some cases it is LOTS faster to look it up on paper.

The incremental additional search time due to finding and opening a browser isn't going to change until everybody is on-line for a significant period of time for a significant fraction of the days. At that point, we'll be able to use the Web for all kinds of interesting groupware. I imagine a day when my calendar manager will not only tell me about seminars that I am interested in, but will also have a link to the campus map with the building and seminar room highlighted. However, like email, there is a critical mass that needs to form before this is possible.

To achieve this critical mass, we have to offer these uninterested parties information for which the total search time is shorter on the Web. This may mean providing information that is better organized on-line than on paper (e.g. the hyperlinked courses catalog and timetable), that is otherwise difficult or slow to obtain (e.g. corporate SEC filings or the NSF Proposal Guide), or that is otherwise not possible to obtain (e.g. interactive maps of campus).

When people are just flat-out uninterested, I try to guess or extract from them what information they need to do their job, and work from there. I tell faculty about the NSF Proposal Guides and search for their research area in Yahoo. I show seniors all the job vacancy information and urge them to post their resume in the resume book. I show freshmen the home page of their favorite rock band.

Not Invented Here

There have been a few instances of "Not Invented Here", where people decline to give us data on the grounds that we can't do as good a job with it as they would.

I'm willing to take their word for it, and let them know that they can come to me for help if they need it.

Maintenance

A few clever people are concerned about maintenance. "It is all well and good for you to put this information out there, but what about when you graduate?" This is actually a sticky issue here, but not because there won't be someone around who is capable of maintaining it:

The problem isn't in the maintenance, it is in the accounts. CCSO has a very strong disapproval of multiple people using one account, for security reasons. They don't want to get into a situation where, for example, someone is sending threatening email to the president and they can't tell which user of the account it is.

But for many of these institutional information sources (e.g. Admissions or the Department of General Engineering), it doesn't make sense for only one person to have write permission on the files. The content should be attached to the organization and not the individual. I don't have an answer for this one, short of having each information source get their own hardware and set it up far from CCSO's prying eyes. I'm doing what I can to change the policy, but I am a mere aphid on the lowest leaf of the CCSO organizational hierarchy.

Committees

Last semester, I was a maverick, riding at the edges of respectability, lassoing wild data out on the open range, and breaking it single-handedly. It was great. I was doing interesting and important things without anybody getting in my way. Heck, the Powers That Be didn't even know that there was a range to be riding!

Now I'm more like a farmer, dealing with tractor dealerships and grain silos. In both cases, food gets to the public, but in the first case there is a lot less negotiation that has to go on. Now that I am doing this as a job instead of a hobby, I find that I have to get input from people, worry about maintenance issues, and do all kinds of other time-consuming things. I find myself asking permission instead of forgiveness, and that means that stuff doesn't get done as fast.

So if you are in a situation where you need information put on the Web quickly, you might want to avoid giving your cowboys an official status.

Passing the Baton

Not only am I planning on graduating in May, I'm hoping to go to France from September 95 to May 96. Not only will I be abandoning ship, but I might not even have email to help out from afar. It is very important to me that the UIUC Web not fall apart when I leave.

Training

Training is a big concern of mine. Fortunately, my Valued Associate Brad Whitmore and I are friends with Prof. Pete DeLisle, who is in charge of the Freshman Orientation class. The three of us decided that if the freshmen needed to be oriented to anything, it was to Mosaic. They got an assignment to find five easy things on the Web. Furthermore, they were enticed by a cash prize to find thirteen hard items! If we did it right, there are now twelve hundred bright-eyed and bushy-tailed freshmen (and their TAs, who are active, influential undergraduates) who are hooked on the Web. In addition, my Valued Associate Brad Whitmore gave a seminar on WWW this summer to faculty and graduate students.

We've also given a large number of informal training sessions, basically to anyone who will listen. We've found that the smaller the group, the higher the chance that the people will "get it". When showing the Web to a small group, we can poll the people who don't seem to "get it" for something in their interest area. Then we show them not only the information, but how we went about finding the information. Note that it is important for the trainers to do a fair amount of surfing, so that they can quickly find web nodes for things that the audience is interested in. While I think that we should all mail a hundred dollars to Jerry Yang and David Filo and pray that they never get lives, Yahoo's Hotlist is a generalist's list. It is frequently useful to know other routes to information.

Given that the small demos take a lot of time, and that there are excellent and ubiquitous How To Use The Web documents out there, I have felt that it is more important to focus on developing content right now than to give lots of how-to seminars. This will change as people make the transition from being Web surfers to being Web spinners.

It should be noted that we have been aided immeasurably by our contact network. While it happens that I know The Bosses, my Valued Associate Brad Whitmore knows the power structure of the students, by virtue of his involvement with Engineering Council. This has meant that Brad has been able to influence enthusiastic, energetic society leaders to develop their own home pages.

Herding

I have already had a few people this semester sending me email saying, "Wow, cool! Can I help?" The last time I got one, I was feeling burdened by all the negotiation overhead I was having to do. "Great", I said to myself. "I can just see myself saddled with having to supervise a gazillion freshmen who have nothing but enthusiasm." *Groan*.

But then I realized that I could use a different paradigm. Instead of visualizing myself trying to harness and drive a team of wild cattle, I had a vision of me instead giving more subtle direction. I'd tell them what needed doing, give them help as needed, but pretty much let them go run on their own. Instead of being a wagon driver, I'd be a sheepdog. So I wrote up detailed suggestions for small, self-contained projects: "Put the football schedule on line. Review a restaurant. Make a home page for the Young Republicans or Gay Illini. Or put your class notes on-line." And now I'm just sitting back and watching them play.

Conclusion

While being technically proficient (especially being able to write scripts to translate files) is useful, the biggest challenges in creating useful pages in a large bureaucracy are nontechnical. How are you going to get your hands on the raw documents? How will you train the mass of potential users? How will you train your information providers? What type of information should you provide first? I hope that I have been able to give you a sense of the kinds of problems that you might face, and how to solve them.

BIOGRAPHY

Kaitlin Duck Sherwood received a BS from the University of Illinois at Urbana-Champaign in 1984. After three years in semiconductor manufacturing and three years programming, she spent four years consulting on timing analysis. Bored with that, she returned to UIUC hoping to fan an interest in genetic algorithms into a passion. Instead, she found her passion in the World-Wide Web.

In addition to her work on the UIUC Web, she has developed Web pages with information on tourism in France, tips for travelers, a travelogue of New Zealand, a discussion of the educational merits of shaving her head, advice for women in engineering, and a used car lot. She also was a significant contributor to the information system that Enterprise Integration Technologies developed for the National Center for Manufacturing Sciences. She can be reached at ducky@netcom.com.