  2002 annual meeting
 
 

The Evolution of the Internet into Cyberspace

Robert J. Aiken, Cisco Systems

Any time I speak I have to give my stock disclaimer so that nobody sues Cisco over what I propose or say in my talk: all views and opinions expressed in this talk are mine alone and should not be construed in any way as Cisco opinion or policy. These are my views, carved out over 20-plus years of experience with a whole set of characters. As we sail those oceans and chart those maps that Eric Ketelaar spoke of, I am not only going to tell you about some of the dragons we found in the oceans before but the sea serpents we are going to see in the future as well.

To begin with, I think it is important to give you a bit of background so that you know what my bias is. I'll give out a few assertions, a few thoughts, a broader context of the background as I go through this discussion. I will start getting into some of the evolution of the Internet, because I have been with it for a long time—not as long as some of the grandfathers; I am like the second generation. But I do have a long perspective that I want to share.

Then I will talk a bit about future technologies. We will touch briefly on the Grid, but Ian Foster is going to go into greater detail on that. I am basically going to jump ship into a completely different area and talk about a selection of technologies that I have been tracking. I serve on the Transportation Research Board at the National Academy of Sciences in the US for what we call telepresence—which is where I think the future is going.

Then I am going to tell you about all these horrible things that can actually derail this whole train and end up in a wreck, and also make things very interesting for you as archivists. My talk is meant to make you think, to make you question, and to make you worry. If you don't worry, you don't tend to get action out of people.

Background

I started with supercomputers, so I have always been dealing with the high end of computing and with networking people who tried to figure out how to use monster computers that used to take up the size of this room to do what in our terms today would be simple calculations. That computer is represented in my laptop nowadays. In 20 years we have gone from the huge room to being able to process the same information better on my desktop, with about 100 times more storage. That is going to change the way we think about how we acquire and generate data and how we analyze it and create information and knowledge from it.

Back in the early days of the Internet I also helped design and run the Department of Energy's Energy Sciences Network (ESnet), which along with the NSFNET and NASA's NSI network formed the core of the Internet in the US. It was an international network which connected with Japan, Germany, the UK, Amsterdam, Singapore, all over. I worked with the Department of Energy, where I supported the High-Performance Computing and Communications initiative. I then went to the National Science Foundation in 1991 and helped coauthor the network design for the vBNS and NAPs, which basically commercialized the Internet back in 1992. We came up with the architecture and design. It took four to five years to fully happen, but the actual constructs and prototypes were put into place in that time period.

In 1998 I went to Cisco and basically formalized their university research program for networking research. My group also works with leading National Research Networks. I worked with a good friend of mine here in The Netherlands to partner with one of the top national research networks in Europe (SURFNET).

When my daughters were much younger, I didn't want to travel a lot. The only way I could communicate without actually going on the road was to work from my home as much as possible and use technology to enable my communications, so I have been a full-blown telecommuter for the past 12 years. Today I can basically live with one carryon bag and one computer bag for two weeks on the road. Things are progressing very nicely in this area.

This trend means that how we evolve, how we interact with folks, and how we take in information are going to change drastically. I have been doing e-mail for 20-plus years. What would it be like if I kept all that e-mail? A lot of the messages were from proprietary systems back then, so I had no way of keeping them. As a matter of fact, Cisco is howling at me for keeping e-mail that goes back four years now. I have to have it some place, either on CDs or on my laptop, because I actually go back and use it once in a while. The scary part is that I found out that I still have some e-mail that I brought from the National Science Foundation in 1992.

E-mail could become one of our major archives in the future, and when messages are being deleted off the disks we are losing all this history. (As I go through my basement every once in a while, I try to get rid of all the old foils and transparencies that we used for slides. The history of the Internet generation was done on mylar. It was not presented using a computer like today.)

In discussing the technology evolution and looking at some of the advances, this may be relevant specifically to you as a set of folks who really think about how you categorize, collect, and store information and knowledge. It is very important. I am very concerned about the way things are going—from my own personal standpoint I don't think things are going well with respect to our capturing this potentially important information.

While developing the NSFNET back in the early 1990s into the commercial Internet that we know today, we talked to a lot of people about content—communication companies, etc. What we discussed back then was, Are we going to end up with the next-generation TV where we have a bunch of garbage on the media? Now, instead of a TV with 500 channels on cable, we are going to have 5 million channels on your DSL line—and you are still not going to have anything worth looking at. Today, some alarming things are happening with Web sites and the way people are putting stuff up on the network that are leading us to this.

Chaos theory, I think, is extremely important to the evolution of a lot of things. I bring this up because the ways we are going to look at what we want to categorize or track for information, I think, are going to change over the future as well. I think this is actually going to be good. It is going to make us rethink the basis of what we consider to be information. The philosopher Hegel's general notion was that changes are essential for progress. We can all argue whether progress is good or bad in this case, but it will happen.

The speed of technology evolution is very rapidly changing the way we work. I know that from a personal standpoint. I was exchanging messages with my daughter (now a college student) at midnight last night in the same house, and we think nothing of doing this all the time. We almost fold the time-space continuum in that we no longer think about my being in Japan or here or there. We just communicate. When we are online, we no longer have to think about having to be together physically to communicate.

Moore's law is basically the doubling of capacity and power every 18 months at a minimum, perhaps 24 months at a maximum. It applies to the computing and storage cycle. Every year and a half, the processing power and storage I have on this laptop doubles. That means that I no longer think anything of walking around with a laptop with 50 gigabytes of data on it.
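
To make the doubling arithmetic concrete, here is a minimal Python sketch. The function name `projected_capacity` and the six-year horizon are illustrative assumptions; the 50-gigabyte starting point and the 18- and 24-month doubling periods come from the talk. It is a back-of-the-envelope projection, not a prediction.

```python
def projected_capacity(initial, years, doubling_months=18.0):
    """Return capacity after `years` if it doubles every `doubling_months` months."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    doublings = (years * 12.0) / doubling_months
    return initial * 2.0 ** doublings


if __name__ == "__main__":
    start_gb = 50.0   # laptop storage mentioned in the talk
    horizon = 6       # years ahead; an arbitrary choice for illustration
    for months in (18, 24):
        future = projected_capacity(start_gb, horizon, months)
        print(f"doubling every {months} months: {future:.0f} GB after {horizon} years")
```

Under these assumptions the two ends of the range differ by a factor of two after six years: four doublings (16 times the starting capacity) at 18 months versus three doublings (8 times) at 24 months.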

Increasing speed allows me to process more. I can do speech synthesis and analysis. I can do all kinds of concurrent multimedia communications. It is going to change the way I actually do things. If you want to record things in the future about what people do, how are you going to capture all this electronic stuff, which is very ethereal to begin with and has multiple modes going at the same time?

The other law we have is Metcalfe's. Metcalfe is the guy who actually designed the Ethernet technology which everybody uses today. His premise was that the value of the network increases with the number of things on the network. If you were the first person to have a fax, it wasn't real exciting. Once everybody had fax machines, faxing became ubiquitous. Now everybody has e-mail.
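
The fax example can be put in numbers. Metcalfe's law is commonly stated as the value of a network growing roughly with the square of the number of connected devices; one simple proxy, assumed here, is the count of distinct pairs that could talk to each other. The function name `pairwise_links` is hypothetical, and the sketch says nothing about what any single link is actually worth.

```python
def pairwise_links(devices):
    """Number of distinct device pairs that could communicate: n * (n - 1) / 2."""
    if devices < 0:
        raise ValueError("devices must be non-negative")
    return devices * (devices - 1) // 2


if __name__ == "__main__":
    for n in (1, 2, 10, 100, 1000):
        print(f"{n:>5} devices -> {pairwise_links(n):>7} possible links")
```

A single fax machine has zero possible links, which matches the point that the first one was not very exciting.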

Evolution of Networks and Infrastructure

First we had the evolution of research networks. We had the ARPAnet in the 1960s. Then we had the research networks, which were the second generation in the Internet world: NSFNET, vBNS/NAPs, DOE, SURFNET, etc. Then we had the IPOs, or "initial public offerings," which are funded by venture capital and make stocks available for us to buy.

In reality the evolution of research networks depended on the evolution from the mainframes. You had this system of networks and computing both evolving at the same time, and there was a certain synthesis between the two, which is fairly important. The original NSFNET was primarily focused on providing broad access to open supercomputing facilities. Then we had the NSFNET II, the networking architecture that I coauthored with Hans-Werner Braun and Peter Ford in the early 1990s. During that time microcomputers were popping up everywhere. In fact, we had some of the first laptops.

Then we got into gigabit testbeds. This is important because we started talking about using gigabits of information in the early 1990s. Everybody looked at what we were doing and said, "Who would ever use gigabits? That's absurd. Fifty-six kilobits is more than you would ever need for the rest of your life." Now we are actually funding terabit networks, we are talking about petabit networks, and we are going on to exabit networks.

The speed with which we are going to be pushing information, transmitting it, and storing it is going to go up exponentially and continue in that fashion.
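
For a rough sense of what these speeds mean, the sketch below computes how long an idealized link would take to move a given amount of data. The function name `transfer_seconds` is hypothetical, the 50 gigabytes is borrowed from the laptop example earlier in the talk, and the calculation ignores protocol overhead, latency, and congestion, so real transfers would be slower.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealized transfer time: total bits divided by the raw link rate."""
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    return (size_bytes * 8) / link_bits_per_second


if __name__ == "__main__":
    payload = 50e9  # 50 gigabytes, using decimal (SI) units
    links = {
        "56 kbit/s": 56e3,
        "1 Gbit/s": 1e9,
        "1 Tbit/s": 1e12,
    }
    for name, rate in links.items():
        seconds = transfer_seconds(payload, rate)
        print(f"{name:>10}: {seconds:,.1f} s ({seconds / 86400:,.2f} days)")
```

At 56 kilobits per second the payload takes on the order of months; at a gigabit per second it takes minutes, and at a terabit per second less than a second.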

The Web took off in 1994. The interesting thing about the Web is that everybody thinks that 1994 was the real launch, but well before that, in 1990, you had some people at a physics facility who wanted to get some work done and they sort of changed the use of hyperlinks to hypertext so that they could actually link documents. That was very important, but it needed a nicer user interface. Prior to this, as part of the vBNS/NAPs we commercialized the interconnection points of all the networks on a global basis: the points of presence (POPs) where you have commercial and R&E networks connecting. Without this and without that hypertext, the Web never would have taken off. Everybody thinks we just miraculously woke up one morning and you had all these people with PCs and on the Web.

The Web took off and it actually did a really good thing by focusing on what is considered to be a standard graphical user interface. Just like any sword, this has two edges. A lot of times, the major problem is that everything gets funneled through the Web now; but at least it gave us something.

Then there is the I-WAY. Ian was the main motivator to pull this together back in the 1990s. We tried to figure out how to pull some unique applications together on very high-speed networks just to demonstrate them. That was the great thing. The bad thing was that it only lasted for about two weeks, and then we tore all the infrastructure down. We said, "We can do it," and then we moved on.

Internet2: There is a lot of stuff in the Internet2 in the US and a lot of networks worldwide. Basically, they are rebuilding the NSFNET of the late 1980s. They like to build production networks. It keeps them busy and satisfied, which is good. The most important part is that it acts as a catalyst for the higher education community to pull together and share best practices.

In the meantime the grids and a lot of the stuff that Ian is doing are the next major evolution. A few of them are going on from where we started with the Internet, but the grids are the next level up. It is not about rebuilding the Internet2 or the network. We need to move up the "food chain" of protocols for the next evolution of the Internet.

Back to my remark about chaos. As things change, there will be stuff we are not going to be able to plan for. There are all these types of technologies coming in and we are not sure how they are going to affect us. E-Presence is "electronic presence," being on all the time, and we are not far from it. We have PDAs, laptops. At home I have three or four devices on all the time where people can reach me. Ubiquitous computing—you have all the stuff in your cars, in your home.

Then we have the nanotechnologies. Nanotechnologies use all these little processors. In fact, some are working on nanodust; we have these things floating in the air right now, and they can actually transmit information. They could be aggregated in this room and then shipped off some place else. The amount of information that is going to be able to be gleaned very quickly and in a short time frame is going to be impressive.

These are all extremely big chaos agents, and I have no idea how any of this is really going to come out. (One of the nice things about leading off with a thought like this is that I can ask more questions than I get to answer; again, I am trying to scare you because I don't know what the answers are.)

The Grid

The movie The Matrix—or, even better, the book Neuromancer by William Gibson—gives you an idea about the multilayered, multidimensional type of matrix of information and technology that people are going to interface with. That is the way I see us going.

We have this network grid, and we will call that the Internet: the communication lines, the routers, and all that other good stuff. That is one plane.

Then you have the middleware, which DARPA (and soon thereafter DOE) started in the late 1990s. NSF started a middleware program in 2001. Middleware is the stuff that sits on top of the network, and then you have the applications, and then you have these research grids, which are sort of like the applications but set off so that they can crash and burn. Then we have security grids, which are sort of different ways of being able to exchange information, to authenticate and control access. All this stuff put together, to me, is going to be the Grid. It is going to be all these planes laid on top of each other, with all of the applications and middleware drilling down to get different affinity groups using different parts and combinations. How to track and document this stuff will be very challenging, because you are going to get different chunks of different kinds of software and hardware.

I would contend today that the majority of work on the Grid is still focused on distributed computing, the Holy Grail for those of us who did high-end computing in the 1980s.

The I-WAY I already mentioned. Globus software, which Ian is responsible for, made a big impact. In about 1997 Ian and company pulled together, wrote a book, and did the first Grid workshop. We had global Grid conferences starting in about 2000.

Legion was one of the other R&D-type grids and is now commercialized. IBM is also doing Grids now; it is not just a research tool any more. A long time ago, it was just the researchers using the Internet; then we commercialized it. We are starting to move into the same phase with the Grid. In fact, we will probably get to it faster than we did with the Internet because people are tracking this now.

My point is that the next evolution is going to be this peering across the planes and down through the planes, which is going to make things very complicated. When this happens, what we think of as the Grid today is going to change drastically one more time as it becomes more n-dimensional. Of course, if we want this to become a success, then the easy thing is to pool all our organizations together. All of you in RLG come from different organizations; you all have different security and different types of technology. When you pool yourselves together, you'll be a virtual organization.

Grid Applications

Some of the applications of the Grid are data transfer, teragrids, metacomputing. The neat part is that by using some of this stuff you can get remote control experiments: people can use a one-of-a-kind, very expensive microscope or telescope as if they are there in the room with it. This is already happening. As these grids evolve, they are going to make it much easier, and you won't have to be part of the blessed society that gets to walk into the special room any more.

One application is 3D virtual reality, or "CAVES." If you have ever watched Star Trek, they are basically 10-foot-cubed rooms with projections on the ceilings, the floor, the walls around you, and it is like being in there. There are 3D walls which could become part of the Grid. If you put a bunch of them together, it is almost like all being in the same room at the same time. It changes the way we do things. You can actually do the modeling now to do this.

Another Grid application, real-time instrumentation, is going on in California, where there are a lot of earthquakes. They have thousands and thousands of sensor-based processors distributed through the state now that are picking up the tremors and feeding back and tracking information about tremors and the shift in the plates, etc. There are a lot of different ways of using the Grid.

I think one of the things we are going to see in the future is the knowledge grid. Everything I have pointed out so far is more like application-specific grids, but I think we are going to see knowledge and data and information grids that will somehow have to be matched together. Think of it as the big-library-in-the-sky kind of thing. You have certain things that people can access or not—and it will be who makes that determination, especially as we go across national boundaries, that will really be challenging.

The mobile wireless is going to be yet another extension to the Grid. You already have people who walk around with laptops, the little PDAs, the handhelds, the BlackBerries™, etc. We also have this explosion of wireless capability for our home and in offices. Wireless is going to become the normal mode pretty soon. I am going to be on a lot more than I am off. I may control when you know that I am on, but I am going to be on.

The collaborative environment is where it really gets neat. In fact, I would contend that the collaborative environments are going to be the real challenge, because that is going to be the hardest thing to document, especially if you want to record something. For example, when I am in my office at my home, I have my laptop jacked into the network, and I usually have a video running; I am doing my e-mail; at the same time I am doing instant messaging, and I have a phone going at the same time. This is all in the same conference. I use instant messaging to a certain set of folks and do out-of-band messaging with them that the rest of the parties are not privy to. I may tell one of my people to ask this question or they may ask me for verification of information. I can actually be communicating this way while I am talking to you here, and you have no idea that I am doing this.

If you really want to capture information, if four or five different channels of communication are going on, are you sure you are getting them all? Or are you just getting one view—and is that the accurate view? All this stuff coordinated together, the total picture pooled together, is going to give you the accurate context of what is going on and what was implied. If you don't get it all, then it is going to be hard to actually record for reality what has happened. Think of the implications for security or monitoring.
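
One way to picture "the total picture pooled together" is a single timeline built from per-channel logs, plus a check for channels that were never captured at all. The sketch below is only an illustration under stated assumptions: the `Event` record, the channel names, the function names, and the sample entries are all hypothetical, and each channel's log is assumed to be already sorted by time.

```python
import heapq
from collections import namedtuple

# timestamp is seconds since the session started (an assumed convention)
Event = namedtuple("Event", ["timestamp", "channel", "text"])


def merge_channels(*channels):
    """Merge per-channel logs (each already time-sorted) into one timeline."""
    return list(heapq.merge(*channels, key=lambda event: event.timestamp))


def missing_channels(timeline, expected):
    """Return the expected channel names that never appear in the timeline."""
    seen = {event.channel for event in timeline}
    return set(expected) - seen


if __name__ == "__main__":
    # Made-up sample entries, purely for illustration.
    audio = [Event(0.0, "audio", "opening remarks"), Event(30.0, "audio", "question")]
    chat = [Event(12.0, "instant-message", "ask about the budget")]
    timeline = merge_channels(audio, chat)
    for event in timeline:
        print(f"{event.timestamp:6.1f}s [{event.channel}] {event.text}")
    print("not captured:", missing_channels(timeline, {"audio", "instant-message", "video"}))
```

The merge only orders what was recorded; the second function is the more important one for an archivist, because it flags a channel that left no trace in the record.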

IP may be seen as middleware to a networking person:

Then you have middleware from a Grid perspective—as Ian Foster sees it:

You will notice that he is more applications oriented, so he thinks of this network as being there with all those capabilities, not layered like I did. All the middleware information supports what we call the applications at the top. They are sort of aggregations of the capabilities I showed floating on top of the network. Each one of the applications in Ian's diagram may see different stacks or columns going through all those network structures.

If I use different parts here, some things I may not be able to capture and some things I might. Some things I might be able to track; some I might not.

Jon Postel basically controlled all the protocol standards in the IETF (Internet Engineering Task Force) process until he passed away about three years ago. While the OSI (Open Systems Interconnection) model was layers one through seven, Jon said that layers eight and nine were the most important (that is, money and politics) and this is correct. A lot of the things that scare me in the future are not necessarily the technology but who has the money and who gets to make the decisions of the future about what is going to be used or not used and how you can use this technology.

Technology Advances

Increased local storage capacity. I am looking now at putting an 80-gig drive on my home system for the family so that I can start converting all my digital pictures plus home videos and everything else and just store it in my own archives instead of having it sit around on tape. In three years, when HP or Dell comes out with a PC where all you have to do is hit a button, you are going to start as well. Having to figure out how to do this is what slows people down.

It's getting easier to move things. If it is easy for me to move things, then it is easy for me to process and generate information all the time; whereas, before, I had to be at one place to do it. Once I had to go to the supercomputing area and load card decks; the knowledge was in the card decks that you would read in or on tape, and then you had printouts. Now all kinds of information are being generated onto disks. You have flash memories that you can stick into the PCMCIA slots on the sides of your laptops; you have CD burners; you have DVD burners. You have all this stuff on laptops today and on PCs.

Images. You can buy a cheap video camera for $100 now and attach it to your machine and actually transmit video back and forth. It is easy. How much information are we going to lose or record when we start sending all these pictures back and forth?

As the processing keeps going up and our ability to generate images and capture images goes up, the amount of data that will be going across the network will increase. As we keep racing forward with how much we can generate, what do we decide to keep? We can't keep it all. Even if you wanted to keep it all, as all you people in cataloging know, how do you identify it?

I have had a digital camera about one and a half years and I already have 1,000-and-some pictures. I just stick them in this big directory because I don't want to go and label all those things. A lot of information is going to be lost in this way.

Applications are basically popping up everywhere. It is easier to move from one machine to another. The hardware and software are a lot easier to install.

Networking. Basically anybody can do a network today. You can go down to a store and buy a nice router for $200. If you have a DSL line or ISDN line, stick it in and you are up and going on the network.

Advances in software infrastructure. When I dial in from any place on the road, I use a special software capability called a VPN, "virtual private network," that has security. It encrypts the information from my machine when I dial into the system. I can restrict who gets to look at my information, even if I am going across the open Internet. I don't have to have my own dedicated line any more. The easier it is to exchange information and feel comfortable, the more everybody is going to do it.

Instant messaging. We have used this in conferences to control remote presentations. I have called Singapore from where I live in the States and used instant messaging with someone I know there to let him know when I needed to move on with my slides, or to find out if there was a question or if people were looking tired because they hadn't had enough coffee, or anything else like that. He could also give me a little smile when I made a joke, so I could pause. If you keep only my slides from that presentation, without the other stuff, you lose a lot—plus the audio that we were doing at the same time.

All the data is converging onto IP and is going to be funneled together. This is going to make it very challenging. Whether you want to be a government trying to eavesdrop on somebody or whether you are really looking for information, it will be hard to pick out the pieces that are relevant to the different parts, because it all gets funneled into one big highway and split out later depending upon the destination at the end.

Unified messaging. My phone has a California phone number but it is on my desk in the eastern US. The number shouldn't matter. You call the number and you get the phone. When I travel, I forward calls to my cell phone and then on to different phones, so I control where the call ends up. Now, if you use the geographic association of an area code or a country code to draw a conclusion or make a decision about some information, you may be making a mistake.

Future Technologies to Watch

Mobiles. In Japan everybody just punches information back and forth on little phones. They spend a lot of time on the trains commuting into Tokyo, for example, and they use a lot of SMS, instant messaging, and I-mode.

Mobile location services are being marketed to teenagers in Europe, Asia, and in some of the larger cities in the US. Basically, it's a way to find their friends just by using GPS. I haven't figured out why they don't just call and say, "Where are you?" But apparently they all like it.

Real Persistent Presence. This is the big one to watch. We will have these things called agents. If you want to communicate with me, my agent may give you information about how to locate me or leave a message or, if it knows you, it could give you a file off my laptop. These agents will be on all the time.

When I'm on the network in the future, it will not necessarily mean that I am sitting there typing. Multiple agents may represent me, doing different things. I could be searching the Internet. I could be giving you an interface to my "presence agents." I could also be sending information to my administrative assistant. All kinds of things could be going on at the same time. You are not going to know when I am really on. I will always be on in some manner, or some inference of me will be.

Images. The digital camera is just incredible. It is nice and small and you can make an instant decision about what to keep. It lets me generate all kinds of data now that would never have dawned on me even a year ago, and now I say, "If I can do still, I can do moving, too. Why not?"

Speech synthesis and recognition, and unified messaging. From my one phone number, I will get voice mail converted to a WAV file and e-mailed to me. I will listen to voice mail attached to an e-mail message.

Artificial intelligence. If you have been in computer science long enough, you know that about every eight years we say artificial intelligence is going to take off again. It is good to have it as a recurring theme. Actually, expert systems have done much better as time has gone on, and you will see a lot more of them. Artificial intelligence will help us analyze our data and even generate it.

Ubiquitous computing. Along with GPS, we are now putting very powerful processors into trucks and automobiles to log you into the Internet as you drive. You will be able to download and upload through "smart stations" while you are driving.

Another aspect of ubiquitous computing is called "body area network": communicating through a little router on your belt. You may see more use of this stuff for medical purposes and other things.

In my mind's eye I still see this geek from MIT who used to have these computers on his body all over the place. I kept thinking how great that was for scaring everybody away from "personal area networks." Three weeks ago, this advertisement comes out in The Washington Post. We all might be wearing these things real soon.

An interesting part of ubiquitous computing is what they call the "peering/interaction": the interaction of the personal area network (which may be not so much my body as what is in my laptop and maybe in my bag) and the desktop area network.

Nanotechnology. If you think surveillance with a camera is scary, it pales in comparison with what we have at our front door with this. They have figured out how to do it so that you don't even know it is there, and you probably won't even see it unless you are really paying attention.

Virtual boardrooms. There is some really neat virtual reality now that lets people at four or five different geographical locations appear to be sitting around one table. In business, when you first meet people you want to see their body language and facial expressions. You can get it fairly well with this stuff and it doesn't take a big network anymore. But are you going to record this kind of stuff for posterity's sake? I don't know.

Future Trends to Watch

With simulation and modeling we are getting into an area that is scary. If you do simulation modeling of a very complex situation, the problem is how you verify. We can get close statistically, but the question is how much you trust it.

With all the technology I have been pointing out here, all you have to do is think of the data it can generate, and it is generating it. For every one person generating this data now, in one year you will have 20 more. Just keep doing the numbers and figure out when we are going to get overload.
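
"Doing the numbers" can be sketched in a few lines of Python. Everything here beyond the growth remark is an assumption made for illustration: "20 more" is read as each producer becoming 21 within a year, and the starting count and the ceiling (a round figure for how many people could ever be producers) are arbitrary. The function name `years_until_saturation` is hypothetical.

```python
def years_until_saturation(producers, ceiling, yearly_factor=21):
    """Count the years until `producers` reaches `ceiling` at a fixed yearly factor."""
    if producers <= 0:
        raise ValueError("producers must be positive")
    if yearly_factor <= 1:
        raise ValueError("yearly_factor must be greater than 1")
    years = 0
    while producers < ceiling:
        producers *= yearly_factor
        years += 1
    return years


if __name__ == "__main__":
    start = 1_000_000          # assumed number of people generating data today
    ceiling = 6_000_000_000    # assumed upper bound, roughly everyone on Earth
    print(years_until_saturation(start, ceiling), "years until the ceiling is reached")
```

With these assumed inputs the loop stops after three years, which is the point of the exercise: at that growth rate, changing the starting figure by an order of magnitude moves the answer by less than a year.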

      I don't mark my own pictures, let alone read half the mail I get. It's just too much for me to handle. All the messaging and expert systems are going to allow me to control the data just for me, which will affect what I save and what will be available later on.

      Knowledge is going to change. In fact, I already quit trying to remember things because it is just information overload. In all the networking and operations I have to do, I do not need to remember everything in great detail. I have to remember overall general principles, of course, but I don't remember the specific nitty-gritty. I can't remember how to configure one of my own routers any more, but I know exactly where to go to find out. Knowledge is going to become how one accesses the information and data, not the data itself. Where do I go for this information? How do I get it?

      The Grid is going to change, too, incorporating mobile and wireless access. And I have already told you about how I use concurrent multimodal communication. Virtual folding of the time-space continuum is going to mean that I no longer feel constrained by distance or time. I don't have to fly to Singapore now to give a talk. I can exchange video with someone at the last second instead of having to be there.

      New Applications

      One interesting thing about the Web is that it is sort of a black hole of information. We only have to go through this one porthole and then pop out again on the other side—that is, if you have somebody who knows how to do the folding of the space appropriately. In fact, in New York some of the financial people are now bypassing the normal venues to exchange information. They are basically working directly with each other in a peer-to-peer mode. As we get more and more of the peer-to-peer applications, no longer will everybody go through some common portal areas where it is easy to watch or duplicate the information that is collected.

      Gaming is really interesting. I am just trying to figure out how this is going to play out. MUD, which is not a kinky game, stands for "multiuser domain": people in a bunch of different areas. Like PlayStation 2, it means gaming on the Internet. Entertainment is going to drive change. The question is how to adapt that for knowledge and intellectual pursuits.

      Virtual organizations and geographically distributed collaborations. One reason I got into networking was to figure out how to get the physicists in California talking to the physicists in New York and the physicists in Germany and in Japan and Italy and wherever. With middleware evolving, pulling together a virtual organization on demand is going to take seconds—we're not likely to need this army of rocket scientists and engineers to fly over to set it up for you.

      We will probably belong to more organizations and collaborations and, counter-intuitively, we will travel more. As we communicate more quickly, we'll expand the number of people we communicate and do business with. Even if only a few of these contacts result in a trip, we end up traveling more. However, there is a big increase in productivity. We can also become involved in more different groups of folks doing different things than we did before.

      Where does the real data lie?...

      Survey databases. I am still trying to figure out how these are going to evolve. We are now talking about things like the TeraGrid in the US and the DataGrid in Europe: huge databases of information, too big to ship all at one time. Instead you put pieces of them some places and replicate them in other places. It is almost impossible to take a snapshot of the database at any one point because anything can change in each of those different parts of the database. What does that mean to us for the future? I don't know. It is very daunting.

      Telecommunications networking companies see their profits going down, so they want to add services for the end user, be it for personal or business use or whatever. They want to be able to provide storage, and there is this thing called caching. Most of you are familiar with the Web and you know about Web caching. What you probably don't realize is how many times a page has been cached through the network and in how many different instances, and there is probably no coherency between all those copies. Depending on whom you go to, one person may be getting a different batch of information than the other person. That is only going to get worse. Today's caching is temporal, because they actually put time-to-live timers on the records and on the data in there, but now start thinking about my providing you storage.

      That storage may be three months or six months. The question is: If I have remote storage and I have a local copy, all database-type technology, how do we handle it across domains from an administrative standpoint to make it easy? Because this is going to cost money. Every time I go to update my storage, it is going to cost me money. I am not going to go out and update too frequently because that service provider is going to charge me.

      Where does the real data lie? Probably locally. What happens if time lags too far before I do the updates? These are all going to be challenges. The biggest challenge will be social-political.
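
      The lack of coherency between cached copies can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Origin` and `TtlCache` classes, the `/report` page, and the 10-second and one-hour timers are hypothetical, and real Web caches are considerably more involved. It shows only that two caches with different time-to-live timers can hand two readers different versions of the same page at the same moment.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Origin:
    """Hypothetical origin server holding the authoritative copy."""
    pages: Dict[str, str] = field(default_factory=dict)


@dataclass
class TtlCache:
    """Hypothetical cache that serves a stored copy until its timer expires."""
    origin: Origin
    ttl: float  # seconds a stored copy is considered fresh
    _store: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def get(self, url: str, now: float) -> str:
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh by this cache's own timer
        body = self.origin.pages[url]  # expired or missing: refetch
        self._store[url] = (body, now + self.ttl)
        return body


origin = Origin({"/report": "version 1"})
short_lived = TtlCache(origin, ttl=10)
long_lived = TtlCache(origin, ttl=3600)

# Both caches fetch the page at time 0.
short_lived.get("/report", now=0)
long_lived.get("/report", now=0)

# The origin changes, then two readers ask different caches at time 60.
origin.pages["/report"] = "version 2"
seen_by_reader_a = short_lived.get("/report", now=60)
seen_by_reader_b = long_lived.get("/report", now=60)
```

      In this sketch reader A receives "version 2" while reader B still receives "version 1", because the second cache's timer has not yet run out. Longer-lived paid storage would stretch that window from minutes to months.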

      If the hurdles are high, people who generate stuff are just going to delete it...

      Bumps in the Yellow Brick Road

      Again, some of what I am telling you may be good and some may be bad.

      Everybody says bandwidth is free. In fact, most national networks today have more bandwidth than can be used. However, we have not yet connected using video, for example, and we don't have the grids yet that will make it easier to move huge amounts of data. That glut could be gone in a second.

      One of the reasons the Web and the Internet have done so well is that we all got to one level. The Grid has a whole process based on identifying what standard people should use so that they can interconnect. You have to have standards, just like languages or power outlets. The problem is that this makes it easy to generate or create a lot of new types of things. You say, "All I have to do is make sure it interacts with that. If it interacts with that, it works." However, this may not be the most efficient and most flexible approach. A lot of people would like to have this customizable middleware. The problem is that if you have middleware that changes and not everybody uses the same standard, you can't be sure of the state of any type of system at any one point, so you don't know how your data is going to be treated.

      The digital library. I find it interesting from a computer standpoint that everybody has ideas about standards and how to codify data and information and how to handle it. But in the libraries we haven't solved the fact that you have different intellectual domains trying to deal with a lot of the same issues, and for different reasons. For instance, the data and information the physics community puts online is not the kind of stuff you would normally catalog or make available to the general public. However, there is still the whole question of the granularity you use to categorize huge amounts of data and how you replicate it. How are you going to store it and how are you going to make sure that you can read it next year?

      We have a lot of questions in common, but the communities don't seem to be speaking.

      Ubiquitous computing. If I am always on and the government is always on, then it is going to be interesting to see how much information they keep on me. How do I know, from a citizen's standpoint, that they are keeping the right stuff? If you don't have the whole picture, you may not have the right context. If you have ever been quoted in a paper, you understand this very well.

      Then there is how one finds resources. A really bothersome trend is that a lot of Web sites are selling what they call the top-shelf space. In my small town there is one church, a hardware store for the farmers, four homes, and a post office. I plugged in my street address to a Yahoo map site and it came back with, "Click here to find out what they are saying about Sabillasville." Now, nobody even knows where it is; how can they say anything about it? But somebody paid for that. They are redirecting you to sites based on how much money that site pays, so it is not necessarily the best match of information anymore. How do we evolve around that? If it continues, people are going to quit using it altogether.

      Another problem is the transparency in network connectivity. It should be my decision whether people outside my gateway know my home network. We have firewalls or NATs (network address translators); we figured out ways of using one address and having dummy addresses behind it. IPv6 may solve this, but everybody still likes to have their firewalls. This means that what you see may not be what you get, because the addresses change or the addresses you think you are dealing with are not real. You may not be looking into the site you think you are; I can redirect you through a different IP address. You end up on a completely different server, maybe in a completely different part of the country. All these aspects of transparency are going to affect the way we move data and try to track it.
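
      The idea of one public address with "dummy" addresses behind it can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular router implements NAT: the `Nat` class, its method names, the starting port 40000, and all addresses are hypothetical (203.0.113.7 comes from a range reserved for documentation).

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class Nat:
    """Toy network address translator: many private hosts, one public address."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._next_port = first_port
        self._out: Dict[Endpoint, int] = {}   # private endpoint -> public port
        self._back: Dict[int, Endpoint] = {}  # public port -> private endpoint

    def outbound(self, private_src: Endpoint) -> Endpoint:
        """Return the source address the outside world sees for this host."""
        if private_src not in self._out:
            self._out[private_src] = self._next_port
            self._back[self._next_port] = private_src
            self._next_port += 1
        return (self.public_ip, self._out[private_src])

    def inbound(self, public_port: int) -> Endpoint:
        """Map a reply arriving on a public port back to the hidden host."""
        return self._back[public_port]


nat = Nat("203.0.113.7")  # documentation-range address, purely illustrative
laptop = ("192.168.1.10", 51000)
printer = ("192.168.1.20", 51000)

seen_for_laptop = nat.outbound(laptop)
seen_for_printer = nat.outbound(printer)
```

      From the outside, both machines appear to be the same IP address and differ only by port, so anyone trying to track where data came from sees the gateway rather than the real host.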

      Multiple cell-phone standards. I have a cell phone that doesn't work in Europe, so I have to rent one there. When I go to Japan, I have to get a different one. When I travel, I carry five times the weight of my laptop in adapters for the phone, the power, and everything else. Maybe we could get a Euro network so I only have to carry one adapter, one for every continent.

      Then there is the contention for and congestion of wireless frequencies. I just told you all these great things about having wireless and how we can be on all the time, but one of the problems is that, if everybody is going to be on all the time, it is going to be crowded. It is basically a radio frequency; you can't just add more wires. With congestion, the amount of information getting through will slow to a trickle, which will essentially be useless.

      Yet another problem is software monopolies—and I won't mention any names. That goes for hardware or anything else.

      Another problem is Internet Service Providers and telecom consolidation. With the big dot-com bubble bursting, what we used to call the greenfields ISPs and telcos have basically gone out of business. The competition they generated forced the other ones to implement and make available new technologies. With no competition, there is no incentive. We may be stuck with the old networks forever until something pushes them along. Regulation could deal with that.

      The entertainment industry influence has been troubling me for a while. I believe in copyright and I believe in patents. On the other hand, when I buy a CD of music and play it at home, I want to be able to put it on my laptop so that I can play it on the plane when I travel. I don't believe I should have to pay twice for music that I already have. The entertainment industry wants to prevent my laptop from ever recording anything again. All the DVD and CD-ROM drives would be unable to make copies. But what happens if I have family videos or pictures, or some papers I have written, on a CD and now I can't copy them? I understand the industry's need to protect itself, but the draconian ways they are attempting to address this issue are a bit outrageous.

      Then there is the new media. I have some 78-rpm phonograph records at home. You can't get a 78 drive anymore. In fact, I got my last one 30 years ago and it was a rebuild then. I also bought four diamond needles back then; I don't know if you can get them any more.

      How can I save what I have; how do I copy it from this medium to that medium? When we used to make backups of data at National Labs 25 years ago, we actually used to send off a whole big tape drive with the data so they could read it later. I don't know if they still do it and I don't know if those machines still work.

      I am sure this is a problem you wrestle with. Think about the fact that if the hurdles are high, people who generate stuff are just going to delete it. You could lose all kinds of information, all kinds of history, just because it's a pain in the neck to try to save it or move it to the next medium. Only a few people are as persistent or as stubborn as me about figuring out a way.

      There is talk about the Semantic Web as the next level. This involves those agents I mentioned coming out of a Web. That sort of scares me because it is no longer under my control. If I am going to have agents, I want them working for me, not for some Web site.

      Then we have the intellectual property rights. Though I believe in patents and copyright, they are easy to abuse. Some applications for patent and copyright can scare the daylights out of you. Companies copyright anything that looks like anything, and most of the copyright and patent lawyers employed by most governments don't have the time or expertise to go back over the technology and validate anything that is close to being real. People are claiming stuff that is common knowledge, that has been out since 1960, but you have to hire a lawyer to go into court and fight them. Big companies can handle that, but it stifles a lot of creativity for the smaller players in hardware and software.

      Instant messaging standards. I actually need two different products on my laptop because there are two standards. One could argue for monopolies if you had one standard, but on the other hand the problem with multiple ones is that if I want to talk to someone, I have to know which product to use.

      Interesting Side Effects

      Whole new languages are going to come out of these changes. My teenagers are into instant messaging all the time, and I taught them how—we used to shorten and abbreviate stuff all the time. How much of the new language can you capture, and how do you translate it later if you don't have the textbook beside you to figure it out?

      A whole new world of information today doesn't go through normal publishing channels anymore. I have stuff out there that has never seen paper. We used to say only weird people would publish only online, but a lot of people do it now. If it never hits paper, how are you going to archive it? If it moves through a Web site and the cache gets cleared or the URL changes and you can't find it, who is going to capture the information for later?

      However, there is a certain freedom from the tyranny of all these review panels that said who was allowed to publish and when and where. Publishing can be a pain in the neck. Now, the good side is that anybody can publish. The bad side is that anybody can publish. How do you verify the truth or value of what they are saying? You can set up online publications that never touch paper, but still have some kind of peer review. But a lot of people out there are not doing it. How much do we take the old model of review process and basically change it to work with the new technology? Are we going to start trying to capture this material somehow?

      Foretelling the Future

      Two years ago somebody from a big ISP in the US was sitting next to me on a plane, telling me what the Network Access Point (NAP) was designed for. I coauthored the NAP, but he didn't know who I was and he was sitting there telling me what it was for. He was making assumptions about what we knew or didn't know at the time. When we did the NAP, we sort of knew that life would change, but we couldn't say exactly how it was going to end up. I don't know anybody who can tell us how peer-to-peer is going to evolve over time. You have technical pushes and then you have the social-political, and you never know which one is going to win. As I said, everything is going to change.


      From the Discussion that Followed

      Discussion explored the implications of vast amounts of data—even just personal e-mail and digital snapshots—for access and for search tools, agents, and metadata.

      Moderator: I have tried to imagine you sitting before those six or seven screens, but all that information in one way or another has to enter your mind and for that you only have 24 hours a day. Is it possible to squeeze in more information in the same limited amount of time?

      Robert Aiken: You are going to see what I call the agent technology. You are going to teach smart agents with an artificial intelligence system how to go through information, either images or text, and figure out which ones to bring to your attention. Agents buy you time so that you only have to look at the things you really need to.

      What impact does that have on somebody physically? If you keep being barraged by very intense information streams, you are going to burn out. Studies at MIT have used different sounds whose frequency or volume changes depending upon the information. One could use things like that to highlight information.

      For my e-mail, every half a year I rename my whole in-box "old" and create a new one. If I am looking for something, then I just hit the search engine. I wait two to three seconds, and it pops up. If I am not clear with my search, I have to search more selectively. I think this is the way people in the future are going to locate information. In other words, people are not going to take the time to go through it and classify it and categorize it and tag it. They are just going to shove it into what we call stacks. The processing power keeps increasing so much that it will be cheaper to search it than to catalog it.
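
      The "search it rather than catalog it" approach can be sketched as a small word index built automatically over untagged messages. This is a minimal illustration in Python, assuming a hypothetical three-message in-box; the function names `build_index` and `search` are made up for the example, and real search engines do far more (ranking, stemming, phrase matching).

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def build_index(messages: List[str]) -> Dict[str, Set[int]]:
    """Map every word to the positions of the messages that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for position, text in enumerate(messages):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(position)
    return index


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return positions of messages containing every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# A hypothetical, unclassified "stack" of mail: no folders, no tags.
inbox = [
    "Router config notes for the lab gateway",
    "Lunch on Friday?",
    "Updated gateway config after the outage",
]
index = build_index(inbox)

broad = search(index, "config")           # a loose query
narrow = search(index, "config outage")   # a more selective query
```

      Nobody classified anything by hand here; the loose query returns two messages and the more selective one narrows it to a single message.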

      Moderator: The same applies to appraisal and selection. Nowadays it is much cheaper just to keep everything with a search capacity than to go through the whole process of appraising and selecting. Since we are all still in a hybrid situation where we have both paper and electronic records, that is a problem, but that will be solved five years from now.

      Participant: For digital photographs, I see some hope because of the JPEG 2000 format that is being worked on. It combines a new standard for a file format and a standard for some metadata so that your camera can put in the time and the GPS location.

      Aiken: One nice thing about a video camera is that it has audio. You can say, "Here is the name of the building" or whatever. You are saying you can't search for audio. But you have the text, the speech translators that come cheap. Now I might start thinking about annotating some of the pictures. There is at least enough metadata to attach to it. If I have to type it in on the computer, forget it. It ain't gonna happen. That is the difference.


       


       
      © 2006 RLG
       