Ron Dekker, Scientific Statistical Agency (WSA)
Let me start with stating that what I am going to say is my own opinion; it is not the Social Science Council's opinion.
Let's start where we ended yesterday evening. Of course, you went to bed early, right after dinner, or you had just one beer, saw that the coffee shops were not for coffee, the lounge café was saving electricity, and the waitresses are so poor that they can't afford any clothes. That is social sciences.
Social sciences is about behavior, happiness, expenses, safety, crime, traveling, housing—all these kinds of matters. Social scientists, luckily for you, do not study individual behavior. They want to study the behavior of a group, explain the behavior of a group, and discover trends. Scientists set up theories which they want to test, and for this they need empirical data.
I want to talk about all the new data and metadata and data preparation and methodologies to analyze the data in social science.
A lot of social scientists used to construct their own data. They would set up a small sample and go into the city, or ask some students. If you set up these samples yourself as a researcher, this requires expertise—how to pose the right questions, how to construct the questions and questionnaires, how to do the sampling, random sampling, or other kinds of sampling, stratified sampling, and you have to know something about data entry, cleaning the data, coding and recoding the data.
Secondhand Data
Therefore, a lot of social scientists in various subdisciplines are getting secondhand data. This saves a lot of time. If you want to have a time series on certain years and you want to construct it yourself, you have to wait for 30 years.
This also ensures quality. National statistical offices are experienced and specialized in getting these surveys done, and they know everything about the statistics and methodologies and sampling.
This is also efficient at the societal level. If every evening at six o'clock when you are having dinner you get a phone call from a new bureau asking you some questions about what you think of a political party or the new washing powder, you get tired of that. Concentrating these surveys saves a lot of time for people and for companies.
The major issue for secondhand data is that the data owner or the data producer wants to protect his instrument. This is not merely to protect the privacy of respondents. It is to his own benefit to protect privacy because that guarantees that he might get access the next time he wants to call on these respondents. There is also a law that says that privacy of respondents should be protected.
When distributing surveys to researchers from a national statistical office, the office has to ensure that there is no risk of revealing the privacy of the people. With small samples there is little risk to privacy because the chance of your being in this sample is about one percent or less.
Moreover, the data producer can make data anonymous so that no name or exact birth date or ZIP Code is in the data that they are distributing to researchers. In addition, they can have several data-protection measures. You know the term "public-use files," which are protected in a way that makes them hardly usable for science any more. That is why we also have scientific-use files in The Netherlands, which are less protected and can be used for research.
Some measures for protecting data include randomizing results for both continuous and discrete variables. For example, you can add some noise to income, and there are special techniques to change the ZIP Codes. You can also aggregate outcomes. Instead of 20 categories, you could go to 10 or five categories, or you could simply suppress data. At the Netherlands Statistics Office they have a special program where data, before being sent to the researchers, are checked on "rare outcomes" in multidimensional tables.
For secondhand data, these measures can solve the problem of protecting the privacy of respondents.
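The protection measures just described (adding noise, merging categories, and suppressing rare outcomes) can be illustrated with a minimal Python sketch. It is not the program used at Statistics Netherlands; the function names, the relative noise level, and the threshold of three records per cell are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def add_noise(value, rel_sd=0.05, rng=random):
    """Perturb a continuous value (e.g. income) with zero-mean Gaussian noise.
    rel_sd is an assumed noise level, expressed relative to the value."""
    return value + rng.gauss(0.0, rel_sd * abs(value))

def coarsen(category_index, factor=2):
    """Merge adjacent categories: with factor=2, 20 categories become 10."""
    return category_index // factor

def rare_cells(records, keys, threshold=3):
    """Return the combinations of key values that occur fewer than
    `threshold` times in the multidimensional table spanned by `keys`."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell for cell, n in counts.items() if n < threshold}

def suppress_rare(records, keys, threshold=3):
    """Blank out the key variables of every record that falls in a rare cell."""
    rare = rare_cells(records, keys, threshold)
    protected = []
    for r in records:
        r = dict(r)  # copy, so the source records stay untouched
        if tuple(r[k] for k in keys) in rare:
            for k in keys:
                r[k] = None
        protected.append(r)
    return protected
```

In this sketch a record is suppressed only on the variables that make it rare; a real disclosure-control program would weigh many more considerations.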
New Data
Let's talk about new data—for example, registries. You could have access to the entire unemployment registry, the entire census, or pension fund data. Even data from taxes can be accessed. Think about data that are collected by banks, cell phone companies, credit card companies. When you do your shopping in The Netherlands, we have the Air Miles card. Every time you do your shopping and you use this card, you get some benefit, but everything you bought is being stored in some database.
I don't know if this is a true story, but just to indicate how credit card companies work: if you have a credit card and you come home and you find some rather high amount of money on this bill from a particular place, then the credit card company might phone you and say, "You never go to these kinds of places. Has your card been stolen, or what is happening?"
Other kinds of data, not personal data but location or geographical data, are increasingly important not only for geographers but also for economists. You can combine location with personal data. Yesterday Robert Aiken talked about GPS and cell phones as a means to locate where you are. If you want to study mobility, this would be a perfect database to use.
Another upcoming type of data is on neighborhood statistics, very detailed regional statistics. In The Netherlands we have at the ZIP Code level, which is about one street, all kinds of aggregates attached to it—for example, income or the number of cars or the number of children.
There is also medical data. Psychologists especially are using patient records, after getting the permission of the respondents. Patient files or even blood samples are taken into the data. In The Netherlands there is the famous twin register (I think it covers 25 percent of all twins in The Netherlands), which gives excellent data if you want to do social and also psychological and medical research.
What does a new data record look like?
- It starts with your social security number because it is a perfect key variable, as it is in a lot of registries. Then you have some household characteristics, including family and friends or your network, your picture and your video, all the pictures of Robert taken in Amsterdam—they are all attached to this data file.
- Information about housing, mobility, the car you are using, the number of miles that you are driving each year, and where you drive most, education—not only the final education, but also the exams you missed—occupation, your employer history, jobs, union membership, income, savings, expenses. It is said that one of the big shopping malls in The Netherlands knows before your bank does whether you need a loan or not. If you are starting to buy cheaper versions instead of the brands, it could be an indication that you are short of money and that perhaps you need a personal loan.
- Health, including DNA and your health records; crime victimization, whether your bike has ever been stolen—if you live in Amsterdam, the answer is yes.
- It also has a lot of data on your opinions, either on voting or religion or what you think about the European Union or about France, etc. This is stored in one data record.
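How a key variable ties separate registries into one record can be sketched as follows. This is only an illustration: the field name `person_id` stands in for the key variable, and the two toy registries and their contents are hypothetical.

```python
def link_registries(key, *registries):
    """Merge several registries (lists of dicts) into one record per key value."""
    linked = {}
    for registry in registries:
        for row in registry:
            # Create the combined record on first sight of this key value.
            record = linked.setdefault(row[key], {key: row[key]})
            record.update({k: v for k, v in row.items() if k != key})
    return linked

# Hypothetical toy registries; "person_id" stands in for the key variable.
housing = [{"person_id": 1, "zip": "1011AB"}, {"person_id": 2, "zip": "3511CD"}]
employment = [{"person_id": 1, "employer": "X"}, {"person_id": 3, "employer": "Y"}]

linked = link_registries("person_id", housing, employment)
```

Person 1 appears in both registries and ends up with one combined record; persons 2 and 3 keep only the fields of the registry they appear in.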
Utopia?
Is this Utopia for researchers? No; there's more. At Statistics Netherlands, they are constructing a virtual census. That means that they can have a census at any time they want to because they have all the input they need to construct it. They start with 16 million records, which is the Dutch population. They are allowed to demand all government registries, unemployment or taxes or whatever, and they can include all their own and all government surveys, from the national government down to the municipality. Every survey you answer from a government body could be included in this new dataset.
The same goes for company data. Each company will have one record, and everything they know about this company will be stored in the database.
I talked before about the banks and insurance companies and the cell phone companies who know exactly where you are now if your cell phone is on. Grocery stores and shopping malls keep track of all your shopping. It is estimated that in The Netherlands each person is in about 1,000 registries or surveys. Perhaps you know the example of Iceland where the DNA of the entire population was sold.
What is the message? The first one is not new. Everything becomes digitized, as was said yesterday. The number and various types of data increase, and there are no limits on what can be stored in data records.
When national data archives, especially in Europe, started in the 1960s to store all kinds of electronic data, the mere fact that the data had to be electronic was a useful filter so they could decide to store everything which was digitized. Nowadays this filter is gone because everything is digitized. Archives will have to decide on a new policy on what to archive and what not to archive.
For scientists the fact that everything is digitized is good news.
The second message is that researchers won't have access. You can see it, you can smell it, you can hear it, but you don't get it. The reasons are the commercial value of data and protection of privacy.
Especially in geographical data, there is a high commercial value in collecting all these kinds of data and selling them to companies. For example, if you want to sell a new car, you can ask a central office, "In my city, give me all the customers who have a car older than three years whose used value is about 30,000 euros."
There is also increasing risk of privacy violation. If you have a registry you know that everybody is in, you can start looking for the person you want to find.
In The Netherlands there is a discussion about whether we should have a DNA bank. They started a DNA bank for severe criminals, but there was a shift, of course, to get everybody into this DNA bank. If you have the whole population and you have a DNA track, then you can find the one who did it. We all know now about anti-terrorism measures. For example, in the US older data on education are collected to find out whether some other people tended to take flight instruction. There are other types of uses for these data. Three months ago the Palestinian National Statistical Office was occupied by Israel, and some computers were carried away.
There is a risk for the civilian. If all these data records are stored in one place, what if someone gets access to these data whom you don't want to have access?
Access
I think we have to find safe ways for access to data by researchers.
There are three ways of accessing data. The first is to have the data on your desk. Another one is to visit the local statistical office and work on the data on-site. There is a new way where you have access at your desk to data held remotely on a mainframe somewhere else.
On desk. The best-known method is by CD-ROM. For the data producer, in the mid-1990s the CD-ROM was a safe medium because it could not be copied. Nowadays it is very simple to get CD content on your PC.
Another way of getting data on your desk is to download it, but then the problem for the producer is how to register the users. How can you check the real identity of the users?
The advantage of a CD is that you can construct one standardized file for all research purposes. (If you negotiated with each researcher and gave one researcher several variables and the next researcher other variables, the moment they meet and combine the data it is no longer guaranteed that these combined data are safe.) With only one version, you are assured that you are distributing the same data.
For the researcher the advantage is evident. He or she has access to individual records, doing all the kinds of research desired. The disadvantage for the researcher is that these standardized files on CD will be protected in some way or another.
My experience is that once you give the data to the researcher he forgets all the conditions he should obey, so he doesn't submit his publications to the data producer to check tables on privacy violations and doesn't keep the data in a safe environment. The CD will be on his desk and not in a safe place. Even on the network—you only hope that it is impossible to get on this network.
On-site. This refers to research data centers you have in the US; Canada is also setting up centers. In The Netherlands we have two locations for going on-site: the two offices of Statistics Netherlands. In Germany there will probably be local statistical offices in each state.
For the producer the advantage is clear. There is total control over the data. Instead of protecting the data at the input and checking all kinds of users and the identity of users, they just have to check on the output. If the output is safe, little can go wrong.
The advantage for the researcher is that he has access to all kinds of data. The unprotected files are open and accessible. There is no data protection, no negotiation about aggregation level of some variables. The disadvantages are the travel required to the on-site facility and the administrative burden: just suppose that all researchers are working on-site, and one employee at the Statistics Office has to check all this output.
In the US the research data centers introduced a new market: local researchers doing the work at these data centers. Again there is a danger for the data producer because there will be someone else who is actually using and analyzing the data. This might undermine data protection.
Remote access. This is a system which automatically accepts requests by researchers, checks them, processes them, and returns output. This kind of system has the advantage of being on-site because you can have access to all the data, but there is one big disadvantage. You won't have contact with the data and you won't have access to the individual records.

This slide shows what it is like: You start on the left. You send an e-mail. This is received by the post office. It checks your ID, checks the syntax of your SAS program. It sends it to one of three batch machines and gets the data from another data server and sends it back to the black box. There the output is checked either automatically or by hand. If it is okay, it is sent back.
This system is being used in the Luxembourg Income Study, and it does work. You have response times in minutes rather than hours or days. If you want to control everything very precisely, then the response time will be days. For some data this remote access could be convenient.
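The flow described above (check the sender, check the program, run it, check the output, return it) can be sketched in a few lines of Python. This is not the Luxembourg Income Study implementation. The user list, the forbidden keywords, the minimum cell count, and the `run_job` callback are all hypothetical, and the keyword scan is only a crude stand-in for a real syntax check.

```python
REGISTERED_USERS = {"researcher@example.org"}  # hypothetical user registry
FORBIDDEN = ("print", "list")  # hypothetical: statements that would dump records
MIN_CELL = 5                   # hypothetical minimum respondents per output cell

def check_request(sender, program):
    """Post-office step: verify the sender, then scan the submitted program."""
    if sender not in REGISTERED_USERS:
        return "rejected: unknown user"
    words = program.lower().split()
    if any(w in FORBIDDEN for w in words):
        return "rejected: program lists individual records"
    return "ok"

def check_output(table):
    """Output is released only if every cell rests on at least MIN_CELL respondents."""
    return all(count >= MIN_CELL for count in table.values())

def handle(sender, program, run_job):
    """Process one request; run_job stands in for the batch machine and data server."""
    verdict = check_request(sender, program)
    if verdict != "ok":
        return verdict
    table = run_job(program)
    return table if check_output(table) else "withheld: needs manual review"
```

The design point is the one made in the talk: the researcher never touches the records, so the producer only has to control what goes in and what comes out.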
To sum up: if you talk about access to data and talk about secondhand user data, there is an increasing supply. Everything is digitized, but there is also a danger of closing down access to data. Besides the on-desk and on-site, we have to look at new ways to access data.
A World without Access to Data
Just suppose we could not get access to the data. That would be very inefficient because researchers want data at any cost anyway. They will go to creating their own data collections, and there will be no standardization, because on one topic you would have two large datasets, one from the Statistical Office and one from the group of researchers. There is danger of a downward spiral movement. There is no trust between researcher and data producer, so there is more data protection. There is more danger of alternative routes, and the lack of trust grows.
In accessing new data, the key is mutual trust between the data producer and the researcher. Of course, you must have the contracts and the secrecy statements and the protocols, but the key is mutual trust.
A World with Access to Data
Suppose that Statistical Offices all agreed and gave researchers access to all the data. We would drown. Again, there would be inefficient use. Searching data costs a lot of time. Preparing the data takes even more time, and it becomes specialized work. You have to do some cleaning and you have to construct new variables before you even get to your analysis.
Second, we would not know now how to analyze the data because there is a lack of statistical methods for complex, mixed-mode data, which I will talk about.
Metadata
Searching requires metadata. There are a lot of standards and a lot of developments on metadata for surveys, but metadata should also cover other kinds of data—tables, longitudinal data, cohort data, registries, qualitative data, and, as I mentioned before, complex data.
In The Netherlands we are starting up a Kinship Panel Study also known as "Family Relationships: The Ties That Bind," which asks relatives about certain matters and which for a subsample has in-depth interviews. Another example is Statistics Netherlands' Living Conditions Survey. For example, if your bike has been stolen, you could enter a special module on crime or on victimization. It also includes health information and blood samples. Another example is an Internet panel which is asking one member of the family each week or each month about certain matters.

These become very complex data. You have to know what your population is, how to weight a subsample to population statistics.
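One common way to weight a subsample to population statistics is post-stratification, sketched below. It is offered only as an example of the kind of weighting meant here, not as the method used in any of the surveys mentioned; the stratum labels and counts in the usage lines are hypothetical.

```python
from collections import Counter

def poststratification_weights(sample_strata, population_counts):
    """Weight per stratum = population count / number of sample members in it."""
    sample_counts = Counter(sample_strata)
    return {s: population_counts[s] / n for s, n in sample_counts.items()}

def weighted_mean(values, strata, weights):
    """Mean of `values`, with each observation weighted by its stratum weight."""
    total = sum(weights[s] * v for v, s in zip(values, strata))
    return total / sum(weights[s] for s in strata)

# Hypothetical subsample: the young are over-represented relative to the population.
strata = ["young", "young", "old"]
values = [1.0, 1.0, 3.0]
weights = poststratification_weights(strata, {"young": 100, "old": 100})
estimate = weighted_mean(values, strata, weights)
```

In this toy case the unweighted mean is pulled toward the over-represented group, while the weighted estimate treats both strata according to their population size.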
Describing complex data is like describing this picture. I could talk for about 15 minutes to describe this painting. You could also take 15 minutes to describe it, and the match between our two stories would be not that big, I think. There is no standardization to cover the metadata of this painting.
Ideas about metadata are good, but they need to be standardized. There are a lot of initiatives. On surveys there is the Data Documentation Initiative (DDI), which was undertaken by ICPSR (Inter-university Consortium for Political and Social Research). You have your working group on metadata. There is an ISO standard (ISO 11179).
A Knowledge Cycle
Talking about creating new knowledge, let's have a look at the knowledge cycle by Nonaka and Takeuchi (The Knowledge-Creating Company). There are four phases in setting up the knowledge cycle: socialization, externalization, combination, and internalization. You could start anywhere. In socialization, you start to discuss the idea. Externalization means putting it on paper. In the combination phase you have several ideas on paper and you start to exchange ideas and to integrate these ideas into one. Internalization gets this knowledge into your organization or your work processes.
On metadata there is especially a lack of combination. There are a lot of ideas, but these ideas should be put on paper and people should sit together and come up with one standard.
Data Preparation
Suppose we solve the problem of metadata. We still have a problem because the way to do social science research is changing. Research problems and research questions are more complex. There will no longer be a one-to-one relationship between research questions and the data. For example, if you study unemployment, you want to include national policies, whether they are effective, so you need other types of data than just those in a socioeconomic panel.
I mentioned complex data. You want data on an international level, or employer and employee data, which is an emerging research topic, or combinations of registries, interviews, pictures, and policy evaluation. Researchers will use these kinds of data, and it will change the way work is done on preparing data.
Preparing data is expensive. OECD (Organisation for Economic Co-operation and Development) countries have figured out that they spend about 500 million dollars each year on R&D—this is all sciences—and a large part of it is absorbed by data supply. If you talk about getting and understanding and cleaning data, this takes a lot of time. I think it is about 30 to 40 percent, just getting the data in your hands. With the growing complexity of data, I think the amount of time will rise.
Researchers have to think about new ways of preparing data and of handling this problem. Universities and policymakers should be aware that data are not incidental research costs any more. They are prominent. They are international, and they are multipurpose. I think other sciences are more used to this approach.
A social scientist can be like a lonesome cowboy. But if you would cooperate with all kinds of research staff in preparing the data, helping you with the data, you would be a Formula One driver as a researcher. You would have a team of people helping you to make the best performance.
Organization and Cooperation
If you talk about creating knowledge, especially in the social sciences, everybody is in socialization, is thinking about new ideas. Some are put on paper, but it is primarily tacit knowledge and there is little externalization and internalization. If you don't make this whole knowledge cycle, you are blocking the creation of knowledge. Cooperate. Combine efforts on accessing and sharing data.
For this I want to use the term "the hub-and-spoke model." It appears in the United Kingdom's ESRC (Economic and Social Research Council) white paper on research infrastructure policy. They say that you could have some central activities.
For example, archiving would be more efficient if it were done centrally. You need some trusted third parties who are allowed to do this data merging. For example, in Norway the National Archives has access to the National Cancer Registry. If researchers want data, they ask this Archives and say, "I want to research this topic. I need these variables," and they get these variables from the Archives. This can be done at a central level.
For data enhancement you need specialized knowledge. You need the ability of the data producers and the ability of the researchers and the local data experts to add value to certain data.
Data Brokers
For data access I think you need a data broker. That is what I am doing at the Social Science Council. We have just three people and a large board, but we are trying to be the connection between the Statistics Office, which does not speak academic language, and academics who don't have as much understanding about the problems of the Statistics Office and data protection. We try to be an in-between party. We try to solve financial matters at one level and give access to researchers at far lower cost. For example, for all surveys on persons and households through Statistics Netherlands, the Council pays 450,000 euros a year just to access the data, whereas the tariff for the researcher is about 2,200 euros for one dataset. There is a big difference, and you should solve this at a central level.
In other countries you see data archives acting in this role as a data broker, trying to find out what are the needs of researchers for the data.
There are also data librarians, which are more prominent in the US and Canada and less familiar in Europe where we have central data archives. Research libraries and university libraries could act as a front office for researchers and students and they could be data brokers at the local level, knowing what their researchers and their faculty or university might want. At local levels you should not have to worry about archiving and getting the data; you should worry about what kind of data the researcher wants and then go to the central level and get this data.
New Techniques
And for efficient use of new data, you need new statistical methods. If you have survey data or conventional data, you have one dataset and you can have linear regression and apply logit or factor analysis. For new data, if you have longitudinal data (panels, cohorts, multilevel, or international data), you could do some survival analysis. But what about mixed-mode data, where you have qualitative and quantitative data, where you have health data or interviews or pictures? What about registries where you have the whole population and instead of random analysis, you want systemic analysis? You need techniques for data mining.
This calls for statisticians from all disciplines. For example, social sciences got the techniques of event history and survival analysis from biology. Refer to the knowledge cycle: I think you have to focus on a combination from all disciplines and internalize this new knowledge in the social science departments.
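As an illustration of the survival analysis mentioned above, here is a minimal sketch of the Kaplan-Meier estimator, a standard technique for duration data with censoring. It is one example of the family of methods, not something presented in the talk, and the unemployment spells in the usage lines are made-up numbers.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] for each time with at least one event.

    durations: time until the event or until censoring
    observed:  True if the event happened, False if the spell was censored
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        leaving = sum(1 for d in durations if d == t)
        if events:
            # Multiply by the share of those still at risk who survive time t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Events and censored spells both leave the risk set after time t.
        at_risk -= leaving
    return curve

# Hypothetical unemployment spells in months; False means the person was
# still unemployed when observation ended (a censored spell).
curve = kaplan_meier([2, 3, 3, 5], [True, True, False, True])
```

Censored spells contribute to the number at risk until they drop out, which is what distinguishes this estimator from simply tabulating completed spells.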

To wrap up, the major barrier in accessing new data will be mutual trust between the data producers, especially statistical offices, and researchers. If this trust could be realized, then we could have a huge laboratory for social sciences. It would be on the level of CERN (the European Organization for Nuclear Research) in Geneva for physics.
We also have big problems. We have researchers trying to find data, so we have to standardize these metadata. We have to have data brokers at local levels. We have to have data archives at the central level, taking care of long-term archiving.
Finally, there is a lack of statistical methods, and this applies also to the interdisciplinary approach. And we have to jointly address data preparation. We have to get acquainted with complex data, but a researcher doesn't have to do this work all by himself or herself. In social sciences we should try to get a culture that is more common in health sciences and physics, where researchers are used to cooperating on large infrastructures.
From the Discussion that Followed
Discussion covered different national laws and attitudes about the sensitivity or privacy of data, the time value of data, and how librarians and archivists can act as data brokers.
Data Protection and Privacy
Participant 1: Can you talk a bit about how similar or different data protection and privacy laws are globally and how likely it is that we can combine vast amounts of nationally drawn-together social science data?
Ron Dekker: I think it is a global problem. In Europe we have a new privacy law; it is a European regulation which forced each national country to have a new law on data protection. They are talking about this "safe harbor" principle. This means that, if I have data in The Netherlands, I am not allowed to export these data to a country that does not obey this safe harbor principle, like the US. It is a problem at international levels.
Moderator: We have this one European directive on data protection, and it has been implemented for national applications. I have done a study comparing these national legislations. As an American or a Canadian, you should not believe that there is one European legislation. No. If you are dealing with Italy, you have to check the Italian legislation. Here is just one example, which has no major legal consequences: the European directive has a definition of what are sensitive data: data on race, religion, political behavior, etc. This catalog has been changed in different national laws. In The Netherlands we start with race. In Italy they start with religion. Why? I don't know. You have to study these national laws. What will happen when within the European Union they are combining these data?
Participant 1: That is my question. What likelihood is there that we can imagine the hub-and-spoke model beyond national boundaries?
Moderator: I believe that all the national legislations are still in a traditional model. As was explained yesterday, the time and space continuum is changing. My personal data are in a number of international, global data registries. No longer can a national legislator live with the idea and the belief that we can legislate everything concerning data on Dutch people.
This globalization has not yet come through to the national legislators or even to the European legislators. This whole nonsense about safe havens and trying to fix, as it were, the boundaries of European data vis-à-vis American or Canadian or Asian data is not feasible.
Dekker: I think another globalization is about these national data archives. If you look at international data like European barometers or general social surveys or the new European social survey, these are kept at various places. They are archived three or four times.
The new European Social Survey will be archived and distributed by the Norwegian Data Archive. This is a first step, I think, toward, at least at the European level, having one address. If I want to know something about this European social survey, I go to the Web site of the Norwegian Archive.
I think data archives will have to specialize on particular data. It doesn't make sense if the German Archive and the Norwegians both concentrate on European barometer data; it would be inefficient.
Participant 2: Last year RLG met in Ottawa, and we heard from Canadian archivists that privacy activists in Canada were attempting to abolish the early 20th-century Canadian census under the principle that individuals had the right to be forgotten. Is that point of view a live one or is it represented in western Europe at the moment?
Moderator: In The Netherlands we don't have a census. The last census was in 1971. Because of the privacy issue and privacy concerns, the census law was abolished. We are not worse off than the United States or any other country which has a census, so you can do a lot without having a census.
We have, as in other European countries since Napoleonic time, a very sophisticated, much more sophisticated than the United States, system of citizen registration. This virtual census by the National Statistics Office replaces the annual American census.
Australia, for instance, had a big debate, and they have destroyed their old censuses.
From the archival point of view, I would say that, if Parliament or the society decided that there should be a census, then you should keep that census. What you have to restrict and be very careful about is access to these data later on.
The paradox is that generally people don't want government to have many data on them, but there are many cases in Europe and also in North America where people come back to the archives and are using those same data which they had opposed government collecting to implement their rights or to get benefits or pensions.
On the whole question of the Nazis, we could only restitute works of art—gold, money, valuables—and do the reparation in many European countries because the data which are nowadays opposed had been kept on those people. It is paradoxical. It is paradoxical that the more people want protection of their own privacy, the more they are giving away in appearing on television and telling all sorts of stories about their private lives.
Dekker: What also strikes me is that there is concern about privacy and government data, but no one is worried about the data at private companies. Your bank has far more data on you than the Statistical Office.
Moderator: In all the European countries now—and I think it is the same in the United States—every citizen has the right to address any private or public organization asking, "What data do you keep on me?" This right is used maybe 10 times or 100 times a year in a population of 15 million. People really don't care.
Participant 2: That is not entirely true. One thinks of things like closed-circuit television. The schizophrenia is there when you can see the protection that this affords you in a shopping mall. Then you move on to a different take on it when you are conscious that you are being observed and your movements are being logged and kept all the time.
I was interested in the Canadian archivists because I was beginning to wonder whether the commercial value of the data declines quite quickly and whether there are several phases in this where the immediate collection and for a period, let's say, of five years has a significant sensitivity which declines over time. Apparently for the Canadians and perhaps for the Mormons as well, it begins to rise after a while.
Is there a cycle in this that we could use to our benefit so that researchers researching five-year-old data and ten-year-old data have different levels of access?
Participant 3: I think of much more importance in terms of popular consciousness is the overwhelming interest in genealogy, the family history, and the enormous use of the Internet for this purpose. The Brits already know that the Public Record Office in the UK has put the 1901 census, I think in its entirety—names, addresses, occupations, and so on—onto the Net just this year and had to take it out again two days after it was put up, even though it was on a powerful computer, because the demand for it worldwide was absolutely overwhelming. To say that all this ought to be destroyed—I am very much against it.
Moderator: But ask yourself: What if tomorrow the police come to your office and say, "We want to have all the patron records, all those people who have checked out book A, B, or C in the last two months, be it pornography or manuals to make a bomb?" How was this case solved at this bookshop in Denver, where the bookshop owner refused to give access to the records of the clients who had bought a particular book? That is also about privacy.
Participant 4: This is a really common event that we have in Ireland because of terrorism in the north. Several times in recent years we have disclosed records to the police force of what people have been doing with library materials. If we don't disclose them, they will subpoena them anyway, so the question becomes irrelevant. We have to retain the records for a period for our own statistical purposes. If the police want them, we have no choice but to hand them over. Under the law they can demand that.
Moderator: And your clients are advised or warned when they register?
Participant 4: Absolutely. They know perfectly well what the position is. I suppose at the end of it, if somebody is borrowing books to make a bomb or to poison the water supply, which is something we had a few years ago, they are aware that those records may be available if the police want them.
Centralized Archiving
Participant 5: You said that in your opinion data archiving was performed more cost-effectively centrally. I wonder if you could tell us about the experience that lies behind that.
Dekker: The experience is based on this ESRC report on data infrastructure. There was a discussion about whether you should concentrate on data archiving as an art. Data archiving has to be there; it is a provision. There is no selection on what is relevant or irrelevant data. You have to archive these data.
If you talk about use of data, at the UK Data Archive 90 percent of all use is based on 10 percent of their collection. Out of 3,000 studies, about 300 are being used, and perhaps even fewer account for most research.
At an archive the question is not what is relevant data now; it is what is relevant data in 10 or 20 or 50 years. Therefore, data archives should be something different from data enhancement projects where you concentrate on data which are relevant now and you elaborate on them. The experience and knowledge that is useful for archiving is better used if you have it concentrated at the national level.
Moderator: You mentioned the commercial value of data. Is there any experience of existing big data archives, like those in Essex and in Amsterdam, marketing their data? Can they earn money if the data has commercial value and, if they don't, why not?
Dekker: There is a discussion about whether researchers would have to pay for data which are already collected mostly with public money.
Moderator: Not only researchers. Can't you imagine that you would serve a wider community including private enterprise, because they would be interested in your data?
Dekker: I think this market is rather small, and I think it should not be the primary task of an archive to make money from the data. People should be aware that this is a service which has to be at the national or European or whatever level, but it has to be there. I think it can never earn back money from selling all kinds of data.
Moderator: To put it differently, public archives in some countries like ours also acquire private records from private companies, churches, foundations, associations, etc. Could one imagine that existing data archives would try to acquire the data currently held by commercial companies? At a certain point these companies will no longer be interested in what they consider to be old data, but then it will become historical data.
Dekker: That means selling your knowledge as an archive to private companies. The other one that was mentioned was genealogy. The Danish Archive also has a site which is very popular. They say, "If we had a phone office where each customer had to pay one euro, we could set up a bureau of 10 people." If every customer is paying a small amount, then these types of heavily demanded services can make money.
Participant 6: The problem I have with a central office is this business model. The only business model that seems to work is central government funding of some kind or central funding. Organizing that and ensuring that for the long-term future is very difficult to achieve.