In common usage, data (, ) is a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of that may be further interpreted formally. A datum is an individual value in a collection of data. Data are usually organized into such as tables that provide additional context and meaning, and may themselves be used as data in larger structures. Data may be used as variables in a computational process.
Data are data reporting using techniques such as measurement, observation, query, or analysis, and are typically represented as or characters that may be further data processing. Field work are data that are collected in an uncontrolled, in-situ environment. Experimental data are data that are generated in the course of a controlled scientific experiment. Data are data analysis using techniques such as calculation, , discussion, presentation, visualization, or other forms of post-analysis. Prior to analysis, raw data (or unprocessed data) is typically cleaned: are removed, and obvious instrument or data entry errors are corrected.
Data can be seen as the smallest units of factual information that can be used as a basis for calculation, reasoning, or discussion. Data can range from abstract ideas to concrete measurements, including, but not limited to, . Thematically connected data presented in some relevant context can be viewed as information. Contextually connected pieces of information can then be described as data insights or intelligence. The stock of insights and intelligence that accumulate over time resulting from the synthesis of data into information, can then be described as knowledge. Data has been described as "the new Petroleum of the digital economy". Data, as a general concept, refers to the fact that some existing information or knowledge is represented or in some form suitable for better usage or data processing.
Advances in computing technologies have led to the advent of big data, which usually refers to very large quantities of data, usually at the petabyte scale. Using traditional data analysis methods and computing, working with such large (and growing) datasets is difficult, even impossible. (Theoretically speaking, infinite data would yield infinite information, which would render extracting insights or intelligence impossible.) In response, the relatively new field of data science uses machine learning (and other artificial intelligence (AI)) methods that allow for efficient applications of analytic methods to big data.
When "data" is used more generally as a synonym for "information", it is treated as a mass noun in singular form. This usage is common in everyday language and in technical and scientific fields such as software development and computer science. One example of this usage is the term "big data".
When used more specifically to refer to the processing and analysis of sets of data, the term retains its plural form.
This usage is common in the natural sciences, life sciences, social sciences, software development and computer science, and grew in popularity in the 20th and 21st centuries. Some style guides do not recognize the different meanings of the term and simply recommend the form that best suits the target audience of the guide. For example, APA style as of the 7th edition requires "data" to be treated as a plural form.
Knowledge is the awareness of its environment that some entity possesses, whereas data merely communicates that knowledge. For example, the entry in a database specifying the height of Mount Everest is a datum that communicates a precisely-measured value. This measurement may be included in a book along with other data on Mount Everest to describe the mountain in a manner useful for those who wish to decide on the best method to climb it. Awareness of the characteristics represented by this data is knowledge.
Data are often assumed to be the least abstract concept, information the next least, and knowledge the most abstract. In this view, data becomes information by interpretation; e.g., the height of Mount Everest is generally considered "data", a book on Mount Everest geological characteristics may be considered "information", and a climber's guidebook containing practical information on the best way to reach Mount Everest's peak may be considered "knowledge". "Information" bears a diversity of meanings that range from everyday usage to technical use. This view, however, has also been argued to reverse how data emerges from information, and information from knowledge. Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation. Beynon-Davies uses the concept of a sign to differentiate between data and information; data is a series of symbols, while information occurs when the symbols are used to refer to something.
Before the development of computing devices and machines, people had to manually collect data and impose patterns on it. With the development of computing devices and machines, these devices can also collect data. In the 2010s, computers were widely used in many fields to collect data and sort or process it, in disciplines ranging from marketing, analysis of social service usage by citizens to scientific research. These patterns in the data are seen as information that can be used to enhance knowledge. These patterns may be interpreted as "truth" (though "truth" can be a subjective concept) and may be authorized as aesthetic and ethical criteria in some disciplines or cultures. Events that leave behind perceivable physical or virtual remains can be traced back through data. Marks are no longer considered data once the link between the mark and observation is broken.
Mechanical computing devices are classified according to how they represent data. An analog computer represents a datum as a voltage, distance, position, or other physical quantity. A Computer represents a piece of data as a sequence of symbols drawn from a fixed alphabet. The most common digital computers use a binary alphabet, that is, an alphabet of two characters typically denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed from the binary alphabet. Some special forms of data are distinguished. A computer program is a collection of data, that can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some languages, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata, that is, a description of other data. A similar yet earlier term for metadata is "ancillary data." The prototypical example of metadata is the library catalog, which is a description of the contents of books.
Some of these data documents (data repositories, data studies, data sets, and software) are indexed in Data Citation Indexes, while data papers are indexed in traditional bibliographic databases, e.g., Science Citation Index.
Data accessibility. Another problem is that much scientific data is never published or deposited in data repositories such as . In a recent survey, data was requested from 516 studies that were published between 2 and 22 years earlier, but less than one out of five of these studies were able or willing to provide the requested data. Overall, the likelihood of retrieving data dropped by 17% each year after publication. Similarly, a survey of 100 datasets in Dryad found that more than half lacked the details to reproduce the research results from these studies. This shows the dire situation of access to scientific data that is not published or does not have enough details to be reproduced.
A solution to the problem of reproducibility is the attempt to require FAIR data, that is, data that is Findable, Accessible, Interoperable, and Reusable. Data that fulfills these requirements can be used in subsequent research and thus advances science and technology.
The term data-driven is a neologism applied to an activity which is primarily compelled by data over all other factors. Data-driven applications include data-driven programming and data-driven journalism.
|
|