A characteristic of data. Data may have several attributes.
Large, diverse, variable, and hard-to-manage volumes of data that grow at an increasing rate. It is often collected from publicly shared social media and websites, stored in computer databases, and analyzed with the help of specially designed software-as-a-service products. Examples include large-scale data on demographics and purchase history, which can be analyzed and broken down into actionable information such as patterns, trends, and correlations.
The collection of global servers that store data accessed through the internet, together with the software and databases that run on those servers. The cloud enables the on-demand availability of computer system resources, information, data storage (called cloud storage), and computing power around the world. Cloud servers of SaaS companies are managed by a third party (such as Google, Apple, Microsoft, or Amazon).
The models, policies, rules, standards, and tools that govern which data is collected and how it is acquired, stored, arranged, organized, transported, integrated, queried, and used in data systems and organizations. In other words, it is the framework for how information technology infrastructure (such as cloud storage, APIs, data streaming, cloud computing, and analytics) supports an organization’s data strategy. Good data architecture improves the agility, transparency, and security of data at an organization.
A data and metadata management tool that companies use to inventory and organize the data within their systems. It uses metadata — data that describes or summarizes other data — to create an informative and searchable inventory of all data assets in an organization. These assets can include structured (tabular) data, unstructured data, reports and query results, data visualizations and dashboards, machine learning models, and connections between databases.
Any employee given access to an organization’s proprietary information; a data analyst, data scientist, data steward, or other data professional with access to corporate data who searches through the organization’s available data assets to access and use data for analytical or business needs.
A collection of names, definitions, and attributes about data elements that are being used or captured in a database, information system or part of a research project. It describes the meaning and purpose of data elements within the context of a project and provides guidance on interpretation, accepted meanings, and representation.
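As a rough illustration of the definition above, the sketch below stores names, definitions, and attributes for a few data elements in a plain Python dictionary. The table, element names, types, and descriptions are all invented for this example; a real project would use its own elements and likely a dedicated tool.

```python
# Hypothetical entries for an illustrative "customers" table.
# Every element name, type, and description here is an assumption.
DATA_DICTIONARY = {
    "customer_id": {
        "type": "integer",
        "description": "Unique identifier assigned to each customer.",
        "allowed_values": "positive whole numbers",
    },
    "signup_date": {
        "type": "date",
        "description": "Date the customer created an account.",
        "allowed_values": "ISO 8601 dates (YYYY-MM-DD)",
    },
    "segment": {
        "type": "string",
        "description": "Marketing segment used for reporting.",
        "allowed_values": ["retail", "wholesale"],
    },
}


def describe(element):
    """Return a one-line, human-readable summary of a data element."""
    entry = DATA_DICTIONARY[element]
    return f"{element} ({entry['type']}): {entry['description']}"


print(describe("segment"))
```

Keeping the accepted values next to the description gives analysts one place to check how a field should be interpreted.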
The core ethical values, principles, ideals, and morals in the field of data science. An organization’s data ethics covers the responsible use, storage, and sharing of data. Organizations may outline data ethics practices in a code of conduct.
An emerging data management design for attaining flexible, reusable, and augmented data management (i.e. better semantics, integration, and organization of data) through metadata. Metadata drives the fabric design. Compared to traditional approaches, active metadata and semantic inference (i.e. the process by which new data is added to a dataset, created from the existing data) are key new aspects of a data fabric.
A collection of processes, roles, policies, standards, metrics, and accountabilities that ensure the effective and efficient use of data in an organization. It involves the process of managing the availability, integrity, and security of data throughout the data lifecycle and across departments of an organization. Good data governance ensures that data is consistent, trustworthy, timely, and does not get misused, enabling better decision-making, business planning, and compliance.
The systematic organization of data in a hierarchical form. It can be used to reflect the structure of the business organization (business unit data sources, sub-business unit sources, consolidated data, etc.).
The process of combining data from different sources into a single, unified view. Integration begins with the ingestion process, and includes steps such as cleansing, ETL (extract, transform, load) mapping, and transformation.
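The steps named above (cleansing, mapping, transformation) can be sketched on a toy scale. The two sources, their field names, and the `integrate` helper are hypothetical and exist only for this illustration.

```python
# Two hypothetical sources describing the same customers.
crm_rows = [
    {"customer_id": 1, "name": " Ada Lovelace "},
    {"customer_id": 2, "name": "Alan Turing"},
]
billing_rows = [
    {"cust": "1", "total_spent": "120.50"},
    {"cust": "2", "total_spent": "80.00"},
]


def integrate(crm, billing):
    """Merge both sources into a single view keyed by customer_id."""
    unified = {}
    for row in crm:
        # Cleansing: trim stray whitespace from names.
        unified[row["customer_id"]] = {"name": row["name"].strip()}
    for row in billing:
        # Mapping and transformation: 'cust' becomes the integer key,
        # and the amount is cast from text to a number.
        key = int(row["cust"])
        unified.setdefault(key, {})["total_spent"] = float(row["total_spent"])
    return unified


unified_view = integrate(crm_rows, billing_rows)
print(unified_view)
```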
The overall accuracy, completeness, and consistency of data. Data integrity also refers to how well the data complies with regulatory requirements and security standards.
A vast pool of raw data (unstructured, semi-structured, or structured) in an organization whose purpose is not yet defined, either at the company or department level. There are no limits on data sizes, files, sources, or structure; data is kept in its original format, and machine-to-machine data logs flow in real time.
A map of the data journey, from origination to consumption, which includes the data origin, what happens to it and where it goes over time. It involves the process of recording, visualizing, and understanding data as it flows from data sources to consumers, including all the transformations of data along the way. Details are necessary to provide compliance auditing, improve risk management, trace data errors to their root cause, and comply with internal policies and regulatory standards.
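One simple way to picture such a map is a graph that records, for each data asset, the assets it was derived from. The asset names and the `upstream` helper below are assumptions made for this sketch, not part of any particular lineage tool.

```python
# Hypothetical lineage graph: each asset lists the assets it is built from.
LINEAGE = {
    "sales_dashboard": ["sales_summary"],
    "sales_summary": ["orders_clean"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
}


def upstream(asset, lineage):
    """Return every asset the given asset depends on, nearest first."""
    seen = []
    queue = list(lineage.get(asset, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(lineage.get(parent, []))
    return seen


print(upstream("sales_dashboard", LINEAGE))
```

Walking the graph backward like this is how an error spotted in a report can be traced to the source where it originated.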
Often used as a general term to describe competencies when working with data, i.e. the ability to work with, analyze, communicate, and argue with data with an understanding of the data sources and constructs, analytical methods and techniques applied, as well as of the use case application and resulting business value or outcome.
The practices, policies, and procedures related to collecting, using, organizing, storing, and maintaining data in a secure, efficient, and cost-effective manner. Good data management is a critical part of information technology systems that helps people, organizations, and technologies use data optimally within the bounds of policies, regulations, and ethical considerations and informs strategic decision-making and actions within a company.
The process of matching fields from one database to another. It’s the first step to facilitate data migration, data integration, and other data management tasks.
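A minimal sketch of matching fields between two databases, assuming invented source and target field names. The mapping table and `apply_mapping` helper are hypothetical.

```python
# Hypothetical mapping from source field names to target field names.
FIELD_MAP = {
    "cust_nm": "customer_name",
    "dob": "date_of_birth",
}


def apply_mapping(record, field_map):
    """Rename mapped source fields to their target names.

    Fields without a mapping are dropped in this sketch.
    """
    return {
        target: record[source]
        for source, target in field_map.items()
        if source in record
    }


source_record = {"cust_nm": "Ada", "dob": "1815-12-10", "legacy_flag": 1}
print(apply_mapping(source_record, FIELD_MAP))
```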
A concept that describes the availability and usability of data in an organization; it refers to the ability and ease with which an organization or user can get data precisely where and when it is needed. Strong data mobility means that organizations can leverage data (manage it, analyze it, create reports, etc.) from any possible source (platform, software, database, etc.) in any form (structured, semi-structured, or unstructured), all in a timely manner.
The process of creating a visual representation of either a whole information system or parts of it to communicate connections between data points and structures. The goal is to illustrate the types of data used and stored within the system, the relationships among these data types, the ways the data can be grouped and organized, and its formats and attributes.
A set of tools and processes used to automate the movement and transformation of data between a source system and a target repository.
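The idea of automated movement and transformation can be sketched as three small functions chained together. Here the source system and target repository are ordinary Python lists, an assumption made to keep the example self-contained.

```python
def extract(source):
    """Read raw records from the source system (here, a list of strings)."""
    return list(source)


def transform(records):
    """Normalize records: trim whitespace, lowercase, and drop blanks."""
    return [r.strip().lower() for r in records if r.strip()]


def load(records, target):
    """Append the transformed records to the target repository (a list)."""
    target.extend(records)
    return target


def run_pipeline(source, target):
    """Run the three stages in order, source to target."""
    return load(transform(extract(source)), target)


warehouse = []
run_pipeline(["  Alpha", "", "BETA  "], warehouse)
print(warehouse)
```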
A dimension of data that reflects whether or not a data set or point will be adequate for a use case and/or for decision-making. Data quality comprises seven key dimensions (see “Seven dimensions of data quality”).
The ability and practice of sharing the same sets of data resources with multiple users or applications while maintaining data accuracy across all entities consuming the data.
A database that is maintained and used by one group or department which is not easily or fully accessible by other groups in the same organization.
The process of ensuring the accuracy and quality of data. It is implemented by building several checks into a system or report to ensure the logical consistency of input and stored data.
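The checks mentioned above might look like the following sketch, which tests a single order record for logical consistency. The field names and the two rules are hypothetical.

```python
from datetime import date


def validate_order(order):
    """Return a list of validation errors; an empty list means the order passed."""
    errors = []
    # Range check: quantities must be positive.
    if order.get("quantity", 0) <= 0:
        errors.append("quantity must be positive")
    # Logical consistency check: an order cannot ship before it is placed.
    ordered, shipped = order.get("order_date"), order.get("ship_date")
    if ordered and shipped and shipped < ordered:
        errors.append("ship_date precedes order_date")
    return errors


bad_order = {
    "quantity": 0,
    "order_date": date(2024, 1, 2),
    "ship_date": date(2024, 1, 1),
}
print(validate_order(bad_order))
```

Returning every failed check, rather than stopping at the first, makes it easier to report all problems with a record at once.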
The act of translating information, often by humans with the help of machines, into a visual context, such as a map or graph, to make data easier for people to consume, understand, and gather actionable insights from.
A central repository for structured, filtered data in an organization or department that has already been processed for a specific purpose (is currently in use). Data is stored in files and folders that help organize the data, provide a multidimensional view, and support strategic decision-making.
The practice of using data to inform leadership decisions at an organization.
Database management system
Software that handles the storage, retrieval, and updating of data in a computer system.
Data that has been attributed relevant characteristics and metadata required for a specific use. Data that has been defined is able to be processed or analyzed with data of similar or other characteristics.
Enterprise-wide business transformation enabled by digitization and digitalization. It can involve distinct digitalization projects, but goes beyond them to encompass all aspects of business where technology adoption leads to strategic business transformation, i.e. an entirely new market, customer, and business.
The process of taking otherwise manual, paper-based, or physical processes and making them digital, i.e., leveraging digital technologies to transform business operations. Digitalization often involves automation, increases process efficiency, and improves data transparency.
The process of taking analog information (text, images, sound, etc.) and encoding it for computers to store, process, and transmit. For example, converting handwritten text (or VHS video, paper photographs) into digital form is a digitization process.
Data that is used as evidence to support a specific belief or proposition. It can be analyzed, presented, converted, etc. to validate, prove, or disprove a specific position.
Key Performance Indicator (KPI)
A metric used to periodically track and evaluate performance toward the achievement of a specific objective or target.
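As a small worked example of tracking performance toward a target, the sketch below expresses monthly results as a percentage of a goal. The metric (monthly signups), the figures, and the target of 250 are invented for illustration.

```python
def kpi_attainment(actual, target):
    """Return progress toward the target as a percentage, to one decimal place."""
    if target == 0:
        raise ValueError("target must be non-zero")
    return round(100 * actual / target, 1)


# Hypothetical monthly signups measured against a target of 250 per month.
monthly_signups = {"Jan": 180, "Feb": 220, "Mar": 260}
TARGET = 250

report = {month: kpi_attainment(n, TARGET) for month, n in monthly_signups.items()}
print(report)
```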
A type of data that cannot be counted, measured or easily expressed using numbers, mostly reflected textually.
Information that can be counted or measured, in other words numerically recognized and analyzed.
A request input by a user and executed by a database management system (DBMS); it can either be a request for data results from the database or for action on the data, or for both.
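The two kinds of request described above can be shown with Python's built-in `sqlite3` module and an in-memory database. The `customers` table and its rows are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
db.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Alan", "Manchester")],
)

# A query that requests data results from the database.
london = db.execute(
    "SELECT name FROM customers WHERE city = ?", ("London",)
).fetchall()

# A query that requests an action on the data.
db.execute("UPDATE customers SET city = ? WHERE name = ?", ("Cambridge", "Alan"))
updated = db.execute(
    "SELECT city FROM customers WHERE name = ?", ("Alan",)
).fetchone()

print(london, updated)
```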
Relational Database Management System
The software used to store, manage, query, and retrieve data stored in a relational database, where the data points are related to one another. An RDBMS is an advanced version of a database management system where data is typically stored in the form of tables rather than files.
Semi-structured data is information that does not reside in a relational database but that has some organizational properties that make it easier to analyze. With some processing, it may be possible to store it in a relational database. Examples include an email or a text file.
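To illustrate data points that are related to one another across tables, the sketch below uses SQLite (through Python's `sqlite3` module) as a small stand-in for a relational system. The two tables and their contents are assumptions for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Alan');
    INSERT INTO orders (customer_id, amount) VALUES (1, 30.0), (1, 12.5), (2, 8.0);
    """
)

# The join follows the relationship between the tables:
# each order row points at the customer it belongs to.
totals = db.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
    """
).fetchall()

print(totals)
```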
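An email shows the idea well: its headers have organizational properties, while its body is free text. The sketch below parses a made-up message with Python's standard `email` package; the addresses and content are invented.

```python
from email import message_from_string

# A hypothetical raw message: structured headers, then a free-text body.
raw = (
    "From: ada@example.com\n"
    "To: alan@example.com\n"
    "Subject: Quarterly numbers\n"
    "\n"
    "Hi Alan, the figures look good. See you Monday.\n"
)

msg = message_from_string(raw)

# The headers can be pulled out as named fields, ready for a table row.
headers = {"from": msg["From"], "to": msg["To"], "subject": msg["Subject"]}

# The body has no fixed structure and stays as plain text.
body = msg.get_payload()

print(headers)
print(body)
```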
Seven dimensions of data quality
The key characteristics that define data quality. These include: i) accuracy (how well the information reflects reality/expectations), ii) completeness (how the information fulfills expectations of what’s comprehensive), iii) consistency (if the information stored in one place matches the relevant data stored in another), iv) timeliness (if the information is available when it is needed), v) conformity (if the information is in the correct format and follows business rules), vi) uniqueness/integrity (if there is only one instance where the information appears), and vii) coverage (if all possible information is captured with the right breadth and depth).
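Two of these dimensions, completeness and uniqueness, lend themselves to simple measurement. The sketch below scores them on a tiny invented data set; the records and the ratio-based scoring are assumptions, and other dimensions such as accuracy usually need an external reference to check against.

```python
# Hypothetical records: one missing email and one duplicated id.
records = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "alan@example.com"},
    {"id": 4, "email": "grace@example.com"},
]


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, field):
    """Number of distinct values divided by the number of rows."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)


print(completeness(records, "email"))
print(uniqueness(records, "id"))
```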
A software licensing and delivery model in which a software provider delivers an application to users on the internet via a website or app. Unlike traditional software products, SaaS software is licensed on a subscription basis (typically with a monthly or annual fee) and is centrally hosted. End users of SaaS products or platforms typically don’t have to undertake costly or lengthy upgrades to the solutions; since the software is typically cloud-based, upgrades are managed by the solution provider.
Systems that contain data to feed a data warehouse. This may include operational databases and other internal or external systems.
Data that is defined and formatted for a specific purpose. It follows a standardized format or well-defined structure; it conforms to a data model, follows a persistent order, is easily accessed by humans and programs (such as in machine learning), is quantitative, is searchable, and is typically stored in a database. Data is structured when similar data points are grouped in classes (i.e. have the same attributes), it is organized in fixed fields in a file or record, it has an explicit definition and meaning, and it has an identifiable structure (such as appearing in rows and columns).
System of record
An information storage system that is recognized as the original and authoritative source of data. A system of record may serve as a shared reference point for any person or system to confirm the origin of a data point or set.
Data, whether structured or unstructured, that has no associated metadata to contextualize it and designate it for a specific use. Undefined data exists but without any other necessary characteristics or attributes it is not useful for processing or analysis.
Data with no particular data model that cannot be processed and analyzed via conventional tools and methods. It is stored in its native format and is often qualitative rather than quantitative, such as documents, web pages, email, social media content, mobile data, images, audio, video, and more.