A Brief Introduction to Data Portals
A crucial tool for any organization, data portals perform a range of functions, from providing an easily searchable catalog of your data to enabling data visualization and enhancement. This article is a must-read for anyone looking to unlock their data’s potential, from NGOs to the Fortune 100.
Photo by Matthew T. Rader on Unsplash
# What is a data portal?
A data portal is software that catalogs datasets. There are two main types of data portal: open data portals for sharing public data, and internal data portals for sharing data within an organization. They serve as a single “point of truth” for an organization’s data or for data relating to a certain topic. Along with basic catalog features, data portals can incorporate an extensive range of functionality for organizing, structuring and presenting data.
# Background
The rise of data portals reflects the increase in the volume and variety of data being collected by organizations. For governments, this could be data on tax, crime and geolocations; for enterprises, it could be data on sales, customer preferences and costs. Even the simplest of organizations may have dozens of data assets, ranging from cloud spreadsheets to web analytics. Large organizations, meanwhile, can have very complex data arrangements, ranging from Hadoop clusters and data warehouses to CRM systems. The more data you collect, the more robust your storage needs to be and the more sophisticated your system for managing it.
# Why are data portals useful?
Data portals have five main functions. These are listed below.
# 1. Data discovery
Organizations wanting to get the most out of their data and use it to drive fact-based decision-making first need to overcome a basic obstacle: working out what data they actually own. Without a data portal, they might have to rely on word of mouth, or on calling around the office to ask whether anyone knows the whereabouts (or even the existence) of a certain dataset or file.
# 2. Data access
Common metadata, data showcases and data APIs make data quickly and easily accessible to technical and non-technical users alike. Data previews also let users check whether a dataset is what they are looking for without having to open it.
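The discovery and access functions above can be sketched as a tiny in-memory catalog. This is only an illustration of the idea, not how any particular portal is implemented: the `DatasetRecord` class, its field names, and the `search` and `preview` helpers are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Metadata for one catalogued dataset (all field names are illustrative)."""
    name: str
    title: str
    description: str
    tags: list[str] = field(default_factory=list)
    rows: list[dict] = field(default_factory=list)  # the data itself, kept tiny here


def search(catalog: list[DatasetRecord], query: str) -> list[DatasetRecord]:
    """Return records whose title, description or tags contain the query (case-insensitive)."""
    needle = query.lower()
    return [
        record for record in catalog
        if needle in record.title.lower()
        or needle in record.description.lower()
        or any(needle in tag.lower() for tag in record.tags)
    ]


def preview(record: DatasetRecord, limit: int = 2) -> list[dict]:
    """Return the first few rows so a user can judge the data without opening all of it."""
    return record.rows[:limit]


# A made-up catalog with two datasets.
catalog = [
    DatasetRecord(
        name="sales-2023",
        title="Sales 2023",
        description="Monthly sales totals by region",
        tags=["sales", "finance"],
        rows=[
            {"month": "Jan", "total": 100},
            {"month": "Feb", "total": 120},
            {"month": "Mar", "total": 90},
        ],
    ),
    DatasetRecord(
        name="web-analytics",
        title="Web analytics",
        description="Page views per day",
        tags=["marketing"],
    ),
]

matches = search(catalog, "sales")
```

Here `search(catalog, "sales")` finds the sales dataset through its shared metadata, and `preview(matches[0])` returns its first two rows, which is the kind of quick look a data preview gives before a user commits to downloading a file.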
# 3. Data lineage
Without a portal, it is easy to forget or lose track of data created a long time ago, by colleagues who have since left, or even just by other colleagues in the office. If you can’t locate the data, you might assume it doesn’t exist, reinvest in collecting it and have your data engineers re-transform it, which is a costly and time-consuming process.
# 4. Data integration
Often, organizations keep their data across different systems, devices and clouds. This means that data becomes ‘siloed’, i.e. inaccessible to certain people or devices, leading to cumbersome document sharing across departments or staff. A data portal brings these sources together in a single catalog. Some data portals can also take data directly from the web, transform it into the correct format, and include it in the portal. This lets you integrate public data with your own.
# 5. Data visualisation & analysis
One of the key motivations behind organising data is that you can use it to generate insight. Some data portals allow you to create graphs or other visual tools to monitor and analyse patterns or anomalies.
# Using your portal as a scaffold
With your data standardised and uniformly accessible, you can start to discover new purposes for it. That is to say, aside from helping you get your data organized, data portals act as scaffolds to start building with your data. Much of the new value for data comes from unexpected or unplanned applications that are made possible by combining existing data from across divisions and systems.
You can also use data portals to start applying principles of progressive enhancement to your data. Only once your data is standardised and uniformly accessible can you begin to enrich it. This may involve adding data dictionaries (e.g. the location column refers to cities), data mappings (e.g. using the city as a look-up against a different dataset tells us that Sydney is a city in Australia, along with other information about Sydney), and data validation (e.g. are these locations correct?).
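The three kinds of enrichment described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `DATA_DICTIONARY` and `CITY_LOOKUP` tables, the `enrich` function and the sample rows are all made up for the example, and a real portal would draw them from reference datasets rather than hard-coded values.

```python
# Data dictionary: documents what each column means (hypothetical column name).
DATA_DICTIONARY = {
    "location": "City the record refers to",
}

# Data mapping: a look-up table from city name to further information.
# The entries are illustrative, not a complete reference set.
CITY_LOOKUP = {
    "Sydney": {"country": "Australia"},
    "Berlin": {"country": "Germany"},
}


def enrich(row: dict) -> dict:
    """Add the country from the look-up and flag whether the location is recognised."""
    info = CITY_LOOKUP.get(row["location"])
    enriched = dict(row)  # copy, so the original row is left untouched
    enriched["country"] = info["country"] if info else None
    enriched["location_valid"] = info is not None  # data validation
    return enriched


# One correctly spelled city and one misspelling (made-up sample rows).
rows = [{"location": "Sydney"}, {"location": "Sydny"}]
enriched_rows = [enrich(row) for row in rows]
```

The first row gains `country: "Australia"` and passes validation; the misspelled second row has no match in the look-up, so it is flagged with `location_valid: False` and can be reviewed rather than silently kept.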
# What type of data portal is CKAN?
CKAN is the leading data portal software. It is usable both ‘out of the box’ and as a powerful framework for creating more tailored systems. CKAN’s combination of an open-source codebase and enterprise support makes it uniquely attractive for organizations looking to build customized, enterprise-grade solutions.
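As a rough illustration of building on CKAN, the sketch below constructs a search request for CKAN's HTTP Action API (the `package_search` action with its `q` and `rows` parameters) and pulls dataset titles out of a response. The helper names `package_search_url` and `dataset_titles` are hypothetical, the portal address is only a placeholder, and the sample response is hand-written to mirror the general shape of a CKAN reply rather than captured from a live portal.

```python
from __future__ import annotations

from urllib.parse import urlencode


def package_search_url(portal_url: str, query: str, rows: int = 5) -> str:
    """Build the URL for a dataset search against a CKAN portal's Action API."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response: dict) -> list[str]:
    """Extract dataset titles from a decoded package_search JSON response."""
    if not response.get("success"):
        return []
    return [dataset["title"] for dataset in response["result"]["results"]]


# Placeholder portal address; substitute your own CKAN instance.
url = package_search_url("https://demo.ckan.org/", "crime statistics")

# Hand-written sample response (an assumption for illustration, not real data).
sample_response = {
    "success": True,
    "result": {
        "count": 1,
        "results": [{"name": "crime-2023", "title": "Crime statistics 2023"}],
    },
}
titles = dataset_titles(sample_response)
```

To query a live portal, the URL could be fetched with any HTTP client and the JSON body passed to `dataset_titles`; the network call is left out here so the example runs offline.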
# FAQs
# Are data portals always either open or closed?
No. A data portal need not be strictly open or closed; it can fall somewhere in between. Some organizations, particularly those in research or philanthropy (i.e. those wanting to help others with their data), might use a portal for internal data management while allowing external organizations to search certain datasets.
Want a data portal from the people who built CKAN, the world’s leading data management system and the world’s first portals for publishing open data such as data.gov and data.gov.uk? Contact us!
We empower government and enterprise to unlock their data’s potential through outstanding data management strategy and implementation. Check our website for more information.
© Datopian (CC Attribution-Sharealike (by-sa)).
graphic design: Monika Popova