AI-based tools to support research data management

Posted on June 28, 2024 by Stephen Gray

As datasets grow in size and complexity, traditional methods of data management may be insufficient to meet the demands of modern research. This is where AI-based tools can come into play, offering a suite of powerful capabilities to streamline and enhance data management processes. These tools are designed to address a variety of challenges that researchers face, from data collection and cleaning to storage, analysis, and security. The integration of AI in research data management can enable researchers to focus on higher-level tasks such as hypothesis generation and theory building, whilst helping maintain scientific reproducibility. But despite these advantages, it is important to recognise that AI tools are not a panacea, and present both opportunities and threats to open research. For example, they require careful selection and implementation to address specific research needs, and the reliance on AI necessitates a degree of proficiency in data science, which might be a barrier for some researchers. There can also be concerns over data reuse, and questions about the motivations of major software developers. Nonetheless, we’ve noticed some AI-based software tools that seem to be achieving prominence. Below is a list of these. They are not intended as recommendations, but may provide a starting point for critical evaluation.

Finally, an interesting perspective on the use of AI in science was provided by UoB’s Pen-Yuan Hsing in his recent talk at the Reproducibility by Design symposium in Bristol on 26th June: “AI is not the problem – thinking about outcomes”.

Data Collection and Integration

Google Data Studio

Google Data Studio allows researchers to turn data into informative, easy-to-read, shareable, and customisable dashboards and reports. Its AI capabilities help integrate and visualise data from multiple sources.

Keboola

Keboola leverages AI to integrate various data sources, automate workflows, and ensure data consistency, aiding researchers in managing complex datasets.

Data Cleaning and Preparation

Trifacta

Trifacta uses AI to simplify data wrangling, helping researchers clean and prepare their data for analysis. It identifies patterns and anomalies.

Talend

Talend provides AI-powered data integration and data integrity solutions, allowing researchers to clean, transform, and govern data efficiently.

Data Storage and Management

Datalore by JetBrains

Datalore is an AI-driven collaborative data science platform that allows researchers to create, run, and share Jupyter notebooks in the cloud.

Azure Data Lake

Azure Data Lake provides a scalable and secure data storage solution, with AI capabilities to manage large datasets and perform big data analytics.

RapidMiner

RapidMiner uses AI to facilitate data mining, machine learning, and predictive analytics. It offers a visual workflow designer for data preparation, model building, and evaluation.

KNIME

KNIME Analytics Platform is open-source software that integrates various components for machine learning and data mining through modular data pipelining.

Using Synthetic Datasets to Promote Research Reproducibility and Transparency

Posted on May 10, 2024 (updated May 13, 2024) by Richard Westaway

By Dan Major-Smith

Scientific best practice is moving towards increased openness, reproducibility and transparency, with data and analysis code increasingly being publicly available alongside published research. In conjunction with other changes to the traditional academic system – including pre-registration/Registered Reports, better research methods and statistics training, and altering the current academic incentive structures – these shifts are intended to improve trust, reproducibility and rigour in science.

Making data and code openly available can improve trust and transparency in research by allowing others to replicate and interrogate published results. This means that the published results can be independently verified, and can even help spot potential errors in analyses, as happened in two high-profile examples. In these cases, because data and code were open, errors could be spotted and the scientific record corrected. It is impossible to know how many papers without associated publicly available data and/or code suffer from similar issues. Because of this, journals are increasingly mandating both data and code sharing, with the BMJ being a recent example. As another bonus, if data and code are available, readers can test out potentially new analysis methods, improving statistical literacy.

Despite these benefits and the continued push towards data sharing, many researchers still do not openly share their data. While this varies by discipline, with predominantly experimental fields such as Psychology having higher rates of data sharing, there is plenty of room for improvement. In the Medical and Health Sciences, for instance, a recent meta-analysis estimated that only 8% of research articles declared their data to be ‘publicly available’, with only 2% actually being publicly available. The rate of code sharing was even more dire, with less than 0.5% of papers publicly sharing analysis scripts.

~~~~

Although data sharing should be encouraged wherever possible, there are some circumstances where the raw data simply cannot be made publicly available (although usually the analysis code can). For instance, many studies – and in particular longitudinal population-based studies which collect large amounts of data on large numbers of people for long periods of time – prohibit data sharing in order to preserve participant anonymity and confidentiality, to protect sensitive data, and to ensure that only legitimate researchers are able to access the data.

ALSPAC (the Avon Longitudinal Study of Parents and Children; https://www.bristol.ac.uk/alspac/), a longitudinal Bristol-based birth cohort housed within the University of Bristol, is one such example. As ALSPAC has data on approximately 15,000 mothers, their partners and their offspring, with over 100,000 variables in total (excluding genomics and other ‘-omics’ data), it has a policy of not allowing data to be released alongside published articles.

These are valid reasons for restricting data sharing, but nonetheless are difficult to square with open science best practices of data sharing. So, if we want to share these kinds of data, what can we do?

~~~~

One potential solution, which we have recently embedded within ALSPAC, is to release synthetic data, rather than the actual raw data. Synthetic data are modelled on the observed data and maintain both the original distributions of the data (e.g., means, standard deviations, cell counts) and the relationships between variables (e.g., correlations between variables). Importantly, while the key features of the original data are maintained, the data are generated from statistical models, meaning that observations do not correspond to real-life individuals, hence preserving participant anonymity.
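To make the idea concrete, here is a deliberately simplified sketch in base R. It is not the procedure used for ALSPAC (that uses the ‘synthpop’ package, described in the ‘In depth’ section); the toy dataset, the variable names (age, score) and the simple parametric models are all assumptions made purely for illustration. Each variable is generated from a model fitted to the observed data, so the synthetic rows are new draws rather than copies of real records.

```r
# Toy illustration only: a simplified parametric sketch, not the ALSPAC/synthpop workflow.
set.seed(1)

# Hypothetical 'observed' dataset with two correlated variables
n <- 500
age <- rnorm(n, mean = 30, sd = 5)
score <- 10 + 0.5 * age + rnorm(n, sd = 3)
observed <- data.frame(age = age, score = score)

# Step 1: generate the first variable from its fitted marginal distribution
syn_age <- rnorm(n, mean = mean(observed$age), sd = sd(observed$age))

# Step 2: generate the second variable from a model conditional on the first,
# adding residual noise so the spread of the data is preserved
fit <- lm(score ~ age, data = observed)
syn_score <- predict(fit, newdata = data.frame(age = syn_age)) +
  rnorm(n, sd = summary(fit)$sigma)

synthetic <- data.frame(age = syn_age, score = syn_score)

# Compare distributions (means, SDs) and the relationship (correlation)
print(rbind(observed = sapply(observed, mean), synthetic = sapply(synthetic, mean)))
print(rbind(observed = sapply(observed, sd), synthetic = sapply(synthetic, sd)))
print(c(observed = cor(observed$age, observed$score),
        synthetic = cor(synthetic$age, synthetic$score)))
```

The means, standard deviations and correlation should be close but not identical between the two data frames, which is the trade-off described above: the key features are kept, while no synthetic row corresponds to a real individual.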

These synthetic datasets can then be made publicly available alongside the published paper in lieu of the original data, allowing researchers to:

Explore the raw (synthetic) data
Understand the analyses better
Reproduce analyses themselves

A description of the way in which we generated synthetic data for our work is included in the ‘In depth’ section at the end of this blog post.

While the synthetic data will not be exactly the same as the observed data, making synthetic data openly available does add a further level of openness, accountability and transparency where previously no data would have been available. Further, synthetic datasets can provide a reasonable compromise between the competing demands of promoting data sharing and open-science practices while maintaining control over access to potentially sensitive data.

Given these features, working with the ALSPAC team, we developed a checklist for generating synthetic ALSPAC data. We hope that users of ALSPAC data – and researchers using other datasets which currently prohibit data sharing – make use of this synthetic data approach to help improve research reproducibility and transparency.

So, in short: Share your data! (but if you can’t, share synthetic data).

~~~~

Reference:

Major-Smith et al. (2024). Releasing synthetic data from the Avon Longitudinal Study of Parents and Children (ALSPAC): Guidelines and applied examples. Wellcome Open Research, 9, 57. DOI: 10.12688/wellcomeopenres.20530.1 – Further details (including references therein) on this approach, specifically applied to releasing synthetic ALSPAC data.

Other resources:

The University Library Services’ guide to data sharing.

The ALSPAC guide to publishing research data, including the ALSPAC synthetic data checklist.

The FAIR data principles – there is a wider trend in funder, publisher and institutional policies towards FAIR data, which may or may not be fully open but which are nevertheless accessible even where circumstances may prevent fully open publication.

Author:

Dan Major-Smith is a somewhat-lapsed Evolutionary Anthropologist who now spends most of his time working as an Epidemiologist. He is currently a Senior Research Associate in Population Health Sciences at the University of Bristol, and works on various topics, including selection bias, life course epidemiology and relations between religion, health and behaviour. He is also interested in meta-science/open scholarship more broadly, including the use of pre-registration/Registered Reports, synthetic data and ethical publishing. Outside of academia, he fosters cats and potters around his garden pleading with his vegetables to grow.

~~~~

In depth

In our recent paper, we demonstrate how synthetic data generation methods can be applied using the excellent ‘synthpop’ package in the R programming language. Our example is based on an openly available subset of the ALSPAC data, so that researchers can fully replicate these analyses (with scripts available on a GitHub page).

There are four main steps when synthesising data, which we demonstrate below, along with example R code (for full details see the paper and associated scripts):

1. After preparing the dataset, create a synthetic dataset, using a seed so that results are reproducible (here we are just using the default ‘classification and regression tree’ method; see the ‘synthpop’ package and documentation for more information)

library(synthpop)
dat_syn <- syn(dat, seed = 13327)

2. To minimise the potential disclosure risk, when synthesising ALSPAC data we recommend removing individuals who are uniquely-identified in both the observed and synthetic datasets (in this example, only 4 of the 3,727 observations were removed [0.11%])

dat_syn <- sdc(dat_syn, dat, rm.replicated.uniques = TRUE)

3. Compare the variable distributions between the observed and synthetic data to ensure these are similar (see image below)

compare(dat_syn, dat, stat = "counts")

4. Compare the relationships between variables in the observed and synthetic data to check similarity, here using a multivariable logistic regression model to explore whether maternal postnatal depressive symptoms are associated with offspring depression in adolescence (see image below)

model.syn <- glm.synds(depression_17 ~ mat_dep + matage + ethnic + gender + mated + housing, family = "binomial", data = dat_syn)
compare(model.syn, dat)

As can be seen, although there are some minor differences between the observed and synthetic data, overall the correspondence is quite high.

GW4 Open Research Prize 2023: Theory of Change (from Research Culture blog)

Posted on May 10, 2024 by Richard Westaway

Read the new blog post by Christopher Warren, Assistant Research Support Librarian, about the GW4 Open Research Prize 2023 on The Research Culture Blog.

A Scholarly Works Policy for the University of Bristol

Posted on October 13, 2023 by Alex Clarke

A new Scholarly Works Policy was approved at the April meeting of Senate. Here we set out the reasons for the policy, what it does, and how it will work.

Why are we introducing this policy?

The University is committed to improving research culture and – as part of this – supporting and enabling open research practices. The ability to publish our research Open Access, ensuring free and unrestricted access to research outputs, is an essential part of this. Open Access has also become an expectation of research assessment exercises such as the REF, as well as a requirement of many funders (including UKRI and Wellcome).

Gold Open Access (paying publishers to publish the “version of record” Open Access via Article Processing Charges (APCs) and “transformative agreements”) is well established in many disciplines, but now green Open Access (self-archiving the author manuscript in an institutional repository) is becoming increasingly common.

The development of a robust green route to Open Access publishing promotes an inclusive research culture by making Open Access publishing available to all, regardless of academic position and current funding, and mitigates the risks of choosing to publish Open Access for individual researchers when navigating a complex publishing landscape. With most Russell Group Institutions implementing similar policies, it also strengthens our collective hand when negotiating with publishers for Open Access services.

The University’s new Scholarly Works policy uses the concept of “rights retention” to support authors in choosing to self-archive. With Rights Retention, authors can disseminate their work as widely as possible while also meeting funder and any future REF requirements.

What is rights retention?

Traditionally, publishers require that authors sign a Copyright Transfer Agreement, which hands the rights in the article to the publisher. Often the only way to access the article after publication is then to pay for it. Rights Retention is based on the simple principle that authors and institutions should retain some rights to their publications.

The policy provides a route for researchers to deposit their author accepted manuscript in our institutional repository, and, using a rights retention statement, both retain the rights within their work, and grant the University a licence to make the author accepted manuscript of their scholarly article publicly available under the terms of a Creative Commons Attribution (CC BY) licence.

What does this mean for researchers?

This policy should not involve a major increase in administrative burden for researchers. There will be very little change to researcher workflows – in fact, as part of the review of workflows Library Services is undertaking, there will be a reduction in the number of steps required for Pure submissions in many cases.

Library Services will be updating their webpages, guidance, training and instructional videos so that researchers can feel confident about using this policy. If you have questions, comments or feedback, please get in touch, as this could help shape the guidance. You can contact us by emailing lib-research-support@bristol.ac.uk.

The Uncertain Space: a virtual museum for the University of Bristol

Posted on September 25, 2023 (updated September 27, 2023) by Catherine Dack

The Uncertain Space is the new virtual museum for the University of Bristol. It is the result of a joint project between Library Research Support and Cultural Collections, funded by the AHRC through the Capability for Collections Impact Funding, which also helped fund the first exhibition.

The project originated in a desire to widen the audience to some of the University’s collections, but in a sustainable way which would persist beyond the end of the project. Consequently, The Uncertain Space is a permanent museum space with a rolling programme of exhibitions and a governance structure, just like a physical museum.

The project had two main outcomes: the first was the virtual museum space and the second was the first exhibition to be hosted in the museum. The exhibition, Secret Gardens, was co-curated with a group of young Bristolians aged 11-18, and explores connections between the University’s public artworks and some of the objects held in our rich collections.

Entrance to the Secret Gardens exhibition

The group of young people attended a series of in-person and online workshops to discover their shared interests and develop the exhibition. The themes of identity, activism and environmental awareness came through strongly and these helped to inform their choice of items for the exhibition.

Choosing items from Special Collections for the exhibition

Objects, images and audiovisual clips, to link with each of the public artworks, were selected from the Theatre Collection, Special Collections, the Botanic Gardens and from collections held in the Anatomy, Archaeology and Earth Sciences departments. For some of the choices, digital copies already existed, but most of the items had to be digitised by photography or by scanning, using a handheld structured light scanner. The nine public artworks were captured by 360 degree photography. In addition, the reactions of the young people were recorded as they visited each of the public artworks and these are also included in the exhibition.

Scanning a piece of malachite for the first exhibition

As the virtual museum was designed to mimic a real-world exhibition, the University of Bristol team and the young people worked with a real-world exhibition designer, and it was found that designing a virtual exhibition was a similar process to designing a real-world exhibition. Some aspects of the process, however, were unique to creating a virtual exhibition, such as the challenges of making digital versions of some objects. The virtual museum also provides possibilities that the real-world version cannot, for example the opportunity to pick up and handle objects and to be transported to different locations.

Towards the end of the project, a second group of young people, who were studying a digital music course at Creative Youth Network, visited the virtual museum in its test phase and created their own pieces of music in response. Some of these are included in a video about the making of the museum.

The museum and first exhibition can be visited on a laptop, PC or mobile device via The Uncertain Space webpage, by downloading the spatial.io app onto a phone or VR headset, or by booking a visit to the Theatre Collection or Special Collections, where VR headsets are available for anyone to view the exhibition.

We are looking forward to a programme of different exhibitions to be hosted in The Uncertain Space and are interested in hearing from anyone who would like to put on a show.

You can read more about the making of The Uncertain Space and its first exhibition from our colleagues in Special Collections and Theatre Collection:
Our collections go virtual!
Digitising for the new virtual museum: The Uncertain Space

Case Study: Library Research Metrics Service

Posted on April 3, 2023 by Zosia Beckles

This is the first of a series of qualitative case studies exploring the work and impact of Library Research Support activities and services. This case study focuses on the Library Research Metrics Service.

What we do

The Library Research Metrics Service provides support to individuals with research metrics queries, via training on a range of research metrics platforms, and education and outreach to ensure the university’s commitments to responsible use of research metrics are upheld. This is designed to complement support offered by the Department of Research, Enterprise and Innovation’s Research Information and Evaluation team which has a wider remit covering strategic research intelligence and support for large grant bids.

As well as an email enquiry service and web guidance, the Library Research Metrics Service provides training via online workshops, open to all academics and postgraduate researchers. These serve as an introduction to the concept of citation metrics and alternative metrics, what they can and cannot be used for, the principles of responsible metrics, and the importance of data accuracy – including how this may be improved through the use of ORCID researcher identifiers. Sessions also include live demonstrations on the tool, platform, or process of attendees’ choice: for example, how to create bespoke reports in SciVal, how to find alternative metrics in Scopus or Altmetric Explorer, or how to clean up author profiles in Scopus and other bibliographic databases.

Outreach activities are a key part of the support service; currently the ORCID promotion campaign is the main focus for outreach activities. This campaign seeks to increase ORCID signup rates among research staff and PGRs, which with support from Faculty Research Directors will be achieved in a variety of ways:

Direct communication with the small subset of researchers that have an ORCID but have not fully synchronised it to their Pure profile
Talks at School assemblies and other relevant gatherings
PGR-led promotion activities
Passive communication via posters and banners in key locations
Active encouragement via a prize draw for new ORCID signups

Enquiry types

The email enquiry service receives a range of enquiry types: primarily these relate to 1) use of specific metrics platforms, 2) requests for metric support for grant or promotion bids, and 3) queries about the use of metrics to support decisions on journal choice. Often a large part of the response to these enquiries is educational rather than direct provision of the resources requested. For example, both DORA and the University’s own statement on Responsible Research Evaluation state that research outputs must be considered on their own merits rather than the reputation or ranking of the journal or publisher. Therefore, a significant part of enquiry work is responding sensitively to researchers with these types of queries, to explain why metrics may not necessarily be helpful in making these decisions and to signpost to alternative tools and methods for journal selection. There are some instances where specific metrics can be useful: for example, establishing proportions of article types published in a given journal to identify titles most likely to be receptive to submission of similar manuscripts. In these instances, the Library Research Metrics Service will demonstrate how these metrics can be obtained or provide bespoke reports.

Another common query category comes from researchers who are finding unexpected results when seeking metrics data on their own publications: typically, missing publications or missing citations. Support in these instances usually takes two formats: 1) an investigation into and explanation of any data inaccuracies and suggestions for how these may be addressed, and 2) education on the limitations of metrics platforms – which is particularly relevant for researchers working in disciplines that are not covered well by the main bibliometrics platforms (arts, humanities, and those working in languages other than English, to name a few).

Outcomes and next steps

Responses to these education and outreach activities have largely been positive, with researchers praising the service for providing “really helpful” information. Certain departments or units are frequent flyers to the service – for example ALSPAC – but generally users tend to have a single query only. It remains to be seen whether the raised profile of the Library Metrics Service provided by the ORCID promotion campaign will result in a larger volume of enquiries. In future, workshops will be run in person as well, and online workshops will be provided asynchronously to enable wider uptake.

Shiny shells and steamships: an experiment in phototexturing a 3D model.

Posted on October 12, 2022 by Catherine Dack

In the Library Research Support team we have quite a bit of experience of 3D scanning and of photogrammetry, but have never tried combining digital photographs with scan data to make a ‘photorealistic’ 3D model.
When we were asked to scan a large, engraved shell belonging to the Brunel Institute, we decided it was time to give it a go, using our Artec Space Spider structured light scanner and the ‘phototexturing’ function in Artec Studio 16. This phototexturing option allows photographs of the object to be combined with the digital model to improve the model’s textures and produce a more photorealistic result.

The shell in question has a shiny surface and is engraved with text and images, including depictions of the SS Great Britain and Omar Pasha, an Ottoman Field Marshal and governor. Shiny surfaces can be problematic when scanning, but we dialled up the sensitivity of the scanner a bit and encountered no difficulties. We were also concerned that the very low relief engravings would not be discernible in the final model, which did indeed prove to be the case.

We were careful to capture both scans and photographs under the same conditions, scanning one side of the shell and then, without moving it, taking photographs from every angle before turning it over to scan and photograph the underside.

When processing the scan data, the main difficulty was fixing a large hole in the mesh which occurred in the cavity of the shell where the scanner had not been able to capture data. Because of the complex geometry, Artec Studio’s hole-filling options simply covered the hole with a large blob. Therefore, we used the bridge function to link opposite edges of the large hole and subdivide it into smaller ones, which could be filled with a less blobby result. We then used the defeature brush and the smoothing tool to reduce flaws. The result is not an accurate representation of the inside of the shell, but gives a reasonable impression of it and, without any holes in the mesh, the model can be printed in 3D.

Adding texture from the photographs was simply a matter of importing them in two groups (photos of the top and photos of the underside) and matching them to the fusion. A handful of photographs couldn’t be matched but there was enough overlap between the other photographs to complete the texture. The phototextured model does show some shadows as we were not using dedicated lights, but there is significant improvement in the resolution and in the visibility of the engravings.

The shell before phototexturing, showing texture captured by the scanner.

When we came to experiment with printing the model, we found there was not enough 3D geometry to reproduce the engravings, though we had avoided simplifying the mesh during processing. As the faint engravings on the shell are mostly visible through discolouration, we think that 3D printing in colour would be a good solution and the Brunel Institute are also considering other possibilities, such as engraving directly onto a 3D print. We look forward to seeing the result of their chosen solution.

Engraved shell by bris-dhsupport on Sketchfab

More on finding open access research

Posted on February 17, 2022 (updated October 12, 2022) by Catherine Dack

The library has subscribed to two services that will help you to find open access articles, as well as those subscribed to by the library.

LibKey Nomad is a browser extension that will connect you to full-text articles that are either available via a University of Bristol Library subscription or open access. Read more on the Library webpages.

LibKey.io allows you to access journal articles which are available either by library subscription or open access, by using either a digital object identifier (DOI) or a PubMed identifier (PMID). More information is on the Library webpages.

See also our previous post on finding open access articles.

Finding Open Access Research

Posted on February 5, 2021 by Alex Clarke

It has become common practice for researchers to make a copy of their research articles available for free online. Many of these ‘Open Access’ papers are held in institutional or subject repositories – which can make them challenging to find. However, there are several useful tools designed to make this a lot easier.

Useful Open Access Resources

CORE

CORE aggregates the Open Access full text content of many Open Access repositories, including PubMed Central, so that you can search and read it all in one place.

Searching here will help you find many articles that you can open and read for free. CORE also contains electronic PhD theses and other works that are hard to find elsewhere.

EndNote Click

EndNote Click is an extension for your internet browser that quickly tells you if you have access to a version of a journal article that you are looking at. It detects when you are looking at an article’s page and if you have access, either through your library’s subscriptions or through an Open Access version, it will provide a link to the document.

This is generally the most convenient way to find Open Access work if you’re used to searching academic journals and databases. The extension will work in Google Chrome.

Unpaywall

Unpaywall is another useful browser extension. It adds an icon to the right-hand side of any page where it detects an academic article. The icon indicates whether there is an Open Access version available and clicking it will take you to the appropriate document.

Unpaywall draws on slightly different sources to EndNote Click (formerly Kopernio), but does not check if you have access through your university. It may be helpful to install both. The extension will work in Google Chrome and Firefox.

Open Access DOIs

If you’re familiar with DOI numbers, then you know that you can use them to link to articles (e.g. http://doi.org/10.1038/ng.3260). However, this will usually only link you to the publisher’s version, which might try to charge you for access. If you use the Open Access DOI format instead (http://oadoi.org/10.1038/ng.3260), you can create a link to an Open Access version of the article, if one is available.

This is a good way to find out if there is an Open Access version. It’s also a good way to share an Open Access paper with someone else who might not have access to the publisher’s version.
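If you have a list of DOIs to share, the two link formats above can be built with a few lines of R. This is only a sketch: the helper name doi_links is made up for illustration, and the URL patterns are simply the two shown in the example above.

```r
# Hypothetical helper: build the publisher link and the Open Access link for a DOI.
# The two URL patterns are taken from the examples in this post.
doi_links <- function(doi) {
  # Accept a bare DOI, a "doi:" label, or a full doi.org link
  doi <- sub("^(https?://(dx\\.)?doi\\.org/|doi:)", "", trimws(doi), ignore.case = TRUE)
  c(
    publisher   = paste0("http://doi.org/", doi),
    open_access = paste0("http://oadoi.org/", doi)
  )
}

# Example using the DOI from this post
print(doi_links("10.1038/ng.3260"))
```

The function only builds the links; whether the Open Access link leads anywhere still depends on an Open Access version of the article being available.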

DOAJ (Directory of Open Access Journals)

DOAJ curates a list of Open Access journals across a range of subjects. If you want to find Open Access journals within your discipline, this is a good place to look. You can also use their search function to find resources from across their database of journals.

They provide criteria for good practice in Open Access journals and can be a useful place to check the quality of a new Open Access journal that you weren’t previously aware of. Inclusion in DOAJ implies that the journal follows their principles and is therefore likely to be a reputable source.

DOAB (Directory of Open Access Books)

DOAB is a collection of Open Access books from a range of subjects and publishers. It is a good place to search if you are looking for more in depth Open Access materials and is a useful companion to a DOAJ or CORE search.

Ethos

The Electronic Theses Online project run by the British Library collects electronic theses from UK universities and makes them available through Ethos. You can search Ethos to find results from a large collection of PhD theses. The search may also return works that are currently under an embargo, but you can limit your search to Open Access resources if necessary.

Pandemic Publication Panic – what to do when you need to publish your data from home

Posted on October 26, 2020 (updated July 28, 2022) by christopher.warren

Tl;dr – If you don’t have time to read the full post, here are three things you can do now which will speed up the process. This shouldn’t take you more than 30 minutes to set up, and will probably take a lot less. (If you’re a PI and don’t already have an RDSF account, you’ll need to set one up first.)

If you’re a Data Steward of a project with multiple users, nominate Deputy Data Stewards so you can delegate some of the duties (creating a record, associating data, tidying up files). You can have two deputies, and ACRC can do this for you.
Install and set up the University’s VPN so you can access the network securely via Single Sign-On.
Map your project as a drive on your computer, so you can see all of your folders and files. Here’s how to do it on a Windows machine.

That’s it. If you want to know why you need to do this, read on!

‘Help! We’re mid-pandemic, I’m at home, and I need to publish my data! I need a DOI for a paper!’ We’ve heard this a lot over the past few months, and there are some issues which keep cropping up. So, we thought we’d take a moment to give you a rough guide on what to do when you have PPP (Pandemic Publication Panic).

As with so much else this year, COVID-19 has brought a huge change in working practices. From March 2020, the majority of research has been carried out at home. Research and professional services staff are all working from home where possible, and only on-campus when necessary.

For researchers, it affects the way you work with and store data, which in turn affects your workflow for publishing and sharing data.

To publish data in the repository, data first needs to be in the Research Data Storage Facility (RDSF), so we can copy it across. But how do you get it in there if you are working remotely?

Hang on – RDSF v data.bris repository – what’s the difference?

The Research Data Storage Facility (RDSF) is where you STORE data. The Research Data Repository (data.bris) is where you SHARE data.

The RDSF is a secure, private, University storage facility, designed specifically for research data. It’s so secure and private, users can only access it through the University’s network. You can store sensitive data there, but it needs to be encrypted first. The Advanced Computing Research Centre manages the RDSF.

data.bris, AKA the Research Data Repository, is a public gateway to research data, with citeable, Google-indexed Digital Object Identifiers (DOIs) for datasets. Data are shared with data.bris staff, and we check and publish the dataset with a publicly accessible dataset record – data are either available on demand as Open Data or after an agreement has been signed if there are access restrictions. The Research Data Service (that’s us) manages data.bris.

So, once you’ve decided what you want to share, you put it in a designated folder in your RDSF project (it has ‘Data-Bris’ on it) and you share it with us. But to do that, you need to get into your RDSF folders.

Accessing the University of Bristol network

On-campus, University computers link to the RDSF through the network – either through hardwired desktop machines or via the VPN software or hub cables on NWOW laptops. At home, you will need to establish a secure connection to access the RDSF.

There are three ways of doing this, but for ease and speed, we recommend using the VPN:

Virtual Private Network

The VPN puts your computer on the University network and allows access to a number of services including the RDSF. University-managed Windows computers have the VPN software already installed. The VPN is often needed to access files on Filestore when using a managed University laptop. Details are available on how to install the VPN on your Windows, Mac and Linux machines and on iOS and Android devices.

If you’re using a personal device, please follow instructions on the IT website.

Setting up the VPN is a simple 2-stage process for Windows or Mac:

i. download the BIG-IP Edge Client app, to install on your machine (you only need to do this once)
ii. start the app and log in to open the VPN and access the network

Publishing data

Once on a secure connection, you can access the RDSF, and deposit data for publication in data.bris. We’ve got a short video which shows the publication process.

We’ve also got a webpage with more detailed instructions on the publication process.

In particular we’d highlight our short Data preparation rules, and what to include in your essential Readme.txt file to help anyone encountering your data for the first time to ‘unpick’ your data, access your file formats, check your data sources and understand your file naming choices – basically make understanding your data as simple and supported as you can!
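If it helps to have something to start from, here is a rough sketch that writes an empty Readme.txt skeleton into a folder. The section headings are hypothetical, taken only from the points mentioned above (what the data is, file formats, data sources, file naming); they are not an official data.bris template, so check the Data preparation rules for what is actually required.

```python
from pathlib import Path

# Hypothetical headings based on the points mentioned in this post;
# not an official data.bris template.
SECTIONS = [
    "What this dataset contains",
    "File formats and software needed to open them",
    "Data sources",
    "File naming conventions",
]


def readme_skeleton(title: str) -> str:
    """Build the text of a Readme with one empty section per heading."""
    lines = [title, "=" * len(title), ""]
    for heading in SECTIONS:
        lines += [heading, "-" * len(heading), "TODO", ""]
    return "\n".join(lines)


def write_readme(folder: Path, title: str) -> Path:
    """Write Readme.txt into the given folder and return its path."""
    path = folder / "Readme.txt"
    path.write_text(readme_skeleton(title), encoding="utf-8")
    return path
```

For example, `write_readme(Path("."), "My dataset")` would create a Readme.txt in the current folder, ready for you to replace each TODO with a few plain sentences.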

And that’s it. As soon as you have ‘requested publication’ we can look at the dataset.