Secure Data Commons - Conducting Analysis

Conducting Analysis in SDC

As a data analyst, you can use the USDOT Secure Data Commons (SDC) to:

  • Share code and data with other analysts
  • Upload your own datasets
  • Export approved derived analyses

We'll provide you with a cloud-based workstation with preloaded programming environments and software that give you access to the data lake and data warehouse. The workstation also includes commercially available tools, so no local software or tool installation is needed!

Analytical Tools and Query Languages Supported

For experienced analysts, the SDC platform provides on-demand access to popular programming and statistical tool packages for cloud-based processing. Other, nonstandard software can be installed upon request, either for individual users or across user groups. For software that requires special licenses, analysts may provide their own existing licenses.

Analytical Tools

  • Power BI
  • Custom options available upon request

Query Languages

  • RStudio
  • Python
  • SQL
  • Custom options available upon request

Types of Datasets

The SDC platform provides a data lake of transportation-related structured, semi-structured, and unstructured datasets that are stored in raw, curated, and published formats. Each dataset has its own data agreement, based on the complexity and sensitivity of the data, and access to specific data is approved by the data providers. Learn more about the dataset formats below.

Dataset Formats

Raw Datasets

Unaltered data are stored in their native, "as-is" (raw) format. Uploads can be continuous through streaming sources (e.g., APIs or sensors) or one-time uploads from external sources. These data can be structured (databases, logs, financial data), semi-structured (HTML, XML, RDF, CSV), or unstructured (images, PDFs, Word documents). Raw data cannot be copied or exported.

Curated

Data curation is the process of integrating raw data collected from various sources, then annotating and presenting the data so that their value is maintained and they remain available for reuse and preservation. For researchers and data scientists, curated datasets enable data discovery and retrieval and maintain data quality. During the curation process, data are transformed from unstructured and semi-structured formats to structured formats, and data deduplication, obfuscation, and cleansing processes are conducted. The result is high-quality data from which researchers can draw meaningful insights.
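The curation steps named above (cleansing, deduplication, and obfuscation) can be sketched in a few lines of Python. This is only an illustrative sketch, not the SDC's actual curation pipeline: the field names (vehicle_id, speed_mph, road), the salt value, and the helper functions are all hypothetical and chosen purely for demonstration.

```python
import hashlib


def cleanse(record):
    """Trim whitespace from string values and drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            cleaned[key] = value
    return cleaned


def deduplicate(records):
    """Keep only the first occurrence of each identical record."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def obfuscate(record, field, salt):
    """Replace an identifying field with a truncated, salted SHA-256 digest."""
    result = dict(record)
    if field in result:
        payload = (salt + str(result[field])).encode("utf-8")
        result[field] = hashlib.sha256(payload).hexdigest()[:16]
    return result


def curate(records, id_field="vehicle_id", salt="example-salt"):
    """Cleanse, deduplicate, then obfuscate (hypothetical ordering)."""
    cleaned = [cleanse(r) for r in records]
    unique = deduplicate(cleaned)
    return [obfuscate(r, id_field, salt) for r in unique]


# Hypothetical raw records: two differ only by stray whitespace.
raw = [
    {"vehicle_id": " A123 ", "speed_mph": 42, "road": "I-95 "},
    {"vehicle_id": "A123", "speed_mph": 42, "road": "I-95"},
    {"vehicle_id": "B456", "speed_mph": 55, "road": ""},
]
curated = curate(raw)
```

In this sketch, cleansing runs before deduplication so that records differing only in whitespace collapse into one, and obfuscation runs last so that identical identifiers still map to the same digest. A real curation process would likely use stronger, policy-driven obfuscation than a fixed salt.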

Published

Researchers create published datasets to disclose their research and allow other users to verify and reuse the data beyond their original purpose. Published datasets are a result of combining analyses on curated datasets in the SDC platform with other datasets or algorithms owned or created by a researcher or data scientist.

What's Next?

As a data analyst planning to do analysis in the SDC, use the steps below to get started.

1. Download the access request form (PDF 77 KB), fill out the required details, and email it to sdc-support@dot.gov. Once approved, we will send you an email with instructions for accessing the platform.

2. Follow the instructions in the Welcome Email from the SDC. Review the Data Analyst User Guide (PDF 4.8 MB).

3. The Enablement Services Program offers custom upgrades to help your project team along the way.

Last updated with release 2.6 (June 12, 2020)