A Service to Parquet (Oracle)
Abstract
This project is intended to provide a cloud service offering a variety of tools that help Oracle clients store and process their data. A file conversion library eases the switch to Parquet, and data processing tools such as filtering, searching, and sorting can transform that same data and store it in the cloud. Users can access the service on the Oracle cloud to streamline their data analytics process.
Approach
Service: The core of the service provides a way for a client to communicate with the server. This was accomplished with Helidon, an open-source library from Oracle.
Library: After receiving a client request, the Helidon service calls the library portion, which provides all the functionality, including data-processing operations and cloud access. The library is built on the Apache Parquet library and the Oracle Cloud Infrastructure SDK.
Overview
As data sets grow, storage and data analytics require optimization. To analyze stored data, Oracle customers must extract, transform, and load it, a sequence known as the ETL process. Because this process is costly and inefficient, Oracle’s solution revolves around a file format called Apache Parquet. Designed as a columnar storage format, Parquet supports efficient and complex data processing. However, current solutions for handling Parquet are heavyweight. Oracle hopes to create a cloud service for Parquet that is both lightweight and easy to use in order to streamline the ETL process.
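To make the columnar idea concrete, the sketch below pivots a few in-memory rows into one value list per column and then sums a single column without touching the others. It is only an illustration of the layout, not the Parquet file format or the project's code; the class `ColumnarSketch`, its methods, and the sample fields (`id`, `region`, `amount`) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a column-oriented layout; not the Parquet format itself.
final class ColumnarSketch {

    // Builds one row with three illustrative fields (names are assumptions).
    static Map<String, Object> row(int id, String region, int amount) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("region", region);
        r.put("amount", amount);
        return r;
    }

    // Pivots rows into one value list per column.
    // Assumes every row carries the same set of columns.
    static Map<String, List<Object>> toColumns(List<Map<String, Object>> rows) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        for (Map<String, Object> r : rows) {
            for (Map.Entry<String, Object> cell : r.entrySet()) {
                columns.computeIfAbsent(cell.getKey(), k -> new ArrayList<>())
                       .add(cell.getValue());
            }
        }
        return columns;
    }

    // Sums a numeric column by reading only that column's values.
    static long sumColumn(Map<String, List<Object>> columns, String name) {
        long total = 0;
        for (Object value : columns.get(name)) {
            total += ((Number) value).longValue();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(row(1, "west", 10), row(2, "east", 32));
        Map<String, List<Object>> columns = toColumns(rows);
        System.out.println(columns.keySet());
        System.out.println(sumColumn(columns, "amount"));
    }
}
```

In a row-oriented layout the same sum would have to walk every field of every record, which is one reason a columnar format suits analytic queries that read only a few columns.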
Results
Clients who use our service have access to a variety of tools, including file conversion, row filtering, column filtering, and cloud access. Compared to current solutions, our service makes data processing simpler. Given our time constraints, we used Apache Drill as a shortcut to implement some functionality. Additionally, we did not reach our initial goal of deploying our service on the cloud.
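As a rough picture of what row filtering and column filtering mean, the sketch below applies both to a small in-memory table. It uses only the Java standard library and does not reflect how the project or Apache Drill implements these operations on Parquet files; the class `FilterSketch`, its methods, and the sample fields (`name`, `age`, `region`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory illustration of row and column filtering.
final class FilterSketch {

    // Builds one row with three illustrative fields (names are assumptions).
    static Map<String, Object> row(String name, int age, String region) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        r.put("age", age);
        r.put("region", region);
        return r;
    }

    // Row filtering: keep only the rows that satisfy the predicate.
    static List<Map<String, Object>> filterRows(
            List<Map<String, Object>> rows, Predicate<Map<String, Object>> keep) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> r : rows) {
            if (keep.test(r)) {
                out.add(r);
            }
        }
        return out;
    }

    // Column filtering: copy each row, keeping only the wanted columns.
    static List<Map<String, Object>> filterColumns(
            List<Map<String, Object>> rows, Set<String> wanted) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> r : rows) {
            Map<String, Object> projected = new LinkedHashMap<>();
            for (Map.Entry<String, Object> cell : r.entrySet()) {
                if (wanted.contains(cell.getKey())) {
                    projected.put(cell.getKey(), cell.getValue());
                }
            }
            out.add(projected);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
            row("alice", 30, "west"),
            row("bob", 12, "east"));
        // Keep rows with age >= 18, then keep only the "name" column.
        List<Map<String, Object>> adults =
            filterRows(rows, r -> ((Integer) r.get("age")) >= 18);
        System.out.println(filterColumns(adults, Set.of("name")));
    }
}
```

The two operations compose in either order, and neither modifies the input rows, so a client could chain them before converting or uploading the result.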
Benchmark
Benchmarking our native solution against the Drill implementation, we found that the native solution performed better. The improvement is more noticeable for smaller files.
Acknowledgements
Daniel Langerenken – Vishal Vaddadhi – David Hernandez