Microservice environments with databases often grow into architectures so complex that requirements can no longer be met. This talk shows how to run a scalable stack with persistent data storage based on Docker, and how that leads to fewer grey hairs on the Ops team.
Getting the most out of your containerized database
1. Containerized DBs
In a Machine Data Environment
(or how you get the most out of your containerized database)
DevOps Gathering, 24th March 2017
@claus__m
2. About
~2yrs at Crate.io
DevRel/Field Engineering/Support/Integrations/…
Crate.io
Founded in 2013, ~25 people and growing
Offices
San Francisco, Berlin, Dornbirn (AT)
Talk to me about
Rust, Raspberry Pis, Tech!
4. Source: HPE Jun 2016
http://www.slideshare.net/penumuru/harness-the-power-of-big-data-with-oracle-63438438/9
5. Machine Data
Characteristics
Millions of data points/second
Streaming in from sensors, devices, logs, etc.
Data diversity
Structured & unstructured JSON, Blobs
Real-time query performance
Monitoring & alerting
Complex queries of big data volumes
With Terabytes of historic data
Growth
Adding sources often means exponential growth
6. Machine Data
Internet of Things
Sensors, cameras, ...
Wearables, Gadgets
Location data, interaction data, ...
Logs & Monitoring data
Component health monitoring, access logs, ...
Industry 4.0, Digitization
Production line insights, automation, ...
Vehicles
Location data, health data, ...
7. Clickdrive.io
Fleet management & vehicle tracking
Vehicle health and tracking data
High ingest rate
2,000 data points per car, per second
In-depth & real-time analysis
Predictive maintenance, accident reconstruction, route/driver efficiency
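To put the ingest figure above in perspective, here is a small arithmetic sketch. The rate of 2,000 data points per car per second comes from the slide; the fleet sizes are hypothetical and only there for illustration.

```python
# Figure from the slide: each car emits 2,000 data points per second.
POINTS_PER_CAR_PER_SECOND = 2_000
SECONDS_PER_DAY = 86_400


def fleet_ingest_rate(cars: int) -> int:
    """Data points per second for a fleet of `cars` vehicles."""
    return cars * POINTS_PER_CAR_PER_SECOND


def points_per_day(cars: int) -> int:
    """Data points accumulated over one day of continuous driving."""
    return fleet_ingest_rate(cars) * SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical fleet sizes, not taken from the talk.
    for cars in (1, 100, 1_000):
        print(cars, fleet_ingest_rate(cars), points_per_day(cars))
```

Even a modest fleet reaches the "millions of data points/second" range mentioned under Machine Data.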
8. Roomonitor
Smart apartments
Monitoring & control of climate, occupancy, noise, access
Better efficiency, safer environment
Alerts: AC/heating on with window open, noisy neighbors, ...
9. Skyhigh Networks
Cloud access security broker (CASB)
Access logging for cloud services
Large data volumes & ingest
Billions of events per day from 600+ customers, 10s of thousands of concurrent TCP connections
Machine data is the fingerprint of fraud
Unsupervised learning to find anomalies
15. Go Live
More users!
More sensors and users
Data storage
Slow and fast
Monitoring & Analytics
Two different subsystems
(Diagram: LOAD BALANCER, NoSQL DB, Message Queue, SQL DB, MONITORING, ANALYTICS)
16. But ...
Even more users?
Horizontal scaling?
Maintenance & bug hunting?
Mostly via scheduled downtimes
Reporting?
Kafka? Elasticsearch?
Security?
Access control?
Expertise?
Knowledge transfer?
(Diagram: LOAD BALANCER, NoSQL DB, Message Queue, SQL DB, MONITORING, ANALYTICS)
22. CrateDB Fundamentals
Disk-based index with in-memory caching
Fast and efficient OS caching
Shards: Units of data
Concurrency by distributing shards
Distributed query execution engine
“Push down” queries
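The ideas on this slide (rows spread over shards, and work "pushed down" to where the data lives) can be sketched generically. This is not CrateDB's actual routing or execution code: the shard count, the CRC32 hash, and the `sensor_id`/`temperature` fields are assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(routing_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing key to a shard with a stable hash (illustrative only)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def distribute(rows, num_shards: int = NUM_SHARDS):
    """Place each row on a shard based on its (hypothetical) sensor_id."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row["sensor_id"], num_shards)].append(row)
    return shards


def partial_avg(shard_rows):
    """Pushed-down step: each shard returns only (sum, count), not its rows."""
    values = [row["temperature"] for row in shard_rows]
    return sum(values), len(values)


def distributed_avg(shards):
    """Coordinator step: merge the small partial results from every shard."""
    partials = [partial_avg(rows) for rows in shards.values()]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

The point of the split is that only one small tuple per shard crosses the network, instead of every row, which is what makes shards a useful unit of concurrency.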
24. A better setup!
Horizontal scalability
Scale out everything
Reduced tech stack
Get to know it quicker
Live reporting
Use ad-hoc queries on production data
Flexibility
Schema evolution not required
(Diagram: LOAD BALANCER, MONITORING, ANALYTICS)
25. A better setup!
No single point of failure
As highly available as your service
Reduced network traffic
Better reliability
No queue
Work with real data
DB isolation
Accessible only from the host
(Diagram: LOAD BALANCER, MONITORING, ANALYTICS)
26. Live Demo
Docker Swarm
Orchestration across platforms
Eden Server (Rust!)
RESTful web service
Eden Client (Rust!)
ARM application for reading temperature data from BMP180
Grafana
To draw up a nice dashboard
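For readers unfamiliar with the BMP180: the sensor delivers a raw, uncompensated temperature value that has to be converted using calibration coefficients stored on the chip. The sketch below follows the integer compensation steps from the Bosch datasheet as I understand them. It is not the Eden Client code (which is written in Rust and not shown in the slides), and the calibration numbers in the usage comment are illustrative values only; real ones must be read from the sensor.

```python
def bmp180_temperature_c(ut: int, ac5: int, ac6: int, mc: int, md: int) -> float:
    """Convert a raw BMP180 temperature reading to degrees Celsius.

    ut            raw (uncompensated) temperature value from the sensor
    ac5, ac6,
    mc, md        per-device calibration coefficients read from the sensor

    Note: Python's // floors toward negative infinity, while C truncates
    toward zero, so intermediate values can differ by one from a C port.
    """
    x1 = ((ut - ac6) * ac5) >> 15      # scale the offset-corrected reading
    x2 = (mc << 11) // (x1 + md)       # non-linear correction term
    b5 = x1 + x2
    tenths = (b5 + 8) >> 4             # temperature in steps of 0.1 degC
    return tenths / 10.0


if __name__ == "__main__":
    # Illustrative calibration values, assumed for this example.
    print(bmp180_temperature_c(ut=27898, ac5=32757, ac6=23153, mc=-8711, md=2868))
```

A client like the one in the demo would take such a reading periodically and send it to the web service.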
(Diagram: LOAD BALANCER, Pi)
28. An Open Stack for Machine Data w/ CrateDB
Ad-hoc analysis with SQL
Instant reporting on production data
Integrates well
Legacy SQL applications included
Horizontally scalable
Container native, highly available
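As a rough illustration of ad-hoc SQL against production data, the sketch below builds a request for CrateDB's HTTP `_sql` endpoint without sending it. The host and port, the `readings` table, and its columns are assumptions made for this example and do not come from the talk.

```python
import json
import urllib.request


def build_sql_request(stmt, args=None, base_url="http://localhost:4200"):
    """Build (but do not send) a POST request for the CrateDB _sql endpoint.

    base_url is an assumed local node address; adjust for a real cluster.
    """
    payload = {"stmt": stmt}
    if args:
        payload["args"] = list(args)  # values for the ? placeholders
    return urllib.request.Request(
        url=f"{base_url}/_sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical ad-hoc report: average temperature per sensor above a threshold.
REPORT_STMT = (
    "SELECT sensor_id, avg(temperature) AS avg_temp "
    "FROM readings WHERE temperature > ? "
    "GROUP BY sensor_id ORDER BY avg_temp DESC"
)

if __name__ == "__main__":
    request = build_sql_request(REPORT_STMT, args=[20.0])
    print(request.full_url, request.data.decode("utf-8"))
    # To run it against a live node: urllib.request.urlopen(request)
```

Because the query is plain SQL over HTTP, the same statement could also be issued from an existing SQL client or a dashboard tool.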