Getting Started: A Real World Time Series Data Modeling Example
Hello folks, I’m Leon Sasson, and I’m currently working towards my Computer Science degree at Northwestern University. I’m excited to join the TempoDB team as an engineering intern. Over the next few posts, I will show you different use cases and some of the best practices on our platform, so you can focus on building your product while we make sure your time series data storage and analysis are simple, scalable, and fast.
There is no better way to understand a technological tool than with a hands-on example. I will guide you through the different techniques and best practices of TempoDB using a publicly available time series dataset.
We will be using weather data, specifically the Automated Surface Observing System (ASOS) one-minute dataset published by the National Oceanic and Atmospheric Administration (NOAA). It contains weather data recorded by stations around the U.S. There are many stations, and each one records many types of weather metrics, such as temperature and wind speed.
In this series of posts I’ll show you how to model this data in TempoDB.
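As a rough preview of what modeling this data could look like, here is a minimal sketch that gives each (station, metric) pair its own series. The key format, the `series_key` helper, and the station identifiers are all hypothetical, chosen only for illustration; they are not TempoDB's naming convention or API.

```python
from itertools import product


def series_key(station: str, metric: str) -> str:
    """Build a hypothetical key that names one series per (station, metric)."""
    return f"station:{station}.metric:{metric}"


# Illustrative values only; the real dataset has many more stations and metrics.
stations = ["KORD", "KMDW"]
metrics = ["temperature", "wind_speed"]

# One series for every combination of station and metric.
keys = [series_key(s, m) for s, m in product(stations, metrics)]
```

With two stations and two metrics this produces four series, each holding the timestamp/value pairs for a single measurement at a single location.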
Interview with Daniel Friedman, CTO at Ninja Blocks
“We could have hired one or two more engineers and we could have put them a day or two a week keeping this behemoth time series database up and running and trying to figure out how to scale it but in no way, shape, or form would it be anywhere near as efficient as just completely outsourcing the problem to a team of specialists.”
In this interview, we learn more about Daniel, and he provides insight into how TempoDB accelerates the development of Internet of Things applications.
Optimizing Relational Databases for Time Series Data (Time Series Database Overview Part 3)
So far in this series, we’ve discussed characteristics of a time series dataset and how relational databases are not the only option for the variety of datasets you have at your company. But why does this matter? A relational database can still store time series data in smaller quantities. Why make the switch to a new architecture?
Relational databases are designed with flexibility in mind and structured to store any type of dataset you can define. This is an advantage in many ways, but for time series data, this flexibility creates challenges when storing and analyzing a high volume of timestamp/value pairs.
A common way we see developers storing time series data from sensors, smart meters, servers, etc. in a relational database looks like this:
In this example, each row has a unique identifier that indicates that the data is stored by date and sensor. The columns define the sampling rate, and the values stored are based on the sensor reading on a given day at a specific time.
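To make the layout described above concrete, here is a small sketch of one such wide row and of how it unpacks into individual timestamp/value pairs. The column names, the 15-minute sampling rate, and the `unpivot` helper are assumptions made for illustration, not the schema from the original example.

```python
from datetime import date, datetime, time

# Hypothetical wide row: one row per (day, sensor), one column per sample time.
wide_row = {
    "date": date(2013, 1, 1),
    "sensor_id": "sensor-1",
    "00:00": 20.1,
    "00:15": 20.4,
    "00:30": 19.8,
}


def unpivot(row):
    """Turn one wide row into (sensor_id, timestamp, value) tuples."""
    points = []
    for column, value in row.items():
        if column in ("date", "sensor_id"):
            continue  # identifier columns, not readings
        hour, minute = map(int, column.split(":"))
        timestamp = datetime.combine(row["date"], time(hour, minute))
        points.append((row["sensor_id"], timestamp, value))
    return points


points = unpivot(wide_row)
```

Note that the sampling rate is baked into the column names, so changing it means changing the table definition rather than just writing more rows.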
This common relational database implementation of time series data can create problems for you in two main ways:
We are hiring for multiple Sales Engineer positions
We are on a mission to make sense of the measured world, and we want you to join our sales engineering team to help with inbound customer leads.
TempoDB is a new SaaS database primarily targeted at the following industries: energy, smart grid, oil and gas, internet of things, server/network infrastructure, renewable energy, medical and health tracking, and finance.
We are a small team located in Chicago, with customers spread around the world.
At TempoDB, we are building the time series database service that enables the storage and analysis of massive streams of measurement data that break traditional databases. Our technology makes it possible to measure more about our environment, our infrastructure, and ourselves, and to learn about and improve our world. This is a huge opportunity, and we need smart, passionate team members to help us with our growing inbound leads.
This is a technical sales/sales engineering role. We’re looking for someone who loves to help solve customer problems and is hands-on and technical enough to guide the customer through the integration process. You should be comfortable helping a customer through the evaluation and decision process, and equally ready to dive into the command line and parse logs.
If this sounds like you, please get in touch by emailing email@example.com.
TempoDB is a Smart Grid News Company to Watch
After two rounds of voting over two weeks, TempoDB was selected as a Smart Grid News Company to Watch for 2013!
This is a tremendous honor for us, as we join awesome companies like Cisco, EnerNOC, and Schneider Energy on the list.
We believe that the smart grid forms the foundation for a new measured world and a new Age of Accountability, and we are excited about the opportunity to provide a database service purpose-built for the task of storing, analyzing, and monitoring the massive streams of time series data being generated by millions of smart grid devices.
Thanks to everyone who voted!
Interview with Mahesh Murthy, Senior Software Engineer at Signal
“It turns out this (time series data) is a really hard problem to solve. We started with MySQL, and when that failed us, we moved to MongoDB. The hard problem is doing the number crunching. We need this data at a granular level, need to query it lots of different ways, and need it in real time. TempoDB hits the nail on the head, it’s exactly what we wanted. The data is granular, we can query ad hoc. It just works.”
- Mahesh Murthy, Senior Software Engineer at Signal
In this interview, we learn more about Mahesh, and he provides insight into his experience integrating TempoDB into Signal’s infrastructure.
Estimating Percentiles on Streams of Data
Recently we added a new analysis feature: percentile rollups. This new rollup type allows you to make percentile queries on your time series data. It turns out this is not an easy problem to solve for very large datasets. When you don’t know the size of your dataset at the time of computation, you have to use the streaming model, where each value is seen once and only a bounded amount of state is kept. What we need to know is how to compute percentiles on streams of data.
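To give a feel for the streaming setting, here is a minimal sketch of one well-known approach: keep a fixed-size uniform random sample of the stream (reservoir sampling) and read percentiles off that sample. This is an illustration only and is not necessarily the method behind TempoDB's percentile rollups; the class name, the default capacity, and the nearest-rank definition of a percentile are assumptions.

```python
import math
import random


class ReservoirPercentile:
    """Estimate percentiles of a stream from a fixed-size uniform sample."""

    def __init__(self, capacity=1000, seed=None):
        self.capacity = capacity        # maximum number of values kept
        self.sample = []                # the reservoir
        self.seen = 0                   # total values observed so far
        self.rng = random.Random(seed)

    def add(self, value):
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(value)
        else:
            # Keep the new value with probability capacity / seen,
            # overwriting a uniformly chosen slot in the reservoir.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.sample[j] = value

    def percentile(self, p):
        """Nearest-rank percentile (0 < p <= 100) of the current sample."""
        if not self.sample:
            raise ValueError("no data seen yet")
        ordered = sorted(self.sample)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]


estimator = ReservoirPercentile(capacity=100, seed=0)
for reading in range(1, 51):
    estimator.add(reading)
median = estimator.percentile(50)
```

Memory use is bounded by the capacity no matter how long the stream runs. The result is exact while the stream still fits in the reservoir and becomes an approximation after that, with accuracy improving as the capacity grows.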