Ahhh, the annual resolutions list! A time-honored tradition in which we all look at the new year as an opportunity for a fresh start.
Review of Statistics Done Wrong by Alex Reinhart
Since the advent of the big data era, organizations have been crying out for data scientists. Initially, the goal was to find the true data scientist. But because these resources were considered scarce, the search shifted to people who could hand-code analytical models in a Hadoop environment. As statistical tools such as R, Alteryx, and RapidMiner augmented Hadoop environments, the search expanded to include traditional tools such as SAS and SPSS. These data scientists, or data scientists in training, were asked to take large amounts of data and divine the nuggets that would create a “cross-sell/up-sell” recommendation engine to launch the next Netflix, or to link two disparate data sets that explain how markets interact and find the next groundbreaking investment opportunity.
The ability to execute in a low-latency time frame is a core component of next-generation data management architectures such as the Enterprise Management Associates Hybrid Data Ecosystem (HDE). One of the key business drivers of the HDE is speed of response, which stems from an organization’s drive to execute faster than its competitors to create an advantage, or to be on par with those competitors to simply “keep up with the Joneses.” You see this in workloads such as cross-sell/up-sell opportunities for revenue generation. You see this in opportunities to limit costs with asset logistics and labor scheduling optimization. You see this in opportunities to limit exposure to risk in fraud management and liquidity risk assessment.
Enterprise Management Associates (EMA) has recognized that big data implementers and consumers rely on a variety of platforms to meet their big data requirements. These platforms include new data management technologies such as Hadoop, MongoDB, and Cassandra, but the collection also includes traditional SQL-based data management technologies supporting data warehouses and data marts; operational support systems such as customer relationship management (CRM) and enterprise resource planning (ERP); and cloud-based platforms. EMA refers to this collection of platforms as the Hybrid Data Ecosystem (HDE).
In skiing, the “black diamond” run or ski slope is often referred to as “high risk/high reward.” You receive lots of “reward” skiing the black diamond slopes, but you also take on a significant amount of “risk” from variable terrain, the presence of trees, and the possibility of injury. However, the black diamond slopes are a lot of fun to experience, and you can mitigate the risks with preparation, practice, and a really good skiing helmet.