Google’s new Android app makes it easy to save mobile data on the go

The good folks at Android Police have spotted a new app from Google called Triangle; it lets you control which other apps can use your mobile data. It’s a handy little tool if you’re stuck on a crappy plan or are low on credits abroad.

Just fire up Triangle, grant it the necessary permissions (including one that allows it to set up a VPN), and you can then monitor data usage and restrict apps from consuming mobile data – even when you have 2G/3G/4G data turned on.

That’s useful for times when you might forget that you’re on limited data, and happen to launch a data-hungry app just to kill time (Reddit clients have swallowed up my entire allowance a few times). It’s also great for preventing apps from consuming data in the background without your knowledge.

You can also grant apps access to mobile data for 10 or 30 minutes via a prompt that pops up when you launch an app. There are also rewards to be earned by installing promoted apps, but that’s only available to customers of a couple of carriers in the Philippines.

Speaking of which, Google is presently testing Triangle in the Philippines, so it’s only available to users there on the Play Store. It is, however, available to sideload from APKMirror, and it works like a charm even if you’re not in that country.

Care to try it yourself? Grab the installer APK from this page, and check out more info on Google Play.

[Source:- The Next Web]

How to Create Interactive JavaScript Charts from Custom Data Sets

Charts are a great way of visualizing complex data quickly and effectively. Whether you want to identify a trend, highlight a relationship, or make a comparison, charts help you communicate with your audience in a precise and meaningful manner.

In my previous article — Getting Started with AnyChart: 10 Practical Examples — I introduced the AnyChart library and demonstrated how it is a great fit for your data visualization needs. Today, I want to dig a little deeper and look at AnyChart’s data mapping features which allow you to create beautiful charts from custom data sets with a minimum of fuss.

I also want to look at the many ways you can customize AnyChart to suit your requirements, as well as how you can change the look and feel of AnyChart charts by using themes. There are currently 17 out-of-the-box themes to choose from, or you can create your own. And if you’ve not got the best eye for design, why not buy our book to get a leg up?

As the Head of R&D at AnyChart, I could spend all day talking about this library, but now it’s time to get down to business.

Data Mapping in AnyChart

To facilitate the integration of custom data sources into charting applications, AnyChart has special objects called data sets. These objects act as intermediate containers for data. When data is stored in data sets, AnyChart can track changes to it, analyze it, and work with this data in a more robust and effective manner. In short: interactive JavaScript charts have never been easier!

No matter if you have an array of objects, an array of arrays, or a .csv file (a short sketch of the object-array case follows this list), you can use data sets to:

  • ensure full and explicit control over the series created
  • define which column is an argument (x-axis)
  • define which columns hold values for which series
  • filter data
  • sort data

Basics of Data Mapping

The best way to learn how data mapping works in AnyChart is to look at an example. Let’s imagine an array with the following custom data set:

var rawData = [
  ["A", 5, 4, 5, 8, 1, "bad"],
  ["B", 7, 1, 7, 9, 2, "good"],
  ["C", 9, 3, 5, 4, 3, "normal"],
  ["D", 1, 4, 9, 2, 4, "bad"]
];

There’s nothing too wild going on here — this kind of custom data structure is common in a lot of existing applications. But now you want to use this array in AnyChart. With many other charting libraries, you would be forced to transform the data into a format the library can work with. With AnyChart, things are a lot simpler — just look at what we can do. First, load the array into a data set:

var rawData = [
  ["A", 5, 4, 5, 8, 1, "bad"],
  ["B", 7, 1, 7, 9, 2, "good"],
  ["C", 9, 3, 5, 4, 3, "normal"],
  ["D", 1, 4, 9, 2, 4, "bad"]
];

var dataSet = anychart.data.set(rawData);

And then, once the data has been loaded into the data set, the real magic begins: you can now create so-called views. These are data sets derived from other data sets.

var rawData = [
  ["A", 5, 4, 5, 8, 1, "bad"],
  ["B", 7, 1, 7, 9, 2, "good"],
  ["C", 9, 3, 5, 4, 3, "normal"],
  ["D", 1, 4, 9, 2, 4, "bad"]
];

var dataSet = anychart.data.set(rawData);

var view1 = dataSet.mapAs({x: 0, value: 1});
var view2 = dataSet.mapAs({x: 0, value: 2});
var view3 = dataSet.mapAs({x: 0, high: 3, low: 4});
var view4 = dataSet.mapAs({x: 0, value: 5, meta: 6});

You’ll notice that when defining a view, you determine which columns from the original array are included and what names these columns get. You can then use the views to create whichever kind of chart you like. For example, here’s how to create a pie chart from the custom data in the column at index 5.

Note: AnyChart needs only x and value fields to create a pie chart, but the views also contain a meta field with the data from the column at index 6. You can map any number of optional fields and use them as you like. For example, these fields can contain additional data to be shown as labels or as tooltips:

anychart.onDocumentLoad(function() {
  var rawData = [
    ["A", 5, 4, 5, 8, 3, "Bad"],
    ["B", 7, 1, 7, 9, 5, "Good"],
    ["C", 9, 3, 5, 4, 4, "Normal"],
    ["D", 1, 4, 9, 2, 3, "Bad"]
  ];

  var dataSet = anychart.data.set(rawData);
  var view4 = dataSet.mapAs({x: 0, value: 5, meta: 6});

  // create chart
  var chart = anychart.pie(view4);
  chart.title("AnyChart: Pie Chart from Custom Data Set");
  chart.labels().format("{%meta}: {%Value}");
  chart.container("container").draw();
});

And this is what we end up with:

Note: You can find all of the demos in this article as a CodePen collection.

Multi-Series Combination Chart with Custom Data Set

Now, let’s see how we can use the same custom data to create a combination chart with line and range area series on the same plot. This section is going to be very short, since you now know what views are. All you need to do is choose the proper views and create the necessary series explicitly:

anychart.onDocumentLoad(function() {
  var rawData = [
    ["A", 5, 4, 5, 8, 3, "Bad"],
    ["B", 7, 1, 7, 9, 5, "Good"],
    ["C", 9, 3, 5, 4, 4, "Normal"],
    ["D", 1, 4, 9, 2, 3, "Bad"]
  ];

  var dataSet = anychart.data.set(rawData);

  var view1 = dataSet.mapAs({x: 0, value: 1});
  var view2 = dataSet.mapAs({x: 0, value: 2});
  var view3 = dataSet.mapAs({x: 0, high: 3, low: 4});

  // create chart
  var chart = anychart.line();
  // create two line series
  chart.line(view1).name("EUR");
  chart.line(view2).name("USD");
  // create range area series from the high/low view
  chart.rangeArea(view3).name("Trend");

  // set title and draw chart
  chart.title("AnyChart: Combined Chart from Data Set");
  chart.container("container").draw();
});

This is what it looks like:

Live Data Streaming and Filtering

And now, to showcase the beauty of views and data sets, we’ll create a column chart and a multi-line chart, live-stream data into a data set, and accept only certain values into the column chart. Sound complicated? It isn’t really!

anychart.onDocumentLoad(function() {
  var rawData = [
    ["A", 5, 4, 2, 6, 3, "Bad"],
    ["B", 7, 2, 1, 9, 5, "Good"],
    ["C", 8, 3, 2, 9, 4, "Normal"],
    ["D", 1, 4, 1, 4, 3, "Bad"]
  ];

  // note: dataSet is deliberately global so the stream() and addValue()
  // functions below can access it
  dataSet = anychart.data.set(rawData);

  var view1 = dataSet.mapAs({ x: 0, value: 1 });
  var view2 = dataSet.mapAs({ x: 0, value: 2 });
  var view3 = dataSet.mapAs({ x: 0, value: 3 });
  var view4 = dataSet.mapAs({ x: 0, value: 4 });
  var view5 = dataSet.mapAs({ x: 0, value: 5 });

  // create chart
  var chart1 = anychart.line();
  // create several line series
  chart1.line(view1).name("EUR");
  chart1.line(view2).name("USD");
  chart1.line(view3).name("YEN");
  chart1.line(view4).name("CNY");

  // create column chart
  // based on filtered view
  // that accepts values of less than 5 only
  var chart2 = anychart.column(
    view5.filter("value", function(v) {
      return v < 5;
    })
  );

  // set title and draw multi-line chart
  chart1.title("Line: Streaming from Data Set");
  chart1.legend(true);
  chart1.container("lineContainer").draw();

  // set title and draw column chart
  chart2.title("Column: Filtering Stream");
  chart2.container("columnContainer").draw();
});

// streaming function
var streamId;
function stream() {
  if (streamId === undefined) {
    streamId = setInterval(function() {
      addValue();
    }, 1000);
  } else {
    clearInterval(streamId);
    streamId = undefined;
  }
}

// function to add new value and remove first one
function addValue() {
  // generate next letter/symbol as argument
  var x = String.fromCharCode(
    dataSet.row(dataSet.getRowsCount() - 1)[0].charCodeAt(0) + 1
  );
  // append row of random values to data set
  dataSet.append([
    x,
    Math.random() * 10,
    Math.random() * 10,
    Math.random() * 10,
    Math.random() * 10,
    Math.random() * 10
  ]);
  // remove first row
  dataSet.remove(0);
}

As you can see, we’ve created a custom data set and five derived views. Four of them are used as is to create the lines, but for the column chart we apply a filtering function that lets only values of less than 5 into the view. We then stream data by appending and removing rows on the main data set, and every view automatically picks up the changes and its chart is redrawn — no extra synchronization code is needed!
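
Note that nothing in the snippet above actually calls stream(); in the demo it is presumably wired to a button. A minimal hookup might look like this (the streamButton element id is hypothetical):

// toggle streaming when a button with id="streamButton" is clicked
// (the element id is an assumption; adapt it to your own markup)
document.getElementById("streamButton").addEventListener("click", stream);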

This is just the tip of the iceberg as far as data mapping goes! You can do the same with arrays of objects and CSV data, and you can sort, search, listen to changes, go through values, as well as change existing rows. There are also special views for hierarchical data utilized in Tree Maps and Gantt Charts, and such views can be searched and traversed.
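
As a quick illustration of the CSV case, here is a hedged sketch; the parsing-settings field names (rowsSeparator, columnsSeparator, ignoreFirstRow) are assumptions to double-check against the AnyChart data set documentation:

// hypothetical CSV input with a header row
var csv = "name,eur,usd\n" +
          "A,5,4\n" +
          "B,7,1\n" +
          "C,9,3";

// the settings-object keys below are assumptions; verify them in the docs
var csvSet = anychart.data.set(csv, {
  rowsSeparator: "\n",
  columnsSeparator: ",",
  ignoreFirstRow: true
});

// parsed CSV columns are addressed by index, just like an array of arrays
var csvView = csvSet.mapAs({x: 0, value: 1});
anychart.line(csvView).container("container").draw();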

Customizing Chart Visualization

AnyChart is extremely versatile when it comes to chart customization. You can change line styles and fill colors, and use gradient fill or image fill on almost any element. Colors can be set using string constants, HEX or RGB notation, or a function that returns one of these values. The library also offers a number of built-in patterns, as well as an option to create patterns of your own.

Themes

The easiest way to change the look and feel of AnyChart charts is to change the theme.

In fact, you can create your own theme or use one of those that already come with the library. There are currently 17 out-of-the-box themes available in AnyChart: Coffee, Dark Blue, Dark Earth, Dark Glamour, Dark Provence, Dark Turquoise, Default Theme, Light Blue, Light Earth, Light Glamour, Light Provence, Light Turquoise, Monochrome, Morning, Pastel, Sea, Wines. The files for these themes can be obtained from the Themes Section at AnyChart CDN.

You can reference them like so:

<script src="https://cdn.anychart.com/js/latest/anychart-bundle.min.js"></script>
<script src="https://cdn.anychart.com/themes/latest/coffee.min.js"></script>

And activate the theme with just one line:

anychart.theme(anychart.themes.coffee);

Let’s use the basic chart from the previous article by way of an example.

anychart.onDocumentLoad(function() {
  // set theme referenced in scripts section from
  // https://cdn.anychart.com/themes/latest/coffee.min.js
  anychart.theme(anychart.themes.coffee);
  // create chart and set data
  var chart = anychart.column([
    ["Winter", 2],
    ["Spring", 7],
    ["Summer", 6],
    ["Fall", 10]
  ]);
  // set chart title
  chart.title("AnyChart Coffee Theme");
  // set chart container and draw
  chart.container("container").draw();
});

Now the look and feel of that chart is completely different.

Note: you can browse through all the themes and see how they work with different chart types and series on the AnyChart Themes Demo Page. Please refer to the AnyChart Documentation to learn how to create your own themes and read about other options.

Coloring

As you have seen, it’s quite trivial to color elements in the AnyChart charting library. But there’s more! It is also possible to fill shapes with solid colors at a given opacity, use radial and linear gradients, and make lines dashed. You can apply colors by their web constant names, HEX codes, or RGB, RGBA, HSL, or HSLA values — just as you can in CSS.

To showcase all of this in one powerful sample, I want to highlight another option along the way — namely that you can color elements using your own custom functions. This may come in handy in many data visualization situations, for example when you want to set a custom color depending on the value of the element. Let’s test this out on a Pareto chart. AnyChart can build this chart from raw data and calculate everything it needs.

anychart.onDocumentReady(function() {

  // create Pareto chart
  var chart = anychart.pareto([
    {x: "Defect 1", value: 19},
    {x: "Defect 2", value: 9},
    {x: "Defect 3", value: 28},
    {x: "Defect 4", value: 87},
    {x: "Defect 5", value: 14},
  ]);

  // set chart title
  chart.title("Pareto Chart: Conditional coloring");

  // set container id and draw
  chart.container("container").draw();
});

But say we want to highlight the elements that have relative frequency of less than 10%. All we need to do in this case is to add some coloring functions:

// Get Pareto column series
// and configure fill and stroke
var column = chart.getSeriesAt(0);
column.fill(function () {
  if (this.rf < 10) {
    return '#E24B26 0.5'
  } else {
    return this.sourceColor;
  }
});
column.stroke(function () {
  if (this.rf < 10) {
    return {color: anychart.color.darken('#E24B26'), dash:"5 5"};
  } else {
    return this.sourceColor;
  }
});

As is shown in the example, we can access the coloring function context with the help of the this keyword, then we create a condition and output whichever color or line setting we need. We use a simple HEX string with opacity for the color: '#E24B26 0.5'. For the line, we calculate a darker color using AnyChart’s special color transformation function and also set an appropriate dash parameter: {color: anychart.color.darken('#E24B26'), dash:"5 5"}.

Here’s what we end up with:

Default Pattern Fill

The AnyChart JavaScript charting library also provides a very flexible way to work with pattern (hatch) fills. To start with, you have 32 pattern fill types. They can be arranged into your own custom palettes, and you can directly set which one to use for a certain series, element, etc. Here’s what the code for a monochrome pie chart might look like:

anychart.onDocumentReady(function() {
  // create a Pie chart and set data
  chart = anychart.pie([
    ["Apple", 2],
    ["Banana", 2],
    ["Orange", 2],
    ["Grape", 2],
    ["Pineapple", 2],
    ["Strawberry", 2],
    ["Pear", 2],
    ["Peach", 2]
  ]);

  // configure chart using chaining calls to shorten sample
  chart.hatchFill(true).fill("white").stroke("Black");
  chart
    .labels()
    .background()
    .enabled(true)
    .fill("Black 1")
    .cornerType("Round")
    .corners("10%");

  // draw a chart
  chart.container("container").draw();
});

The only thing we did to enable the pattern fill is chart.hatchFill(true).
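
If you would rather pick a specific pattern than rely on the default palette, a series also accepts an explicit hatch fill type. Here is a rough sketch, assuming string type names such as "forwardDiagonal" and "diagonalBrick" from AnyChart's hatch fill list (check the exact names in the documentation):

anychart.onDocumentReady(function() {
  // create a column chart with two monochrome series
  var chart = anychart.column();
  var planSeries = chart.column([["A", 4], ["B", 6], ["C", 3]]);
  var factSeries = chart.column([["A", 5], ["B", 2], ["C", 7]]);

  // assumption: hatch fill types can be chosen per series by string name
  planSeries.name("Plan").fill("white").stroke("black").hatchFill("forwardDiagonal");
  factSeries.name("Fact").fill("white").stroke("black").hatchFill("diagonalBrick");

  chart.container("container").draw();
});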

Custom Pattern Fill

AnyChart gets its ability to use custom patterns from the underlying GraphicsJS library. I introduced this library and showcased some really cool drawing samples in my article here on SitePoint: Introducing GraphicsJS, a Powerful Lightweight Graphics Library. The fact that AnyChart is powered by GraphicsJS allows for some really amazing results, but a picture is worth a thousand words, so let’s proceed to the next example.

We’ll create patterns using an old font that used to be rather popular among geologists – Interdex. I remember one of AnyChart’s customers asking us how to use it in his project, and he was very happy to learn just how easy it was.

First, if you want to use a font that is not present on your system, you need to create a web font and reference it properly in the CSS file. A sample of such a file for our Interdex font can be found on the AnyChart CDN. We need to reference it along with the font management library being used and the AnyChart charting library:

<link rel="stylesheet" type="text/css" href="https://cdn.anychart.com/fonts/interdex/interdex.css"/>

<script src="https://cdn.anychart.com/js/latest/anychart-bundle.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fontfaceobserver/2.0.7/fontfaceobserver.js"></script>

After that, we can proceed with the coding:

var fontLoad = new FontFaceObserver("Conv_interdex");

// create chart when font is loaded using
// https://cdnjs.cloudflare.com/ajax/libs/fontfaceobserver/2.0.7/fontfaceobserver.js
fontLoad.load().then(function() {
  anychart.onDocumentReady(function() {
    // enable hatch fill by default and make chart monochrome
    anychart.theme({
      chart: {
        defaultSeriesSettings: {
          base: {
            hatchFill: true,
            fill: "White",
            hoverFill: "White",
            stroke: "Black"
          }
        }
      }
    });

    // create a stage, it is needed to create a custom pattern fill
    stage = anychart.graphics.create("container");

    // create a column chart and set title
    chart = anychart.column();
    chart.title("AnyChart Custom Pattern");
    // set the data
    chart.data([
      ["Jan", 1000, 1200, 1500, 1000, 1200, 1500],
      ["Feb", 1200, 1500, 1600, 1200, 1500, 1600],
      ["Mar", 1800, 1600, 1700, 1800, 1600, 1700],
      ["Apr", 1100, 1300, 1600, 1100, 1300, 1600],
      ["May", 1900, 1900, 1500, 1900, 1900, 1500]
    ]);

    // set custom hatch palette
    // it can be populated with any number of patterns
    chart.hatchFillPalette([
      getPattern("A"),
      getPattern("N"),
      getPattern("Y"),
      getPattern("C")
    ]);

    // set container and draw chart
    chart.container(stage).draw();
  });
});

// function to create patterns
function getPattern(letter) {
  var size = 40;
  // create a text object
  var text = anychart.graphics
    .text()
    .htmlText(
      "<span " +
        "style='font-family:Conv_interdex;font-size:" +
        size +
        ";'>" +
        letter +
        "</span>"
    );
  // create a pattern object
  var pattern = stage.pattern(text.getBounds());
  //  add text to a pattern
  pattern.addChild(text);
  return pattern;
}

I’ve included all the explanations in the comments, but just to sum up – you can create custom patterns, arrange them in custom pattern palettes, which you can then apply to an entire chart. If you’re paying attention, you’ll notice that we’ve made use of the AnyChart themes mechanism to set defaults.

Here’s the end result. Lovely, I’m sure you’ll agree…

Custom Series

And the final sample in this article is for those who want even more flexibility and customization. While AnyChart strives to provide an ever-increasing number of chart types out of the box, data visualization is an enormous field, and each project has different requirements. In order to make (and keep) everyone as happy as possible and provide a way to create the visualizations of their choice, AnyChart open-sourced the JavaScript drawing library of GraphicsJS and also opened the source of the AnyChart JavaScript charting library itself.

But if you really, really want to go with 100% custom drawing, you’ll have to do a lot of work if you decide to fork the source code. For example, customizing might be a pain, and you could potentially have problems merging future versions if AnyChart doesn’t accept your pull request.

Luckily, there is a third option. With most of AnyChart’s basic chart types, you can override the rendering function and change the appearance of chart elements, while keeping all of the other things AnyChart provides.

Below you will find a code sample that shows how to transform a widely used range column chart (which is supported in the AnyChart charting library) into a less common cherry chart (which isn’t). The main thing to bear in mind when you override AnyChart’s rendering function is that the function calculates everything from the values supplied to a series, so your custom series should be built on a series type that takes the same number of values.

anychart.onDocumentReady(function() {
  // create a chart
  var chart = anychart.cartesian();
  chart.xAxis().title("Month");
  chart.yAxis().title("Cherry price");

  // create a data set
  var data = [
    { x: "Apr", low: 29, high: 37 },
    { x: "May" },
    { x: "Jun", low: 29, high: 47 },
    { x: "Jul", low: 12, high: 27 },
    { x: "Aug", low: 20, high: 33, color: "#ff0000" },
    { x: "Sep", low: 35, high: 44 },
    { x: "Oct", low: 20, high: 31 },
    { x: "Nov", low: 44, high: 51 }
  ];

  // create a range column series
  var series = chart.rangeColumn(data);
  // set a meta field to use as a cherry size
  // series.meta("cherry", 50);
  // set a meta field to use as a stem thickness
  // series.meta("stem", 1);

  // optional: configurable select fill
  series.selectFill("white");

  // call a custom function that changes series rendering
  cherryChartRendering(series);

  // set container id for the chart and initiate chart drawing
  chart.container("container").draw();
});

// custom function to change range column series rendering to
// cherry chart with a special value line markers
function cherryChartRendering(series) {
  // cherry fill color
  series.fill(function() {
    // check if color is set for the point and use it or series color
    var color = this.iterator.get("color") || this.sourceColor;
    return anychart.color.lighten(color, 0.25);
  });
  // cherry stroke color
  series.stroke(function() {
    // check if color is set for the point and use it or series color
    var color = this.iterator.get("color") || this.sourceColor;
    return anychart.color.darken(color, 0.1);
  });

  // set rendering settings: use our custom point drawer
  series.rendering().point(drawer);
}

// custom drawer function to draw a cherry chart
function drawer() {
  // if value is missing - skip drawing
  if (this.missing) return;

  // get cherry size or set default
  var cherry = this.series.meta("cherry") || this.categoryWidth / 15;
  // get stem thickness or set default
  var stem = this.series.meta("stem") || this.categoryWidth / 50;

  // get shapes group
  var shapes = this.shapes || this.getShapesGroup(this.pointState);
  // calculate the left value of the x-axis
  var leftX = this.x - stem / 2;
  // calculate the right value of the x-axis
  var rightX = leftX + stem / 2;

  shapes["path"]
    // resets all 'path' operations
    .clear()
    // draw bulb
    .moveTo(leftX, this.low - cherry)
    .lineTo(leftX, this.high)
    .lineTo(rightX, this.high)
    .lineTo(rightX, this.low - cherry)
    .arcToByEndPoint(leftX, this.low - cherry, cherry, cherry, true, true)
    // close by connecting the last point with the first
    .close();
}

Here’s what we end up with:

Conclusion

I hope you’ve found this look at AnyChart’s more advanced functionality useful, and that it has given you plenty of ideas and inspiration for your own data visualizations.

If you’ve not tried AnyChart yet, I urge you to give it a go. Our latest release (version 7.14.0) added eight new chart types, five new technical indicators, a Google Spreadsheet data loader, marquee select and zoom, text wrap, and other cool new features.

And if you’re using AnyChart in your projects and have any comments or questions regarding the library, I’d love to hear them in the comments below.

[Source:- SitePoint]

Complex 3-D data on all devices

A new web-based software platform is swiftly bringing the visualization of 3-D data to every device, optimizing the use of, for example, virtual reality and augmented reality in industry. In this way, Fraunhofer researchers have brought the ideal of “any data on any device” a good deal closer.

If you want to be sure that the person you are sending documents and pictures to will be able to open them on their computer, then you send them in PDF and JPG format. But what do you do with 3-D content? “A standardized option hasn’t existed before now,” says Dr. Johannes Behr, head of the Visual Computing System Technologies department at the Fraunhofer Institute for Computer Graphics Research IGD. In particular, industry lacks a means of taking the very large, increasingly complex volumes of 3-D data that arise and rendering them useful – and of being able to use the data on every device, from smartphones to VR goggles. “The data volume is growing faster than the means of visualizing it,” reports Behr. Fraunhofer IGD is presenting a solution to this problem in the form of its “instant3DHub” software, which allows engineers, technicians and assemblers to use spatial design and assembly plans without any difficulty on their own devices. “This will enable them to inspect industrial plants or digital buildings, etc. in real time and find out what’s going on there,” explains Behr.

Software calculates only visible components

On account of the gigantic volumes of data that have to be processed, such an undertaking has thus far been either impossible or possible only with a tremendous amount of effort. After all, users had to manually choose in advance which data should be processed for the visualization, a task then executed by expensive special software. Not exactly a cost-effective method, and a time-consuming one as well. With the web-based Fraunhofer solution, every company can adapt the visualization tool to its requirements. The software autonomously selects the data to be prepared, by intelligently calculating, for example, that only views of visible parts are transmitted to the user’s device. Citing the example of a power plant, Behr explains: “Out of some 3.5 million components, only the approximately 3,000 visible parts are calculated on the server and transmitted to the device.”

Such visibility calculations are especially useful for VR and AR applications, as the objects being viewed at any given moment appear in the display in real time. At CeBIT, researchers will be showing how well this works, using the example of car maintenance. In a VR application, it is necessary to load up to 120 images per second onto data goggles. In this way, several thousand points of 3-D data can be transmitted from a central database for a vehicle model to a device in just one second. The process is so fast because the complete data does not have to be loaded to the device, as used to be the case, but is streamed over the web. A huge variety of 3-D web applications are delivered on the fly, without permanent storage, so that even mobile devices such as tablets and smartphones can make optimal use of them. One key feature of this process is that for every access to instant3DHub, the data is assigned to, prepared and visualized for the specific applications. “As a result, the system fulfills user- and device-specific requirements, and above all is secure,” says Behr. BMW, Daimler and Porsche already use instant3DHub at over 1,000 workstations. Even medium-sized companies such as SimScale and thinkproject have successfully implemented “instantreality” and instant3Dhub and are developing their own individual software solutions on that basis.

Augmented reality is a key technology for Industrie 4.0

Technologies that create a link between CAD data and the real production environment are also relevant for the domain of augmented reality. “Augmented reality is a key technology for Industrie 4.0, because it constantly compares the digital target situation in real time against the actual situation as captured by cameras and sensors,” adds Dr. Ulrich Bockholt, head of the Virtual and Augmented Reality department at Fraunhofer IGD. Ultimately, however, the solution is of interest to many sectors, he explains, even in the construction and architecture field, where it can be used to help visualize building information models on smartphones, tablet computers or data goggles.

 

[Source:- Phys.org]

 

 

Upcoming Windows 10 update reduces spying, but Microsoft is still mum on which data it specifically collects

There’s some good news for privacy-minded individuals who haven’t been fond of Microsoft’s data collection policy with Windows 10. When the upcoming Creators Update drops this spring, it will overhaul Microsoft’s data collection policies. Terry Myerson, executive vice president of Microsoft’s Windows and Devices Group, has published a blog post with a list of the changes Microsoft will be making.

First, Microsoft has launched a new web-based privacy dashboard with the goal of giving people an easy, one-stop location for controlling how much data Microsoft collects. The privacy dashboard has sections for Browse, Search, Location, and Cortana’s Notebook, each covering a different category of data Microsoft might have received from your hardware. Personally, I keep the digital assistant side of Cortana permanently deactivated and have already set telemetry to minimal, but if you haven’t taken those steps, you can adjust how much data Microsoft keeps from this page.

Second, Microsoft is condensing its telemetry options. Currently, there are four options — Security, Basic, Enhanced, and Full. Most consumers only have access to three of these settings — Basic, Enhanced, and Full. The fourth, Security, is reserved for Windows 10 Enterprise or Windows 10 Education. Here’s how Microsoft describes each category:

Security: Information that’s required to help keep Windows, Windows Server, and System Center secure, including data about the Connected User Experience and Telemetry component settings, the Malicious Software Removal Tool, and Windows Defender.

Basic: Basic device info, including: quality-related data, app compatibility, app usage data, and data from the Security level.

Enhanced: Additional insights, including: how Windows, Windows Server, System Center, and apps are used, how they perform, advanced reliability data, and data from both the Basic and the Security levels.

Full: All data necessary to identify and help to fix problems, plus data from the Security, Basic, and Enhanced levels.

That’s the old system. Going forward, Microsoft is collapsing the number of telemetry levels to two. Here’s how Myerson describes the new “Basic” level:

[We’ve] further reduced the data collected at the Basic level. This includes data that is vital to the operation of Windows. We use this data to help keep Windows and apps secure, up-to-date, and running properly when you let Microsoft know the capabilities of your device, what is installed, and whether Windows is operating correctly. This option also includes basic error reporting back to Microsoft.

Windows 10 will also include an enhanced privacy section that will show during start-up and offer much better granularity over privacy settings. Currently, many of these controls are buried in various menus that you have to manually configure after installing the operating system.

It’s nice that Microsoft is cutting back on telemetry collection at the basic level. The problem is, as Steven J. Vaughan-Nichols writes, Microsoft is still collecting a creepy amount of information on “Full,” and it still defaults to sharing all this information with Cortana — which means Microsoft has data files on people it can be compelled to turn over by a warrant from an organization like the NSA or FBI. Given the recent expansion of the NSA’s powers, this information can now be shared with a variety of other agencies without filtering it first. And while Microsoft’s business model doesn’t directly depend on scraping and selling customer data the way Google does, the company is still gathering an unspecified amount of information. Full telemetry, for example, may “unintentionally include parts of a document you were using when a problem occurred.” Vaughan-Nichols isn’t thrilled about that idea, and neither am I.

The problem with Microsoft’s disclosure is it mostly doesn’t disclose. Even basic telemetry is described as “includes data that is vital to the operation of Windows.” Okay. But what does that mean?

I’m glad to see Microsoft taking steps towards restoring user privacy, but these are small steps that only modify policies around the edges. Until the company actually and meaningfully discloses what telemetry is collected under Basic settings and precisely what Full settings do and don’t send in the way of personally identifying information, the company isn’t explaining anything so much as it’s using vague terms and PR in place of a disclosure policy.

As I noted above, I’d recommend turning Cortana (the assistant) off. If you don’t want to do that, you should regularly review the information Microsoft has collected about you and delete any items you don’t want to be part of the company’s permanent record.

 

 

[Source:- Extremetech]

Attackers start wiping data from CouchDB and Hadoop databases

Data-wiping attacks have hit exposed Hadoop and CouchDB databases.

It was only a matter of time until ransomware groups that wiped data from thousands of MongoDB databases and Elasticsearch clusters started targeting other data storage technologies. Researchers are now observing similar destructive attacks hitting openly accessible Hadoop and CouchDB deployments.

Security researchers Victor Gevers and Niall Merrigan, who monitored the MongoDB and Elasticsearch attacks so far, have also started keeping track of the new Hadoop and CouchDB victims. The two have put together spreadsheets on Google Docs where they document the different attack signatures and messages left behind after data gets wiped from databases.

In the case of Hadoop, a framework used for distributed storage and processing of large data sets, the attacks observed so far can be described as vandalism.

That’s because the attackers don’t ask for payments to be made in exchange for returning the deleted data. Instead, their message instructs the Hadoop administrators to secure their deployments in the future.

According to Merrigan’s latest count, 126 Hadoop instances have been wiped so far. The number of victims is likely to increase because there are thousands of Hadoop deployments accessible from the internet — although it’s hard to say how many are vulnerable.

The attacks against MongoDB and Elasticsearch followed a similar pattern. The number of MongoDB victims jumped from hundreds to thousands in a matter of hours and to tens of thousands within a week. The latest count puts the number of wiped MongoDB databases at more than 34,000 and that of deleted Elasticsearch clusters at more than 4,600.

A group called Kraken0, responsible for most of the ransomware attacks against databases, is trying to sell its attack toolkit and a list of vulnerable MongoDB and Elasticsearch installations for the equivalent of US$500 in bitcoins.

The number of wiped CouchDB databases is also growing rapidly, reaching more than 400 so far. CouchDB is a NoSQL-style database platform similar to MongoDB.

Unlike the Hadoop vandalism, the CouchDB attacks are accompanied by ransom messages, with attackers asking for 0.1 bitcoins (around $100) to return the data. Victims are advised against paying because, in many of the MongoDB attacks, there was no evidence that attackers had actually copied the data before deleting it.

Researchers from Fidelis Cybersecurity have also observed the Hadoop attacks and have published a blog post with more details and recommendations on securing such deployments.

The destructive attacks against online database storage systems are not likely to stop soon because there are other technologies that have not yet been targeted and that might be similarly misconfigured and left unprotected on the internet by users.

 

 

[Source:- JW]

Apache Beam unifies batch and streaming for big data

Apache Beam, a unified programming model for both batch and streaming data, has graduated from the Apache Incubator to become a top-level Apache project.

Aside from becoming another full-fledged widget in the ever-expanding Apache tool belt of big-data processing software, Beam addresses ease of use and dev-friendly abstraction, rather than simply offering raw speed or a wider array of included processing algorithms.

Beam us up!

Beam provides a single programming model for creating batch and stream processing jobs (the name is a hybrid of “batch” and “stream”), and it offers a layer of abstraction for dispatching to various engines used to run the jobs. The project originated at Google, where it’s currently a service called GCD (Google Cloud Dataflow). Beam uses the same API as GCD, and it can use GCD as an execution engine, along with Apache Spark, Apache Flink (a stream processing engine with a highly memory-efficient design), and now Apache Apex (another stream engine for working closely with Hadoop deployments).

The Beam model involves five components: the pipeline (the pathway for data through the program); the “PCollections,” or data streams themselves; the transforms, for processing data; the sources and sinks, where data is fetched and eventually sent; and the “runners,” or components that allow the whole thing to be executed on an engine.

Apache says it separated concerns in this fashion so that Beam can “easily and intuitively express data processing pipelines for everything from simple batch-based data ingestion to complex event-time-based stream processing.” This is in line with reworking tools like Apache Spark to support stream and batch processing within the same product and with similar programming models. In theory, it’s one fewer concept for prospective developers to wrap their head around, but that presumes Beam is used in lieu of Spark or other frameworks, when it’s more likely it’ll be used — at first — to augment them.

Hands off

One possible drawback to Beam’s approach is that while the layers of abstraction in the product make operations easier, they also put the developer at a distance from the underlying layers. A good case in point: Beam’s current level of integration with Apache Spark; the Spark runner doesn’t yet use Spark’s more recent DataFrames system, and thus may not take advantage of the optimizations those can provide. But this isn’t a conceptual flaw, it’s an issue with the implementation, which can be addressed in time.

The big payoff of using Beam, as noted by Ian Pointer in his discussion of Beam in early 2016, is that it makes migrations between processing systems less of a headache. Likewise, Apache says Beam “cleanly [separates] the user’s processing logic from details of the underlying engine.”

Separation of concerns and ease of migration will be good to have if the ongoing rivalry and competition between the various big data processing engines continue. Granted, Apache Spark has emerged as one of the undisputed champs of the field and become a de facto standard choice. But there’s always room for improvement or an entirely new streaming or processing paradigm. Beam is less about offering a specific alternative than about providing developers and data wranglers with more breadth of choice among them.

 

 

[Source:- Javaworld]

Snowflake now offers data warehousing to the masses

Snowflake, the cloud-based data warehouse solution co-founded by Microsoft alumnus Bob Muglia, is lowering storage prices and adding a self-service option, meaning prospective customers can open an account with nothing more than a credit card.

These changes also raise an intriguing question: How long can a service like Snowflake expect to reside on Amazon, which itself offers services that are more or less in direct competition — and where the raw cost of storage undercuts Snowflake’s own pricing for same?

Open to the public

The self-service option, called Snowflake On Demand, is a change from Snowflake’s original sales model. Rather than calling a sales representative to set up an account, Snowflake users can now provision services themselves with no more effort than would be needed to spin up an AWS EC2 instance.

In a phone interview, Muglia discussed how the reason for only just now transitioning to this model was more technical than anything else. Before self-service could be offered, Snowflake had to put protections into place to ensure that both the service itself and its customers could be protected from everything from malice (denial-of-service attacks) to incompetence (honest customers submitting massively malformed queries).

“We wanted to make sure we had appropriately protected the system,” Muglia said, “before we opened it up to anyone, anywhere.”

This effort was further complicated by Snowflake’s relative lack of hard usage limits, which Muglia characterized as being one of its major standout features. “There is no limit to the number of tables you can create,” Muglia said, but he further pointed out that Snowflake has to strike a balance between what it can offer any one customer and protecting the integrity of the service as a whole.

“We get some crazy SQL queries coming in our direction,” Muglia said, “and regardless of what comes in, we need to continue to perform appropriately for that customer as well as other customers. We see SQL queries that are a megabyte in size — the query statements [themselves] are a megabyte in size.” (Many such queries are poorly formed, auto-generated SQL, Muglia claimed.)

Fewer costs, more competition

The other major change is a reduction in storage pricing for the service — $30/TB/month for capacity storage, $50/TB/month for on-demand storage, and uncompressed storage at $10/TB/month.

It’s enough of a reduction in price that Snowflake will be unable to rely on storage costs as a revenue source, since those prices barely pay for the use of Amazon’s services as a storage provider. But Muglia is confident Snowflake is profitable enough overall that such a move won’t impact the company’s bottom line.

“We did the data modeling on this,” said Muglia, “and our margins were always lower on storage than on compute running queries.”

According to the studies Snowflake performed, “when customers put more data into Snowflake, they run more queries…. In almost every scenario you can imagine, they were very much revenue-positive and gross-margin neutral, because people run more queries.”

The long-term implications for Snowflake continuing to reside on Amazon aren’t clear yet, especially since Amazon might well be able to undercut Snowflake by directly offering competitive services.

Muglia, though, is confident that Snowflake’s offering is singular enough to stave off competition for a good long time, and is ready to change things up if need be. “We always look into the possibility of moving to other cloud infrastructures,” Muglia said, “although we don’t have plans to do it right now.”

He also noted that Snowflake competes with Amazon and Redshift right now, but “we have a very different shape of product relative to Redshift…. Snowflake is storing multiple petabytes of data and is able to run hundreds of simultaneous concurrent queries. Redshift can’t do that; no other product can do that. It’s that differentiation that allows [us] to compete effectively with Amazon, and for that matter Google and Microsoft and Oracle and Teradata.”

 

 

[Source:- IW]

Fire up big data processing with Apache Ignite

Apache Ignite is an in-memory computing platform that can be inserted seamlessly between a user’s application layer and data layer. Apache Ignite loads data from the existing disk-based storage layer into RAM, improving performance by as much as six orders of magnitude (1 million-fold).

The in-memory data capacity can be easily scaled to handle petabytes of data simply by adding more nodes to the cluster. Further, both ACID transactions and SQL queries are supported. Ignite delivers performance, scale, and comprehensive capabilities far above and beyond what traditional in-memory databases, in-memory data grids, and other in-memory-based point solutions can offer by themselves.

Apache Ignite does not require users to rip and replace their existing databases. It works with RDBMS, NoSQL, and Hadoop data stores. Apache Ignite enables high-performance transactions, real-time streaming, and fast analytics in a single, comprehensive data access and processing layer. It uses a distributed, massively parallel architecture on affordable, commodity hardware to power existing or new applications. Apache Ignite can run on premises, on cloud platforms such as AWS and Microsoft Azure, or in a hybrid environment.

[Figure: Apache Ignite architecture]

The Apache Ignite unified API supports SQL, C++, .Net, Java, Scala, Groovy, PHP, and Node.js. The unified API connects cloud-scale applications with multiple data stores containing structured, semistructured, and unstructured data. It offers a high-performance data environment that allows companies to process full ACID transactions and generate valuable insights from real-time, interactive, and batch queries.
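
As an illustration of the key/value side of that API, here is a hedged sketch using the Node.js thin client; the package name (apache-ignite-client), the default thin-client port 10800, and the exact method names are assumptions to verify against the Ignite documentation:

// a sketch of basic key/value access from Node.js;
// package name, port, and method names are assumptions; check the Ignite docs
const IgniteClient = require('apache-ignite-client');
const IgniteClientConfiguration = IgniteClient.IgniteClientConfiguration;

async function main() {
  const client = new IgniteClient();
  try {
    // connect to a cluster node's thin-client endpoint
    await client.connect(new IgniteClientConfiguration('127.0.0.1:10800'));

    // get or create a distributed cache and use it like a map
    const cache = await client.getOrCreateCache('quotes');
    await cache.put('EURUSD', 1.07);
    console.log(await cache.get('EURUSD'));
  } finally {
    client.disconnect();
  }
}

main().catch(console.error);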

Users can keep their existing RDBMS in place and deploy Apache Ignite as a layer between it and the application layer. Apache Ignite automatically integrates with Oracle, MySQL, Postgres, DB2, Microsoft SQL Server, and other RDBMSes. The system automatically generates the application domain model based on the schema definition of the underlying database, then loads the data. In-memory databases typically provide only a SQL interface, whereas Ignite supports a wider group of access and processing paradigms in addition to ANSI SQL. Apache Ignite supports key/value stores, SQL access, MapReduce, HPC/MPP processing, streaming/CEP processing, clustering, and Hadoop acceleration in a single integrated in-memory computing platform.

GridGain Systems donated the original code for Apache Ignite to the Apache Software Foundation in the second half of 2014. Apache Ignite was rapidly promoted from an incubating project to a top-level Apache project in 2015. In the second quarter of 2016, Apache Ignite was downloaded nearly 200,000 times. It is used by organizations around the world.

Architecture

Apache Ignite is JVM-based distributed middleware based on a homogeneous cluster topology implementation that does not require separate server and client nodes. All nodes in an Ignite cluster are equal, and they can play any logical role per runtime application requirement.

A service provider interface (SPI) design is at the core of Apache Ignite. The SPI-based design makes every internal component of Ignite fully customizable and pluggable. This enables tremendous configurability of the system, with adaptability to any existing or future server infrastructure.

Apache Ignite also provides direct support for parallelization of distributed computations based on fork-join, MapReduce, or MPP-style processing. Ignite uses distributed parallel computations extensively, and they are fully exposed at the API level for user-defined functionality.

Key features

In-memory data grid. Apache Ignite includes an in-memory data grid that handles distributed in-memory data management, including ACID transactions, failover, advanced load balancing, and extensive SQL support. The Ignite data grid is a distributed, object-based, ACID transactional, in-memory key-value store. In contrast to traditional database management systems, which utilize disk as their primary storage mechanism, Ignite stores data in memory. By utilizing memory rather than disk, Apache Ignite is up to 1 million times faster than traditional databases.

[Figure: Apache Ignite data grid]

SQL support. Apache Ignite supports free-form ANSI SQL-99 compliant queries with virtually no limitations. Ignite can use any SQL function, aggregation, or grouping, and it supports distributed, noncolocated SQL joins and cross-cache joins. Ignite also supports the concept of field queries to help minimize network and serialization overhead.

In-memory compute grid. Apache Ignite includes a compute grid that enables parallel, in-memory processing of CPU-intensive or other resource-intensive tasks such as traditional HPC, MPP, fork-join, and MapReduce processing. Support is also provided for standard Java ExecutorService asynchronous processing.

[Figure: Apache Ignite compute grid]

In-memory service grid. The Apache Ignite service grid provides complete control over services deployed on the cluster. Users can control how many service instances should be deployed on each cluster node, ensuring proper deployment and fault tolerance. The service grid guarantees continuous availability of all deployed services in case of node failures. It also supports automatic deployment of multiple instances of a service, of a service as a singleton, and of services on node startup.

In-memory streaming. In-memory stream processing addresses a large family of applications for which traditional processing methods and disk-based storage, such as disk-based databases or file systems, are inadequate. These applications are extending the limits of traditional data processing infrastructures.

[Figure: Apache Ignite streaming]

Streaming support allows users to query rolling windows of incoming data. This enables users to answer questions such as “What are the 10 most popular products over the last hour?” or “What is the average price in a certain product category for the past 12 hours?”

Another common stream processing use case is pipelining a distributed events workflow. As events are coming into the system at high rates, the processing of events is split into multiple stages, each of which has to be properly routed within a cluster for processing. These customizable event workflows support complex event processing (CEP) applications.

In-memory Hadoop acceleration. The Apache Ignite Accelerator for Hadoop enables fast data processing in existing Hadoop environments via the tools and technology an organization is already using.

[Figure: Apache Ignite Hadoop acceleration]

Ignite in-memory Hadoop acceleration is based on the first dual-mode, high-performance in-memory file system that is 100 percent compatible with Hadoop HDFS and an in-memory optimized MapReduce implementation. Delivering up to 100 times faster performance, in-memory HDFS and in-memory MapReduce provide easy-to-use extensions to disk-based HDFS and traditional MapReduce. This plug-and-play feature requires minimal to no integration. It works with any open source or commercial version of Hadoop 1.x or Hadoop 2.x, including Cloudera, Hortonworks, MapR, Apache, Intel, and AWS. The result is up to 100-fold faster performance for MapReduce and Hive jobs.

Distributed in-memory file system. A unique feature of Apache Ignite is the Ignite File System (IGFS), which is a file system interface to in-memory data. IGFS delivers similar functionality to Hadoop HDFS. It includes the ability to create a fully functional file system in memory. IGFS is at the core of the Apache Ignite In-Memory Accelerator for Hadoop.

The data from each file is split on separate data blocks and stored in cache. Data in each file can be accessed with a standard Java streaming API. For each part of the file, a developer can calculate an affinity and process the file’s content on corresponding nodes to avoid unnecessary networking.

Unified API. The Apache Ignite unified API supports a wide variety of common protocols for the application layer to access data. Supported protocols include SQL, Java, C++, .Net, PHP, MapReduce, Scala, Groovy, and Node.js. Ignite supports several protocols for client connectivity to Ignite clusters, including Ignite Native Clients, REST/HTTP, SSL/TLS, Memcached, and SQL.

Advanced clustering. Apache Ignite provides one of the most sophisticated clustering technologies on JVMs. Ignite nodes can automatically discover each other, which helps scale the cluster when needed without having to restart the entire cluster. Developers can also take advantage of Ignite’s hybrid cloud support, which allows users to establish connections between private clouds and public clouds such as AWS or Microsoft Azure.

Additional features. Apache Ignite provides high-performance, clusterwide messaging functionality. It allows users to exchange data via publish-subscribe and direct point-to-point communication models.

The distributed events functionality in Ignite allows applications to receive notifications about cache events occurring in a distributed grid environment. Developers can use this functionality to be notified about the execution of remote tasks or any cache data changes within the cluster. Event notifications can be grouped and sent in batches and at timely intervals. Batching notifications help attain high cache performance and low latency.

Ignite allows for most of the data structures from the java.util.concurrent framework to be used in a distributed fashion. For example, you could add to a double-ended queue (java.util.concurrent.BlockingDeque) on one node and poll it from another node. Or you could have a distributed primary key generator, which would guarantee uniqueness on all nodes.

Ignite distributed data structures include support for these standard Java APIs: Concurrent map, distributed queues and sets, AtomicLong, AtomicSequence, AtomicReference, and CountDownLatch.

Key integrations

Apache Spark. Apache Spark is a fast, general-purpose engine for large-scale data processing. Ignite and Spark are complementary in-memory computing solutions. They can be used together in many instances to achieve superior performance and functionality.

Apache Spark and Apache Ignite address somewhat different use cases and rarely compete for the same task. Some of the key differences are outlined below.

Apache Spark doesn’t provide shared storage, so data from HDFS or other disk storage must be loaded into Spark for processing. State can be passed from Spark job to job only by saving the processed data back into external storage. Ignite can share Spark state directly in memory, without storing the state to disk.

One of the main integrations for Ignite and Spark is the Apache Ignite Shared RDD API. Ignite RDDs are essentially wrappers around Ignite caches that can be deployed directly inside of executing Spark jobs. Ignite RDDs can also be used with the cache-aside pattern, where Ignite clusters are deployed separately from Spark, but still in-memory. The data is still accessed using Spark RDD APIs.

Spark supports a fairly rich SQL syntax, but it doesn’t support data indexing, so it must do full scans all the time. Spark queries may take minutes even on moderately small data sets. Ignite supports SQL indexes, resulting in much faster queries, so using Spark with Ignite can accelerate Spark SQL more than 1,000-fold. The result set returned by Ignite Shared RDDs also conforms to the Spark Dataframe API, so it can be further analyzed using standard Spark dataframes. Both Spark and Ignite natively integrate with Apache YARN and Apache Mesos, so it’s easier to use them together.

When working with files instead of RDDs, it’s still possible to share state between Spark jobs and applications using the Ignite In-Memory File System (IGFS). IGFS implements the Hadoop FileSystem API and can be deployed as a native Hadoop file system, exactly like HDFS. Ignite plugs in natively to any Hadoop or Spark environment. IGFS can be used with zero code changes in plug-and-play fashion.

Apache Cassandra. Apache Cassandra can serve as a high-performance solution for structured queries. But the data in Cassandra should be modeled such that each predefined query results in one row retrieval. Thus, you must know what queries will be required before modeling the data.

 

 

[Source:- Infoworld]

Department of Labor sues Google over wage data

Google's Mountain View, California headquarters

The U.S. Department of Labor has filed a lawsuit against Google, with the company’s ability to win government contracts at risk.

The agency is seeking what it calls “routine” information about wages and the company’s equal opportunity program. The agency filed a lawsuit with its Office of Administrative Law Judges to gain access to the information, it announced Wednesday.

Google, as a federal contractor, is required to provide the data as part of a compliance check by the agency’s Office of Federal Contract Compliance Programs (OFCCP), according to the Department of Labor. The inquiry is focused on Google’s compliance with equal employment laws, the agency said.

“Like other federal contractors, Google has a legal obligation to provide relevant information requested in the course of a routine compliance evaluation,” OFCCP Acting Director Thomas Dowd said in a press release. “Despite many opportunities to produce this information voluntarily, Google has refused to do so.”

Google said it’s provided hundreds of thousands of records to the agency over the past year, including some related to wages. However, a handful of OFCCP data requests were “overbroad” or would reveal confidential data, the company said in a statement.

“We’ve made this clear to the OFCCP, to no avail,” the statement added. “These requests include thousands of employees’ private contact information which we safeguard rigorously.”

Google must allow the federal government to inspect and copy records relevant to compliance, the Department of Labor said. The agency requested the information in September 2015, but Google provided only partial responses, an agency spokesman said by email.

 

 

[Source:- Javaworld]
