US tech industry says immigration order affects its operations


The U.S. tech industry has warned that the temporary entry ban on certain foreign nationals, introduced on Friday by the administration of President Donald Trump, will disrupt company operations that depend on foreign workers.

The Internet Association, whose members include Google, Amazon, Facebook and Microsoft, said that Trump’s executive order limiting immigration and movement into the U.S. has troubling implications: its member companies, along with firms in many other industries, employ legal immigrants who are covered by the order and will be unable to return to their jobs and families in the U.S.

“Their work benefits our economy and creates jobs here in the United States,” said Internet Association President and CEO Michael Beckerman in a statement over the weekend.

Executives of a number of tech companies like Twitter, Microsoft and Netflix have expressed concern about the executive order signed by Trump, which suspended for 90 days entry into the U.S. of persons from seven Muslim-majority countries – Iran, Iraq, Libya, Somalia, Sudan, Syria and Yemen – as immigrants and non-immigrants. The Trump administration has described the order as a move to prevent foreign terrorist entry into the U.S.

Tech companies like Uber, Apple, Microsoft and Google are in touch with employees affected by the order, according to reports. Uber is working on a plan to compensate drivers who come from the listed countries, had taken long breaks to see their extended families, and are now unable to return to the U.S., wrote CEO Travis Kalanick, who is a member of Trump’s business advisory group.

“As an immigrant and as a CEO, I’ve both experienced and seen the positive impact that immigration has on our company, for the country, and for the world,” wrote Satya Nadella, Microsoft CEO, in an online post over the weekend. “We will continue to advocate on this important topic.” Netflix CEO Reed Hastings wrote in a Facebook post that “Trump’s actions are hurting Netflix employees around the world, and are so un-American it pains us all.”

The tech industry is also concerned about further government moves on immigration policy that could restrict visas for the people who help these companies run their operations and develop products and services. The H-1B visa program, for example, has been criticized for being used to replace U.S. workers.

Microsoft’s Chief Legal Officer Brad Smith said in a note to employees on Saturday that the company believes in “a strong and balanced high-skilled immigration system.”

 

[Source:- Javaworld]

 

Google creates ‘crisis fund’ following US immigration ban


Tech giant Google has created a US$2 million crisis fund in response to US president Donald Trump’s immigration ban.

Google staff are also being invited to top up the fund, with the money going towards the American Civil Liberties Union (ACLU), Immigrant Legal Resource Center (ILRC), International Rescue Committee (IRC), and the UN High Commissioner for Refugees (UNHCR).

“We chose these organisations for their incredible efforts in providing legal assistance and support services for immigrants, as well as their efforts on resettlement and general assistance for refugees globally,” a Google spokesperson said.

The announcement follows a request last week by Google CEO Sundar Pichai for staff travelling overseas to return to the US. More than 100 staff are affected by President Trump’s executive order on immigration.

Since 2015, Google has given more than US$16 million to organisations focused on humanitarian aid for refugees on the ground, WiFi in refugee camps, and education for out-of-school refugee children in Lebanon, the spokesperson said.

Microsoft CEO Satya Nadella has also responded to the crisis, saying that as an immigrant himself, he has experienced the positive impact that immigration has on the company, the country and the world.

Nadella said Microsoft was providing legal advice and assistance to 76 staff who have a US visa and are citizens of Syria, Iraq, Iran, Libya, Somalia, Yemen, and Sudan.

In an email sent to Microsoft staff, chief legal officer Brad Smith said that Microsoft believes in a strong and balanced skilled immigration system.

“We also believe in broader immigration opportunities, like the protections for talented and law-abiding young people under the Deferred Action for Childhood Arrivals (DACA) program. We believe that immigration laws can and should protect the public without sacrificing people’s freedom of expression or religion. And we believe in the importance of protecting legitimate and law-abiding refugees whose very lives may be at stake in immigration proceedings,” he said.

 

 

[Source:- Javaworld]

GitLab database goes out after spam attack


Code-hosting site GitLab has suffered an outage after sustaining a “serious” incident on Tuesday with one of its databases that has required emergency maintenance.

The company today said it lost six hours of database data, including issues, merge requests, users, comments, and snippets, for GitLab.com and was in the process of restoring data from a backup. The data was accidentally deleted, according to a Twitter message.

“Losing production data is unacceptable, and in a few days we’ll post the five whys of why this happened and a list of measures we will implement,” GitLab said in a bulletin this morning. Git and wiki repositories and self-hosted installations were unaffected.

The restoration means any database data written between 17:20 UTC and 23:25 UTC will be lost by the time GitLab.com comes back online. Providing a chronology of events, GitLab said it detected on Monday that spammers were hammering its database by creating snippets, rendering it unstable. GitLab blocked the spammers by IP address and removed a user who had been using a repository as a form of CDN, which had resulted in 47,000 IPs signing in with the same account and causing a high database load; other users were also removed for spamming.

The company provided a statement this morning: “This outage did not affect our Enterprise customers or the wide majority of our users. As part of our ongoing recovery efforts, we are actively investigating a potential data loss. If confirmed, this data loss would affect less than one percent of our user base, and specifically peripheral metadata that was written during a six-hour window,” the company said. “We have been working around the clock to resume service on the affected product, and set up long-term measures to prevent this from happening again. We will continue to keep our community updated through Twitter, our blog and other channels.”

While dealing with the problem, GitLab found database replication lagged far behind, effectively stopping. “This happened because there was a spike in writes that were not processed on time by the secondary database.” GitLab has been dealing with a series of database issues, including a refusal to replicate.

GitLab.com went down at 6:28 pm PST on Tuesday and was back up at 9:57 am PST today, said Tim Anglade, interim vice president of marketing at the company.

 

 

[Source:- Javaworld]

 

Go 1.8 goes for efficiency and convenience


Go 1.8, the next version of Google’s open source language, is moving toward general availability, with a release candidate featuring improvements in compilation and HTTP. The final Version 1.8 is due in February.

According to draft notes, the release candidate features updates to the compiler back end for more efficient code. The back end, initially developed for Go 1.7 for 64-bit x86 systems, is based on static single assignment (SSA) form to generate more efficient code and to serve as a platform for optimizations like bounds check elimination. It now works on all architectures.

“The new back end reduces the CPU time required by our benchmark programs by 20 to 30 percent on 32-bit ARM systems,” the release notes say. “For 64-bit x86 systems, which already used the SSA back end in Go 1.7, the gains are a more modest 0 to 10 percent. Other architectures will likely see improvements closer to the 32-bit ARM numbers.”

Version 1.8 also introduces a new compiler front end as a foundation for future performance enhancements, and it features shorter garbage collection pauses by eliminating “stop the world” stack rescanning.

The release notes also cite HTTP/2 server push support, in which the net/http package allows a handler responding to an HTTP request to send HTTP/2 server pushes. Additionally, an HTTP server can now be shut down gracefully via a Server.Shutdown method or abruptly via a Server.Close method.

Version 1.8 adds support for the 32-bit MIPS architecture on Linux and offers more context support in areas such as the new Server.Shutdown method, the database/sql package, and net.Resolver. Go’s sort package adds a convenience function, Slice, to sort a slice given a less function. “In many cases this means that writing a new sorter type is not necessary.” Runtime and tools in Go 1.8 support profiling of contended mutexes, which provide a mutual exclusion lock.

Most of the upgrade’s changes are in the implementation of the toolchain, runtime, and libraries. “There are two minor changes to the language specification,” the release notes state. “As always, the release maintains the Go 1 promise of compatibility. We expect almost all Go programs to continue to compile and run as before.” Language changes include ignoring struct tags when converting a value from one struct type to another. Also, the language specification now only requires that implementations support up to 16-bit exponents in floating-point constants.

 

 

[Source:- JW]

Oracle to Java devs: Stop signing JAR files with MD5


Starting in April, Oracle will treat JAR files signed with the MD5 hashing algorithm as if they were unsigned, which means modern releases of the Java Runtime Environment (JRE) will block those JAR files from running. The shift is long overdue, as MD5’s security weaknesses are well-known, and more secure algorithms should be used for code signing instead.

“Starting with the April Critical Patch Update releases, planned for April 18, 2017, all JRE versions will treat JARs signed with MD5 as unsigned,” Oracle wrote on its Java download page.

Code-signing JAR files bundled with Java libraries and applets is a basic security practice, as it lets users know who actually wrote the code and that it has not been altered or corrupted since it was written. In recent years, Oracle has been beefing up Java’s security model to better protect systems from external exploits and to allow only signed code to execute certain types of operations. An application without a valid certificate is potentially unsafe.

Newer versions of Java now require all JAR files to be signed with a valid code-signing key, and starting with Java 7 Update 51, unsigned or self-signed applications are blocked from running.

Code signing is an important part of Java’s security architecture, but the MD5 hash weakens the very protections code signing is supposed to provide. Dating back to 1992, MD5 is used for one-way hashing: taking an input and generating a unique cryptographic representation that can be treated as an identifying signature. No two inputs should result in the same hash, but since 2005, security researchers have repeatedly demonstrated that a file could be modified and still produce the same hash, in what are known as collision attacks. While MD5 is no longer used for TLS/SSL—Microsoft deprecated MD5 for TLS in 2014—it remains prevalent in other security areas despite its weaknesses.
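
To make the one-way hashing idea concrete, here is a minimal sketch using the JDK's standard MessageDigest API; the input strings are arbitrary placeholders. A secure hash should make it practically impossible to find two different inputs that yield the same digest, which is exactly the property MD5 collision attacks break.

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class Md5Demo {
        public static void main(String[] args) throws NoSuchAlgorithmException {
            // Hash two different inputs with MD5 and print the digests.
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] a = md5.digest("release-1.0.jar".getBytes(StandardCharsets.UTF_8));
            byte[] b = md5.digest("release-1.1.jar".getBytes(StandardCharsets.UTF_8));

            // For a sound algorithm these should never collide for distinct inputs;
            // researchers have shown collisions can be manufactured for MD5.
            System.out.printf("a: %032x%n", new BigInteger(1, a));
            System.out.printf("b: %032x%n", new BigInteger(1, b));
        }
    }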

With Oracle’s change, “affected MD5-signed JAR files will no longer be considered trusted [by the Oracle JRE] and will not be able to run by default, such as in the case of Java applets, or Java Web Start applications,” Erik Costlow, an Oracle product manager with the Java Platform Group, wrote back in October.

Developers need to verify that their JAR files have not been signed using MD5 and, if they have, re-sign the affected files with a more modern algorithm. Administrators need to check with vendors to ensure the files are not MD5-signed. If MD5-signed files are still in use at the time of the switchover, users will see an error message saying the application could not run. Oracle has already informed vendors and source licensees of the change, Costlow said.
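
One way to perform that check programmatically (rather than with the jarsigner tool) is to look inside a JAR's META-INF signature files for MD5 digest entries. The following is a rough sketch, not an exhaustive audit, and it assumes a conventionally signed JAR:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.Enumeration;
    import java.util.jar.JarEntry;
    import java.util.jar.JarFile;

    public class Md5SignatureCheck {
        public static void main(String[] args) throws IOException {
            // Usage: java Md5SignatureCheck some-library.jar
            try (JarFile jar = new JarFile(args[0])) {
                Enumeration<JarEntry> entries = jar.entries();
                while (entries.hasMoreElements()) {
                    JarEntry entry = entries.nextElement();
                    String name = entry.getName();
                    // Signature files live under META-INF/ and end in .SF;
                    // their digest lines name the algorithm, e.g. "MD5-Digest:".
                    if (!name.startsWith("META-INF/") || !name.toUpperCase().endsWith(".SF")) {
                        continue;
                    }
                    try (BufferedReader reader = new BufferedReader(
                            new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                        boolean usesMd5 = reader.lines().anyMatch(line -> line.contains("MD5-Digest"));
                        System.out.println(name + (usesMd5 ? ": uses MD5 digests" : ": no MD5 digests found"));
                    }
                }
            }
        }
    }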

In cases where the vendor is defunct or unwilling to re-sign the application, administrators can disable the process that checks for signed applications (which has serious security implications), set up custom Deployment Rule Sets for the application’s location, or maintain an Exception Site List, Costlow wrote.

There was plenty of warning. Oracle stopped using the MD5withRSA algorithm as the default JAR signing option with Java SE 6, which was released in 2006. The MD5 deprecation was originally announced as part of the October 2016 Critical Patch Update and was scheduled to take effect this month as part of the January CPU. To ensure developers and administrators were ready for the shift, the company decided to delay the switch to the April Critical Patch Update, with Oracle Java SE 8u131 and corresponding releases of Oracle Java SE 7, Oracle Java SE 6, and Oracle JRockit R28.

“The CA Security Council applauds Oracle for its decision to treat MD5 as unsigned. MD5 has been deprecated for years, making the move away from MD5 a critical upgrade for Java users,” said Jeremy Rowley, executive vice president of emerging markets at DigiCert and a member of the CA Security Council.

Deprecating MD5 has been a long time coming, but it isn’t enough. Oracle should also look at deprecating SHA-1, which has its own set of issues, and adopt SHA-2 for code signing. That course of action would be in line with the current migration, as major browsers have pledged to stop supporting websites using SHA-1 certificates. With most organizations already involved with the SHA-1 migration for TLS/SSL, it makes sense for them to also shift the rest of their certificate and key signing infrastructure to SHA-2.

The good news is that Oracle plans to disable SHA-1 in certificate chains anchored by roots included by default in Oracle’s JDK at the same time MD5 gets deprecated, according to the JRE and JDK Crypto Roadmap, which outlines technical instructions and information about ongoing cryptographic work for Oracle JRE and Oracle JDK. The minimum key length for Diffie-Hellman will also be increased to 1,024 bits later in 2017.

The road map also claims Oracle recently added support for the SHA224withDSA and SHA256withDSA signature algorithms to Java 7, and disabled Elliptic Curve (EC) for keys of less than 256 bits for SSL/TLS for Java 6, 7, and 8.

 

 

[Source:- JW]

Attackers start wiping data from CouchDB and Hadoop databases

Data-wiping attacks have hit exposed Hadoop and CouchDB databases.

It was only a matter of time until ransomware groups that wiped data from thousands of MongoDB databases and Elasticsearch clusters started targeting other data storage technologies. Researchers are now observing similar destructive attacks hitting openly accessible Hadoop and CouchDB deployments.

Security researchers Victor Gevers and Niall Merrigan, who monitored the MongoDB and Elasticsearch attacks so far, have also started keeping track of the new Hadoop and CouchDB victims. The two have put together spreadsheets on Google Docs where they document the different attack signatures and messages left behind after data gets wiped from databases.

In the case of Hadoop, a framework used for distributed storage and processing of large data sets, the attacks observed so far can be described as vandalism.

That’s because the attackers don’t ask for payments to be made in exchange for returning the deleted data. Instead, their message instructs the Hadoop administrators to secure their deployments in the future.

According to Merrigan’s latest count, 126 Hadoop instances have been wiped so far. The number of victims is likely to increase because there are thousands of Hadoop deployments accessible from the internet — although it’s hard to say how many are vulnerable.

The attacks against MongoDB and Elasticsearch followed a similar pattern. The number of MongoDB victims jumped from hundreds to thousands in a matter of hours and to tens of thousands within a week. The latest count puts the number of wiped MongoDB databases at more than 34,000 and that of deleted Elasticsearch clusters at more than 4,600.

A group called Kraken0, responsible for most of the ransomware attacks against databases, is trying to sell its attack toolkit and a list of vulnerable MongoDB and Elasticsearch installations for the equivalent of US$500 in bitcoins.

The number of wiped CouchDB databases is also growing rapidly, reaching more than 400 so far. CouchDB is a NoSQL-style database platform similar to MongoDB.

Unlike the Hadoop vandalism, the CouchDB attacks are accompanied by ransom messages, with attackers asking for 0.1 bitcoins (around $100) to return the data. Victims are advised against paying because, in many of the MongoDB attacks, there was no evidence that attackers had actually copied the data before deleting it.

Researchers from Fidelis Cybersecurity have also observed the Hadoop attacks and have published a blog post with more details and recommendations on securing such deployments.

The destructive attacks against online database storage systems are not likely to stop soon because there are other technologies that have not yet been targeted and that might be similarly misconfigured and left unprotected on the internet by users.

 

 

[Source:- JW]

Google open-sources test suite to find crypto bugs


Working with cryptographic libraries is hard, and a single implementation mistake can result in serious security problems. To help developers check their code for implementation errors and find weaknesses in cryptographic software libraries, Google has released a test suite as part of Project Wycheproof.

“In cryptography, subtle mistakes can have catastrophic consequences, and mistakes in open source cryptographic software libraries repeat too often and remain undiscovered for too long,” Google security engineers Daniel Bleichenbacher and Thai Duong wrote in a post announcing the project on the Google Security blog.

Named after Australia’s Mount Wycheproof, the world’s smallest mountain, Wycheproof provides developers with a collection of unit tests that detect known weaknesses in cryptographic algorithms and check for expected behaviors. The first set of tests is written in Java because Java has a common cryptographic interface and can be used to test multiple providers.

“We recognize that software engineers fix and prevent bugs with unit testing, and we found that many cryptographic issues can be resolved by the same means,” Bleichenbacher and Duong wrote.

The suite can be used to test such cryptographic algorithms as RSA, elliptic curve cryptography, and authenticated encryption, among others. The project also has ready-to-use tools to check Java Cryptography Architecture providers, such as Bouncy Castle and the default providers in OpenJDK. The engineers said they are converting the tests into sets of test vectors to simplify the process of porting them to other languages.

The tests in this release are low-level and should not be used directly, but they still can be applied for testing the algorithms against publicly known attacks, the engineers said. For example, developers can use Wycheproof to verify whether algorithms are vulnerable to invalid curve attacks or biased nonces in digital signature schemes.

So far the project has been used to run more than 80 test cases and has identified 40-plus vulnerabilities, including one issue where the private key of DSA and ECDHC algorithms could be recovered under specific circumstances. The weakness in the algorithm was present because libraries were not checking the elliptic curve points they received from outside sources.

“Encodings of public keys typically contain the curve for the public key point. If such an encoding is used in the key exchange, then it is important to check that the public and secret key used to compute the shared ECDH secret are using the same curve. Some libraries fail to do this check,” according to the available documentation.
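
As a rough illustration of the kind of check that passage describes (this is not Wycheproof's own code), the sketch below verifies that a received EC public key actually lies on the prime-field curve the application expects before it is fed into an ECDH computation. A production-grade check would also validate coordinate ranges and the point's order.

    import java.math.BigInteger;
    import java.security.interfaces.ECPublicKey;
    import java.security.spec.ECFieldFp;
    import java.security.spec.ECParameterSpec;
    import java.security.spec.ECPoint;
    import java.security.spec.EllipticCurve;

    public class CurveCheck {
        // Returns true if the peer's public point satisfies the expected curve's
        // equation y^2 = x^3 + ax + b (mod p). Rejecting points that fail this
        // test is the defense against invalid-curve attacks.
        static boolean isOnExpectedCurve(ECPublicKey peerKey, ECParameterSpec expected) {
            EllipticCurve curve = expected.getCurve();
            if (!(curve.getField() instanceof ECFieldFp)) {
                return false; // this sketch only handles prime-field curves
            }
            BigInteger p = ((ECFieldFp) curve.getField()).getP();
            ECPoint w = peerKey.getW();
            if (w.equals(ECPoint.POINT_INFINITY)) {
                return false;
            }
            BigInteger x = w.getAffineX();
            BigInteger y = w.getAffineY();
            BigInteger lhs = y.modPow(BigInteger.valueOf(2), p);
            BigInteger rhs = x.modPow(BigInteger.valueOf(3), p)
                    .add(curve.getA().multiply(x))
                    .add(curve.getB())
                    .mod(p);
            return lhs.equals(rhs);
        }
    }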

Cryptographic libraries can be quite difficult to implement, and attackers frequently look for weak cryptographic implementations rather than trying to break the actual mathematics underlying the encryption. With Wycheproof, developers and users can check their libraries against a large number of known attacks without having to dig through academic papers to find out what kind of attacks they need to worry about.

The engineers looked through public cryptographic literature and implemented known attacks to build the test suite. However, developers should not consider the suite to be comprehensive or able to detect all weaknesses, because new weaknesses are always being discovered and disclosed.

“Project Wycheproof is by no means complete. Passing the tests does not imply that the library is secure, it just means that it is not vulnerable to the attacks that Project Wycheproof tries to detect,” the engineers wrote.

Wycheproof comes two weeks after Google released a fuzzer to help developers discover programming errors in open source software. Like OSS-Fuzz, all the code for Wycheproof is available on GitHub. OSS-Fuzz is still in beta, but it has already worked through 4 trillion test cases and uncovered 150 bugs in open source projects since it was publicly announced.

 

 

[Source:- JW]

AI tools came out of the lab in 2016


You shouldn’t anthropomorphize computers: They don’t like it.

That joke is at least as old as Deep Blue’s 1997 victory over then world chess champion Garry Kasparov, but even with the great strides made in the field of artificial intelligence over that time, we’re still not much closer to having to worry about computers’ feelings.

Computers can analyze the sentiments we express in social media, and project expressions onto the faces of robots to make us believe they are happy or angry, but no one seriously believes, yet, that they “have” feelings, in the sense of actually experiencing them.

Other areas of AI, on the other hand, have seen some impressive advances in both hardware and software in just the last 12 months.

Deep Blue was a world-class chess opponent — and also one that didn’t gloat when it won, or go off in a huff if it lost.

Until this year, though, computers were no match for a human at another board game, Go. That all changed in March when AlphaGo, developed by Google subsidiary DeepMind, beat Lee Sedol, then the world’s strongest Go player, 4-1 in a five-match tournament.

AlphaGo’s secret weapon was a technique called reinforcement learning, where a program figures out for itself which actions bring it closer to its goal, and reinforces those behaviors, without the need to be taught by a person which steps are correct. That meant that it could play repeatedly against itself and gradually learn which strategies fared better.
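
AlphaGo's actual system pairs deep neural networks with Monte Carlo tree search, so the snippet below is not how it was built; it is only a toy, tabular sketch of the reinforcement-learning idea described above, in which an agent nudges its value estimates toward actions that led to reward. All names and constants here are illustrative.

    import java.util.Random;

    public class QLearningSketch {
        static final int STATES = 10, ACTIONS = 4;   // toy state/action spaces
        static final double ALPHA = 0.1;             // learning rate
        static final double GAMMA = 0.9;             // discount factor for future reward
        static final double EPSILON = 0.1;           // exploration rate

        final double[][] q = new double[STATES][ACTIONS];
        final Random rng = new Random();

        int chooseAction(int state) {
            if (rng.nextDouble() < EPSILON) {
                return rng.nextInt(ACTIONS);         // occasionally explore at random
            }
            int best = 0;
            for (int a = 1; a < ACTIONS; a++) {
                if (q[state][a] > q[state][best]) best = a;
            }
            return best;                             // otherwise exploit the current estimate
        }

        void update(int state, int action, double reward, int nextState) {
            double bestNext = q[nextState][0];
            for (int a = 1; a < ACTIONS; a++) {
                bestNext = Math.max(bestNext, q[nextState][a]);
            }
            // Nudge the value of (state, action) toward reward plus discounted future value,
            // reinforcing choices that turned out well without human-labeled examples.
            q[state][action] += ALPHA * (reward + GAMMA * bestNext - q[state][action]);
        }
    }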

Reinforcement learning techniques have been around for decades, too, but it’s only recently that computers have had sufficient processing power (to test each possible path in turn) and memory (to remember which steps led to the goal) to play a high-level game of Go at a competitive speed.

Better performing hardware has moved AI forward in other ways too.

In May, Google revealed its TPU (Tensor Processing Unit), a hardware accelerator for its TensorFlow deep learning framework. The ASICs (application-specific integrated circuits) can execute the types of calculations used in machine learning much faster and using less power than even GPUs, and Google has installed several thousand of them in its server racks in the slots previously reserved for hard drives.

The TPU, it turns out, was one of the things that made AlphaGo so fast, but Google has also used the chip to accelerate mapping and navigation functions in Street View and to improve search results with a new AI tool called RankBrain.

Google is keeping its TPU to itself for now, but others are releasing hardware tuned for AI applications. Microsoft, for example, has equipped some of its Azure servers with FPGAs (field-programmable gate arrays) to accelerate certain machine learning functions, while IBM is targeting similar applications with a range of PowerAI servers that use custom hardware to link its Power CPUs with Nvidia GPUs.

For businesses that want to deploy cutting-edge AI technologies without developing everything from scratch themselves, easy access to high-performance hardware is a start, but not enough. Cloud operators recognize that, and are also offering AI software as a service. Amazon Web Services and Microsoft’s Azure have both added machine learning APIs, while IBM is building a business around cloud access to its Watson AI.

The fact that these hardware and software tools are cloud-based will help AI systems in other ways too.

Being able to store and process enormous volumes of data is only useful to an AI that has access to vast quantities of data from which to learn — data such as that collected and delivered by cloud services, for example, whether it’s information about the weather, mail order deliveries, requests for rides or people’s tweets.

Access to all that raw data, rather than the minute subset, processed and labelled by human trainers, that was available to previous generations of AIs, is one of the biggest factors transforming AI research today, according to a Stanford University study of the next 100 years in AI.

And while having computers watch everything we do, online and off, in order to learn how to work with us might seem creepy, it’s really only in our minds. The computers don’t feel anything. Yet.

 

[Source:- JW]

 

Oracle survey: Java EE users want REST, HTTP/2


In September and October, Oracle asked Java users to rank future Java EE enhancements by importance. The survey’s 1,700 participants put REST services and HTTP/2 as top priorities, followed by OAuth and OpenID, eventing, and JSON-B (Java API for JSON Binding).

“REST (JAX-RS 2.1) and HTTP/2 (Servlet 4.0) have been voted as the two most important technologies surveyed, and together with JSON-B represent three of the top six technologies,” a report on the survey concludes. “Much of the new API work in these technologies for Java EE 8 is already complete. There is significant value in delivering Java EE 8 with these technologies, and the related JSON-P (Java API for JSON Processing) updates, as soon as possible.”
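
To give a feel for the Servlet 4.0 side of that list, here is a minimal sketch of HTTP/2 server push as the API was shaping up at the time; the servlet mapping and asset paths are placeholders, and newPushBuilder() returns null when push is unavailable, such as over plain HTTP/1.1.

    import java.io.IOException;

    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.PushBuilder;

    @WebServlet("/shop")
    public class PushServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            // Push assets the page is known to need before the browser requests them.
            PushBuilder push = req.newPushBuilder();
            if (push != null) {
                push.path("css/site.css").push();
                push.path("js/site.js").push();
            }
            resp.setContentType("text/html");
            resp.getWriter().println("<html><head>"
                    + "<link rel=\"stylesheet\" href=\"css/site.css\">"
                    + "<script src=\"js/site.js\"></script>"
                    + "</head><body>Hello, HTTP/2</body></html>");
        }
    }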

Oracle is pursuing Java EE 8 as a retooled version of the platform geared to cloud and microservices deployments. It’s due in late 2017, and a follow-up release, Java EE 9, is set to appear a year later.

Based on the survey, Oracle considered accelerating Java EE standards for OAuth and OpenID Connect. “This could not be accomplished in the Java EE 8 timeframe, but we’ll continue to pursue Security 1.0 for Java EE 8,” the company said. But two other technologies that ranked high in the survey, configuration and health-checking, will be postponed. “We have concluded it is best to defer inclusion of these technologies in Java EE in order to complete Java EE 8 as soon as possible.”

Management, JMS (Java Message Service), and MVC ranked low, thus supporting Oracle’s plans to withdraw new APIs for these areas from Java EE 8. While CDI (Contexts and Dependency Injection) 2.0, Bean Validation 2.0, and JSF (JavaServer Faces) 2.3 were not directly surveyed, Oracle has made significant progress on them and will include them in Java EE 8.

JAX-RS (Java API for RESTful Web Services) drew a lot of support for use with cloud and microservices applications, with 1,171 respondents rating it as very important. “The current practice of cloud development in Java is largely based on REST and asynchrony,” the report said. “For Java developers, that means using the standard JAX-RS API. Suggested enhancements coming to the next version of JAX-RS include: a reactive client API, non-blocking I/O support, server-sent events and better CDI integration.” HTTP/2, a protocol for more efficient use of network resources and reduced latency, was rated very important by 1,037 respondents when it comes to cloud and microservices applications.
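
The reactive client API mentioned above was still a proposal at the time of the survey; it later appeared in JAX-RS 2.1 in roughly the form sketched below. This is a minimal example, and the endpoint URL is a placeholder.

    import java.util.concurrent.CompletionStage;

    import javax.ws.rs.client.Client;
    import javax.ws.rs.client.ClientBuilder;

    public class ReactiveClientDemo {
        public static void main(String[] args) {
            Client client = ClientBuilder.newClient();

            // rx() switches the fluent client into reactive mode, returning a
            // CompletionStage instead of blocking for the HTTP response.
            CompletionStage<String> body = client
                    .target("https://api.example.com/orders/42")
                    .request()
                    .rx()
                    .get(String.class);

            body.thenAccept(System.out::println)
                .toCompletableFuture()
                .join(); // wait here only so the demo prints before exiting

            client.close();
        }
    }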

Respondents also supported the reactive style of programming for the next generation of cloud and microservices, with 647 calling it very important, and eventing, for cloud and microservices applications, was favored by 769 respondents. “Many cloud applications are moving from a synchronous invocation model to an asynchronous event-driven model,” Oracle said. “Key Java EE APIs could support this model for interacting with cloud services. A common eventing system would simplify the implementation of such services.”

In other findings, eventual consistency for cloud and microservices applications was favored by 514 respondents who found it very important and 468 who found it important. Multi-tenancy, critical to cloud deployments, was rated very important by 377 respondents and important by 390 survey takers. JSON-P was rated as very important by 576 respondents, while 781 gave this same rating to JSON-B. Standardizing NoSQL database support for cloud and microservices applications was rated very important by 489 respondents and important by 373 of those surveyed, and  582 respondents thought it was very important that Java EE 9 investigate the modularization of EE containers.

The greatest number of the survey’s respondents — more than 700 — had more than eight years’ experience developing with Java EE, while 680 had from two to eight years of experience.

 

 

[Source:- JW]

Apache Beam unifies batch and streaming for big data


Apache Beam, a unified programming model for both batch and streaming data, has graduated from the Apache Incubator to become a top-level Apache project.

Aside from becoming another full-fledged widget in the ever-expanding Apache tool belt of big-data processing software, Beam addresses ease of use and dev-friendly abstraction, rather than simply offering raw speed or a wider array of included processing algorithms.

Beam us up!

Beam provides a single programming model for creating batch and stream processing jobs (the name is a hybrid of “batch” and “stream”), and it offers a layer of abstraction for dispatching to various engines used to run the jobs. The project originated at Google, where it’s currently a service called GCD (Google Cloud Dataflow). Beam uses the same API as GCD, and it can use GCD as an execution engine, along with Apache Spark, Apache Flink (a stream processing engine with a highly memory-efficient design), and now Apache Apex (another stream engine for working closely with Hadoop deployments).

The Beam model involves five components: the pipeline (the pathway for data through the program); the “PCollections,” or data streams themselves; the transforms, for processing data; the sources and sinks, where data is fetched and eventually sent; and the “runners,” or components that allow the whole thing to be executed on an engine.
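
As a rough sketch of how those pieces fit together (API details shifted between the incubator releases and the later stable Java SDK, so treat this as illustrative rather than definitive), the pipeline below reads a text file, counts how often each distinct line appears, and writes the counts back out; the file names are placeholders, and the runner is chosen through the options passed on the command line.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class LineCount {
        public static void main(String[] args) {
            // The runner (direct, Dataflow, Spark, Flink, Apex...) is picked via options,
            // not in the pipeline code itself.
            PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
            Pipeline p = Pipeline.create(options);

            p.apply("ReadLines", TextIO.read().from("input.txt"))      // source
             .apply("CountLines", Count.perElement())                  // transform: yields KV<line, count>
             .apply("Format", MapElements
                     .into(TypeDescriptors.strings())
                     .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
             .apply("WriteCounts", TextIO.write().to("line-counts"));  // sink

            p.run().waitUntilFinish();                                 // executed by the chosen runner
        }
    }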

Apache says it separated concerns in this fashion so that Beam can “easily and intuitively express data processing pipelines for everything from simple batch-based data ingestion to complex event-time-based stream processing.” This is in line with reworking tools like Apache Spark to support stream and batch processing within the same product and with similar programming models. In theory, it’s one fewer concept for prospective developers to wrap their head around, but that presumes Beam is used in lieu of Spark or other frameworks, when it’s more likely it’ll be used — at first — to augment them.

Hands off

One possible drawback to Beam’s approach is that while the layers of abstraction in the product make operations easier, they also put the developer at a distance from the underlying layers. A good case in point is Beam’s current level of integration with Apache Spark: the Spark runner doesn’t yet use Spark’s more recent DataFrames system, and thus may not take advantage of the optimizations those can provide. But this isn’t a conceptual flaw; it’s an implementation issue that can be addressed in time.

The big payoff of using Beam, as noted by Ian Pointer in his discussion of Beam in early 2016, is that it makes migrations between processing systems less of a headache. Likewise, Apache says Beam “cleanly [separates] the user’s processing logic from details of the underlying engine.”

Separation of concerns and ease of migration will be good to have if the ongoing rivalry between the various big data processing engines continues. Granted, Apache Spark has emerged as one of the undisputed champs of the field and become a de facto standard choice. But there’s always room for improvement or an entirely new streaming or processing paradigm. Beam is less about offering a specific alternative than about providing developers and data wranglers with more breadth of choice.

 

 

[Source:- Javaworld]