You have a big data project. You understand the problem domain, you know what infrastructure to use, and you may even have decided which framework will process all that data. But one decision keeps getting postponed: which language should you choose? (Or, perhaps more pertinently, which language should you force all your developers and data scientists to use?) This question can't be put off forever; sooner or later it has to be decided.
Of course, nothing stops you from working with big data using other mechanisms (XSLT transformations, for example). But generally speaking, there are three languages to choose from for big data these days -- R, Python, and Scala -- plus Java, the longtime standby of the enterprise world. So which language should you choose, and why, or when?
Below is a brief overview of each language to help you make a sound decision.
R
R is often described as "a language built by statisticians, for statisticians." If you need an esoteric statistical model for your computation, you'll likely find it on CRAN -- it isn't called the Comprehensive R Archive Network for nothing, you know. And for analysis and plotting, nothing beats ggplot2. If you need more power than a single machine provides, you can use the SparkR bindings to run Spark from R.
However, if you are not a data scientist and haven't used Matlab, SAS, or OCTAVE before, it can take some adjustment before you're productive in R. While R is great for analyzing data, it is less strong as a general-purpose language. You might build a model in R, but you should consider translating it into Scala or Python for production, and you are unlikely to write a cluster control system in R (good luck debugging it if you do).
Python
If your data scientists don't use R, they probably know Python inside and out. Python has been popular in academia for over a decade, especially in fields such as natural language processing (NLP). As a result, if you have a project that requires NLP, you face a bewildering number of choices, including the classic NLTK, topic modeling with GenSim, or the blazing-fast and accurate spaCy. Similarly, Python is well covered for neural networks, with Theano and TensorFlow; there's scikit-learn for machine learning, and NumPy and Pandas for data analysis.
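To make the NLP point concrete, here is a pure-stdlib sketch of the kind of token counting that libraries like NLTK and spaCy industrialize. The regex tokenizer below is deliberately naive and is for illustration only, not a substitute for either library:

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer: lowercase the text and pull out runs of letters.
    # Real NLP libraries handle punctuation, contractions, and Unicode
    # far more robustly than this.
    return re.findall(r"[a-z']+", text.lower())

doc = "Python has been popular in academia, and Python handles NLP comfortably."
counts = Counter(tokenize(doc))
print(counts.most_common(1))  # [('python', 2)]
```

From counts like these it is a short hop to the bag-of-words features that scikit-learn's classifiers consume.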
There's also Jupyter/IPython -- a web-based notebook server framework that lets you mix code, graphics, and just about anything else in a shareable format. This was long one of Python's killer features, but these days the concept has proven so useful that it has spread to almost every language with a read-evaluate-print loop (REPL), including Scala and R.
Python is often well supported in big data processing frameworks, but at the same time it is often not a "first-class citizen." For example, new features in Spark almost always appear first in the Scala/Java bindings, and it may take a few minor versions for those updates to reach PySpark (this is especially true for Spark Streaming/MLlib development).
In contrast to R, Python is a traditional object-oriented language, so most developers will be fairly comfortable with it, whereas a first encounter with R or Scala can be intimidating. One small issue is the requirement that your code have correct whitespace. This splits people into two camps: those who say "it's great for enforcing readability" and those who feel that in 2016 we shouldn't have to fight an interpreter to make a program run just because one line has a character out of place.
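For readers who haven't hit the whitespace rule yet, a minimal sketch of what's at stake: indentation, not braces, defines block structure in Python, so moving a single line in or out one level changes what the program does:

```python
def sum_evens(numbers):
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n  # indented under the `if`: only evens are added
    return total        # dedented past the loop: runs once, at the end

print(sum_evens([1, 2, 3, 4]))  # 6
```

Dedent `total += n` one level and every number gets added; indent `return total` into the loop and the function returns after the first iteration. The interpreter enforces all of this, which is precisely the dividing line between the two camps.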
Scala
Now for Scala. Of the four languages covered in this article, Scala may be the easiest one to appreciate for its type system. Running on the JVM, Scala is a mostly successful marriage of the functional and object-oriented paradigms, and it is currently making great strides in the financial world and among companies that need to operate on very large amounts of data, often in a massively distributed fashion (such as Twitter and LinkedIn). It is also the language behind both Spark and Kafka.
Since Scala runs inside the JVM, it has immediate and free access to the Java ecosystem, but it also has an extensive set of "native" libraries for handling large-scale data (notably Twitter's Algebird and Summingbird). It also includes a very handy REPL for interactive development and analysis, just like with Python and R.
I personally like Scala a lot, because it includes many useful programming features such as pattern matching, and it is considerably less verbose than standard Java. However, there is often more than one way to do anything in Scala, and the language advertises this as a feature. That's good! But given its Turing-complete type system and assorted squiggly operators ("/:" for foldLeft, ":\" for foldRight), it is easy to open a Scala file and think you're looking at a particularly nasty piece of Perl. This calls for a set of good practices and guidelines when writing Scala (Databricks's are reasonable).
The other downside is that the Scala compiler is slow to run, to the point of recalling the old "it's compiling!" excuse for a coffee break. Still, it has a REPL, big data support, and web-based notebook frameworks in the form of Jupyter and Zeppelin, so I'm willing to forgive many of its quirks.
Java
Finally, there's always Java -- unloved, forlorn, owned by a company (Oracle) that seems to care about it only when there's money to be made suing Google, and entirely unfashionable. Only corporate drones use Java! Yet Java might be a great fit for your big data project. Consider Hadoop MapReduce -- written in Java. HDFS? Also written in Java. Even Storm, Kafka, and Spark run on the JVM (using Clojure and Scala), which means Java is a "first-class citizen" of these projects. Then there are newer technologies like Google Cloud Dataflow (now Apache Beam), which until very recently supported only Java.
Java may never be anybody's rock-star language of choice. But while other developers are struggling to untangle the sets of callbacks in their Node.js applications, using Java gives you access to a vast ecosystem of profilers, debuggers, monitoring tools, and libraries for enterprise security and interoperability -- and much more besides, most of it battle-tested over the past two decades. (Sadly, Java turns 21 this year; we're all getting old.)
The main knock against Java is its severe verbosity and its lack of a REPL for interactive development (R, Python, and Scala all have one). I've seen 10 lines of Scala-based Spark code balloon into a monstrous 200 lines in Java, complete with huge type declarations that take up most of the screen. However, the new lambda support in Java 8 goes a long way toward improving the situation. Java will never be as compact as Scala, but Java 8 really does make developing in Java less painful.
As for the REPL? Well, not yet. Java 9, coming next year, will include JShell, which will hopefully satisfy all your REPL requirements.
Which language wins?
Which language should you use for your big data project? I'm afraid it depends. If you're doing heavy data analysis with obscure statistical calculations, you'd be crazy not to consider R. If you're doing NLP or intensive neural-network processing across GPUs, Python is a great choice. And for a hardened, production-ready data pipeline with all the important operational tooling, Java or Scala are excellent options.
Of course, it needn't be either/or. For example, with Spark you can train a model and machine learning pipeline in R or Python against data at rest, then serialize that pipeline out to storage, where your production Scala Spark Streaming application can pick it up. While you shouldn't overcommit to any one language (or your team will quickly develop language fatigue), using a heterogeneous mix of languages that plays to their individual strengths can pay off handsomely in a big data project.