
There is a Java/Scala project providing R-like statistical computation:

http://haifengl.github.io/smile/linear-algebra.html


This software has its own visualization package. See it at

http://haifengl.github.io/smile/visualization.html


The project homepage says "Data scientists and developers can speak the same language now!". So it is surely easier to productionize an ML project without rewriting the algorithms after the data scientists work out the model with R or Matlab.
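
For instance, a model trained interactively in the Smile shell is already a plain JVM object that production Java/Scala code can call directly. A rough sketch from memory of the Scala shell API (function names and exact signatures may differ between Smile versions):

  // Sketch only: assumes the smile.classification package functions of the
  // Scala API; check the docs for the exact signatures in your version.
  import smile.classification._

  val x = Array(Array(5.1, 3.5), Array(4.9, 3.0),   // class 0
                Array(6.7, 3.1), Array(6.3, 2.8))   // class 1
  val y = Array(0, 0, 1, 1)

  val model = knn(x, y, 3)                 // train a 3-NN classifier on the JVM
  println(model.predict(Array(6.5, 3.0)))  // the same object can be used from
                                           // production Java/Scala services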


There are more Python developers than Scala developers. There are more Python data scientists than Scala data scientists. I like the project, though.


There are more Java developers than Python developers :)


I don't know that that's necessarily true. The most recent Stack Overflow survey [1] shows a difference of 8%, which is not an overwhelming majority. Granted, that's not an unbiased sample, but I think the OP above is correct...more data scientists use Python than Java.

So anyone wanting to use this library would have to think about tradeoffs: are the efficiencies lost in data scientists learning to use Java for modeling worth the efficiencies gained in putting a model in production? For some, the answer may be yes; for others, no.

[1]https://stackoverflow.com/insights/survey/2017#technology-pr...


Is there something wrong with your code? I don't see the effect.

  smile> val x = Array(1.0, 2.0, 3.0, 4.0)
  x: Array[Double] = Array(1.0, 2.0, 3.0, 4.0)
  smile> val y = Array(4.0, 3.0, 2.0, 1.0)
  y: Array[Double] = Array(4.0, 3.0, 2.0, 1.0)
  smile> x + y
  res2: smile.math.VectorAddVector = Array(5.0, 5.0, 5.0, 5.0)


I just quoted the example on the webpage (section "Vector Operations"); I didn't run it myself.


Good eyes :) It must be a copy-and-paste error.


Just because you can?


My first thought was "Why?" But after thinking about it, this actually could be helpful. There are still a lot of legacy systems that rely on Fortran. Perhaps this framework could help glue a lot of these legacy systems together over a network.


For some reason I've been seeing more and more Fortran lately. Maybe it's the trendy thing to do.


Honestly, High Performance Fortran has really nice support for parallel arrays. For computational programming I see a lot of value.


The issue is what you give up. Yes, the Fortran compiler/runtime is fast, but concurrency primitives weren't added until Fortran 2003, and recursion wasn't fully supported until Fortran 90 [1]. It hides almost all the implementation details from the programmer.

Fortran is a really good language for number crunching. Academics who don't know how to code can build very fast number-crunching tools without really any CS or hardware knowledge. This is the target use case and target audience.

If you want to do low level system things with it, it'll be painful.

[1] Some FORTRAN compilers in the 1970s had limited support for recursion, but it wasn't added to the official language standard until about 20 years later (Fortran 90).


>The issue is what you give up.

This is a feature, not a bug. The reason Fortran is a fast, efficient tool for scientists to write software is exactly that it hides implementation details and is a bad systems language.

It allows you to define the important math parts of your program and hides the implementation details, precisely so the compiler has a lot more power to optimize.


I got into it a couple of years ago pretty heavily just for kicks and found that it was actually a pretty fun language if you're doing lots of math, especially matrices. I wish people would use it more for libraries, but I would be lying if I said you couldn't write something just as fast in C or C++ with a little bit more work. That being said, it's actually a great language for people who just want to spit out some number-crunching code.


It is on top of HBase, Cassandra, or Accumulo. So yes, it is web scale.


It is true. Unfortunately, the project was started several years ago and had nothing to do with the startup world. I would like to complain that VCs destroyed another nice name with their hype :(


Okay, everyone is hiding a unicorn in their closet :)


In most graph databases, you find a vertex by filtering its properties, e.g. with the Gremlin graph query language. In Unicorn, you can do something similar with document vertices (that is, a vertex corresponding to a document in another table/collection). This is probably very natural in a business application. However, it is not very useful in your case, as your vertices are abstract and have no properties.
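
For example, with the TinkerPop/Gremlin Java API, that kind of property lookup looks roughly like this (shown from Scala purely as an illustration; this is not Unicorn's API):

  import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory

  // Property filtering in Gremlin: find a vertex by a property value.
  val graph = TinkerFactory.createModern()        // TinkerPop's built-in sample graph
  val g = graph.traversal()
  val marko = g.V().has("name", "marko").next()   // filter vertices on a property
  println(marko.value[String]("name"))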

I guess what you want is large-scale graph analytics, for which I suggest Spark GraphX or another distributed graph computing engine.
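
A rough sketch of what that looks like with GraphX, assuming a spark-shell session (which provides sc):

  import org.apache.spark.graphx._
  import org.apache.spark.rdd.RDD

  // Vertices carry only an ID and a placeholder label; edges carry a relation name.
  val vertices: RDD[(VertexId, String)] =
    sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
  val edges: RDD[Edge[String]] =
    sc.parallelize(Seq(Edge(1L, 2L, "rel"), Edge(1L, 3L, "rel")))

  val graph = Graph(vertices, edges)
  val ranks = graph.pageRank(0.001).vertices      // whole-graph analytics
  ranks.join(vertices).take(3).foreach(println)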

Unicorn is designed for directed property multi-graphs.


I would say that what I have is a directed property multi-graph, as I understand it. It's just that the properties are on the edges, and the nodes have no properties except for their ID.

The graph in question is ConceptNet, which in the version I'm working on has about 10 million edges and 3 million nodes. Let's be clear that, in computing, "million" is not a large number. I only said "large graph" to clarify that it's not a small toy graph. The data needs to be imported with some degree of efficiency. But I have a 3TB hard drive and 16 GB of RAM, and both of them can spare a few gigabytes for this task.

Before you throw me into the tarpit of distributed computing, like every other graph-DB provider does as an excuse for their terrible inefficiency, I would like to know if your graph database is appropriate to use with reasonable-sized graphs that fit easily on a single computer.


Check out this script https://github.com/haifengl/unicorn/blob/master/shell/src/un..., which loads the DBpedia graph into Unicorn. You should be able to load ConceptNet with minor modifications. Later, you can refer to a vertex by its string ID.


When adding an edge, the end vertices are assumed to exist. In your case, we could add a helper function to import a list of edges, similar to Spark GraphX.
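
Something along these lines, say (the addVertex/addEdge callbacks below are hypothetical placeholders, not Unicorn's actual method names):

  // Hypothetical bulk-import helper: make sure both endpoints exist before
  // adding each edge; addVertex/addEdge stand in for the real API calls.
  def importEdges(edges: Seq[(String, String, String)],
                  addVertex: String => Unit,
                  addEdge: (String, String, String) => Unit): Unit = {
    val endpoints = edges.flatMap { case (src, _, dst) => Seq(src, dst) }.toSet
    endpoints.foreach(addVertex)
    edges.foreach { case (src, label, dst) => addEdge(src, label, dst) }
  }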

