
There is a Java/Scala project providing R-like statistical computation:

http://haifengl.github.io/smile/linear-algebra.html


This software has its own visualization package. See it at

http://haifengl.github.io/smile/visualization.html


The project homepage says "Data scientists and developers can speak the same language now!". So it is surely easier to productionize an ML project without rewriting the algorithms after the data scientists work out the model with R or Matlab.
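
For instance, a model trained interactively in the Smile shell is already a plain JVM object that production Java/Scala code can call directly. A rough sketch from memory of the Scala shell API (function names and exact signatures may differ between Smile versions):

  // Sketch only: assumes the smile.classification package functions of the
  // Scala API; check the docs for the exact signatures in your version.
  import smile.classification._

  val x = Array(Array(5.1, 3.5), Array(4.9, 3.0),   // class 0
                Array(6.7, 3.1), Array(6.3, 2.8))   // class 1
  val y = Array(0, 0, 1, 1)

  val model = knn(x, y, 3)                 // train a 3-NN classifier on the JVM
  println(model.predict(Array(6.5, 3.0)))  // the same object can be used from
                                           // production Java/Scala services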


There are more Python developers than Scala developers. There are more Python data scientists than Scala data scientists. I like the project, though.


There are more Java developers than Python developers :)


I don't know that that's necessarily true. The most recent Stack Overflow survey [1] shows a difference of 8%, which is not an overwhelming majority. Granted, that's not an unbiased sample, but I think the OP above is correct...more data scientists use Python than Java.

So anyone wanting to use this library would have to think about tradeoffs: are the efficiencies lost in data scientists learning to use Java for modeling worth the efficiencies gained in putting a model in production? For some, the answer may be yes; for others, no.

[1]https://stackoverflow.com/insights/survey/2017#technology-pr...


Is there something wrong with your code? I don't see the effect.

  smile> val x = Array(1.0, 2.0, 3.0, 4.0)
  x: Array[Double] = Array(1.0, 2.0, 3.0, 4.0)
  smile> val y = Array(4.0, 3.0, 2.0, 1.0)
  y: Array[Double] = Array(4.0, 3.0, 2.0, 1.0)
  smile> x + y
  res2: smile.math.VectorAddVector = Array(5.0, 5.0, 5.0, 5.0)


I just quoted the example on the webpage (section "Vector Operations"); I didn't run it myself.


Good eyes :) It must be a copy-and-paste error.


Just because you can?


My first thought was "Why?" But after thinking about it, this actually could be helpful. There are still a lot of legacy systems that rely on Fortran. Perhaps this framework could help glue a lot of these legacy systems together over a network.


For some reason I've been seeing more and more Fortran lately. Maybe it's the trendy thing to do.


Honestly, High Performance Fortran has really nice support for parallel arrays. For computational programming I see a lot of value.


The issue is what you give up. Yes, the Fortran compiler/runtime is fast, but concurrency primitives weren't added until Fortran 2003, and recursion wasn't fully supported until Fortran 90 [1]. It hides almost all the implementation details from the programmer.

Fortran is a really good language for number crunching. Academics who don't know how to code can build very fast number-crunching tools without really any CS or hardware knowledge. This is the target use case and target audience.

If you want to do low level system things with it, it'll be painful.

[1] Some FORTRAN compilers in the 1970s had limited support for recursion, but it wasn't added to the official language standard until about 20 years later (Fortran 90).


>The issue is what you give up.

This is a feature, not a bug. The reason Fortran is a fast, efficient tool for scientists to write software is exactly that it hides implementation details and is a bad systems language.

It allows you to define the important math parts of your program and hides the implementation details, precisely so the compiler has a lot more power to optimize.


I got into it a couple of years ago pretty heavily just for kicks and found that it was actually a pretty fun language if you're doing lots of math, especially matrices. I wish people would use it more for libraries, but I would be lying if I said you couldn't write something just as fast in C or C++ with a little bit more work. That being said, it's actually a great language for people who just want to spit out some number-crunching code.


It is on top of HBase, Cassandra, or Accumulo. So yes, it is web scale.


It is true. Unfortunately, the project was started several years ago and had nothing to do with the startup world. I would like to complain that VCs destroyed another nice name with their hype :(


Okay, everyone is hiding a unicorn in their closet :)


In most graph databases, you find a vertex by filtering its properties, e.g. with the Gremlin graph query language. In Unicorn, you can do something similar with document vertices (that is, a vertex corresponding to a document in another table/collection). This is probably very natural in a business application. However, it is not very useful in your case, as your vertices are abstract and have no properties.
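
For example, with the TinkerPop/Gremlin Java API, that kind of property lookup looks roughly like this (shown from Scala purely as an illustration; this is not Unicorn's API):

  import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory

  // Property filtering in Gremlin: find a vertex by a property value.
  val graph = TinkerFactory.createModern()        // TinkerPop's built-in sample graph
  val g = graph.traversal()
  val marko = g.V().has("name", "marko").next()   // filter vertices on a property
  println(marko.value[String]("name"))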

I guess what you want is large-scale graph analytics, for which I suggest Spark GraphX or another distributed graph computing engine.
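
A rough sketch of what that looks like with GraphX, assuming a spark-shell session (which provides sc):

  import org.apache.spark.graphx._
  import org.apache.spark.rdd.RDD

  // Vertices carry only an ID and a placeholder label; edges carry a relation name.
  val vertices: RDD[(VertexId, String)] =
    sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
  val edges: RDD[Edge[String]] =
    sc.parallelize(Seq(Edge(1L, 2L, "rel"), Edge(1L, 3L, "rel")))

  val graph = Graph(vertices, edges)
  val ranks = graph.pageRank(0.001).vertices      // whole-graph analytics
  ranks.join(vertices).take(3).foreach(println)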

Unicorn is designed for directed property multi-graphs.


I would say that what I have is a directed property multi-graph, as I understand it. It's just that the properties are on the edges, and the nodes have no properties except for their ID.

The graph in question is ConceptNet, which in the version I'm working on has about 10 million edges and 3 million nodes. Let's be clear that, in computing, "million" is not a large number. I only said "large graph" to clarify that it's not a small toy graph. The data needs to be imported with some degree of efficiency. But I have a 3TB hard drive and 16 GB of RAM, and both of them can spare a few gigabytes for this task.

Before you throw me into the tarpit of distributed computing, like every other graph-DB provider does as an excuse for their terrible inefficiency, I would like to know if your graph database is appropriate to use with reasonable-sized graphs that fit easily on a single computer.


Check out this script https://github.com/haifengl/unicorn/blob/master/shell/src/un..., which loads the DBpedia graph into Unicorn. You should be able to load ConceptNet with minor modifications. Later, you can refer to a vertex by its string ID.


When adding an edge, the end vertices are assumed to exist. In your case, we could add a helper function to import a list of edges, similar to Spark GraphX.
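
Something along these lines, say (the addVertex/addEdge callbacks below are hypothetical placeholders, not Unicorn's actual method names):

  // Hypothetical bulk-import helper: make sure both endpoints exist before
  // adding each edge; addVertex/addEdge stand in for the real API calls.
  def importEdges(edges: Seq[(String, String, String)],
                  addVertex: String => Unit,
                  addEdge: (String, String, String) => Unit): Unit = {
    val endpoints = edges.flatMap { case (src, _, dst) => Seq(src, dst) }.toSet
    endpoints.foreach(addVertex)
    edges.foreach { case (src, label, dst) => addEdge(src, label, dst) }
  }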

