## Oj-Algo - Matrix exponential - java

### How to traverse a massive DataFrame pairwise and store the values in an n*n matrix?

```Problem Description:
I have a dataset of about 35 million rows and 10 columns.
I want to calculate the distance between every pair of rows with a distance function such as distance(row1, row2), and then store the values in a huge matrix.
The total number of operations needed is nearly 6*10^15, which I think is enormous.
What I've tried:
upload the data file to HDFS
read the data as a DataFrame
call df.collect() to get array1: Array[Row]
traverse array1 pairwise and calculate the distance
store distance(row_i, row_j) in matrix(i, j)
Scala code:
val array1 = df.collect()
val l = array1.length
for (i <- 0 until l) {
  for (j <- i + 1 until l) {
    val vd = Vectors.dense(i, j, distance(array1(i), array1(j)))
  }
}
I want to save each value in a Vector like the one above and add it to an RDD/DataFrame.
But the only way I've found is to use union, which I don't think is good enough.
So there are three questions to be solved:
1. collect is an action; df.collect() throws java.lang.OutOfMemoryError: Java heap space. Can this be avoided?
2. As soon as I get a distance(row_i, row_j), I want to store it. How?
3. Can I store the final matrix in HDFS and read it as a matrix in Python?
PS: If none of the above can be solved, what other approach can I use?
Any answer will help me a lot, thank you!
```
```Check IndexedRowMatrix: https://spark.apache.org/docs/latest/mllib-data-types.html#indexedrowmatrix. An IndexedRowMatrix is similar to a RowMatrix but with meaningful row indices.
You can design your algorithm based on this API.```
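
To make the scale of the question concrete, here is a minimal stdlib-only sketch (not Spark code) that streams each `(i, j, distance)` triple to an output as soon as it is computed, instead of holding an n*n matrix in memory. The Euclidean distance, the function names, and the three sample rows are assumptions for illustration; the question does not say which distance it uses.

```python
import csv
import io
import math
from itertools import combinations


def pair_count(n):
    """Number of unordered row pairs (i < j) among n rows."""
    return n * (n - 1) // 2


def euclidean(a, b):
    """Assumed distance function; the question does not specify one."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def stream_distances(rows, distance=euclidean):
    """Yield (i, j, d) for every pair i < j, one triple at a time."""
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        yield i, j, distance(a, b)


def write_triples(rows, out):
    """Write each triple as it is produced, so the output never accumulates in memory."""
    writer = csv.writer(out)
    for i, j, d in stream_distances(rows):
        writer.writerow([i, j, d])


# Tiny stand-in for the real data set (hypothetical values).
rows = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
buffer = io.StringIO()
write_triples(rows, buffer)

# Scale check for the 35 million rows mentioned in the question.
pairs = pair_count(35_000_000)
bytes_needed = pairs * 8  # one 8-byte float per pair
```

For 35 million rows, `pairs` is about 6.1 * 10^14, and storing only the upper triangle as 8-byte floats would take roughly 4.9 petabytes, so it may be worth asking whether the full matrix is needed at all (for example, only distances below a threshold or the nearest neighbours of each row).
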

### spelling matching using Jaro-Winkler Distance to calculate the similarity of two strings

```I'm trying to do auto-correct for spelling using the Jaro-Winkler strategy.
I have a list of suggestions, and the typed word is ranked against the suggestion words.
The problem I'm facing: when the word "ans"/"anf"/"anr" is typed, "an" is given the highest rank in the comparison, while "and" is far down the score list. Therefore "ans"/"anf"/"anr" are replaced with "an" instead of "and".
Any suggestions on how I should solve this, or is there another algorithm that would replace "ans"/"anf"/"anr" with "and" rather than "an"?
```
```For general typos, weighting transpositions higher than deletions/additions seems like a good idea.
Assuming your entries are typed on a standard keyboard layout (QWERTY?), you could add an additional weight based on the physical distance between keys. I'm not sure of the best way to do that logically. Off the top of my head, you could create a 2D array containing the keyboard map and compare the actual (Pythagorean) distance.
Given a map with "Q"=[0][0], "W"=[0][1], "A"=[1][0], the distance between A->Q would be 1, Q->W = 1, and Q->S = sqrt(2). That should give you something to weight distances with.
There's probably a much cleaner implementation of the distance calculation, but just spitballing here.```
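
The keyboard-map idea above could be sketched as follows. The unstaggered grid matches the coordinates given in the answer; `substitution_cost` and its scaling are hypothetical, one of several ways the distance could be turned into a weight.

```python
import math

# Unstaggered QWERTY grid, as in the answer: "q" is (0, 0), "w" is (0, 1), "a" is (1, 0).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def key_distance(a, b):
    """Straight-line (Pythagorean) distance between two letter keys on the grid."""
    (r1, c1), (r2, c2) = KEY_POS[a.lower()], KEY_POS[b.lower()]
    return math.hypot(r1 - r2, c1 - c2)


def substitution_cost(typed, intended, max_cost=1.0):
    """Hypothetical weighting: nearby keys cost less than distant ones."""
    if typed == intended:
        return 0.0
    # Corner-to-corner distance of the grid, used to scale costs into (0, max_cost].
    longest = math.hypot(len(ROWS) - 1, len(ROWS[0]) - 1)
    return max_cost * key_distance(typed, intended) / longest
```

With this grid, "s" and "f" are both one key away from "d" and "r" is sqrt(2) away, so a cost like this would treat "ans", "anf", and "anr" as cheap slips for "and". How to combine it with the Jaro-Winkler score (as a tie-breaker, or inside a weighted edit distance) is left open here.
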

### route in graph with specific length from one point

```I have a method "connection(int n)" that returns the numbers of all cells connected to cell number "n". Now I want a method that returns all routes of a specific length "myLength" that start from cell number "start" and go in one direction only (as usual); I mean we are not allowed to pass through any cell more than once.
P.S. I can't use map tools, graph tools, etc.; basic tools only, please.
```
```You are looking for BFS.
Model your problem as a graph G = (V,E) such that V = {1,...,n} [all possible values] and E = { (u,v) | connection(u) returns v } [there is a connection between u and v using your connection() method]
In addition to the standard BFS, you will need to add another stop condition when you reached the limited length.
EDIT:
Note that this solution assumes you are looking for paths of length up to the limit, not of exactly that length.
If you want paths of exactly that length, BFS doesn't work; a clique is a counterexample.
To get all vertices that have a simple path of exactly that length, you will probably need a DFS that avoids loops [this can be done by maintaining a set that is modified on each iteration] but can explore each vertex more than once.```
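
A minimal sketch of the DFS described in the edit above, assuming "length" counts edges and that `connection(n)` returns the neighbours of cell `n`. The `NEIGHBOURS` table is a made-up stand-in for the asker's method. The `visited` set is what prevents revisiting a cell within one route; it is undone on backtracking so the same cell can appear in other routes.

```python
def simple_paths(connection, start, length):
    """Return every simple path with exactly `length` edges that starts at `start`."""
    results = []
    path = [start]
    visited = {start}

    def dfs(node):
        if len(path) - 1 == length:
            results.append(list(path))
            return
        for nxt in connection(node):
            if nxt in visited:
                continue  # a route may not pass through a cell twice
            visited.add(nxt)
            path.append(nxt)
            dfs(nxt)
            # Backtrack so nxt can be used again in a different route.
            path.pop()
            visited.remove(nxt)

    dfs(start)
    return results


# Hypothetical stand-in for connection(n): four cells in a cycle 1-2-3-4-1.
NEIGHBOURS = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}


def connection(n):
    return NEIGHBOURS[n]
```

Calling `simple_paths(connection, 1, 2)` on this cycle gives `[[1, 2, 3], [1, 4, 3]]`. If sets are off limits, a boolean array indexed by cell number would do the same job as `visited`.
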

### Neural networks and Python

```I'm trying to study about neural networks, following a great guide:
http://neuralnetworksanddeeplearning.com/chap1.html
Currently I've reached this code snippet which I'm trying to understand and write in Java:
class Network(object):
    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]
I managed to figure out what everything means except for the last line:
[np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
As far as I can understand: create a matrix with y rows and x columns, for each pair x,y which can be found in the matrix zip which is created by the merging of the two "sizes" arrays. I understand that sizes[1:] means taking all elements from sizes starting from index 1, but sizes[:-1] makes no sense to me.
I read online that s[::-1] means getting the reverse of the array, but in the above case we only have one colon, while in the formula for the reverse array there seems to be two colons.
Sadly, I have no idea how Python works, and I've got too far along with the online book to give it up now (I also truly like it), so can someone say whether I'm right so far, correct me if needed, or just explain that final line?
```
`sizes[:-1] is a list slice which returns a copy of the sizes list but without the last item.`
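
A small example of the slices involved, using `[2, 3, 1]` as an assumed value for `sizes` and plain tuples in place of the NumPy arrays:

```python
sizes = [2, 3, 1]  # assumed example: 2 inputs, 3 hidden neurons, 1 output

all_but_last = sizes[:-1]    # [2, 3]    everything except the last item
all_but_first = sizes[1:]    # [3, 1]    everything from index 1 onwards
reversed_copy = sizes[::-1]  # [1, 3, 2] two colons: the third field is a step of -1

# zip pairs each layer size with the size of the next layer.
layer_pairs = list(zip(all_but_last, all_but_first))  # [(2, 3), (3, 1)]

# The comprehension builds one (y rows, x columns) matrix per pair of adjacent layers.
weight_shapes = [(y, x) for x, y in layer_pairs]  # [(3, 2), (1, 3)]
```

So `zip` is not a matrix but a sequence of `(x, y)` pairs of adjacent layer sizes, and each weight matrix has one row per neuron in the later layer and one column per neuron in the earlier layer.
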

### Where can I get a Java implementation of Dijkstra's algorithm? [closed]

```I am looking for a generic Java implementation of Dijkstra's algorithm. I've tried coding this up on my own, but I keep running into problems. If it helps, I know for a fact that the graph is always connected. Does anyone know of such an implementation?
Thanks!
```
```This is totally shameless, but I coded up an implementation of Dijkstra's algorithm using Fibonacci heaps a while back and posted it to my personal website. You can find the code here:
Dijkstra's algorithm
Fibonacci heap
I've tried to comment the code to indicate how the algorithm works, what assumptions it's making, etc., so hopefully it's easy to read and understand. Let me know if there's anything about it I can clarify for you.
Hope this helps!
```
```JGraphT is a common Java library for graphs. Dijkstra's algorithm is implemented in it too.
```
```Do you mean this?
(The Java implementation can be found at the link mentioned at the bottom of this answer.)
// initialize d to infinity, π and Q to empty
d = ( ∞ )
π = ()
S = Q = ()

add s to Q
d(s) = 0

while Q is not empty
{
    u = extract-minimum(Q)
    add u to S
    relax-neighbors(u)
}

relax-neighbors(u)
{
    for each vertex v adjacent to u, v not in S
    {
        if d(v) > d(u) + [u,v]    // a shorter distance exists
        {
            d(v) = d(u) + [u,v]
            π(v) = u
            add v to Q
        }
    }
}

extract-minimum(Q)
{
    find the smallest (as defined by d) vertex in Q
    remove it from Q and return it
}
edit: got this from http://renaud.waldura.com/doc/java/dijkstra/
```
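
The pseudocode above can be sketched as runnable code along these lines. This is a minimal Python version using a binary heap (`heapq`) for `extract-minimum`, not the Java implementation from the linked page. The dict-of-dicts graph format, the sample graph, and the helper `path_to` are assumptions for illustration.

```python
import heapq
import math


def dijkstra(graph, source):
    """Return (dist, prev) for shortest paths from source.

    graph maps every vertex to a dict {neighbour: non-negative edge weight}.
    """
    dist = {v: math.inf for v in graph}  # d in the pseudocode
    prev = {}                            # π in the pseudocode
    dist[source] = 0
    queue = [(0, source)]                # Q, ordered by tentative distance
    settled = set()                      # S

    while queue:
        d_u, u = heapq.heappop(queue)    # extract-minimum(Q)
        if u in settled:
            continue                     # stale entry left over from an earlier push
        settled.add(u)
        for v, w in graph[u].items():    # relax-neighbors(u)
            if v not in settled and dist[v] > d_u + w:
                dist[v] = d_u + w
                prev[v] = u
                heapq.heappush(queue, (dist[v], v))
    return dist, prev


def path_to(prev, source, target):
    """Walk the predecessor map back from target; assumes target is reachable."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Small made-up undirected graph.
graph = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2, "d": 5},
    "c": {"a": 4, "b": 2, "d": 1},
    "d": {"b": 5, "c": 1},
}
dist, prev = dijkstra(graph, "a")
```

Instead of decreasing a key in place, this version pushes a new entry and skips stale ones when they are popped, which keeps the code short at the cost of a somewhat larger queue. In Java, `java.util.PriorityQueue` could play the role of `heapq`.
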
```Good resources from universities:
From New York University at http://www.cs.nyu.edu/~vs667/development/~DijkstraAlgorithm/
From Princeton University at http://algs4.cs.princeton.edu/41undirected/Graph.java.html
Vogella also has a nice implementation at http://www.vogella.com/articles/JavaAlgorithmsDijkstra/article.html```