Free Open Source Language Translator Code [on hold] - java

Can anyone suggest a free or cheap open-source language translation library or API that could be used for translating between different languages?
My application needs to translate 1 billion characters each day, so the Google API would be too costly: they charge around $20 per million characters, which works out to roughly $20,000 per day. What are the other options?
Libraries that I have tried:
- Google API (too costly)
- Bing API (too costly)
- https://pypi.python.org/pypi/googletrans (not reliable for real time)
Thanks in Advance!
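For scale, the figures in the question can be checked with a few lines of arithmetic. This is only a sketch: the class and method names are made up for illustration, and the $20 per million characters is the asker's approximate figure, not an official price.

```java
import java.math.BigDecimal;

class TranslationCostEstimate {
    // Cost in dollars for charsPerDay characters at a given price per million characters.
    static BigDecimal dailyCost(long charsPerDay, BigDecimal pricePerMillionChars) {
        return pricePerMillionChars
                .multiply(BigDecimal.valueOf(charsPerDay))
                .divide(BigDecimal.valueOf(1_000_000L)); // dividing by 10^6 always terminates
    }

    public static void main(String[] args) {
        // Figures taken from the question: 1 billion characters/day, about $20 per million.
        BigDecimal perDay = dailyCost(1_000_000_000L, new BigDecimal("20"));
        System.out.println("Per day: $" + perDay.toPlainString());
        System.out.println("Per 30 days: $"
                + perDay.multiply(BigDecimal.valueOf(30)).toPlainString());
    }
}
```

At that volume, any per-character pricing adds up quickly, which is why a self-hosted engine is the usual alternative to consider.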

Related

High-quality open-source text-to-speech (TTS) engines written in Java

I'm looking for open-source text-to-speech (TTS) engines written purely in Java. That is, no native calls or similar: pure Java only. Ideally they would have high-quality voices (see the quality definition below), but lower-quality alternatives are also okay as long as the source is available.
Does such an open source project exist?
By "high-quality" I mean "human sounding", "non-robotic" and with end results roughly on par with these two English language examples: Example 1, Example 2
The only open-source Java TTS engine I am aware of is MARY, which came out of research labs and universities. It has not been active for a while, and I have not heard the quality of its output.

Is there a viable handwriting recognition library / program? [closed]

I'm looking to process a bunch of scanned response postcards that have handwritten contact information on them (i.e. name, address, phone, email, etc.).
I'm curious whether there is a viable open-source library or piece of software to do this (ideally Java or R). Looking around, a lot of the information is from 2009 or earlier and isn't very encouraging.
The language is English.
Any suggestions?
EDIT: I've looked at the OCRopus page but the latest version is from May 2009. Anyone have any experience with this or is there a more recent version?
To begin with, as far as I know there are no native open-source Java OCR SDKs. There are Java APIs that wrap calls to native interfaces, such as tesjeract (http://code.google.com/p/tesjeract/) or Tess4J (http://tess4j.sf.net/).
Next, you need to specify whether you are looking for handwritten or handprinted text. If you need handwritten text recognition, I don't believe you'll be able to solve your task, for the reasons stated in other answers.
However, if you need ICR (intelligent character recognition) for handprinted text (fairly clear letters, as used in surveys, forms, etc.), there could be a solution. While I believe that Tesseract (despite being considered the best among open-source engines) won't do the job for you here, you can look for more accurate SDKs.
Maybe this question would help: Handwritten scanned Doc to .txt File?
I am not aware of any working open-source handwriting recognition library, even though I have been in the OCR space for a while. Handwriting is typically more difficult than OCR, and I would say that there is not even a decent commercial solution. All those that exist have their own issues and only work in very narrow applications, such as when the dictionary is limited, the text is well written, etc. If you are still interested, I would recommend checking the technology from the French company I2IA.
You may want to look at http://code.google.com/p/ocropus/, which is an open-source OCR system.
However, it appears to be written in C++ and Python.
UPDATE:
Since one of the research projects behind it is a handwriting recognizer, I expect it may help.
The OCRopus engine is based on two research projects: a
high-performance handwriting recognizer developed in the mid-90's and
deployed by the US Census bureau, and novel high-performance layout
analysis methods.
And if you look at http://code.google.com/p/ocropus/source/browse/, the source files have been updated since 10/2011 (one of the three was from 3/2012), so it still appears to be under development.

Java: Text to Speech engines overview [closed]

I'm searching for a Java text-to-speech (TTS) framework. During my investigations I found several JSAPI 1.0 (partially) compatible frameworks listed on the JSAPI Implementations page, as well as a pair of Java TTS frameworks which do not appear to follow the JSAPI spec (Mary, Say-It-Now). I've also noted that currently no reference implementation exists for JSAPI.
Brief tests I've done with FreeTTS (the first one listed on the JSAPI implementations page) show that it is far from able to read simple and obvious words (examples: ABC, blackboard). Other tests are currently in progress.
And here are the questions (six, actually):
Which of the Java-based TTS frameworks have you used?
Which ones, in your opinion, are capable of reading the largest vocabulary?
What about their voice quality?
What about their performance?
Which non-Java frameworks with Java bindings are there on the scene?
Which of them would you recommend?
Thank you in advance for your comments and suggestions.
I've actually had pretty good luck with FreeTTS.
Google Translate has an undocumented TTS API:
https://translate.google.com/translate_tts?ie=utf-8&tl=en&q=Hello%20World
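A minimal sketch of building that URL from Java, assuming the query parameters are exactly the ones visible in the link above (`ie`, `tl`, `q`). The class and method names are hypothetical, and since the endpoint is undocumented it may be rate-limited, changed, or blocked without notice; this only constructs the URL and does not call it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class TranslateTtsUrl {
    // Builds the request URL for the given language code and text.
    static String build(String languageCode, String text) {
        try {
            // URLEncoder produces form encoding ("+" for spaces); the example link uses %20.
            // Literal plus signs in the input are already encoded as %2B, so this is safe.
            String q = URLEncoder.encode(text, "UTF-8").replace("+", "%20");
            String tl = URLEncoder.encode(languageCode, "UTF-8");
            return "https://translate.google.com/translate_tts?ie=utf-8&tl=" + tl + "&q=" + q;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("en", "Hello World"));
    }
}
```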
I've used Mary before and I was very impressed with the quality of the voices. Unfortunately, I haven't used any of the other ones.
I've used AT&T Natural Voices which provides JSAPI and MS SAPI hooks. It provides excellent quality voices, a good "general" speech dictionary, many controls over pronunciation, and multiple languages. It's a little pricey, but works very well.
I used it to read important sensor telemetry to drivers in a mobile sensor application. We had no complaints about the voice quality. It had about 75% out-of-the-box accuracy with scientific terms and much higher accuracy (maybe 90%+) with normal dialogue. We got it up to 99%+ accuracy by using markup (most errors were on scientific terms with unusual phoneme combinations).
It was a bit hard on the processor (we were running on a Pentium-III equivalent machine and it was pushing 50%-75% peak CPU). This uses a native speech engine (Windows, Linux, and Mac compatible) with a Java interface.
There's a huge variety of voices and languages...
Actually, there is not a big choice:
Festival, the oldest. Written in C++ but has bindings for Java.
eSpeak, quick and simple, used by Google Translate
mbrola
Pure Java:
FreeTTS, whose code was ported from Festival; it was then open-sourced and development stopped.
MaryTTS - more powerful and looks production ready.
There are also other proprietary programs, such as:
Acapella
Nuance Vocalizer
If your software is Windows only, you can use Microsoft Speech API.
I used FreeTTS but had a major problem getting the MBrola voices to run on my MacBook Pro. I did get MBrola voices to run on Windows (painfully) and Linux. I've had no luck loading any other voice packages on FreeTTS, which is a shame because the supplied voices are horrible IMO. Outside of that I had a little success with Cloudgarden as well, but that only runs on Windows AFAIK. I'd be interested to hear others' successes/failures with voice engines, as this type of work is particularly challenging. I'm also toying a bit with Sphinx4. I just pulled down JVXML (which appears to be based on Sphinx4) last night but could not get it to run for some strange reason.
I've contributed to Mary. I feel it has potential if someone smarter than me separated the HMM voices out of the core (those voices don't need large data sets and sound OK). I'm also trying to add an event system to FreeTTS that sends events when it says a word. I've had success, but it is broken on Linux now (probably because of a timer bug).
Thanks a lot, everyone; the trick is in the FreeTTS source. Briefly: if it is run as java -jar freetts.jar some-more-args-here, it reads fewer words than when it is executed via bin/Server.jar and bin/Client.jar.
I was fairly comfortable with MaryTTS. It supports multiple languages and has a clear, easy-to-understand voice.
To convert speech to text, the better option is sphinx4-5prealpha.
I give it a thumbs up because its recognizer and grammar are adjustable, flexible, and modifiable.

Java OCR library recommendations? [duplicate]

This question already has an answer here:
Java OCR implementation [closed]
I need to check a tonne of pictures to see if they have a keyword on them. Can anyone recommend a good, reliable OCR library? I'll happily sacrifice speed for accuracy.
There are no pure-Java OCR libraries that offer serious accuracy. Depending on your budget, you may choose something that is not purely Java but can be called from Java:
If you have plenty of time but zero budget, your choice is Tesseract. It is definitely the best among open-source engines.
If you have a small budget to spend and you only need to run this recognition once, the Cloud OCR API service would be your best choice. It is based on a leading commercial-grade OCR engine and offers quite affordable per-project prices. Disclaimer: I work for ABBYY
If you will need to run this recognition as an ongoing process, then it may be more economical to purchase dedicated conversion software, for example this one; it has an API and can be called from Java too. But there are actually a lot of alternatives, if you are prepared to invest some budget in licensing.
If you plan to recognize symbols other than Latin letters or digits, it is better not to look for a Java library but to pick an external tool and get your text from it some other way (1).
On Linux I have used Cuneiform (2) via its command-line interface.
(1) A command-line interface and a pipe, for example.
(2) Cuneiform has been ported to Linux, but I don't know whether its command-line interface works on Windows.
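Calling an external OCR tool from Java in this way can be sketched with ProcessBuilder. This assumes the Linux port of Cuneiform is installed as `cuneiform` on the PATH and accepts the `-l` (language), `-f` (output format) and `-o` (output file) options; check `cuneiform --help` on your system, since the flags here are from memory. The class name and file names are made up for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class CuneiformRunner {
    // Builds the command line; flag names are an assumption about the installed CLI.
    static List<String> command(String language, String imagePath, String outputPath) {
        return Arrays.asList("cuneiform", "-l", language, "-f", "text", "-o", outputPath, imagePath);
    }

    // Runs the tool and returns its exit code; the recognized text ends up in outputPath.
    static int run(String language, String imagePath, String outputPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(language, imagePath, outputPath))
                .inheritIO() // show the tool's own messages on our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file names, for illustration only.
        int exitCode = run("eng", "scan.png", "scan.txt");
        System.out.println("cuneiform exited with code " + exitCode);
    }
}
```

The same pattern works for any command-line OCR engine: build the argument list, wait for the process, then read the output file.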

Java voice recognition

Does anyone have experience with any open-source or relatively cheap voice recognition API for Java? I'm pretty much looking for something that will turn spoken words into text.
From the Java speech recognition page on Sun's site, it seems to be something that is rather dead. My requirement is something that at the least runs on Linux.
Can anyone recommend something? Pure Java would be a bonus; otherwise a Linux-based solution could be considered. And since this is a home project... the cheaper the better.
Edit
CMU Sphinx
As Amit pointed out, there is CMU Sphinx: http://cmusphinx.sourceforge.net/html/cmusphinx.php
My problem is a massive word error rate. Training seems like a project all in itself, I'm hoping to gather some strength to try it this weekend.
IBM ViaVoice
There are news announcements floating around from 2004 about ViaVoice being made open source. It seems the news release was premature and that it never happened. ViaVoice was released for Linux at some point, but it seems they stopped. All that seems to be left on IBM's website is ViaVoice embedded.
IBM Websphere Voice
I imagine this is why ViaVoice (desktop) seems discontinued. IBM created this commercial solution, which will cost a lot more than an arm and a leg. And just using it will take the ones you have left, at least going by my experience with WebSphere and their IDE.
Nuance
It seems they still might create products for Linux. But I think they got lost and followed IBM into the server market. I'm not that sure about this one; their website does not make it easy to find useful information.
Open Mind / Free Speech
These guys keep changing their project name. Probably some money-hungry company keeps threatening them, but I don't know. The project looks a bit dead.
I might try training Sphinx this weekend to see if it wants to be friends. Otherwise, worst case, I'll be looking at using Microsoft's speech solution. It has worked well for me in the past, but it's not a great Linux solution. I could probably use it through Wine, but then I'll have two separate servers... messy messy.
Oh, and SpeechTechMag seems like a good place to visit for voice/speech. They have an 'Annual Reference' with a list of companies that are somehow related to voice/speech.
Mostly Java: http://cmusphinx.sourceforge.net/html/cmusphinx.php
Sphinx is by far the best option available if you are on a budget.
However, it also makes a huge difference which models you use, how you tune them, and how you tune your audio source. Absolutely everything has to match, otherwise it just won't work. Given the problem you described, I'd be willing to bet a substantial sum that you've got your models mixed up and your mic is not correctly calibrated. Also, if you have an accent it probably will not work. This is not an issue with the decoder but with the acoustic models: if no one with a voice/accent similar to yours was included in the training data, you'll get poor results.
That said, have you looked at their open-source models page?
http://www.speech.cs.cmu.edu/sphinx/models/
Depending on what you are trying to do, you should be able to obtain about 90% accuracy on free speech with the 16kHz WSJ models and the gigaword LMs NVP. I caution, however, that ASR is a massive undertaking and hasn't yet reached commodity status.
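To put numbers like "massive word error rate" or "about 90% accuracy" on your own recordings, you can compare the recognizer's output against a hand-made transcript. The sketch below is not part of Sphinx; it is a plain word-level edit distance (substitutions, deletions and insertions divided by the number of reference words), with lowercasing and whitespace splitting as simplifying assumptions. If accuracy is read as 1 minus this rate, 90% accuracy corresponds to a rate of about 0.1.

```java
class WordErrorRate {
    // Lowercases and splits on whitespace; an empty string yields no tokens.
    static String[] tokens(String s) {
        String t = s.trim().toLowerCase();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    // Word error rate = word-level edit distance / number of reference words.
    static double wer(String reference, String hypothesis) {
        String[] ref = tokens(reference);
        String[] hyp = tokens(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // delete all reference words
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // insert all hypothesis words
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return d[ref.length][hyp.length] / (double) ref.length;
    }

    public static void main(String[] args) {
        // One substitution ("too") and one deletion ("four") over four reference words.
        System.out.println(wer("one two three four", "one too three")); // prints 0.5
    }
}
```

Measuring this before and after swapping acoustic or language models gives a concrete way to tell whether a change actually helped.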
You can download vPass (voice password) from http://www.basic-signalprocessing.com.
For voice to text (vText), I can send the vText.jar file to your email. Please contact enquiry#basic-signalprocessing.com
The components are designed for Java and .NET. The recognition period is 5 seconds. vPass is well tested; vText is not, as it is still new, which is why it is not packaged yet.
regards,
Andreas
I have been looking for the same thing for a few days now. So far I have found Sphinx4 and FreeTTS. Both are Java implementations, and Sphinx seems to be updated rather frequently, unlike FreeTTS. The only problem I am having is that Sphinx has trouble understanding me in an office environment, and I need a solution for a warehouse environment.
My group finished a mini program in Java to recognize spoken digits using Sphinx.

Resources