## Java: removing multiple occurrences of unprintable characters from group [on hold] - java

### Scan a string for certain characters and count them

In my AP Computer Science class, we have an assignment where I need to read a string (which in this case is always a tweet), check whether it meets the character limit, check whether it is a retweet, and count the hashtags and mentions. I have figured all of this out, but for a hashtag or mention to be counted, it must not be followed by a space or a return. My current solution is this:
for(int i=0; i < tweetLength; i++) {
if((tweet.charAt(i) == '@')&&((tweet.charAt(i+1) != 0)||(tweet.charAt(i+1) != 32)||(tweet.charAt(i+1) != 13)) ) {
countMentions++;
} if((tweet.charAt(i) == '#')&&((tweet.charAt(i+1) != 0)||(tweet.charAt(i+1) != 32)||(tweet.charAt(i+1) != 13)) ) {
countHashtags++;
} if(((tweet.charAt(i) == 'R')||(tweet.charAt(i) == 'r'))&&((tweet.charAt(i + 1) == 'T')||(tweet.charAt(i + 1) == 't'))&&(tweet.charAt(i + 2) == ':')) {
retweet = true;
}
}
Note: 32, 13, and 0 are the ASCII values for space, carriage return, and null (I think). I used the numeric values hoping they would miraculously solve my problem, but alas, they have not.
It all works fine, except that when there is an at sign (@) or a hash sign (#) at the very end of the string, it fails with this error:
java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at java.lang.String.charAt(String.java:686)
at Main.main(Main.java:21)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at edu.rice.cs.drjava.model.compiler.JavacCompiler.runCommand(JavacCompiler.java:272)
I know that this is caused because it is trying to read null, but I can't really find a solution and my teacher is the "Teach via. videos" kind, so not much help.

You need to be careful not to go out of bounds when iterating over the characters in the string. Your loop reads charAt(i + 1) and charAt(i + 2), so you have to make sure those indexes exist before reading them:
for(int i=0; i < tweetLength; i++)
{
if (i + 1 == tweetLength)
{
// avoid going out of bounds
// when i == last char of the string
break;
}
if (tweet.charAt(i) == '@')
countMentions++;
if (tweet.charAt(i) == '#')
countHashtags++;
if (i + 2 < tweetLength)
{
// search for i+2 only when
// i+2 is not outside of the string
if (((tweet.charAt(i) == 'R') || (tweet.charAt(i) == 'r')) &&
((tweet.charAt(i + 1) == 'T') || (tweet.charAt(i + 1) == 't')) &&
(tweet.charAt(i + 2) == ':'))
{
retweet = true;
}
}
}
As you can see, I've added a break statement to exit the loop when we are positioned on the last character. We also skip the check for ":" when we are on the second-to-last character.
I hope you get the idea, but just FYI, your code doesn't "work all fine". There are more corner cases you are not testing for. For example, in "a b # c" the # is not a valid hashtag, so you also need to ensure that a valid letter comes after the hash sign. There may be others, like "##abc", but you're on the right track, so I'll let you do your assignment rather than do it for you.
If you want to better understand why the exception was thrown look at the code that throws it:
public char charAt(int index) {
if ((index < 0) || (index >= value.length)) {
throw new StringIndexOutOfBoundsException(index);
}
return value[index];
}
Now that I hope you've understood the logic, think about modifying your for loop like this:
for(int i=0; i < tweetLength - 1; i++)
and removing the first if. The break does the same thing, just in a more complicated way, right? :)
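The corner cases mentioned in this answer can be sketched as follows. This is only an illustration, not the assignment's expected solution: the class name TweetScanner and the methods countTags and isRetweet are hypothetical, and the rule that a tag must be followed by a letter or digit is an assumption. Note also that a chain such as (c != 0) || (c != 32) || (c != 13) is always true, because no character can equal all three values at once, so the sketch uses a single positive check instead.

```java
// Hypothetical helper class; the names are illustrative only.
class TweetScanner {

    // Counts occurrences of marker ('#' or '@') that are directly followed
    // by a letter or digit. The loop stops one short of the end, so
    // charAt(i + 1) can never go out of bounds.
    static int countTags(String tweet, char marker) {
        int count = 0;
        for (int i = 0; i < tweet.length() - 1; i++) {
            if (tweet.charAt(i) == marker
                    && Character.isLetterOrDigit(tweet.charAt(i + 1))) {
                count++;
            }
        }
        return count;
    }

    // Looks for "RT:" in any letter case. The loop condition guarantees
    // that three characters are available starting at index i.
    static boolean isRetweet(String tweet) {
        for (int i = 0; i + 3 <= tweet.length(); i++) {
            if (tweet.regionMatches(true, i, "RT:", 0, 3)) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, a trailing # or @ is simply never counted, and "##abc" counts as one hashtag because only the second # is followed by a letter.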

### (Java ) problems with myString.charAt(i) > “1” or myString.charAt(i) < “A”

I have a car Object which is supposed to have a characteristic. The characteristic is supposed to have the requirements: starts with two capital letter followed by a number from 1-9 followed by 4 numbers from 0-9.
public void writeCharacteristic(){
System.out.println("write down the characteristic for the car.");
String characteristic = kb.nextLine();
progress = false;
if (characteristic.length() != 7){
System.out.println("The string is not 7 letter/numbers long");
progress = false;
}
for(int i = 0; i < 2; ++i){
if (characteristic.charAt(i) < "A" || characteristic.charAt(i) > "Z"){
System.out.println(" character number " + i + " is invalid");
progress = false;
}
}
if (characteristic.charAt(3) < "1" || characteristic.charAt(3) > "9")
progress = false;
for (int j = 3; j < 7; ++j){
if (characteristic.charAt(j) < 0 || characteristic.charAt(j) > 9)
progress =false;
}
if (progress == false){
System.out.println("characteristic will have the value null.");
characteristic = null;
}
if (progress == true)
car.setCharacteristic(characteristic);
}
I'm having a problem at the lines "if (characteristic.charAt(i) < "A" || characteristic.charAt(i) > "Z"){"
The compiler is saying "The operator < is undefined for the argument type(s) char, String"
Any help is highly appreciated, thanks.

In Java, you can compare a character (char) to a character, but you can't compare a character to a String. charAt returns a character, so you must compare its result to a character.
These are Strings:
"A" "Z" "1" "9"
And these are characters
'A' 'Z' '1' '9'
You can compare a character to an integer (int), but the result may not be what you want. So in the code below:
for (int j = 3; j < 7; ++j){
if (characteristic.charAt(j) < 0 || characteristic.charAt(j) > 9)
0 and 9 should be changed to '0' and '9'.
Note: There is another unrelated logic error in your code:
String characteristic = kb.nextLine();
progress = false;
Shouldn't progress be set to true here?
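Putting these points together, the check could look roughly like the sketch below. The class name CharacteristicCheck and the method isValid are hypothetical, and returning a boolean instead of setting the progress flag is a simplification. It also assumes the digit from 1-9 sits at index 2, directly after the two capital letters.

```java
// Hypothetical helper; names are illustrative only.
class CharacteristicCheck {

    // Returns true for two capital letters, one digit 1-9, then four digits 0-9.
    static boolean isValid(String s) {
        if (s == null || s.length() != 7) {
            return false; // also protects the charAt calls below
        }
        for (int i = 0; i < 2; i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') { // char literals, not Strings
                return false;
            }
        }
        char third = s.charAt(2);
        if (third < '1' || third > '9') {
            return false;
        }
        for (int j = 3; j < 7; j++) {
            char c = s.charAt(j);
            if (c < '0' || c > '9') { // '0' and '9', not the ints 0 and 9
                return false;
            }
        }
        return true;
    }
}
```

Returning early on a wrong length matters here: without it, a short input would still reach the charAt calls and throw StringIndexOutOfBoundsException.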

I would certainly check out the other answers on this page re. character comparisons. However, I would perhaps suggest a different approach given:
starts with two capital letter followed by a number from 1-9 followed
by 4 numbers from 0-9
and investigate regular expressions. Something like:
[A-Z]{2}[1-9][0-9]{4}
would satisfy the above requirement.
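A minimal sketch of how that pattern might be applied with java.util.regex follows. The class name CharacteristicPattern and the method matches are hypothetical; Matcher.matches() requires the whole input to match, so the length check comes for free.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern from this answer.
class CharacteristicPattern {

    // Compiled once and reused; two capitals, one digit 1-9, four digits 0-9.
    private static final Pattern FORMAT = Pattern.compile("[A-Z]{2}[1-9][0-9]{4}");

    static boolean matches(String candidate) {
        // matches() succeeds only if the entire string fits the pattern
        return candidate != null && FORMAT.matcher(candidate).matches();
    }
}
```

For a one-off check, characteristic.matches("[A-Z]{2}[1-9][0-9]{4}") on the String itself does the same thing, at the cost of recompiling the pattern on each call.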

Use single quotes for chars, double quotes for Strings.
characteristic.charAt(3) < '1'
Single and double quotes mean different things in Java.
For your situation, though, a regex is the best fit.

Replace the double quotes with single quotes.
You'll also have to put single quotes around the numbers when comparing them with chars, even though the compiler doesn't complain.

Compare like this
characteristic.charAt(3) < '1'

First, you can achieve this with a regular expression:
[A-Z]{2}[1-9][0-9]{4}
(Read the documentation for java.util.regex.Pattern to learn how to use it.)
If you want to continue the way you started, use single quotes instead of double quotes for characters, e.g. "a" -> 'a'.

To assign a value to a char, use single quotes. For a String, use double quotes:
char myChar='a';
String myString="a";
So characteristic.charAt(3) < "1" should become characteristic.charAt(3) < '1'.

### How to split this “Tree-like” string in Java regex?

This is the string:
String str = "(S(B1)(B2(B21)(B22)(B23))(B3)())";
The content of a child () may be "", or just a value, or follow the same pattern recursively, so each nested () is a subtree.
Expected result:
str1 is "(S(B1))"
str2 is "(B2(B21)(B22)(B23))" //don't expand sons of a son
str3 is "(B3)"
str4 is "()"
str1-4 are e.g. elements in an Array
How to split the string?
I have asked a similar question, "How to split this string in Java regex?", but its answer is not good enough for this one.

Regexes do not have sufficient power to parse balanced/nested brackets. This is essentially the same problem as parsing markup languages such as HTML where the consistent advice is to use special parsers, not regexes.
You should parse this as a tree. In overall terms:
Create a stack.
When you hit a "(", push the next chunk onto the stack.
When you hit a ")", pop the stack.
This takes a few minutes to write and will check that your input is well-formed.
This will save you time almost immediately. Trying to manage regexes for this will become more and more complex and will almost inevitably break down.
UPDATE: If you are only concerned with one level then it can be simpler (NOT debugged):
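List<String> subTreeList = new ArrayList<String>();
String s = getMyString();
int level = 0;
int lastOpenBracket = -1;
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    if (c == '(') {
        level++;
        if (level == 1) {
            // remember where this level-1 subtree starts
            lastOpenBracket = i;
        }
    } else if (c == ')') {
        if (level == 1) {
            // the level-1 subtree ends here; keep it, brackets included
            subTreeList.add(s.substring(lastOpenBracket, i + 1));
        }
        level--;
    }
}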
I haven't checked that it works, so you should debug it. You should also add checks to make sure you don't have hanging brackets at the end or strange characters at level == 1.
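As a rough, self-contained variant of the same depth-counting idea, the sketch below collects every bracketed group that opens at a chosen nesting depth and rejects unbalanced input. The class name BracketSplitter, the method groupsAtDepth, and the depth parameter are assumptions made for illustration. For the string in the question, the children of the root sit at depth 2, and the sketch returns them without the enclosing (S ...), so it does not reproduce the asker's "(S(B1))" element exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative only.
class BracketSplitter {

    // Returns each "(...)" group that opens at the given nesting depth
    // (1 = outermost), brackets included, without expanding its children.
    static List<String> groupsAtDepth(String s, int depth) {
        List<String> groups = new ArrayList<>();
        int level = 0;
        int start = -1;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                level++;
                if (level == depth) {
                    start = i; // a group at the requested depth begins here
                }
            } else if (c == ')') {
                if (level == 0) {
                    throw new IllegalArgumentException("Unmatched ')' at index " + i);
                }
                if (level == depth) {
                    groups.add(s.substring(start, i + 1));
                }
                level--;
            }
        }
        if (level != 0) {
            throw new IllegalArgumentException("Unclosed '(' in input");
        }
        return groups;
    }
}
```

Calling groupsAtDepth("(S(B1)(B2(B21)(B22)(B23))(B3)())", 2) yields (B1), (B2(B21)(B22)(B23)), (B3) and (), in that order.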

### Removing Characters within a string-Java

I keep getting an error when removing a character from within a string. I have tried everything that I could find on this site and nothing has worked. This is NOT a help post; rather, I am hoping for an answer that explains why this happens and how to fix it, in case someone else encounters this issue. Without further ado, here is my code:
public JTextField Clean()
{
String Cleaner = TopField.getText();
Cleaner=Cleaner.toLowerCase();
int Length = Cleaner.length();
StringBuilder Combiner = new StringBuilder(Cleaner);
for (int x=0;x+1<Length;x++)
{
char c = Cleaner.charAt(x);
char c1 = Cleaner.charAt(x+1);
if(c==' ' && c1==' ')
{
Combiner.deleteCharAt(x);
Cleaner=Combiner.toString();
}
if(c!='a' && c=='b' && c!='c' && c!='d' && c!='f' && c!='g' && c!='h' && c!='i' && c!='j' && c!='k' && c!='l' && c!='m' && c!='n' && c!='o' && c!='p' && c!='q' && c!='r' && c!='s' && c!='t' && c!='u' && c!='v' && c!='w' && c!='x' && c!='y' && c!='z' && c!=' ')
{Combiner.deleteCharAt(x);
Cleaner=Combiner.toString();}
}
TopField.setText(Cleaner);
}
I receive an error stating that my index is out of bounds, with a value equal to the length of the string that I input. Please note that this is a method inside a class I created that removes any character that is not a letter or a space.

There are a number of things that pop out at me.
You're basing your loop on a fixed value (Length), but the actual length of the String can decrease as characters are removed.
You are potentially removing 2 characters per iteration (there are two deleteCharAt calls).
The loop doesn't take into account the shrinking size of the String. For example, when x == 1 and you remove the character at x, x is then incremented to 2, effectively skipping a character (the character that was at position 2 is now at position 1).
Your if statement is unnecessarily long. In fact, depending on your needs, you could use Character.isDigit or Character.isLetter and Character.isWhitespace.
String Cleaner = TopField.getText();
Cleaner = Cleaner.toLowerCase();
StringBuilder Combiner = new StringBuilder(Cleaner);
int x =0;
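while (x < Combiner.length()) {
    char c = Combiner.charAt(x);
    // delete anything that is NOT a lowercase letter or a space
    if (!((c >= 'a' && c <= 'z') || c == ' ')) {
        Combiner.deleteCharAt(x);
    } else {
        // only advance when nothing was removed at this position
        x++;
    }
}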
From the looks of your code, you appear to want to filter a JTextField so that it only allows letters and spaces. It would be much better to use something like a JFormattedTextField or a DocumentFilter and ensure the correctness of the data as it's entered...IMHO
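A DocumentFilter along those lines might look like the sketch below. It is only one possible shape: the class name LettersAndSpacesFilter and the helper methods clean and demo are hypothetical, and lowercasing the text inside the filter is an assumption carried over from the question's code.

```java
import java.util.Locale;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DocumentFilter;
import javax.swing.text.PlainDocument;

// Hypothetical filter that lets only lowercase letters and spaces through.
class LettersAndSpacesFilter extends DocumentFilter {

    // Lowercases the text and strips everything except a-z and space.
    private static String clean(String text) {
        return text == null ? "" : text.toLowerCase(Locale.ROOT).replaceAll("[^a-z ]", "");
    }

    @Override
    public void insertString(FilterBypass fb, int offset, String string, AttributeSet attr)
            throws BadLocationException {
        super.insertString(fb, offset, clean(string), attr);
    }

    @Override
    public void replace(FilterBypass fb, int offset, int length, String text, AttributeSet attrs)
            throws BadLocationException {
        super.replace(fb, offset, length, clean(text), attrs);
    }

    // Small demonstration on a bare document, without any visible component.
    static String demo(String typed) {
        try {
            PlainDocument doc = new PlainDocument();
            doc.setDocumentFilter(new LettersAndSpacesFilter());
            doc.insertString(0, typed, null);
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

To attach it to a text field, cast the field's document to AbstractDocument and call setDocumentFilter on it, for example ((AbstractDocument) TopField.getDocument()).setDocumentFilter(new LettersAndSpacesFilter()).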

As you remove characters, Cleaner becomes shorter, so you're likely to reach a point where x is too large.
I would suggest a different approach using regular expressions:
String cleaned = TopField.getText().toLowerCase().replaceAll("[^a-z ]", "");

I used Character.isDigit() and got incorrect output. Here is the code I tested; can anyone explain the problem?
public static void main(String[] args) {
// TODO Auto-generated method stub
String temp="you got 211111 out of 211111?";
StringBuilder cleaner=new StringBuilder(temp);
for(int i=0;i<cleaner.length();i++)
{
char c=cleaner.charAt(i);
if(Character.isDigit(c))
{
cleaner.deleteCharAt(i);
}
}
System.out.println(cleaner);
}
The output I get is: you got 111 out of 111?
Some of the digits are not removed.
I also found that StringBuilder has no replaceAll() method (it is defined on String).
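The digits survive for the same reason described in the first answer: after deleteCharAt(i), the next character slides into position i, and the loop's i++ then steps over it, so every second digit in a run is skipped. One way around this, sketched below with the hypothetical names DigitRemover, removeDigits and removeDigitsRegex, is to walk the builder from the end so that a deletion never shifts a character that has not been examined yet. String.replaceAll is shown as an alternative.

```java
// Hypothetical helper; names are illustrative only.
class DigitRemover {

    // Iterates backwards: deleting index i only moves characters that
    // were already examined, so nothing is skipped.
    static String removeDigits(String input) {
        StringBuilder cleaner = new StringBuilder(input);
        for (int i = cleaner.length() - 1; i >= 0; i--) {
            if (Character.isDigit(cleaner.charAt(i))) {
                cleaner.deleteCharAt(i);
            }
        }
        return cleaner.toString();
    }

    // Same result for ASCII digits using String.replaceAll with a regex.
    static String removeDigitsRegex(String input) {
        return input.replaceAll("[0-9]", "");
    }
}
```

For the test string "you got 211111 out of 211111?", both methods leave "you got  out of ?" (with two spaces where the first number used to be).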

### Is there a regular expression way to replace a set of characters with another set (like shell tr command)?

The shell tr command supports replacing one set of characters with another set.
For example, echo hello | tr [a-z] [A-Z] will translate hello to HELLO.
In Java, however, I must replace each character individually, like the following:
"10 Dogs Are Racing"
.replaceAll ("0", "０")
.replaceAll ("1", "１")
.replaceAll ("2", "２")
// ...
.replaceAll ("9", "９")
.replaceAll ("A", "Ａ")
// ...
;
The apache-commons-lang library provides a convenient replaceChars method to do such replacement.
// half-width to full-width
System.out.println
(
org.apache.commons.lang.StringUtils.replaceChars
(
"10 Dogs Are Racing",
"0123456789ABCDEFEGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
"０１２３４５６７８９ＡＢＣＤＥＦＥＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ"
)
);
// Result:
// １０ Ｄｏｇｓ Ａｒｅ Ｒａｃｉｎｇ
But as you can see, sometimes the searchChars/replaceChars are too long (and too tedious: try to find the duplicated character in them if you like), even though they can be expressed by a simple regular expression, [0-9A-Za-z]/[０-９Ａ-Ｚａ-ｚ]. Is there a regular expression way to achieve that?

While there is no direct way to do this, constructing your own utility function to use in combination with replaceChars is relatively simple. The version below accepts simple character classes, without [ or ]; it does not do class negation ([^a-z]).
For your use case, you could do:
StringUtils.replaceChars(str, charRange("0-9A-Za-z"), charRange("０-９Ａ-Ｚａ-ｚ"))
Code (PatternSyntaxException comes from java.util.regex):
public static String charRange(String str) {
StringBuilder ret = new StringBuilder();
char ch;
for(int index = 0; index < str.length(); index++) {
ch = str.charAt(index);
if(ch == '\\') {
if(index + 1 >= str.length()) {
throw new PatternSyntaxException(
"Malformed escape sequence.", str, index
);
}
// special case for escape character, consume next char:
index++;
ch = str.charAt(index);
}
if(index + 1 >= str.length() || str.charAt(index + 1) != '-') {
// this was a single char, or the last char in the string
ret.append(ch);
} else {
if(index + 2 >= str.length()) {
throw new PatternSyntaxException(
"Malformed character range.", str, index + 1
);
}
// this char was the beginning of a range
for(char r = ch; r <= str.charAt(index + 2); r++) {
ret.append(r);
}
index = index + 2;
}
}
return ret.toString();
}
Produces:
0-9A-Za-z : 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
０-９Ａ-Ｚａ-ｚ : ０１２３４５６７８９ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ
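For readers who do not have commons-lang available, a tr-like helper can be sketched in plain Java. This is a simplified illustration, not the code from this answer: the class name Tr and the methods expand and tr are hypothetical, escapes and negation are not supported, and both sets are assumed to expand to the same length.

```java
// Hypothetical tr-like helper; names are illustrative only.
class Tr {

    // Expands simple ranges such as "a-z0-9" into the full character list.
    static String expand(String spec) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < spec.length(); i++) {
            char start = spec.charAt(i);
            if (i + 2 < spec.length() && spec.charAt(i + 1) == '-') {
                char end = spec.charAt(i + 2);
                // int counter avoids char overflow at the top of the range
                for (int c = start; c <= end; c++) {
                    out.append((char) c);
                }
                i += 2; // skip the '-' and the range end
            } else {
                out.append(start);
            }
        }
        return out.toString();
    }

    // Replaces each character found in 'from' with the character at the
    // same position in 'to'; all other characters are copied unchanged.
    static String tr(String input, String from, String to) {
        String src = expand(from);
        String dst = expand(to);
        if (src.length() != dst.length()) {
            throw new IllegalArgumentException("Character sets differ in length");
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int k = src.indexOf(c);
            out.append(k >= 0 ? dst.charAt(k) : c);
        }
        return out.toString();
    }
}
```

For example, Tr.tr("hello", "a-z", "A-Z") gives HELLO, and the half-width to full-width case from the question becomes Tr.tr(str, "0-9A-Za-z", "\uFF10-\uFF19\uFF21-\uFF3A\uFF41-\uFF5A"), where the escapes are the full-width digits and letters. The indexOf lookup is linear in the set size, which is fine for short sets; a lookup array or map would suit larger ones.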

No.
(some extra characters so that SO will allow me to post my otherwise succinct answer)