How To Convert String To UTF-8 In Java

A string is one of the most commonly used data types in programming languages. In Java, a String stores a sequence of characters and is used to hold text such as names, messages, and other primary data. Java provides the String class in the java.lang package, along with related classes and methods for working with text. UTF-8, on the other hand, is the dominant character encoding on the web. It defines how each character is represented as a sequence of bytes (binary data), which lets different devices, programs, and programming languages exchange text reliably.

A Java String already holds Unicode text, represented as a sequence of UTF-16 code units. When text leaves the program, for example when it is written to a file, sent over a network, or passed to an external system that expects UTF-8, it has to be turned into bytes using a specific encoding. Converting a string to UTF-8 therefore means producing the UTF-8 byte sequence for that string.

UTF-8 can represent every Unicode character and is backward compatible with ASCII: any ASCII text is also valid UTF-8, with exactly the same bytes. In Java, a string can be converted into UTF-8 in several ways, each discussed below with code, its output, and an explanation of the method used. This will make you confident in “How to convert string to UTF-8 in Java”.
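To see the one-to-four-byte rule and the ASCII compatibility in practice, here is a small sketch that encodes a few characters with String.getBytes(StandardCharsets.UTF_8) and counts the bytes. The class name Utf8ByteCounts is only illustrative, and the non-ASCII characters are written as Unicode escapes so the source file stays ASCII-only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8ByteCounts {
    // Number of bytes the text occupies once encoded as UTF-8
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'A', e-acute (U+00E9), euro sign (U+20AC) and the emoji U+1F600 (a surrogate pair in a Java String)
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            String codePoint = Integer.toHexString(s.codePointAt(0)).toUpperCase();
            System.out.println("U+" + codePoint + " -> " + utf8Length(s) + " byte(s)");
        }
        // prints 1, 2, 3 and 4 bytes respectively

        // ASCII compatibility: pure ASCII text yields the same bytes in US-ASCII and UTF-8
        String ascii = "Hello!";
        boolean same = Arrays.equals(ascii.getBytes(StandardCharsets.US_ASCII),
                                     ascii.getBytes(StandardCharsets.UTF_8));
        System.out.println("ASCII bytes identical in UTF-8: " + same); // true
    }
}
```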

Why Is There A Need To Convert String Into UTF-8?

Here are a few reasons why strings need to be converted to UTF-8 in Java.

  1. Communication: String data must be converted to UTF-8 when the program communicates with an external application or device that expects that encoding. A Java String represents text as UTF-16 code units, so it must be encoded into bytes before it can be sent anywhere, and the sender and receiver must agree on the encoding.
  2. Storage optimization: UTF-8 uses one to four bytes per character, and only one byte for ASCII characters, whereas UTF-16 uses two bytes for them. For text that is mostly ASCII (English words, identifiers, markup), UTF-8 therefore takes noticeably less space. For some scripts, such as Chinese or Japanese, UTF-8 needs three bytes per character where UTF-16 needs two, so the savings depend on the text.
  3. Compatibility with web applications: UTF-8 is the default encoding of the web and is supported by browsers, servers, and most modern tools. If text is sent in a different encoding than the one the receiver expects, users may see garbled characters or run into other errors. Converting a string to UTF-8 ensures compatibility with web applications and avoids these issues.
  4. Data integrity: UTF-8 itself does not encrypt data or add checksums, so security still requires separate mechanisms such as TLS or hashing. What UTF-8 does provide is a single, well-defined byte representation for every character, so text written by one system is read back unchanged by another system that also uses UTF-8, instead of being silently corrupted.
  5. Wide support: because UTF-8 is compact for ASCII-heavy text, is the standard web encoding, and is understood almost everywhere, it is the usual recommendation for exchanging and storing text. Since JDK 18, it is also Java's default charset.
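The storage point can be checked with a short sketch, assuming the goal is simply to compare encoded sizes. It uses UTF_16BE rather than UTF_16 because Java's UTF-16 encoder prepends a two-byte byte-order mark, which would skew the comparison. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class Utf8VsUtf16Size {
    // Encoded size in UTF-8
    public static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encoded size in UTF-16 big-endian (UTF_16BE adds no byte-order mark)
    public static int utf16Size(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String english = "Hello";                 // ASCII only
        String japanese = "\u65E5\u672C\u8A9E";   // three CJK characters meaning "Japanese language"
        System.out.println("ASCII: UTF-8 = " + utf8Size(english) + ", UTF-16 = " + utf16Size(english));   // 5 vs 10
        System.out.println("CJK:   UTF-8 = " + utf8Size(japanese) + ", UTF-16 = " + utf16Size(japanese)); // 9 vs 6
    }
}
```

UTF-8 halves the size of the ASCII text but is 50% larger for the CJK text, which is why the storage benefit depends on what is being stored.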

Programmers can gain these benefits by converting strings to UTF-8 with the approaches discussed below.

Different Approaches To Convert String To UTF-8 In Java

This section examines five distinct ways to convert a Java string to UTF-8, each using a different method or class. Every approach comes with sample code, its output, and an explanation, and a matching sample problem appears later. The five approaches are:

  1. Using the getBytes() method
  2. Using the Charset class
  3. Using the OutputStreamWriter class
  4. Using the PrintWriter class
  5. Using the DataOutputStream class

Next, we delve into each approach in greater depth with a method explanation, sample code, and a code explanation, followed by sample problems and their solutions. These examples can be adapted for real-life projects and should give you a better grasp of “How to convert string to UTF-8 in Java”.

Approach 1 : Convert String into UTF-8 by using the getBytes() method

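The getBytes() method is called on the string itself. When it is given a charset name such as "UTF-8" (Java also accepts the alias "UTF8"), it returns a byte array (byte[]) holding the UTF-8 encoding of the string; it does not return a new string. getBytes() is simple and concise to use because the encoding logic is built into the String class. Because this overload takes the charset name as a String, it declares the checked exception UnsupportedEncodingException, which is why main declares throws Exception.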

Sample Code:

public class Main {
   public static void main(String[] args) throws Exception {
      // define a string
      String s = "Hello!";

      // encode the string as UTF-8 bytes ("UTF-8" is the canonical charset name)
      byte[] arr = s.getBytes("UTF-8");

      // print the output
      System.out.print("UTF-8 code is : ");
      for (byte x: arr) {
         System.out.print(x+" ");
      }
   }
}

Output:

UTF-8 code is : 72 101 108 108 111 33 

Explanation :

  1. Define a variable containing a string.
  2. Call getBytes("UTF-8") on that string.
  3. getBytes() encodes the string's characters into a byte array using UTF-8.
  4. After conversion, print each byte on the console.
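Two details are worth adding to the getBytes() example. First, the getBytes(Charset) overload with StandardCharsets.UTF_8 avoids the checked exception. Second, Java's byte type is signed, so non-ASCII characters show up as negative numbers; Byte.toUnsignedInt() turns them into the usual 0 to 255 values. The sketch below shows both for the word "Café"; the class and helper names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesUtf8 {
    // getBytes(Charset) declares no checked exception, unlike getBytes(String charsetName)
    public static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Java bytes are signed (-128..127); show each one as an unsigned two-digit hex value
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", Byte.toUnsignedInt(b)));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] utf8 = toUtf8("Caf\u00E9");           // the word "Cafe" with an accented final e
        System.out.println(Arrays.toString(utf8));   // [67, 97, 102, -61, -87]
        System.out.println(toHex(utf8));             // 43 61 66 C3 A9
    }
}
```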

Approach 2 : Convert String into UTF-8 by using the Charset class

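The Charset class belongs to the java.nio.charset package and represents a named character encoding such as UTF-8. To perform the conversion, obtain a Charset object with Charset.forName("UTF-8") and pass it to the string's getBytes(Charset) method, as the code below does. The Charset class also has its own encode() method, which returns a ByteBuffer and offers another route to the same bytes. Unlike getBytes(String), getBytes(Charset) does not throw a checked exception, so main needs no throws clause.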

Sample Code:

// import the necessary libraries
import java.nio.charset.Charset;
import java.util.Arrays;

public class Main {
   public static void main(String[] args) {
      // define the string variable
      String s = "Hello!";
      
      // achieve conversion using the method
      byte[] utf8Bytes = s.getBytes(Charset.forName("UTF-8"));
      
      // print the output on console
      System.out.println("Input string: " + s);
      System.out.println("UTF-8 bytes: " + Arrays.toString(utf8Bytes));
   }
}

Output:

Input string: Hello!
UTF-8 bytes: [72, 101, 108, 108, 111, 33]

Explanation :

  1. Define a variable containing a string.
  2. Call getBytes() on the string, passing Charset.forName("UTF-8").
  3. The method encodes the string into a UTF-8 byte array.
  4. After conversion, print the input and the bytes on the console; Arrays.toString() formats the array.
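The Charset class can also do the conversion on its own through encode(), which returns a java.nio.ByteBuffer instead of a byte array. A minimal sketch, assuming a plain byte[] is wanted at the end: copy the buffer's remaining bytes rather than calling array(), because the backing array can be larger than the encoded data. The class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetEncodeDemo {
    public static byte[] encodeUtf8(String s) {
        Charset utf8 = StandardCharsets.UTF_8;   // equivalent to Charset.forName("UTF-8")
        ByteBuffer buffer = utf8.encode(s);      // encoded bytes lie between position and limit
        // Copy exactly the remaining bytes; buffer.array() may include unused capacity
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeUtf8("Hello!"))); // [72, 101, 108, 108, 111, 33]
    }
}
```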

Approach 3 : Convert String into UTF-8 by using the OutputStreamWriter class

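The OutputStreamWriter class in the java.io package is a bridge from characters to bytes: it encodes every character written to it with a chosen charset and passes the resulting bytes to an underlying OutputStream. Pairing it with a ByteArrayOutputStream collects those bytes in memory, and ByteArrayOutputStream's toByteArray() method returns them as an array. The programmer creates both objects, writes the string, flushes the writer, and then reads the bytes.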

Sample Code:

// import the necessary libraries
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Main {
   public static void main(String[] args) throws Exception {
      // define the string variable
      String inputString = "Hello!";
      
      // create an instance
      ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
      OutputStreamWriter writer = new OutputStreamWriter(outputStream, StandardCharsets.UTF_8);
      
      // use the method for conversion
      writer.write(inputString);
      writer.flush();
      byte[] utf8Bytes = outputStream.toByteArray();
      
      // print the output
      System.out.println("Input string: " + inputString);
      System.out.println("UTF-8 bytes: " + Arrays.toString(utf8Bytes));
   }
}

Output :

Input string: Hello!
UTF-8 bytes: [72, 101, 108, 108, 111, 33]

Explanation :

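  1. Import ByteArrayOutputStream, OutputStreamWriter, StandardCharsets, and Arrays.
  2. Define the string variable.
  3. Create a ByteArrayOutputStream and wrap it in an OutputStreamWriter that encodes with StandardCharsets.UTF_8.
  4. The write() method passes the string to the writer, which encodes its characters as UTF-8.
  5. The flush() method pushes any bytes still buffered inside the writer into the ByteArrayOutputStream.
  6. The toByteArray() method returns a copy of the bytes written so far, which is the UTF-8 encoding of the string.
  7. Then print the output on the console.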
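A slightly more robust variant of the OutputStreamWriter approach closes the writer with try-with-resources; close() flushes first, so no explicit flush() call is needed. In this sketch the IOException that Writer declares is wrapped in UncheckedIOException, an illustrative choice since an in-memory ByteArrayOutputStream does not actually fail. The euro sign shows a three-byte character, and the class name is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriterToUtf8 {
    public static byte[] toUtf8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // try-with-resources closes the writer; close() flushes buffered bytes first
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(s);
        } catch (IOException e) {
            // Writer declares IOException, although an in-memory stream does not actually fail
            throw new UncheckedIOException(e);
        }
        // Closing a ByteArrayOutputStream has no effect, so its bytes are still available
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toUtf8("\u20AC5"))); // euro sign + '5': [-30, -126, -84, 53]
    }
}
```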

Approach 4 : Convert String into UTF-8 by using the PrintWriter class

The java.io package also contains the PrintWriter class, which can be used for the conversion. Construct a ByteArrayOutputStream and a PrintWriter on top of it that encodes with UTF-8 (the PrintWriter(OutputStream, boolean, Charset) constructor is available since Java 10), then write the string with write() or print(). After flush(), ByteArrayOutputStream's toByteArray() method returns the UTF-8 bytes.

Sample Code:

// import the necessary libraries
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Main {
   public static void main(String[] args) throws Exception {
      // Create a string variable
      String inputString = "Hello!";
      
      // Create a class instance
      ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
      PrintWriter writer = new PrintWriter(outputStream, true, StandardCharsets.UTF_8);
      
      // apply the methods on string variable
      writer.write(inputString);
      writer.flush();
      byte[] utf8Bytes = outputStream.toByteArray();
      
      // print the output on the console
      System.out.println("Input string: " + inputString);
      System.out.println("UTF-8 bytes: " + Arrays.toString(utf8Bytes));
   }
}

Output :

Input string: Hello!
UTF-8 bytes: [72, 101, 108, 108, 111, 33]

Explanation :

  1. Import ByteArrayOutputStream, PrintWriter, StandardCharsets, and Arrays.
  2. Define the string variable.
  3. Create a ByteArrayOutputStream and a PrintWriter on top of it that encodes with StandardCharsets.UTF_8.
  4. The write() method passes the string to the PrintWriter, which encodes it as UTF-8.
  5. The flush() method pushes any bytes still buffered inside the PrintWriter into the ByteArrayOutputStream.
  6. The toByteArray() method returns the UTF-8 bytes collected so far.
  7. Then print the output on the console.
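The flush() call matters because PrintWriter buffers the encoded text and, even with autoFlush set to true, flushes automatically only in println(), printf(), and format(). The sketch below (illustrative names) measures the ByteArrayOutputStream before and after flush(). On OpenJDK the "before" count is typically 0, but the exact buffering is an implementation detail, so only the "after" count of 6 should be relied on.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class PrintWriterFlushDemo {
    // Returns {bytes in the stream before flush(), bytes in the stream after flush()}
    public static int[] sizesAroundFlush(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // autoFlush = true only affects println, printf and format, not write or print
        PrintWriter writer = new PrintWriter(out, true, StandardCharsets.UTF_8);
        writer.write(s);
        int before = out.size();   // encoded text may still be held in the writer's buffers
        writer.flush();
        int after = out.size();    // every encoded byte has now reached the stream
        writer.close();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] sizes = sizesAroundFlush("Hello!");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```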

Approach 5 : Convert String into UTF-8 by using the DataOutputStream class

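The DataOutputStream class in the java.io package writes primitive Java values and strings to an OutputStream. Its writeUTF() method writes a string in "modified UTF-8": it first writes a two-byte length prefix (the number of encoded bytes, most significant byte first) and then the encoded characters. For ordinary characters, the bytes after the prefix match standard UTF-8, but the NUL character (U+0000) and characters outside the Basic Multilingual Plane (such as emoji) are encoded differently, and writeUTF() throws a UTFDataFormatException if the encoded form is longer than 65,535 bytes. Use it when the data will be read back with DataInputStream.readUTF(), not when a standard UTF-8 byte array is required.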

Sample Code:

// import the necessary libraries
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.util.Arrays;

public class Main {
   public static void main(String[] args) throws Exception {
      // Create a string variable
      String inputString = "Hello!";
      
      // Create a class instance
      ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
      DataOutputStream writer = new DataOutputStream(outputStream);
      
      // write the length prefix and the modified UTF-8 bytes
      writer.writeUTF(inputString);
      writer.flush();
      byte[] utf8Bytes = outputStream.toByteArray();
      
      // print the output on the console
      System.out.println("Input string: " + inputString);
      System.out.println("UTF-8 bytes: " + Arrays.toString(utf8Bytes));
   }
}

Output :

Input string: Hello!
UTF-8 bytes: [0, 6, 72, 101, 108, 108, 111, 33]

Explanation :

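  1. Import ByteArrayOutputStream, DataOutputStream, and Arrays.
  2. Define the string variable.
  3. Create a ByteArrayOutputStream and wrap it in a DataOutputStream.
  4. The writeUTF() method writes a two-byte length prefix followed by the modified UTF-8 bytes of the string.
  5. The flush() method makes sure all written bytes reach the ByteArrayOutputStream.
  6. The toByteArray() method returns the bytes. In the output, the leading 0, 6 is the length prefix (6 bytes follow) and the rest are the encoded characters.
  7. Then print the output on the console.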
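The difference between writeUTF() and standard UTF-8 becomes visible with the NUL character and with characters outside the Basic Multilingual Plane. This sketch (illustrative names) compares the two encodings; the expected values in the comments follow the modified UTF-8 rules documented for DataInput and DataOutputStream.writeUTF().

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    // writeUTF output: 2-byte big-endian length prefix followed by modified UTF-8 bytes
    public static byte[] writeUtf(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String nul = "\u0000";              // the NUL character
        String emoji = "\uD83D\uDE00";      // U+1F600, stored in a String as two surrogates

        // Standard UTF-8: NUL is a single 0 byte, the emoji is 4 bytes
        System.out.println(Arrays.toString(nul.getBytes(StandardCharsets.UTF_8)));  // [0]
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length);          // 4

        // Modified UTF-8: NUL becomes C0 80, and each surrogate is encoded
        // separately as 3 bytes, so the emoji needs 6 bytes after the prefix
        System.out.println(Arrays.toString(writeUtf(nul)));  // [0, 2, -64, -128]
        System.out.println(writeUtf(emoji).length - 2);      // 6
    }
}
```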

Best Approach Out Of Five for Converting String To UTF-8 In Java

Here, five different approaches were discussed for converting a string into UTF-8. Of the five, we recommend the first approach, “Convert String into UTF-8 by using the getBytes() method”. The getBytes() method is the best approach for the conversion because it is straightforward, simple, and easy to use in any program. Its Charset overload, s.getBytes(StandardCharsets.UTF_8), is usually the better choice over s.getBytes("UTF-8"): it throws no checked UnsupportedEncodingException, and a misspelled constant is caught by the compiler instead of at run time.

Here are some reasons why it is the best approach –

  1. Simple approach – getBytes() is the simplest method among the five: a single call on the string, and the getBytes("UTF-8") form needs no import at all.
  2. Fewer lines of code – Compared with the other approaches, getBytes() needs the fewest lines of code, which saves time and lets other programmers understand it immediately.
  3. Less overhead – getBytes() encodes the string directly into a byte array without intermediate writer or stream objects. (Every approach produces UTF-8, so the storage savings of UTF-8 itself are not specific to this method.)
  4. Implemented efficiently – getBytes() is built into the String class, so the conversion needs no external dependency, and unlike DataOutputStream.writeUTF() it produces standard UTF-8 with no length prefix.
  5. Beginner friendly – This approach is easy to understand and easy to apply in application development, which makes it suitable for beginners while still being useful for experienced programmers.

These points show why the getBytes() method is the best general-purpose approach for converting a string to UTF-8. Still, programmers should try and practice all the approaches to learn by doing.

Also, the best approach can change with the problem statement. For example, when text is already being written to a file or network stream, an OutputStreamWriter configured for UTF-8 encodes it on the way out without first building the whole byte array. Practicing each approach will help you master “How to convert string to UTF-8 in Java”.
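One caveat applies to getBytes(): it never throws on bad input. If the string contains malformed UTF-16, such as a lone surrogate, getBytes(StandardCharsets.UTF_8) silently substitutes the replacement byte '?'. When such data should be rejected instead, a CharsetEncoder obtained from newEncoder() reports the error, because its malformed-input action is REPORT by default. A sketch with illustrative names:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StrictUtf8 {
    // Lenient: getBytes replaces malformed input (e.g. a lone surrogate) with '?'
    public static byte[] lenient(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Strict: a new CharsetEncoder reports malformed input by throwing an exception
    public static byte[] strict(String s) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder(); // actions default to REPORT
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(s));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static boolean canEncodeStrictly(String s) {
        try {
            strict(s);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String broken = "\uD800";  // high surrogate with no matching low surrogate
        System.out.println(Arrays.toString(lenient(broken)));  // [63], the byte for '?'
        System.out.println(canEncodeStrictly(broken));         // false
        System.out.println(canEncodeStrictly("Hello!"));       // true
    }
}
```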

Sample Problems

Sample problem 1:

Convert String into UTF-8 by using the getBytes() method. Display the username on the web by taking the user input as a string. Then print the output on the console with UTF-8 encoding.

Solution :

1.	Take the username as input and save it in a string variable.
2.	Call getBytes("UTF-8") on the string.
3.	getBytes() converts it into a UTF-8 byte array.
4.	After conversion, print each byte on the console.

Code :

import java.util.Scanner;
public class Main {
   public static void main(String args[]) throws Exception {
      // take the user input
      Scanner sc = new Scanner(System.in);
      System.out.print("Enter a username: ");
      String s = sc.nextLine();

      // use the method for conversion
      byte[] arr = s.getBytes("UTF-8");

      // print the output
      System.out.print("UTF-8 code is : ");
      for (byte x: arr) {
         System.out.print(x+" ");
      }
   }
}

Output :

Enter a username: Sujal Jain
UTF-8 code is : 83 117 106 97 108 32 74 97 105 110 
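The sample above reads the username with new Scanner(System.in), which decodes keyboard input with the platform's default charset. For non-ASCII names it can help to pass the charset explicitly with the Scanner(InputStream, Charset) constructor (Java 10+). The sketch below simulates typed input with a ByteArrayInputStream so it runs without a console; in a real program you would pass System.in, and whether the terminal actually sends UTF-8 depends on its settings. The name "José" and the class name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Scanner;

public class ScannerUtf8Input {
    // Reads one line, decoding the input explicitly as UTF-8, then re-encodes it as UTF-8
    public static byte[] readLineAsUtf8(InputStream in) {
        Scanner sc = new Scanner(in, StandardCharsets.UTF_8);
        String line = sc.nextLine();
        return line.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulated keyboard input: a hypothetical username followed by Enter, supplied as UTF-8 bytes
        byte[] typed = "Jos\u00E9\n".getBytes(StandardCharsets.UTF_8);
        byte[] result = readLineAsUtf8(new ByteArrayInputStream(typed));
        System.out.println(Arrays.toString(result));  // [74, 111, 115, -61, -87]
    }
}
```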

Sample problem 2:

Convert String into UTF-8 by using the Charset class. Get the customer's name as input so it can be displayed on an online e-commerce website's invoice. Then print the output on the console with UTF-8 encoding.

Solution :

1.	Read the customer's name from the input into a string variable.
2.	Call getBytes() on the string, passing Charset.forName("UTF-8").
3.	The method converts it into a UTF-8 byte array.
4.	After conversion, print the name and the bytes on the console.

Code :

// import the necessary libraries
import java.util.Scanner;
import java.nio.charset.Charset;
import java.util.Arrays;

public class Main {
   public static void main(String[] args) {
      // take the user input
      Scanner sc = new Scanner(System.in);
      System.out.print("Enter a customer name: ");
      String s = sc.nextLine();

      // achieve conversion using the method
      byte[] utf8Bytes = s.getBytes(Charset.forName("UTF-8"));
      
      // print the output on console
      System.out.println("Customer name is: " + s);
      System.out.println("UTF-8 bytes: " + Arrays.toString(utf8Bytes));
   }
}

Output :

Enter a customer name: Rohit
Customer name is: Rohit
UTF-8 bytes: [82, 111, 104, 105, 116]
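An invoice displays text that has been decoded back from bytes, and the reading side must use the same charset as the writing side. This sketch decodes the UTF-8 bytes of a hypothetical customer name "Zoë" with new String(bytes, charset), once as UTF-8 and once as ISO-8859-1, to show the garbled text (mojibake) a charset mismatch produces. The class and method names are illustrative, and how the printed lines look also depends on the console's own encoding.

```java
import java.nio.charset.StandardCharsets;

public class DecodeUtf8Demo {
    // Correct: decode with the same charset that was used to encode
    public static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Wrong on purpose: decode UTF-8 bytes as ISO-8859-1 (Latin-1)
    public static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] invoiceName = "Zo\u00EB".getBytes(StandardCharsets.UTF_8); // hypothetical customer name
        System.out.println(decodeUtf8(invoiceName));   // the original name, round-tripped correctly
        System.out.println(decodeLatin1(invoiceName)); // mojibake: the 2-byte letter shows as two characters
    }
}
```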

Sample problem 3:

Convert String into UTF-8 by using the OutputStreamWriter class. Get the candidate's name so it can be displayed on the college application form, and display it on the console with UTF-8 encoding.

Solution :

1.	Import ByteArrayOutputStream, OutputStreamWriter, StandardCharsets, Arrays, and Scanner.
2.	Read the candidate's name with the Scanner class.
3.	Create a ByteArrayOutputStream and wrap it in an OutputStreamWriter that encodes with UTF-8.
4.	The write() method passes the name to the writer, which encodes it as UTF-8.
5.	The flush() method pushes any buffered bytes into the ByteArrayOutputStream.
6.	The toByteArray() method returns the UTF-8 bytes.
7.	Then print the output on the console.

Code :

// import the necessary libraries
import java.util.Scanner;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Main {
   public static void main(String[] args) throws Exception {
      // take the user input
      Scanner sc = new Scanner(System.in);
      System.out.print("Enter a candidate name: ");
      String inputString = sc.nextLine();
      
      // create an instance
      ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
      OutputStreamWriter writer = new OutputStreamWriter(outputStream, StandardCharsets.UTF_8);
      
      // use the method for conversion
      writer.write(inputString);
      writer.flush();
      byte[] utf8Bytes = outputStream.toByteArray();
      
      // print the output
      System.out.println("Candidate name is: " + inputString + "\n");
      System.out.println("UTF-8 bytes: " + Arrays.toString(utf8Bytes));
   }
}

Output :

Enter a candidate name: Vaishnav
Candidate name is: Vaishnav

UTF-8 bytes: [86, 97, 105, 115, 104, 110, 97, 118]
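In practice, UTF-8 bytes usually end up in a file rather than in memory. As a sketch, assuming Java 11 or later, Files.writeString() with StandardCharsets.UTF_8 encodes the text much as an OutputStreamWriter wrapped around a file stream would. The snippet writes the candidate name to a temporary file, reads the raw bytes back, and deletes the file; the file prefix and class name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class WriteUtf8File {
    // Writes the text to a temporary file as UTF-8 and returns the raw bytes stored in it
    public static byte[] roundTripThroughFile(String text) {
        try {
            Path file = Files.createTempFile("candidate", ".txt");
            try {
                Files.writeString(file, text, StandardCharsets.UTF_8);
                return Files.readAllBytes(file);
            } finally {
                Files.deleteIfExists(file);   // clean up the temporary file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTripThroughFile("Vaishnav")));
        // [86, 97, 105, 115, 104, 110, 97, 118]
    }
}
```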

Sample problem 4:

Convert String into UTF-8 by using the PrintWriter class. Get the city name from the user who wants to participate in the state-level chess competition. Print the city name on the console with UTF-8 encoding.

Solution :

1.	Import ByteArrayOutputStream, PrintWriter, StandardCharsets, Arrays, and Scanner.
2.	Get the city name from the participant using the Scanner class.
3.	Create a ByteArrayOutputStream and a PrintWriter on top of it that encodes with UTF-8.
4.	The write() method passes the city name to the PrintWriter, which encodes it as UTF-8.
5.	The flush() method pushes any buffered bytes into the ByteArrayOutputStream.
6.	The toByteArray() method returns the UTF-8 bytes.
7.	Then print the output on the console.

Code :

// import the necessary libraries
import java.util.Scanner;
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Main {
   public static void main(String[] args) throws Exception {
      // take the user input
      Scanner sc = new Scanner(System.in);
      System.out.print("Enter your home town: ");
      String inputString = sc.nextLine();
      
      // Create a class instance
      ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
      PrintWriter writer = new PrintWriter(outputStream, true, StandardCharsets.UTF_8);
      
      // apply the methods on string variable
      writer.write(inputString);
      writer.flush();
      byte[] utf8Bytes = outputStream.toByteArray();
      
      // print the output on the console
      System.out.println("Participant's home town is: " + inputString);
      System.out.println("UTF-8 bytes: " + Arrays.toString(utf8Bytes));
   }
}

Output :

Enter your home town: Jaipur
Participant's home town is: Jaipur
UTF-8 bytes: [74, 97, 105, 112, 117, 114]
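Decimal byte lists are hard to compare with encoding tables, which usually show hexadecimal. Assuming Java 17 or later, java.util.HexFormat can format the UTF-8 bytes as hex pairs; the class and method names in this sketch are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

public class HexDisplay {
    // Formats UTF-8 bytes as lowercase hex pairs separated by spaces
    public static String utf8Hex(String s) {
        return HexFormat.ofDelimiter(" ").formatHex(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("Jaipur")); // 4a 61 69 70 75 72
    }
}
```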

Sample problem 5:

Convert String into UTF-8 by using the DataOutputStream class. Get the highest qualification of the applicant teacher to apply for the position of school Principal. Display the qualification on the console with the UTF-8 code.

Solution :

1.	Import ByteArrayOutputStream, DataOutputStream, Arrays, and Scanner.
2.	Get the highest qualification from the teacher using the Scanner class.
3.	Create a ByteArrayOutputStream and wrap it in a DataOutputStream.
4.	The writeUTF() method writes a two-byte length prefix followed by the modified UTF-8 bytes of the text.
5.	The flush() method makes sure all written bytes reach the ByteArrayOutputStream.
6.	The toByteArray() method returns the bytes, including the length prefix.
7.	Then print the output on the console.

Code :

// import the necessary libraries
import java.util.Scanner;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.util.Arrays;

public class Main {
   public static void main(String[] args) throws Exception {
      // take the user input
      Scanner sc = new Scanner(System.in);
      System.out.print("Enter your highest qualification: ");
      String inputString = sc.nextLine();
      
      // Create a class instance
      ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
      DataOutputStream writer = new DataOutputStream(outputStream);
      
      // write the length prefix and the modified UTF-8 bytes
      writer.writeUTF(inputString);
      writer.flush();
      byte[] utf8Bytes = outputStream.toByteArray();
      
      // print the output on the console
      System.out.println("Your highest qualification is: " + inputString);
      System.out.println("UTF-8 bytes: " + Arrays.toString(utf8Bytes));
   }
}

Output :

Enter your highest qualification: M.A. B.Ed. with 35+ years of experience in the education sector.
Your highest qualification is: M.A. B.Ed. with 35+ years of experience in the education sector.
UTF-8 bytes: [0, 64, 77, 46, 65, 46, 32, 66, 46, 69, 100, 46, 32, 119, 105, 116, 104, 32, 51, 53, 43, 32, 121, 101, 97, 114, 115, 32, 111, 102, 32, 101, 120, 112, 101, 114, 105, 101, 110, 99, 101, 32, 105, 110, 32, 116, 104, 101, 32, 101, 100, 117, 99, 97, 116, 105, 111, 110, 32, 115, 101, 99, 116, 111, 114, 46]
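The first two numbers in this output form the length prefix written by writeUTF(): an unsigned 16-bit byte count stored most significant byte first. The sketch below (illustrative names) decodes such a prefix by hand and reads the string back with DataInputStream.readUTF(), the intended counterpart of writeUTF().

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ReadUtfDemo {
    // The first two bytes from writeUTF are an unsigned big-endian byte count
    public static int lengthPrefix(byte[] data) {
        return ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
    }

    // readUTF consumes the prefix and decodes the modified UTF-8 bytes that follow
    public static String readBack(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0, 6, 72, 101, 108, 108, 111, 33};  // writeUTF output for "Hello!"
        System.out.println(lengthPrefix(data));  // 6
        System.out.println(readBack(data));      // Hello!
    }
}
```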

Conclusion

Converting a string into UTF-8 is an essential task in Java programming: Java strings represent text as UTF-16 internally, but files, networks, and modern web technologies exchange text as bytes, most often in UTF-8. UTF-8 also has the other advantages discussed above, such as compact storage for ASCII-heavy text and near-universal support.

The conversion can be done in the five ways already discussed. The getBytes() method is the best approach in most cases, mainly because it is simple and straightforward. The stream-based classes from the java.io package (OutputStreamWriter, PrintWriter, and DataOutputStream) are helpful when the text is already being written to a stream, with the caveat that DataOutputStream.writeUTF() produces modified UTF-8 with a length prefix rather than plain UTF-8.

In this section, we converted a string into UTF-8 using various approaches and examined each one through code, its output, and an explanation. This should give you a clear picture of “How to convert string to UTF-8 in Java”.