Android developers and AI enthusiasts are exploring the prospect of running powerful language models like GPT-2 directly on Android devices. The KerasNLP workshop from Google I/O 2023 offers all the insights you need to make it happen. Here's a detailed guide to integrating GPT-2 as an on-device machine learning (ODML) model on Android using KerasNLP.
Why use ODML on Android?
On-device machine learning offers several benefits:
- Latency: No need to wait for server responses.
- Privacy: Data stays on the device.
- Offline Access: Works without internet connectivity.
- Reduced Costs: Lower server and bandwidth costs.
 
Setting up the environment:
To get started, you need a working setup on your development machine: make sure Python is installed along with TensorFlow and KerasNLP. Install KerasNLP using:
pip install keras-nlp
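To confirm the environment is ready, a quick import check helps (a minimal sketch; any recent TensorFlow 2.x version should work):

# Verify that TensorFlow and KerasNLP import cleanly
import tensorflow as tf
import keras_nlp

print(tf.__version__)
print(keras_nlp.__version__)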
Loading and preparing GPT-2 with KerasNLP:
KerasNLP simplifies the process of loading pre-trained models. Load GPT-2 and prepare it for on-device use:

from keras_nlp.models import GPT2CausalLM

# Load the pre-trained GPT-2 causal language model from a KerasNLP preset
model = GPT2CausalLM.from_preset('gpt2_base_en')
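Once loaded, a quick sanity check with the model's generate API confirms everything works (the prompt and max_length here are purely illustrative):

# Generate a short continuation from a test prompt
output = model.generate('On-device ML is', max_length=50)
print(output)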
 
Fine-tuning GPT-2:
To make the model more relevant to your Android application, fine-tune it on a dataset specific to your use case.
# Example of fine-tuning the model
model.fit(dataset, epochs=3)
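Here, dataset is a stand-in for your own corpus. A minimal sketch of building one from raw strings, assuming the model was loaded with its default preprocessor (which tokenizes strings automatically); the example texts and batch size are placeholders:

import tensorflow as tf

# Hypothetical in-memory corpus; swap in your own data pipeline
texts = [
    'An example sentence from your target domain.',
    'Another domain-specific training example.',
]
dataset = tf.data.Dataset.from_tensor_slices(texts).batch(16)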
Converting the model for Android:
Once the model is fine-tuned, the next step is to convert it to the TensorFlow Lite (TFLite) format, which is optimized for mobile devices.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model to a file
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
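One caveat: GPT-2 uses TensorFlow ops that are not all covered by TFLite's builtin op set, so the conversion may fail unless the converter is allowed to fall back to Select TF ops. A hedged sketch; these flags would need to be set before calling converter.convert():

# Allow ops without TFLite builtin equivalents to run as TensorFlow ops
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # standard TFLite builtin ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TensorFlow ops
]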
Integrating the TFLite model in Android:
Step 1: Add the TensorFlow Lite dependency
Add the TensorFlow Lite library to your build.gradle file:
implementation 'org.tensorflow:tensorflow-lite:2.7.0'
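If the converter was configured with Select TF ops (as sketched earlier), the matching runtime library is needed at inference time too; pinning it to the same version as the core library is an assumption:

implementation 'org.tensorflow:tensorflow-lite-select-tf-ops:2.7.0'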
Step 2: Load the model in the Android app
Place the model.tflite file in the assets directory and write code to load and run the model using Kotlin.
// Assumes class members: dispatcher (CoroutineDispatcher), context (Context),
// interpreter (Interpreter), and isInitialized (Boolean)
suspend fun initModel() {
    withContext(dispatcher) {
        val loadResult = loadModelFile(context) // Load the model file

        // Check if loading was successful
        if (loadResult.isFailure) {
            return@withContext when (val exception = loadResult.exceptionOrNull()) {
                is FileNotFoundException -> {
                    // Handle FileNotFoundException
                }
                else -> {
                    // Handle any other exception
                }
            }
        }

        // Initialize the interpreter with the loaded model
        val model = loadResult.getOrNull()
        isInitialized = model?.let {
            interpreter = Interpreter(it)
            true
        } ?: false
    }
}
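The loadModelFile helper referenced above isn't shown in the snippet. A minimal sketch, assuming model.tflite was placed in the app's assets directory and is memory-mapped for the Interpreter (the helper name and Result-based signature are assumptions to match the calling code):

import android.content.Context
import java.io.FileInputStream
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel

// Hypothetical helper: memory-map model.tflite from assets, wrapping any
// failure (such as a missing file) in a Result
private fun loadModelFile(context: Context): Result<MappedByteBuffer> = runCatching {
    context.assets.openFd("model.tflite").use { fd ->
        FileInputStream(fd.fileDescriptor).channel.map(
            FileChannel.MapMode.READ_ONLY,
            fd.startOffset,
            fd.declaredLength
        )
    }
}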
 
Running inference:
Prepare your input data and call the runInterpreter method to get predictions.
// Assumes class members: interpreter (Interpreter) and a suitable
// OUTPUT_BUFFER_SIZE constant
@WorkerThread
private fun runInterpreter(input: String): String {
    val outputBuffer = ByteBuffer.allocateDirect(OUTPUT_BUFFER_SIZE)

    // Run the interpreter, which generates text into outputBuffer
    interpreter.run(input, outputBuffer)

    // Set the buffer's limit to the current position and its position to 0
    outputBuffer.flip()

    // Copy the bytes out of the output buffer
    val bytes = ByteArray(outputBuffer.remaining())
    outputBuffer.get(bytes)
    outputBuffer.clear()

    // Return the bytes decoded as a UTF-8 string
    return String(bytes, Charsets.UTF_8)
}
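A hypothetical call site, for completeness: wrapping the blocking interpreter call in a coroutine keeps inference off the main thread (the generate wrapper and the choice of Dispatchers.Default are assumptions, not part of the workshop code):

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Assumed wrapper: run blocking inference on a background dispatcher
suspend fun generate(prompt: String): String =
    withContext(Dispatchers.Default) {
        runInterpreter(prompt)
    }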
Final thoughts
Integrating ODML with KerasNLP and TensorFlow Lite can transform an Android device into a powerhouse for real-time NLP tasks. Whether it's chatbots, language translation, or content generation, the capabilities are now in the palm of your hand.