Ever since the rise of artificial intelligence and machine learning, some of their most important applications have been text detection and image processing, giving rise to the field of computer vision. While computer vision was once almost arcane, and building applications around it required considerable effort and a deep understanding of many mathematical concepts, it is now used almost everywhere, thanks in large part to the development of the well-known OpenCV library by Intel.

Most image-processing applications can be handled well with the OpenCV library, which is commonly used with Python for complex cases and is also available on other platforms. In the context of mobile application development, OpenCV provides libraries for both Android and iOS.

One of the most widely used applications of OpenCV is detecting a document and applying perspective correction to get a clear, flat view of it. This is straightforward in OpenCV using built-in functions such as getPerspectiveTransform(src, dst). While this works out great when developing native applications, it is not ideal if the app is built on a cross-platform framework such as React Native, because we would have to integrate OpenCV into the native source files of the React Native project and write native code that is then bridged back into React Native. This defeats the entire purpose of cross-platform development and requires specialized knowledge of native development, which brings us back to square one. Thankfully, due to the massive adoption of React Native over recent years and its open-source community, we now have alternatives available as plugins, such as react-native-rectangle-scanner. So, let’s see how to use this plugin to implement a simple app in React Native.
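
For context, the snippet below is a minimal sketch of what that perspective-correction step looks like when you do have direct access to OpenCV, using its JavaScript build (opencv.js) purely for illustration. It assumes the library is loaded as cv and that corners is a hypothetical array of detected document corners; it is not something you can drop into a React Native app, which is exactly why a plugin is preferable.

// Sketch only: assumes opencv.js is loaded as `cv` and `corners` holds four
// detected corner points in clockwise order starting from the top-left.
function correctPerspective(srcMat, corners, width, height) {
  const srcQuad = cv.matFromArray(4, 1, cv.CV_32FC2, [
    corners[0].x, corners[0].y, corners[1].x, corners[1].y,
    corners[2].x, corners[2].y, corners[3].x, corners[3].y,
  ]);
  const dstQuad = cv.matFromArray(4, 1, cv.CV_32FC2, [
    0, 0, width, 0, width, height, 0, height,
  ]);
  const M = cv.getPerspectiveTransform(srcQuad, dstQuad); // 3x3 homography
  const output = new cv.Mat();
  cv.warpPerspective(srcMat, output, M, new cv.Size(width, height));
  srcQuad.delete(); dstQuad.delete(); M.delete();
  return output; // flattened, top-down view of the document
}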

Getting Started

Let’s create a sample application that detects a rectangular document and shows it to the user with perspective correction applied.

Using the React Native CLI, let’s initialize an empty project. This can also be done using Expo and really comes down to personal preference. The npx command lets you use the latest version of the React Native CLI without installing it globally.

npx react-native init RectangleScanner

In the package.json file, add the plugin along with two more dependencies that will be used later to render the UI icons for our app, as shown below.

..
"dependencies": {
  "react": "16.11.0",
  "react-native": "0.62.2",
  "react-native-rectangle-scanner": "^1.0.10",
  "react-native-svg": "^12.1.0",
  "react-native-vector-icons": "^6.6.0"
},
..

Run yarn install to pull in the dependencies added above.

Configuring Android Settings

Now there are some small additions to be made in the Android source directory to get the camera and icons working properly.

In android/app/src/main/AndroidManifest.xml, add the camera permission request:

<uses-permission android:name="android.permission.CAMERA" />

Update the settings.gradle file as follows to link OpenCV and Vector Icons.

..
include ':app'

include ':react-native-vector-icons'
project(':react-native-vector-icons').projectDir = new File(rootProject.projectDir, '../node_modules/react-native-vector-icons/android')

include ':openCVLibrary310'
project(':openCVLibrary310').projectDir = new File(rootProject.projectDir, '../node_modules/react-native-rectangle-scanner/android/openCVLibrary310')
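
Depending on how your project is linked, the icon fonts may also need to be bundled. The usual step from the react-native-vector-icons documentation (verify it against the version you installed) is to add the following line to android/app/build.gradle:

apply from: file("../../node_modules/react-native-vector-icons/fonts.gradle")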

Diving into Code

We will begin by creating a stateful React component, defining the propTypes for the props we will use later, and initializing the state variables. We then use React's createRef to create a reference and assign it to the camera variable, which will later be used to trigger camera actions.

import React from 'react';
import PropTypes from 'prop-types';
import {
  ActivityIndicator, Animated, Dimensions, Image, Platform, SafeAreaView,
  StatusBar, StyleSheet, Text, TouchableOpacity, View,
} from 'react-native';
 
export default class DocumentScanner extends React.Component {
 static propTypes = {
   cameraIsOn: PropTypes.bool,
   onLayout: PropTypes.func,
   onPictureTaken: PropTypes.func,
   onPictureProcessed: PropTypes.func
 }
 
 static defaultProps = {
   cameraIsOn: undefined, // Whether camera is on or off
   onLayout: () => { }, // Invokes when the camera layout is initialized
   onPictureTaken: () => { }, // Invokes when the picture is taken
   onPictureProcessed: () => { } // Invokes when the picture is taken and cached.
 }
 
 constructor(props) {
   super(props);
   this.state = {
     flashEnabled: false,
     showScannerView: false,
     didLoadInitialLayout: false,
     detectedRectangle: false,
     isMultiTasking: false,
     loadingCamera: true,
     processingImage: false,
     takingPicture: false,
     overlayFlashOpacity: new Animated.Value(0),
     device: {
       initialized: false,
       hasCamera: false,
       permissionToUseCamera: false,
       flashIsAvailable: false,
       previewHeightPercent: 1,
       previewWidthPercent: 1,
     },
   };
 
   this.camera = React.createRef();
   this.imageProcessorTimeout = null;
 }
..

We define a function called onDeviceSetup() which receives device details from the platform, such as whether permission to use the camera has been granted, whether a flash is available, and the preview's height and width percentages (its aspect ratio relative to the screen).

 onDeviceSetup = (deviceDetails) => {
   const {
     hasCamera, permissionToUseCamera, flashIsAvailable, previewHeightPercent, previewWidthPercent,
   } = deviceDetails;
   this.setState({
     loadingCamera: false,
     device: {
       initialized: true,
       hasCamera,
       permissionToUseCamera,
       flashIsAvailable,
       previewHeightPercent: previewHeightPercent || 1,
       previewWidthPercent: previewWidthPercent || 1,
     },
   });
 }

To surface the various errors that may arise when accessing the camera, getCameraDisabledMessage() is used.

getCameraDisabledMessage() {
   if (this.state.isMultiTasking) {
     return 'Camera is not allowed in multi tasking mode.';
   }
 
   const { device } = this.state;
   if (device.initialized) {
     if (!device.hasCamera) {
       return 'Could not find a camera on the device.';
     }
     if (!device.permissionToUseCamera) {
       return 'Permission to use camera has not been granted.';
     }
   }
   return 'Failed to set up the camera.';
 }

Create a function turnOnCamera() which opens the scanner view. The turnOffCamera() function similarly hides the camera view and can optionally mark the device as uninitialized (for example, when the view is on but no camera was found after onDeviceSetup() ran). Both functions are called from the lifecycle methods immediately after a view update occurs.

turnOnCamera() {
   if (!this.state.showScannerView) {
     this.setState({
       showScannerView: true,
       loadingCamera: true,
     });
   }
 }
 
 turnOffCamera(shouldUninitializeCamera = false) {
   if (shouldUninitializeCamera && this.state.device.initialized) {
     this.setState(({ device }) => ({
       showScannerView: false,
       device: { ...device, initialized: false },
     }));
   } else if (this.state.showScannerView) {
     this.setState({ showScannerView: false });
   }
 }
 

The turnOnCamera() and turnOffCamera() methods are invoked from the lifecycle methods.

The camera is turned on inside componentDidMount() only if the initial layout has already loaded and multi-tasking mode is off (an iOS concern). Otherwise, componentDidUpdate() decides whether to turn the camera on or off based on the device state and the cameraIsOn prop. Also, the imageProcessorTimeout timer, which is set when a picture is captured so that a failed capture can be retried, should be cleared inside componentWillUnmount().

componentDidMount() {
   if (this.state.didLoadInitialLayout && !this.state.isMultiTasking) {
     this.turnOnCamera();
   }
 }
 
 componentDidUpdate() {
   if (this.state.didLoadInitialLayout) {
     if (this.state.isMultiTasking) return this.turnOffCamera(true);
     if (this.state.device.initialized) {
       if (!this.state.device.hasCamera) return this.turnOffCamera();
       if (!this.state.device.permissionToUseCamera) return this.turnOffCamera();
     }
     if (this.props.cameraIsOn === true && !this.state.showScannerView) {
       return this.turnOnCamera();
     }
     if (this.props.cameraIsOn === false && this.state.showScannerView) {
       return this.turnOffCamera(true);
     }
     if (this.props.cameraIsOn === undefined) {
       return this.turnOnCamera();
     }
   }
   return null;
 }
 
 componentWillUnmount() {
   clearTimeout(this.imageProcessorTimeout);
 }

On some Android devices, the aspect ratio of the camera preview differs from that of the screen, which may lead to a distorted preview. To deal with this issue, write a utility function getPreviewSize() which takes the device height and width into account and returns an appropriate preview size and margins.

getPreviewSize() {
   const dimensions = Dimensions.get('window');
   // We use set margin amounts because for some reasons the percentage values don't align the camera preview in the center correctly.
   const heightMargin = (1 - this.state.device.previewHeightPercent) * dimensions.height / 2;
   const widthMargin = (1 - this.state.device.previewWidthPercent) * dimensions.width / 2;
   if (dimensions.height > dimensions.width) {
     // Portrait
     return {
       height: this.state.device.previewHeightPercent,
       width: this.state.device.previewWidthPercent,
       marginTop: heightMargin,
       marginLeft: widthMargin,
     };
   }
   // Landscape
   return {
     width: this.state.device.previewHeightPercent,
     height: this.state.device.previewWidthPercent,
     marginTop: widthMargin,
     marginLeft: heightMargin,
   };
 }
 

The function triggerSnapAnimation() is used to show the flash animation when the user captures an image.

triggerSnapAnimation() {
   Animated.sequence([
     Animated.timing(this.state.overlayFlashOpacity, { toValue: 0.2, duration: 100, useNativeDriver: true }),
     Animated.timing(this.state.overlayFlashOpacity, { toValue: 0, duration: 50, useNativeDriver: true }),
     Animated.timing(this.state.overlayFlashOpacity, { toValue: 0.6, delay: 100, duration: 120, useNativeDriver: true }),
     Animated.timing(this.state.overlayFlashOpacity, { toValue: 0, duration: 90, useNativeDriver: true }),
   ]).start();
 }

The capture() function is used to capture the current frame or the identified rectangle region. The takingPicture and processingImage flags are set at the time of capture to prevent any further capture triggers.

capture = () => {
   if (this.state.takingPicture) return;
   if (this.state.processingImage) return;
   this.setState({ takingPicture: true, processingImage: true });
   this.camera.current.capture();
   this.triggerSnapAnimation();
 
   // If capture failed, allow for additional captures
   this.imageProcessorTimeout = setTimeout(() => {
     if (this.state.takingPicture) {
       this.setState({ takingPicture: false });
     }
   }, 100);
 }

We will use the callbacks provided by the plugin to process and cache the image, as shown below. Here the image state is set with the cached image data, which will be used for the preview.

// The picture was captured but still needs to be processed.
 onPictureTaken = (event) => {
   this.setState({ takingPicture: false });
   this.props.onPictureTaken(event);
 }
 
 // The picture was taken and cached. You can now go on to using it.
 onPictureProcessed = (event) => {
   this.props.onPictureProcessed(event);
   this.setState({
     image: event,
     takingPicture: false,
     processingImage: false,
     showScannerView: this.props.cameraIsOn || false,
   });
 }

Depending on whether or not the device has a flash, the renderFlashControl() function returns a flash toggle button. To use icons in the UI, we have to import the vector icons as follows.

import Icon from 'react-native-vector-icons/Ionicons';

renderFlashControl() {
   const { flashEnabled, device } = this.state;
   if (!device.flashIsAvailable) return null;
   return (
     <TouchableOpacity
       style={[styles.flashControl, { backgroundColor: flashEnabled ? '#FFFFFF80' : '#00000080' }]}
       activeOpacity={0.8}
       onPress={() => this.setState({ flashEnabled: !flashEnabled })}
     >
       <Icon name="ios-flashlight" style={[styles.buttonIcon, { fontSize: 28, color: flashEnabled ? '#333' : '#FFF' }]} />
     </TouchableOpacity>
   );
 }

renderCameraControls() returns the camera capture button along with the flash button; the capture button is dimmed while a picture is being taken or processed.

renderCameraControls() {
   const cameraIsDisabled = this.state.takingPicture || this.state.processingImage;
   const disabledStyle = { opacity: cameraIsDisabled ? 0.8 : 1 };
 
   return (
     <>
       <View style={styles.buttonBottomContainer}>
         <View style={styles.cameracontainer}>
           <View style={[styles.cameraOutline, disabledStyle]}>
             <TouchableOpacity
               activeOpacity={0.8}
               style={styles.cameraButton}
               onPress={this.capture}
             />
           </View>
         </View>
         <View>
           {this.renderFlashControl()}
         </View>
       </View>
     </>
   );
 }

The renderCameraOverlay() function conditionally displays a loading or processing overlay along with the camera controls.

renderCameraOverlay() {
   let loadingState = null;
   if (this.state.loadingCamera) {
     loadingState = (
       <View style={styles.overlay}>
         <View style={styles.loadingContainer}>
           <ActivityIndicator color="white" />
           <Text style={styles.loadingCameraMessage}>Loading Camera</Text>
         </View>
       </View>
     );
   } else if (this.state.processingImage) {
     loadingState = (
       <View style={styles.overlay}>
         <View style={styles.loadingContainer}>
           <View style={styles.processingContainer}>
             <ActivityIndicator color="#333333" size="large" />
             <Text style={{ color: '#333333', fontSize: 30, marginTop: 10 }}>Processing</Text>
           </View>
         </View>
       </View>
     );
   }
 
   return (
     <>
       {loadingState}
       <SafeAreaView style={[styles.overlay]}>
         {this.renderCameraControls()}
       </SafeAreaView>
     </>
   );
 }

The renderCameraView() function renders either the camera view, a loading state, or an error message. Here the allowDetection prop is set on the RectangleOverlay, which enables automatic highlighting of an identifiable rectangular region and triggers the onDetectedCapture callback, where we capture and process the detected document.

You have to import the Scanner and RectangleOverlay from the react-native-rectangle-scanner package.

import Scanner, { RectangleOverlay } from 'react-native-rectangle-scanner';

renderCameraView() {
   if (this.state.showScannerView) {
     const previewSize = this.getPreviewSize();
     let rectangleOverlay = null;
     if (!this.state.loadingCamera && !this.state.processingImage) {
       rectangleOverlay = (
         <RectangleOverlay
           detectedRectangle={this.state.detectedRectangle}
           backgroundColor="rgba(255,181,6, 0.2)"
           borderColor="rgb(255,181,6)"
           borderWidth={4}
           detectedBackgroundColor="rgba(255,181,6, 0.3)"
           detectedBorderWidth={6}
           detectedBorderColor="rgb(255,218,124)"
           onDetectedCapture={this.capture}
           allowDetection
         />
       );
     }
     return (
       <View style={{ backgroundColor: 'rgba(0, 0, 0, 0)', position: 'relative', marginTop: previewSize.marginTop, marginLeft: previewSize.marginLeft, height: `${previewSize.height * 100}%`, width: `${previewSize.width * 100}%` }}>
         <Scanner
           onPictureTaken={this.onPictureTaken}
           onPictureProcessed={this.onPictureProcessed}
           enableTorch={this.state.flashEnabled}
           ref={this.camera}
           capturedQuality={0.6}
           onRectangleDetected={({ detectedRectangle }) => this.setState({ detectedRectangle })}
           onDeviceSetup={this.onDeviceSetup}
           onTorchChanged={({ enabled }) => this.setState({ flashEnabled: enabled })}
           style={styles.scanner}
           onErrorProcessingImage={(err) => console.log('error', err)}
         />
         {rectangleOverlay}
         <Animated.View style={{ ...styles.overlay, backgroundColor: 'white', opacity: this.state.overlayFlashOpacity }} />
         {this.renderCameraOverlay()}
       </View>
     );
   }
 
   let message = null;
   if (this.state.loadingCamera) {
     message = (
       <View style={styles.overlay}>
         <View style={styles.loadingContainer}>
           <ActivityIndicator color="white" />
           <Text style={styles.loadingCameraMessage}>Loading Camera</Text>
         </View>
       </View>
     );
   } else {
     message = (
       <Text style={styles.cameraNotAvailableText}>
         {this.getCameraDisabledMessage()}
       </Text>
     );
   }
   return (
     <View style={styles.cameraNotAvailableContainer}>
       {message}
     </View>
 
   );
 }

Now we can piece this all together to render the final UI. If the image state is set, the preview screen is shown with a Retry button; if not, the camera view is rendered so we can capture an image.

render() {
   if (this.state.image) {
     return (
       <View style={styles.previewContainer}>
         <View style={styles.previewBox}>
           <Image source={{ uri: this.state.image.croppedImage }} style={styles.preview} />
         </View>
         <TouchableOpacity style={styles.buttonContainer} onPress={this.retryCapture}>
           <Text style={styles.buttonText}>Retry</Text>
         </TouchableOpacity>
       </View>
     )
   } else {
     return (
       <View
         style={styles.container}
         onLayout={(event) => {
           // This is used to detect multi tasking mode on iOS/iPad
           // Camera use is not allowed
           this.props.onLayout(event);
           if (this.state.didLoadInitialLayout && Platform.OS === 'ios') {
             const screenWidth = Dimensions.get('screen').width;
             const isMultiTasking = (
               Math.round(event.nativeEvent.layout.width) < Math.round(screenWidth)
             );
             if (isMultiTasking) {
               this.setState({ isMultiTasking: true, loadingCamera: false });
             } else {
               this.setState({ isMultiTasking: false });
             }
           } else {
             this.setState({ didLoadInitialLayout: true });
           }
         }}
       >
         <StatusBar backgroundColor="black" barStyle="light-content" hidden={Platform.OS !== 'android'} />
         {this.renderCameraView()}
       </View>
     );
   }
 }
 
 retryCapture = () => {
   this.setState({
     image: null
   });
 }

You might have noticed that there are a lot of references to custom styles that lay out the views properly; these are defined below.

const styles = StyleSheet.create({
 preview: {
   flex: 1, width: null, height: null, resizeMode: 'contain'
 },
 previewBox: {
   width: 350, height: 350
 },
 previewContainer: {
   justifyContent: 'center', alignItems: 'center', flex:1
 },
 buttonBottomContainer: {
   display:'flex', bottom:40, flexDirection:'row', position:'absolute',
 },
 buttonContainer: {
   position: 'relative', backgroundColor: '#000000', alignSelf: 'center', alignItems: 'center', borderRadius: 10, marginTop: 40, padding: 10, width: 100
 },
 buttonGroup: {
   backgroundColor: '#00000080', borderRadius: 17,
 },
 buttonIcon: {
   color: 'white', fontSize: 22, marginBottom: 3, textAlign: 'center',
 },
 buttonText: {
   color: 'white', fontSize: 13,
 },
 cameraButton: {
   backgroundColor: 'white', borderRadius: 50, flex: 1, margin: 3
 },
 cameraNotAvailableContainer: {
   alignItems: 'center', flex: 1, justifyContent: 'center',  marginHorizontal: 15,
 },
 cameraNotAvailableText: {
   color: 'white', fontSize: 25, textAlign: 'center',
 },
 cameracontainer: {
   flex: 1, display: 'flex', justifyContent: 'center',
 },
 cameraOutline: {
   alignSelf: "center", left: 30, borderColor: 'white', borderRadius: 50,
   borderWidth: 3, height: 70, width: 70,
 },
 container: {
   backgroundColor: 'black', flex: 1,
 },
 flashControl: {
   alignItems: 'center', borderRadius: 30, height: 50, justifyContent: 'center', margin: 8, paddingTop: 7, width: 50
 },
 loadingCameraMessage: {
   color: 'white', fontSize: 18, marginTop: 10, textAlign: 'center'
 },
 loadingContainer: {
   alignItems: 'center', flex: 1, justifyContent: 'center'
 },
 overlay: {
   bottom: 0, flex: 1, left: 0, position: 'absolute', right: 0, top: 0,
 },
 processingContainer: {
   alignItems: 'center', backgroundColor: 'rgba(220, 220, 220, 0.7)', borderRadius: 16,height: 140,justifyContent: 'center',width: 200,
 },
 scanner: {
   flex: 1,
 },
});
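
Finally, to see the component in action, mount it from the app's entry point. The following is a minimal sketch of an App.js, assuming the scanner component above is exported from ./DocumentScanner; the callback props here just log the events and are only illustrative.

import React from 'react';
import DocumentScanner from './DocumentScanner';

const App = () => (
  <DocumentScanner
    // cameraIsOn is left undefined so the scanner manages the camera itself
    onPictureTaken={(event) => console.log('picture taken', event)}
    onPictureProcessed={(event) => console.log('picture processed', event)}
  />
);

export default App;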

Conclusion

The field of software engineering is vast and diverse, but it gets simpler with time and advancing technology. Every day we see simpler solutions to problems that were previously difficult or impossible. Writing such an application natively for Android or iOS would once have meant using the native libraries and wiring up far more code for even a basic implementation, along with a fundamental understanding of the OpenCV library, and it would have been much more tedious to maintain across the two platforms.

With the intelligent use of open-source libraries and React Native, we can scaffold a relatively robust rectangle-detection app without much effort. This is just a simple demonstration using a commonly available plugin for rectangle detection. Keep in mind that it is a bare-bones implementation of the plugin; while it does the job quite well, diving into the plugin documentation and experimenting should allow you to build a better version.