
本文共 3684 字,大约阅读时间需要 12 分钟。
Hive Query SQL Guide
Table of Contents
Using Regular Expressions to Specify Columns
In Hive, you can specify columns using regular expressions. For example:
select age, `name.*` from test_stu_3;
If you encounter errors, ensure Hive支持的参数设置如下:
set hive.support.quoted_identifiers=none;
Arithmetic Operators
Hive supports all typical arithmetic operators. Below is a quick reference:
Operator | Type | Description | |
---|---|---|---|
A + B | Number | Sum of A and B | |
A - B | Number | Difference of A and B | |
A * B | Number | Product of A and B | |
A / B | Number | Division of A by B | |
A % B | Number | Remainder of A divides B | |
A & B | Number | Bitwise AND of A and B | |
A | B | Number | Bitwise OR of A and B |
A ^ B | Number | Bitwise XOR of A and B | |
~A | Number | Bitwise NOT of A |
Arithmetic operations accept any numeric type. If the data types differ, the type with the narrower range is cast to the wider range (e.g., INT to BIGINT). Hive does not automatically convert narrower types to wider ones during arithmetic operations.
Mathematical Functions
Hive provides various mathematical functions for data analysis. For instance:
floor()
,round()
, andceil()
accept DOUBLE values and return BIGINT.- Use these functions for type conversion instead of type casting operations.
Aggregate Functions
Aggregate functions operate on multiple rows to produce a single result. Two common examples are COUNT()
and AVG()
.
Table-Generating Functions
Hive provides functions that expand single columns into multiple columns or rows. For example:
explode()
: Converts a string column into multiple rows.posexplode()
: Similar toexplode()
, but requries a specific format.
When to Avoid MapReduce
In Hive, MapReduce tasks are typically triggered for most queries. However, the following scenarios do not require it:
- Selecting all columns without WHERE conditions.
- WHERE conditions only involve partition columns, regardless of LIMIT.
You可以通过设置如下参数优化性能:
set hive.exec.mode.local.auto=true;
Predicates Operators
Like Operator
The LIKE
operator matches strings using specific patterns. The syntax is A LIKE B
, where B
can use %
and _
for wildcards. The comparison is character-by-character, so mismatched characters result in FALSE
.
Example | Result |
---|---|
'abcde' LIKE 'abc' | FALSE |
'abcde' LIKE 'abc__' | TRUE |
'abcde' LIKE 'abc%' | TRUE |
'abcde' LIKE '%abc%' | TRUE |
'abcde' LIKE 'bc%' | FALSE |
'abcde' LIKE '_bc%' | TRUE |
'abcde' LIKE '_b%' | TRUE |
RLike Operator
The RLIKE
operator checks if a regex pattern exists within a string. The syntax is A RLIKE B
, where B
is a valid regex.
Example | Result |
---|---|
'foobar' RLIKE 'oo%' | FALSE |
'foobar' RLIKE 'foo' | TRUE |
'foobar' RLIKE '.oo.*' | TRUE |
Comparing Floating-Point Numbers
Comparing floats in Hive can lead to precision issues due to how numbers are stored in memory. For example, 0.2 cannot be represented exactly as a floating-point number. This can cause unexpected results, like:
SELECT * FROM employeesWHERE deductions['FederalTaxes'] >= 0.2;
might return rows where FederalTaxes
is 0.2 due to precision differences.
To avoid issues:
DOUBLE
.SELECT cast(FederalTaxes as FLOAT) >= 0.2 AS boolean;
Performance Optimization
To avoid MapReduce tasks:
- Use
SELECT *
on tables without filters. - Use partition columns in
WHERE
clauses. - Set
hive.exec.mode.local.auto=true;
for local execution mode.
Examples of MapReduce Scenarios
- Queries without partition filters.
- Combinations of filters on non-partition columns with
LIMIT
.
Conclusion
Hive provides powerful SQL capabilities, including support for regular expressions, arithmetic operations, and various functions. By understanding these features and optimizing query execution, you can maximize efficiency and avoid unnecessary MapReduce tasks.
发表评论
最新留言
关于作者
